## 1. Introduction

Clouds vary on smaller scales than the grid box sizes used in atmospheric numerical models.

One fundamental manifestation of subgrid cloud variability is partial cloudiness. Because the fraction of a grid box that is occupied by cloud greatly affects radiative transfer, cloud fraction parameterizations have been developed or evaluated by many authors (see, e.g., Sundqvist 1993; Tiedtke 1993; Mocko and Cotton 1995; Xu and Randall 1996a,b; and references therein).

Although cloud fraction is an important aspect of subgrid variability, it may be advantageous to parameterize subgrid variability more generally. Even in overcast regions, ignoring subgrid variability may lead to errors when modeling the many microphysical processes and thermodynamic properties that are nonlinear. To compute the rates of such nonlinear processes, a model that ignores subgrid variability would first compute grid box average properties and then insert these averages into a microphysical formula. The result so obtained differs, in general, from the quantity truly desired, which is the microphysical rate averaged over the grid box (Fowler et al. 1996; Fowler and Randall 1996; Stevens et al. 1996; Stevens et al. 1998; Kogan 1998).

A particularly pernicious class of nonlinearity occurs when the formula for the microphysical rate is either convex or concave.^{1} Then one may show that ignoring subgrid variability leads to biases (Cahalan et al. 1994; Rotstayn 2000; Pincus and Klein 2000; Larson et al. 2001). Because we define a bias to be an error that always has the same sign, the errors associated with a bias are strictly additive, rather than partially self-canceling. Such a bias occurs when a model diagnoses liquid water content from a convex function of conserved variables. Then the model sometimes underpredicts and never overpredicts average liquid water content in partly cloudy grid boxes, relative to the values that would be obtained if the model accounted for subgrid variability. Relatedly, it may be shown that such a model also underpredicts average temperature. Finally, since the Kessler parameterization for autoconversion of cloud droplets to raindrops (Kessler 1969) is convex, ignoring subgrid variability leads to underprediction of Kessler autoconversion.
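The sign of these biases follows from Jensen's inequality. As a brief sketch, in notation introduced here only for illustration, let *f* be the local formula, *x* the quantity on which it depends, and an overbar the grid box average:

$$
f \ \text{convex} \;\Longrightarrow\; \overline{f(x)} \ge f(\overline{x}), \qquad
f \ \text{concave} \;\Longrightarrow\; \overline{f(x)} \le f(\overline{x}).
$$

For example, take the convex function *f*(*x*) = max(*x*, 0) and suppose that *x* equals −1 in half of the grid box and +1 in the other half. Then $\overline{f(x)} = 1/2$, whereas $f(\overline{x}) = f(0) = 0$, so inserting the average into the formula underpredicts the true average.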

To parameterize cloud fraction, liquid water content, temperature, or Kessler autoconversion rate, information about the spatial arrangement of parcels within a grid box is not required. What is needed in all cases is the probability density function (PDF) of the relevant microphysical properties over the grid box. In principle, if the relevant PDF were known with perfect accuracy and a local microphysical or thermodynamic formula were formulated with perfect accuracy, then the grid box average rate or property could be parameterized with perfect accuracy. Therefore we believe that one key to parameterization of the above quantities is the PDF of microphysical properties.

Considering the fundamental role of PDFs in parameterization, surprisingly few papers have used boundary layer observations to compute PDFs of conserved variables in clouds. A number of studies have examined PDFs obtained from aircraft legs through clear air (Banta 1979; Grossman 1984; Cotton and Anthes 1989; Ek and Mahrt 1991). Furthermore, several studies have used satellite retrievals to calculate PDFs of cloud optical depth, which is an important quantity for radiative transfer (see, e.g., Wielicki and Parker 1994; Barker et al. 1996). To parameterize microphysical processes, however, what are most useful are PDFs of conserved variables in cloudy or partly cloudy regions. Such PDFs have been simulated by numerical models and examined by several authors (Bougeault 1981; Wyngaard and Moeng 1992; Lewellen and Yoh 1993; Xu and Randall 1996b; Wang and Stevens 2000). However, obtaining PDFs from models has a disadvantage relative to obtaining PDFs from observational data: the domain of large eddy simulations is usually limited to several kilometers, whereas variability of scalars in boundary layers is often largest at larger scales (Cotton and Anthes 1989, 373–383). Therefore, a large eddy simulation may not produce the full variability that should be accounted for within, say, a 50-km mesoscale model grid box. Likewise, a model with coarser resolution has a larger domain but compromises representation of variability at small scales. [The recent high-resolution cloud-resolving simulation of Tompkins (2001, manuscript submitted to *J. Atmos. Sci.*) alleviates the problem greatly, but it is a simulation of deep convective clouds, not boundary layer clouds, which we study.] With the data used in the present paper, we can examine PDFs from legs ranging in length from 2 to 50 km.

Two papers do examine the PDF of total specific water content in clouds: Wood and Field (2000) provide several plots of such PDFs, and Price (2001) obtains such PDFs from a tethered balloon. Our paper extends their work as follows. We study numerous PDFs from boundary layers topped by stratocumulus, cumulus, and cumulus rising into stratocumulus. Our primary goal is to characterize these PDFs in terms of moments so that they may be parameterized in numerical models. We will address the question of how many moments are needed to provide a satisfactory fit to observed PDFs and propose specific families of PDFs that fit the observed PDFs well. The PDF parameterizations that work best depend on three moments. Although most models lack this information, our results provide motivation for modelers to seek ways of predicting or diagnosing higher-order moments. Next, we will calculate microphysical biases and determine how effectively they are removed when subgrid variability is parameterized by an approximate PDF. Finally, Manton and Cotton (1977) and Sommeria and Deardorff (1977) suggested that PDFs from short segments are approximately Gaussian; later, Bougeault (1981) and Xu and Randall (1996b) argued that PDFs from longer segments are non-Gaussian. We shall use observational data and the Kolmogorov–Smirnov test to assess whether cloudy PDFs are Gaussian.

## 2. Theoretical background

This paper is motivated primarily by three tasks in parameterization: 1) diagnosis of subgrid cloud fraction; 2) diagnosis of grid box average specific liquid water content $\overline{q_l}$ and temperature $\overline{T}$; and 3) diagnosis of grid box average Kessler autoconversion rate $\overline{A_K}$.^{2} This paper shall focus solely on warm boundary layer processes and ignore ice. Then we can assume that any water in excess of saturation immediately condenses. It turns out that to accomplish all three parameterization tasks, the PDF of only one variable is needed. This variable, *s,* is useful because 1) it can be predicted by a numerical model in terms of variables that are conserved under condensation, and hence *s* is conserved under condensation as well; 2) we have approximately *s* = *q*_{l} if the total specific water content *q*_{t}, including liquid and vapor, exceeds saturation (although for unsaturated fluid elements, *q*_{l} is zero but *s* is negative); and 3) the single variable *s* indicates how *q*_{l} is affected by variations in both *q*_{t} and temperature.

The variable *s* is defined as follows. We let *q*_{s}(*T*, *p*) be the saturation specific humidity, where *T* is temperature and *p* is pressure. We ignore ice processes. Then *s* is defined by Eqs. (1)–(3) (Lewellen and Yoh 1993; Sommeria and Deardorff 1977; Mellor 1977), in which *L* is the latent heat of vaporization, *c*_{p} is the specific heat at constant pressure, *e*_{s} is the saturation vapor pressure over liquid, and *R*_{d} and *R*_{v} are the gas constants for dry air and water vapor, respectively. Following Sommeria and Deardorff (1977), we have defined the liquid water temperature *T*_{l},

$$T_l = T - \frac{L}{c_p}\, q_l. \tag{4}$$

It is approximately conserved under condensation. Unlike the liquid water potential temperature, *θ*_{l}, *T*_{l} is not conserved under changes in pressure.

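Given the PDF of *s,* *P*(*s*), one can compute cloud fraction, grid box average liquid water content and temperature, and grid box average Kessler autoconversion rate. Cloud fraction *C* is given by

$$C = \int_{-\infty}^{\infty} P(s)\, H(s)\, ds, \tag{5}$$

where *H*(*s*) is the Heaviside step function. This formula states that the cloud fraction equals that portion of the area under *P*(*s*) that corresponds to *q*_{t} in excess of saturation, that is, the area with *s* > 0. The grid box–averaged *q*_{l} is given, in terms of *s* and *P*(*s*), as

$$\overline{q_l} = \int_{-\infty}^{\infty} s\, P(s)\, H(s)\, ds. \tag{6}$$

The grid box average temperature $\overline{T}$ can then be found from $\overline{q_l}$ and $\overline{T_l}$ via Eq. (4):

$$\overline{T} = \overline{T_l} + \frac{L}{c_p}\, \overline{q_l}.$$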

The local Kessler autoconversion rate is *A*_{K} = *K*_{1}(*q*_{l} − *q*_{crit})*H*(*q*_{l} − *q*_{crit}), where *q*_{crit} is a critical threshold below which no autoconversion occurs, and *K*_{1} is a constant that governs the rate of autoconversion. This paper adopts the values of *q*_{crit} (=0.5 g kg^{−1}) and *K*_{1} (=10^{−3} s^{−1}) used in the Pennsylvania State University–National Center for Atmospheric Research Mesoscale Model version 5 (Grell et al. 1994). When liquid is present, *s* approximates *q*_{l}, and hence we can write the grid box average of the Kessler autoconversion rate, $\overline{A_K}$, in terms of *s* and *P*(*s*):

$$\overline{A_K} = \int_{-\infty}^{\infty} K_1\,(s - q_{\mathrm{crit}})\, H(s - q_{\mathrm{crit}})\, P(s)\, ds. \tag{7}$$
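When *P*(*s*) is represented by a set of equally weighted samples of *s* (for instance, points along an aircraft leg), the integrals for cloud fraction, average liquid water, and average Kessler autoconversion reduce to sample means. The following is a minimal sketch under that assumption. The function name `grid_box_averages` and the dictionary keys are hypothetical; the constants are the values of *q*_{crit} and *K*_{1} quoted above.

```python
from typing import Dict, Sequence

Q_CRIT = 0.5   # g kg^-1, autoconversion threshold quoted in the text
K1 = 1.0e-3    # s^-1, autoconversion rate constant quoted in the text


def grid_box_averages(s_samples: Sequence[float],
                      q_crit: float = Q_CRIT,
                      k1: float = K1) -> Dict[str, float]:
    """Sample-mean analogues of the integrals over P(s).

    s_samples are equally weighted values of s in g kg^-1 (an assumption
    of this sketch). The Heaviside factors become max(..., 0) and s > 0.
    """
    n = len(s_samples)
    if n == 0:
        raise ValueError("need at least one sample of s")
    # Cloud fraction: fraction of samples with s > 0.
    cloud_fraction = sum(1 for s in s_samples if s > 0.0) / n
    # Mean liquid water: s where saturated, zero elsewhere.
    mean_ql = sum(max(s, 0.0) for s in s_samples) / n
    # Mean Kessler rate (g kg^-1 s^-1): s stands in for q_l where liquid exists.
    mean_kessler = sum(k1 * max(s - q_crit, 0.0) for s in s_samples) / n
    return {
        "cloud_fraction": cloud_fraction,
        "mean_ql": mean_ql,
        "mean_kessler": mean_kessler,
    }
```

Working directly with samples avoids any choice of bin width; a binned histogram would give the same results up to binning error.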

## 3. Parameterizations of PDFs of *s*

Equations (5), (6), and (7) indicate the fundamental importance of *P*(*s*) for certain aspects of parameterization. Namely, if a numerical model could predict *P*(*s*), then it could also diagnose cloud fraction, $\overline{q_l}$, $\overline{T}$, and $\overline{A_K}$. To obtain *P*(*s*), we will suppose that a numerical model can provide a few low-order moments of *s.* Then we will assume a functional form for *P*(*s*) that depends on a small number of parameters. Given the moments in a particular grid box, *P*(*s*) for that grid box can then be determined. This approach to parameterization, the “assumed PDF” method, has been used by Sommeria and Deardorff (1977), Bougeault (1981), Chen and Cotton (1987), Randall et al. (1992), Frankel et al. (1993), and Lappen (1999), among others.

The shape of a PDF is uniquely determined by specification of all of its infinitely many moments, if they exist and if the magnitude of the higher moments decreases rapidly enough (for details, see Kendall and Stuart 1963, 109–112). However, numerical models can feasibly predict or diagnose only a few moments, for example, 1, 2, or 3. Therefore, it becomes important to ascertain whether a few moments and an assumed functional form of the PDFs can characterize atmospheric PDFs with sufficient accuracy and generality. If so, and if accurate methods to obtain low-order moments can be developed, then the assumed PDF method offers promise of providing a rather general subgrid parameterization, one that is based on a sound theoretical foundation, namely Eqs. (5), (6), and (7), and testable empiricism, namely the assumed form of the PDF. In addition, it guarantees that quantities like cloud fraction and liquid water content are consistent with each other, since both are diagnosed from the same PDF (Lappen 1999). Furthermore, if in a particular circumstance an assumed PDF scheme performs poorly, one can improve it either by predicting more moments or by using data to refine the assumed functional form of the PDF, rather than having to construct an entirely new scheme.
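To make the assumed PDF method concrete, consider the simplest nontrivial case, a Gaussian *P*(*s*) specified by its mean and standard deviation. For a Gaussian, the averages of *H*(*s*), *sH*(*s*), and (*s* − *q*_{crit})*H*(*s* − *q*_{crit}) have standard closed forms in terms of the normal density and distribution function. The sketch below uses those closed forms; the function names are hypothetical, and the Gaussian shape is an assumption of the example, not a claim about which family fits best.

```python
import math
from typing import Tuple

K1 = 1.0e-3     # s^-1, Kessler rate constant quoted in the text
Q_CRIT = 0.5    # g kg^-1, Kessler threshold quoted in the text


def _mean_excess(s_mean: float, sigma: float, threshold: float) -> float:
    """Average of (s - threshold) * H(s - threshold) for Gaussian s."""
    z = (s_mean - threshold) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))           # standard normal CDF at z
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density at z
    return (s_mean - threshold) * cdf + sigma * density


def gaussian_diagnostics(s_mean: float, sigma: float,
                         q_crit: float = Q_CRIT,
                         k1: float = K1) -> Tuple[float, float, float]:
    """Return (cloud fraction, mean q_l, mean Kessler rate) for Gaussian P(s)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    cloud_fraction = 0.5 * (1.0 + math.erf(s_mean / (sigma * math.sqrt(2.0))))
    mean_ql = _mean_excess(s_mean, sigma, 0.0)
    mean_kessler = k1 * _mean_excess(s_mean, sigma, q_crit)
    return cloud_fraction, mean_ql, mean_kessler
```

Because all three quantities come from the same two moments, they are mutually consistent by construction, which is the property of the assumed PDF method emphasized above.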

We will use observational data to evaluate eight different families of functions, listed below, that can be used to parameterize a PDF *P*(*s*) with mean $\bar{s}$, standard deviation *σ,* and skewness *Sk.* Through experimentation with many PDF parameterizations, we have found that they tend to perform better if they are constructed so as to have the same low-order moments as the observed PDF insofar as possible. This has guided our selection of parameterizations below.

I. Single delta function (one parameter). This PDF depends only on the mean of *s*:

$$P_d(s) = \delta(s - \bar{s}),$$

where *δ* denotes the Dirac delta function. This distribution is what is effectively assumed by models that contain no representation of subgrid variability, via a PDF or otherwise. This parameterization provides a baseline against which we can measure the quality of the more sophisticated parameterizations that follow.

II. Single Gaussian (two parameters). This PDF depends on the mean $\bar{s}$ and the width *σ*; but it cannot represent skewed PDFs.

III. Generalized gamma function (three parameters). This PDF restricts *s* to the range *s* > *s*_{0} for *Sk* > 0 and *s* < *s*_{0} for *Sk* < 0, where *Sk* is the skewness of *P*(*s*). This parameterization is a generalization of the PDF in Bougeault (1982). It allows for skewed PDFs but not bimodal PDFs.

IV. Double Gaussian (five parameters). This PDF is the sum of two Gaussians:

$$P(s) = \frac{a}{\sqrt{2\pi}\,\sigma_1}\exp\left[-\frac{(s-s_1)^2}{2\sigma_1^2}\right] + \frac{1-a}{\sqrt{2\pi}\,\sigma_2}\exp\left[-\frac{(s-s_2)^2}{2\sigma_2^2}\right]. \tag{12}$$

Here *s*_{1} and *σ*_{1} are the mean and standard deviation, respectively, of one Gaussian with amplitude 0 ≤ *a* ≤ 1; *s*_{2} and *σ*_{2} are the mean and standard deviation of the other Gaussian. Predicting five moments in a numerical model would be complicated and expensive, but we include this PDF in our comparison in order to ascertain the best possible fit that can be expected from the double Gaussian functional form. The procedure we used to fit the PDF parameters (*a,* *s*_{1}, *σ*_{1}, *s*_{2}, *σ*_{2}) is described in the appendix.

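V. LWFGVC1 (three parameters). This PDF is a double Gaussian, as in (12), in which both Gaussians are assigned the same width, *σ*_{1} = *σ*_{2} = 0.6*σ.* The simplifying assumption of equal widths permits an analytic solution for *a,* *s*_{1}, and *s*_{2} in terms of $\bar{s}$, *σ,* and *Sk,* where *σ̃* ≡ *σ*_{1}/*σ* = *σ*_{2}/*σ* = 0.6. We have assumed without loss of generality that *s*_{1} > *s*_{2}.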

VI. Double delta function (three parameters). This PDF consists of two delta functions:

$$P_{dd}(s) = a\,\delta(s - s_1) + (1 - a)\,\delta(s - s_2).$$

It is obtained by letting *σ*_{1} → 0 and *σ*_{2} → 0 in (12). It is equivalent to a commonly used double-plume scheme in which each plume is assumed to be uniform (Lappen 1999). The locations of the delta functions (*s*_{1} and *s*_{2}) and the amplitude (*a*) are given in terms of *Sk,* *σ,* and $\bar{s}$ by the LWFGVC1 solution with *σ̃* = 0.

VII. Lewellen–Yoh (three parameters). Lewellen and Yoh (1993) assume a double Gaussian PDF and then eliminate two parameters so that the resulting PDF depends only on the mean, variance, and skewness of *s.* When the skewness vanishes, their PDF parameterization reduces to a single Gaussian, but when skewness approaches infinity, their PDF consists of one Gaussian with infinitesimal width, and the other Gaussian with width *σ*_{1} → *σ.* In this way, the Lewellen–Yoh parameterization can represent skewed PDFs.

VIII. LWFGVC2 (three parameters). The Lewellen–Yoh parameterization fits observed PDFs of *s* quite well, as we shall see. However, it would be difficult to adjust this scheme if in the future it does not turn out to fit a particular cloud regime or if an additional moment becomes available to a model. LWFGVC2, like Lewellen–Yoh, assumes a double Gaussian PDF and parameterizes two parameters; but LWFGVC2 can be readily modified upon inspecting the brief derivation of it below. Following Lewellen and Yoh (1993), we note that the definitions of the first three moments lead, respectively, to the following three relations:

$$a\,\tilde{s}_1 + (1-a)\,\tilde{s}_2 = 0, \tag{17}$$

$$a\left(\tilde{s}_1^{\,2} + \tilde{\sigma}_1^{\,2}\right) + (1-a)\left(\tilde{s}_2^{\,2} + \tilde{\sigma}_2^{\,2}\right) = 1, \tag{18}$$

$$a\left(\tilde{s}_1^{\,3} + 3\,\tilde{s}_1\tilde{\sigma}_1^{\,2}\right) + (1-a)\left(\tilde{s}_2^{\,3} + 3\,\tilde{s}_2\tilde{\sigma}_2^{\,2}\right) = Sk.  \tag{19}$$

Here $\bar{s}$, *σ,* and *Sk* are, respectively, the mean, standard deviation, and skewness of the PDF. We have retained the definition *s̃*_{1,2} ≡ (*s*_{1,2} − $\bar{s}$)/*σ* and written *σ̃*_{1,2} ≡ *σ*_{1,2}/*σ.*

It remains to specify the widths, *σ*_{1} and *σ*_{2}, in terms of the moments. We adopt the convention that the subscript 1 denotes the Gaussian with the larger mean. Using the values of *σ*_{1}/*σ* and *σ*_{2}/*σ* computed by the five-parameter double Gaussian, we plot *σ*_{1}/*σ* and *σ*_{2}/*σ* versus skewness in Fig. 1. Based on this plot, we adopt a formula for *σ*_{1} and *σ*_{2}, Eq. (21), with parameters *α* = 2 and *γ* = 0.6. This formula, like Lewellen and Yoh (1993), ensures that when skewness vanishes, the PDF reduces to a Gaussian (i.e., *σ*_{1}/*σ* and *σ*_{2}/*σ* approach unity). Equation (21) is shown as the solid curve in Fig. 1, along with the relationship of Lewellen and Yoh (1993) (dashed line).

We must then solve for *a,* *s̃*_{1}, and *s̃*_{2} in terms of *σ̃*_{1}, *σ̃*_{2}, and *Sk.* Using Eqs. (17), (18), and (19), we find a transcendental equation for *a.* Once *a* is found numerically, we can then solve for *s̃*_{1} and *s̃*_{2} via formulas derived from (17), (18), and (19), where we have assumed that *s*_{1} > *s*_{2}.
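One way to carry out the numerical step is sketched below. It is an illustrative reduction, not necessarily the form used by the authors. With the normalized widths taken as given inputs, the first two moment relations give *s̃*_{1} = [*V*(1 − *a*)/*a*]^{1/2} and *s̃*_{2} = −[*Va*/(1 − *a*)]^{1/2}, where *V* = 1 − *aσ̃*_{1}^{2} − (1 − *a*)*σ̃*_{2}^{2}; substituting into the third-moment relation leaves one equation in *a*, which is solved here by bisection. The function names, the bracketing strategy, and the tolerance are assumptions of this sketch, and the width formula of Eq. (21) is not reproduced, so the caller must supply the widths.

```python
import math
from typing import Tuple


def _third_moment_residual(a: float, skewness: float,
                           var1: float, var2: float) -> float:
    """Normalized third moment of the mixture minus the target skewness."""
    # Variance left over for the separation of the two means (clamped at 0).
    spread = max(1.0 - a * var1 - (1.0 - a) * var2, 0.0)
    weight = math.sqrt(a * (1.0 - a))
    third = (spread ** 1.5 * (1.0 - 2.0 * a) / weight
             + 3.0 * math.sqrt(spread) * weight * (var1 - var2))
    return third - skewness


def solve_double_gaussian(skewness: float, sig1_tilde: float,
                          sig2_tilde: float,
                          tol: float = 1e-13) -> Tuple[float, float, float]:
    """Return (a, s1_tilde, s2_tilde) with s1_tilde >= s2_tilde.

    sig1_tilde and sig2_tilde are the widths divided by the overall
    standard deviation; they are inputs here (hypothetical interface).
    """
    var1, var2 = sig1_tilde ** 2, sig2_tilde ** 2
    lo, hi = 1e-12, 1.0 - 1e-12
    # Keep a in the range where the spread of the means is nonnegative.
    if var1 > var2:
        hi = min(hi, (1.0 - var2) / (var1 - var2))
    elif var2 > var1:
        lo = max(lo, (var2 - 1.0) / (var2 - var1))
    elif var1 >= 1.0:
        raise ValueError("equal widths must be smaller than the total width")
    if not lo < hi:
        raise ValueError("widths are incompatible with unit total variance")
    f_lo = _third_moment_residual(lo, skewness, var1, var2)
    f_hi = _third_moment_residual(hi, skewness, var1, var2)
    if f_lo * f_hi > 0.0:
        raise ValueError("no root bracketed for these widths and skewness")
    for _ in range(200):  # plain bisection on the bracketed interval
        mid = 0.5 * (lo + hi)
        f_mid = _third_moment_residual(mid, skewness, var1, var2)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    a = 0.5 * (lo + hi)
    spread = max(1.0 - a * var1 - (1.0 - a) * var2, 0.0)
    s1_tilde = math.sqrt(spread * (1.0 - a) / a)
    s2_tilde = -math.sqrt(spread * a / (1.0 - a))
    return a, s1_tilde, s2_tilde
```

Bisection is used because it needs only a sign change, not a derivative; it returns one root in the bracket and raises an error when the supplied widths cannot reproduce the requested skewness.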

Although LWFGVC2 and Lewellen–Yoh reduce to a Gaussian for zero skewness, LWFGVC1 does not. Among the three-parameter PDFs, LWFGVC1 occupies a middle ground in complexity. It is more realistic than the double delta function parameterization and has the advantage that it can be solved analytically, unlike the Lewellen–Yoh or LWFGVC2 parameterizations. Because LWFGVC1 sets the widths of the Gaussians equal, it fits highly skewed legs slightly worse than Lewellen–Yoh or LWFGVC2.
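The analytic solvability of the equal-width case can be illustrated as follows. With both normalized widths equal to *σ̃*, the two Gaussian means must carry a normalized variance 1 − *σ̃*^{2} and the full skewness, which leads to a quadratic equation for *a*. The closed form in the code below was derived for this sketch from the moment definitions and may be written differently from the authors' own formulas; the function names are hypothetical.

```python
import math
from typing import Tuple

SIGMA_TILDE = 0.6  # common normalized width sigma_1/sigma = sigma_2/sigma from the text


def equal_width_double_gaussian(s_mean: float, sigma: float, skewness: float,
                                sigma_tilde: float = SIGMA_TILDE
                                ) -> Tuple[float, float, float, float, float]:
    """Return (a, s1, s2, sigma1, sigma2) with s1 > s2 for given moments."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if not 0.0 <= sigma_tilde < 1.0:
        raise ValueError("sigma_tilde must lie in [0, 1)")
    # Normalized variance carried by the separation of the two means.
    v = 1.0 - sigma_tilde ** 2
    # Root of the quadratic for a that gives s1 > s2; for extreme |skewness|
    # rounding can drive a to 0 or 1, which this sketch does not guard against.
    a = 0.5 * (1.0 - skewness / math.sqrt(4.0 * v ** 3 + skewness ** 2))
    s1_tilde = math.sqrt(v * (1.0 - a) / a)
    s2_tilde = -math.sqrt(v * a / (1.0 - a))
    width = sigma_tilde * sigma
    return a, s_mean + sigma * s1_tilde, s_mean + sigma * s2_tilde, width, width


def double_gaussian_moments(a: float, s1: float, s2: float,
                            sigma1: float, sigma2: float
                            ) -> Tuple[float, float, float]:
    """Mean, standard deviation, and skewness of a two-Gaussian mixture."""
    mean = a * s1 + (1.0 - a) * s2
    d1, d2 = s1 - mean, s2 - mean
    var = a * (sigma1 ** 2 + d1 ** 2) + (1.0 - a) * (sigma2 ** 2 + d2 ** 2)
    third = (a * (d1 ** 3 + 3.0 * d1 * sigma1 ** 2)
             + (1.0 - a) * (d2 ** 3 + 3.0 * d2 * sigma2 ** 2))
    return mean, math.sqrt(var), third / var ** 1.5
```

Passing the output of `equal_width_double_gaussian` to `double_gaussian_moments` should return the input moments, which is a convenient self-check. Setting `sigma_tilde=0.0` gives the corresponding double delta function limit, and zero skewness gives two equal-amplitude Gaussians rather than a single Gaussian.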

The Gaussian, double Gaussian, and gamma function forms can assign nonzero probability to values of *s* that are large in magnitude but finite. Large negative or positive *s* corresponds to unrealistic values of *q*_{t} and/or *q*_{s}. The error incurred is small and has been ignored.

## 4. Data and analysis procedure

This paper shall compare the aforementioned PDFs with aircraft data from ice-free cloudy boundary layers—specifically, stratocumulus layers, cumulus layers, and layers consisting of cumulus clouds rising into stratocumuli. The data were obtained during the Atlantic Stratocumulus Transition Experiment (ASTEX) field experiment, which studied the transition from stratocumulus to cumulus boundary layers over the North Atlantic Ocean in June 1992 (Albrecht et al. 1995), and the First ISCCP (International Satellite Cloud and Climatology Project) Regional Experiment (FIRE) field experiment, which studied marine stratocumuli off the coast of California in June and July of 1987 (Albrecht et al. 1988). In ASTEX, the boundary layers often contained cumuli rising into stratocumuli and sometimes contained only cumuli. In FIRE, the boundary layers were shallower and usually contained only stratocumuli. The dataset includes a large number of entirely clear legs as well as partly or entirely cloudy legs. We keep the clear legs because it is important for a fractional cloud cover parameterization to be able to predict the occurrence of clear skies and the onset of cloudiness.

The data were obtained by the Meteorological Research Flight C-130 aircraft with the following instrumentation. Liquid water content was measured with a Johnson–Williams hot-wire probe, which has a response time of about 1 s. Because of the slow response time, we judge that 2 km is roughly the smallest scale at which a PDF of *s* can be constructed with reasonable accuracy. In ASTEX, total water content was measured with a fast-response Lyman-*α* hygrometer, logged at 64 Hz, referenced to out-of-cloud data from a General Eastern 1011B dew/frost point hygrometer. In FIRE, total water content was obtained by adding the liquid and vapor content obtained from the Johnson–Williams probe and the dew/frost point hygrometer. Temperature was measured using a Rosemount deiced total temperature sensor and corrected for dynamic heating effects. For further information about the instrumentation, see Rogers et al. (1995).

In this paper, we restrict ourselves to boundary layer clouds and hence exclude all legs whose mean pressure is less than 600 hPa. Furthermore, for simplicity, we examine only horizontal, not vertical, variability. Hence we discard all legs whose pressure has a standard deviation greater than 1 hPa. One should keep in mind, however, that within grid box–sized volumes there is vertical variability that tends to broaden the PDF (see Pincus and Klein 2000). The total number of 50-km legs used was 184 for ASTEX and 92 for FIRE. A small number of time series contained a narrow, clearly unphysical spike. We removed such outliers because they seriously distorted the higher moments. Finally, we perform no ensemble averaging or detrending of legs. Our goal is primarily to improve cloud parameterizations in numerical models, and Germano (1992) has shown that large eddy simulations can be interpreted as spatially or time-filtered (not ensemble averaged or detrended) solutions of the Navier–Stokes equations. According to Germano's interpretation, what models need to parameterize is the subgrid variability within a grid box for a single realization of a flow. This paper adopts this point of view. We spatially box filter the data by truncating all aircraft legs so that they have the same length, for example, 50, 10, 2 km, etc. We did not examine scales longer than 50 km because there are few legs in our dataset that are appreciably longer than 50 km.
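The leg processing described above (truncation to a common length, with no detrending or ensemble averaging, followed by computation of low-order moments) can be sketched as follows. The function names and the uniform sample spacing are assumptions of the sketch, as is the choice to keep the first part of each leg; the text does not state which portion of a leg is retained.

```python
import math
from typing import List, Sequence, Tuple


def truncate_leg(values: Sequence[float], sample_spacing_m: float,
                 leg_length_m: float) -> List[float]:
    """Keep the first leg_length_m of a series sampled every sample_spacing_m."""
    n_keep = int(round(leg_length_m / sample_spacing_m))
    if n_keep < 1 or n_keep > len(values):
        raise ValueError("leg is shorter than the requested length")
    return list(values[:n_keep])


def leg_moments(values: Sequence[float]) -> Tuple[float, float, float]:
    """Mean, standard deviation, and skewness of one leg (no detrending)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        raise ValueError("skewness is undefined for a constant series")
    std = math.sqrt(var)
    skew = sum((v - mean) ** 3 for v in values) / n / std ** 3
    return mean, std, skew
```

The moments returned by `leg_moments` are the inputs that a two- or three-parameter PDF family would need for that leg.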

## 5. Examples of PDFs

To illustrate the complexity of atmospheric PDFs and the quality of fits that can be expected from simple parameterizations, this section displays four PDFs of *s.* Each of the four PDFs corresponds to a single 50-km leg.

The first three PDFs are derived from a stratocumulus layer observed during the first Lagrangian intensive operations period on 12–13 June 1992 (flight A209). During these three legs, the boundary layer was well mixed, with little evidence of cumuli rising into stratocumuli. Cloud base was at about 250 m, and the boundary layer extended up to 700 m. See Bretherton and Pincus (1995) for meteorological conditions. Figure 2a depicts the interior of the stratocumulus layer. The PDF is unimodal and unskewed. Such PDFs were commonly observed in both cloudy and clear regions. Figure 2b depicts a PDF from the upper region of the same stratocumulus layer. The large negative skewness, −3.9, is due mostly to the small clear region encountered several kilometers into the leg. But negative skewness is also contributed by narrow downward spikes throughout the leg. These regions of low liquid water probably arise via entrainment of dry air from above the cloud (Nicholls and Turton 1986). Figure 2c depicts the clear air region beneath the stratocumulus layer. The PDF is weakly skewed but strongly bimodal. Bimodal PDFs were not uncommon in the data. The bimodality in this particular case arises because of mesoscale variability: *s* has relatively high values (∼−0.35 g kg^{−1}) in the first 30 km of the leg and then shifts to lower values (∼−0.55 g kg^{−1}) in the last 20 km. Note that because the value of *s* differs at the end points of the leg, such a space series cannot be obtained by averaging over the full horizontal extent of a numerical model with periodic boundary conditions.

Figure 2d displays a fourth PDF. It is from a leg observed on 22 June 1992 (flight A215) through cumuli that were rising into broken stratocumuli. The PDF has strong positive skewness, 2.1, caused by the small area occupied by cumulus clouds.

Note that both the positively and negatively skewed PDFs (Figs. 2d and 2b, respectively) have long tails and do not resemble double delta function PDFs. This was a common feature of the highly skewed PDFs we observed. Although we believe this is a physical property of the PDFs, it is possible that some smoothing of the PDFs may be caused by the finite resolution of the data. In the bimodal PDFs, one often cannot associate one maximum with cloud and the other with clear air. In Fig. 2d, cloud is associated with only a portion of one Gaussian.

The five-parameter double Gaussian parameterization provides a good fit to all four PDFs, as it does for the majority of PDFs in the dataset. LWFGVC2 fits all four PDFs well except for the strongly bimodal PDF in Fig. 2c. LWFGVC2 (and Lewellen–Yoh) are designed to reduce to a Gaussian when skewness vanishes, and hence they misrepresent unskewed, bimodal PDFs. Significant improvements are likely only if the next higher moment (i.e., the kurtosis) is incorporated into the scheme.

In summary, the data show that PDFs of *s* may be highly skewed (either positively or negatively), may be bimodal, but may also be unskewed and unimodal. Similar complexity was found for PDFs of *q*_{t} displayed by Wood and Field (2000).

## 6. Comparison of parameterizations of the PDF of *s*

This section tests how well the aforementioned eight parameterizations match observed PDFs of *s,* *P*(*s*), and how well the parameterizations diagnose cloud fraction and specific liquid water content $\overline{q_l}$.

We test the parameterizations separately on the ASTEX and FIRE datasets, rather than combining the two. Furthermore, we form a third set by selecting all ASTEX legs that intercepted some cloud and that occurred on flights during which, according to observers' notes, the C-130 sampled cumulus layers with no stratocumulus layer above or at most broken stratocumulus above. This set of nine legs is denoted “ASTEX cumulus legs” in what follows. Although this sample of cumulus legs is small, a preliminary look at cumulus legs is worthwhile because the statistics in cumulus and stratocumulus layers are expected to be quite different. We desire a parameterization that is sufficiently general to model both types of layers.

Now we assess how well the PDF parameterizations diagnose *C* and $\overline{q_l}$. We compare the parameterized values of *C* and $\overline{q_l}$ with reference values of *C* and $\overline{q_l}$ computed by integrating over the observed *P*(*s*) as in Eqs. (5) and (6), where *P*(*s*) is obtained by binning observed space series of *s.* This procedure mitigates errors due to poor measurements and binning of data. To obtain the aforementioned eight parameterizations of *P*(*s*), we input observed moments of *s* into the formulas in section 3. Using observed moments isolates errors in the shape of *P*(*s*) from errors that stem from poor prediction of these moments by a numerical model.

The errors in diagnosis of cloud fraction for the eight parameterizations are shown in Fig. 3. Figure 3 also includes the cloud fraction parameterization of Xu and Randall (1996a), a two-parameter PDF that uses relative humidity and $\overline{q_l}$. Cloud fraction is an integral of *P*(*s*); hence positive and negative areas under the integral partially cancel, and so even a fairly poor fit to a PDF can yield an adequate prediction of cloud fraction. From Fig. 3, we see that the single delta function parameterization, that is, the assumption of uniformity, performs worst by far. Better than this are the two-parameter Xu–Randall and three-parameter double delta function parameterizations. Better than these is the (two-parameter, unskewed) single Gaussian PDF. Better than this are the other three-parameter parameterizations (Lewellen–Yoh, LWFGVC2, LWFGVC1, and generalized gamma distribution). Of these, the three-parameter generalized gamma parameterization performs slightly worse, in part because for large skewness the generalized gamma parameterization has an (integrable) singularity at *s* ≅ *s*_{0}, so that a bin placed at *s* = *s*_{0} has an unrealistically large amplitude. Somewhat better than the three-parameter parameterizations is the five-parameter double Gaussian parameterization, which performs best overall, as expected.

To give a visual impression of the improvement due to including an extra parameter, we display in Fig. 4 a scatterplot of observed cloud fraction versus cloud fraction predicted by the (two-parameter) Xu–Randall, (two-parameter) single-Gaussian, and (three-parameter) LWFGVC2 parameterizations. The decrease in scatter due to the additional parameter is evident. An exception to this rule is the three-parameter double delta parameterization, which usually performs worse than the two-parameter single Gaussian (see Fig. 3).

With the exception of the single-Gaussian parameterization, the PDF parameterizations predict cloud fraction comparably well at all leg lengths between 2 and 50 km. This implies that although variance and skewness may depend strongly on scale, the families of three-parameter PDFs we test are more or less equally valid for all scales shown. This offers hope that three-parameter PDF parameterizations may be largely independent of grid spacing from 2 to 50 km. The Gaussian PDF performs poorly for cumulus legs at large length scales because such legs contain skewed PDFs that lie outside the Gaussian family (see Fig. 12 below). The Gaussian PDF performs well for the FIRE data over a range of scales because those PDFs lie within the Gaussian family (even though large-scale PDFs have a large variance and hence differ greatly from small-scale PDFs).

The errors in diagnosis of *q*_{l} for the eight parameterizations are compared in Fig. 5. The results are similar to those for predictions of cloud cover, except that the five-parameter double Gaussian parameterization is no longer clearly superior to the three-parameter parameterizations. Also, the double delta function PDF ranks somewhat better and the generalized gamma function parameterization ranks somewhat worse than their respective rankings in the cloud fraction comparison. However all parameterizations except the single delta function predict *q*_{l}

These results help determine how complex a subgrid parameterization needs to be in order to have generality. First, all parameterizations we test fit *P*(*s*) substantially better than a single delta function, which corresponds to assuming uniformity. Therefore, when the shape of *P*(*s*) is important, it is probably better for a numerical model to include any of the parameterizations we tested rather than to assume uniformity within grid boxes. If two parameters (e.g., mean and variance of *s*) are available, the (unskewed) single-Gaussian parameterization probably performs as well as can be expected, given that observed skewnesses in ASTEX can be either negative or positive, with a near-zero average (see Fig. 2 above and Fig. 12 below). However, the single-Gaussian distribution does perform poorly for cumulus cloud legs when large grid boxes (∼50 km) are used. In these cases, the skewnesses are usually large and positive (Fig. 12 below). If mean, variance, and skewness of *s* are available, then a significant improvement over the single-Gaussian parameterization can be made. The Lewellen–Yoh, LWFGVC1, and LWFGVC2 parameterizations appear to be general enough to adequately represent the rather complex PDFs in stratocumulus and cumulus boundary layers. Even in cumulus legs, they perform almost as well as the five-parameter double Gaussian. Therefore, we can conjecture that although the prediction of an additional parameter, for example, kurtosis, would improve the fit in some cases (see Fig. 2c), the overall improvement would not be dramatic. The overall performance of the PDF parameterizations is summarized in Table 1.

## 7. Reduction of biases in numerical models

One purpose in parameterizing *P*(*s*) is to reduce or remove the (systematic) biases in liquid water content ($\overline{q_l}$), temperature ($\overline{T}$), and Kessler autoconversion rate ($\overline{A_K}$) that would arise if *s* were assumed to be uniform throughout a grid box. Such biases arise when these quantities are diagnosed from a function that is either convex or concave. For instance, suppose that a numerical model predicts *P*(*s*) and diagnoses $\overline{q_l}$ from it. Consider a grid box that contains some cloudy regions (*s* > 0) and some clear regions (*s* < 0) but is, on average, just saturated (i.e., $\bar{s}$ = 0). A model that assumes that *P*(*s*) is a single delta function, that is, ignores subgrid variability, would assume that no cloud is present. But, in fact, in this example, there does exist some cloud. Hence the model would underestimate $\overline{q_l}$ and, likewise, $\overline{T}$ and $\overline{A_K}$.
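The just-saturated example can be worked through explicitly. Suppose, purely as an assumption for this sketch, that *P*(*s*) is Gaussian with mean $\bar{s}$ = 0 and standard deviation *σ.* Integrating over the cloudy part of the PDF, *s* > 0, gives

$$
C = \int_0^\infty P(s)\,ds = \frac{1}{2}, \qquad
\overline{q_l} = \int_0^\infty \frac{s}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{s^2}{2\sigma^2}\right) ds = \frac{\sigma}{\sqrt{2\pi}} \approx 0.40\,\sigma,
$$

whereas a model that assumes uniformity diagnoses $\overline{q_l}$ = max($\bar{s}$, 0) = 0. With a hypothetical width of *σ* = 0.3 g kg^{−1}, the uniform assumption therefore misses about 0.12 g kg^{−1} of liquid water even though the grid box is half cloudy.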

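In this section, we address two questions: 1) How large are the biases in $\overline{q_l}$, $\overline{T}$, and $\overline{A_K}$? 2) How effectively are these biases removed when subgrid variability is parameterized by an approximate PDF?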

First consider the bias in *q*_{l}*q*_{l}*q*_{l}*q*_{l}*q*_{l}*q*_{l}^{−1}). This is because the bias turns out to be largest when *q*_{t}*q*_{s}*q*_{t} is highly variable (Larson et al. 2001). Figure 7 shows that LWFGVC2 is clearly superior to the single delta function parameterization. Although LWFGVC2 does make errors, the errors have either sign, and consequently any residual under- or overestimate is barely discernable.

The bias is *q*_{l}*T**q*_{l}*T*_{l}*T**T*

Finally, the bias in Kessler autoconversion is illustrated in Fig. 9. The bias is largest when autoconversion has moderate values, that is, when the layer is weakly drizzling (Larson et al. 2001). Again the LWFGVC2 parameterization performs well.

Figure 10 shows how the magnitude of the biases varies with leg length. The *q*_{l} and *T* biases are plotted only for partly cloudy legs; likewise, the Kessler biases are plotted only for precipitating legs. (The fraction of partly cloudy and precipitating legs in the datasets both decrease monotonically with decreasing leg length.) Figures 10a,c show that as leg length increases, there are increases in both the *relative* *q*_{l} bias, [*q*_{l}(*s*)*q*_{l}(*s**q*_{l}(*s*)*relative* Kessler autoconversion bias, [*A*_{K}(*s*)*A*_{K}(*s**A*_{K}(*s*)*q*_{l}(*s**A*_{K}(*s**absolute T* bias, *T*(*s*)*T*(*s**absolute* Kessler bias, *A*_{K}(*s*)*A*_{K}(*s*

## 8. Gaussianity of small scales

Are atmospheric PDFs normally distributed? Manton and Cotton (1977) and Sommeria and Deardorff (1977) suggested that PDFs from short legs are Gaussian. The observational work of Banta (1979) hinted that this may not be true for PDFs from longer legs in subcloud regions. To our knowledge, the hypothesis of Gaussianity has not been tested using observations from cloudy regions.

To address this question, we examine PDFs calculated from legs as short as 200 m. Because of the slow response time of the liquid water probe, we cannot compute the PDF of *s* in such small legs. Instead, we compute the PDF of the total specific water content *q*_{t} measured during ASTEX. These measurements were made with a fast-response Lyman-*α* hygrometer, which was logged at 64 Hz. Since the C-130 aircraft flew at approximately 100 m s^{−1}, a PDF constructed from a 200-m segment contains 128 points.

To assess whether or not the PDFs of *q*_{t}, *P*(*q*_{t}), are normal, we use the Kolmogorov–Smirnov test, because it does not require binning of the data and hence is free from ambiguities due to choice of bin width (Wilks 1995). Our null hypothesis, for a given leg, is that the values of *q*_{t} are drawn from a Gaussian distribution. We reject this hypothesis if, were it true, a deviation from Gaussianity at least as large as the one observed would occur with probability less than 5%. The choice of rejection level is somewhat arbitrary, but 5% is commonly used (Wilks 1995). Our results are shown in Fig. 11. At the smallest leg length (200 m), we can reject the hypothesis of Gaussianity for 90% of the ASTEX legs and over 92% of the ASTEX cumulus legs. For lengths larger than 1 km, we can reject the hypothesis for nearly all legs. However, this does not allow us to conclude that long legs are less Gaussian than short legs. The primary reason that Gaussianity is rejected more often for long legs is probably that the Kolmogorov–Smirnov test is much more powerful for long legs, which have more sample points. Because the Kolmogorov–Smirnov test can detect such small deviations from Gaussianity, the test allows us to conclude that atmospheric PDFs, strictly speaking, are not usually drawn from a Gaussian distribution, even though in most cases the Gaussian parameterization still leads to an adequate prediction of quantities like cloud fraction. A mediocre approximation to a PDF can still yield an acceptable approximation of derived quantities.
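A test of this kind can be sketched in a few lines of Python. The sketch below is not the code used for Fig. 11. It assumes that the Gaussian mean and standard deviation are estimated from the leg itself, which tends to make the reported significance levels too large. It also assumes the asymptotic Kolmogorov series with the finite-sample correction given by Press et al. (1992). The synthetic legs and all function names are hypothetical.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of the sample and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Approximate significance level of the statistic d for n points."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:  # the series converges slowly here; the level is ~1
        return 1.0
    series = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                 for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * series))


def gaussianity_pvalue(sample):
    """Test a sample against a Gaussian with the sample's own mean and spread."""
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
    d = ks_statistic(sample, lambda x: normal_cdf(x, mu, sigma))
    return ks_pvalue(d, len(sample))


random.seed(1)
n_points = 128  # a 200-m leg logged at 64 Hz from an aircraft flying ~100 m/s
# Hypothetical q_t values (g/kg): one Gaussian leg, one leg straddling a cloud edge.
gaussian_leg = [random.gauss(8.0, 0.3) for _ in range(n_points)]
cloud_edge_leg = ([random.gauss(9.0, 0.2) for _ in range(n_points // 2)]
                  + [random.gauss(7.0, 0.2) for _ in range(n_points // 2)])

p_gaussian = gaussianity_pvalue(gaussian_leg)
p_bimodal = gaussianity_pvalue(cloud_edge_leg)
print(f"Gaussian leg: p = {p_gaussian:.3f}; cloud-edge leg: p = {p_bimodal:.2e}")
```

A leg is rejected as non-Gaussian when its significance level falls below 0.05. Because the same relative deviation yields a larger value of `lam` when `n` is larger, the sketch also shows why the test becomes more powerful for longer legs.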

Why are so many legs non-Gaussian? To address this question, we examine the skewness of the legs. Suppose a series of random samples is drawn from a Gaussian distribution, and the skewness of each sample is recorded. Then the mean of the skewnesses is zero and the standard deviation of the skewnesses is approximately (6/*N*)^{1/2}, where *N* is the number of points in the sample (Stuart and Ord 1987, p. 338). In Fig. 12, we compare these values with the skewness observed in the data. We find that in the overall ASTEX dataset, the skewness can vary from sample to sample much more than one would expect if the parent distribution were Gaussian. Therefore, one reason that the observed PDFs are nonnormal is that their skewnesses vary greatly. In the cumulus legs, the variation of skewness is large, and the mean skewness also becomes significantly positive.

We perform the same analysis for the kurtosis, $\overline{s'^4}/(\overline{s'^2})^2 - 3$. (A large positive kurtosis implies a strongly peaked PDF with long tails.) If random samples are drawn from a Gaussian distribution, the mean of the kurtosis is zero and the standard deviation is approximately (24/*N*)^{1/2}.
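The Gaussian reference values for skewness and kurtosis can be checked with a small Monte Carlo experiment. The Python sketch below draws hypothetical Gaussian legs of *N* = 128 points and compares the leg-to-leg spread of skewness and kurtosis with the large-sample estimates (6/*N*)^{1/2} and (24/*N*)^{1/2}. The number of legs and the function names are illustrative assumptions.

```python
import math
import random
import statistics


def central_moment(x, order):
    """Sample central moment of the given order (normalized by N)."""
    mean = statistics.fmean(x)
    return statistics.fmean((v - mean) ** order for v in x)


def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def excess_kurtosis(x):
    """Fourth moment over squared variance, minus 3 (zero for a Gaussian)."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0


random.seed(2)
n_points = 128  # points per leg
n_legs = 2000   # number of synthetic Gaussian legs (illustrative)
legs = [[random.gauss(0.0, 1.0) for _ in range(n_points)] for _ in range(n_legs)]

skew_spread = statistics.pstdev([skewness(leg) for leg in legs])
kurt_spread = statistics.pstdev([excess_kurtosis(leg) for leg in legs])
expected_skew_spread = math.sqrt(6.0 / n_points)
expected_kurt_spread = math.sqrt(24.0 / n_points)

print(f"skewness spread: {skew_spread:.3f} (estimate {expected_skew_spread:.3f})")
print(f"kurtosis spread: {kurt_spread:.3f} (estimate {expected_kurt_spread:.3f})")
```

Observed legs whose skewness or kurtosis lies several such standard deviations from zero are unlikely to have been drawn from a Gaussian parent distribution.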

One case in which non-Gaussian PDFs are especially likely to occur is at cloud edge. A leg that straddles a cloud edge may contain large values of *q*_{t} in cloud, small values of *q*_{t} in the clear air outside the cloud, and few intermediate values of *q*_{t}. The resulting PDF of *q*_{t} is bimodal. Although bimodal PDFs can arise because of mesoscale variability (see Fig. 2c), they can also occur at cloud edge even at very small scales.

## 9. Conclusions and discussion

We have used observational data to calculate probability density functions (PDFs) for stratocumulus, cumulus, and cumulus-rising-into-stratocumulus layers. The observed PDFs are complex. Although many of the PDFs are approximately Gaussian, others are strongly bimodal, highly positively skewed, or highly negatively skewed. The highly skewed PDFs tend to have a long tail rather than a double delta function shape.

Despite this complexity, the observed PDFs can be adequately modeled by simple parameterizations. PDF parameterizations that depend on three parameters, for example, mean, variance, and skewness, appear to be general enough to model both stratocumulus and cumulus-layer PDFs. Particularly good fits were provided by the Lewellen–Yoh, LWFGVC2, and LWFGVC1 parameterizations. Remarkably, these schemes provided uniformly good diagnoses of cloud fraction and specific liquid water content over length scales ranging from 2 to 50 km, with no change in adjustable coefficients. Hence these parameterizations may be satisfactory over a wide range of grid box sizes. If three parameters are not available to a model, but two parameters are, then the model may use the single-Gaussian parameterization, which fits the data fairly well except for cumulus-layer PDFs when the grid box size exceeds roughly 25 km.

We examined the magnitude of biases in mean specific liquid water content *q*_{l}, temperature *T*, and Kessler autoconversion *A*_{K}. The biases in *q*_{l}, *T*, and *A*_{K} tend to increase as the variance of *s* increases. If larger volumes contain more variance, as one would expect, then the biases would probably be larger in general circulation models than in mesoscale models. The biases are also affected by the skewness of *s*, but it is unclear whether or not skewness increases for grid box sizes larger than 50 km (see Fig. 12).

We used the data to examine the hypothesis that atmospheric PDFs are Gaussian. For ASTEX data, this hypothesis can be rejected for legs longer than 1 km. Rather, ASTEX PDFs from long legs have large positive and negative skewness and kurtosis. For shorter legs, lack of sample points renders the results less conclusive.

In order to implement the Lewellen–Yoh, LWFGVC1, or LWFGVC2 parameterizations, a numerical model must provide the mean, variance, and skewness of *s* for each grid box and time step. This is a difficult problem in higher-order closure that we leave for future work.

## Acknowledgments

The authors would like to thank Adrian Tompkins, Robert Pincus, Stephen A. Klein, and P. W. Mielke Jr. for helpful comments. We also thank the RAF aircrew and MRF staff involved in the planning and execution of the ASTEX and FIRE field campaigns. V. E. Larson acknowledges financial support from the National Oceanic and Atmospheric Administration Contract NA67RJ0152. W. R. Cotton and J.-Ch. Golaz acknowledge support by the National Science Foundation under NSF-WEAVE Contract ATM-9904128.

## REFERENCES

Albrecht, B. A., D. A. Randall, and S. Nicholls, 1988: Observations of marine stratocumulus clouds during FIRE. *Bull. Amer. Meteor. Soc.*, **69**, 618–626.

——, C. S. Bretherton, D. Johnson, W. H. Schubert, and A. S. Frisch, 1995: The Atlantic stratocumulus transition experiment—ASTEX. *Bull. Amer. Meteor. Soc.*, **76**, 889–904.

Banta, R., 1979: Subgrid condensation in a cumulus cloud model. Preprints, *Sixth Conf. on Probability and Statistics in Atmospheric Sciences*, Banff, AB, Canada, Amer. Meteor. Soc., 197–202.

Barker, H. W., B. A. Wielicki, and L. Parker, 1996: A parameterization for computing grid-averaged solar fluxes for inhomogeneous marine boundary layer clouds. Part II: Validation using satellite data. *J. Atmos. Sci.*, **53**, 2304–2316.

Bougeault, P., 1981: Modeling the trade-wind cumulus boundary layer. Part I: Testing the ensemble cloud relations against numerical data. *J. Atmos. Sci.*, **38**, 2414–2428.

——, 1982: Cloud-ensemble relations based on the gamma probability distribution for the higher-order models of the planetary boundary layer. *J. Atmos. Sci.*, **39**, 2691–2700.

Bretherton, C. S., and R. Pincus, 1995: Cloudiness and marine boundary layer dynamics in the ASTEX Lagrangian experiments. Part I: Synoptic setting and vertical structure. *J. Atmos. Sci.*, **52**, 2707–2723.

Cahalan, R. F., W. Ridgway, W. J. Wiscombe, T. L. Bell, and J. B. Snider, 1994: The albedo of fractal stratocumulus clouds. *J. Atmos. Sci.*, **51**, 2434–2455.

Chen, C., and W. R. Cotton, 1987: The physics of the marine stratocumulus-capped mixed layer. *J. Atmos. Sci.*, **44**, 2951–2977.

Cotton, W. R., and R. A. Anthes, 1989: *Storm and Cloud Dynamics*. Academic Press, 883 pp.

Ek, M., and L. Mahrt, 1991: A formulation for boundary-layer cloud cover. *Ann. Geophys.*, **9**, 716–724.

Fowler, L. D., and D. A. Randall, 1996: Liquid and ice cloud microphysics in the CSU general circulation model. Part III: Sensitivity to modeling assumptions. *J. Climate*, **9**, 561–586.

——, ——, and S. A. Rutledge, 1996: Liquid and ice cloud microphysics in the CSU general circulation model. Part I: Model description and simulated microphysical processes. *J. Climate*, **9**, 489–529.

Frankel, S. H., V. Adumitroaie, C. K. Madnia, and P. Givi, 1993: Large eddy simulation of turbulent reacting flow by assumed pdf methods. *Engineering Applications of Large Eddy Simulations*, S. A. Ragab and U. Piomelli, Eds., Vol. 162, ASME, 81–101.

Germano, M., 1992: Turbulence: The filtering approach. *J. Fluid Mech.*, **238**, 325–336.

Grell, G. A., J. Dudhia, and D. R. Stauffer, 1994: A description of the fifth-generation Penn State/NCAR Mesoscale Model (MM5). NCAR Tech. Note NCAR/TN-398+STR, 138 pp.

Grossman, R. L., 1984: Bivariate conditional sampling of moisture flux over a tropical ocean. *J. Atmos. Sci.*, **41**, 3238–3253.

Kendall, M. G., and A. Stuart, 1963: *The Advanced Theory of Statistics*. Vol. 1. 2d ed. Charles Griffin and Company, 433 pp.

Kessler, E., 1969: *On the Distribution and Continuity of Water Substance in Atmospheric Circulation*. *Meteor. Monogr.*, No. 10, Amer. Meteor. Soc., 84 pp.

Kogan, Y. L., 1998: On parameterization of cloud physics processes in mesoscale models. Preprints, *Conf. on Cloud Physics*, Everett, WA, Amer. Meteor. Soc., 348–351.

Lappen, C-L., 1999: The unification of mass flux and higher-order closure in the simulation of boundary layer turbulence. Ph.D. dissertation, Colorado State University, 329 pp.

Larson, V. E., R. Wood, P. R. Field, J-Ch Golaz, T. H. Vonder Haar, and W. R. Cotton, 2001: Systematic biases in the microphysics and thermodynamics of numerical models that ignore subgrid-scale variability. *J. Atmos. Sci.*, **58**, 1117–1128.

Lewellen, W. S., and S. Yoh, 1993: Binormal model of ensemble partial cloudiness. *J. Atmos. Sci.*, **50**, 1228–1237.

Manton, M. J., and W. R. Cotton, 1977: Formulation of approximate equations for modeling moist deep convection on the mesoscale. Colorado State University Atmospheric Science Paper 266, Colorado State University, Fort Collins, CO, 62 pp.

Mellor, G. L., 1977: The Gaussian cloud model relations. *J. Atmos. Sci.*, **34**, 356–358.

Mocko, D. M., and W. R. Cotton, 1995: Evaluation of fractional cloudiness parameterizations for use in a mesoscale model. *J. Atmos. Sci.*, **52**, 2884–2901.

Nicholls, S., and J. D. Turton, 1986: An observational study of the structure of stratiform cloud sheets: Part II. Entrainment. *Quart. J. Roy. Meteor. Soc.*, **112**, 461–480.

Pincus, R., and S. A. Klein, 2000: Unresolved spatial variability and microphysical process rates in large-scale models. *J. Geophys. Res.*, **105**, 27 059–27 065.

Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, 1992: *Numerical Recipes in C: The Art of Scientific Computing*. 2d ed. Cambridge University Press, 994 pp.

Price, J. D., 2001: A study on probability distributions of boundary layer humidity and associated errors in parameterized cloud fraction. *Quart. J. Roy. Meteor. Soc.*, in press.

Randall, D. A., Q. Shao, and C-H. Moeng, 1992: A second-order bulk boundary-layer model. *J. Atmos. Sci.*, **49**, 1903–1923.

Rogers, D. P., D. W. Johnson, and C. A. Friehe, 1995: The stable internal boundary layer over a coastal sea. Part I: Airborne measurements of the mean and turbulence structure. *J. Atmos. Sci.*, **52**, 667–683.

Rotstayn, L. D., 2000: On the “tuning” of autoconversion parameterizations in climate models. *J. Geophys. Res.*, **105**, 15 495–15 507.

Sommeria, G., and J. W. Deardorff, 1977: Subgrid-scale condensation in models of nonprecipitating clouds. *J. Atmos. Sci.*, **34**, 344–355.

Stevens, B., R. L. Walko, W. R. Cotton, and G. Feingold, 1996: The spurious production of cloud-edge supersaturations by Eulerian models. *Mon. Wea. Rev.*, **124**, 1034–1041.

——, W. R. Cotton, and G. Feingold, 1998: A critique of one- and two-dimensional models of boundary layer clouds with a binned representation of drop microphysics. *Atmos. Res.*, **47–48**, 529–553.

Stuart, A., and J. K. Ord, 1987: *Kendall's Advanced Theory of Statistics*. Vol. 1. 5th ed. Oxford University Press, 604 pp.

Sundqvist, H., 1993: Parameterization of clouds in large-scale numerical models. *Aerosol–Cloud–Climate Interactions*, P. V. Hobbs, Ed., Academic Press, 175–204.

Tiedtke, M., 1993: Representation of clouds in large-scale models. *Mon. Wea. Rev.*, **121**, 3040–3061.

Wang, S., and B. Stevens, 2000: Top-hat representation of turbulence statistics in cloud-topped boundary layers: A large eddy simulation study. *J. Atmos. Sci.*, **57**, 423–441.

Wielicki, B. A., and L. Parker, 1994: Frequency distribution of cloud liquid water path in oceanic boundary layer cloud as a function of cloud fraction. Preprints, *Eighth Conf. on Atmospheric Radiation*, Nashville, TN, Amer. Meteor. Soc., 415–417.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences*. Academic Press, 467 pp.

Wood, R., and P. R. Field, 2000: Relationships between total water, condensed water and cloud fraction examined using aircraft data. *J. Atmos. Sci.*, **57**, 1888–1905.

Wyngaard, J. C., and C-H. Moeng, 1992: Joint probability density. *Bound.-Layer Meteor.*, **60**, 1–13.

Xu, K-M., and D. A. Randall, 1996a: A semiempirical cloudiness parameterization for use in climate models. *J. Atmos. Sci.*, **53**, 3084–3102.

——, and ——, 1996b: Evaluation of statistically based cloudiness parameterizations used in climate models. *J. Atmos. Sci.*, **53**, 3103–3119.

## APPENDIX

### Computation of Parameters in the Five-Parameter Double-Gaussian PDF

To calculate best-fit values for the five PDF parameters (*a*, *s*_{1}, *σ*_{1}, *s*_{2}, *σ*_{2}) in the five-parameter double-Gaussian PDF, we use observational data and the following procedure. First, we choose first-guess values of these five parameters. Then, using this first guess, we compute the parameterized PDF *P*_{dg5}(*s*). To assess how closely *P*_{dg5}(*s*) fits the observed histogram of *s*, we calculate the statistical measure *χ*^{2}, which is a measure of the discrepancy between two histograms (Press et al. 1992, p. 621). The calculation of *χ*^{2} is repeated for values of the five parameters throughout the relevant region of parameter space surrounding the first guess, sampling points in parameter space at coarse resolution. From this large collection of sets of parameters, we select the set that corresponds to the best fit, that is, the smallest *χ*^{2}. This procedure is intended to find a global, not merely local, minimum. Since we can only cover the parameter space at coarse resolution, we then refine this solution by using it as a first guess in the Nelder–Mead simplex minimizer (Press et al. 1992, pp. 408–412) that is included in MATLAB.
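The coarse search stage of this procedure can be sketched as follows. This Python example is only an illustration under stated assumptions. The data are synthetic, the grid values and bin edges are hypothetical, and the *χ*^{2} measure is taken to be the sum over bins of (observed − expected)^{2}/expected. The simplex refinement is omitted.

```python
import itertools
import math
import random


def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_counts(edges, n_total, a, s1, sig1, s2, sig2):
    """Counts per bin predicted by a*N(s1, sig1) + (1 - a)*N(s2, sig2)."""
    def cdf(x):
        return a * gauss_cdf(x, s1, sig1) + (1.0 - a) * gauss_cdf(x, s2, sig2)
    return [n_total * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges, edges[1:])]


def histogram(samples, edges):
    """Count samples in equal-width bins; samples outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    width = edges[1] - edges[0]
    for s in samples:
        k = int((s - edges[0]) // width)
        if 0 <= k < len(counts):
            counts[k] += 1
    return counts


def chi_square(observed, expected, floor=1e-6):
    """Discrepancy between two histograms; floor guards against division by zero."""
    return sum((o - e) ** 2 / max(e, floor) for o, e in zip(observed, expected))


def coarse_grid_fit(observed, edges, grid):
    """Evaluate chi-square at every grid point and return (chi2, parameters)."""
    n_total = sum(observed)
    best = None
    for params in itertools.product(*grid):
        chi2 = chi_square(observed, expected_counts(edges, n_total, *params))
        if best is None or chi2 < best[0]:
            best = (chi2, params)
    return best


random.seed(3)
# Synthetic "observations" drawn from a known double Gaussian (illustrative).
samples = [random.gauss(-1.0, 0.3) if random.random() < 0.3
           else random.gauss(0.5, 0.5) for _ in range(5000)]
edges = [-2.5 + 0.2 * i for i in range(26)]
observed = histogram(samples, edges)

grid = (
    [0.1, 0.3, 0.5, 0.7, 0.9],  # a
    [-1.5, -1.0, -0.5],         # s1
    [0.15, 0.3, 0.5, 0.8],      # sigma1
    [0.0, 0.5, 1.0],            # s2
    [0.15, 0.3, 0.5, 0.8],      # sigma2
)
best_chi2, best_params = coarse_grid_fit(observed, edges, grid)
print("coarse best fit (a, s1, sigma1, s2, sigma2):", best_params)
```

The grid keeps *s*_{1} below *s*_{2} so that the two Gaussians cannot simply swap labels. The resulting `best_params` would then serve as the first guess for a local simplex minimizer.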

Summary of errors in cloud fraction and specific liquid water content. This table lists the error averaged over all points shown in Figs. 3 and 5, i.e., averaged over all datasets and grid box sizes