## 1. Introduction

Much of the information that has been gathered about the in situ properties of clouds has been obtained using aircraft sampling. While aircraft provide a very high-resolution record of the internal structure of clouds, that information is limited to a relatively small volume with respect to the size of clouds and ensembles of clouds. Because aircraft sampling is expensive and time limited, sampling is usually directed by a scientist looking either out of the aircraft or at remotely sensed data, such as radar. The aircraft resource limitation and the role of the scientist in the sampling process have led many to wonder about how biased cloud sampling by aircraft is (e.g., Lucas et al. 1994; Neggers et al. 2003; Abel and Shipway 2007). In this study we assess what the theoretical potential bias is and suggest some ways to mitigate the sampling bias.

In practice, sampling clouds from an aircraft is a complicated process involving interplay between scientists, flight crew, air traffic control, cloud types, and mission goals. To make the problem tractable, the analysis needs to be abstracted in a way that captures the essence of the sampling process. To simplify the study, clouds will be thought of as occupying a two-dimensional plane. The cloud field in this plane can be homogeneous (cloud fraction = 1) or heterogeneous (cloud fraction < 1). When the cloud field is homogeneous, there are no decisions to be made for sampling: the aircraft can fly in any direction and will always be in cloud, obtaining unbiased information about cloud structure. When the cloud field is highly heterogeneous, a decision must be made about which cloud to sample, for example *choose the most vigorous cloud* or *choose the freshest cloud*; however, for the examples explored here, the sampling strategy will be based on the statement in Abel and Shipway (2007, p. 792): “The aircraft updraft penetration statistics are therefore biased towards larger updraft core sizes. This may be the result of the aircraft aiming for larger visible clouds.”

Accordingly, a *choose the larger cloud* strategy was adopted for this study, but other strategies could be explored. For instance, a *choose the larger cloud* strategy is practical in fields of small cumulus but typically not when cumulonimbus clouds are present. The aim is to demonstrate a methodology that can be used to quantify the sampling bias introduced through the repeated use of strategies to choose which cloud is sampled.

The bias from a *choose the larger* strategy can be illustrated with a simple coin toss example. Given two coins, each with a value of 1 or 0, the coin toss possibilities can be constructed along with the value sampled according to a rule of always choosing the larger value. The parent distribution for sampling 1 or 0 from a single coin is uniform: each value occurs one time in two. But based on the *choose the larger* sampling rule, the value of 1 is sampled three in four times. For three coins the results are even more biased relative to the parent distribution, with 1 being sampled seven out of eight times.

The coin toss example indicates that increasing the number of clouds to choose between would increasingly bias the final distribution. During a flight the number of clouds to choose from will depend upon the cloud fraction, the field of view of the scientist, and the limitations on the aircraft flight track. As the cloud fraction increases, the number of potential clouds to choose from will likely increase, but eventually the clouds will merge and the homogeneous case (scenario I) will dominate, removing the bias. In the following, a choice between two clouds will be illustrated to provide a conservative estimate of the potential bias.
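The coin toss argument can be checked by brute-force enumeration. The following is a minimal sketch; the function name `prob_sampling_one` is introduced here purely for illustration. It lists every equally likely outcome for *n* fair 0/1 coins and counts how often the larger-value rule returns 1.

```python
from fractions import Fraction
from itertools import product


def prob_sampling_one(n_coins):
    """Probability of sampling 1 when the largest of n fair 0/1 coins is chosen."""
    # Every combination of 0/1 values is equally likely for fair coins.
    outcomes = list(product((0, 1), repeat=n_coins))
    # The rule picks the largest value, so 1 is sampled unless all coins show 0.
    hits = sum(1 for outcome in outcomes if max(outcome) == 1)
    return Fraction(hits, len(outcomes))


for n in (1, 2, 3):
    print(n, prob_sampling_one(n))
```

For one, two, and three coins this gives 1/2, 3/4, and 7/8, matching the values quoted in the text; in general the probability is 1 − (1/2)^*n*.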

When aircraft are used to sample clouds with the aim of capturing an unbiased sampling of the parent distribution of the clouds, choices made during the sampling can hinder this aim. This work quantifies the potential observational bias for aircraft sampling cloud distributions. To achieve this, order statistics (e.g., Galambos 1978) will be applied.

## 2. Choosing clouds

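Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random variables with probability density function $f(x)$ and cumulative distribution function $F(x)$. *B* is a random variable equal to the maximum of the independent identically distributed random variables, $B = \max(X_1, \ldots, X_n)$, and *S* is a random variable equal to the minimum of the independent identically distributed random variables, $S = \min(X_1, \ldots, X_n)$. The distribution of *B* (or *S*) is therefore the distribution of clouds an aircraft would have sampled were it to fly into the largest (or smallest) cloud of *n* clouds from which to choose. This distribution will be biased relative to the original distribution of parent variables (*X*).

For a choice between two clouds, the probability that *B* lies between *y* and *y* + d*y* is the probability *P* that $X_1$ lies in that range while $X_2$ is less than *y* plus the probability that $X_2$ lies in that range while $X_1$ is less than *y*:

$$f_B(y)\,dy = f(y)\,dy\,F(y) + F(y)\,f(y)\,dy = 2 f(y) F(y)\,dy .$$

This can be extended to *n* random variables by considering the probability that one random variable is in the range *y* to *y* + d*y* while the remaining *n* − 1 are less than *y*. This gives

$$f_B(y) = n f(y) \left[F(y)\right]^{n-1},$$

and, by the same argument with the remaining variables greater than *y*,

$$f_S(y) = n f(y) \left[1 - F(y)\right]^{n-1}.$$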
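The standard order-statistics result for the largest and smallest of *n* independent draws can be checked numerically. In cumulative form, the maximum has distribution $F(y)^n$ and the minimum has distribution $1 - [1 - F(y)]^n$, where $F$ is the parent cumulative distribution. The sketch below compares these with a Monte Carlo estimate. The unit-rate exponential parent, the function names, and the seed are assumptions chosen only for illustration.

```python
import math
import random


def empirical_cdfs(n, y, trials=100_000, seed=0):
    """Monte Carlo estimates of P(max <= y) and P(min <= y) for n exponential draws."""
    rng = random.Random(seed)
    count_max = 0
    count_min = 0
    for _ in range(trials):
        # Assumed parent distribution: exponential with unit rate.
        draws = [rng.expovariate(1.0) for _ in range(n)]
        if max(draws) <= y:
            count_max += 1
        if min(draws) <= y:
            count_min += 1
    return count_max / trials, count_min / trials


def theoretical_cdfs(n, y):
    """Order-statistics CDFs of the maximum and minimum of n draws."""
    parent_cdf = 1.0 - math.exp(-y)  # F(y) for the unit-rate exponential
    return parent_cdf ** n, 1.0 - (1.0 - parent_cdf) ** n


if __name__ == "__main__":
    print("simulated  :", empirical_cdfs(2, 1.0))
    print("theoretical:", theoretical_cdfs(2, 1.0))
```

Differentiating the two cumulative forms with respect to *y* gives the densities of the largest and smallest values, $n f(y) F(y)^{n-1}$ and $n f(y) [1 - F(y)]^{n-1}$.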

## 3. Cloud distributions

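Observations of large fields of cumulus clouds suggest that the cloud sizes (e.g., width) follow an exponential (e.g., Plank 1969) or gamma distribution, described here by the parameters *b* and *c*. The biased distributions can be illustrated for a gamma parent distribution (*b* = 1, *c* = 4) for choosing between two (*n* = 2) or more clouds.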

Restricting consideration to choosing between two clouds (*n* = 2), the mode of the biased distribution is a factor of 2 larger than that of the parent distribution. More useful, perhaps, is a comparison of the means of the *B* and *X* distributions. The ratio of the mean of the *choose the larger* biased distribution to the mean of the parent distribution (Fig. 3) depends mainly on the *b* parameter. Increasing *b* tends to make the distribution more sharply peaked, reducing the effect of sampling bias. There is little dependency on the *c* parameter. For atmospheric applications, the value of *b* is usually less than 2, and so the aircraft mean of the cloud parameter used to choose the larger of two clouds would be biased high by up to a factor of 1.5 if the parent clouds follow an underlying gamma distribution.
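The dependence of the mean bias on the peakedness of the parent distribution can be sketched numerically. The example below integrates the density of the larger of two draws, $2 f(y) F(y)$, for a gamma parent and divides its mean by the parent mean. The conventional gamma shape parameter (called `shape` here) is an assumption: how it maps onto the paper's *b* is not specified in this sketch, and only `shape >= 1` is handled. The scale is set to 1 because the ratio does not depend on it.

```python
import math


def gamma_pdf(x, shape):
    """Gamma density with unit scale (assumes shape >= 1 so the density is finite at 0)."""
    if x < 0.0:
        return 0.0
    if x == 0.0:
        return 1.0 if shape == 1.0 else 0.0
    return math.exp((shape - 1.0) * math.log(x) - x - math.lgamma(shape))


def mean_ratio_larger_of_two(shape, steps=20_000):
    """Ratio of the mean of the larger of two gamma draws to the parent mean."""
    upper = shape + 12.0 * math.sqrt(shape) + 30.0  # far into the tail
    h = upper / steps
    cdf = 0.0
    prev = gamma_pdf(0.0, shape)
    mean_parent = 0.0
    mean_larger = 0.0
    for i in range(1, steps + 1):
        x = i * h
        cur = gamma_pdf(x, shape)
        cdf_next = cdf + 0.5 * h * (prev + cur)  # trapezoid update of F
        x_mid = x - 0.5 * h
        f_mid = gamma_pdf(x_mid, shape)
        cdf_mid = 0.5 * (cdf + cdf_next)
        mean_parent += x_mid * f_mid * h                   # integral of y f(y)
        mean_larger += x_mid * 2.0 * f_mid * cdf_mid * h   # integral of y 2 f(y) F(y)
        cdf, prev = cdf_next, cur
    return mean_larger / mean_parent


if __name__ == "__main__":
    for shape in (1.0, 2.0, 3.0):
        print(shape, round(mean_ratio_larger_of_two(shape), 4))
```

With `shape = 1` (an exponential parent) the ratio is approximately 1.5, and it decreases as the distribution becomes more sharply peaked, consistent with the behavior described above.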

Figure 4 shows a gamma distribution as the parent distribution, the result of choosing the larger of two clouds, the result of choosing the smaller of two clouds, and half the sum of those two biased distributions, which for a choice between two clouds reproduces the parent distribution.

Recovering the parent distribution from this combination of biased distributions would require the active targeting of smaller clouds, and it would work only when the choice is between exactly two clouds.
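The reason the half-sum reproduces the parent distribution for two clouds can be written out in a few lines. The notation is introduced here for illustration: $f$ and $F$ denote the parent probability density and cumulative distribution, and $f_B$ and $f_S$ denote the densities of the larger and smaller of two independent draws.

```latex
\begin{align}
f_B(y) &= 2 f(y) F(y), \\
f_S(y) &= 2 f(y) \left[1 - F(y)\right], \\
\tfrac{1}{2}\left[f_B(y) + f_S(y)\right] &= f(y) F(y) + f(y) \left[1 - F(y)\right] = f(y).
\end{align}
```

For three or more clouds the cancellation no longer holds, because the clouds of intermediate size are never sampled by either rule.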

## 4. Summary

When a rule is repeatedly followed to choose which cloud to sample, the resulting distribution will be biased relative to the parent distribution. Here we have used order statistics to quantitatively estimate the likely effect of this bias. Sampling of a parent gamma distribution that uses just the *choose the larger* rule for choosing between two clouds may overestimate the mean of the metric used to decide on the cloud by a factor of up to 1.5. Therefore, if cloud width, for example, is used to choose the largest cloud, then it is to be expected that the aircraft mean cloud width would be larger than the mean of the parent distribution. Other variables, such as liquid water content and vertical velocity, may not be biased in the same way. However, if the metric used to choose the largest cloud can be related to other variables via a power law, then those variables would also follow a gamma distribution and be biased in a similar way. For instance, if radar reflectivity were used to choose the largest cloud, then the water content of the cloud, which can be related to reflectivity via a power law, would also be biased high.
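How the bias carries over to a variable linked to the selection metric by a power law can be sketched with a small Monte Carlo experiment. Everything specific in the example is an assumption made for illustration: the selection metric is drawn from a gamma distribution with unit scale, the derived variable is `metric ** exponent` with an arbitrary exponent of 2 and a prefactor of 1, and the function name and seed are hypothetical.

```python
import random


def power_law_bias(shape=1.0, exponent=2.0, pairs=200_000, seed=1):
    """Ratio of the mean derived variable in chosen clouds to its parent mean."""
    rng = random.Random(seed)
    parent_sum = 0.0
    chosen_sum = 0.0
    for _ in range(pairs):
        # Two candidate clouds; the selection metric follows a gamma parent.
        m1 = rng.gammavariate(shape, 1.0)
        m2 = rng.gammavariate(shape, 1.0)
        # Derived variable assumed to be a power law of the selection metric.
        parent_sum += m1 ** exponent + m2 ** exponent
        # Choose-the-larger rule: only the larger cloud is sampled.
        chosen_sum += max(m1, m2) ** exponent
    return (chosen_sum / pairs) / (parent_sum / (2 * pairs))


if __name__ == "__main__":
    print("metric itself   :", round(power_law_bias(exponent=1.0), 3))
    print("derived variable:", round(power_law_bias(exponent=2.0), 3))
```

In this sketch the derived variable is biased high by more than the selection metric itself when the exponent exceeds 1, so the size of the bias in a related variable depends on the exponent of the power law.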

To deal with bias in aircraft cloud sampling, the following recommendations are made.

If the goal is to obtain an unbiased sample of the cloud population, then repeatedly following simple sampling strategies (e.g., *choose the larger cloud*) should be avoided. For instance, random sampling of convective clouds could be achieved by flying between fixed ground points.

If the parent distribution of the variable used to select clouds follows a gamma distribution, then an additional error bar could be included with the observations. This would indicate the potential bias in the mean value due to using a *choose the larger* rule. The methods presented in this study can be used to estimate the effect for other distributions.

If sampling can be practically accomplished to produce both *choose the larger* and *choose the smaller* biased distributions based on a choice between just two clouds, then these can be combined to produce a more realistic representation of the parent distribution.

## REFERENCES

Abel, S. J., and B. J. Shipway, 2007: A comparison of cloud-resolving model simulations of trade wind cumulus with aircraft observations taken during RICO. *Quart. J. Roy. Meteor. Soc.*, **133**, 781–794, doi:10.1002/qj.55.

Galambos, J., 1978: *The Asymptotic Theory of Extreme Order Statistics.* Wiley Series in Probability and Statistics: Probability and Statistics Section, Vol. 104, John Wiley and Sons Inc., 352 pp.

Lucas, C., E. J. Zipser, and M. A. LeMone, 1994: Vertical velocity in oceanic convection off tropical Australia. *J. Atmos. Sci.*, **51**, 3183–3193, doi:10.1175/1520-0469(1994)051<3183:VVIOCO>2.0.CO;2.

Neggers, R. A. J., P. G. Duynkerke, and S. M. A. Rodts, 2003: Shallow cumulus convection: A validation of large-eddy simulation against aircraft and Landsat. *Quart. J. Roy. Meteor. Soc.*, **129**, 2671–2696, doi:10.1256/qj.02.93.

Plank, V. G., 1969: The size distributions of cumulus clouds in representative Florida populations. *J. Appl. Meteor.*, **8**, 46–67, doi:10.1175/1520-0450(1969)008<0046:TSDOCC>2.0.CO;2.