## Abstract

An algorithm is described for inverting individual particle properties from statistics of ensemble observations, thereby dispelling the notion that coincident particles create inherently erroneous data in particle probes. The algorithm assumes that the observed property obeys superposition, that the particles are independently randomly distributed in space, and that the particle distribution is stationary over the accumulation distance. The fundamental principle of the algorithm is based on a derived analytical relationship between ensemble and individual particle statistics with fully defined derivatives. This enables rapid convergence of forward inversions. Furthermore, this relationship has no dependence on the particular instrument realization, so the accuracy of the relationship is not fundamentally constrained by the accuracy to which a measurement system can be characterized or modeled. This algorithm is presented in terms of a single observed property, but the derivation is valid for correlated multiparameter retrievals. Because data are collected in histograms, this technique would require relatively little storage and network bandwidth on an aircraft data system. This statistical analysis is derived here for measuring particle geometric extinction cross sections, but it could also be applied to other particle properties, such as scattering cross-section and phase matrix elements. In this example application, a simulated beam passes through a sampled environment onto a single detector to periodically measure beam extinction. This measured extinction may be the result of one or more particles, but it is shown that the probability distribution function of the ensemble (multiparticle) extinction measurement can be used to obtain the distribution of individual particle extinction cross sections (used here as a proxy for particle size distribution).

## 1. Introduction

It is well known that many clouds show increasing nonuniformity, in excess of random fluctuations, as smaller and smaller scales are examined observationally (e.g., Austin et al. 1985; Gerber et al. 2005; Beals et al. 2015) and through direct numerical simulation of turbulence in clouds (e.g., Shaw et al. 1998; Franklin et al. 2005). For many applications, such as parameterizations of drop size distributions in cloud models (e.g., Khairoutdinov and Kogan 2000), there is a need to assume that a given drop size distribution is representative for a relatively large cloud volume. Recognizing that some nonuniformity is inherent, cloud physics observations typically provide measured drop size distributions that are assumed uniform over the integration distance, for example, about 100 m for 1-Hz measurements at typical aircraft speeds.

Most aircraft-based cloud particle probes operate by measuring the properties of one particle at a time. With scattering probes such as the Forward Scattering Spectrometer Probe (FSSP; PMS Inc.) and the Cloud Droplet Probe (CDP; Droplet Measurement Technology), the sample volume is constrained to minimize the probability of two particles in the same measurement volume. Considerable effort has been directed at avoiding coincident particles in such probes, including the use of qualifier channels to identify data points where multiple particles are in the sample volume and development of correction algorithms (e.g., Cooper 1988; Brenguier 1989; Brenguier and Amodei 1989; Baumgardner et al. 1985; Brenguier et al. 1998; Lance et al. 2010; Knollenberg 1976).

A limitation of the particle-at-a-time approach is the long sample paths required to form particle size distributions. In order for a histogram to be a valid representation of a cloud, one has to assume that the cloud is uniform over the length at which the size distribution was formed. That assumption becomes less valid for longer pathlengths. This can become particularly problematic when investigating clouds with low particle concentrations. In such cases, the small sample volumes of scattering probes (approximately 25 cm^{3} s^{–1} at typical aircraft speeds) make it particularly challenging to build up meaningful statistics on rarer cloud particles.

In contrast, holographic cloud probes (Spuler and Fugal 2011) capture a single large volume of particles and can produce size-, shape-, and position-resolved distributions of particles at a single point. This highly detailed capture of a cloud has proven scientifically useful (Beals et al. 2015), but the processing time (approximately 1 month of processing per hour of data) makes it impractical for standard deployment. Where at NCAR we fly a CDP on nearly all field projects, the Holographic Detector for Clouds (HOLODEC) provides too much information at too high a cost to be practical for such general use.

In both the holographic and particle-at-a-time probes, the probe’s focus is on determining the properties of individual particles. Very often, however, the desired scientific data are statistical. For example, size distributions are volume-scaled probability distribution functions (PDF). To this end, I propose a statistical approach for measuring particle characteristics by performing ensemble measurements of particle properties that obey superposition (the measured property is equal to the sum of all individual particle properties).

I will show that there is an explicit analytical relationship between the PDF of ensemble (multiparticle at a time) particle measurements and the individual particle distributions. This relationship is linear in log-Fourier space, which allows forward inversions to converge quickly for real-time distribution retrievals and shows how coincident particles in a sample volume do not prohibit measurement of individual particle distributions. It also allows raw data to be stored as ensemble histograms that require relatively low network bandwidth and storage on the aircraft data system. Furthermore, performing ensemble measurements allows us to expand the instrument sample volume, creating opportunity to capture particle size distributions on smaller scales and in lower concentrations. The approach presented here represents a fundamental deviation from existing cloud characterization techniques.

In this work, I will describe the algorithm as a measurement of particle geometric extinction cross section, but it can be extended to other properties, including forward scattering, backscattering, and polarization, as well as nonoptical properties, such as power dissipated by hot-wire probes. Furthermore, the algorithm is not exclusively limited to atmospheric applications and may be applied to any statistical event that is randomly distributed in space or time, is approximately stationary, and the measurement obeys superposition.

In the following, function and variable definitions are provided in tables by subject. General independent variables are in Table 1, variables and functions related to the algorithm derivation are in Table 2, and definitions of symbols are provided in Table 3.

## 2. Algorithm derivation

The algorithm presented here is based on the concept that features in ensemble PDFs can be unique markers of their constituent components. Consider a case where a laser illuminates a detector to measure the total ensemble extinction of the sample volume (the fraction of beam attenuation divided by the beam area). For the purpose of illustration, we assume the particles in the volume are randomly distributed in space and the population consists of either all 20- or all 40-*μ*m-diameter particles. If we measure only the average extinction, then the problem of determining the particle size is intractable because what the 20-*μ*m particles lack in extinction cross section can be compensated with larger concentration. If instead we record a histogram to estimate the PDF of the extinction (assuming each sample is independent), we are likely to know which particle size is in the volume. Figure 1 shows the expected PDFs of a case where the average ensemble extinction cross section is 1571 *μ*m^{2} (the geometric extinction cross section associated with an average of five 20-*μ*m particles in the sample volume). Both distributions extinguish the light by the same average amount, but their PDFs remain distinctively different. While the number of particles in the sample volume is represented as a discrete random variable, the resulting attenuation of the beam is a continuous random variable (allowing that particle size is a continuous random variable). In this example, the particles are assumed to be perfectly monodisperse and therefore their respective size PDFs contain delta functions. Thus, the ensemble extinction PDF is zero in between each allowable mode associated with integer multiples of the individual particle extinction. The two cases produce distinctively different height, width, and mode spacing. The clear distinction between the two PDFs illustrates how one can use a histogram of ensemble observations to make determinations about the individual particle properties.

This example is an idealized case to help the reader conceptualize that information content about individual particles persists even when measurements are made in ensemble. I will show mathematically that this concept extends to more practical instances of particle characterization where the particle distributions are not monodisperse nor are the bins represented strictly by delta functions.

Without the loss of generality, the treatment in this work uses a geometric extinction cross section as the sensed particle parameter as a proxy for particle size. This is intended only as an example, where the purpose of this work is to demonstrate the capabilities of ensemble statistical inversions. A number of complications arising from extinction measurements (see Korolev et al. 2014) will be ignored.

The parameters used in this work consist of extinction cross section *σ* (m^{2}) and the ensemble extinction cross section *ε* (referred to as “ensemble extinction” for brevity). The ensemble extinction is assumed to be the linear sum of all extinction cross sections contained in the sample volume,

This approximation holds where particles are sufficiently sparse that the scattered (or attenuated) field from particles does not significantly affect the incident field seen by other particles. For uniform beams, *ε* defines how much light is extinguished at a particular instant in time (the effects of nonuniform beams are discussed in section 3). It has units of meters squared (m^{2}), and it *is not* the same as the extinction coefficient commonly represented as *α* with units of per meter (m^{−1}). The ensemble extinction represents the parameter space used by this algorithm, where the extinction coefficient depends on the particular properties of a particle probe design that are not relevant to the actual processing routine.

One can obtain an estimate of the medium’s extinction coefficient from ensemble extinction using

where is the average measured ensemble extinction and is the instantaneous sample volume of the probe. The instantaneous sample volume represents the total volume captured by the probe in a single snapshot. This contrasts with the integrated sample volume, which is approximately the forward projected area of the probe’s instantaneous sample volume multiplied by the pathlength over which data are accumulated.

The following subsections contain the derivation for the analytical relationship between particle extinction cross-section distribution and ensemble extinction distribution. In the first subsection, I derive the ensemble extinction PDF for one particle bin (e.g., an approximate monodisperse size distribution). I then expand that result in the second subsection to account for all particle size bins and the associated gradient calculation for use in forward inversion routines.

### a. Ensemble extinction from one particle bin

The basis of this algorithm is built on three assumptions. The first assumption is that the particles are randomly distributed in space, the second is that the distribution under investigation is approximately stationary, and the third is that the measurement being performed is linear, such that the ensemble measurement from two or more particles is the linear sum of each of those individual particle properties.

I will use the example of extinction (applied as a proxy for particle size), but the algorithm could be applied to a number of other particle measurement techniques. Because the intent is to provide an example application with physical meaning for the algorithm, I do not attempt to address the challenges associated with extinction measurements described in Korolev et al. (2014). The analysis conducted here is performed in terms of the measured variables’ ensemble extinction *ε* (m^{2}) and extinction cross section *σ* (m^{2}) instead of the equivalent particle diameter. This is essential to maintain the linearity required for the algorithm, and it further demonstrates the benefits of performing analysis in terms of measured variables as recommended by Rosenberg et al. (2012).

For a particle of extinction cross section , where *i* is the particle bin number, the probability distribution for the number of particles in a particular sample volume is a Poisson distribution, written here as

where *N* is the number of particles in the volume and is the expected number of particles given by the product of the mean number density and the instantaneous sample volume ().

Each particle maps one to one to a particular ensemble extinction. For a true monodisperse case, this means the ensemble extinction PDF of a single particle is a delta function. For better physical representation, we should allow for some finite width in the bin, which I will refer to as the bin function, . The bin function should have unit area and a Fourier transform that converges. It can be unique or the same for each bin. In this work all derivations will be presented allowing for to be different for all *i*, but all simulation analysis will use a normalized Gaussian function that is identical for all bins. Further study of the bin function would likely produce improvement in algorithm performance. For example, larger, more rare particles might benefit from larger bin functions, or bin functions could be used to represent principle components of an expected particle distribution rather than specific size bins.

Accounting for the bin function, the ensemble extinction PDF for a single particle from bin *i* is

where is the extinction cross section of a particle from the *i*th bin, *ε* is ensemble extinction, is the Dirac delta function, and the asterisk () is the convolution operator.

If particle counts in the sample volume are low, we can neglect overlap and multiple scatter, and the combination of more than one particle will result in a total ensemble extinction that is the sum of each individual particle’s extinction. This means that the multiparticle ensemble PDF is the convolution of each single-particle PDF. For the monodisperse case, the ensemble extinction PDF of *N* particles requires convolution operations [the operation uses *N* instances of ]. The result of the convolution operations is also weighted by the probability of *N* particles in the sample volume as defined in Eq. (3). Thus, the ensemble extinction PDF of *N* particles from bin *i* can be written as

where is the ensemble extinction PDF when *N* particles from bin *i* are present, is the mean number of particles in bin *i* and is a notation that I define here to represent the self-convolution of *N* of the preceding function. This operation is analogous to a power operator where convolution is used instead of multiplication. When , it evaluates and when , the operator is an identity such that .

The total PDF (accounting for all possible values of *N*) of the monodisperse ensemble extinction is then the sum of the PDFs for all *N*. Using the definitions in Eqs. (4) and (5), we obtain the total ensemble extinction PDF for particles in bin *i*,

If we take the Fourier transform of Eq. (6) with respect to the ensemble extinction *ε*, then the convolution operations become multiplication and we obtain

where is the Fourier transform of , we have substituted the definition for from Eq. (3) in the first set of brackets, is the frequency basis of *ε*, and is the Fourier transform of . We can rearrange Eq. (7) to group all powers of *N* together and pull constant terms out of the summation,

From Eq. (8) we can see that the summation is the infinite series of an exponential where the terms in the numerator are the argument of the exponent. The result is a simple expression for the Fourier transform of the ensemble extinction PDF as a function of the average number of particles in the *i*th bin of the particle extinction cross-section (size) distribution ,

It is notable that by taking the natural log of Eq. (9), we obtain a linear relationship between the log-Fourier transform of the ensemble extinction PDF and the mean number of particles () in the bin *i*,

### b. Ensemble extinction from all particle bins

The results of Eqs. (9) and (10) show how we can find the ensemble extinction PDF from the mean number of particles in a bin *i* through an analytical relationship. We can now apply this result to include all bins in the particle extinction cross-section distribution.

We now let multiple particle bins and an independent noise source contribute to the total ensemble extinction. Again, the ensemble extinction is the sum of all particles present in the sample volume. Thus, the total ensemble extinction PDF is the convolution of all individual bin PDFs and the noise PDF,

where each bin PDF is defined by Eq. (5) replacing *i* with the bin number and is the PDF of an independent noise source. We now represent the number of particles in each bin by the vector and the corresponding extinction cross section of each bin by , where both vectors have *C* elements. Again, the convolution operations in Eq. (11) become multiplication if we move to Fourier space,

where is the Fourier transform of , is the Fourier transform of the noise PDF , and the Fourier transform of each bin PDF is given by Eq. (9).

Finally, if we take the natural log of Eq. (12), then the multiplication operations become sums, and we can substitute the results from the previous section in Eq. (10) to obtain

Note that there is a linear relationship between the log-Fourier transform of the ensemble extinction PDF and the average number of particles in the individual extinction cross-section bins. Equation (13) represents an inner product between the particle extinction distribution in the volume and a set of analysis-defined coefficients. If we discretize the frequency components, then Eq. (13) can be represented by the matrix equation

where

where each term represents a column with a row dimension equal to the length of . The matrix has *C* columns, which are the number of particle extinction cross-section bins for the particular analysis.

Note that the terms used to compute depend on the analysis setup (number of bins, extinction bin selection, and bin function); all of those terms remain constant for a given analysis. This matrix need only be computed once prior to data analysis and is not dependent on the physical description of the instrument nor on the particles under analysis. Furthermore, there is an explicit calculation for determining the Jacobian or gradient of the function. This is important because gradient calculations are generally needed for forward inversion routines (these will be applied in the next section to retrieve particle distributions). Numerical gradient calculations tend to be computationally expensive and thus an explicit gradient definition serves to significantly improve the speed at which the routine converges.

If particle distribution analysis is conducted in log-Fourier space, the Jacobian is . However, the measured version of , , is computed numerically from a histogram of actual extinction data , and is thus prone to phase wrapping. This is problematic when taking the natural log of because the phase wrapping appears as discontinuities in the data. Instead, we typically perform the forward inversion in Fourier space (without the log operation), which is given by Eq. (12) but more easily represented in matrix notation as

The Jacobian of Eq. (16) is then given below in columns as

where is used here to denote elementwise multiplication and is the *i*th column of the matrix .

In a minimization of squares routine, we can define the error function as

where is the measured version of and is computed as the Fourier transform of the measured ensemble extinction histogram and ^{†} is the Hermitian conjugate operator. The gradient of the error function can be generally calculated as

The derivation presented above shows that there is an explicit mathematical relationship between a particle ensemble PDF and the individual particle PDF. This relationship has a fully defined derivative that allows for fast convergence when applying forward inversion techniques to solve for the individual particle distribution . The equations for these calculations are presented here for a single-parameter retrieval. Note, however, that this calculation is not limited to single-parameter retrievals. Fourier transforms can be applied across any number of dimensions (such as polarization and backscatter), so the calculation can scale accordingly. In such a case, the matrix becomes a higher-order tensor (with order 2 times the retrieval order), but the basics of the calculation remain fundamentally the same.

## 3. Nonuniform sensitivity

In many measurement systems, the sensed ensemble parameter is subject to variability due to nonuniform measurement sensitivity. This may be a result of nonuniform intensity in the illuminating laser, or partial capture of a particle (e.g., clipping of edges by the field stop) in the probe. In this example, I will discuss the effect from nonuniform beam intensity without loss of generality to other effects. At small amounts of variation, this effect can be accounted for in the analysis. However, at some point the cross coupling between different particle sizes will become too large and the calculation will have no unique inverse.

The problem of nonuniform sensitivity can be set up in terms of a matrix equation, where the actual particles are coupled into a distribution of measured bins,

where is the actual particle PDF or size distribution, is the coupling matrix computed based on nonuniform beam intensity, and is the measured particle distribution skewed by the nonuniform beam intensity.

To account for the skewing caused by nonuniform beam intensity, we must compute , which can be included in the analysis equations shown earlier by substituting everywhere for . Similar to , the coupling matrix can be computed once, prior to the actual particle inversion analysis. However, unlike , is dependent on instrument characteristics and must be estimated through instrument characterization.

In the case of a laser used for measuring particle extinction, a detector measures the incident optical power,

where is the detected power, is the unattenuated laser power, is the particle extinction cross section, and is the intensity of the laser at the particle’s location in space. Consider a case where we assume a uniform beam intensity of . This is likely the average beam intensity obtained by dividing by the beam area, but it can be any arbitrary value. The detected power is processed assuming an intensity to obtain a measured extinction cross section of , but if the beam is not uniform, then the particle may have had an extinction cross section and experienced a local beam intensity ,

Rearranging Eq. (22) gives

Because we have bins with finite widths, there is a range of intensities that can cause a particle of extinction cross section to be measured as . Solving Eq. (23) for the limits of those intensities gives the lower bound,

where is the measured extinction cross section, is the bin separation, is the actual particle extinction cross section, is the reference intensity used for the analysis, and is the lowest local intensity that will cause a particle of extinction cross section to be sized in extinction cross section bin .

Similarly, the upper intensity limit is given by

where now is the highest local intensity that will cause a particle of extinction cross section to be sized in extinction cross section bin .

The individual elements of are given by the probability that the particle is in an intensity field with a value between and . This is written in terms of the beam intensity PDF as

In principle, is a measured quantity that might be obtained by recording the sample volume beam intensity with a charge-coupled device (CCD).

As the condition of approaches infinity, it becomes impossible to account for nonuniform sensitivity. Thus, experiments should still be designed for sensitivity that is as uniform as possible. The use of in the analysis can only make small adjustments to obtain a more accurate solution, and the value of those adjustments must be weighted against uncertainty in the probe characterization.

## 4. Application of algorithm

Assuming is nonsingular, one can solve Eq. (14) for the particle extinction cross-section PDF (particle size distribution), , by applying

Typically will be overdefined, so the inverse matrix will use the pseudoinverse. In practice, the log of a Fourier transform will contain an explicit phase term that is prone to phase wrapping as the number of particles become large. Phase wrapping appears as a discontinuity in the function and breaks the linearity of the inversion calculation. It is possible to unwrap the phase of the measured Fourier transform, allowing the linear inversion in Eq. (27). However, results from using Eq. (27) did not appear to perform as well as the forward inversion method for the case presented here. Additionally, the linear inversion does not generalize to multidimensional parameter retrievals where there is no practical inverse for tensors of order greater than two and phase unwrapping algorithms become computationally expensive. At this point, the methodology for performing the linear inversion has not been fully explored, but it likely offers at least two benefits: It allows for error propagation to estimate uncertainty in the retrieved distribution, and it does not require an initial guess for the particle distribution where the quality of initial guess can impact the accuracy of the retrieved data.

In this work a forward inversion in Fourier space (without the log operation) is used to solve for the particle distribution. A predetermined allows for fast calculation of the Fourier transform ensemble extinction PDF through optimization of the particle extinction cross-section distribution . Furthermore, the explicit definition for the error function gradient in Eq. (19) enables even faster optimization by significantly reducing the number of function calls required at each optimization step.

Processing an ensemble extinction distribution starts with a preanalysis that need only be performed once prior to dataset analysis.

Preanalysis definitions and calculations:

Define ensemble extinction

*ε*with periodic grid resolution (to enable Fourier transform). The maximum extinction generally should be larger than the maximum expected bin of to accommodate multiples of large particles.Define particle extinction cross section bin centers to obtain . The bin separation can be different for each grid point, but it should be an integer multiple of , so that each grid point of is centered on a grid point of

*ε*(FFTs are not well suited to subsample offsets). The smallest bin must be the same or larger than the smallest resolvable*ε*grid point.Define ensemble extinction frequency axis from the definition of

*ε*[see Eq. (28) below].Define particle bin functions and calculate their Fourier transform .

Calculate using the definitions from steps 1 to 4 and Eq. (15).

Measure the noise PDF and calculate its Fourier transform .

Calculate based on probe characterization.

Note that the frequency axis is obtained directly from *ε*, where both discrete axes have the same number of points and the frequency resolution is given by

where is the number of ensemble extinction bins and is the extinction bin separation.

When data are collected, they are formed into a histogram of ensemble extinction to estimate the PDF. The data analysis steps must be performed each time we wish to retrieve an individual particle extinction cross-section PDF.

Data analysis:

## 5. Monte Carlo simulation

A Monte Carlo simulation is used here to demonstrate that the algorithm can reconstruct an arbitrary particle size distribution from ensemble data. A brief description of the simulation is provided below, and the simulation parameters are summarized in Table 4.

The simulation produces a bimodal particle size distribution consisting of a gamma distribution in combination with a normal distribution. While this is unlikely to be observed in clouds, this is done to demonstrate the ability of the algorithm to calculate an arbitrary particle distribution.

The gamma distribution is calculated as a function of particle radius and has the form

where is the gamma distribution function and is the particle radius.

The normal distribution is calculated as a function of extinction cross section and has the form

where is the normal distribution function, *σ* remains the particle extinction cross section, is the average extinction cross section, and is the standard deviation of the extinction cross section.

For the simulation we let the total number of particles in the normal distribution be one-fifth of the number of particles in the gamma distribution. The total number density of particles (including both distributions) is 64 cm^{−3}.

The particles are randomly distributed in space using a random number generator and passed through a Gaussian laser beam. The collected light is stopped down using a circular field stop, placed so that the entrance window (image of the field stop in object space) is in the center of the sample volume. The resulting measured intensity is determined by spatially integrating the total power of the light at the end of the sample path. The sample area of the probe is given by

where is the laser pathlength through the sample volume and is the diameter of the circular entrance window. For the given beam length and entrance window diameter, the simulated probe has a sample area of 46.875 mm^{2} (or 4.7 L s^{−1} at an aircraft speed of 100 m s^{−1}), which is nearly 200 times larger than the CDP sample area of 0.24 mm^{2} (or 24 cm^{3} s^{−1} at 100 m s^{−1}).

The instantaneous sample volume is the volume captured at a single instant by the instrument. For the simulated instrument, it is approximated by

where is the area of the entrance window.

The probe description given above is essentially a laser beam illuminating a single detector. The design presented here is strictly conceptual for the purpose of demonstrating the application of the ensemble inversion algorithm. For the purpose of understanding this algorithm, the probe is best considered in terms of sample area () and instantaneous sample volume (). For comparison, a CDP has an approximate instantaneous sample volume of . The specific details of the optical system should not be interpreted as suggestions for system design parameters of an extinction probe.

The steady-state voltage from the laser illuminating the detector is 2.8 V, and the direct current (dc) bias in the detector circuit is 2 mV. The simulated electronic noise is normally distributed with a 6-mV standard deviation.

The simulated data includes three segments. First, I simulate the detector with no laser illumination. This segment is used by the analysis routine to determine the unilluminated detector bias, , which is needed to convert the voltage signal to extinction. Next, I simulate the system with illumination but no particles. This segment is used in the analysis to determine the steady-state voltage, , for conversion of voltage to extinction. The segment is also used to determine noise statistics for the inversion. The third segment contains the actual particle data. In this segment, randomly distributed particles pass through the illuminating laser and the resulting detected voltage is recorded in the dataset.

The data processing routine is separate from the simulator and only has access to the three segments of voltage measurements described above, the entrance window area, the sample pathlength and the beam intensity distribution (which is effectively uniform for the given beam radius and entrance window). The initial guess for (needed for the forward inversion) is set by multiplying a decaying exponential by an array of random numbers.

The particle extinction cross section resolution, is set to 8.9 × 10^{–11} m^{2} which is roughly equivalent to a 10-*μ*m diameter spherical particle in the first bin. The particle bin function is the same for all bins and is a normal distribution with a standard deviation 1.5 times the extinction cross section resolution . The particle retrieval contains 58 extinction cross section bins. The total ensemble extinction histogram must be larger than the particle size bins to account for multiple counts of the largest particles. For this analysis *ε* has 231 bins. The analysis parameters used are summarized in Table 5.

Ensemble extinction is calculated from the voltage data using

where is the voltage recorded when particles are passing through the beam, is the average unilluminated voltage determined in the first data segment, is the average voltage when the detector is illuminated but there are no particles and is determined during the second data segment, and is the area of the entrance window (image of field stop in object space), defining the data collection area. Note that this measurement requires above which the extinction signal will saturate. The simulation assumes that detector response and particle blurring are negligible for the simulated probe. Figure 2 shows an example of the ensemble extinction time series data that is analyzed to eventually estimate the particle size distribution.

The time series of the ensemble extinction data is used during the second segment, where no particles are present, to obtain noise statistics and .

Finally the third segment is down-sampled to every fifth point to ensure each data point samples an independent volume. The resulting samples are used to produce a histogram of ensemble extinction (see Fig. 3). The error function is defined based on the histogram’s Fourier Transform (computed using an FFT; see Fig. 4). The error function defined in Eq. (18) is minimized using the *fmin_slsqp* routine in scientific Python. The optimizer is run unconstrained, so it effectively uses Newton’s method. By explicitly defining the gradient from Eq. (19), the number of function evaluations is significantly reduced compared to instances where the function has to evaluate the gradient numerically.

The result of the ensemble extinction fit is shown in ensemble extinction space (*ε*) in Fig. 3 and in Fourier space () in Fig. 4. Figure 5 shows the retrieved particle size distribution presented as diameter of area equivalent spheres with the bin functions imparted on the final product. Notably the ensemble extinction fit in Fig. 3 is able to carry its fit terms even into regions where sampling statistics are relatively poor (higher ensemble extinction). However to a certain extent, the numbers of relatively rare larger particles are difficult to quantify accurately. The particle size distribution estimate is shown with a logarithmic vertical axis to demonstrate the limitations of particle sizing when those particles are rare. The retrieval presented here appears to do a good job estimating the particle size distribution over approximately six orders of magnitude. The size distribution estimate is also poor in the smallest bin. In this region, the wings of the larger extinction bin functions may overly impact this bin, skewing the concentration estimate of the smallest particles. It may be possible to mitigate this by picking a better bin function. In general, further study is needed to determine the best bin functions, but given the relatively simplistic approach taken here, the algorithm still performs quite well.

For the processing parameters given in Table 5 the optimization routine takes about 0.2 s. The processing routine is run in python on an Intel i7 running Scientific Linux (Carbon). Multithreading is not implemented so only 1 of the 3.7-GHz processors are used for the processing routine. This suggests that real-time retrievals may be realistically accomplished at one to two second intervals.

Once we have a PDF for the particle extinction cross section (size) distribution, the particle number density can be computed by dividing by the instrument’s instantaneous sample volume.

where is the vector of particle number density (corresponding to extinction cross sections ), is a vector of the mean number of particles in each bin and is determined through the optimization process and is the instantaneous sample volume.

If we reduce the distance over which the ensemble extinction histogram is accumulated, the particle size distribution loses accuracy. This is also the case for particle-at-a-time probes, where reduced particle counts result in greater concentration uncertainty. To compare the two approaches, we consider a CDP-like probe (Cloud Droplet Probe, Droplet Measurement Technologies, Boulder, CO) with a similar sample pathlength and a sample area of 0.24 mm^{2}. Figure 6 shows the particle size distribution estimate of the ensemble algorithm based on the Monte Carlo simulation when the probe accumulates ensemble histograms over a 50 m and a 100 m path. The expected CDP uncertainty envelopes (actual distribution ± standard deviation) are also plotted for the corresponding pathlength. The CDP uncertainty envelopes are estimated based on the standard deviation of the expected number of particles given by

where is the standard deviation in the CDP number density estimate as a function of particle diameter, is the actual number density distribution, is the projected area of the CDP (0.24 mm^{2}) and is the sampled pathlength. This equation only accounts for particle counting uncertainty, assumes that there are no rejected particles due to coincidence and assumes all particles are sized correctly.

Though reducing the accumulation pathlength to 50 m increases the error in the particle size retrieval, the algorithm still produces concentration estimates comfortably within the limits of the CDP counting error (with the exception of the first bin). As the accumulation pathlength increases, the CDP uncertainty due to counting statistics decrease, as does the error in the presented algorithm. This assumes that the cloud’s distribution has not changed significantly over the accumulation path of the histogram. We should be aware, however, that the algorithm consistently overestimates the number concentration the smallest particle bin due to the wings of larger bin functions. This suggests more work on the particle bin functions is needed to obtain accurate results in the smallest bin.

The algorithm presented here offers an opportunity to use larger sample volume probes, and therefore shorter accumulation pathlengths in estimating particle size distributions. This algorithm is not limited to the use of extinction for particle sizing. It could potentially be applied to fast hot-wire probes and forward scattering probes. Furthermore, because the Fourier Transform is easily expanded to higher dimensions, it can also be expanded to include correlated multivariate distributions (e.g., extinction, backscatter, and polarization matrix elements).

## 6. Algorithm limitations

While this algorithm presents a new approach to sensing through ensemble statistics, I would be remiss if I did not acknowledge it has some limitations.

I have shown in the Monte Carlo simulation results that small particle concentrations can be skewed due to a lack of resolution (see Fig. 5), but they can also be skewed due to poor noise metrics. The algorithm will tend to compensate for noise by adjusting the number of particles in the smallest bins. This can be mitigated by carefully choosing the bin resolution and the bin function to suppress the ability of the optimization routine to fit high-frequency noise (where high frequency is defined in terms of *ε*). However, if the bin resolution becomes too large, then the algorithm will not be able to match the high-frequency components of the actual particle size distribution and the quality of fit will also suffer. Thus, there is a trade-off in bin sizing. Large bins will suppress actual high-frequency size distribution terms, and small bins will increase susceptibility to noise in the measured ensemble extinction histogram, largely due to counting statistics.

Like particle-at-a-time probes, the algorithm’s size distribution accuracy is connected to the sample size. To resolve uncommon particles, one has to capture data over a relatively long pathlength. However, because of the use of a forward inversion, the exact effect of this error is less obvious than the sampling errors resulting from particle-at-a-time methods. This makes it difficult to quantify uncertainty in the algorithm-retrieved distributions. On the other hand, this algorithm offers larger sample volumes that may allow for smaller pathlengths, particularly when looking at larger particles. As was shown in Fig. 6, a probe designed to capitalize on the presented algorithm may still have advantages. Also, an ensemble probe might be combined with a CDP or FSSP, or other smaller sample volume versions of itself in order to perform better particle size estimates through additional constraints.

The ability of the optimizer to determine the particle size distribution from ensemble measurements is fundamentally dictated by the uniqueness of the recorded ensemble distribution. The ensemble extinction PDF is unique when the mean number of particles in the sample volume is relatively small. As the mean number of particles increases, the distribution’s identifying distinctions become smaller. The presence of noise will then prevent accurate reconstruction of the particle distribution. Thus, the sample volume of a probe needs to be small enough that particles are not present in high numbers.

In the simulation presented here, there is an average of 0.4 particles in the sample volume. For comparison, the mean number of particles in the CDP sample volume for the same simulated cloud is on the order of 0.002 particles. The simulation is rerun for cases of increasing probe sample volume to show how increases in the mean number of particles can negatively impact the distribution retrieval. The retrieval results are shown in Fig. 7 for a mean particle number between 0.4 and 7.0. The algorithm continues to do relatively well obtaining an accurate size distribution up to an average of 5.3 particles in the sample volume. The simulation where an average of 7.0 particles is in the sample volume still manages to capture the bimodal nature of the particle size distribution, but the particle bins are quite inaccurate. The corresponding measured ensemble histograms are shown in Fig. 8, which provide some insight into why the algorithm has difficulty reconstructing particle size distributions in high concentrations. As the mean number of particles increases, the distribution shifts to higher overall values and also smooths. This would be expected for the large number of convolution (or low-pass filter) operations associated with large particle numbers. At an average of 3.5 particles in the sample volume, the multimode structure of the distribution is no longer clearly visible. Thus, there is clearly a limit to how large of a sample volume a probe can have and still expect to retrieve individual particle properties.

The algorithm requires that the measurement is sensitive to any significant concentration of particles. Unlike 2D probes, which can largely ignore small high-concentration particles, the method presented here requires an ability to account for the ensemble effects of those particles. The only exception to this is when even the ensemble effects of the particles are significantly below the detection noise of the instrument.

There may be reasonable concerns about the validity of the assumptions required for application of this algorithm. The algorithm assumes that the particle size distribution is stationary over the histogram accumulation range. The processing of an ensemble histogram over a nonstationary particle size distribution does not produce a particle retrieval equal to the average of the nonstationary histograms. This is because separate distributions essentially break the possibility of collocation of different sized particles. For slow variations, this does not present any significant issue. Even sudden and large changes do not appear to have a severe impact on the validity of the algorithm. I have simulated the bimodal case presented earlier, where now the two modes are separated in space. The first half of the sample has a particle distribution given by the normal distribution. At the second half of the sample, the particle size distribution undergoes an abrupt change and is represented by the gamma distribution. The resulting retrieved particle distribution deviates from the average of the two distributions (see Fig. 9), but the algorithm still performs relatively well. The impact of nonstationary distributions is most likely going to depend on mean particles in the volume. Where the number becomes high, cross interactions between bins becomes more significant and the requirement for stationary distributions may increase.

Research into the heterogeneity of clouds also may pose an additional uncertainty in the distribution retrievals. Analysis of HOLODEC data in Beals et al. (2015) suggests that particles are not always, if ever, uniformly distributed in space. The correlations in spatial distributions of particles might result in rapidly varying nonstationary size distributions. However, this issue would be mitigated when the probe’s instantaneous sample volume is larger than the centimeter scales over which these correlations exist. Observations on large spatial scales still appear to obey Poisson statistics. This has been used for identifying and removing shattering artifacts in optical array probes (Field et al. 2006). Further, bimodal FSSP data appear to obey Poisson statistics if we assume the smallest mode is the result of particle shattering (Field et al. 2003).

The requirements for stationary distributions and homogeneously mixed volumes of cloud might be used for benefit. Differences between particle-at-a-time and ensemble probe retrievals may serve as markers for cloud heterogeneity or rapid changes in particle size distributions.

## 7. Conclusions

In this work I have shown that randomly distributed particles have explicit relationships between their ensemble and individual particle statistics. Practically speaking, this means that particle size distributions can be retrieved from ensemble histograms as long as the measurement method obeys superposition. As a consequence of this approach, coincidence errors are nonexistent because the algorithm is specifically designed for cases where multiple particles are in the sample volume. Forward inversions using this mathematical framework have short convergence times because the calculation and its derivatives are explicitly defined. This work suggests it is possible to design particle probes that have larger sample volumes than the current particle-at-a-time instruments without significantly increasing processing time and data storage requirements. The probe design simulated here has a sample area approximately 200 times larger than that of the CDP and, notably, approximately 5 times that expected in the third bin of a 25-*μ*m 2D probe (62.5-*μ*m diameter). These larger sample volumes would reduce requirements on the accumulation pathlength and enable better characterization of clouds with low particle concentration and improve measurement of rare but microphysically important particles, such as drizzle drops.

The algorithm described here is strictly based on measurement statistics. It does not rely on a model of a physical instrument (and therefore the possible inaccuracies of said model). Furthermore, it allows us to drop consideration of individual particle properties so raw measurements can be compressed into histograms. This reduces data system loading and storage requirements. The processing is also fast, where a 58-bin particle size distribution is obtained in less than 1 s using a standard PC.

The analysis presented here is for a single measurement inversion. However, the algorithm is built on Fourier transforms, which easily scale to higher dimensions. Thus, it can be used for correlated multiparameter retrievals, such as scattering cross section and polarization properties. As a result, it may be particularly useful for measurements in cirrus clouds, which are typically composed of nonspherical ice crystals in low concentrations.

There are some practical limitations to this algorithm. Inversion of particle distributions becomes difficult as the density of particles becomes large. Thus, the ideal probe sample volume depends somewhat on the particle concentrations under investigation. The case analyzed here lost significant accuracy when the mean number of particles in the sample volume was greater than 5. As with most other measurement techniques, there is also a need to obtain near-uniform measurement sensitivity across the entire sample volume. Because this analysis used forward inversion techniques, there are some practical challenges in determining uncertainty in the retrieved particle distribution. This makes it difficult to obtain an analytical assessment of confidence in the algorithm’s performance. Finally, deviations from the assumed stationary particle distributions and homogeneous mixing in actual clouds may result in additional uncertainty. Most of the issues presented here are not fundamental and as this algorithm continues to develop, some of the current limitations may be better addressed or mitigated.

Finally, the algorithm presented here is not exclusively limited to cloud probe studies. Fundamentally, it is a signal processing method that may be applied to any ensemble measurement where the observed property obeys superposition and the events under investigation are randomly distributed in time or space.

## REFERENCES

*Int. Conf. Cloud Physics*, Boulder, CO, Amer. Meteor. Soc.,

## Footnotes

^{a}

The National Center for Atmospheric Research is sponsored by the National Science Foundation.