## Introduction

In physics there are many quantities that depend on others exponentially. In other cases, the values of a quantity follow an exponential frequency distribution. An example of the former is the number of radioactive nuclei that persist in time without disintegrating. Drop size distributions are an example of the latter case. The exponential function is the same in both cases, but the two concepts are essentially different.

In the first case, the law of radioactive decay is

$$N = N_0 \, e^{-\lambda t}, \tag{1}$$

where *N* is the number of unstable nuclei remaining after a time span *t*, with *N*_{0} being the initial number of nuclei. In this equation the parameter of the exponential, *λ* (called the disintegration constant), is related to the half-life of the radioactive isotope. If we have several pairs of data (*N*_{i}, *t*_{i}), we can determine *λ* from Eq. (1) merely by taking the natural logarithm of both members:

$$\ln N = \ln N_0 - \lambda t, \tag{2}$$

which is the equation of a straight line if we consider time *t* the independent variable and ln *N* the dependent variable. Thus, a simple least squares fit allows us to obtain an estimate of the slope, −*λ*.
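The linear fit of Eq. (2) can be sketched as follows. This is a minimal illustration rather than a procedure taken from the text: the function name `fit_decay_constant` and the noise-free data (generated with *N*_{0} = 1000 and *λ* = 0.3) are hypothetical, and the half-life is obtained from the standard relation ln 2/*λ*.

```python
import math

def fit_decay_constant(times, counts):
    """Fit ln N = ln N0 - lam * t by ordinary least squares; return (lam, n0)."""
    log_counts = [math.log(c) for c in counts]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(log_counts) / n
    # Slope and intercept of the least squares line through the points (t, ln N)
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_counts))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    # The slope of the line is -lam and the intercept is ln N0
    return -slope, math.exp(intercept)

# Hypothetical noise-free data generated with N0 = 1000 and lam = 0.3
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
counts = [1000.0 * math.exp(-0.3 * t) for t in times]

lam, n0 = fit_decay_constant(times, counts)
half_life = math.log(2.0) / lam  # standard relation between lam and the half-life
print(lam, n0, half_life)
```

With exact data the fit recovers the generating parameters; the sections that follow discuss why the same procedure is less innocent when *N* is a frequency.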

In the second case, Eq. (1) keeps the same form, but *N* is the number of drops per unit of volume according to size *t*. This is the drop size distribution proposed by Marshall and Palmer (1948), which Smith (2003) compares to the probability density function as follows:

$$f(t) = \lambda \, e^{-\lambda t}. \tag{3}$$

The difference may seem insignificant at first sight, but a controversy arises when it comes to calculating *λ*.

## Calculating *λ*

In the second case, the value of *N* in Eq. (1) represents the number of drops with a size of approximately *t*, which means that all of the drops in the sample have to be grouped into classes. Because the distribution is exponential, some of the classes at the larger sizes may be empty. Fraile et al. (1992) have noted that in this case it is not possible to use Eq. (2) for determining *λ* by the least squares fit, because ln *N* is undefined when *N* = 0.
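The problem with empty classes can be seen in a small sketch. The class counts below are hypothetical. The code only locates the classes for which ln *N* cannot be evaluated; dropping them, as done here, discards part of the sample and is not something the text recommends.

```python
import math

# Hypothetical number of drops counted in six consecutive size classes
class_counts = [120, 45, 17, 6, 0, 1]

usable_classes = []   # (class index, ln N) pairs that could enter the linear fit
skipped_classes = []  # classes for which ln N is undefined

for index, count in enumerate(class_counts):
    if count > 0:
        usable_classes.append((index, math.log(count)))
    else:
        skipped_classes.append(index)

print(usable_classes)
print(skipped_classes)
```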

These authors have also noted that, even if there are no empty size groups, the least squares fitting of the straight line in Eq. (2) places more importance on the larger sizes.

From an experimental perspective, there are instrumental problems when accurately measuring the smallest hydrometeors. These problems are the result of a number of different causes. For instance, the hydrometeors may not reach the lower threshold, or the problems may be the result of the method, the resolution of the equipment, overlapping, etc. Korolev et al. (1998) have pointed out some of these indeterminacies in measuring the number of small hydrometeors with an Optical Array Probe 2D2-C. Fraile et al. (2004) have also noticed these problems in measuring hailstones.

Furthermore, it is well known that in microphysical measurements the largest particles are more important in estimating the liquid water content, because this parameter is proportional to the cube of the drop diameter. Similarly, the largest hydrometeors are also the ones that contribute most to the reflectivity factor, because this factor depends on the sixth power of the drop diameter.
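A rough numerical sketch of this point follows, assuming an exponential size spectrum exp(−*λD*) with a hypothetical *λ* = 2 mm⁻¹ and a hypothetical size limit of 1 mm. The function `moment_share_above` is illustrative only; it estimates which share of the number of drops (moment 0), of the liquid water content (moment 3), and of the reflectivity factor (moment 6) is due to sizes above the limit.

```python
import math

def moment_share_above(lam, order, d_min, d_max=20.0, steps=20000):
    """Share of the order-th moment of exp(-lam * D) due to sizes D > d_min.

    Uses a midpoint-rule sum on [0, d_max]; d_max must be large enough
    for the neglected tail to be negligible.
    """
    h = d_max / steps
    total = 0.0
    above = 0.0
    for i in range(steps):
        d = (i + 0.5) * h
        term = d ** order * math.exp(-lam * d)
        total += term
        if d > d_min:
            above += term
    return above / total

lam = 2.0    # hypothetical parameter of the exponential, mm^-1
d_min = 1.0  # hypothetical size limit, mm

share_number = moment_share_above(lam, 0, d_min)
share_water = moment_share_above(lam, 3, d_min)
share_reflectivity = moment_share_above(lam, 6, d_min)
print(share_number, share_water, share_reflectivity)
```

Under these assumptions a minority of the drops accounts for most of the third moment and for nearly all of the sixth.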

These considerations may justify giving greater importance to the larger sizes of the spectrum. However, if the aim is to calculate size distributions, it is necessary to employ a method that applies equally to the whole size spectrum. If one part of the spectrum needs to be highlighted, an appropriate weight should be established for it, but not the weight that a particular fitting method may introduce.

The aim of this paper is both to demonstrate that the least squares fit of the straight line in Eq. (2) introduces a bias, and to offer an alternative way of calculating the *λ* parameter of an exponential distribution.

### Differences in weight factors in the least squares fit

Let us assume that we have *n* data points (*x*_{i}, *y*_{i}) that follow an approximate relationship of the following type:

$$y_i \approx f(x_i; A_1, \ldots, A_k),$$

where the form of *f* is known, even though the *k* parameters *A*_{j} (*j* = 1, . . . , *k*) on which it depends are unknown. Calculating *A*_{j} by means of the least squares fit is equivalent to estimating the value of the parameters that minimize the quantity

$$S = \sum_{i=1}^{n} \left[ y_i - f(x_i; A_1, \ldots, A_k) \right]^2. \tag{4}$$

Then, the system of *k* equations that leads to determining the *k* parameters is

$$\frac{\partial S}{\partial A_j} = 0, \qquad j = 1, \ldots, k.$$

In case some of the points are more reliable than others, we may incorporate a weight factor *w*_{i} for each pair indicating its reliability. Consequently, instead of Eq. (4) we will have to minimize

$$S_w = \sum_{i=1}^{n} w_i \left[ y_i - f(x_i; A_1, \ldots, A_k) \right]^2. \tag{5}$$

If the points (*x*_{i}, *y*_{i}) still show an approximately exponential relationship, that is, if

$$y_i \approx N_0 \, e^{-\lambda x_i}, \tag{6}$$

then the least squares fit is equivalent to estimating the value of *λ* and *N*_{0} that will minimize the quantity

$$\sum_{i=1}^{n} \left[ y_i - N_0 \, e^{-\lambda x_i} \right]^2 \tag{7}$$

and, if the reliability of the points were different, the expression that would have to be minimized would be

$$\sum_{i=1}^{n} w_i \left[ y_i - N_0 \, e^{-\lambda x_i} \right]^2. \tag{8}$$

It will be illustrated below that the least squares fitting to the straight line in Eq. (2), which is built from an exponential, will lead to an expression that is similar to the one in Eq. (8), that is, attributing different weights to the points.

If the points (*x*_{i}, *y*_{i}) comply with Eq. (6), they also satisfy

$$\ln y_i \approx \ln N_0 - \lambda x_i,$$

which is a linear equation in *x*_{i}. The least squares fitting to that straight line is equivalent to estimating the value of *λ* and *N*_{0} that minimize the quantity

$$\sum_{i=1}^{n} \left( \ln N_0 - \lambda x_i - \ln y_i \right)^2. \tag{9}$$

If we call the relative residual *p*_{i} = [*N*_{0} exp(−*λx*_{i}) − *y*_{i}]/*y*_{i}, then

$$\ln N_0 - \lambda x_i - \ln y_i = \ln \frac{N_0 \, e^{-\lambda x_i}}{y_i} = \ln (1 + p_i).$$

Because, according to Eq. (6), *p*_{i} is very small, the Mercator series (a Taylor expansion of the natural logarithm) leads to

$$\ln (1 + p_i) = p_i - \frac{p_i^2}{2} + \frac{p_i^3}{3} - \cdots.$$

As second-order and higher-order terms are neglected, the result is

$$\sum_{i=1}^{n} \left( \ln N_0 - \lambda x_i - \ln y_i \right)^2 \approx \sum_{i=1}^{n} p_i^2 = \sum_{i=1}^{n} \frac{1}{y_i^2} \left[ y_i - N_0 \, e^{-\lambda x_i} \right]^2.$$

It is obvious that minimizing Eq. (9) is equivalent to minimizing Eq. (8) if the weight *w*_{i} is

$$w_i = \frac{1}{y_i^2} \approx \frac{e^{2 \lambda x_i}}{N_0^2}. \tag{10}$$

What does this mean? It simply refers to the fact that transforming an exponential distribution into a linear function to subsequently estimate the parameters of the line by means of the least squares fit is broadly equivalent to applying the least squares fit to the exponential function with a different weight assigned to each point (*x*_{i}, *y*_{i}). In addition, because *w*_{i} is a growing exponential function, more weight is assigned to the higher values of *x*_{i}. In consequence, the fit is better for the points with a higher *x*_{i} value.

This fact can be illustrated graphically when representing an exponential distribution with the value *λ* calculated in the way described above, together with the data points (*x*_{i}, *y*_{i}) from which *λ* has been estimated. These points will be farther away from the exponential curve for the lower values of *x*_{i}. This is the usual result when this method is applied to calculate *λ* (Fraile et al. 1992). In Fig. 1 it can be compared with the exponential function that is obtained from calculating *λ* with the moment method. In this case, hailstones larger than 5 mm have been measured, and, consequently, an extension of this method was used, as described in section 2b.
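The equivalence between the linear fit and a weighted exponential fit can be checked numerically. The sketch below is an illustration under stated assumptions: the parameters, the sizes `xs`, and the small relative errors `deltas` are hypothetical. It evaluates the objective of the straight-line fit and the weighted objective with weights 1/*y*_{i}² for the same (*N*_{0}, *λ*), and lists the weights to show how fast they grow with *x*_{i}.

```python
import math

n0_true, lam_true = 100.0, 0.8                      # hypothetical parameters
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
deltas = [0.02, -0.01, 0.015, -0.02, 0.01, -0.005]  # small relative errors
ys = [n0_true * math.exp(-lam_true * x) * (1.0 + d) for x, d in zip(xs, deltas)]

def log_objective(n0, lam):
    """Squared residuals of the straight line ln y = ln N0 - lam * x."""
    return sum((math.log(n0) - lam * x - math.log(y)) ** 2 for x, y in zip(xs, ys))

def weighted_objective(n0, lam):
    """Weighted squared residuals of the exponential, with w_i = 1 / y_i**2."""
    return sum((y - n0 * math.exp(-lam * x)) ** 2 / y ** 2 for x, y in zip(xs, ys))

s_log = log_objective(n0_true, lam_true)
s_weighted = weighted_objective(n0_true, lam_true)
weights = [1.0 / y ** 2 for y in ys]

print(s_log, s_weighted)          # nearly equal when the residuals are small
print(weights[-1] / weights[0])   # weight of the largest size relative to the smallest
```

For relative errors of a few percent the two objectives agree to well under 1%, while the weight at the largest size exceeds the weight at the smallest one by more than three orders of magnitude in this example.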

In short, the value of *λ* obtained from the linear fit depends on the weight function in Eq. (10), which, in turn, depends on *λ*. Therefore, we suggest that the value of *λ* should be calculated from the probability density function (3) by means of the moment method. This method is generally known to be biased (Wallis et al. 1974). However, in the case of an exponential distribution it is identical to the maximum likelihood method (Sneyers 1990), which is not biased. Other methods may equally be used, for instance, the chi-square minimization method (Cramér 1999). Moreover, if we take into account the fact that the exponential distribution is a particular case of the gamma distribution

$$f(x) = \frac{\lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x}}{\Gamma(\alpha)}$$

when the shape parameter is *α* = 1, other methods may be used, for instance, the ones proposed by Wilks (1990) for the gamma.
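As a minimal sketch of the moment method, assuming a synthetic sample drawn with Python's `random.expovariate` and a hypothetical *λ* = 0.5, the estimate is simply the inverse of the sample mean. The log-likelihood of the exponential density is also evaluated slightly below and above the estimate to illustrate that the same value maximizes it.

```python
import math
import random

rng = random.Random(0)   # fixed seed so that the sketch is repeatable
lam_true = 0.5           # hypothetical parameter
sample = [rng.expovariate(lam_true) for _ in range(10000)]

# Method of moments: the mean of an exponential distribution is 1 / lam
lam_moments = len(sample) / sum(sample)

def log_likelihood(lam):
    """Log-likelihood of the sample for the density lam * exp(-lam * x)."""
    return len(sample) * math.log(lam) - lam * sum(sample)

ll_at_estimate = log_likelihood(lam_moments)
ll_below = log_likelihood(0.99 * lam_moments)
ll_above = log_likelihood(1.01 * lam_moments)

print(lam_moments)
print(ll_at_estimate > ll_below, ll_at_estimate > ll_above)
```

No grouping into size classes is needed, so empty classes pose no difficulty.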

### A method for fitting truncated exponential distributions

The estimated value of *λ* according to the method of moments is the inverse of the mean value of the sample. The exponential distribution is defined between zero and ∞. But there are situations in which only one range of values can be observed; this is the case of solid precipitation, which is only labeled “hail” if the size surpasses 5 mm in diameter (WMO 1992), or in the case of drops, if the data are provided by equipment that measures only sizes in a particular interval. In both cases, the mean value of the sample no longer coincides with the inverse of *λ*. This could be taken as an argument for insisting on the linear fit, because the method of moments cannot be applied directly in these conditions. However, a simple procedure will be suggested below for calculating *λ* on the basis of the method of moments.

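If there is only a lower threshold *x*_{0} (e.g., *x*_{0} = 5 mm in the case of hail), the expected value of *x* between *x*_{0} and ∞ is

$$E(x) = \frac{\int_{x_0}^{\infty} x \, \lambda e^{-\lambda x} \, dx}{\int_{x_0}^{\infty} \lambda e^{-\lambda x} \, dx} = x_0 + \frac{1}{\lambda},$$

and approximating *E*(*x*) by means of

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

if we have a sample of *n* data *x*_{i}, the value of *λ* is

$$\lambda = \frac{1}{\bar{x} - x_0},$$

which is very easy to calculate, because it is a change in the coordinates *x*_{i}, taking *x*_{0} as the point of origin.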
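The change of origin can be illustrated with a synthetic sample. In this sketch the parameter *λ* = 0.4 mm⁻¹ and the sample size are hypothetical. Values are drawn from the full exponential and only those above *x*_{0} = 5 mm are kept, mimicking the hail threshold.

```python
import random

rng = random.Random(1)   # fixed seed so that the sketch is repeatable
lam_true = 0.4           # hypothetical parameter, mm^-1
x0 = 5.0                 # lower threshold, mm

# Draw from the full exponential and keep only the sizes above the threshold
full_sample = [rng.expovariate(lam_true) for _ in range(50000)]
observed = [x for x in full_sample if x > x0]

x_mean = sum(observed) / len(observed)
lam_naive = 1.0 / x_mean            # ignores the truncation
lam_shifted = 1.0 / (x_mean - x0)   # origin moved to x0

print(len(observed), lam_naive, lam_shifted)
```

The naive inverse of the mean falls far below the generating value, whereas the shifted estimate stays close to it.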

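If there are two thresholds, the lower one *x*_{0} and the upper one *x*_{u}, the expected value of *x* is

$$E(x) = \frac{\int_{x_0}^{x_u} x \, \lambda e^{-\lambda x} \, dx}{\int_{x_0}^{x_u} \lambda e^{-\lambda x} \, dx} = \frac{x_0 \, e^{-\lambda x_0} - x_u \, e^{-\lambda x_u}}{e^{-\lambda x_0} - e^{-\lambda x_u}} + \frac{1}{\lambda}, \tag{11}$$

and we may reorganize this as follows:

$$\lambda = \left[ E(x) - \frac{x_0 \, e^{-\lambda x_0} - x_u \, e^{-\lambda x_u}}{e^{-\lambda x_0} - e^{-\lambda x_u}} \right]^{-1}. \tag{12}$$

Consequently, if we want to fit *n* values of *x*_{i} ∈ [*x*_{0}, *x*_{u}] to a truncated exponential, the expected value of *x* will be given by Eq. (11). Equation (12) may be used for calculating a new value of *λ*, starting from an initial value that is introduced in the member on the right-hand side. In other words, Eq. (12) allows us to determine the parameter of the exponential distribution by means of successive iterations: for a given value of *λ*_{k}, we obtain the value of *λ*_{k+1} = *f*(*λ*_{k}), where, with *E*(*x*) approximated by the sample mean *x̄*,

$$f(\lambda_k) = \left[ \bar{x} - \frac{x_0 \, e^{-\lambda_k x_0} - x_u \, e^{-\lambda_k x_u}}{e^{-\lambda_k x_0} - e^{-\lambda_k x_u}} \right]^{-1}. \tag{13}$$

The error in *λ* can be made as small as required. If we call this error *ε*, after *m* iterations the following will result: |*λ*_{m+1} − *λ*_{m}| < *ε*.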
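The iteration can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the thresholds (5 and 15 mm), and the target value *λ* = 0.4 mm⁻¹ are hypothetical, and the starting value 1/(*x̄* − *x*_{0}) is an assumption, because the text does not prescribe one. The sample mean is replaced by the exact truncated mean so that the fixed point is known in advance.

```python
import math

def truncated_mean(lam, x0, xu):
    """Expected value of x on [x0, xu] for an exponential with parameter lam."""
    a = math.exp(-lam * x0)
    b = math.exp(-lam * xu)
    return (x0 * a - xu * b) / (a - b) + 1.0 / lam

def iterate_lambda(x_mean, x0, xu, lam_init, eps=1e-10, max_iter=1000):
    """Fixed-point iteration lam_{k+1} = f(lam_k); returns (lam, iterations)."""
    lam = lam_init
    for k in range(1, max_iter + 1):
        a = math.exp(-lam * x0)
        b = math.exp(-lam * xu)
        lam_next = 1.0 / (x_mean - (x0 * a - xu * b) / (a - b))
        if abs(lam_next - lam) < eps:
            return lam_next, k
        lam = lam_next
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical thresholds (mm) and a mean consistent with lam = 0.4 mm^-1
x0, xu = 5.0, 15.0
x_mean = truncated_mean(0.4, x0, xu)

# Assumed starting value: the estimate for a lower threshold only
lam, n_iter = iterate_lambda(x_mean, x0, xu, lam_init=1.0 / (x_mean - x0))
print(lam, n_iter)
```

In this example the iteration returns to the value used to build the mean within a few tens of steps; how many are needed in practice depends on the data.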

The function *f*(*λ*_{k}) tends to a fixed point under iteration, that is, Eq. (13) converges, if the slope of the curve in Eq. (13) at the fixed point (i.e., the derivative of the function at that point) takes values between −1 and 1 (Kaplan and Glass 1995). The derivative of Eq. (13) with respect to *λ*_{k}, where *x̄* is the sample mean, is

$$f'(\lambda_k) = \frac{(x_u - x_0)^2 \, e^{-\lambda_k (x_0 + x_u)}}{\left[ \bar{x} \left( e^{-\lambda_k x_0} - e^{-\lambda_k x_u} \right) - x_0 \, e^{-\lambda_k x_0} + x_u \, e^{-\lambda_k x_u} \right]^2},$$

which is always positive, because both the numerator and denominator are always positive. This result indicates that, whether the function is convergent or not, the trend is monotonic (not oscillatory). Once the derivative has been seen to be positive, to verify the condition mentioned above it is only necessary to test that

$$f'(\lambda_k) < 1,$$

which is equivalent to testing that

$$(x_u - x_0)^2 \, e^{-\lambda_k (x_0 + x_u)} < \left[ \bar{x} \left( e^{-\lambda_k x_0} - e^{-\lambda_k x_u} \right) - x_0 \, e^{-\lambda_k x_0} + x_u \, e^{-\lambda_k x_u} \right]^2. \tag{14}$$

To assess the extent to which the convergence criterion in Eq. (14) applies to the examples suggested (hailstone and drop size), we have applied this equation to hail data recorded during a summer campaign in the hailpad network of León (Spain) and to the distributions of hydrometeors measured inside convective clouds by means of an optical array probe (OAP) 2D2-C (Sánchez et al. 2001). The result shows that the inequality holds in all cases, thus demonstrating that the iteration function is useful, at least in the examples described.

In addition, not too many iterations are necessary to obtain acceptable results. For example, for the hailpads chosen, after five iterations *λ* differs from the final value (the one it would reach after infinitely many iterations) by less than 5% in 60% of the cases. Moreover, in 60% of the cases the *λ* calculated with 10 iterations differs from the final value by less than 1%. With 15 iterations this percentage increases to almost 70%. The gradual decrease of the difference with the final result (or relative error) is represented in Fig. 2 according to the number of iterations.
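The convergence test can be sketched numerically. In this illustration the thresholds and the fixed point `lam_star` are hypothetical, and the analytic derivative of the iteration function (written here for a truncated exponential on [*x*_{0}, *x*_{u}]) is compared with a central finite difference as a consistency check.

```python
import math

def f(lam, x_mean, x0, xu):
    """Iteration function: new value of lam obtained from the current one."""
    a = math.exp(-lam * x0)
    b = math.exp(-lam * xu)
    return 1.0 / (x_mean - (x0 * a - xu * b) / (a - b))

def f_prime(lam, x_mean, x0, xu):
    """Analytic derivative of f with respect to lam."""
    a = math.exp(-lam * x0)
    b = math.exp(-lam * xu)
    bracket = x_mean * (a - b) - x0 * a + xu * b
    return (xu - x0) ** 2 * a * b / bracket ** 2

# Hypothetical thresholds (mm) and fixed point (mm^-1)
x0, xu, lam_star = 5.0, 15.0, 0.4
a = math.exp(-lam_star * x0)
b = math.exp(-lam_star * xu)
x_mean = (x0 * a - xu * b) / (a - b) + 1.0 / lam_star  # mean consistent with lam_star

slope = f_prime(lam_star, x_mean, x0, xu)
h = 1e-6
slope_numeric = (f(lam_star + h, x_mean, x0, xu) - f(lam_star - h, x_mean, x0, xu)) / (2.0 * h)
converges = 0.0 < slope < 1.0

print(slope, slope_numeric, converges)
```

A slope close to zero means fast convergence, and a slope close to one means that many iterations are needed, which is one possible reading of why some samples require more iterations than others.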

In the paragraphs above we have calculated *λ* when there is only a lower threshold *x*_{0} and when there are upper and lower thresholds in the sample. The remaining case, an upper threshold only, is equivalent to using Eq. (12) when *x*_{0} = 0. As expected (and this may be easily tested), if *x*_{u} is very high, the value of *λ* will be very similar to the inverse of the mean value. In any case, whatever the value of *x*_{u}, the new *λ* will always be lower than the one calculated directly as the inverse of the mean value.

## Conclusions

- This paper has demonstrated that calculating the *λ* parameter in an exponential distribution by means of the least squares fitting to the straight line in Eq. (2) incorporates a weight factor that assigns more importance to the higher values of the independent variable.
- In consequence, we suggest that this parameter should be calculated by means of the method of moments or maximum likelihood, whose results are identical in the exponential distribution.
- There are also exponentially distributed data that do not extend to the whole domain of the exponential distribution (from zero to infinity), usually resulting from the sampling techniques employed. In these cases, the sample has to be restricted to a reduced interval, and we suggest the use of a straightforward iterative technique based on the method of moments.

## Acknowledgments

The authors are grateful to Dr. Maite Trobajo for her suggestions and to Dr. Noelia Ramón for translating the paper into English. The reviewers have greatly contributed to improving the paper. The study was supported by the Consejería de Educación y Cultura, Junta de Castilla y León (Grant LE53/03).

## References

Cramér, H. 1999. *Mathematical Methods of Statistics*. Princeton University Press, 575 pp.

Fraile, R., A. Castro, and J. L. Sánchez. 1992. Analysis of hailstone size distributions from a hailpad network. *Atmos. Res.* 28:311–326.

Fraile, R., L. López, C. Palencia, and A. Castro. 2004. Hailstone size distribution and dent overlap in hailpads. *Proc. 14th Int. Conf. on Clouds and Precipitation*, Bologna, Italy, ICCP, 763–766.

Kaplan, D., and L. Glass. 1995. *Understanding Nonlinear Dynamics*. Springer-Verlag, 440 pp.

Korolev, A. V., J. W. Strapp, and G. A. Isaac. 1998. Evaluation of the accuracy of PMS optical array probes. *J. Atmos. Oceanic Technol.* 15:708–720.

Marshall, J. S., and W. M. Palmer. 1948. The distribution of raindrops with size. *J. Meteor.* 5:165–166.

Sánchez, J. L., E. García-Ortega, and J. L. Marcos. 2001. Characterization of the distribution of cloud spectra for thunderstorms in the western Mediterranean area. Preprints, *Symp. on Precipitation Extremes: Prediction, Impacts, and Responses*, Albuquerque, NM, Amer. Meteor. Soc., 129–132.

Smith, P. L. 2003. Raindrop size distributions: Exponential or gamma—Does the difference matter? *J. Appl. Meteor.* 42:1031–1034.

Sneyers, R. 1990. On the statistical analysis of series of observations. WMO Tech. Note 143, 192 pp.

Wallis, J. R., N. C. Matalas, and J. R. Slack. 1974. Just a moment. *Water Resour. Res.* 10:211–219.

Wilks, D. S. 1990. Maximum-likelihood-estimation for the gamma-distribution using data containing zeros. *J. Climate* 3:1495–1501.

WMO. 1992. *International Meteorological Vocabulary*. World Meteorological Organization, 784 pp.