## Introduction

The statistical characterization of rain is useful in understanding the large-scale space and time variability of the process. From the perspective of spaceborne sensors, knowledge of the large-scale properties of the rain can help to assess the accuracy of the retrievals by imposing constraints that must be satisfied by the spatial or temporal averages of the high-resolution estimates. Data from the Tropical Rainfall Measuring Mission (TRMM) precipitation radar (PR) are well suited to treatment by statistical methods in that rain data are sparsely sampled in time. Moreover, the high-resolution estimates are often of limited accuracy at high rain rates because of attenuation effects and at light rain rates because of receiver sensitivity. The effects of attenuation are particularly relevant to the PR. Because it operates at a higher frequency than most ground-based weather radars and because methods of attenuation correction are subject to errors, an approach that can circumvent an attenuation-correction procedure may offer insight into the performance of the standard techniques.

In this paper we use the TRMM PR data to investigate the behavior of statistical methods, the purpose of which is to estimate rainfall over large space–time domains. Examination of large-scale rain characteristics provides a useful focal point. The high correlation between the mean and standard deviation of rain rate implies that the two-parameter conditional distribution of this quantity can be approximated by a one-parameter distribution. This property is used to explore the behavior of the area–time integral (ATI) methods in which fractional area above a threshold is related to the mean rain rate. Unlike typical ground-based radar data, attenuation effects in the higher-frequency spaceborne radar data are not negligible at higher rain thresholds, and modifications to the fractional area method are needed. In the usual application of the ATI method, a correlation is established between the fractional area and the mean rain rate. However, if a particular form of the rain-rate distribution is assumed and if the ratio of the mean to standard deviation is known, then the distribution can be extracted from a measurement of fractional area above a threshold. The second method is an extension of this idea in which the distribution is estimated from binned data over a range of light to moderate rain rates where the effects of attenuation are small. By assuming that the rain rates are lognormally distributed, the estimates at the lower thresholds are used to estimate the parameters of the distribution. From these parameters, an estimate of the mean rain rate follows directly. In the paper we take as the standard of comparison the sample mean of the high-resolution rain rates that have been corrected for attenuation. Data from version 4 of the operational TRMM algorithms are used throughout.

## Description of level-3 PR products

The level-3 product 3a25 provides the sample mean and standard deviation of the rain rate *R* over latitude–longitude grids of 5° × 5° and 0.5° × 0.5°. These sample statistics, denoted by *E*_{S}(*R*|*R* > 0) and *s*(*R*|*R* > 0), are conditioned on the presence of rain. If *N* and *N*_{t} represent, respectively, the number of rain observations and the total number of observations over the space–time domain, then the unconditioned sample mean and standard deviation, *E*_{S}(*R*) and *s*(*R*), can be written in terms of the conditional statistics and the probability of rain *p*. This probability is given by

$$p = \frac{N}{N_t}, \qquad p(h) = \frac{N(h)}{N_t},$$

where *N*(*h*) is the number of rain measurements at height *h* above the earth ellipsoid. For all products discussed in the paper, *h* is taken to be 2 km and the height dependence is omitted. It is critical to note that the rain rates in 3a25 are derived from the instantaneous high-resolution rain-rate estimates of algorithm 2a25 in which the measured radar reflectivities *Z*_{m} are first corrected for attenuation to obtain *Z*, after which an *R–Z* relationship is applied (Iguchi et al. 2000). The *R–Z* relationship is a power law of the form

$$R = aZ^{b}, \tag{6}$$

where *a* and *b* are functions of rain type (stratiform/convective) and height. A potential benefit of the statistical methods discussed here is that they can be used to bypass errors associated with attenuation correction methods such as the surface reference and Hitschfeld–Bordan techniques (Hitschfeld and Bordan 1954; Meneghini et al. 2000). For the statistical algorithms, we assume that we have access only to the *Z*_{m} so that the apparent rain rate, using the same *a* and *b* parameters, is obtained from

$$R_a = aZ_m^{b}. \tag{7}$$

The primary objective of the paper is to compare the sample mean of the high-resolution estimates of rain rate based on (6) with the mean estimated from the high-resolution estimates from (7) using statistical methods. In effect, the sample mean of (6) is used as the standard of comparison against which the statistical methods are evaluated.

To understand the sampling, note that the PR swath of 215 km consists of 49 fields of view, corresponding to a cross-track scan out to ±17°. An antenna beamwidth of 0.71° yields a horizontal resolution near nadir of about 4.3 km. Completing a scan every 0.6 s, the instrument provides continuous coverage over the swath. A single observation is defined as the profile of radar return power versus range measured over one such field of view, where the sampling in range is 0.25 km. Because 49/0.6 observations are made each second, approximately 2.12 × 10^{8} observations are made per month. Dividing by the number of 5° × 5° cells that make up the TRMM PR coverage (36°S–36°N) gives, on average, 1.84 × 10^{5} observations per 5° × 5° cell. Because the satellite spends a smaller fraction of time over the low latitudes, the number of observations for cells near the equator is approximately 1.5 × 10^{5}. For a region where the probability of rain is 5%, the number of rain samples will be on the order of 10^{4}. The observations are not spaced uniformly in time, however, but arrive in clusters because only 35–40 partial overpasses of a cell occur each month. A schematic of the measurement is shown in Fig. 1.
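
The sampling figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes a 30-day month and a count of 72 × 16 = 1152 grid cells; both are assumptions chosen because they reproduce the quoted numbers, not values stated in this paragraph.

```python
# Back-of-envelope check of the PR sampling numbers quoted in the text.
BEAMS_PER_SCAN = 49                    # fields of view per cross-track scan
SCAN_PERIOD_S = 0.6                    # seconds per scan
SECONDS_PER_MONTH = 30 * 24 * 3600     # assumption: 30-day month
N_CELLS = 72 * 16                      # assumption: 5-degree cells on a 72 x 16 grid

obs_per_second = BEAMS_PER_SCAN / SCAN_PERIOD_S
obs_per_month = obs_per_second * SECONDS_PER_MONTH
obs_per_cell = obs_per_month / N_CELLS

# Rain samples in a near-equatorial cell (1.5e5 observations) with p = 5%.
rain_samples = 0.05 * 1.5e5

print(f"{obs_per_month:.3g} observations per month")
print(f"{obs_per_cell:.3g} observations per 5-degree cell")
print(f"{rain_samples:.0f} rain samples at p = 5%")
```

The rain-sample count comes out somewhat below 10^4, which is consistent with the order-of-magnitude statement in the text.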

The primary product of 3a26 is the set of three parameters of the lognormal rain-rate distribution function over a 5° × 5° × 1 month grid. The retrieval is based on the apparent rain rates given by (7). A secondary product consists of the fractional areas over 25 rain-rate thresholds at each grid point. To account for the effects of attenuation, the parameters of the distributions and the fractional areas are computed for six values of the attenuation proxy variable *Q,* defined below.

## Fractional area–rain-rate relationships

The probability that the rain rate *R* exceeds a threshold *R*_{t}, Pr(*R* > *R*_{t}), is equal to the fractional area above *R*_{t}, FrA(*R*_{t}). The basic principle of the fractional area method is that the sample mean rain rate *E*_{S}(*R*) over a space–time region is linearly related to FrA(*R*_{t}) so that

$$E_S(R) = \eta_t \Pr(R > R_t) = \eta_t\,\mathrm{FrA}(R_t), \tag{8}$$

where *η*_{t} depends on the choice of *R*_{t}. In this paper, FrA(*R*_{t}) is defined as the ratio of the area above *R*_{t} to the total area, including rain and no-rain regions.

Assume that the satellite passes over the region *n* times during a month and that, during the *k*th overpass, rain rates {*R*(*k*, 1), *R*(*k*, 2), . . . , *R*(*k*, *m*_{k})} are collected, where *R*(*k*, *j*) represents the attenuation-corrected rain-rate estimate, obtained from (6), at the *j*th beam position containing rain. Note that *m*_{k} is the number of beam positions in the *k*th overpass at which rain is detected. The conditional sample mean rain rate is computed from

$$E_S(R \mid R > 0) = \frac{1}{N}\sum_{k=1}^{n}\sum_{j=1}^{m_k} R(k, j), \tag{9}$$

where

$$N = \sum_{k=1}^{n} m_k. \tag{10}$$

If *t*_{k} is the total number of beam positions (measurements) in the *k*th overpass, then the unconditioned sample mean rain rate is

$$E_S(R) = \frac{1}{N_t}\sum_{k=1}^{n}\sum_{j=1}^{m_k} R(k, j), \tag{11}$$

where *N*_{t} is the sum of the *t*_{k} over the *n* overpasses. Instead of *E*_{S}(*R*) (mm h^{−1}), in subsequent plots we use the estimated monthly rain accumulation (mm month^{−1}), obtained by multiplying *E*_{S}(*R*) by the number of hours in the month. Equations (11) and (12) are the output products of the 3a25 algorithm and represent the sample mean of the high-resolution, attenuation-corrected rain rates. As noted above, the basic measurement for the statistical methods is taken to be the apparent rain rates, obtained without attenuation compensation, so that the fractional area above a threshold *R*_{ti} (where *i* is a given threshold level corresponding to some rain rate *R*) is computed from the following:

$$\mathrm{FrA}(R_{ti}) = \frac{1}{N_t}\sum_{k=1}^{n}\sum_{j=1}^{m_k} U[R_a(k, j) - R_{ti}], \tag{13}$$

where FrA(*R*_{ti}) is the ratio of the number of beam positions at which the apparent rain rate *R*_{a}, given by (7), is greater than *R*_{ti} to the total number of beam positions, including raining and nonraining measurements. Here, *U* is the unit step function, where *U*(*x*) is 1 for *x* > 0 and is 0 otherwise.
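
The bookkeeping behind the sample means and the fractional area can be sketched in a few lines. This is an illustrative sketch only: the function name `sample_stats`, the tuple layout for each overpass, and the input numbers are all hypothetical, and a 720-h month is assumed.

```python
def sample_stats(overpasses, threshold):
    """Sample statistics for one space-time box.

    overpasses: list of (corrected, apparent, total_beams) tuples, where
      corrected   = attenuation-corrected rain rates at raining beams (mm/h),
      apparent    = apparent (uncorrected) rain rates at the same beams (mm/h),
      total_beams = t_k, all beam positions in the overpass (rain and no rain).
    """
    n_rain = sum(len(corr) for corr, _, _ in overpasses)       # N
    n_total = sum(total for _, _, total in overpasses)         # N_t
    rain_sum = sum(sum(corr) for corr, _, _ in overpasses)
    # Count apparent rain rates above the threshold (unit step function).
    n_above = sum(1 for _, app, _ in overpasses for r in app if r > threshold)
    return {
        "cond_mean": rain_sum / n_rain if n_rain else 0.0,     # E_S(R | R > 0)
        "mean": rain_sum / n_total,                            # E_S(R)
        "p_rain": n_rain / n_total,                            # p = N / N_t
        "fra": n_above / n_total,                              # FrA(R_t)
    }

# Hypothetical data: three overpasses, the second with no rain.
overpasses = [
    ([1.0, 3.0, 8.0], [1.0, 2.8, 6.0], 10),
    ([], [], 12),
    ([2.0, 6.0], [2.0, 5.0], 18),
]
stats = sample_stats(overpasses, threshold=2.5)
monthly_mm = stats["mean"] * 720   # assumption: 720 h per month
print(stats, monthly_mm)
```

Keeping the corrected and apparent rates as separate inputs mirrors the distinction in the text: the reference mean uses corrected rates, whereas the fractional area uses apparent rates.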

We use the quantity *Q*, defined by (14), as a proxy for path attenuation. In this definition, *c* = 0.2 ln10 *β*, and *α* and *β* are the parameters in the power-law relationship between the specific attenuation *k* and the radar reflectivity factor *Z*: *k* = *αZ*^{β}. Using the Hitschfeld–Bordan equation (Hitschfeld and Bordan 1954) it can be shown (Meneghini 1998) that *Q* approaches 0 and 1, respectively, as the path attenuation goes to zero and infinity [see (15)]. For our case the relevant path attenuation is that along the radar beam from the storm top to a range that intersects a surface 2 km above the ellipsoid. In modifying FrA it is conceptually easier to consider first the fractional area below a threshold or, equivalently, the probability distribution function at *R*_{ti}; in particular, we use the representation (16) for the joint distribution *F*(*R*_{ti}, *Q*_{j}), in which *Q*(*k*, *m*) is the value of *Q* computed from (14) for the *m*th rain-rate measurement of the *k*th overpass. If *Q*_{j} is taken to be 0.999, then all or almost all of the measurements of *Q* will be such that *U*[*Q*_{j} − *Q*(*k*, *m*)] = 1 so that 1 − *F*(*R*_{ti}, *Q*_{j}) reduces to (13). On the other hand, if *Q*_{j} is chosen as some intermediate value, say *q*, then values of *Q*(*k*, *m*) that are greater than *q* are excluded from the sum. The modified expression for the fractional area is then given by

$$\mathrm{FrA}(R_{ti}, Q_j) = 1 - F(R_{ti}, Q_j). \tag{17}$$
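
The limiting behavior of *Q* described above can be illustrated with a short derivation. The sketch assumes the standard Hitschfeld–Bordan setting, with *k* in decibels per kilometer, two-way attenuation, and range *s* measured from the storm top. The integral form written for *Q* is an assumption chosen to be consistent with the definition of *c*; it is not a quotation of (14) or (15).

```latex
% Assumed form of the proxy: Q(r) = c \int_0^r \alpha Z_m^{\beta}(s)\, ds
\begin{align}
Z_m(r) &= Z(r)\,\exp\!\left[-0.2\ln 10 \int_0^r k(s)\,ds\right],
  \qquad k = \alpha Z^{\beta},\\
\alpha Z_m^{\beta}(r) &= k(r)\,\exp\!\left[-c \int_0^r k(s)\,ds\right],
  \qquad c = 0.2\ln 10\,\beta,\\
Q(r) \equiv c\int_0^r \alpha Z_m^{\beta}(s)\,ds
  &= 1 - \exp\!\left[-c\int_0^r k(s)\,ds\right].
\end{align}
```

Under these assumptions, *Q* tends to 0 when the integrated attenuation vanishes and to 1 when it grows without bound, and it can be computed from the measured reflectivities alone.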

Plots of FrA(*R*_{ti}, *Q*_{j} = 0.3), computed from (17), versus *E*_{S}(*R*), computed from (11) and multiplied by 720 h month^{−1}, are shown in Fig. 2 for *R*_{ti} = 0.65, 2.7, and 11.5 mm h^{−1}. Each point represents values of FrA(*R*_{ti}, *Q*_{j} = 0.3) and *E*_{S}(*R*) calculated at a particular 5° × 5° region for measurements made during September 1999. Note that the maximum number of space–time boxes that make up the TRMM coverage is 72 × 16 = 1152; however, regions where the probability of rain is low (<0.05%) have been eliminated so the actual number of data points is fewer. Although all correlations are high, the highest of these, with correlation coefficient of 0.99, is obtained at a rain-rate threshold of 2.7 mm h^{−1}. Kedem and Pavlopoulos (1991) and Short et al. (1993a) have shown empirically and theoretically that the optimum threshold tends to be close to the conditional mean rain rate.

No attempt is made in this paper to apply statistical methods separately to stratiform and convective rain regions (e.g., Atlas et al. 2000). Note, nonetheless, that regions exist where stratiform or convective rains prevail. For the data collected over 5° × 5° boxes in September of 1999 in which more than 10 mm of rain falls, stratiform rain accounts for more than 80% of the total rainfall in nearly 3% of the cases. Convective rain accounts for more than 80% of the total rainfall in over 3% of the cases. Despite the existence of these regions dominated by stratiform or convective rain, the relation between fractional area and mean rainfall is insensitive to the stratiform–convective fraction. This result is in agreement with the results of Cheng and Brown (1993) who conclude that the fractional area method works nearly as well in predominately frontal rain as in convective rain.

Summaries of the results are shown in Fig. 3 where the correlation coefficient, rms error (mm h^{−1}) and the coefficient *η*_{t} (mm h^{−1}), [*E*_{S}(*R*) = *η*_{t} FrA(*R*_{t})], are shown versus the rain-rate threshold *R*_{t} for the first 10 months of 1999. For these results, *Q*_{j} = 0.3. For most months, the maximum correlation and minimum rms error are achieved at a rain-rate threshold of about 3.6 mm h^{−1}. It can be seen, however, that the peak of the correlation (and valley of the rms error) is broad, and thresholds from about 2 to 5 mm h^{−1} give results that are almost as good. It should be noted that the rms error is computed with respect to the regression line *E*_{S}(*R*) = *η*_{t} FrA(*R*_{t}), with *η*_{t} = Σ_{i} [*E*_{S}(*R*)_{i} FrA_{i}(*R*_{t})]/Σ_{i} [FrA_{i}(*R*_{t}) FrA_{i}(*R*_{t})], where *E*_{S}(*R*)_{i} and FrA_{i}(*R*_{t}) are, respectively, the sample mean of the rain rate and fractional area greater than *R*_{t} at the *i*th space–time box (5° × 5° × 1 month); the summations are taken over all boxes for which the probability of rain is greater than 0.05%. For the values plotted in the figure, the units of *η*_{t}, *E*_{S}(*R*), and rms error are millimeters per hour.
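
The regression through the origin and its diagnostics can be sketched as follows. The function name and the input values are hypothetical, and the rms error is normalized by the number of boxes, which is an assumption because the normalization is not specified in the text.

```python
import math

def fit_through_origin(fra, mean_rain):
    """Slope eta_t for mean_rain ~ eta_t * fra, with rms error and correlation."""
    eta = sum(e * f for e, f in zip(mean_rain, fra)) / sum(f * f for f in fra)
    n = len(fra)
    # rms error about the regression line (assumption: divide by n).
    rms = math.sqrt(sum((e - eta * f) ** 2 for e, f in zip(mean_rain, fra)) / n)
    # Ordinary (Pearson) correlation coefficient between the two quantities.
    mf, me = sum(fra) / n, sum(mean_rain) / n
    cov = sum((f - mf) * (e - me) for f, e in zip(fra, mean_rain))
    var_f = sum((f - mf) ** 2 for f in fra)
    var_e = sum((e - me) ** 2 for e in mean_rain)
    return eta, rms, cov / math.sqrt(var_f * var_e)

# Hypothetical fractional areas and mean rain rates (mm/h) for four boxes.
fra = [0.01, 0.02, 0.04, 0.05]
mean_rain = [0.05, 0.11, 0.19, 0.26]
eta_t, rms_error, corr = fit_through_origin(fra, mean_rain)
print(eta_t, rms_error, corr)
```

Repeating the fit for each threshold *R*_{t} and plotting the three returned values against *R*_{t} would give curves of the kind summarized in Fig. 3.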

Figure 4 shows the effects on the FrA-versus-*E*_{S}(*R*) relationship when the *Q* threshold is chosen to be 0.999. As already noted, at high *Q* thresholds, (17) reduces to the standard expression (13). A comparison of the plots in Figs. 3 and 4 shows that the results are approximately the same up to rain-rate thresholds of about 5 mm h^{−1}. However, for larger values of the threshold *R*_{t}, the *E*_{S}(*R*)–FrA(*R*_{t}) relationship for *Q*_{j} = 0.999 degrades, as indicated by the increase in the rms error and decrease in the correlation coefficient. The results indicate that caution is needed in applying an ATI technique to data from an attenuating-wavelength radar, particularly if high rain-rate thresholds are used. For shorter wavelengths, where the attenuation is more severe, degradation in the correlation of the *E*_{S}(*R*)–FrA(*R*_{t}) relation will occur at lower thresholds.

## Mean and standard deviation of rain rates

The high correlation between FrA(*R*_{t}) and *E*_{S}(*R*) is closely related to the fact that the mean and standard deviation of the rain rate are also well correlated. The correlation is illustrated in the top plots of Figs. 5 and 6 where scatterplots of the conditional standard deviation versus conditional mean rain rate are shown for June 1999. In the first case (Fig. 5), latitude–longitude boxes of 0.5° × 0.5° are used where *E*_{S}(*R*|*R* > 0) is computed from (9) and the corresponding sample standard deviation *s*(*R*|*R* > 0) is computed from (18). In these expressions, *E*_{S}(*R*|*R* > 0) and *N* (the total number of beam positions at which rain is detected) are defined by (9) and (10), respectively, and the nonzero high-resolution rain rates *R*(*k*, *m*) are corrected for attenuation. A second case is shown in the top plot of Fig. 6 for latitude–longitude boxes of 5° × 5°. Assuming a linear relationship between these quantities of the form

$$s(R \mid R > 0) = \gamma\,E_S(R \mid R > 0), \tag{19}$$

we obtain *γ* = 1.41 and 1.86 for the 0.5° × 0.5° and 5° × 5° region, respectively. Short et al. (1993b), using rain gauge data from Darwin, Australia, and Florida, obtained a slope of 5/3 ≈ 1.67 and also found this to be representative of the GATE radar data. Sauvageot (1994), using disdrometer data from 13 sites ranging from the Tropics to midlatitudes, found a somewhat higher slope.

In view of these standard deviation versus mean plots, a natural question to ask is how stable the slopes are from month to month. The results are summarized in Fig. 7, where *γ* has been plotted for the 10 months from January to October 1999 for box sizes of 0.5°, 1°, 2°, 2.5°, 5°, and 10° on a side. Of particular interest here is the mean slope for the rain-rate statistics over the 5° × 5° grid: it was found to be equal to 1.85 and to be nearly the same as the slope for the June 1999 data shown in Fig. 6. The functional behavior of the slope *γ* with box size, shown in Fig. 7, is closely related to the spatial correlation of the rain: as the box size decreases, rain rates gathered over a single overpass become more highly correlated because they are taken over an area that is comparable to the correlation length of the rain. This higher correlation decreases the sample variance for a given mean rain rate and leads to smaller values of *γ*. The degree of correlation among the rain samples may explain the differences in the slope parameters derived by Short et al. (1993b), Sauvageot (1994), and those given here. This behavior is also important in assessing how well the statistical methods perform over regions other than the 5° × 5° grids considered here. A discussion of this issue, however, is beyond the scope of the paper.

## Estimate of the lognormal distribution using fractional area

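If the rain rate is modeled by a mixed distribution with a finite probability mass at *R* = 0 and with a lognormal distribution for *R* > 0 (Lopez 1977; Kedem et al. 1990; Atlas et al. 1990; Rosenfeld et al. 1990), the probability distribution function can be written as

$$F_{\mathrm{LN}}(R) = (1 - p)\,U(R) + \frac{p}{2}\left[1 + \operatorname{erf}(u)\right], \tag{20}$$

where *p* is the probability of rain and erf is the error function, with argument

$$u = \frac{\ln R - \mu}{\sqrt{2}\,\sigma}.$$

Because the conditional mean and standard deviation determine *μ* and *σ* through (23) and (24), the linear relationship (19) allows *σ* to be written as a function of *γ*:

$$\sigma^{2} = \ln\left(1 + \gamma^{2}\right). \tag{25}$$

The values of *γ*, equal to 1.41 and 1.85, respectively, for 0.5° × 0.5° and 5° × 5° grids, yield *σ* values of 1.05 and 1.22, respectively.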
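
The dependence of *σ* on *γ* follows from the first two moments of the lognormal distribution. The derivation below is a sketch that uses the standard lognormal moment formulas and assumes that the population ratio of standard deviation to mean can be identified with the sample slope *γ*.

```latex
\begin{align}
E(R \mid R > 0) &= \exp\!\left(\mu + \tfrac{1}{2}\sigma^{2}\right),\\
\operatorname{var}(R \mid R > 0) &= \left[\exp(\sigma^{2}) - 1\right]
  \exp\!\left(2\mu + \sigma^{2}\right),\\
\gamma^{2} = \frac{\operatorname{var}(R \mid R > 0)}{E^{2}(R \mid R > 0)}
  &= \exp(\sigma^{2}) - 1
  \;\;\Longrightarrow\;\; \sigma^{2} = \ln\!\left(1 + \gamma^{2}\right).
\end{align}
```

With *γ* = 1.85, this gives *σ*^{2} ≈ 1.49 and *σ* ≈ 1.22; with *γ* = 1.41, it gives *σ* ≈ 1.05. Both agree with the values quoted in the text.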

For the data [*E*_{S}(*R*|*R* > 0), *s*(*R*|*R* > 0)] displayed in the upper panels of Figs. 5 and 6, (23) and (24) yield the corresponding data (*μ*, *σ*) that are shown in the lower panel of these figures. In contrast to the approximation (25), the *σ* data are not constant; nevertheless, the variability is small, particularly at the higher values of *μ*. If, in accordance with (25), *σ* is assumed to be constant and if the probability of rain is computed from a ratio of rain counts to total counts, then the only remaining unknown parameter of the distribution is *μ*. But *μ* can be estimated from the relationship between FrA(*R*_{ti}, *Q*_{j}) and *F*(*R*_{ti}, *Q*_{j}). From (17) and (20) we have

$$\mathrm{FrA}(R_{ti}, Q_j) = 1 - F_{\mathrm{LN}}(R_{ti}, Q_j) = \frac{p}{2}\left\{1 - \operatorname{erf}\left[u(R_{ti}; \mu)\right]\right\}, \tag{26}$$

where *u*(*R*_{ti}) can be expressed as

$$u(R_{ti}) = \left[\ln R_{ti} - \mu\right]\left[2\ln\left(1 + \gamma^{2}\right)\right]^{-1/2}. \tag{27}$$

In summary, to estimate the mean monthly rain rate *E*(*R*) over a 5° × 5° grid, *p* is found from the ratio of the number of rain counts to the total number of counts, *σ* is set equal to 1.22, and *μ* is solved numerically from (26) and (27). Here, *E*(*R*) follows from the expression *p* exp(*μ* + *σ*^{2}/2), which is multiplied by 720 to give monthly accumulation. Results of the procedure are shown in Figs. 8 and 9 for *Q*_{j} = 0.3 and 0.999, respectively. Note that the monthly rain accumulation at each 5° × 5° cell is derived in two ways: from the sample mean of the attenuation-corrected rain rates, given by the 3a25 product (ordinate), and from the fractional-area method just described (abscissa). The three plots in each figure correspond to the different sets of values for *μ* derived from (26) and (27) for rain-rate thresholds *R*_{ti} = 0.648, 2.73, and 11.53 mm h^{−1}. Despite high values of the correlation, the fractional biases for *R*_{ti} = 0.648 and 2.73 mm h^{−1} are −23% and −17%, respectively. At *R*_{ti} = 11.53 mm h^{−1}, the absolute bias is smaller, with a positive fractional bias of 8%. The results of Fig. 9 show the effects of including virtually all rain rates (*Q*_{j} = 0.999). The change has negligible effect at thresholds of 0.648 and 2.73 mm h^{−1} but a substantial effect at the 11.53 mm h^{−1} threshold, for which the correlation coefficient decreases from 0.99 to 0.95 and the fractional bias changes from 8% to −32%.
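
The single-threshold procedure summarized above can be sketched as follows. Because the right side of (26) is monotonic in *μ*, this sketch inverts it directly with the inverse normal distribution function instead of using an iterative solver. The function names and the input values are hypothetical; only *σ* = 1.22 and the factor of 720 come from the text.

```python
import math
from statistics import NormalDist

SIGMA = 1.22            # value quoted in the text for 5-degree boxes
HOURS_PER_MONTH = 720   # 30-day month, as in the text

def mu_from_fractional_area(fra, p, r_t, sigma=SIGMA):
    """Solve FrA = p * [1 - Phi((ln r_t - mu) / sigma)] for mu.

    This is the same relation as (p/2)[1 - erf(u)], written with the
    standard normal distribution function Phi.
    """
    if not 0.0 < fra < p:
        raise ValueError("need 0 < FrA < p")
    z = NormalDist().inv_cdf(1.0 - fra / p)
    return math.log(r_t) - sigma * z

def monthly_accumulation(fra, p, r_t, sigma=SIGMA):
    """Monthly rain accumulation (mm) implied by one fractional area."""
    mu = mu_from_fractional_area(fra, p, r_t, sigma)
    return HOURS_PER_MONTH * p * math.exp(mu + 0.5 * sigma ** 2)

# Hypothetical inputs: 5% rain probability, 1% of the area above 2.73 mm/h.
print(monthly_accumulation(fra=0.01, p=0.05, r_t=2.73))
```

A root finder applied to (26) would give the same *μ*; the closed-form inversion is used here only to keep the sketch short.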

Simulations of the method using sets of lognormally distributed random numbers (with *σ* constant and assumed known) indicate that, under these ideal conditions, the estimated mean is virtually unbiased and insensitive to the choice of rain-rate threshold. This result is in contrast to the large biases in the experimental data just shown that change significantly with the choice of rain-rate threshold. To try to understand the reason for this behavior, recall that the mean and standard deviation are derived from the “true” rain rates but are applied to rain rates without attenuation correction. Because attenuation acts to increase the number of occurrences of low rain rates at the expense of higher rain rates, the distribution of apparent rain rates has “too many” low rain-rate occurrences and “too few” high rain-rate occurrences. This fact leads to underestimates in *μ* at low rain rates and overestimates of *μ* at high rain rates. However, this explanation alone is not sufficient to account for the observed behavior. One other reason arises from the fact that the distributions of *Z*_{m} and *Z* are not identical, even at small values because *Z* is modified to account for nonuniform beamfilling effects (Iguchi et al. 2000). This correction is most important at higher rain rates, but it can affect light rain-rate cases also. Most importantly, the estimation procedure depends on two assumptions: the rain rate is lognormally distributed and one of the parameters of the distribution, *σ,* is a known constant for all sets of observations. In the method described below, the lognormal assumption is retained while the constant *σ* assumption is removed. Moreover, the rain probability *p* is no longer approximated by the ratio of rain to total counts but is derived as one of the parameters of the lognormal fit.
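
A simulation of the kind described above can be sketched with the standard library. This is not the authors' simulation: the parameter values, sample size, and function name are hypothetical, and *σ* is treated as known, as in the idealized case.

```python
import math
import random
from statistics import NormalDist

def simulate_estimate(mu, sigma, p, r_t, n_total, seed=0):
    """Return (sample mean, fractional-area estimate) for synthetic rain rates."""
    rng = random.Random(seed)
    # Mixed distribution: rain with probability p, lognormal when raining.
    rates = [rng.lognormvariate(mu, sigma) if rng.random() < p else 0.0
             for _ in range(n_total)]
    sample_mean = sum(rates) / n_total
    p_hat = sum(1 for r in rates if r > 0.0) / n_total   # rain counts / total counts
    fra = sum(1 for r in rates if r > r_t) / n_total     # fractional area above r_t
    # Invert the fractional-area relation for mu with sigma assumed known.
    z = NormalDist().inv_cdf(1.0 - fra / p_hat)
    mu_hat = math.log(r_t) - sigma * z
    return sample_mean, p_hat * math.exp(mu_hat + 0.5 * sigma ** 2)

for r_t in (0.648, 2.73, 11.53):
    sample_mean, estimate = simulate_estimate(0.5, 1.22, 0.05, r_t, 200_000, seed=1)
    print(r_t, estimate / sample_mean)
```

With truly lognormal samples, the ratio of the estimate to the sample mean should stay close to 1 at all three thresholds, apart from sampling noise.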

## Estimation of the lognormal distribution using multiple thresholds

A way to circumvent some of the problems with estimating mean rain rate from the fractional area is to use (16) at multiple rain-rate thresholds. A nonlinear least squares fit through the data yields estimates of the parameters *p, μ,* and *σ* at each 5° × 5° × 1 month grid point. Details of the method can be found in Meneghini and Jones (1993) and Meneghini (1998). Kedem et al. (1997) compare the nonlinear least squares fitting with a minimum chi-square estimator; although the chi-square approach yields a smaller asymptotic variance, simulations indicate that the estimators give similar results. Martin (1999) describes and analyzes a censoring method that does not use grouped data. This approach is not presently applicable to the TRMM dataset, however, because the large data volume requires storing the data in the form of histograms rather than as individual data points. It should also be noted that a somewhat similar technique of lognormal fitting to the data has been developed for spaceborne microwave radiometer data (Chang et al. 1993; Chiu et al. 1993; Wilheit et al. 1991).

To apply the method, *F*(*R*_{ti}, *Q*_{j}) is evaluated using (16) at rain-rate thresholds *R*_{ti} = {*R*_{t1}, *R*_{t2}, . . . , *R*_{tn}}, *n* = 25, for six values of *Q*_{j} (0.1, 0.2, 0.3, 0.5, 0.75, and 0.999). Values of *F*(*R*_{ti}, *Q*_{j}) at low and high rain-rate thresholds are eliminated either by the requirement that the distribution increase by some fraction in going from *F*(*R*_{ti}, *Q*_{j}) to *F*[*R*_{t(i+1)}, *Q*_{j}] or that the count value *NF*(*R*_{ti}, *Q*_{j}) increase by a fixed number of counts as the rain-rate threshold is increased. (Recall that *N* is the number of rain occurrences in the grid box.) What is left after filtering is a set of *F*(*R*_{ti}, *Q*_{j}) values for each *Q*_{j}. It might appear at first that this filtering of the data is not necessary and that the use of *Q* alone is sufficient. The reason it is needed is to account for cases of nonuniform rain rate in which *Q* is small but *R*_{a} is large. This situation occurs, for example, over shallow, heavy-rain-rate regions where the path attenuation is small (small *Q*) but the apparent rain rate is large. In general, however, attenuation effects will significantly lower the count number at high rain-rate thresholds. Because the count number is not representative of the true rain-rate distribution at high rain rates, it is necessary to perform this secondary filtering. Note that, for path-integrated rain-rate estimation, filtering by *Q* alone is sufficient because the path-integrated rain rate increases monotonically with *Q* (Meneghini 1998).

Denote the filtered data by {*F*(*R*_{ti}, *Q*_{j}); *i* = *I*(1), *I*(2), . . . , *I*(*m*)}, where *I*(*k* + 1) = *I*(*k*) + 1 for *k* = 1, . . . , *m* − 1, and where the upper index *m* is determined by *Q*_{j} and the filtering just mentioned. Consider next the sum of the squared differences between the lognormal and measured distributions at the rain-rate thresholds *I*(1), . . . , *I*(*m*), keeping *Q*_{j} fixed:

$$\sum_{k=1}^{m}\left[F_{\mathrm{LN}}(R_{tI(k)}) - F(R_{tI(k)}, Q_j)\right]^{2}. \tag{28}$$

From (28), the unknown parameters *p, μ,* and *σ* of the lognormal distribution *F*_{LN} are obtained by a nonlinear least squares estimation (Marquardt 1963; Press et al. 1992). The mean rain rate follows from the expression: *p* exp(*μ* + *σ*^{2}/2). The results for June 1999 are shown in Fig. 10 for *Q*_{j} = 0.3 (top plot) and *Q*_{j} = 0.999 (middle plot). For the *Q*_{j} = 0.3 case, the fractional bias relative to the sample mean of the attenuation-corrected rain rates shown along the abscissa is −2%. By changing *Q*_{j} to 0.999, the fractional bias increases to −14.5%. On the other hand, the correlation coefficient increases from 0.906 to 0.991 when *Q*_{j} is increased from 0.3 to 0.999. The poorer correlation at *Q*_{j} = 0.3 is primarily caused by instabilities in the fitting procedure in a relatively small number of the 5° × 5° × 1 month fits. A way of avoiding these instabilities is discussed below.
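
A least squares fit of (*p*, *μ*, *σ*) to a binned distribution can be sketched without external libraries. The sketch below does not use the Marquardt algorithm cited above. It substitutes a shrinking grid search over (*μ*, *σ*), with *p* eliminated in closed form because the model is linear in *p*. The function names, search ranges, and synthetic data are all hypothetical.

```python
import math

def _tail(r, mu, sigma):
    """1 - Phi((ln r - mu) / sigma), computed with erfc for accuracy."""
    return 0.5 * math.erfc((math.log(r) - mu) / (sigma * math.sqrt(2.0)))

def lognormal_cdf(r, p, mu, sigma):
    """Mixed lognormal distribution function for r > 0."""
    return 1.0 - p * _tail(r, mu, sigma)

def _cost(mu, sigma, thresholds, f_meas):
    g = [_tail(r, mu, sigma) for r in thresholds]
    y = [1.0 - f for f in f_meas]
    denom = sum(b * b for b in g)
    if denom == 0.0:
        return math.inf, 0.0
    p = sum(a * b for a, b in zip(y, g)) / denom    # best p for this (mu, sigma)
    return sum((a - p * b) ** 2 for a, b in zip(y, g)), p

def fit_lognormal(thresholds, f_meas, mu_range=(-3.0, 3.0),
                  sigma_range=(0.3, 3.0), steps=20, rounds=8):
    """Minimize the sum of squared differences by a shrinking grid search."""
    (mu_lo, mu_hi), (s_lo, s_hi) = mu_range, sigma_range
    best = None
    for _ in range(rounds):
        for i in range(steps + 1):
            mu = mu_lo + (mu_hi - mu_lo) * i / steps
            for j in range(steps + 1):
                sigma = s_lo + (s_hi - s_lo) * j / steps
                cost, p = _cost(mu, sigma, thresholds, f_meas)
                if best is None or cost < best[0]:
                    best = (cost, p, mu, sigma)
        # Halve the search box and re-center it on the best point so far.
        _, _, mu_b, s_b = best
        half_mu, half_s = (mu_hi - mu_lo) / 4.0, (s_hi - s_lo) / 4.0
        mu_lo, mu_hi = mu_b - half_mu, mu_b + half_mu
        s_lo, s_hi = max(1e-3, s_b - half_s), s_b + half_s
    _, p, mu, sigma = best
    return p, mu, sigma

# Synthetic, noise-free data at 25 log-spaced thresholds (hypothetical values).
thresholds = [0.1 * 500.0 ** (i / 24) for i in range(25)]
f_meas = [lognormal_cdf(r, 0.05, 0.5, 1.2) for r in thresholds]
p_fit, mu_fit, sigma_fit = fit_lognormal(thresholds, f_meas)
mean_rain = p_fit * math.exp(mu_fit + 0.5 * sigma_fit ** 2)
print(p_fit, mu_fit, sigma_fit, mean_rain)
```

A gradient-based routine such as Levenberg–Marquardt would converge faster and more precisely. The grid search is used here only because it needs no external library and cannot diverge.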

The bottom plot of Fig. 10 shows a comparison between the sample means of the high-resolution rain rates with attenuation correction (abscissa) and without (ordinate). Of the three comparisons with the 3a25 product shown, this gives the highest negative fractional bias, −24%. At first glance it might appear that the sample mean of rain rates without attenuation compensation should be approximately the same as the *Q*_{j} = 0.999 results because both use the rain rates without attenuation correction and with little or no filtering. However, in the thresholding approach the data are fit to a lognormal model, which almost always forces a greater fraction of the rain rates to be at a higher rain rate than does the distribution obtained from the sample data. This behavior can be understood by noting that, at higher rain rates, *Z*_{m} deviates increasingly from the lognormal distribution because of attenuation. However, because the lognormal parameters are determined by a least squares fit, the data at light rain rates (where the distribution more closely follows a lognormal one) influence the distribution at the higher rain rates.

A better understanding of the method, and the point just made, can be gained by considering specific distribution fits. Shown in Fig. 11 are four such examples for 5° × 5° regions using data collected during June 1999. The labeling convention for the latitude–longitude boxes is as follows: (lat, long) = (1, 1) represents the region of 40°–35°S, 180°–175°W, and (lat, long) = (16, 72) represents the region of 35°–40°N, 175°–179.99°E. For each plot, the data input to the lognormal fitting routine for the *Q* = 0.3 and *Q* = 0.999 cases are represented by × and □, respectively. The outputs of the lognormal fitting procedure are represented by the dashed and solid lines for the *Q* = 0.3 and *Q* = 0.999 cases, respectively. Note that the sample mean of the high-resolution rain-rate data without attenuation correction (multiplied by 720 h month^{−1}) is shown in the legend as *E*(*Z*_{m}–*R*) and can be calculated approximately from the data represented by the symbol □. The mean derived from the lognormal fit to these data, *E*(*Z*_{m}–*R, Q* = 0.999), can be computed from the distribution represented by the solid line and, as noted above, is almost always greater than *E*(*Z*_{m}–*R*). The mean estimated from the distribution given by the dashed line is denoted in the legend by *E*(*Z*_{m}–*R, Q* = 0.3). The 3a25 result, obtained from a sample mean of the attenuation-corrected high-resolution rain-rate estimates, is denoted by *E*(3a25). The top plots are fairly typical in that *E*(*Z*_{m}–*R, Q* = 0.3) and *E*(3a25) are in good agreement whereas *E*(*Z*_{m}–*R, Q* = 0.999) and *E*(*Z*_{m}–*R*) show increasingly negative biases relative to the *E*(3a25) result. These trends have already been depicted in Fig. 10.

The lower plots of Fig. 11 show evidence of instabilities in the estimation. In the lower-right panel, the mean derived from the *Q* = 0.3 lognormal fit (461.2 mm month^{−1}) has a positive bias of 11% with respect to the 3a25 result. Applying a threshold in *Q* has the effect of truncating the distribution (not scaling it), so that *F*(*R*_{ti}, *Q*_{j}) tends not to unity as *R*_{ti} goes to infinity but to the ratio of the number of measurements for which *Q* < *Q*_{j} to the total number of measurements. If values of *F*(*R*_{ti}, *Q*_{j}) in this asymptotic region are incorrectly included in the fitting of the lognormal distribution, an overestimation of the mean occurs. The example shown in the bottom-left panel shows a more severe overestimation problem in which the *Q* = 0.3 lognormal fit (389.9 mm month^{−1}) is positively biased with respect to the 3a25 result by about 95%. Although this kind of instability seldom occurs, it causes the correlation coefficient for the entire dataset to decrease substantially, as evidenced by the relatively low value of 0.906 for the *Q* = 0.3 case shown in the upper panel of Fig. 10. Most instabilities of this type are associated with a negative second derivative of the distribution *F* at low rain rates and usually can be eliminated by increasing the value of the threshold *Q.* An example of this type of behavior is shown in the bottom-left plot of Fig. 11 for which *d*^{2}*F*/*dR*^{2}|_{R=0.01} is less than 0 at *Q* = 0.3 but *d*^{2}*F*/*dR*^{2}|_{R=0.01} is greater than 0 at *Q* = 0.999. This result implies that a more stable estimate can be obtained by the use of the following rule: beginning at *Q* = 0.3 (or 0.2), check if *d*^{2}*F*/*dR*^{2}|_{R=0.01} is greater than 0; if the inequality is satisfied, choose the parameters of this distribution to compute the mean. If, however, *d*^{2}*F*/*dR*^{2}|_{R=0.01} is less than 0, increase *Q* until *d*^{2}*F*/*dR*^{2}|_{R=0.01} is greater than 0, at which point the parameters from this distribution are used to calculate the mean rain rate. Results of the procedure are shown in Fig. 12. For the results in the top panel, we select the parameters from the first distribution for which *d*^{2}*F*/*dR*^{2}|_{R=0.01} is greater than 0, beginning with *Q* = 0.3. For the lower plot, the same procedure is used but beginning with *Q* = 0.2 rather than *Q* = 0.3. In the first case, the correlation coefficient improves to 0.986 from the result of 0.906 shown in Fig. 10; however, the negative fractional bias increases to −6% from the previous −2%. In effect, elimination of the unstable fits improves the correlation but comes at the expense of an increase in the bias. For the lower panel, where the search begins at *Q* = 0.2, there is an obvious positive bias in the thresholding result at high rain rates with an overall fractional bias of +6%. In a theoretical study of the thresholding method using measured drop size distributions to simulate the rain rate, values of *Q* between 0.2 and 0.3 were found to be optimum (Meneghini 1998).
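
The selection rule described above can be sketched compactly if the second-derivative test is applied to the fitted lognormal distribution, which is an assumption of this sketch. The lognormal density increases for *R* below its mode, exp(*μ* − *σ*^{2}), so the second derivative of *F* at *R* = 0.01 is positive exactly when ln 0.01 < *μ* − *σ*^{2}. The function names and the fitted parameters below are hypothetical.

```python
import math

def second_derivative_positive(mu, sigma, r=0.01):
    """True if d2F/dR2 > 0 at R = r for a lognormal with parameters mu, sigma.

    The density rises for R below the mode exp(mu - sigma**2), so the
    condition is equivalent to r lying below the mode.
    """
    return math.log(r) < mu - sigma ** 2

def select_fit(fits_by_q, q_start=0.3):
    """Pick the first Q >= q_start whose fit passes the derivative test.

    fits_by_q maps a Q threshold to fitted (p, mu, sigma).
    Returns (Q, mean rain rate in mm/h), or None if no fit qualifies.
    """
    for q in sorted(q for q in fits_by_q if q >= q_start):
        p, mu, sigma = fits_by_q[q]
        if second_derivative_positive(mu, sigma):
            return q, p * math.exp(mu + 0.5 * sigma ** 2)
    return None

# Hypothetical fits for one grid box; the Q = 0.3 fit is unstable.
fits = {
    0.2: (0.05, 0.5, 1.2),
    0.3: (0.05, -2.0, 2.0),
    0.5: (0.05, 0.4, 1.2),
    0.999: (0.05, 0.3, 1.2),
}
print(select_fit(fits, q_start=0.3))
```

Starting the search at `q_start=0.2` instead corresponds to the alternative shown in the lower panel of Fig. 12.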

Results for the first 10 months of 1999 are shown in Table 1, in which the percentage biases of four estimates (*Z*_{m}–*R*; *Q* = 0.2 with filtering; *Q* = 0.3 with filtering; *Q* = 0.3 without filtering) relative to the sample mean of the attenuation-corrected data (product 3a25) are given. The sample mean of rain rates estimated from the uncorrected reflectivity factors *Z*_{m} consistently yields underestimates of about 25%. Better agreement is obtained by finding the mean value from the estimated distribution: use of *Q* = 0.3 tends to underestimate, with average absolute biases of 3.7% (without filtering) and 6.9% (with filtering), whereas *Q* = 0.2 usually yields overestimates with an average absolute bias of 5.1%. These results suggest that a threshold somewhere between 0.2 and 0.3 should yield the smallest bias with respect to the 3a25 results. This hypothesis will be tested in future modifications of the 3a26 operational algorithm.

## Discussion and summary

In applications of data from an attenuating-wavelength weather radar, it has long been recognized that a measure of total path attenuation is useful in bounding the errors that arise in reconstructing the rain-rate profile. The constraint is usually applied at the highest horizontal resolution of the instrument, which in the case of the TRMM PR is on the order of 4 km. The need for a constraint also arises in the estimation of rainfall over large space–time regions. The problem is inherently a statistical one, and it is natural to search for regularities in the data that can be used to interpret the large-scale rainfall estimates. There are two useful approximations that can be used toward this end: rain rates obey a lognormal distribution, and the conditional sample mean and standard deviation of rain rates are linearly related. These approximations imply that certain properties of the distribution can be inferred without direct measurement. In particular, if the rain rates can be measured accurately at lighter rain rates for which the effects of attenuation are small, then the values of the distribution outside this regime can be inferred from the use of a lognormal assumption. In principle, the technique permits us to estimate rain rates over large space–time scales that, at the resolution of the instrument, may not be accurately measured either because of receiver sensitivity or signal attenuation. The statistical approach, moreover, provides a check on attenuation correction techniques that are applied at the highest instrument resolution.
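As an illustration of how the two approximations combine, the following Python sketch recovers the mean rain rate from a single fractional area above a threshold. It assumes, for simplicity, that the linear relation between standard deviation and mean is a pure proportionality with a known ratio `cv`; the function names, the threshold, and the numerical values are hypothetical and chosen only for the round-trip check.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and std 1


def sigma_from_cv(cv):
    """Log-standard deviation of a lognormal with std/mean ratio cv."""
    return math.sqrt(math.log(1.0 + cv * cv))


def fraction_above(mean, threshold, cv):
    """Forward model: P(R > threshold) for a lognormal with given mean and cv."""
    sigma = sigma_from_cv(cv)
    mu = math.log(mean) - 0.5 * sigma * sigma
    return 1.0 - _STD_NORMAL.cdf((math.log(threshold) - mu) / sigma)


def mean_from_fraction(frac_above, threshold, cv):
    """Invert the forward model: mean rain rate implied by the observed
    fraction of rain rates exceeding `threshold` (requires 0 < frac_above < 1)."""
    sigma = sigma_from_cv(cv)
    # P(R > tau) = 1 - Phi((ln tau - mu) / sigma), solved here for mu
    mu = math.log(threshold) - sigma * _STD_NORMAL.inv_cdf(1.0 - frac_above)
    return math.exp(mu + 0.5 * sigma * sigma)


# Round trip with hypothetical values (not taken from the TRMM data).
cv = 2.0         # assumed ratio of standard deviation to mean
true_mean = 3.0  # assumed conditional mean rain rate
tau = 5.0        # assumed rain-rate threshold
p = fraction_above(true_mean, tau, cv)
recovered = mean_from_fraction(p, tau, cv)
```

Because the ratio of standard deviation to mean fixes σ, a single measured fraction determines μ and hence the mean; `recovered` reproduces `true_mean` to within numerical precision. If the linear relation has a nonzero intercept, σ depends on the mean and the inversion would require an iterative solution instead.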

Application and preliminary assessment of the technique were made using data from the TRMM PR. For the multiple-threshold method, a threshold parameter *Q* of 0.3 yields monthly accumulations that are in good agreement with the sample mean of the high-resolution, attenuation-corrected rain rates. In contrast, the sample mean of the high-resolution estimates, uncorrected for attenuation, yields underestimates on the order of 25%. Although the results indicate that the statistical and “deterministic” methods of attenuation correction give comparable answers, it is premature to claim that the statistical approach can be used to validate the deterministic one. Caution is needed because of several unresolved issues. Despite some support for the choice of *Q* = 0.2 or 0.3 (Meneghini 1998), the rain rates in that study were simulated by means of measured drop size distributions; the variability of *R* with radar range and some other error sources were not included in the model. Another factor is the filtering problem noted in section 6, in which there is uncertainty as to the proper cut-off point at the high rain-rate threshold. The problem exists in another guise at the low end: because of statistical fluctuations in the return power, noise fluctuations can be mistaken for rain, so that the count values at the low thresholds tend to be unreliable. Ideally, elimination of the lowest thresholds would have a small effect on the results. This is sometimes not the case, however, and better filtering techniques at both the high and low ends are needed. Although the paper has focused on the mean rain rates and use of the lognormal assumption to recover the high rain rates, an important issue to be explored is how well the estimated distribution represents rain rates below the minimum detectable level.

Despite these drawbacks, it is significant that statistical techniques of this kind may be applicable to spaceborne attenuating-wavelength radar data and that “tuning” of the method is not required to account for seasonal variations, different climatological regimes, or rain types (convective/stratiform). This simplicity is useful when dealing with global rainfall datasets for which adjusting for these factors is a daunting task. It should be kept in mind, however, that most of the results given in the paper were derived over 5° × 5° × 1 month regions. Although good correlations between the sample mean and standard deviation persist over regions as small as 0.5° × 0.5° × 1 month, indicating that statistical techniques can be applied to these smaller space–time regions, this application has not yet been demonstrated.

Any assessment of the accuracy of global rainfall estimates is subject to the criticism that there exists no universally accepted “truth.” In this paper, we have used as validation the results obtained by the sample mean of the high-resolution, attenuation-corrected rain rates. There is no claim that either approach provides the correct answer, although this is certainly the goal, but only that the statistical method, based on a lognormal rain model, gives results similar to those obtained by correcting for attenuation at the resolution of the instrument. Neither approach is independent of errors arising from offsets in the radar constant, an incorrect choice of a *Z–R* relationship, nonuniform beamfilling, or coarse temporal sampling. Nevertheless, the techniques do respond differently to these sources of error, and it may be possible to distinguish and to assess their effects in future studies.

## Acknowledgments

The work is supported in part by Dr. Ramesh Kakar of NASA HQ under the TRMM Science Program.

## REFERENCES

Atlas, D., and T. L. Bell, 1992: The relation of radar to cloud area–time integrals and implications for rain measurements from space. *Mon. Wea. Rev.,* **120,** 1997–2008.

Atlas, D., D. Rosenfeld, and D. A. Short, 1990: The estimation of convective rainfall by area integrals. 1: The theoretical and empirical basis. *J. Geophys. Res.,* **95,** 2153–2160.

Atlas, D., C. W. Ulbrich, F. D. Marks, R. A. Black, E. Amitai, P. T. Willis, and C. E. Samsury, 2000: Partitioning tropical oceanic convective and stratiform rains by draft strength. *J. Geophys. Res.,* **105,** 2259–2267.

Chang, A. T. C., L. S. Chiu, and T. T. Wilheit, 1993: Random errors of oceanic monthly rainfall derived from SSM/I using probability distribution functions. *Mon. Wea. Rev.,* **121,** 2351–2354.

Cheng, M., and R. Brown, 1993: Estimation of area-average rainfall for frontal rain using the threshold method. *Quart. J. Roy. Meteor. Soc.,* **119,** 825–844.

Chiu, L. S., 1988: Estimating areal rainfall from rain area. *Tropical Rainfall Measurements,* J. S. Theon and N. Fugono, Eds., A. Deepak Publishing, 361–367.

Chiu, L. S., A. T. C. Chang, and J. Janowiak, 1993: Comparison of monthly rain rates derived from GPI and SSM/I using probability distribution functions. *J. Appl. Meteor.,* **32,** 323–334.

Donneaud, A. A., P. L. Smith, S. A. Dennis, and S. Sengupta, 1981: A simple method for estimating convective rain over an area. *Water Resour. Res.,* **17,** 1676–1682.

Donneaud, A. A., S. Ionescu-Niscov, D. L. Priegnitz, and P. L. Smith, 1984: The area–time integral as an indicator for convective rain volumes. *J. Climate Appl. Meteor.,* **23,** 555–561.

Hitschfeld, W., and J. Bordan, 1954: Errors inherent in the radar measurement of rainfall at attenuating wavelengths. *J. Atmos. Sci.,* **11,** 58–67.

Iguchi, T., T. Kozu, R. Meneghini, J. Awaka, and K. Okamoto, 2000: Rain-profiling algorithm for the TRMM precipitation radar. *J. Appl. Meteor.,* **39,** 2038–2052.

Johnson, L. R., P. L. Smith, T. H. Vonder Harr, and D. Reinke, 1994: The relationship between area–time integrals determined from satellite infrared data by means of a fixed-threshold approach and convective rainfall volumes. *Mon. Wea. Rev.,* **122,** 440–448.

Kedem, B., and H. Pavlopoulos, 1991: On the threshold method for rainfall estimation: Choosing the optimal threshold level. *J. Amer. Stat. Assoc.,* **86,** 626–633.

Kedem, B., L. S. Chiu, and G. R. North, 1990: Estimation of mean rain rate: Applications to satellite observations. *J. Geophys. Res.,* **95,** 1965–1972.

Kedem, B., R. Pfeiffer, and D. A. Short, 1997: Variability of space–time mean rain rate. *J. Appl. Meteor.,* **36,** 443–451.

Lopez, R. E., 1977: The lognormal distribution and cumulus cloud populations. *Mon. Wea. Rev.,* **105,** 865–872.

Lopez, R. E., D. Atlas, D. Rosenfeld, J. L. Thomas, D. O. Blanchard, and R. E. Holle, 1989: Estimation of areal rainfall using the radar echo area time integral. *J. Appl. Meteor.,* **28,** 1162–1174.

Marquardt, D. W., 1963: An algorithm for least-squares estimation of nonlinear parameters. *J. Soc. Ind. Appl. Math.,* **11,** 431–441.

Martin, D., 1999: Estimation of mean rain rate through censoring. *J. Appl. Meteor.,* **38,** 797–805.

Meneghini, R., 1998: Application of a threshold method to airborne–spaceborne attenuating-wavelength radar for the estimation of space–time rain-rate statistics. *J. Appl. Meteor.,* **37,** 924–938.

Meneghini, R., and J. A. Jones, 1993: An approach to estimate the area rain-rate distribution from spaceborne radar by the use of multiple thresholds. *J. Appl. Meteor.,* **32,** 386–398.

Meneghini, R., T. Iguchi, T. Kozu, K. Okamoto, L. Liao, J. A. Jones, and J. Kwiatkowski, 2000: Use of the surface reference technique for path attenuation estimates from the TRMM precipitation radar. *J. Appl. Meteor.,* **39,** 2053–2070.

Morrissey, M. L., 1994: The effect of data resolution on the area threshold method. *J. Appl. Meteor.,* **33,** 1263–1270.

Morrissey, M. L., W. F. Krajewski, and M. J. McPhaden, 1994: Estimating rainfall in the Tropics using the fractional time raining. *J. Appl. Meteor.,* **33,** 387–393.

Oki, R., and A. Sumi, 1994: Sampling simulation of TRMM rainfall estimation using radar–AMeDAS composites. *J. Appl. Meteor.,* **33,** 1597–1608.

Press, W. H., S. Teukolsky, T. Vetterling, and B. P. Flannery, 1992: *Numerical Recipes in FORTRAN.* 2d ed. Cambridge University Press, 963 pp.

Rosenfeld, D., D. Atlas, and D. A. Short, 1990: The estimation of convective rainfall by area-integrals. Part 2: The height–area rainfall threshold (HART) method. *J. Geophys. Res.,* **95,** 2161–2176.

Sauvageot, H., 1994: The probability density function of rain rate and the estimation of rainfall by area integrals. *J. Appl. Meteor.,* **33,** 1255–1262.

Sauvageot, H., F. Mesnard, and R. S. Tenorio, 1999: The relation between the area-average rain rate and the rain cell size distribution parameters. *J. Atmos. Sci.,* **56,** 57–70.

Shimizu, K., D. A. Short, and B. Kedem, 1993: Single- and double-threshold methods for estimating the variance of area rain rate. *J. Meteor. Soc. Japan,* **71,** 673–683.

Short, D. A., K. Shimizu, and B. Kedem, 1993a: Optimal threshold for the estimation of area rain-rate moments by the threshold method. *J. Appl. Meteor.,* **32,** 182–192.

Short, D. A., D. B. Wolff, D. Rosenfeld, and D. Atlas, 1993b: A study of the threshold method utilizing rain gauge data. *J. Appl. Meteor.,* **32,** 1379–1387.

Wilheit, T. T., A. Chang, and L. Chiu, 1991: Retrieval of monthly rainfall indices from microwave radiometric measurements using probability distribution functions. *J. Atmos. Oceanic Technol.,* **8,** 118–136.

Percentage bias of monthly mean rainfall (mm month^{−1}) relative to the 3a25 product