## Introduction

Satellites have a unique ability to observe rainfall over a large area at a high spatial resolution, but only at discrete times. The temporal gaps in the observations, coupled with rainfall space–time variability, cause temporal sampling error in the inferred space–time-averaged rainfall statistics. The effects of the sampling error on mean rain-rate estimation have been studied by several researchers (e.g., Laughlin 1981; North and Nakamoto 1989; Salby and Callaghan 1997; Bell and Kundu 2000; Steiner et al. 2003; Gebremichael and Krajewski 2004). However, not much work has been done to assess the impact of sampling error on inferred spatial statistics.

The fundamental question we address in this study is, Can infrequent rainfall observations, typical of satellites, yield meaningful estimates of the spatial statistics of time-averaged rainfall? We used the following spatial statistics: properties of the spatial probability distribution function, spatial scaling exponents, and cross correlations. The research presented here attempts to quantify the uncertainty of spatial statistics derived from infrequent observations. The uncertainty analysis is based on the Monte Carlo resampling technique and is expressed through the bias and the root-mean-square difference.

We used 5 yr of 15-min time series data to examine the inferred spatial statistics from a variety of satellite sensor scenarios: sampling intervals ranging from 3 h [typical of the proposed Global Precipitation Mission (GPM) satellite] to 24 h [typical of the Special Sensor Microwave Imager (SSM/I) satellite], and spatial resolutions ranging from 4 km [e.g., the Tropical Rainfall Measuring Mission (TRMM) precipitation radar resolution] to 32 km (e.g., the TRMM Microwave Imager 19.3-GHz channel resolution). These scales are mainly relevant in hydrological applications. Errors in inferred statistics are potentially sensitive to the time scales analyzed, as errors in rainfall amounts may cancel out when averaged over longer periods. The analysis here thus considers rainfall-averaging periods ranging from 1 day to 1 month.

Our approach involves data-based Monte Carlo resampling experiments. Resampling uses a dense observation time series and divides it into sample series that would have been obtained with less frequent sampling. This technique has been used in various studies to estimate the uncertainty of satellite-derived rainfall estimates (e.g., Oki and Sumi 1994; Steiner 1996; Gebremichael and Krajewski 2004). In this approach the rainfall estimate over a period of 1 day or longer obtained with the dense observation time series (i.e., 15-min resolution) is assumed to represent rainfall. Initial validation assessment of this dataset by Nelson et al. (2003a) indicates that the difference between such an estimate and the corresponding true rainfall is randomly scattered around zero (i.e., negligible bias). Hence, in this work, we assume that such estimates give a realistic or plausible (but not necessarily true) distribution of the rainfall field. The results of this study are therefore valid as far as the statistical properties of rainfall agree with those obtained from this dataset. Our study examines only cases of satellite overpasses at regular time intervals making flush (100% coverage at each overpass) visits. In this sense, our results represent the best-case scenario.

## Data and method

### Rainfall data

Our dataset consists of 5 yr (1996–2000) of rainfall estimates at a resolution of 4 km × 4 km and 15 min. This dataset was developed as part of the Global Water and Energy Cycle Continental-Scale International Project (GCIP) and is available for the entire Mississippi River basin (Nelson et al. 2003a, b). For convenience, we refer to it hereinafter as the MRB dataset. The MRB dataset was constructed from about 50 Weather Surveillance Radar-1988 Doppler (WSR-88D) radars of the Next-Generation Weather Radar (NEXRAD) network located across the basin. We used a subset of this dataset covering a square with side length of 512 km (Fig. 1), which covers 128 × 128 pixels. This region is far from the mountainous areas where there are beam blockage problems. Also, we limited this study to warm wet seasons from May through September, during which the MRB estimates are expected to perform better (Nelson et al. 2003a). The statistics of the dataset are given in Gebremichael and Krajewski (2004).

### Spatial statistics

Let the rainfall process *R*_{AT}(*j*, *k*) be the random process field, *R* being the rainfall intensity averaged over an area of *A* = *L* × *L* and a period of *T* at the grid indexed by (*j*, *k*). We used the following statistics to characterize the spatial variability of *R*_{AT}(*j*, *k*): properties of spatial distribution function (mean, standard deviation, coefficient of variation) and multiscaling exponents. We also used cross correlations (Pearson correlation coefficient and Spearman correlation coefficient) to assess the agreement between the estimated and the reference *R*_{AT}(*j*, *k*) fields. The Pearson correlation is a measure of the degree of linear dependence, whereas the Spearman correlation is a measure of the monotonic dependence. The mathematical definitions of the properties of the distribution function and the correlations are available in most statistics books. Below we give the description, meaning, and interpretation of the multiscaling exponents.

Consider a two-dimensional (*d* = 2) region with dimensions *L*_{0} × *L*_{0}. The region is successively divided into *b* equal parts (*b* = 2^{d}) at each step, and the *i*th subregion after *n* levels of subdivision is denoted by Δ^{i}_{n}. At the first level, the region is subdivided into *b* = 4 subregions denoted by Δ^{i}_{1}, *i* = 1, 2, . . . , 4. At the second level, each of the above subregions is further subdivided into *b* = 4 subregions, which are denoted by Δ^{i}_{2}, *i* = 1, 2, . . . , 16, for a total of *b*^{2} = 16 subregions. At the *n*th level, we have a total of *b*^{n} subregions. Denoting the side length at the *n*th level as *L*_{n}, the scale factor at level *n* is given by

$$\lambda_n = \frac{L_n}{L_0} = b^{-n/d}. \tag{1}$$

For the subregion Δ^{i}_{n}, denote the *volume* of water falling in this subregion as *μ*(Δ^{i}_{n}). The spatial moments are defined as

$$M_n(q) = \sum_{i=1}^{b^n} \mu^{q}(\Delta^{i}_{n}), \tag{2}$$

where *q* is the moment order. The scaling analysis in space can be performed by investigating the behavior of spatial moments (2) for different spatial scales *λ*_{n}. The rainfall intensity is said to exhibit spatial scale invariance at moment order *q* if the following relationship holds:

$$M_n(q) \propto \lambda_n^{-\tau(q)} \tag{3}$$

in the limit as *n* goes to infinity. So for scale invariance to hold, the parameters *τ*(*q*), called (multi)scaling parameters, should not depend on the spatial scale *λ*_{n}. This presupposes the existence of a finite scaling range between two scales referred to here as the smallest scale (*L*_{min}) and the largest scale (*L*_{max}). This approach enables us to estimate the scaling parameters from a single scene.

Our interest is in the scaling function *τ*(*q*) and, in particular, its parameters *τ*(0), *τ*(2), and *τ*(3). The intermittence scaling parameter *τ*(0) is the fractal dimension of the support of *μ* and measures the rate of growth of the fraction of the rainy areas with scale (Hentschel and Procaccia 1983). The second-order moment scaling parameter *τ*(2) measures the variability (in the second-order sense) of positive rain rate with scale within the rainy areas.

Estimation begins with deriving rainfall maps at different spatial scales. From each scene of data, we estimated *τ*(*q*) as the slope of the regression equation “log*M*_{n}(*q*) versus −log*λ*_{n}” obtained by logarithmically transforming (3) and applying evenly weighted least squares regression.

To get a feel for the scaling parameter values, consider the extreme cases of the single rainy pixel (i.e., only one rainy pixel at all scales) and the uniform measure (i.e., all pixels at all scales are rainy, and all pixels at a given scale receive exactly the same amount of rainfall). At the largest scale, *M*_{0}(*q* = 0) = *c*, where *c* is the proportionality constant in (3). If there is any rain, *c* = 1, because there is only one box at that scale. Consider the minimum and maximum values of *τ*(0). If *τ*(0) = 0, then *M*_{n}(0) = 1 at all scales, and so there is a single box with rain at each scale. If *τ*(0) = 2, then *M*_{n}(0) = *λ*_{n}^{−2}. Notice that *λ*_{n}^{−2} also represents the number of boxes at scale *λ*_{n} so that *τ*(0) = 2 corresponds to rain everywhere. So *τ*(0) has the range 0 ≤ *τ*(0) ≤ 2, with increasing *τ*(0) indicating increasing rainy areas. Consider now *τ*(2); *τ*(2) = 0 implies the single-rainy-pixel case and *τ*(2) = −2 implies the uniform-rain-field case. In general for *τ*(*q*), *q* > 1, *τ*(*q*) is bounded from above by zero, and this case represents the strongest possible intensities at each scale. The more negative *τ*(*q*) becomes, the less intense the rain gets at each smaller scale.
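The estimation procedure and the two extreme cases can be reproduced with a short sketch. It assumes a square scene whose side is a power of two, forms the coarser levels by summing 2 × 2 blocks (block sums of intensity are proportional to the water volume at each level), and fits the slope of log *M* versus −log *λ* with `numpy.polyfit` over all available levels. The function name and the use of the full range of levels, rather than a restricted scaling range, are assumptions of this example.

```python
import numpy as np

def moment_scaling(field, qs=(0.0, 2.0, 3.0)):
    """Estimate tau(q) for one scene as the slope of log M_n(q) vs -log(lambda_n)."""
    mass = np.asarray(field, dtype=float)
    side = mass.shape[0]
    if mass.shape != (side, side) or side & (side - 1) != 0:
        raise ValueError("field must be square with a power-of-two side")
    neg_log_lambda = []
    log_moments = {q: [] for q in qs}
    while True:
        boxes_per_side = mass.shape[0]      # lambda_n = 1 / boxes_per_side
        wet = mass[mass > 0]                # only rainy boxes enter the moments
        if wet.size == 0:
            raise ValueError("scene contains no rain")
        neg_log_lambda.append(np.log(boxes_per_side))
        for q in qs:
            # for q = 0 this counts the rainy boxes
            log_moments[q].append(np.log(np.sum(wet ** q)))
        if boxes_per_side == 1:
            break
        half = boxes_per_side // 2
        # aggregate to the next coarser level by summing 2 x 2 blocks
        mass = mass.reshape(half, 2, half, 2).sum(axis=(1, 3))
    x = np.array(neg_log_lambda)
    # evenly weighted least squares; element [0] of the fit is the slope
    return {q: float(np.polyfit(x, np.array(log_moments[q]), 1)[0]) for q in qs}

uniform = np.ones((8, 8))                   # rain everywhere, same amount
single = np.zeros((8, 8))
single[3, 5] = 2.5                          # one rainy pixel
print(moment_scaling(uniform))
print(moment_scaling(single))
```

For the uniform field the fitted slopes are 2 − 2*q*, which gives *τ*(0) = 2 and *τ*(2) = −2, and for the single rainy pixel all slopes vanish, in agreement with the limiting cases described above.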

### Method

We used a Monte Carlo simulation technique to determine the effects of temporal sampling errors on inferred rainfall spatial statistics. There are five steps involved in this process:

1. generation of one “true” or ensemble space–time rainfall field *R*_{AT}(*j*, *k*),
2. generation of one sample space–time rainfall field *R̂*_{AT}(*j*, *k*),
3. calculation of the spatial statistics from each sample and ensemble rainfall field,
4. repeating the preceding steps 5000 times, and
5. comparison of the spatial statistics derived from sample and ensemble rainfall fields.

Let us discuss steps 1 and 2 in more detail. The original rainfall data averaged over spatial resolution *A* at time *t*_{i} are assumed to represent the true 15-min rainfall *S*_{A}(*t*_{i}). In fact, this assumption is not critical for the results of our study as long as the estimates represent plausible true rainfall (in the statistical sense). Consider the set *R*_{A} = {*S*_{A}(*t*_{1}), *S*_{A}(*t*_{2}), . . . , *S*_{A}(*t*_{n})} in which *S*_{A}(*t*_{1}) refers to the true rainfall at the first 15-min period, *S*_{A}(*t*_{2}) to that at the second 15-min period, and so on, and *n* refers to the total number of 15-min periods within the averaging period *T*. For *T* = 1 day, *n* = 96; for *T* = 1 month, *n* varies between 2688 and 2976 depending on the month. If we use all of the elements of *R*_{A}, we obtain the rainfall averaged over area *A* and period *T*; that is, we obtain *R*_{AT}. If we use only a subset of *R*_{A} as per the desired sampling interval Δ*t*, we obtain *R̂*_{AT}, an estimate of *R*_{AT}. After combining all months, we applied the moving-block bootstrap resampling technique (Kunsch 1989), described in Gebremichael and Krajewski (2004), to obtain one realization of {*R̂*_{AT}, *R*_{AT}}. Spatial statistics calculated based on *R*_{AT}(*j*, *k*) were considered as the true values and were taken as ensemble quantities. Spatial statistics calculated based on *R̂*_{AT}(*j*, *k*) were considered as estimates and were taken as sample quantities.
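The construction of the reference and sampled averages from one averaging period can be sketched as follows. This is a minimal illustration, assuming the 15-min data for the period are stored with time as the first array axis; the function name, the `start` argument (the index of the first overpass), and the random choice of that start are assumptions of this example. The moving-block bootstrap step is not reproduced here.

```python
import numpy as np

def period_means(series, dt_hours, start=None, rng=None, step_minutes=15):
    """Return (reference mean, sampled mean) for one averaging period.

    series: array whose first axis indexes the 15-min steps within the period;
            any remaining axes (e.g. the spatial grid) are preserved.
    dt_hours: sampling interval between successive overpasses.
    """
    series = np.asarray(series, dtype=float)
    stride = int(round(dt_hours * 60 / step_minutes))  # 15-min steps per interval
    if stride < 1 or stride > series.shape[0]:
        raise ValueError("sampling interval does not fit the averaging period")
    if start is None:
        rng = np.random.default_rng() if rng is None else rng
        start = int(rng.integers(stride))              # random time of first overpass
    reference = series.mean(axis=0)                    # all n elements
    sampled = series[start::stride].mean(axis=0)       # every stride-th element only
    return reference, sampled

# One day (n = 96) of a toy series, sampled every 3 h starting at the first step.
day = np.arange(96, dtype=float)
reference, sampled = period_means(day, dt_hours=3, start=0)
print(reference, sampled)
```

With a 24-h interval and a 1-day averaging period the sampled mean rests on a single 15-min observation, which is why the daily estimates are the most sensitive to the sampling interval.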

Step 5 compares the spatial statistics calculated from *R*_{AT}(*j*, *k*) and *R̂*_{AT}(*j*, *k*). Denote the spatial statistics calculated from *R*_{AT}(*j*, *k*) as 〈var〉_{e,i} and denote those calculated from *R̂*_{AT}(*j*, *k*) as 〈var〉_{s,i}, where the subscripts *e* and *s* refer to “ensemble” and “sample” fields, and the subscript *i* refers to the *i*th generation, where *i* goes from 1 to 5000. The error due to the temporal sampling can be characterized by its relative error, calculated as

$$\varepsilon_{\mathrm{rel}} = \frac{\langle \mathrm{var} \rangle_{s,i} - \langle \mathrm{var} \rangle_{e,i}}{\langle \mathrm{var} \rangle_{e,i}}.$$

The variability of 〈var〉_{e,i} is a result of the rainfall temporal variability (i.e., different bootstraps give rise to different statistics). The variability of the estimator 〈var〉_{s,i} at a given sampling interval is a result of the rainfall temporal variability as well as the randomness in the sampling times. The accuracy of the estimator 〈var〉_{s,i} was measured by the relative bias and the relative standard error of all possible 〈var〉_{s,i} values. We define the relative bias as the mean of *ε*_{rel} and the relative standard error as the standard deviation of *ε*_{rel}. A good estimator should have low bias and small standard error.
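The two accuracy measures can be computed from the paired sample and ensemble statistics as in the sketch below. It assumes the relative error is the sample value minus the ensemble value, divided by the ensemble value, which is consistent with positive bias denoting overestimation; the function name and the use of the n − 1 standard deviation are likewise assumptions of this example.

```python
import numpy as np

def relative_error_summary(sample_stats, ensemble_stats):
    """Relative bias and relative standard error over all realizations.

    sample_stats, ensemble_stats: one value of the chosen spatial statistic
    per Monte Carlo realization, in matching order.
    """
    s = np.asarray(sample_stats, dtype=float)
    e = np.asarray(ensemble_stats, dtype=float)
    eps_rel = (s - e) / e                  # assumed definition of the relative error
    relative_bias = float(eps_rel.mean())
    relative_se = float(eps_rel.std(ddof=1))
    return relative_bias, relative_se

bias, se = relative_error_summary([1.5, 0.5, 2.0], [1.0, 1.0, 1.0])
print(bias, se)
```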

We considered four sampling intervals, three temporal periods, and two spatial resolutions. The sampling intervals Δ*t* are 3, 6, 12, and 24 h. The sampling times are randomized. The temporal periods *T* are 1 day, 5 days, and 1 month. The spatial resolutions *L* are squares with side lengths of 4 and 32 km. In steps 1 and 2 above, the 15-min rainfall for the grid of 32 km × 32 km was calculated as the arithmetic mean of 64 pixels within the grid.
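The aggregation from the 4-km grid to the 32-km grid amounts to an 8 × 8 block mean, which can be sketched as follows. The function name and the `factor` argument are introduced for illustration only.

```python
import numpy as np

def aggregate(field, factor=8):
    """Arithmetic mean over non-overlapping factor x factor blocks of pixels."""
    field = np.asarray(field, dtype=float)
    rows, cols = field.shape
    if rows % factor or cols % factor:
        raise ValueError("grid size must be a multiple of the aggregation factor")
    blocks = field.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

fine = np.arange(128 * 128, dtype=float).reshape(128, 128)  # 128 x 128 pixels at 4 km
coarse = aggregate(fine)                                     # 16 x 16 pixels at 32 km
print(coarse.shape)
```

Because every block contains the same number of pixels, the spatial mean of the field is unchanged by the aggregation.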

## Results and discussion

In Fig. 2 (top panels) we present the relative bias and standard error of the spatial mean estimators due to temporal sampling for four different sampling intervals, three time periods, and two spatial resolutions. The performance of the mean estimators does not depend on the spatial resolution, as expected. The mean estimators have negligible biases at all sampling intervals for rainfall averaged over a period of 5 days or 1 month. In contrast, the estimators overestimate daily rainfall by about 45% at Δ*t* = 12 h and 134% at Δ*t* = 24 h. The figure also shows that the standard error is much larger for the daily rainfall estimators than for those of the longer averaging periods. For the daily mean estimators, going from 3-h to 24-h sampling intervals causes the relative standard error to increase from 0.44 to 4.2; for the monthly mean estimators the corresponding value increases from 0.04 to 0.40. It is worthwhile to compare these results with estimates obtained from the empirical equation proposed by Gebremichael and Krajewski (2004). For a mean rain rate of 0.1375 mm h^{−1} (calculated over the warm season), (8) of Gebremichael and Krajewski (2004) yields standard error estimates of 0.01 at Δ*t* = 12 h and 0.41 at Δ*t* = 24 h, which are close to the values we obtained in this study.

Figure 2 (middle panels) shows the relative bias and standard error of the spatial standard deviation estimators. At a spatial resolution of 4 km, the bias in the standard deviation estimate ranges from 50% (85%) at Δ*t* = 3 h to 300% (500%) at Δ*t* = 24 h for monthly (daily) rainfall. Better temporal sampling produces average fields that are “less noisy” spatially, as is often pointed out by groups that attempt to improve satellite maps of rainfall by merging the estimates of many satellites. This phenomenon is attributed to the basic statistical fact that the variance of an average decreases as the number of samples increases. The effect of the spatial resolution on the performance of the standard deviation estimators is generally small. Figure 2 (bottom panels) shows the performance of the coefficient-of-variation (CV) estimators. The bias and standard error of the CV estimator increase with increasing sampling interval and/or decreasing averaging period. This pattern is consistent with that obtained for the standard-deviation estimators, as expected.
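The statistical fact invoked here can be illustrated with a toy experiment. The sketch below uses synthetic values that are independent in space and time (exponentially distributed with unit mean), which is an idealization and not a model of real rainfall; the function name, grid size, and random seed are arbitrary choices for this example. Under independence, the spatial standard deviation of a time mean of *k* samples scales as 1/√*k*, so sparser sampling yields a spatially noisier average field.

```python
import numpy as np

def spatial_sd_by_stride(series, strides):
    """Spatial standard deviation of the time-mean field for each sampling stride."""
    series = np.asarray(series, dtype=float)
    return {s: float(series[::s].mean(axis=0).std()) for s in strides}

rng = np.random.default_rng(42)
# Synthetic data only: 96 time steps (one day of 15-min values) on a 64 x 64 grid.
series = rng.exponential(scale=1.0, size=(96, 64, 64))

# Strides of 12, 24, 48 and 96 steps correspond to 3-, 6-, 12- and 24-h sampling.
sd = spatial_sd_by_stride(series, strides=(1, 12, 24, 48, 96))
for stride, value in sd.items():
    print(f"{96 // stride:3d} samples per day: spatial sd = {value:.3f}")
```

Real rainfall is correlated in time and intermittent, so the inflation of the standard deviation found in the study need not follow this 1/√*k* law; the sketch only shows the direction of the effect.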

The correlation between *R̂*_{AT}(*j*, *k*) and *R*_{AT}(*j*, *k*) on the basis of a pixel-by-pixel comparison for the entire gridded spatial domain is another measure for assessing the effect of sampling on the spatial distribution of rainfall. In Fig. 3 we show the distribution of the Pearson and Spearman correlation between *R̂*_{AT}(*j*, *k*) and *R*_{AT}(*j*, *k*). The two correlation measures are close to each other. The correlations vary little with spatial resolution and averaging period. The correlation decreases rapidly with increasing sampling interval. These results imply that the mean Pearson correlation for monthly rainfall derived from the TRMM satellite is very low (about 0.3), and it reaches 0.65 for monthly rainfall derived from the GPM satellite.

In Fig. 4 we present the distribution of relative errors in *τ*(*q*) with *q* = 0.0, 2.0, and 3.0. Let us analyze the results at 4-km spatial resolution (left panels in Fig. 4). For monthly rainfall, the bias in the *τ*(0) estimator is negligibly small at sampling intervals of 12 h or shorter and is within 10% at Δ*t* = 24 h. For 5-day or shorter averaging periods, *τ*(0) estimators give negative biases at all sampling intervals, implying that *R̂*_{AT}(*j*, *k*) fields consistently have a larger proportion of dry areas than those obtained from *R*_{AT}(*j*, *k*). The estimators of higher-order moment scaling exponents, that is, *τ*(2) and *τ*(3), give negative bias at all sampling intervals and averaging periods, implying that *R̂*_{AT}(*j*, *k*) fields consist of more localized rain events than those obtained from *R*_{AT}(*j*, *k*). The underestimation of *τ*(*q*), 0 ≤ *q* ≤ 3, is smaller than 40% at all sampling intervals and time scales considered. The bias and standard error in *τ*(*q*) generally increase with increasing sampling interval and/or decreasing averaging period.

The results discussed so far were based on the comparison of the ensemble and sample rainfall fields at the same spatial resolution. This approach is reasonable to study the impacts of the temporal sampling on inferred descriptive statistics, because these statistics vary with the spatial resolution. The multiscaling exponents, however, are only a function of the moment order and do not depend on the spatial resolution. Yet the spatial resolution could affect the estimation accuracy of the exponents. Higher-spatial-resolution fields give a larger number of subdivisions, which increases the accuracy of the *τ*(*q*) estimates. In Fig. 4 (right panels), we compare the *τ*(*q*) values estimated from sample fields at 32-km resolution with those estimated from the true fields at 4-km resolution. It is apparent that different-resolution sample fields lead to different performances of the *τ*(*q*) estimators.

## Conclusions

We have evaluated the effect of temporal sampling on inferred rainfall spatial statistics using 5 yr of 15-min radar-based rainfall data over a 512 km × 512 km spatial domain in the central United States. To measure the spatial structure of the rainfall field, we have used the following statistics: moments of spatial rainfall distribution, spatial scaling exponents, and spatial cross correlations between sampled and true rainfall fields. Our results lead to the following conclusions:

- the expected value of the relative error in the mean rain-rate estimate is zero for rainfall averaged over 5 days or longer,
- better temporal sampling produces average fields that are less noisy spatially,
- an increase in the sampling interval causes the sampled rainfall to be increasingly less correlated with the true rainfall map, and
- the spatial scaling exponent estimators for moment orders between 0 and 3 could yield a bias of 40% or less depending on the space–time scale and sampling interval.

The results of this study are valid insofar as the statistical properties of actual rainfall agree with those obtained from this dataset. Without having a good understanding of the statistical structure of radar-based rainfall, it is difficult to verify this assumption (Krajewski et al. 1996; Krajewski and Smith 2002). The sampling times used in this study are randomized, and so they represent rainfall fields obtained from sensors on precessing orbits. Larger errors in these statistics could result for estimates obtained from sun-synchronous sensors like the SSM/I. The results of our analyses provide a basis for understanding the impact of temporal sampling on inferred spatial statistics, which complements our work on the distribution functions of the temporal sampling error (Gebremichael and Krajewski 2005) and the effect of the sampling error on the area-averaged rainfall estimate (Gebremichael and Krajewski 2004).

## Acknowledgments

This research was supported by the NOAA Office of Global Programs through Grant NA57WHO517 to the second author, by NASA through Grant NAG5-9664, and by the NASA Earth System Science Fellowship to the first author. The second author also acknowledges the support of the Rose and Joseph Summers professorship endowment.

## REFERENCES

Bell, T. L., and P. K. Kundu. 2000. Dependence of satellite sampling error on monthly averaged rain rates: Comparison of simple models and recent studies. *J. Climate* 13:449–462.

Gebremichael, M., and W. F. Krajewski. 2004. Characterization of the temporal sampling error in space-time-averaged rainfall estimates using parametric and nonparametric approaches. *J. Geophys. Res.* 109:D11110, doi:10.1029/2004JD004509.

Gebremichael, M., and W. F. Krajewski. 2005. On the distribution of temporal sampling errors in the area-time averaged rainfall estimates derived from satellites. *Atmos. Res.* 73:243–259.

Hentschel, H. G. R., and I. Procaccia. 1983. The infinite number of generalized dimensions of fractals and strange attractors. *Physica D* 8:435–444.

Krajewski, W. F., and J. A. Smith. 2002. Radar hydrology: Rainfall estimation. *Adv. Water Resour.* 25:1387–1394.

Krajewski, W. F., E. N. Anagnostou, and G. J. Ciach. 1996. Effects of the radar observation process on inferred rainfall statistics. *J. Geophys. Res.* 101:26493–26502.

Kunsch, H. R. 1989. The jackknife and the bootstrap for general stationary observations. *Ann. Stat.* 17:1217–1241.

Laughlin, C. R. 1981. On the effect of temporal sampling on the observation of mean rainfall. *Precipitation Measurements from Space: Workshop Report*, D. Atlas and O. W. Thiele, Eds., NASA, D-59–D-66.

Nelson, B. R., W. F. Krajewski, A. Kruger, J. A. Smith, and M. L. Baeck. 2003a. Archival precipitation data set for the Mississippi river basin: Algorithm development. *J. Geophys. Res.* 108:8811, doi:10.1029/2002JD003158.

Nelson, B. R., W. F. Krajewski, A. Kruger, J. A. Smith, and M. L. Baeck. 2003b. Archival precipitation data set for the Mississippi River basin: Development of a GIS-based data browser. *Comput. Geosci.* 29:595–604.

North, G. R., and S. Nakamoto. 1989. Formalism for comparing rain estimation designs. *J. Atmos. Oceanic Technol.* 6:985–992.

Oki, R., and A. Sumi. 1994. Sampling simulation of TRMM rainfall estimation using radar-AMeDAS composites. *J. Appl. Meteor.* 33:1597–1608.

Over, T. M., and V. K. Gupta. 1994. Statistical analysis of mesoscale rainfall: Dependence of a random cascade generator on large-scale forcing. *J. Appl. Meteor.* 33:1526–1542.

Over, T. M., and V. K. Gupta. 1996. A space-time theory of mesoscale rainfall using random cascades. *J. Geophys. Res.* 101:26319–26332.

Salby, M. L., and P. Callaghan. 1997. Sampling error in climate properties derived from satellite measurements: Consequences of undersampled diurnal variability. *J. Climate* 10:18–36.

Steiner, M. 1996. Uncertainty of estimates of monthly areal rainfall for temporally sparse remote observations. *Water Resour. Res.* 32:373–388.

Steiner, M., T. L. Bell, Y. Zhang, and E. F. Wood. 2003. Comparison of two methods for estimating the sampling-related uncertainty of satellite rainfall averages based on a large radar dataset. *J. Climate* 16:3759–3778.