## 1. Introduction

Several satellite programs are under way to measure precipitation, especially in remote areas such as the tropical oceans (see Simpson et al. 1988; Wilheit et al. 1991; Theon et al. 1992). This paper concerns the ground validation program of the Tropical Rainfall Measuring Mission (TRMM), which was launched in November 1997 (Simpson et al. 1988) and has now generated over a year of data from several sensors. One problem inherent in all such missions is that of ground validation. We present here an analysis of validating the satellite estimates with point gauge measurements. This is a complex comparison because the two sensors measure different quantities: 1) the point gauge measures precipitation nearly continuously in time at a point on the surface, while 2) the satellite measures a snapshot in time (actually about a 5–10-min average) of an area average over its field of view (FOV) at the surface (typically 20 km across). In comparing two simultaneous measurements, the point gauge may be located at some random position within the FOV. While in the long run these two measurements should agree, there is likely to be a large random (zero mean) difference between the two because of the different space–time sampling configurations. It should be possible, however, by taking enough simultaneous pairs of measurements, to compare them and check for bias in the satellite retrieval algorithm.

There are various forms of ground validation for precipitation. One is comparing horizontal precipitation patterns from ground-based or airplane-based radars. This method is very useful in judging the qualitative patterns of the precipitation, but because of the uncertainty in the conversion of radar reflectivity to surface rain rate, it cannot alone answer the question of whether rain rates derived from the satellite are biased. Hence, we turn to a more certain estimate of the rain rate at the ground (the point gauge) but surrender the certainty of matching sampling designs.

We proceed in our purely theoretical study by asking about the size of the random errors incurred in comparing the two designs, satellite and point gauge. We adopt the null hypothesis that both measurements are correct and ask what distribution of differences arises solely from the differences in sampling design. In principle, we can reduce these kinds of random errors by collecting more and more pairs and averaging. We seek the number of such pairs that would allow us to see a real bias in the satellite measurement if it should occur. The question is whether a few such pairs will suffice to reveal a significant bias. As we shall see in this somewhat idealized study, it takes many months of data to reduce the random error enough to detect a bias of 10%. This figure comes from assuming an FOV of about 20 km and an autocorrelation distance of a few kilometers. Larger autocorrelation lengths will lead to smoother fields and therefore to a smaller number of pairs needed to detect bias at the 10% level.

Our approach is to take a hypothetical FOV and subdivide it into *M* subareas, which we call tiles. We then construct a random field for the rain rate, which is somewhat like real rain. The random field is constructed by drawings from a Bernoulli distribution where values in neighboring tiles are independent. This effectively means that the horizontal correlation length is about half a tile width. This model of rain rates is of course highly simplified but provides some insight into the problem since it allows mostly analytical results that can be interpreted easily.

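We assume that for the *n*th visit, the satellite measurement Ψ^{n}_{s} and the gauge measurement Ψ^{n}_{g} are both available. The mean-square error for a single visit is

$$\langle \varepsilon_{1}^{2} \rangle = \langle (\Psi_{s}^{n} - \Psi_{g}^{n})^{2} \rangle. \tag{1}$$

If the visits are independent and the single-visit error has zero mean, the mean-square error of the average over *N* visits is

$$\left\langle \left[ \frac{1}{N} \sum_{n=1}^{N} (\Psi_{s}^{n} - \Psi_{g}^{n}) \right]^{2} \right\rangle = \frac{\langle \varepsilon_{1}^{2} \rangle}{N}. \tag{2}$$

As we can see from (2), the advantage of this strategy is that we can easily add new measurements and thus we can make the error as small as we please (i.e., where *N* is sufficiently large).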
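The 1/*N* behavior in (2) follows from a short calculation, sketched here under the assumptions that the single-visit errors are mutually independent, identically distributed, and have zero mean (the null hypothesis of no bias). The symbols $\varepsilon^{n}$ for the error of the *n*th visit and $\bar{\varepsilon}_{N}$ for the *N*-visit average are notation introduced only for this sketch.

```latex
\begin{align*}
\varepsilon^{n} &= \Psi_{s}^{n} - \Psi_{g}^{n}, \qquad
\bar{\varepsilon}_{N} = \frac{1}{N}\sum_{n=1}^{N}\varepsilon^{n}, \\
\langle \bar{\varepsilon}_{N}^{2} \rangle
  &= \frac{1}{N^{2}}\sum_{n=1}^{N}\sum_{m=1}^{N}
     \langle \varepsilon^{n}\varepsilon^{m} \rangle
   = \frac{1}{N^{2}}\sum_{n=1}^{N}\langle (\varepsilon^{n})^{2} \rangle
   = \frac{\langle \varepsilon_{1}^{2} \rangle}{N}.
\end{align*}
```

The cross terms with *n* ≠ *m* vanish because independent zero-mean errors are uncorrelated. If a design bias were present, those terms would not vanish and the bias would survive averaging.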

In this paper, the point gauge is used as the ground-truth measurement to validate satellite precipitation retrieval algorithms at the FOV spatial level (typically about 20 km).

Because the probability distribution of real rain has a large nonzero contribution at zero rain rate (usually greater than 90%), many of the visits will lead to (no rain, no rain) measurement pairs or perhaps (no rain, rain) pairs, where the second entry is the FOV average. For this reason, we can consider the following three ground-truth designs based on the point gauge measurement.

- Design 1 uses all visits even though either of the two measurements (gauge, satellite) may have no rain.
- Design 2 throws out all the visits when the FOV average has no rain. Note that when the FOV has no rain, the gauge also has no rain.
- Design 3 throws out all the visits when the gauge has no rain. Note that the FOV average can have rain even though the gauge has no rain.
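The three designs amount to three filters on the same list of (satellite, gauge) pairs. The following minimal sketch illustrates the selection rules; the function name `select_pairs` and the sample visits are hypothetical and are not taken from any TRMM software or data.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (satellite FOV average, gauge rain rate)

def select_pairs(pairs: List[Pair], design: int) -> List[Pair]:
    """Keep the (satellite, gauge) pairs that a given design retains."""
    if design == 1:      # design 1: all visits
        return list(pairs)
    if design == 2:      # design 2: drop visits with a dry FOV
        return [(s, g) for s, g in pairs if s > 0]
    if design == 3:      # design 3: drop visits with a dry gauge
        return [(s, g) for s, g in pairs if g > 0]
    raise ValueError("design must be 1, 2, or 3")

# Hypothetical visits: dry/dry, rain in the FOV but not at the gauge,
# rain at both, and dry/dry again.
visits = [(0.0, 0.0), (0.8, 0.0), (1.6, 4.0), (0.0, 0.0)]
for d in (1, 2, 3):
    print(d, select_pairs(visits, d))
```

Every pair kept by design 3 is also kept by design 2, because a raining gauge implies a raining FOV.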

We use a spatial white noise Bernoulli random field as the rain-rate model. This means that each tile within the FOV is either raining at a (fixed) rate *r* or not raining at all. For a particular overpass of the satellite and for a given tile, the probability of rain being nonzero is *p.* While this is fairly crude as a model of rain fields, simplicity is important here to establish the main principles involved in the problem. We derive the probability density function, ensemble mean, and variance of the gauge measurements for each design. At the same time, we examine the relationship between the ground-truth designs proposed here and evaluate each design.

To keep the numbers specific and roughly relevant to TRMM we choose some parameters to have nominal values such as the FOV width to be 20 km. The finest resolution taken for the tiles within the FOV used in simulating the random field is 4 km, which is the same resolution as Global Atmospheric Research Program Atlantic Tropical Experiment (GATE) radar data. This resolution is also consistent with the fact that the satellite actually measures 5–10-min averages of the rain rate. Such time averaging is thought to be roughly equivalent to spatial averaging to about 4 km.
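A minimal simulation sketch of this setup follows, assuming a 20-km FOV, 4-km tiles, *p* = 0.1, and *r* = 4 mm h^{−1} as nominal values. The function name `simulate_visits` and the use of Python's standard `random` module are choices made for this illustration only.

```python
import random
from typing import List, Tuple

def simulate_visits(n_visits: int, fov_km: float = 20.0, tile_km: float = 4.0,
                    p: float = 0.1, r: float = 4.0,
                    seed: int = 0) -> List[Tuple[float, float]]:
    """Return (satellite, gauge) pairs for independent visits over a tiled FOV."""
    n_side = round(fov_km / tile_km)   # subdivisions per side, L / l
    m_tiles = n_side * n_side          # number of tiles M
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_visits):
        # Each tile rains at rate r with probability p, independently of the others.
        field = [r if rng.random() < p else 0.0 for _ in range(m_tiles)]
        satellite = sum(field) / m_tiles        # FOV (area) average
        gauge = field[rng.randrange(m_tiles)]   # gauge in a uniformly random tile
        pairs.append((satellite, gauge))
    return pairs

pairs = simulate_visits(20000)
mean_sat = sum(s for s, _ in pairs) / len(pairs)
mean_gauge = sum(g for _, g in pairs) / len(pairs)
print(f"mean satellite = {mean_sat:.3f}, mean gauge = {mean_gauge:.3f}, r*p = {4.0 * 0.1:.3f}")
```

Both sample means should lie close to *rp*, with the gauge mean fluctuating more because a single tile is far noisier than the 25-tile average.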

## 2. Definitions

Consider a rain-rate field *ψ*(**r**, *t*) defined in the **r** = (*x, y*) plane and along the time axis *t.* As a typical experiment we envision a point gauge located at some fixed site **r**_{g}. The satellite measurement based on *R,* which is the FOV for a visit, is

$$\Psi_{s} = \frac{1}{A} \int_{R} \psi(\mathbf{r}, t)\, d^{2}\mathbf{r},$$

where *R* is the FOV of this visit and *A* is the area of *R.* The instantaneous precipitation rate from this gauge measurement is

$$\Psi_{g} = \psi(\mathbf{r}_{g}, t).$$

The validation problem is to compare Ψ_{g} and Ψ_{s}.

Suppose that the gauge is located at the site **r**_{g}. The satellite passes over the site, and one of the FOVs in the swath along the ground track covers the gauge. We have two measurements taken from the *n*th visit, Ψ^{n}_{s} and Ψ^{n}_{g}. Design 1 uses every pair (Ψ^{n}_{s}, Ψ^{n}_{g}); design 2 uses only the pairs (Ψ^{n}_{s}, Ψ^{n}_{g}) with Ψ^{n}_{s} > 0; and design 3 uses only the pairs (Ψ^{n}_{s}, Ψ^{n}_{g}) with Ψ^{n}_{g} > 0. We denote the pairs retained by each design as (Ψ_{si}, Ψ_{gi}), where *i* = 1, 2, 3 corresponds to the design index. We form the difference between the satellite measurement and ground measurement and call this the error

$$\varepsilon_{di}^{n} = \Psi_{si}^{n} - \Psi_{gi}^{n}$$

for the *n*th visit. The mean-square error, which is usually used as an index of the accuracy in estimating the ground measurement by satellite measurement, is defined by

$$\langle \varepsilon_{di}^{2} \rangle = \langle (\Psi_{si} - \Psi_{gi})^{2} \rangle.$$

The ground-truth design must satisfy two conditions to detect the *retrieval bias* of the retrieval algorithm we want to check. 1) The error *ε*_{di} = Ψ_{si} − Ψ_{gi} must have no bias, that is, 〈*ε*_{di}〉 = 〈Ψ_{si} − Ψ_{gi}〉 = 0. If in addition to a retrieval bias, there is a bias due to the design itself, which we refer to as *design bias,* we could mistakenly attribute an error to the retrieval algorithm that is actually inherent in the design. (In reality both types of bias are likely to be present.) 2) The mean-square error 〈*ε*^{2}_{di}〉 must be small enough for the retrieval bias to be detected.

Because design 2 retains only the visits with rain in the FOV, its statistics are conditional statistics of design 1. The mean error, for example, is 〈*ε*_{d2}〉 = 〈Ψ_{s1} − Ψ_{g1}|Ψ_{s1} > 0〉, where 〈*Y*|*X* > *x*〉 denotes the conditional mean of *Y,* given that *X* > *x.* We can also express the mean-square error for design 2 as

$$\langle \varepsilon_{d2}^{2} \rangle = \langle (\Psi_{s2} - \Psi_{g2})^{2} \rangle = \langle (\Psi_{s1} - \Psi_{g1})^{2} \mid \Psi_{s1} > 0 \rangle.$$

We subdivide the *L* × *L* km^{2} FOV into *M* = *N*^{2} tiles to treat the random field effectively as a multivariate vector. Figure 1 shows a schematic diagram of such an FOV of *L* km on a side. The satellite measurement then can be written as

$$\Psi_{s} = \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} \psi(i, j),$$

where *N* = *L*/*l* is the number of subdivisions per side that have been chosen for the partitioning of the FOV and *ψ*(*i, j*) represents the area-average rain rate in an *l* km × *l* km grid square, which we call a tile. For a fixed FOV size, if we increase the number of tiles *M* = *N*^{2}, then the resolution (roughly equivalent to autocorrelation length) *l* will be smaller. Conversely, if we fix the resolution *l,* the FOV size will be larger when the number of tiles *M* increases. In our study we will present numerical results for the case in which *l* is fixed and is given the same spatial averaging (4 km) as the gridded GATE data. As discussed before, the satellite measures the column average of rain rate, which is equivalent to a time average of a few minutes at the surface.

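The gauge measurement is

$$\Psi_{g} = \psi(x_{g}, y_{g}) = \psi(i_{g}, j_{g}),$$

where (*x*_{g}, *y*_{g}) are the coordinates and (*i*_{g}, *j*_{g}) is the tile position of the gauge site. For convenience of notation, we will use *ψ*(**r**_{k}) (*k* = 1, 2, . . . , *M*) and *ψ*(**r**_{g}) instead of *ψ*(*i, j*) (*i, j* = 1, 2, . . . , *N*) and *ψ*(*i*_{g}, *j*_{g}), respectively. The satellite and gauge measurements are then

$$\Psi_{s} = \frac{1}{M} \sum_{k=1}^{M} \psi(\mathbf{r}_{k}), \qquad \Psi_{g} = \psi(\mathbf{r}_{g}).$$

We assume the gauge site **r**_{g} to be randomly located within an FOV. Therefore, we have the uniform distribution as a probability distribution of **r**_{g}:

$$P(\mathbf{r}_{g} = \mathbf{r}_{k}) = \frac{1}{M}, \qquad k = 1, 2, \ldots, M.$$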

## 3. Ground-truth designs for Bernoulli random field

We use the Bernoulli distribution as the distribution of rain rate to compare the ground-truth designs described above because this distribution describes the no-rain phenomenon of real rain in its simplest possible form. Let *x* of the *M* tiles be raining with rain rate *r,* and let there be no rain in the remaining *M* − *x* tiles. We assume that the probability of rain in an individual tile is *p* and that the tiles are independent of one another. In this section, we present some numerical values for the 4-km resolution (as in GATE radar data). This means that when we increase the number of tiles *M,* the FOV increases. To change the tile size for a fixed FOV size and to see the effect of the tile size using the results in this section, one needs to adjust the probability of rain *p* and the rain rate *r* because both depend on the resolution.

### a. The error distributions

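For design 1, the number of raining tiles *x* is binomially distributed, so the gauge and satellite measurements have the distributions

$$P(\Psi_{g1} = r) = p, \qquad P(\Psi_{g1} = 0) = 1 - p,$$

$$P\!\left(\Psi_{s1} = \frac{rx}{M}\right) = \binom{M}{x} p^{x} (1 - p)^{M - x}, \qquad x = 0, 1, \ldots, M.$$

Given *x* raining tiles, a randomly located gauge has rain with probability *x*/*M,* and the distribution of the error *ε*_{d1} = Ψ_{s1} − Ψ_{g1} can be derived as

$$P\!\left(\varepsilon_{d1} = \frac{rx}{M} - r\right) = \frac{x}{M} \binom{M}{x} p^{x} (1 - p)^{M - x}, \qquad x = 1, \ldots, M - 1,$$

$$P\!\left(\varepsilon_{d1} = \frac{rx}{M}\right) = \left(1 - \frac{x}{M}\right) \binom{M}{x} p^{x} (1 - p)^{M - x}, \qquad x = 1, \ldots, M - 1,$$

$$P(\varepsilon_{d1} = 0) = p^{M} + (1 - p)^{M}.$$

Design 2 uses only the visits when the FOV has rain, so its distributions are those of Ψ_{g1} and Ψ_{s1} conditional on Ψ_{s1} > 0. Since we assumed that the random field is white noise, the probability that there is rain inside the FOV is *P*(Ψ_{s1} > 0) = 1 − (1 − *p*)^{M}. Note that when the random field is not white noise, the probability *P*(Ψ_{s1} > 0) cannot be written in this form. For convenience of notation, we will use *P*_{M} = *P*(Ψ_{s1} > 0) = 1 − (1 − *p*)^{M} hereafter. The distributions for design 2 are derived as

$$P(\Psi_{g2} = r) = \frac{p}{P_{M}}, \qquad P(\Psi_{g2} = 0) = 1 - \frac{p}{P_{M}},$$

$$P\!\left(\Psi_{s2} = \frac{rx}{M}\right) = \frac{1}{P_{M}} \binom{M}{x} p^{x} (1 - p)^{M - x}, \qquad x = 1, \ldots, M.$$

We can derive the error distribution *P*(*ε*_{d2}) = *P*(*ε*_{d1}|Ψ_{s1} > 0),

$$P(\varepsilon_{d2} = e) = \frac{P(\varepsilon_{d1} = e)}{P_{M}} \quad (e \neq 0), \qquad P(\varepsilon_{d2} = 0) = \frac{p^{M}}{P_{M}}.$$

Design 3 uses only the visits when the gauge has rain. The distributions of ground and satellite measurement for design 3 are

$$P(\Psi_{g3} = r) = 1, \qquad P\!\left(\Psi_{s3} = \frac{rx}{M}\right) = \binom{M - 1}{x - 1} p^{x - 1} (1 - p)^{M - x}, \qquad x = 1, \ldots, M.$$

The distribution of error *P*(*ε*_{d3}) = *P*(*ε*_{d1}|Ψ_{g1} > 0) can be derived as

$$P\!\left(\varepsilon_{d3} = \frac{rx}{M} - r\right) = \binom{M - 1}{x - 1} p^{x - 1} (1 - p)^{M - x}, \qquad x = 1, \ldots, M.$$

The probability density functions for the ground-truth designs proposed in this study are shown in Fig. 2. A rain rate of 4 mm h^{−1} conditional on rain and a 20 km × 20 km FOV (resolution is 4 km and number of tiles *M* is 25) are used to make Fig. 2. We can see in Fig. 2 that the distribution of the error is bimodal for design 1 and design 2 and that this is even clearer when the probability of rain *p* is 0.5. We also see that the center of the error distribution for design 3 is far from zero. This indicates that the ensemble mean of the error for design 3 is not zero and thus has negative bias (design bias).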
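The error distributions can also be tabulated numerically. The sketch below assumes only the white-noise Bernoulli model stated in this section (binomially distributed number of raining tiles, gauge in a uniformly random tile); the function name `error_pmf` is hypothetical and the code is not the procedure used to produce Fig. 2.

```python
from math import comb
from typing import Dict

def error_pmf(m_tiles: int, p: float, r: float, design: int) -> Dict[float, float]:
    """Exact distribution of (satellite - gauge) for one visit under a design."""
    if design not in (1, 2, 3):
        raise ValueError("design must be 1, 2, or 3")
    pmf: Dict[float, float] = {}
    for x in range(m_tiles + 1):                  # x = number of raining tiles
        p_x = comb(m_tiles, x) * p**x * (1 - p)**(m_tiles - x)
        sat = r * x / m_tiles
        # A uniformly placed gauge sees rain with probability x / M.
        for gauge, p_gauge in ((r, x / m_tiles), (0.0, 1 - x / m_tiles)):
            keep = (design == 1
                    or (design == 2 and sat > 0)
                    or (design == 3 and gauge > 0))
            if keep and p_gauge > 0:
                err = round(sat - gauge, 12)      # merge floating-point duplicates
                pmf[err] = pmf.get(err, 0.0) + p_x * p_gauge
    total = sum(pmf.values())                     # probability that a visit is retained
    return {e: q / total for e, q in sorted(pmf.items())}

for d in (1, 2, 3):
    pmf = error_pmf(25, 0.1, 4.0, d)
    mean = sum(e * q for e, q in pmf.items())
    print(f"design {d}: {len(pmf)} error values, mean error = {mean:.4f} mm/h")
```

For designs 1 and 2 the probability mass splits into a negative branch (gauge raining) and a nonnegative branch (gauge dry), which is the source of the bimodality. Design 3 keeps only the negative branch.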

### b. Mean of the error

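For design 1, the means of the satellite and ground measurements are 〈Ψ_{s1}〉 = *rp* and 〈Ψ_{g1}〉 = *rp*. Thus, the mean of the error for design 1 is

$$\langle \varepsilon_{d1} \rangle = \langle \Psi_{s1} - \Psi_{g1} \rangle = 0.$$

The mean of satellite measurement is 〈Ψ_{s2}〉 = *rp*/*P*_{M}, and the mean of ground measurement is 〈Ψ_{g2}〉 = *rp*/*P*_{M} for design 2. We thus know that the satellite measurement is also an unbiased estimator of the ground measurement for design 2 because we have

$$\langle \varepsilon_{d2} \rangle = \langle \Psi_{s2} - \Psi_{g2} \rangle = 0.$$

For design 3, the gauge always has rain, so 〈Ψ_{g3}〉 = *r,* whereas each of the other *M* − 1 tiles rains only with probability *p,* so 〈Ψ_{s3}〉 = *r*[1 + (*M* − 1)*p*]/*M.* The mean of the error for design 3 is therefore

$$\langle \varepsilon_{d3} \rangle = \langle \Psi_{s3} - \Psi_{g3} \rangle = -\frac{r (M - 1)(1 - p)}{M}, \tag{23}$$

which is negative whenever *p* < 1 and *M* > 1. When the probability of rain *p* is 1, the ground-truth designs in our study are all equivalent. We can see in Eq. (23) that if *p* = 1, the bias disappears. As discussed in section 1, the satellite measurement should be an unbiased estimator of the ground measurement in order to detect a retrieval bias in the algorithm converting microwave temperature to rain rate. In this sense, we have shown that design 3 cannot be used as the ground-truth design.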
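The ensemble means can be checked without algebra by enumerating every possible rain field and gauge site for a small FOV. This is a brute-force sketch using exact rational arithmetic; the function name `design_means` and the choice *M* = 4 are assumptions made to keep the enumeration small, not values from the study.

```python
from fractions import Fraction
from itertools import product

def design_means(m_tiles: int, p: Fraction, r: Fraction):
    """Exact (<satellite>, <gauge>, <error>) per design by full enumeration."""
    # For each design: [retained probability, sum of w*sat, sum of w*gauge]
    sums = {d: [Fraction(0), Fraction(0), Fraction(0)] for d in (1, 2, 3)}
    for field in product((0, 1), repeat=m_tiles):    # 1 marks a raining tile
        x = sum(field)
        w_field = p**x * (1 - p)**(m_tiles - x)      # probability of this field
        sat = r * x / m_tiles
        for site in range(m_tiles):                  # gauge site, uniform over tiles
            w = w_field / m_tiles
            gauge = r * field[site]
            for d, keep in ((1, True), (2, sat > 0), (3, gauge > 0)):
                if keep:
                    sums[d][0] += w
                    sums[d][1] += w * sat
                    sums[d][2] += w * gauge
    return {d: (s / w, g / w, (s - g) / w) for d, (w, s, g) in sums.items()}

means = design_means(4, Fraction(1, 10), Fraction(4))
for d, (sat, gauge, bias) in means.items():
    print(f"design {d}: <sat> = {sat}, <gauge> = {gauge}, <error> = {bias}")
```

With *M* = 4, *p* = 1/10, and *r* = 4, designs 1 and 2 give zero mean error, while design 3 gives −27/10, which equals −*r*(*M* − 1)(1 − *p*)/*M*.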

### c. Mean-square error for a single visit

In section 3b, we showed that design 3 cannot be used as a ground-truth design, so we henceforth only consider design 1 and design 2 as ground-truth designs and compute the mean-square error for each.

The mean-square errors for design 1 and design 2 are

$$\mathrm{mse}(d1) = \langle \varepsilon_{d1}^{2} \rangle = r^{2} p (1 - p) \frac{M - 1}{M} \tag{25}$$

and

$$\mathrm{mse}(d2) = \langle \varepsilon_{d2}^{2} \rangle = \frac{r^{2} p (1 - p)}{P_{M}} \frac{M - 1}{M}, \tag{26}$$

where *P*_{M} = 1 − (1 − *p*)^{M} is the probability that the satellite measurement has rain inside the FOV. From Eqs. (25) and (26), we can see that

$$\mathrm{mse}(d2) = \frac{\mathrm{mse}(d1)}{P_{M}}. \tag{27}$$

When we compute the mse(d2), we need information about the conditional random field, which is the random field conditioned on the satellite measurement having rain inside the FOV. But (27) gives us a way to compute mse(d2) without using the conditional random field. We can compute mse(d2) if we have mse(d1), which is computed from the random field, and if we know the probability that the satellite measurement has rain, *P*(Ψ_{s1} > 0) = *P*_{M}. We already know that the probability of the satellite measurement having rain is *P*(Ψ_{s1} > 0) = 1 − (1 − *p*)^{M} for a white noise random field. However, it will not be easy to compute *P*(Ψ_{s1} > 0) for a nonwhite noise (spatially) random field.
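The relation between the two mean-square errors has a one-line justification: visits with a dry FOV contribute exactly zero error, so 〈*ε*^{2}_{d1}〉 = *P*_{M}〈*ε*^{2}_{d1}|Ψ_{s1} > 0〉. The sketch below evaluates both quantities, assuming the closed form *r*^{2}*p*(1 − *p*)(*M* − 1)/*M* for design 1, which follows from the white-noise Bernoulli model with a uniformly located gauge. The function names are hypothetical.

```python
def prob_fov_rain(m_tiles: int, p: float) -> float:
    """P_M = 1 - (1 - p)^M; valid only for spatially independent tiles."""
    return 1.0 - (1.0 - p) ** m_tiles

def mse_design1(m_tiles: int, p: float, r: float) -> float:
    """Closed form r^2 p (1 - p) (M - 1) / M for the Bernoulli white-noise field."""
    return r * r * p * (1.0 - p) * (m_tiles - 1) / m_tiles

def mse_design2(m_tiles: int, p: float, r: float) -> float:
    """mse(d2) = mse(d1) / P_M: zero-error dry-FOV visits are discarded."""
    return mse_design1(m_tiles, p, r) / prob_fov_rain(m_tiles, p)

# FOV widths of 8, 12, 20, and 40 km at a fixed 4-km tile size.
rows = [(m, mse_design1(m, 0.1, 4.0) ** 0.5, mse_design2(m, 0.1, 4.0) ** 0.5)
        for m in (4, 9, 25, 100)]
for m, rmse1, rmse2 in rows:
    print(f"M = {m:3d}: rmse(d1) = {rmse1:.3f}, rmse(d2) = {rmse2:.3f} mm/h")
```

In this sketch rmse(d1) grows with the FOV size while rmse(d2) shrinks, because *P*_{M} approaches 1 as more tiles are included.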

The values of rmse(d1) and rmse(d2), which are the root-mean-square errors for design 1 and design 2, are given in Fig. 3. Note that rmse(d1) and rmse(d2) increase as the probability of rain *p* increases over 0 < *p* < 0.5. When the FOV increases, rmse(d1) increases but rmse(d2) decreases. This happens because the probability that the satellite measurement has rain increases as the FOV size increases. When the rain rate *r* increases, rmse(d1) and rmse(d2) increase. In Fig. 4, note that rmse(d1) and rmse(d2) are almost the same for a rain rate *r* = 4 mm h^{−1}, the probability of rain *p* = 0.1, and typical FOV size (20 km across). The values *r* = 4 and *p* = 0.1 are chosen to match rain-rate statistics from GATE data.

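We define the ratio

$$W^{2}(di) = \frac{\langle \varepsilon_{di}^{2} \rangle}{\sigma_{gi}^{2}}$$

as the dimensionless mean-square error. We choose *σ*^{2}_{gi} to be the variance of the gauge measurement for design *i;* for design 1,

$$\sigma_{g1}^{2} = \langle (\Psi_{g1} - \langle \Psi_{g1} \rangle)^{2} \rangle = r^{2} p (1 - p).$$

For design 2, the same definition applied to the gauge measurement conditional on rain in the FOV gives *σ*^{2}_{g2} = *r*^{2}*p*(*P*_{M} − *p*)/*P*^{2}_{M}. We then obtain

$$W^{2}(d1) = \frac{M - 1}{M}, \qquad W^{2}(d2) = \frac{M - 1}{M} \, \frac{P_{M}}{1 - (1 - p)^{M - 1}}$$

for *W*^{2}(*d*1) and *W*^{2}(*d*2). When the probability of the rain *p* is 1, the probability *P*_{M} = 1 − (1 − *p*)^{M} = 1 and thus the *W*^{2}(*d*1) and *W*^{2}(*d*2) are equal. The square roots of *W*^{2}(*d*1) and *W*^{2}(*d*2) are given in Fig. 5. For the values of *r* = 4, *p* = 0.1, and FOV size (20 km across), the *W*^{2}(*d*1) and *W*^{2}(*d*2) are almost the same.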

### d. Mean-square error for *N* visits

The mean of the average error for *N* visits is the same as the mean of error for one visit. The variance of average error for *N* visits is

$$\mathrm{var}\!\left( \frac{1}{N} \sum_{n=1}^{N} \varepsilon_{di}^{n} \right) = \frac{\mathrm{var}(\varepsilon_{di})}{N}. \tag{33}$$

By the central limit theorem (see Wilks 1995 for reference), we also know the distribution of the average error converges to the normal distribution when the number of visits *N* is sufficiently large, even though the error distribution for a single visit is far from a normal distribution (see Figs. 2 and 6). Figure 6 shows schematically the error distribution of one visit and the average error distribution of *N* independent visits. From (33), we can make the error as small as we want by adding new measurements.

The dimensionless mean-square error of the average error for *N* visits is then

$$\frac{W^{2}(di)}{N}.$$

Let *N*_{1} and *N*_{2} be the number of visits to achieve the same tolerance level of detecting the retrieval bias using design 1 and design 2, respectively. We can derive the relationship between *N*_{1} and *N*_{2} using

$$\frac{W^{2}(d1)}{N_{1}} = \frac{W^{2}(d2)}{N_{2}}.$$

Using Eqs. (30) and (32), we thus have

$$N_{2} = N_{1} \frac{W^{2}(d2)}{W^{2}(d1)} = N_{1} \frac{P_{M}}{1 - (1 - p)^{M - 1}}.$$

When the probability of rain *p* is small (usually less than 10%), the number of visits *N*_{1} and *N*_{2} are almost the same. Remember that *N*_{1} is the number of all visits and *N*_{2} is the number of visits when the satellite has rain.

We have already found that *W*^{2}(*d*1) and *W*^{2}(*d*2) are almost the same for the values *p* = 0.1, *r* = 4, and FOV size 20 km. For these values, the number of visits needed to reduce the root-mean-square error of the average to 10% of the standard deviation of the gauge measurement is *N*_{1} = 96 for design 1 and *N*_{2} = 97 for design 2. Since the probability that the satellite measurement has rain, *P*_{M} = 1 − (1 − *p*)^{M}, is almost 1 (about 0.93) for *p* = 0.1 and *M* = 25, we obtained almost the same number of visits to achieve the 10% tolerance level.
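These visit counts can be reproduced with a few lines of exact arithmetic. The sketch assumes that the tolerance criterion is *W*^{2}(*di*)/*N* ≤ (0.1)^{2}, that *W*^{2}(*d*1) = (*M* − 1)/*M*, and that *W*^{2}(*d*2) uses the gauge variance conditional on rain in the FOV. These forms are derived here from the white-noise Bernoulli model, and the function name `visits_needed` is hypothetical.

```python
from fractions import Fraction
from math import ceil

def visits_needed(m_tiles: int, p: Fraction, tolerance: Fraction):
    """Visits needed so the rms error of the N-visit average is tolerance * sigma_g.

    Requires m_tiles >= 2 and p > 0; the rain rate r cancels in the ratios.
    """
    q = 1 - p
    p_m = 1 - q**m_tiles                           # P(FOV has rain)
    w2_d1 = Fraction(m_tiles - 1, m_tiles)         # mse(d1) / var(gauge)
    # Design 2: mse(d1)/P_M over the conditional gauge variance; this equals
    # w2_d1 * (1 - p) * P_M / (P_M - p), simplified below.
    w2_d2 = w2_d1 * p_m / (1 - q**(m_tiles - 1))
    return ceil(w2_d1 / tolerance**2), ceil(w2_d2 / tolerance**2)

n1, n2 = visits_needed(25, Fraction(1, 10), Fraction(1, 10))
print(n1, n2)  # 96 97
```

Because *N* scales as the inverse square of the tolerance, halving the tolerance to 5% would roughly quadruple both counts.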

## 4. Summary and conclusions

In this paper we have considered three ground-truth designs based on the point gauge measurement to validate the satellite measurement. Design 1 uses data pairs from all visits. Design 2 uses visits only when the FOV average has rain. Design 3 uses the visits only when the gauge has rain. The ground-truth designs we proposed in this paper are based upon properties encountered with real rain.

We modeled the rain-rate field by subdividing the FOV into *M* tiles and took the random rain variable for each tile to be rain at a fixed rate *r* with probability *p,* or no rain with probability 1 − *p.* The number of tiles in a fixed FOV size corresponds in some sense to the autocorrelation length of the rain field. Once this model is adopted, all comparisons between the point gauge (randomly located in the FOV) and the FOV average corresponding to the satellite measurement can be calculated.

We found that the error distribution is bimodal for design 1 and design 2 but not for design 3. We have shown that the satellite measurement is an unbiased estimator of the gauge measurement for design 1 and design 2. However, design 3 has a serious disadvantage as the ground-truth design because it exhibits a large design bias.

The efficiencies of design 1 and design 2 are indexed by the mean-square error (difference) between the satellite and gauge estimates. We have shown that the mean-square error of design 2 is equal to the mean-square error of design 1 divided by the probability that the satellite measurement has rain inside the FOV. This fact gives us a way to compute the mean-square error of design 2 without using the conditional random field. The dimensionless mean-square errors of design 1 and design 2 are almost the same for the typical rain statistics.

Our major finding is that for an FOV width of 20 km and autocorrelation length of a few kilometers, the number of measurement pairs (containing rain) necessary to distinguish a bias of 10% is of the order of 100. This means many months of data will need to be taken for a single-point gauge to detect such a bias with any confidence.

Our model of rain is too simplified and needs to be made more realistic at the cost of clarity. The Bernoulli field is too uniformly “speckly” compared to real rain, which consists of “patches” of speckly areas. We are presently looking at fields that have this more realistic property.

The first author (E.H.) wishes to thank the Korea Research Foundation for its support. The second author (G.R.N.) thanks the NASA TRMM program for its support.

## REFERENCES

Bell, T. L., A. Abdullah, R. L. Martin, and G. R. North, 1990: Sampling error for satellite-derived tropical rainfall: Monte Carlo study using a space–time stochastic model. *J. Geophys. Res.,* **95,** 2195–2205.

Ha, E., and G. R. North, 1994: Use of multiple gauges and microwave attenuation of precipitation for satellite verification. *J. Atmos. Oceanic Technol.,* **11,** 629–636.

North, G. R., J. B. Valdes, E. Ha, and S. P. Shen, 1994: The ground-truth problem for satellite estimates of rain rate. *J. Atmos. Oceanic Technol.,* **11,** 1035–1041.

Simpson, J., R. F. Adler, and G. R. North, 1988: A proposed rainfall measuring mission (TRMM) satellite. *Bull. Amer. Meteor. Soc.,* **69,** 278–295.

Theon, J. S., T. Matsuno, T. Sakata, and N. Fugono, Eds., 1992: *The Global Role of Tropical Rainfall.* A. Deepak, 280 pp.

Thiele, O. W., 1992: Ground truth for rain measurement from space. *The Global Role of Tropical Rainfall,* J. S. Theon, T. Matsuno, T. Sakata, and N. Fugono, Eds., A. Deepak, 245–260.

Wilheit, T. T., and L. S. Chiu, 1991: Retrieval of monthly rainfall indices from microwave radiometric measurements using probability distribution functions. *J. Atmos. Oceanic Technol.,* **8,** 118–136.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences.* Academic Press, 623 pp.