Introduction
Detailed estimation of the error associated with precipitation estimates is a difficult problem requiring statistical information not generally provided in current global precipitation datasets (North et al. 1991). Nonetheless, some estimate of error is required for both the creation and use of precipitation estimates. From the producers’ perspective, estimates of error for each independent dataset, whether from satellites or rain gauges, provide key information for combining the datasets. This was certainly true for development of the Satellite-Gauge-Model (SGM) combination technique (Huffman et al. 1995) and production of the Global Precipitation Climatology Project (GPCP) Version 1 Combined Precipitation Data Set (Huffman et al. 1997). From the users’ perspective, error estimates allow inferences about the reliability (or even advisability) of comparisons between datasets. Furthermore, both producers and users are best served when the error estimates are spatially and temporally varying, rather than being quoted as single dataset-average values.
The random term alone is considered in this paper for several reasons. First, the random error is frequently the dominant term for the sparsely sampled datasets that Huffman et al. (1995, 1997) and many other papers treat. Second, precipitation estimation techniques usually strive to generate unbiased estimates (a “small” bias b). Third, even when b is not small, its analysis is relatively well understood, for example, by computing long-term or large-area averages in which the random errors can be assumed to average out. Note, however, that the lack of calibration data frequently prevents such an analysis for b.
In the following sections the random error is characterized by deriving a functional form for rms fluctuations in instantaneous estimates, and then for rms error in space–time-average estimates. A simplification is introduced to accommodate limitations in the datasets at hand. Finally, the practical application of this work is illustrated with the various precipitation datasets used in the GPCP Version 1 Dataset. The illustration is aided by defining a “quality index” that simplifies the comparison of the rms errors between and within datasets.
A functional representation of rms error
How does H vary? One might expect H to be well behaved because probability distributions of instantaneous precipitation rate (accumulated over a period such as a month) tend to look similar (i.e., lognormal; Kedem et al. 1990). In fact, the result is much stronger. Analytic examples (shown in Fig. 1 and summarized in Table 1) demonstrate quite stable behavior over a wide range of probability distributions. Turning to a more realistic case, the Goddard Scattering Algorithm, Version 2 (GSCAT2, Adler et al. 1994) was used to generate the necessary monthly variables from Special Sensor Microwave/Imager (SSM/I) data in August 1987. The SSM/I is a low-orbit, multichannel passive microwave radiometer. GSCAT2 uses all the SSM/I channels to identify nonprecipitating pixels and uses the scattering signal in the highest frequency channel to infer precipitation rate. The set of pixel-by-pixel GSCAT2 precipitation estimates was used to compute all the terms in (9) on a global 2.5° latitude × 2.5° longitude grid, allowing H to be computed for each grid box. (Note that H could equally well be computed on a routine basis for other datasets if σi and p are computed in addition to r̄.) Figure 2 shows that H is tightly clustered around 1.5 at high values of r̄, and tends toward an average of 1.3 at lower values of r̄, with most points ranging over 1.2–1.7. Figure 3 shows that deviations from 1.5 tend to be regionally coherent. This result is equally true in other months (not shown). To summarize, analytic and GSCAT2 results suggest that a well-behaved H with modest variation according to climatic regime is a fundamental property of the “true” rain field. One should expect estimated precipitation datasets to show the same behavior, providing insight into σi even if the datasets lack explicit information on σi.
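As a rough illustration of how H might be computed for one grid box from a set of instantaneous estimates, the Python sketch below assumes that H is the second moment of the nonzero (conditional) rain rates divided by the square of their mean, which is one reading of the "nondimensional second moment" discussed in this paper. The function name nondimensional_second_moment, the sample arrays, and the zero-based boxcar parameterization are hypothetical choices made for illustration only.

```python
import numpy as np

def nondimensional_second_moment(rates):
    """Return (p, r_c, H) for a 1-D sample of instantaneous rain rates.

    Assumed definitions (an illustrative reading, not the paper's equations):
      p   = fraction of samples with nonzero rain
      r_c = mean of the nonzero (conditional) rain rates
      H   = mean of the squared nonzero rates divided by r_c**2
    """
    rates = np.asarray(rates, dtype=float)
    raining = rates[rates > 0.0]
    if raining.size == 0:
        # No rain in the sample: r_c and H are undefined.
        return 0.0, float("nan"), float("nan")
    p = raining.size / rates.size
    r_c = raining.mean()
    H = np.mean(raining ** 2) / r_c ** 2
    return p, r_c, H

# "Spike" case: every raining sample has the same rate, so H = 1.
spike = np.array([0.0, 0.0, 4.0, 4.0])
p_spike, rc_spike, H_spike = nondimensional_second_moment(spike)

# "Boxcar" case: raining rates spread uniformly over (0, R], so H is near 4/3.
boxcar = np.concatenate([np.zeros(1000), np.linspace(0.001, 10.0, 10000)])
p_box, rc_box, H_box = nondimensional_second_moment(boxcar)
```

Under these assumptions a single-valued rain rate gives H = 1 and a uniform spread gives H near 4/3, so quite different distribution shapes already produce values of H that differ only modestly.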
The expression in (11) is consistent with Eq. 30 in Bell et al. (1990) if the 1 in the parentheses is neglected (as it will be below) and H absorbs the proportionality constant. The current result clarifies the basis for considering H to be nearly constant as Bell et al. (1990) asserted.
An approximation to rms error
The exact representation of σr in (11) presents considerable practical problems when applied to the global monthly precipitation datasets ordinarily available. In particular, this is true in generating error estimates for use in the SGM technique. First, one usually lacks the σi to calculate H directly. The discussion in section 2 implies that H can be approximated as a global constant with the value 1.5. As long as the GSCAT2 results are somewhat representative, this choice yields rms error inaccuracies of about +12% to −6% at low values of r̄ (for H ranging over 1.2–1.7), and considerably better accuracies at higher values of r̄.
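The quoted range of about +12% to −6% can be checked with a few lines of Python, assuming that at low r̄ the rms error is proportional to the square root of H (that is, the 1 in the parentheses of (11) is negligible because p is small). The function name rms_error_inaccuracy is hypothetical.

```python
import math

H_ASSUMED = 1.5  # global constant adopted in the text

def rms_error_inaccuracy(h_true, h_assumed=H_ASSUMED):
    """Fractional rms error inaccuracy from using h_assumed in place of h_true.

    Assumes sigma_r is proportional to sqrt(H), which holds when the
    fractional coverage p is small enough that H/p dominates the 1.
    """
    return math.sqrt(h_assumed / h_true) - 1.0

low = rms_error_inaccuracy(1.2)   # true H at the low end of the observed range
high = rms_error_inaccuracy(1.7)  # true H at the high end of the observed range
print(f"H=1.2: {low:+.1%}, H=1.7: {high:+.1%}")
```

Under this assumption the two cases evaluate to roughly +11.8% and −6.1%, consistent with the range stated above.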
A third practical issue is that r̄ is most easily estimated by the average over the finite set E of precipitation estimates. However, (11) produces zero σr any time r̄ is zero, even though r̄ has some uncertainty. This effect is accounted for by adding the constant S to the leading r̄, where S is a measure of the standard deviation in calibration when r̄ = 0.
The fourth problem is in estimating NI on a global basis, which requires information on the (spatially and temporally varying) correlation time and distances of the algorithm-estimated precipitation field. These values are affected both by the correlations in the true precipitation and by correlations in the algorithmic errors. For simplicity it is assumed that NI is related to the reported number of samples, N, by the constant multiplier I as NI = IN. Thereby it is assumed both that the space–time distribution of samples within a dataset always yields about the same degree of oversampling (if any), and that the correlation distances and times within that dataset are relatively constant. Both of these conditions are severe enough that the author believes it mandatory to develop a better model of NI in future research.
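The simplifications described in this section can be collected into a short Python sketch. It assumes the parameterized error variance takes the form (H/I)(r̄ + S) rc / N, obtained from (11) by neglecting the 1 in the parentheses, substituting p = r̄/rc, adding S to the leading r̄, and setting NI = IN. The conditional rain rate rc is supplied by the caller because its dependence on r̄ is dataset specific. The function name, argument names, and numerical inputs are hypothetical and are not the values in Table 2.

```python
import math

def approx_rms_error(r_bar, n_samples, h_over_i, s, r_c):
    """Approximate rms random error of a space-time-average estimate.

    r_bar     : average precipitation rate (e.g., mm/day)
    n_samples : reported number of samples N (must be positive)
    h_over_i  : combined constant H/I for the dataset
    s         : constant S, in the same units as r_bar
    r_c       : conditional rain rate for this r_bar, same units as r_bar

    Assumed form: sigma_r**2 = (H/I) * (r_bar + S) * r_c / N.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    variance = h_over_i * (r_bar + s) * r_c / n_samples
    return math.sqrt(variance)

# Hypothetical inputs chosen only to exercise the function.
sigma = approx_rms_error(r_bar=4.0, n_samples=100, h_over_i=1.5, s=0.5, r_c=24.0)
sigma_zero_rain = approx_rms_error(r_bar=0.0, n_samples=100, h_over_i=1.5, s=0.5, r_c=24.0)
sigma_more_samples = approx_rms_error(r_bar=4.0, n_samples=400, h_over_i=1.5, s=0.5, r_c=24.0)
```

In this form a positive S keeps the estimated error nonzero when r̄ is zero, and quadrupling N halves the estimated error.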
Global constants
The individual GPCP datasets for which rms errors are computed in this paper are the SSM/I scattering estimates over land, SSM/I emission estimates over water, microwave-adjusted infrared estimates [the adjusted GOES precipitation index (AGPI)], and the Global Precipitation Climatology Centre (GPCC) gauge analysis (see Huffman et al. 1997 for descriptions). As shown in Table 2 under “units of N,” the datasets have rather different measures of sampling. For the purposes of computing the GPCP Version 1 Dataset several of the units were modified (shown as “modified units of N” in Table 2). The number-of-samples variable for each SSM/I technique was converted to an approximate number of 55 km × 55 km boxes that would be fully covered by the original number of samples that was accumulated in the 2.5° × 2.5° grid box over the month. The number-of-samples variable for the AGPI was the count of infrared images that covered the 2.5° × 2.5° grid box during the month. The number-of-samples variable for the GPCC rain gauge analysis started as the number of active rain gauges in the 2.5° × 2.5° grid box during the month. Then the number-of-samples variable was modified because the SPHEREMAP technique, which GPCC uses for analysis, considers data outside the box. Following Huffman et al. (1995) this extra information was roughly modeled by adding a fraction (0.125) of the number of surrounding boxes containing data to the number of gauges in the box.
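The gauge-count modification can be sketched in Python as follows. The sketch assumes that the "surrounding boxes" are the eight adjacent grid boxes, that a box "contains data" when it holds at least one gauge, that the grid wraps in longitude, and that rows beyond the first and last latitude hold no data. None of these details is specified above, and the function name modified_gauge_count is hypothetical.

```python
import numpy as np

NEIGHBOR_WEIGHT = 0.125  # fraction quoted in the text

def modified_gauge_count(gauges):
    """Add 0.125 per surrounding box with data to each box's gauge count.

    gauges: 2-D array (lat, lon) of active gauge counts per grid box.
    Assumptions: 8-box neighborhood, longitude wraps around, and rows
    beyond the latitude edges are treated as containing no data.
    """
    gauges = np.asarray(gauges, dtype=float)
    has_data = (gauges > 0).astype(float)
    # Pad one empty row at each latitude edge; longitude is handled by roll.
    padded = np.pad(has_data, ((1, 1), (0, 0)), mode="constant")
    nlat = gauges.shape[0]
    neighbors = np.zeros_like(gauges)
    for dlat in (-1, 0, 1):
        for dlon in (-1, 0, 1):
            if dlat == 0 and dlon == 0:
                continue  # skip the box itself
            shifted = np.roll(padded, shift=dlon, axis=1)
            neighbors += shifted[1 + dlat : 1 + dlat + nlat, :]
    return gauges + NEIGHBOR_WEIGHT * neighbors

# Hypothetical 3 x 4 grid with three gauges in a single box.
example = np.array([[0, 0, 0, 0],
                    [0, 3, 0, 0],
                    [0, 0, 0, 0]])
modified = modified_gauge_count(example)
```

In this example the box holding the gauges keeps its count of 3, each adjacent box receives 0.125, and the non-adjacent column remains at zero.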
The scarcity of dense rain gauge networks around the globe prevents one from simply computing the global H/I and S for each data source. Rather, calibration must be done with a few datasets that are believed representative. The primary source for calibration in this study is the rain gauge analysis produced by the GPCP’s SRDC, located at the National Climatic Data Center (NCDC). The SRDC uses the PRISM technique (Daly et al. 1994) to analyze 2.5° × 2.5° grid cells that have reasonably dense rain gauge coverage (around 50 per 2.5° box). The SRDC data are drawn from the years 1987–91 for seven “test sites” (most with more than one 2.5° × 2.5° grid box) around the globe. The Morrissey et al. (1995) collection of tropical Pacific atoll gauge data is also used as a source of calibration. The atolls provide sparse coverage (typically 1–4 stations per 2.5° × 2.5° grid box) throughout the years 1987–95. These data are believed to be approximately representative of open ocean conditions in the tropical Pacific. The atoll reports are analyzed onto the GPCP’s 2.5° × 2.5° grid using simple grid-box averages for this study. Finally, the SSM/I and AGPI estimates are compared to the GPCC gauge analysis as a consistency check.
The distribution of stations contributing to the GPCC rain gauge analysis in a 2.5° × 2.5° grid box varies widely from place to place, so the author chose to compute the constants for the GPCC gauge analysis using all available months for all of the SRDC test sites that had good coverage at the time of this study, namely, the southeastern United States (SEUS), Canada, Honduras, Thailand, and Australia. On the other hand, satellite data coverage tends to be fairly uniform in tropical and subtropical regions, so the satellite dataset constants were computed for SRDC test sites and months that were judged to best represent the dominant climatic regime. The summertime SEUS was considered to have the best chance of representing the convective regime typical of the moist Tropics, both land and ocean, since its precipitation is dominated by convection, but there is no strong orography. The SSM/I emission estimates are not available at the SRDC test sites, so the Pacific atoll data were used for setting the constants.
Table 2 contains the values of H/I and S that are computed for the different datasets using the calibration datasets discussed above. In each case a variety of values was specified for H/I and S, and the values that minimized inaccuracies over the whole range of r̄ were chosen. The final results represent a subjective trade-off because the fits for gauge and SSM/I scattering had to be biased in order to get reasonable checks against the GPCC and atoll gauge analyses. The corresponding performance of the error estimates in the dependent datasets is shown in Fig. 5. The points for each technique were produced by binning on rain totals with breakpoints at 0.33, 1.67, 3.33, 5.00, 6.67, and 10.00 mm day−1. In each case the observed rms errors have been adjusted for bias, following (15). Over the whole range of rain rates the rms error estimate replicates the general behavior of the observed rms error, although differences indicate that the functional form is only a first approximation.
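The binning step can be sketched in Python as below. Two assumptions are made here: the bins are defined on the reference (calibration) totals, and the bias adjustment subtracts the squared mean difference from the mean squared difference within each bin, which may differ in detail from (15). The function name binned_rms_error and the sample values are hypothetical.

```python
import numpy as np

# Breakpoints (mm/day) quoted in the text.
BREAKPOINTS = np.array([0.33, 1.67, 3.33, 5.00, 6.67, 10.00])

def binned_rms_error(estimate, reference):
    """Bias-adjusted rms difference in each rain-rate bin.

    Bins are defined on the reference totals (an assumption). The bias
    adjustment removes the squared mean difference from the mean squared
    difference in each bin (also an assumption). Empty bins return NaN.
    """
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = estimate - reference
    bin_index = np.digitize(reference, BREAKPOINTS)  # values 0..len(BREAKPOINTS)
    result = np.full(BREAKPOINTS.size + 1, np.nan)
    for k in range(BREAKPOINTS.size + 1):
        d = diff[bin_index == k]
        if d.size:
            adjusted = np.mean(d ** 2) - np.mean(d) ** 2
            result[k] = np.sqrt(max(adjusted, 0.0))
    return result

# Hypothetical monthly values in mm/day.
reference = np.array([0.1, 0.2, 2.0, 2.0, 8.0])
estimate = np.array([0.3, 0.2, 3.0, 1.0, 9.0])
rms = binned_rms_error(estimate, reference)
```

A bin holding a single point returns zero after the bias adjustment, so in practice each bin needs enough points for the adjusted rms to be meaningful.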
An example of the rms error estimates is shown in Fig. 6. Curves of constant sample size are plotted to display relative error as a function of precipitation rate. The sample sizes displayed for AGPI and SSM/I emission (Fig. 6b) roughly bound the normal range of population sizes. Overall, the curves agree with previous work. For comparison, the Bell et al. (1996) estimates of relative error for GATE and for SSM/I in the western tropical Pacific during November 1992–February 1993 are scaled to 2.5° × 2.5° boxes and plotted on Fig. 6b. The GATE results are lower than either SSM/I estimate, perhaps because there is no algorithmic error in the calculation from GATE. The fitted Bell et al. (1996) SSM/I curve is systematically above the SSM/I curves in this study, although the separation is less than their error bars. It is not clear why this difference occurs, although it might be related to regional differences in precipitation.
Quality index
A final error-related issue uncovered in Huffman et al. (1997) is the visual representation of error. By substituting p = r̄/rc in (11) as before, and assuming that rc is nonzero for r̄ = 0, it is clear that rms error must go to zero as r̄ goes to zero. On the other hand, relative rms error (σr/r̄) must be unbounded as r̄ goes to zero. Either extreme makes it difficult to compare “error” between regions with different precipitation rates.
The quality index plot corresponding to Fig. 6 is shown in Fig. 7. Note that the satellite estimates become relatively better for increasing r̄, due to the decreasing importance of S.
An example
The technique developed above was applied in computing the GPCP Version 1 Dataset (Huffman et al. 1997). The combination of SSM/I, infrared, and rain gauge data (the SG estimate) for August 1987 is shown in Fig. 8 together with quality index, rms error, and relative rms error. These images display many of the qualities discussed above. The rms error is larger in regions of higher precipitation, while the relative rms error is smaller. The quality index clearly displays boundaries in data coverage, such as the switch between AGPI and SSM/I at 40°N and 40°S, the gap in geosynchronous infrared in the Indian Ocean, and the addition of gauges (in varying amounts) in land areas. Regions having dense gauge coverage, such as Europe and China, have smaller values of both rms and relative rms error, and higher values of quality index. Notice how large portions of the tropical land areas have quality index values close to the satellite values alone, due to sparse gauge coverage. Although all three error representations are based on the computation of rms error, the different representations provide different insights into the character of the errors.
What do these rms error estimates tell us about the utility of the precipitation field (remembering that this is only the random part of the error)? The original TRMM goal was relative error less than 10% for monthly 5° × 5° estimates (Simpson et al. 1988). Since then, work such as this study has motivated discussions about adopting an absolute floor to prevent the 10% threshold from requiring tiny error limits in light-rain areas that are not operationally meaningful. For the current study (which considers monthly 2.5° × 2.5° data) the TRMM goal becomes 20% and a reasonable, small floor of 0.4 mm day−1 (12.4 mm for a 31-day month such as August) is chosen for illustration. In Fig. 8 the subtropical highs and intertropical convergence zone meet these goals, but the large regions with intermediate precipitation rates require additional data. Over land, conscientious assembly of existing, but fragmented rain gauge datasets (as the GPCC is doing) will have beneficial effects in many countries. Over the remainder of the globe, including some remote land areas, the introduction of additional microwave-sensing satellites would easily reduce the random error problem. Throughout, improved analysis techniques would also reduce the errors.
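The goal described above, a 20% relative error combined with a 0.4 mm day−1 floor, can be expressed as a simple test on each grid box. The Python sketch below assumes the goal is met when the rms error does not exceed the larger of the two limits. It also assumes that the step from 10% at 5° × 5° to 20% at 2.5° × 2.5° reflects random error growing as the inverse square root of the averaging area. The function name meets_goal and the sample values are hypothetical.

```python
import numpy as np

REL_GOAL = 0.20      # relative rms error goal for monthly 2.5 x 2.5 degree boxes
FLOOR_MM_DAY = 0.4   # illustrative absolute floor from the text (mm/day)

def meets_goal(r_bar, sigma_r, rel_goal=REL_GOAL, floor=FLOOR_MM_DAY):
    """True where the rms error is within the relative goal or the floor."""
    r_bar = np.asarray(r_bar, dtype=float)
    sigma_r = np.asarray(sigma_r, dtype=float)
    allowed = np.maximum(rel_goal * r_bar, floor)
    return sigma_r <= allowed

# Hypothetical grid-box values (mm/day), not taken from Fig. 8.
r_bar = np.array([0.5, 3.0, 10.0])
sigma_r = np.array([0.3, 1.0, 1.5])
ok = meets_goal(r_bar, sigma_r)

# A 2.5-degree box has one quarter of the area of a 5-degree box, so the
# random error roughly doubles if the number of samples scales with area.
goal_2p5 = 0.10 * np.sqrt(4.0)

# The floor expressed as a monthly total for a 31-day month.
floor_mm_per_month = FLOOR_MM_DAY * 31
```

In this example the light-rain box passes because of the floor, the heavy-rain box passes on relative error, and the intermediate box fails both limits, which mirrors the pattern described for Fig. 8.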
Concluding remarks
A simple exact expression has been developed for estimating the rms error in space–time-average precipitation estimates using quantities that are variables of the averaged dataset. This approach accounts for rms errors in the sampling of both the true precipitation field and of the measurement–algorithm-induced errors. These errors may have a correlation structure that is not accessible in monthly averages, but can complicate the analysis and should be addressed in the future. The nondimensional second moment of the precipitation distribution was shown to be a key variable, and analytic and data work showed that it is relatively constant over a wide range of precipitation rates. Several approximations were introduced to allow computation of rms error with the available data. The constants in the functional form were separately set for each estimation technique based on the datasets believed to be representative of rms error for that technique. The resulting rms error fields perform tolerably well in comparison to observed rms errors. The quality index was introduced to make it easier to interpret the error pattern. Inspection of the various error representations makes clear the need for more data in the monthly estimates.
A great deal of work remains to be done to adequately characterize error. The greatest need is to set the constants on a regional basis, instead of having a single global value. A somewhat related issue is that the model for the number of independent samples should be refined. As well, it would clearly be advantageous to have calibration–validation datasets representative of all the various climate regimes around the globe. Finally, error estimation would be facilitated if groups producing precipitation estimates computed estimates of higher-order statistics. In particular, estimates of the variance of instantaneous precipitation values and the fractional coverage by precipitation would allow the use of the exact expression in (11) rather than the parameterized expression in (13).
Acknowledgments
This research was conducted as part of the prelaunch algorithm development work for the Tropical Rainfall Measuring Mission funded by NASA under Dr. R. Kakar. Scientific discussions with Drs. R. F. Adler and T. L. Bell helped the author shape his understanding of the error estimation problem. Comments by the anonymous reviewers and Dr. I. Polyak improved the final paper.
REFERENCES
Adler, R. F., G. J. Huffman, and P. R. Keehn, 1994: Global rain estimates from microwave-adjusted geosynchronous IR data. Remote Sens. Rev., 11, 125–152.
Bell, T. L., A. Abdullah, R. L. Martin, and G. R. North, 1990: Sampling errors for satellite-derived tropical rainfall: Monte Carlo study using a space-time stochastic model. J. Geophys. Res., 95, 2195–2205.
——, P. K. Kundu, and C. Kummerow, 1996: Sampling error of satellite estimates of gridded rainfall. Preprints, 13th Conf. on Probability and Statistics in the Atmospheric Sciences, San Francisco, CA, Amer. Meteor. Soc., 296–300.
Daly, C., R. P. Neilson, and D. L. Phillips, 1994: A statistical-topographic model for mapping climatological precipitation over mountainous terrain. J. Appl. Meteor., 33, 140–158.
Huffman, G. J., R. F. Adler, B. Rudolf, U. Schneider, and P. R. Keehn, 1995: Global precipitation estimates based on a technique for combining satellite-based estimates, rain gauge analysis, and NWP model precipitation information. J. Climate, 8, 1284–1295.
——, ——, P. Arkin, A. Chang, R. Ferraro, A. Gruber, J. Janowiak, A. McNab, B. Rudolf, and U. Schneider, 1997: The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset. Bull. Amer. Meteor. Soc., 78, 5–20.
Kedem, B., L. S. Chiu, and G. R. North, 1990: Estimation of mean rain rate: Application to satellite observations. J. Geophys. Res., 95, 1965–1972.
Morrissey, M. L., M. A. Schafer, S. E. Postawko, and B. Gibson, 1995: The Pacific rain gage rainfall database. Water Resour. Res., 31, 2111–2113.
North, G. R., and S. Nakamoto, 1989: Formalism for comparing rain estimation designs. J. Atmos. Oceanic Technol., 6, 985–992.
——, S. S. P. Shen, and R. B. Upson, 1991: Combining rain gages with satellite measurements for optimal estimates of area-time averaged rain rates. Water Resour. Res., 27, 2785–2790.
Simpson, J., R. F. Adler, and G. R. North, 1988: A proposed Tropical Rainfall Measuring Mission satellite. Bull. Amer. Meteor. Soc., 69, 278–295.
Schematics of analytic examples of the probability distribution of precipitation: (a) spike, (b) boxcar, and (c) triangle. All distributions have a delta function of value (1 − p) at r = 0. Both dimensional and nondimensional coordinates are shown.
Citation: Journal of Applied Meteorology 36, 9; 10.1175/1520-0450(1997)036<1191:EORMSR>2.0.CO;2
Scattergram of H as a function of r̄ using GSCAT2 precipitation estimates for August 1987.
Estimates of H using GSCAT2 precipitation estimates for August 1987. Regions with zero and no precipitation estimates for the month are denoted by gray and black, respectively.
Scattergram of rc as a function of r̄ for the GSCAT2 precipitation estimates for August 1987.
Comparison of observed and estimated rms error σr for the calibration datasets used to set constants for the four precipitation estimation techniques: (a) SSM/I scattering technique, with observations in solid and estimates in dotted; and SSM/I emission technique (combined with errors in the atoll gauge analysis), with observations in dash–dotted and estimates in dashed; and (b) gauge analysis, with observations in solid and estimates in dotted; and AGPI technique, with observations in dash–dotted, and estimates in dashed. The observed rms errors are corrected for bias as in (15).
Relative rms error σr/r̄ as a function of r̄ for three precipitation estimation techniques with various sample sizes: (a) GPCC rain gauge analysis for 0.5, 2, 8, and 32 gauges in a 2.5° × 2.5° grid box, depicted with solid, dotted, dash–dotted, and dashed lines, respectively; and (b) SSM/I emission estimates for 550 and 700 55 km × 55 km equivalent images in a month, depicted with solid and dotted lines, respectively; and AGPI estimates for 200 and 240 2.5° × 2.5° images in a month, depicted with dash–dotted and dashed lines, respectively. The GATE and SSM/I estimates from Bell et al. (1996) are plotted in (b) for comparison as a single star and a long-dashed line, respectively. The SSM/I scattering estimates are so close to the SSM/I emission estimates that they are not plotted.
Relative rms error σr/r̄ as a function of quality index Q for three precipitation estimation techniques: (a) GPCC rain gauge analysis for 0.33, 1.67, 5, and 10 mm day−1 in a 2.5° × 2.5° grid box, depicted with solid, dotted, dash–dotted, and dashed lines, respectively; and (b) SSM/I emission estimates for 550 and 700 55 km × 55 km equivalent images in a month, depicted with solid and dotted lines, respectively; and AGPI estimates for 200 and 240 2.5° × 2.5° images in a month, depicted with dash–dotted and dashed lines, respectively. The SSM/I scattering estimates are so close to the SSM/I emission estimates that they are not plotted.
GPCP combined satellite–gauge (SG) precipitation estimate for August 1987 in millimeters per day (top) with three representations of the estimated error field: (upper middle) quality index Q in equivalent gauges, (lower middle) rms error σr in millimeters per day, and (bottom) relative rms error σr/r̄ in percent. Regions with no estimate are denoted by black.
Summary of analytic probability distribution results. The constant R is a location on the r axis (Fig. 1).
Datasets, original and modified dimensional units of N (see text), and computed values of the constants H/I and S used in the GPCP Version 1 Dataset.