## Abstract

Characterization of the error associated with satellite rainfall estimates is a necessary component of deterministic and probabilistic frameworks involving spaceborne passive and active microwave measurements for applications ranging from water budget studies to forecasting natural hazards related to extreme rainfall events. The authors focus here on the error structure of NASA’s Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR) quantitative precipitation estimation (QPE) at ground. The problem is addressed by comparing PR QPEs with reference values derived from ground-based measurements using the NOAA/NSSL ground radar–based National Mosaic and QPE system (NMQ/Q2). A preliminary investigation of this subject has been carried out at the PR estimation scale (instantaneous and 5 km) using a 3-month data sample in the southern part of the United States. The primary contribution of this study is the presentation of the detailed steps required to derive a trustworthy reference rainfall dataset from Q2 at the PR pixel resolution. The derivation relies on a bias correction and a radar quality index, both of which provide a basis to filter out the less trustworthy Q2 values. Several aspects of PR errors are revealed and quantified, including sensitivity to the processing steps applied to the reference rainfall, comparisons of rainfall detectability and rainfall-rate distributions, spatial representativeness of error, and separation of systematic biases and random errors. The methodology and framework developed herein apply more generally to rainfall-rate estimates from other sensors on board low-earth-orbiting satellites, such as microwave imagers and dual-wavelength radars like those of the Global Precipitation Measurement (GPM) mission.

## 1. Introduction

Reliable quantitative information on the spatial distribution of rainfall is essential for hydrologic and climatic applications, which range from real-time flood forecasting to evaluation of regional and global atmospheric model simulations. Given their quasi-global coverage, satellite-based quantitative rainfall estimates are becoming widely used for such purposes. Converting satellite measurements into quantitative precipitation estimates poses challenges. The link between the observations and surface rain rates depends on the calibration and operating protocol of the instrument itself, the spatial heterogeneity of the rain fields (e.g., coexistence of convective and stratiform precipitation within a single instrumental field of view and vertical heterogeneity of rainfall), the indirect nature of the measurement, and the retrieval algorithm used. As underlined by the Program to Evaluate High Resolution Precipitation Products (Turk et al. 2008) led by the International Precipitation Working Group (IPWG; see http://www.isac.cnr.it/~ipwg/), characterizing the error structure of satellite rainfall products is recognized as a major issue for the usefulness of the estimates (Yang et al. 2006; Zeweldi and Gebremichael 2009; Sapiano and Arkin 2009; Wolff and Fisher 2009). The error characterization is needed for data assimilation and climate analysis (Stephens and Kummerow 2007) and more specifically over land in hydrological modeling of natural hazards and budgeting water resources (Grimes and Diop 2003; Lebel et al. 2009).

In this study, we focus primarily on the Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR) quantitative precipitation estimation (QPE) at ground. The methodology presented herein would equally apply to all satellite precipitation products, in particular those from sensors on board low-earth-orbiting satellites. The TRMM PR is currently the only active instrument measuring rainfall from a satellite platform jointly with a radiometer [TRMM Microwave Imager (TMI)]. PR rainfall estimates are often considered as a reference for TMI-based rainfall estimates (e.g., Yang et al. 2006; Wolff and Fisher 2008). The PR therefore also influences rain estimates from polar-orbiting passive microwave measurements and a number of satellite-based high-resolution precipitation products (Ebert 2007; Bergès et al. 2010; Ushio et al. 2006). Given the variety of potential sources of error in PR-based QPE and the impact of correction algorithms, the only practical solution is to evaluate PR QPE with respect to an external, independent reference rainfall dataset. The reference is derived from high-resolution ground validation measurements using the NOAA/National Severe Storms Laboratory (NSSL) ground radar–based National Mosaic and QPE system (NMQ; Zhang et al. 2011). This system yields instantaneous rainfall-rate products over vast regions, including the parts of the conterminous United States (CONUS) covered by Next Generation Weather Radar (NEXRAD) data. While a number of studies have investigated the quality of PR estimates in various regions of the world (e.g., Adeyewa and Nakamura 2003; Wolff and Fisher 2008, 2009; Amitai et al. 2009, 2012), our aim is to perform a systematic and comprehensive evaluation over the southern CONUS. We will characterize errors in PR estimates at the pixel measurement scale in order to minimize additional uncertainties caused by resampling. Systematic and stochastic errors of PR estimates will be documented in terms of bias and spatial structure.

One should note that it is not possible to “validate” the PR estimates in a strict sense because independent rainfall estimates with no uncertainty do not exist. Many errors affect the estimation of rainfall from ground-based radars, like nonuniform beam filling, conversion of reflectivity to rain intensity, and calibration. While we do not know the truth at ground, the available independent measurements do provide a useful reference to help identify possible biases and the general levels of uncertainty associated with PR estimates. The reference rainfall accuracy issue will be investigated by systematically comparing PR estimates with different references at ground. Three levels of processing to remove biases characterize these references.

Rainfall estimates from low-earth-orbiting satellites suffer from poor temporal sampling (Wolff and Fisher 2008, 2009; Lin and Hou 2008). Hence, representative samples of direct comparisons between instantaneous coincident measurements from ground and space are difficult to achieve without a sufficient number of overpasses. This study uses three months (March–May 2011) of satellite overpasses over the lower CONUS. The data are pixel matched in both time and space, and statistics are provided for comparing reference rain intensities to satellite-based estimates. The quasi-instantaneous matching is performed at the PR measurement scale (4.5 × 4.5 km^{2}).

The PR data and steps required to refine the Q2 ground-based rainfall to arrive at the reference rainfall used for comparisons are presented in section 2. Section 3 assesses the ability of PR rain retrievals to represent the rainfall variability derived from the reference data in terms of rainfall detectability, sensitivity, and spatial structure. Section 4 provides an empirical error model of the PR estimates versus reference rainfall and segregates systematic and random error. The paper is closed with concluding remarks in section 5.

## 2. Data sources

One of the first challenges encountered is the lack of knowledge about the true averaged rainfall for the spatial domains considered. One wants to compare instantaneous satellite rainfall estimates *R*(*A*) with reference rainfall *R*_{ref}(*A*) for a spatial domain *A* (which may be a satellite mesh, watershed, etc.) to characterize the accuracy of the satellite QPEs. The true (and unknown) area-averaged rainfall accumulation, denoted *R*_{true}(*A*), is written as

$$R_{\mathrm{true}}(A) = \frac{1}{|A|}\int_{A} R(\mathbf{x})\,d\mathbf{x},$$

where **x** is the location vector and |*A*| is the area of *A*. The reference rainfall *R*_{ref}(*A*) is a proxy of *R*_{true}(*A*). The final products of the satellite data processing are gridded rainfall fields. Satellite QPEs may then be written as

$$R(A) = \frac{1}{N}\sum_{i=1}^{N} R(a_i),$$

where *a*_{*i*} denotes a satellite pixel and *N* is the number of pixels covering the domain of interest. The reference data *R*_{ref}(*A*) used to evaluate the satellite estimates should spatially match the corresponding true rainfall averaged over the same area *A*.

### a. Original ground-based products

The NOAA/NSSL NMQ/Q2 (http://nmq.ou.edu; Zhang et al. 2011) is a set of experimental radar products comprising high-resolution (0.01°, 5 min) instantaneous rainfall-rate mosaics available over CONUS. The NMQ system combines information from all ground-based radars comprising the Weather Surveillance Radar-1988 Doppler (WSR-88D) network (NEXRAD), mosaics reflectivity data onto a common 3D grid, and estimates surface rainfall accumulations and types to arrive at accurate ground-based estimates of rainfall (Zhang et al. 2005; Lakshmanan et al. 2007; Vasiloff et al. 2007; Kitzmiller et al. 2011). Figure 1 shows an example of CONUS coverage of Q2 rainfall at 0725 UTC on 11 April 2011 highlighting several rainy systems associated with orography in the west and a wide frontal system in the central part of the domain.

At an hourly time step, Q2 adjusts radar estimates with automated rain gauge networks using a spatially variable multiplicative bias factor. A radar quality index (RQI) is produced at the (0.01°, 5 min) resolution. While the true quality of the Q2 QPEs varies in space and time because of a number of complicating factors [e.g., measurement errors, nonprecipitation echoes, uncertainties in *Z*–*R* relationships, and variability in the vertical profile of reflectivity (VPR)], the RQI represents the radar QPE uncertainty associated with reflectivity changes with height and near the melting layer (Zhang et al. 2011). It applies to the radar beam used for QPE, that is, the hybrid scan reflectivity comprising the elevation angles closest to the surface. The RQI field is composed of a static part related to the radar beam sampling characteristics (percent blockage, beam height, and beam width) and a dynamic part accounting for the freezing level height. The static part is illustrated in Fig. 1, where the reduced radar coverage in the western part of the United States results in lower RQI values. The dynamic part causes the RQI values to decrease in cool season months, when the freezing level is lower and the radar samples the melting layer and the ice phase at closer range, and to increase in the warm season, when the freezing level is at higher altitudes. This effect is also visible in Fig. 1, where the freezing level is lower behind a cold frontal system, which further degrades the already limited coverage in the western part of CONUS.

The original Q2 products utilized in this study are (i) the radar-only instantaneous rain-rate National Mosaic updated every 5 min, (ii) the radar-only rain-rate National Mosaic at hourly time step, (iii) the hourly rain gauge–corrected National Mosaic product, and (iv) the RQI. The primary Q2 product used for comparison with PR is the radar-only instantaneous rain-rate mosaic. Current Q2 radar products do not include an instantaneous gauge-adjusted rain-rate mosaic. For this study, following Amitai et al. (2009, 2012), a second reference rainfall was derived from an instantaneous bias-corrected Q2 product. Pixel-by-pixel ratios between the hourly gauge-adjusted and the hourly radar-only products are calculated. These hourly ratios are then applied as multiplicative adjustment factors to the radar-only 5-min product. Extreme adjustment factors [outside the (0.1–10) range] are discarded and no comparison is performed with PR for the corresponding Q2 values. Thus, the gauge adjustment also serves as a data quality control procedure. A third reference is derived from the bias-corrected Q2 product filtered using the RQI. Only the rain rates associated with the best RQI values (i.e., equal to 1) were retained. This selection ensures that only Q2 estimates representing the best measurement conditions (i.e., no beam blockage and radar beam below the melting layer) are retained.
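The three reference levels described above can be sketched per pixel as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: the function name `bias_corrected_rate` and its arguments are hypothetical, and the handling of a zero hourly radar-only value (the pixel is simply discarded) is an assumption, since the text does not specify it.

```python
def bias_corrected_rate(rate_5min, hourly_gauge_adjusted, hourly_radar_only,
                        rqi=None, require_best_rqi=False,
                        factor_range=(0.1, 10.0)):
    """Return the gauge-adjusted 5-min rain rate for one Q2 pixel.

    Returns None when the pixel is censored (no comparison with PR).
    """
    # Assumption: the hourly ratio is undefined without hourly radar rain,
    # so such pixels are discarded here.
    if hourly_radar_only <= 0.0:
        return None

    # Pixel-by-pixel hourly ratio used as a multiplicative adjustment factor.
    factor = hourly_gauge_adjusted / hourly_radar_only

    # Extreme factors outside the (0.1-10) range are discarded.
    low, high = factor_range
    if not (low <= factor <= high):
        return None

    # Third reference level: keep only the best RQI values (equal to 1).
    if require_best_rqi and rqi != 1.0:
        return None

    return rate_5min * factor
```

For example, a 5-min rate of 4 mm h^{−1} with hourly values of 6 (gauge adjusted) and 3 (radar only) gives an adjustment factor of 2 and a corrected rate of 8 mm h^{−1}; the same pixel is dropped from the third reference if its RQI is below 1.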

Because it is radar based only, the first reference may have issues such as nonuniform beam filling because of VPR effects, inaccurate conversion from reflectivity to rain intensity, and calibration errors. The blending with rain gauge data in the second reference should significantly mitigate these biases and provide a more accurate reference. The filtering through the radar quality index for the last level of processing eliminates a large part of the impact of the VPR. One should note that these incremental improvements of the Q2 products may not screen out all possible errors in ground-based radar estimates. In particular, the gauge adjustment may suffer from representativeness errors caused by sparse rain gauge networks and from the mismatch in temporal resolution when hourly adjustment factors are applied to 5-min Q2 rain rates. Nevertheless, these references provide the best possible reference at the scale of PR in terms of sampling conditions and unbiased estimates.

### b. Q2-based reference rainfall

In the current study, all significant rain fields observed coincidentally by TRMM overpasses and the NEXRAD radar network from March to May 2011 are collected. The Q2 products closest in time to the TRMM satellite local overpass schedule time are used. To compute the reference rainfall, a block-Q2 rainfall pixel is computed to match each PR pixel in case of TRMM overpasses in a similar manner to Kirstetter et al. (2010, 2012).

Although the quantitative interpretation of the weather radar signal in terms of rainfall may be complex, radars enable a reliable evaluation of area-averaged rainfall estimates. The spatial variability of rainfall at small scales and the resolution difference between radar and PR (as much as 2 orders of magnitude in area) may cause significant discrepancies in the statistical sampling properties and add statistical noise to the comparison (see, e.g., Ciach and Krajewski 1999 for a similar issue when comparing point-measurement rain gauge data to area-rainfall radar data). An approximate 2.5-km radius around the center of the PR pixel location was considered. All of the Q2 pixels (rainy and nonrainy) found within this circular region were used to compute unconditional mean rain rates for the Q2 at the PR pixel scale. The number of Q2 pixels associated with each PR pixel varies from case to case but averages about 25 (with native Q2 resolution being 1 km^{2}). When more than five Q2 pixels have missing values, the PR and Q2 data are discarded from the comparison. To estimate PR pixel–averaged ground rainfall accumulation (and the associated sampling errors), a weighted mean estimator is considered to determine the reference rainfall *R*_{ref}(*A*) over the PR pixel *A* from Q2 products. As the representativeness of the rainfall sampled by PR is related to the characteristics of the radar beam, the weighting function is given by the PR beam pattern inside a PR pixel. The reference rainfall is therefore

$$R_{\mathrm{ref}}(A) = \frac{\sum_{i=1}^{n} \omega_i\,\mathrm{Q2}(a_i)}{\sum_{i=1}^{n} \omega_i},$$

where notations have been simplified for the sake of convenience. Here Q2 denotes the Q2 rain-rate product for the mesh *a*_{*i*}. The value *R*_{ref}(*A*) depends on the number *n* of Q2 meshes inside the PR pixel; the weights *ω* are derived from the two-way normalized power-gain function of the PR antenna *f* (assumed to be Gaussian) and the beamwidth *θ*_{0}, and each *ω*_{*i*} is computed over the domain *θ*_{mesh} corresponding to the Q2 mesh *a*_{*i*}. It is assumed the PR resolution remains constant (circle of approximately 5 km) whatever the radar beam off-nadir inclination angle. Additional research may be needed to take into account the deformation of the resolution with off-nadir angle (Takahashi et al. 2006).

Two weighted standard errors are computed with the reference rainfall. The first one, *σ*_{footprint}, is the weighted sample standard deviation, which represents the variability of the Q2 rainfall (at native resolution) inside the PR pixel:

It is used to select the PR–reference pairs for which the *R*_{ref}(*A*) is trustworthy. The second one, *σ*_{ref}, is the standard deviation relative to the weighted mean *R*_{ref}(*A*):

It allows us to assess the *R*_{ref}(*A*) estimation quality.
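A minimal sketch of the weighted mean estimator and its two standard errors is given below. Several choices are assumptions rather than the authors' exact formulation: the weights are passed in directly instead of being integrated from the Gaussian beam pattern, missing Q2 values (at most five) are dropped and the remaining weights renormalized, the weighted standard deviation uses the simple weight-normalized form, and the standard error of the weighted mean is approximated with an effective sample size. The function name `reference_rainfall` is hypothetical.

```python
import math


def reference_rainfall(q2_rates, weights, max_missing=5):
    """Weighted-mean reference rain rate over one PR pixel.

    q2_rates: Q2 rain rates inside the PR footprint (None marks a missing value).
    weights:  beam-pattern weights, one per Q2 mesh.
    Returns (r_ref, sigma_footprint, sigma_ref), or None if the pixel is discarded.
    """
    # Discard the pixel when more than five Q2 values are missing.
    if sum(1 for r in q2_rates if r is None) > max_missing:
        return None

    # Assumption: remaining missing values are dropped and weights renormalized.
    pairs = [(r, w) for r, w in zip(q2_rates, weights) if r is not None]
    w_sum = sum(w for _, w in pairs)
    if w_sum <= 0.0:
        return None

    # Unconditional weighted mean (rainy and nonrainy meshes included).
    r_ref = sum(w * r for r, w in pairs) / w_sum

    # Assumed form of the weighted sample standard deviation in the footprint.
    variance = sum(w * (r - r_ref) ** 2 for r, w in pairs) / w_sum
    sigma_footprint = math.sqrt(variance)

    # Assumed form of the standard error of the weighted mean,
    # based on the effective number of independent meshes.
    n_eff = w_sum ** 2 / sum(w * w for _, w in pairs)
    sigma_ref = sigma_footprint / math.sqrt(n_eff)

    return r_ref, sigma_footprint, sigma_ref
```

With equal weights the estimator reduces to the arithmetic mean, so four meshes at 0, 2, 4, and 2 mm h^{−1} give a reference of 2 mm h^{−1}.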

Matched PR and *R*_{ref}(*A*) estimates only exist at locations where both the PR and ground radars have taken actual observations. This technique averages the minimum number of Q2 meshes needed to produce spatially coincident sample *R*_{ref}(*A*) estimates. The advantages of the current technique over gridded approaches are that there is no interpolation, extrapolation, smoothing, or oversampling of PR data. The PR rainfall statistical characteristics are preserved because the product remains untouched: the total rainfall amount, the total rainy area, and the probability distribution function (PDF) shapes. All of these properties may therefore be compared to the reference at once.

Figure 2 shows an example of continuous mapping of the weighted mean estimator for the reference rainfall *R*_{ref}(*A*). The estimator is a smoother of the original Q2 rain field. The maximum of the rainfall rate decreases from 145 to 130 mm h^{−1}. The total rainfall area increases, mainly at the edges of the rain field. To avoid a contamination of the PR–reference comparison by the uncertainty on the ground reference, the reference pixels were segregated into “robust” [*R*_{ref}(*A*) > *σ*_{footprint}] and “nonrobust” [*R*_{ref}(*A*) < *σ*_{footprint}] estimators. This procedure illustrated in Fig. 2 filters out the reference values at the edges of the rain fields. Nonrobust reference values are discarded for quantitative comparison. The robustness check is applied to the three Q2 products considered for reference (native Q2, bias-corrected Q2, and RQI+bias-corrected Q2). As an example for the “RQI+bias-corrected Q2” the averaged relative error (*σ*_{ref}/*R*_{ref}) of the reference decreases from 832% to 16%. The ratio of the mean error to the standard deviation of the reference [*σ*_{ref}/σ(*R*_{ref})] decreases slightly from 5.6% to 5.4%. This method of reference selection therefore increases the reliability and representativeness of the block-Q2 values that constitute our ground reference.
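The robustness check can be written as a one-line predicate. The sketch below is illustrative only: the name `is_robust` is hypothetical, the treatment of null reference values as robust follows the statement made in section 3a, and classifying the equality case as nonrobust is an assumption because the text gives only the strict inequalities.

```python
def is_robust(r_ref, sigma_footprint):
    """Classify a reference value as robust (True) or nonrobust (False)."""
    # Null reference values are considered robust (see section 3a).
    if r_ref == 0.0:
        return True
    # Robust when the weighted mean exceeds the within-footprint variability.
    # Assumption: equality is treated as nonrobust.
    return r_ref > sigma_footprint
```

A pixel at the edge of a rain field, where a few rainy meshes sit among many dry ones, typically has a small mean and a comparatively large spread, so it falls in the nonrobust class.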

### c. PR-based rainfall

The PR measures reflectivity profiles at Ku band. Surface rain rates are estimated over the southern United States up to a latitude of 37°N (Fig. 1). Artifacts such as contamination by surface backscatter, attenuation and extinction of the signal, nonuniform beam filling, brightband effects and accuracy of the *Z*–*R* relationship (Wolff and Fisher 2008) must be accounted for. In the present study, the surface rain rate at each PR pixel location is a standard TRMM product (2A25 v6) described in Iguchi et al. (2000). The scan geometry and sampling rate of the PR lead to pixels spaced approximately 5.1 km cross and along track over a 245-km-wide swath. The minimum theoretical detectable rain rate by the PR is fixed by its sensitivity and is about 17 dB*Z*, or ~0.5 mm h^{−1}.

### d. Comparison samples

Several factors—including rainfall intermittency, discrete temporal sampling of TRMM, and censoring of reference values for required quality—reduce the number of comparison samples for reference and PR estimates over the comparison period. Table 1 provides the number of these samples for the reference values, inclusive and exclusive of nonrainy pixels. The comparison sample sizes in Table 1 are primarily driven by the number of rain events and the overpass frequency of TRMM, then by the censoring of reference values. The quality control in the bias adjustment discarded 26% of original Q2 values and an additional 34% were filtered using RQI. Note that after two levels of processing and censoring, the comparison sample size for the RQI+bias-corrected Q2 remains significant at 393 347. This is credited to the large number of samples offered by the high-resolution, gridded Q2 product.

To assess the representativeness of our spatially and temporally limited samples, we compared the statistics of the reference rainfall resampled to the PR pixel resolution with those of the whole reference dataset (CONUS-wide below 38°N, which does not necessarily match a TRMM overpass). Figure 3 shows quantile–quantile plots between (i) the whole reference dataset (*x* axis) and (ii) the subset of pixels matched to the PR pixel resolution for the different reference datasets. Table 2 provides values of the conditional mean and standard deviation. The PR-resampled reference rainfall distribution does not show a clear deviation from the 1:1 line compared to the whole distribution. The reference distributions are fairly stable given the different censoring levels, with the mean of the PR-resampled distributions being within 7% of the one for the whole dataset. We may therefore consider each reference dataset to be quite representative of the corresponding whole rainfall distribution.

Similarly, we compared the different PR datasets to assess the impact of Q2 censoring on their representativeness. Figure 4 shows quantile–quantile plots between (i) the complete (“native”) PR dataset (*x* axis) and (ii) the censored subsets according to the bias-corrected and bias+RQI-corrected Q2 samples. Table 3 provides values of the conditional mean and standard deviation for each set. The different PR rainfall distributions do not show a clear deviation from the 1:1 line compared to the native PR rainfall distribution. The means and standard deviations of the bias-corrected-censored and bias+RQI-corrected-censored distributions are less than 1% and around 10% higher, respectively. We may therefore consider the representativeness of each PR dataset, following censoring steps, to be quite comparable to each other.

## 3. Rainfall data analysis

This section reviews the ability of PR rain retrievals to represent the rainfall variability derived from the Q2 data. First, contingency tables provide information on the reference rainfall reliability and on the influence of PR sensitivity on the detection of rainfall occurrence. The PDFs of rainfall estimates then provide in-depth information on the sensor’s overall ability to capture rain regimes given the influence of its sensitivity and of several other factors (attenuation of the radar signal, nonuniform beam filling, and accuracy of the *Z*–*R* relationship). Finally, the spatial structure of the rainfall fields is compared.

### a. Contingency tables

Table 4 shows the contingency tables for PR rain/no-rain occurrence relative to the references, with percentages of hits (*H*; both Q2 and PR detect rain), misses (*M*; PR does not detect rain while Q2 does), false alarms (*F*; PR detects rain while Q2 does not), and correct rejections (*C*; both Q2 and PR do not detect rain). The reference data are separated into three subsamples: the nonrobust set (*R*_{ref} < *σ*_{footprint}; see section 2b), the robust set (*R*_{ref} > *σ*_{footprint}), and the “whole” Q2 set. Reference null values are considered as robust. All coincident and collocated PR values are considered and sorted according to the reference samples. Table 5 provides the mean rainfall values according to the same contingency tables, with PR on the left-hand side of the “/” sign and the reference on the right-hand side.

The false detections (*M* + *F*) of PR are mainly associated with the nonrobust reference data, with a rate of more than 80% for all nonrobust sets, while around 50% are improperly classified when using the robust reference dataset. The misses (*M*) are the main contributors to the false detection population (i.e., approximately 85% for the whole dataset). These misses of PR are coincident with low reference values (less than 0.15 mm h^{−1} for the nonrobust set for all references; see Table 5). By comparison, the correct detections (*H* + *C*) of PR are mainly associated with the robust reference set from 45% to 52%. For the same robust reference sets, the hits of PR are coincident with the higher reference values with mean rainfall rates more than 6 mm h^{−1}. One should note for all references that (i) the mean PR (*F*) values are significantly lower than the PR (*H*) values and (ii) the mean reference (*M*) values are significantly lower than the mean reference (*H*) values. Finally, both mean reference and PR values are higher for the robust Q2 set than for the nonrobust Q2 set. Table 6 shows the discarded rain volumes in question; the misses of PR represent less than 12% of the reference rainfall volume, while false alarms represent less than 16% of the PR rainfall volume. Note the lowest values (less than 8%) are obtained with the bias+RQI-corrected Q2 reference.

The impact of the reference rainfall on the contingency scores is shown in Fig. 5. Contingency values are used to compute probability of detection (POD), false alarm rates (FAR), and critical success index (CSI). Scores are generally better for the robust reference than for the nonrobust one. Within this category, CSI shows a general increase with sequential Q2 data quality steps, while the FAR shows the lowest values with additional processing of the Q2 reference. A general convergence between the Q2 reference and PR estimates is therefore acknowledged as a function of the reference accuracy.
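The contingency counts and the three scores can be sketched as follows. The formulas for POD, FAR, and CSI are the standard categorical verification definitions; the text does not spell them out, so their use here is an assumption, and the function names and the optional `threshold` argument are hypothetical.

```python
def contingency_counts(pr_rates, ref_rates, threshold=0.0):
    """Count hits, misses, false alarms, and correct rejections."""
    hits = misses = false_alarms = correct_rejections = 0
    for pr, ref in zip(pr_rates, ref_rates):
        pr_rain = pr > threshold
        ref_rain = ref > threshold
        if pr_rain and ref_rain:
            hits += 1                 # H: both detect rain
        elif ref_rain:
            misses += 1               # M: reference rain missed by PR
        elif pr_rain:
            false_alarms += 1         # F: PR rain without reference rain
        else:
            correct_rejections += 1   # C: neither detects rain
    return hits, misses, false_alarms, correct_rejections


def detection_scores(hits, misses, false_alarms):
    """Return (POD, FAR, CSI); a score is None when its denominator is zero."""
    pod = hits / (hits + misses) if (hits + misses) else None
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else None
    total = hits + misses + false_alarms
    csi = hits / total if total else None
    return pod, far, csi
```

Raising `threshold` above zero gives the kind of threshold-dependent POD discussed for the PR sensitivity, since weak reference rain rates then no longer count as events to be detected.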

Considering that 80% of the reference rain rates (whole dataset) that are not detected by the PR are lower than 0.3 mm h^{−1}, the sensitivity of PR appears to be close to this value. The misses are likely associated with high intermittency and/or the “rain/no-rain” limits of rain fields. These features are missed by the PR because the rain rates are close to the detection threshold. Further, we calculated PR’s POD for different rainfall-rate thresholds (not shown) using the robust, bias+RQI-corrected Q2 reference. The POD increased from 56% at the rain/no-rain detection level to 71% using a threshold of 0.5 mm h^{−1} and leveled off at 76% for 1.0 mm h^{−1}. This suggests that the PR can indeed capture the main rain regions but loses the weaker echoes (Schumacher and Houze 2000), probably because of its sensitivity. The false alarms may be due to shallow rain not detected by ground-based radars when occurring at significant distance (greater than 100 km). This is supported by the positive impact of the RQI on the false alarm rate; as seen in Fig. 1, the RQI limits the range of ground radar data selection.

### b. Probability distributions by occurrence and rain volume

Hereafter, the PR rain estimates are the conditional ones (nonzero rainfall) coincident and collocated with nonzero reference estimates. The robust reference rain-rate datasets are used. Two PDFs for PR versus reference rainfall are computed and shown in Fig. 6: (i) the PDF by occurrence (PDF_{c}) and (ii) the PDF by rain volume (PDF_{v}) (Wolff and Fisher 2009; Amitai et al. 2009, 2012). The PDF_{c} provides statistical information on the rain-rate distribution and highlights the estimate’s sensitivity as a function of rain rate; it is computed as a ratio between the number of the rain rates inside each bin and the total number of rain rates. The PDF_{v} represents the relative contribution of each rain-rate bin to the total rainfall volume; it is computed as a ratio between the sum of the rain rates inside each bin and the total sum of rain rates. It is therefore an important characteristic of the instantaneous products from the perspective of building merged rainfall accumulations; it enables a comparison of PDFs based on estimates derived from instruments characterized by different detection limits (in particular at weak intensities).
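The two distributions follow directly from their definitions, as in the sketch below. The function name and the bin edges are hypothetical, bins are taken as half-open intervals, and normalizing by the rain rates that fall inside the binned range (rather than by all rain rates) is an assumption.

```python
def pdf_by_occurrence_and_volume(rates, bin_edges):
    """Return (pdf_c, pdf_v) for conditional rain rates.

    pdf_c[k]: fraction of rain rates falling in bin k.
    pdf_v[k]: fraction of the total rain volume contributed by bin k.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    volumes = [0.0] * n_bins
    for r in rates:
        for k in range(n_bins):
            # Half-open bins [edge_k, edge_k+1).
            if bin_edges[k] <= r < bin_edges[k + 1]:
                counts[k] += 1
                volumes[k] += r
                break
    total_count = sum(counts)
    total_volume = sum(volumes)
    if total_count == 0 or total_volume == 0.0:
        raise ValueError("no rain rate falls inside the bins")
    pdf_c = [c / total_count for c in counts]
    pdf_v = [v / total_volume for v in volumes]
    return pdf_c, pdf_v
```

For instance, three pixels at 0.5 mm h^{−1} and one at 10 mm h^{−1} give 75% of the occurrences but only about 13% of the volume in the light-rain bin, which is why rare intense rain rates can dominate PDF_{v}.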

The rain rates of PR exhibit similar PDF_{c} for all references. Compared to the references’ PDF_{c}, PR tends to overestimate light rain rates (in the interval [0.3–0.5] mm h^{−1}). However, PR demonstrates poor detection of the lightest rain rates (below ~0.3 mm h^{−1}) compared to the two bias-corrected references. This is consistent with the concept of rain area “edges” that might be only partially detected by PR, resulting in misses associated with low rain rates (see previous section). The PR PDF_{c} presents features similar to those of the references for rain rates >~1 mm h^{−1}. One may note the improved convergence between PR and reference rainfall PDF_{c} in the rain-rate interval [0.5–1.0] mm h^{−1} with the sequential Q2 data quality steps.

Despite the low occurrence of relatively high rain rates (>10 mm h^{−1}), their contribution to the total rainfall volume is significant (greater than 60%). The mode of PDF_{v} for PR is shifted toward lower rain rates (~18 mm h^{−1}) compared to the reference’s mode (~60 mm h^{−1}), which is in agreement with the results found in Amitai et al. (2006, 2009). This is attributed to high rainfall rates (>10 mm h^{−1}), which are underestimated by PR because of insufficient correction for attenuation losses in the 2A25 version 6 [as suggested by Wolff and Fisher (2008)], nonuniform beam filling effects, and/or inaccurate conversion from reflectivity to rain intensity. Note that it is difficult to distinguish between these different influences by comparing solely rain rates at ground.

### c. Spatial structure of estimated rainfall fields

For hydrological applications, the total amount of water over a basin as well as the location and spatial correlation within the catchment might be important. It is therefore relevant to assess the ability of space-based estimates to retrieve the spatial structure of rainfall fields as seen by the reference. To describe the structure by a relatively simple function, we use a normalized variogram, which represents the spatial correlation of the rain field (Journel and Huijbregts 1978; Lebel et al. 1987; Kirstetter et al. 2010, 2012). An appropriate model is fit to the empirical normalized variogram. Among the set of classical models, the exponential model was found most suitable. It is expressed as

where the three parameters are the nugget (*C*_{0}), the sill (*C*), and the variogram range parameter (*d*). The exponential model reaches its sill asymptotically as *h* → ∞. The “effective range” corresponds to the mean decorrelation distance of the estimates. It is the distance where the variogram reaches 95% of its maximum and corresponds to 3*d* for the exponential model. The nugget parameter can be used to describe a possible discontinuity of the variogram at the origin that may be due to (i) the process variability at scales poorly resolved by the observation system and/or (ii) measurement errors. In the following, these parameters are used to characterize the structure of rainfall.
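As an illustration, one common parameterization of the exponential model can be coded as below. It is an assumption that the sill *C* denotes the total sill (nugget included), so that the model rises from *C*_{0} at the origin toward *C* at large distances; the paper's exact convention may differ. The function names are hypothetical.

```python
import math


def exponential_variogram(h, nugget, sill, d):
    """Exponential variogram value at separation distance h.

    Assumption: `sill` is the total sill approached as h grows,
    and `nugget` is the discontinuity at the origin.
    """
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / d))


def effective_range(d):
    """Distance at which the exponential term reaches about 95% of its maximum."""
    return 3.0 * d
```

With a zero nugget and a unit sill, the model evaluated at 3*d* gives 1 − exp(−3) ≈ 0.95, which is the 95% level used to define the effective range.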

Spatially normalized variograms of references and PR estimates are displayed in Fig. 7. Table 7 summarizes the parameters of these variograms. The variogram ranges of PR are quite similar to the three references’ (approximately 18 km). The nugget values, however, are more distinct. While it is ~32% for the Q2 references, it is significantly higher for PR (approximately 45% of the sill). These decorrelations of spatial structure at short interdistances suggest the resolution of the PR measurements may be limited when sampling the variability of small, disorganized rainfall structures associated with localized convection. The smaller reference nugget is an indication of the better sampling of the rain field by the reference rainfall, which is an issue previously discussed in section 2b. It must also be noted that the upscaling of the reference estimates from their original resolution to the PR resolution tends to smooth the original Q2 rain field. The comparatively higher nugget with PR may be caused by the rain intermittency, contamination by surface backscatter, attenuation of the signal, brightband effects, or inaccuracy of the *Z*–*R* relationship. An interesting feature is that both sensors present a slightly decreasing nugget with the sequential Q2 data quality steps. This feature could be attributed to the censoring of the reference, which filters out complicated sampling situations for the ground-based radars.

## 4. Quantitative error modeling

### a. Correlations and biases

Scatterplots of PR versus reference rainfall are presented for the three sets of Q2 reference in Fig. 8. Classical performance criteria of satellite-based rainfall estimation compared to reference values are listed in Table 8: the correlation coefficient and the mean relative error (MRE), expressed as a percentage and defined as MRE = (PR_{mean} − Ref_{mean})/Ref_{mean}. The comparisons between the PR and reference estimates are assessed on a point-to-point basis. A rainy pixel is included in the statistics only if both PR and the reference are nonzero, to emphasize the PR ability to quantify precipitation when it is raining (the case of PR having zero rainfall when it is raining has been addressed in section 3a). This restriction is particularly important given the large number of PR misses.
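The two criteria can be sketched as follows for matched PR and reference pixels. The function name and the sample rain rates are hypothetical; only the MRE definition and the both-nonzero selection rule come from the text.

```python
import math

def conditional_scores(pr, ref):
    """Return (MRE in %, Pearson correlation, sample size) over pixels
    where both the PR and the reference rain rates are nonzero."""
    pairs = [(p, r) for p, r in zip(pr, ref) if p > 0 and r > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two jointly rainy pixels")
    pr_mean = sum(p for p, _ in pairs) / n
    ref_mean = sum(r for _, r in pairs) / n
    mre = 100.0 * (pr_mean - ref_mean) / ref_mean
    cov = sum((p - pr_mean) * (r - ref_mean) for p, r in pairs)
    var_p = sum((p - pr_mean) ** 2 for p, _ in pairs)
    var_r = sum((r - ref_mean) ** 2 for _, r in pairs)
    corr = cov / math.sqrt(var_p * var_r)
    return mre, corr, n

# Hypothetical rain rates in mm/h; the first and third pixels are excluded
# because one of the two estimates is zero.
pr_rates = [0.0, 1.0, 2.0, 4.0, 0.5]
ref_rates = [1.0, 2.0, 0.0, 5.0, 1.0]
mre, corr, n_pairs = conditional_scores(pr_rates, ref_rates)
print(f"MRE = {mre:.2f}%, r = {corr:.3f}, n = {n_pairs}")
```

Because misses and false alarms are excluded, these scores describe quantification skill only and should be read alongside the detection statistics of section 3a.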

The two sensors present coherent mean and standard deviation values as long as the representativeness of the comparison samples is kept in mind. As expected, the means of the three PR sets are quite similar. In all cases the PR underestimates the reference mean values by ~17%. This is once again attributed to the significant underestimation of the higher rain rates in the 2A25-v6 products, presumably because of a combination of several factors such as attenuation losses, an inaccurate *Z*−*R* relationship, and/or nonuniform beam filling. The variations of the reference mean for the three sets explain in large part the variations in the apparent bias of PR relative to the reference. The native reference set is affected by (i) a global overestimation of rain rates, which could be due to the inaccuracy of the *Z*−*R* relationship, and (ii) an underestimation of rain rates linked to partial beam blockage and VPR effects (i.e., overshooting above the melting layer by the radar beam far from the radar). The gauge-based bias correction of the native Q2 product decreases the mean reference values, so the negative bias of PR is apparently improved. The additional RQI filtering removes the underestimation of Q2 at far range, so the bias of PR is degraded. The reference shows a higher standard deviation than the PR, consistent with the PDF features presented in section 3b.

The correlation coefficients between PR and Q2 reference estimates are moderate (around 0.6). One could note the best correlation between the two sensors is achieved with the bias+RQI-corrected reference. The differences between the two products on a point-to-point comparison basis can be attributed to sample volume discrepancies, timing and navigation mismatches, and the uncertainties in the respective rainfall estimates. The significantly greater nugget in the PR variogram than in the reference variogram is also an indication of the greater level of noise in the PR rain field spatial structure, which may limit the correlation between the two series on a point-to-point comparison.

### b. Error model

The departures of PR estimates from the references are analyzed in this section on a point-to-point basis. The uncertainties associated with satellite estimates of rainfall include systematic errors as well as random effects from several sources (Yang et al. 2006; Kirstetter et al. 2012). There is a fundamental issue in segregating the proportion of the scatter due to purely random error and the proportion due to conditional biases of the PR estimates that may be either positive or negative, producing additional scatter.

With the true rainfall being unknown, the residuals are defined as the difference *ɛ* = (*R* − *R*_{ref}) between the reference rainfall (*R*_{ref}) and the satellite estimates (*R*). Only pairs for which *R*_{ref} and *R* are both nonzero are considered in the calculations in order to emphasize the PR ability to quantify precipitation where it is raining. The sets of *ɛ* distributions are studied using the generalized additive models for location, scale, and shape (GAMLSS; Rigby and Stasinopoulos 2005) technique. As a preliminary step, *R*_{ref} is considered as the main driving (explanatory) variable conditioning the departures of PR estimates from references.

Generalized additive models for location, scale, and shape aim at modeling the parameters of a response variable’s distribution. Two main assumptions are made: 1) the response variable *ɛ* is a random variable following a known parametric distribution with density *f*(*ɛ* | *μ*, *σ*) conditional on the parameters (*μ*, *σ*), and 2) the observations *ɛ* are mutually independent given the parameter vectors (*μ*, *σ*). Each parameter is modeled as a function of *R*_{ref} (the explanatory variable) using monotonic (linear/nonlinear or smooth) link functions. More details are provided by Rigby and Stasinopoulos (2001, 2005), Akantziliotou, Rigby, and Stasinopoulos (2002), and Stasinopoulos and Rigby (2007). A wide variety of distributional forms are available within GAMLSS. To simplify and distinguish between systematic and random errors, a number of conditional densities with the first two moments as parameters are considered here: the location *μ* (mean) describing systematic errors and the scale *σ* (standard deviation) representative of random errors. For a given conditional distribution of the response variable, the conditional quantiles can be expressed as a function of the location and scale. The models are fitted here using the GAMLSS implementation in the R software (Stasinopoulos and Rigby 2007). The rainfall trends for each parameter are fitted using locally weighted scatterplot smoothing (LOESS), which is more flexible than polynomials or fractional polynomials for modeling complex nonlinear relationships. The result is a smooth curve in *R*_{ref} that is fitted locally by weighted polynomial regression, giving more weight to points near the point whose response is being estimated and less weight to points farther away (see Cleveland et al. 1991).
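The local weighting idea behind LOESS can be sketched in a few lines. This is an illustration only, not the GAMLSS fit performed in R: it assumes a local linear fit with tricube weights, omits robustness iterations, and smooths a single curve rather than both distribution parameters. The function name, the `span` argument, and the sample data are hypothetical.

```python
import math

def loess_point(x0, xs, ys, span=0.5):
    """Local linear regression estimate at x0 using tricube weights."""
    n = len(xs)
    k = min(n, max(3, math.ceil(span * n)))          # neighbours in the window
    h = sorted(abs(x - x0) for x in xs)[k - 1]       # window half-width
    if h == 0.0:
        same = [y for x, y in zip(xs, ys) if x == x0]
        return sum(same) / len(same)
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        u = abs(x - x0) / h
        if u >= 1.0:
            continue                                 # outside the window
        w = (1.0 - u ** 3) ** 3                      # tricube weight
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    if sw == 0.0:
        # All neighbours sit exactly on the window edge: plain local average.
        near = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
        return sum(near) / len(near)
    denom = sw * swxx - swx ** 2
    if denom <= 1e-12 * sw * swxx:
        return swy / sw                              # degenerate: weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0

# Hypothetical noise-free trend, used only to check the fit.
r_ref = [float(i) for i in range(10)]
resid = [2.0 * x + 1.0 for x in r_ref]
fitted = loess_point(2.5, r_ref, resid)
print(f"fitted value at 2.5: {fitted:.3f}")
```

A local linear fit reproduces a straight line exactly, which makes noise-free linear data a convenient sanity check before applying the smoother to scattered residuals.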

Several two-parameter density functions (lognormal, normal, reverse Gumbel, logistic, gamma, etc.) have been tested to fit the data. The distributions of residuals (not shown here) were generally found to be unimodal and asymmetric. The goodness of fit on the whole dataset has been checked by investigating the Akaike information criterion (AIC) for each of the semiparametric density fits. The reverse Gumbel distribution (where *μ* is the mean and *σ* the standard deviation of the residual population) was found to be the most appropriate. Figure 9 shows the residuals as a function of *R*_{ref} as well as the fitted GAMLSS model for PR in the representative case of the bias+RQI-corrected reference. The conditional PDF of residuals *ɛ* presents a high conditional shift versus the 0 line and a high conditional spread. Note that for *R*_{ref} > ~50 mm h^{−1}, the model is poorly constrained because of the lack of observed residuals. All models show that PR tends to overestimate light rain rates (the median of residuals is positive) and underestimate higher rain rates (negative median of residuals); for example, PR underestimates *R*_{ref} = 20 mm h^{−1} rain rates with an occurrence of 70% and with a representative bias of −7 mm h^{−1} and underestimates *R*_{ref} = 40 mm h^{−1} with an occurrence of 92% and with a representative bias of −24 mm h^{−1}. This is likely to be once again due to an inaccurate *Z*−*R* relationship, nonuniform beam filling, and/or insufficient correction of PR attenuation for heavier rain rates.
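Quantities such as the occurrence of underestimation and the representative bias follow directly from the fitted conditional distribution. The sketch below assumes the location/scale form of the reverse Gumbel (type I maximum) distribution, with CDF F(ɛ) = exp{−exp[−(ɛ − μ)/σ]}; in that form μ is a location parameter and σ a scale parameter, which may differ from the convention used in the paper. The parameter values are hypothetical and do not come from the fitted model.

```python
import math

def rg_cdf(eps, mu, sigma):
    """CDF of the reverse Gumbel distribution (assumed location/scale form)."""
    return math.exp(-math.exp(-(eps - mu) / sigma))

def rg_quantile(p, mu, sigma):
    """Quantile function, obtained by inverting the CDF above."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu - sigma * math.log(-math.log(p))

# Hypothetical conditional parameters at one reference rain rate (mm/h).
mu, sigma = -2.0, 3.0
median = rg_quantile(0.5, mu, sigma)                              # conditional bias
spread = rg_quantile(0.9, mu, sigma) - rg_quantile(0.1, mu, sigma)  # q90 - q10
p_under = rg_cdf(0.0, mu, sigma)                                  # P(residual < 0)
print(f"median = {median:.2f}, q90-q10 = {spread:.2f}, P(underestimate) = {p_under:.2f}")
```

Evaluating the CDF at zero gives the probability that PR falls below the reference, which is how an occurrence of underestimation can be read from a conditional density.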

In case of a nonsymmetric density for residuals or in case of extreme values, the median is preferred to the expectation for a better representativeness of the systematic component of the residuals. The systematic error component (i.e., conditional bias) is therefore described by the conditional median of these distributions. For the same reason we consider the interquantile range (q90–q10) to assess the random part of the error. It is computed after having applied the error separation variance correction to the conditional standard deviation *σ* extracted from the GAMLSS model. The error separation variance concept (Ciach and Krajewski 1999; Teo and Grimes 2007; Kirstetter et al. 2010) makes it possible to evaluate the variance of the PR with respect to the true unknown rainfall. We assume the errors on the reference rainfall and on the PR estimates to be uncorrelated. Introducing the true rainfall *R*_{true} in the expression of the variance of the residuals between the PR and reference values leads to (see Kirstetter et al. 2010 for details)

$$\mathrm{Var}(R - R_{\mathrm{ref}}) = \mathrm{Var}(R - R_{\mathrm{true}}) + \mathrm{Var}(R_{\mathrm{ref}} - R_{\mathrm{true}})$$

Fortunately, as can be seen in Fig. 10, the reference estimation standard deviations are lower than the standard deviations of the PR–reference residuals, indicating that the reference values are comparatively reliable for evaluating PR. The standard deviation of the PR residuals with respect to the true rainfall is significantly reduced compared with the PR–reference residual standard deviation. One may note the standard deviations increase up to a reference value (~50 mm h^{−1}) beyond which we believe sampling issues lead to a stabilization or a decrease of the standard deviations. We therefore apply the modeling up to this limit only. As ~98% of the reference values are under this limit, this choice will not lead to any significant lack of representativeness.
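The variance separation step amounts to subtracting the reference estimation variance from the residual variance, which is valid only under the stated assumption of uncorrelated errors. A minimal sketch, with a hypothetical function name and illustrative standard deviations:

```python
import math

def separate_pr_error_std(residual_std, reference_std):
    """Standard deviation of PR errors with respect to the true rainfall.

    Assumes uncorrelated PR and reference errors, so that
    Var(R - R_ref) = Var(R - R_true) + Var(R_ref - R_true).
    """
    pr_var = residual_std ** 2 - reference_std ** 2
    if pr_var < 0.0:
        raise ValueError("reference error exceeds residual scatter; "
                         "the separation assumption does not hold")
    return math.sqrt(pr_var)

# Hypothetical values in mm/h at one reference rain rate.
pr_std = separate_pr_error_std(residual_std=5.0, reference_std=3.0)
print(f"PR error standard deviation relative to truth: {pr_std:.1f} mm/h")
```

The correction is meaningful only where the reference standard deviation is smaller than the residual standard deviation, which is the condition verified in Fig. 10.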

Figure 11 shows the conditional biases and random errors of PR relative to the three Q2 references. The global bias (see previous section and Table 8) of PR results from a balance between overestimation of light rain rates and underestimation of high rain rates. The underestimation is more frequent, inducing a global negative bias. The conditional biases of PR relative to the references are quite similar. Note the bias-corrected conditional bias is shifted to the right compared to the native one, so overestimation of light rain rates is more significant and the underestimation of higher rain rates less pronounced, which is consistent with the reduced negative global bias for this specific reference (see Table 8). Note also that the negative slope of the bias+RQI-corrected conditional bias is smaller in magnitude than for the two others (the conditional bias is less significant), which could be seen as a sign of better agreement between PR estimates and this Q2 reference. This is confirmed when considering the random part of the error. The bias+RQI-corrected curve shows the lowest random errors up to *R*_{ref} = 4 mm h^{−1} (more than 65% of the reference rain rates are under this value). The random error increases consistently with *R*_{ref}. It is systematically higher for the bias-corrected than for the native reference, a result consistent with what is expected when applying a bias correction (Ciach et al. 2000). It represents a significant part of the error, suggesting that factors other than *R*_{ref} could be considered to evaluate the error of PR rain-rate estimates at ground.

## 5. Conclusions

In preparation for National Aeronautics and Space Administration (NASA)’s future Global Precipitation Measurement (GPM) mission, a 3-month data sample of TRMM PR–based rainfall products has been compared to surface rainfall derived from Q2 over the lower conterminous United States. The major advantage of the Q2 ground-based reference dataset is its resolution in both time and space, which is commensurate with rainfall estimates derived from sensors on board low-earth-orbiting satellites. The comparisons have been performed at the PR pixel resolution. A framework is proposed herein to address methodological issues so as to provide a preliminary version of an error model for satellite QPEs. The error model is empirically derived and is thus likely to be specific to the dataset considered and the PR/Q2 data processing implemented. However, the results show similarities with previous rainfall comparisons over West Africa and thus give credence to the developed framework (Kirstetter et al. 2012). Results from the error model presented herein provide insights into the most significant characteristics of PR rainfall retrieval errors that need to be taken into account when such data are used in applications.

A recurring result across the analyses was the improved agreement between PR and the Q2 reference following sequential data quality control steps, including bias correction using rain gauges and filtering using the radar quality index (RQI) product. This finding alone highlights the importance of matching the scales and refining the accuracy of the reference dataset as much as possible before reaching meaningful conclusions about the PR accuracy.

Different error sources were identified and quantified for PR rain-rate estimates. The most significant error is most likely due to a combination of an inaccurate *Z*−*R* relationship, nonuniform beam filling, and/or attenuation of the PR radar signal. It is difficult to distinguish between these different influences by comparing solely rain rates at ground. The delineation of rain/no-rain boundaries is also a major contributor to the PR rain-rate errors, probably linked to the lack of sensitivity in the most inhomogeneous and light parts of the edges of rainy regions. Nevertheless, the variogram analysis showed that the PR adequately represents the spatial structure of the rain fields. The scatterplots revealed that PR-estimated rain rates are only moderately correlated (Pearson correlation coefficient of 0.6) with the best reference rainfall on a point-to-point basis.

The statistical model developed here quantifies the relation between instantaneous PR rainfall and the corresponding reference rainfall. It consists of a deterministic additive function and a random uncertainty component, both conditioned on given reference values. The contribution of systematic PR errors is confirmed to be quite large because of the aforementioned signal attenuation issue.

In terms of perspectives, the relative contributions of errors linked to rainfall type and off-nadir angle need to be evaluated, as well as the influence of the underlying terrain. The same framework and reference rainfall datasets can be readily applied to rainfall retrievals from other sensors on board low-earth-orbiting satellites [e.g., TMI, Advanced Microwave Scanning Radiometer for Earth Observing System (EOS) (AMSR-E), Special Sensor Microwave Imager (SSM/I), and Microwave Analysis and Detection of Rain and Atmospheric Structures (MADRAS)]. This framework will also be applied to GPM rainfall estimates following its launch in 2013. Another important issue to study is how the various error sources in PR, which is often used as a calibrator, propagate when merging with geostationary infrared data for a number of satellite-based, high-resolution precipitation products. Finally, the error model discussed in this study would be useful for generating rainfall ensembles and for hydrologic error propagation studies of satellite estimates.

## Acknowledgments

We are very much indebted to the team responsible for the NMQ/Q2 products, especially Carrie Langston. We want to thank two anonymous reviewers whose comments were very useful for improving the manuscript. This work was funded by a postdoctoral grant from the NASA Global Precipitation Measurement Mission Ground Validation Management.