## Abstract

This study proposes a new framework, Precipitation Uncertainties for Satellite Hydrology (PUSH), to provide time-varying, global estimates of errors for high-time-resolution, multisatellite precipitation products using a technique calibrated with high-quality validation data. Errors are estimated for the widely used Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) 3B42 product at daily/0.25° resolution, using the NOAA Climate Prediction Center (CPC) Unified gauge dataset as the benchmark. PUSH estimates the probability distribution of reference precipitation given the satellite observation, from which the error can be computed as the difference (or ratio) between the satellite product and the estimated reference. The framework proposes different modeling approaches for each combination of rain and no-rain cases: correct no-precipitation detection (both satellite and gauges measure no precipitation), missed precipitation (satellite records a zero, but the gauges detect precipitation), false alarm (satellite detects precipitation, but the reference is zero), and hit (both satellite and gauges detect precipitation). Each case is explored and explicitly modeled to create a unified approach that combines all four scenarios. Results show that the estimated probability distributions are able to reproduce the probability density functions of the benchmark precipitation, in terms of both expected values and quantiles of the distribution. The spatial pattern of the error is also adequately reproduced by PUSH, and good agreement between observed and estimated errors is observed. The model is also able to capture missed precipitation and false detection uncertainties, whose contribution to the total error can be significant. The resulting error estimates could be attached to the corresponding high-resolution satellite precipitation products.

## 1. Introduction

Merged satellite–gauge estimates of precipitation have been very commonly used in a range of areas, such as hazard mitigation (floods and droughts), agricultural planning, and water resources management (Hong et al. 2007b; Krajewski et al. 2006; Hossain and Anagnostou 2004; Johnson et al. 1993). The advantage of using merged datasets rather than single-source datasets is that merged datasets are able to exploit the strengths of several different types of precipitation estimates to give the best high-resolution estimate possible. One downside of this approach is that the resulting estimates are affected by a range of measurement and algorithm errors, as well as sampling errors that are not easily characterized. Although the quality of satellite precipitation products have improved significantly in past decades, the development of estimates of associated errors has received less attention, and improvements to error models have lagged behind improvements to the precipitation estimates. Error estimates are of fundamental importance, particularly in hydrological applications and climate studies, since they allow inferences about the reliability of satellite products and their operational application. A quantitative analysis of those errors is critical for a successful deployment of satellite precipitation products in hydrological modeling, land data assimilation systems, and water management policy.

Only a few studies propose a method to estimate the satellite precipitation error. One example of routinely produced errors for a merged precipitation product is the Global Precipitation Climatology Project (GPCP) analysis, a globally complete estimate of surface precipitation at the 2.5° × 2.5° latitude–longitude resolution that spans the period 1979 to the present (Adler et al. 2003; Huffman et al. 2009). These errors are provided for every grid box at monthly and daily time scales, based on the method of Huffman (1997), and were independently validated by Gebremichael et al. (2003). Error estimates are also available for the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA, often known by its TRMM product designations of 3B42 and 3B43) analyses, following Huffman (1997). However, these error estimates are not based on a sound theoretical foundation, as the methodology proposed by Huffman (1997) depends on the samples being functionally independent, which is not the case when considering the 3-hourly data. Therefore, the formulation would need an estimate of the number of independent samples.

In this study, we will focus on error estimates for daily precipitation estimates, which are of particular interest for several reasons. Daily rainfall amounts are a basic building block in many investigations, including hydrologic studies over land, ocean salinity estimation, and water budget calculations (e.g., Hong et al. 2007a; Wood et al. 2004; Nijssen et al. 2001; Liebmann and Allured 2005). Additionally, the daily resolution is the finest time scale for which extensive high-density gauge data are available around the globe, which is fundamental to calibrating and validating the satellite precipitation error estimates.

Numerous recent studies have evaluated the accuracy of high-resolution global satellite precipitation products. For example, a detailed uncertainty analysis is provided by the Pilot Evaluation of High Resolution Precipitation Products (Arkin et al. 2008) and by the International Precipitation Working Group (IPWG), who assessed a comparison of several satellite products and numerical weather prediction models over land (Ebert et al. 2007). They showed how the performance of satellite precipitation estimates depends on the rainfall regime, with more accurate estimates during summers and at lower latitudes. The IPWG validation over the United States also suggested that the infrared–passive microwave (IR-PMW) merged satellite estimates perform almost as well as ground-based radar in terms of daily bias and frequency of precipitation. Many other studies have evaluated satellite precipitation uncertainty at a different range of scales and for global high-resolution satellite datasets. For instance, Sapiano and Arkin (2009) performed a subdaily intercomparison of five high-resolution datasets (i.e., at least 0.25°, 3 hourly) with gauge data. They showed that satellite data are effective at representing high-resolution precipitation, with 1) correlations against the gauge observations as high as 0.7 and 2) relatively high biases over land (except for gauge-adjusted products) and ocean. Studies by Tian et al. (2007) and Anagnostou et al. (2010) presented an exhaustive error analysis of two widely used satellite precipitation products over Oklahoma: TMPA 3B42-V6 and the NOAA Climate Prediction Center morphing technique (CMORPH). They showed that CMORPH has season-dependent biases, with overestimation in summer and underestimation in winter, and that the probability of rain detection was higher for CMORPH than for TMPA, although this was accompanied by a higher false alarm probability. Furthermore, past studies have proposed error metrics to evaluate satellite precipitation data that are mutually interpretable by users (e.g., hydrologists) and algorithm developers. Hossain and Huffman (2008) demonstrated that a holistic representation of the satellite error structure should include information on how the error varies in space and time, as well as how far off each estimate is from the true value. They also observed that some error metrics are scale dependent, such as the probability of rain detection, whereas conventional metrics such as correlation coefficient, frequency bias, and false alarm ratio were found to be modestly affected by scales.

Taking a different approach, several studies have investigated the sampling error in satellite precipitation data and proposed the power law model to quantify the standard deviation of the sampling error (e.g., Wilheit 1988; Bell et al. 1990; Gebremichael and Krajewski 2004; Steiner et al. 2003; Hong et al. 2006). Nevertheless, this model assumes a lognormal distribution of precipitation errors, which has been shown to be unrealistic, especially at high rain rates (Gebremichael and Krajewski 2005). A third approach is the two-dimensional satellite rainfall error model (SREM2D), developed by Hossain and Anagnostou (2006) to generate ensembles for a given precipitation field. SREM2D employs stochastic space–time formulations to characterize the multidimensional error structure of satellite retrievals by accounting for the joint probability of successful delineation of rainy and nonrainy areas. However, this model is expensive both in terms of parameter requirements and numerical computation. Another attempt to estimate uncertainties associated with satellite precipitation products at fine space and time scales (0.25°, 3 hourly) is provided by Gebremichael et al. (2011), who developed a nonparametric model to generate the distribution of actual precipitation values, given a satellite estimate. Specifically, they calibrated conditional density functions of satellite precipitation for each grid box for 1-yr-long time series using gauge-adjusted ground radar rain fields in the southern Unites States. However, the applicability of this framework to other areas of the globe and to a longer dataset still needs to be investigated. In conclusion, no error estimates are routinely provided for the publicly available products at fine spatial and temporal resolutions that are being used for research and for real-time applications, such as input into quasi-global hydrological models for estimation of global floods (Wu et al. 2012). Consequently, high-time-resolution satellite precipitation products still lack a companion estimate of their associated uncertainty.

The recent series of work by Gebregiorgis and Hossain (2013, 2014) have shown that it is possible, at daily time scales, to estimate the variance of the satellite precipitation error using information such as satellite precipitation rate and geophysical features. The technique they propose is parsimonious in its computation and easy to implement. Gebregiorgis and Hossain have also shown that the method works consistently not only across the United States, but also in other areas of the world in the Middle East, Asia, as well as in the Mediterranean region. However, this approach has limitations. The proposed model adopts a power law model to estimate the error variance, by assuming that the variance is an appropriate estimator of the satellite precipitation error. The model estimates the error variance more accurately when the false alarm and the hit components of the error are dominant, but it performs poorly when the missed precipitation represents a large component of the satellite error.

Background land surface emissivities play the most critical role in satellite rainfall estimation uncertainty. This background emissivity that hinders precipitation discrimination naturally depends on factors like climate, vegetation, and terrain. However, the main objective of our work is to establish a statistical error model with rain rate as the explicit predictor. The influence of other variables, such as geophysical features, will be reflected in the model’s parameters, and these parameters will most certainly vary from region to region. The current paper focuses on the relatively homogeneous Oklahoma region, in order to develop and test a correct and robust single-variable regression model based on the new multiplicative-model suggested by Tian et al. (2013).

Specifically, this study proposes the new framework Precipitation Uncertainties for Satellite Hydrology (PUSH) to provide an estimate of the error associated with high-time-resolution satellite precipitation products at each grid box for each time step. The error is decomposed in four components. When the satellite does not detect precipitation, the ground truth could also measure a zero (correct no-precipitation detection) or measure a value greater than zero (missed precipitation). In the case when the satellite detects precipitation, the reference might record a zero (false alarm) or detect precipitation (hit). The methodology section presents the approach proposed to evaluate each single case. In particular, we aim to define error estimates of one widely used merged dataset of surface precipitation, the TMPA, which is compared to the reference National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC) Unified gauge dataset (CPC-UNI) for model calibration and validation. Once the error estimates are tested and the model extended to other regions of the world, they could be attached to high-resolution satellite precipitation products to allow inferences about their reliability in operational applications.

## 2. Dataset and study area

The satellite precipitation product employed in the study is the TMPA, version 7 (Huffman et al. 2007, 2010). In the TMPA algorithm, TRMM observations [TRMM radar and passive microwave TRMM Combined Instrument (TCI)] are used to adjust passive microwave estimates from the TRMM Microwave Imager (TMI), the Special Sensor Microwave Imager (SSM/I), the Special Sensor Microwave Imager/Sounder (SSM/IS), the Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E), the Advanced Microwave Sounding Unit-B (AMSU-B), and the Microwave Humidity Sounder (MHS). IR information from geosynchronous satellites is calibrated to be consistent with the microwave precipitation estimates and used to fill gaps in the microwave data. Thus, the methodology uses the optimal/best estimate of precipitation for each grid box and ensures that data boundaries between PMW and IR estimates are minimized. Monthly adjustment by Global Precipitation Climatology Centre (GPCC) gauge analyses (Schneider et al. 2008) completes the computation. The TMPA is computed for the entire TRMM period (from January 1998 to present) as the 3B42 product. It has a near-global coverage (50°N–50°S) and is available at the 3-h time scale on a 0.25° grid. The TMPA is available both as postanalysis (3B42-version 7) and in real time (3B42-RT), with the latter calibrated climatologically by the TCI and 3B43. We choose to use the best available product, which is the second reprocessing of TMPA 3B42-version 7 (hereinafter TMPA-V7).

To evaluate the satellite precipitation product, a reference/validation dataset is needed. The CPC-UNI (Higgins et al. 2000; Chen et al. 2008) has been chosen for this purpose, being available at the same spatial resolution as TMPA-V7. Specifically, CPC-UNI is a daily analysis covering the contiguous United States on a 0.25° × 0.25° grid that uses an optimal interpolation objective analysis technique. The CPC-UNI dataset is available from 1948 to 2006, and a real-time version is available from 2007 to present. The real-time analysis uses slightly over 8000 daily gauges, which are quality controlled, and additional non-real-time daily gauge data are received to complete a full repository of over 12 000 station reports. Precipitation values are accumulated from 1200 UTC of the previous day to 1200 UTC of the current day, and the number of gauge records available at each grid point is made available with the data. To match the data as closely as possible, the TMPA-V7 is aggregated to the 1200–1200 UTC daily precipitation from its native 3-hourly amount.

The GPCC gauge data used to adjust TMPA-V7 and CPC-UNI use similar input data over the continental United States but different methodologies to provide precipitation analyses based on station precipitation data. In particular, the GPCC data are corrected for undercatch based on the Legates (1987) climatological corrections. Since such a correction is not routinely applied to the CPC-UNI dataset and could lead to systematic biases, the Legates (1987) climatological corrections have also been applied to the CPC-UNI dataset for this study. These multiplicative correction factors have a magnitude between 1.06 and 1.36 over Oklahoma, with higher correction values during the winter months and in the northwestern area of the study region. During summer, the corrections are close to unity. A discussion about the CPC-UNI gauge analysis error is presented at the end of the next section.

The Oklahoma region has been chosen as the focus area because of its relatively uniform terrain and the dense coverage by in situ rain gauges. A continental climate characterizes this region with cold winters and hot summer seasons. Its topography rises gently to the west from an altitude of 88 m MSL in the southeastern corner to a height of 1515 m MSL on the tip of the “panhandle.” The study domain is mapped onto a Cartesian 0.25° × 0.25° equiangle grid, located in the latitude range of 34.5°–37°N and the longitude range of 94.5°–100°W. The time series spans from 1998 to 2011 (calibration period), with 2012 used for validation purposes.

Figure 1 shows a comparison of the average daily precipitation intensity estimated by the reference gauge analysis and TMPA-V7 over the study region from 1998 to 2012. The 15-yr average precipitation maps show that the western half of the study area is characterized by drier conditions than the eastern half. Some mixed cover and broadleaf forests characterize the eastern region, whereas the remaining area is covered by croplands and grasslands. Moreover, winters, mainly characterized by stratiform precipitation systems, are drier than summers, when strong convective systems are more frequent. A map of the average number of gauges available during the study period is shown in Fig. 1c. No spatial pattern is observed in the gauge density map, which gives confidence that no interpolation bias is present in the CPC-UNI analysis.

## 3. Methodology

In this study, we present the PUSH model to estimate errors in high-time-resolution satellite precipitation products. Specifically, given the satellite observation *x*, the framework provides an estimate of the probability density function (PDF) of the actual precipitation *y* at daily time scale and 0.25° spatial resolution. The satellite error can then be computed as either the difference or the ratio between the satellite *x* and the expected value of the estimated precipitation distribution. The PDF of *y* is modeled differently in each of the following four cases (Fig. 2):

Correct no-precipitation detection: the satellite estimate is less than a threshold th that should be chosen to be close to zero (i.e., 0.1 mm day

^{−1}), and the gauge analysis is also less than th (case 00);Missed precipitation: the satellite estimate is less than th, but the ground reference detects precipitation (case 01);

False alarm: the satellite estimate is greater than or equal to th (i.e., the satellite reports precipitation), but the gauge analysis reports no precipitation (case 10);

Hit: both satellite and gauges detect precipitation; however, they can differ in the measured amount (case 11).

### a. Case 00: Correct no-precipitation detection

When both the satellite and the gauge reference indicate no precipitation, the satellite estimate is assumed to be nearly error-free. The two estimates can differ by an amount no greater than the threshold value that has been set so that the actual precipitation *y*_{00} is modeled as a uniform distribution between 0 mm day^{−1} and th = 0.1 mm day^{−1}:

Thus, in case 00, given a satellite retrieval less than th, the probability of precipitation less than 0, as well as the probability of precipitation greater than or equal to th, are zero.

Additionally, the conditional probability *p*_{00} of having a gauge observation less than the threshold given that the satellite estimate is also less than the specified threshold is evaluated based on the calibration time series (1998–2011) and the reference dataset, and is defined as follows:

During the calibration period, TMPA-V7 and the gauge analysis both measured no precipitation 85% of the grids/days.

### b. Case 01: Missed precipitation

If the satellite estimate is less than the threshold th and the ground observation is greater than or equal to the threshold, the satellite has missed that event. For the study area and period, the probability of missing precipitation (*p*_{01} = 1 − *p*_{00}), given a TMPA-V7 estimate less than th, is equal to 0.15. Past studies have shown how the gamma distribution is particularly suitable for modeling precipitation PDFs, having the ability to reproduce a large range of unimodal distributions (Sapiano et al. 2006; Yan and Gebremichael 2009). For this reason, the estimated PDF of the missed precipitation amount *y*_{01} is modeled using a gamma distribution:

where *a*_{01} and *b*_{01} are the shape parameter and the scale parameter, respectively, and is the gamma function evaluated at *a*_{01}*.* The shape and scale parameters are nonnegative.

### c. Case 10: False alarm

A false alarm is recorded when the satellite estimate is greater than or equal to the threshold but the ground measurement is lower than the threshold. In this case, the error would be exactly the same as the magnitude of the satellite estimate if no threshold was applied and the estimated precipitation would be equal to zero. But, since a threshold is used in this model, a small random component exists and it is modeled as a normal distribution. The estimated reference precipitation *y*_{10} is a linear function of the satellite retrieval *x*:

where are parameters whose values, estimated using an ordinary least squares (OLS) method, are ~ 10^{−2} mm day^{−1}, and are the residuals, assumed to follow a normal distribution with mean *μ*_{10} and standard deviation *σ*_{10 }.

The probability of false alarm conditional to a satellite retrieval greater than or equal to a given threshold is defined as

Previous studies have shown how the probability of false alarm is a function of the satellite precipitation estimate *x* itself (Anagnostou et al. 2010; Gebremichael et al. 2011). Specifically, *p*_{10} decreases monotonically as *x* increases. The exponential decay model was demonstrated to be particularly suitable for the false alarm probability (Hossain and Anagnostou 2006):

where the parameters *A*_{p}, *B*_{p}, and *k*_{p} are evaluated on the basis of the calibration dataset. In Eq. (6), we guarantee that the output remains in the probability simplex by checking the sum of the two calibrated parameters (*A*_{p} + *B*_{p}), which represents the intercept and which therefore corresponds to the highest value that *p*_{10} can assume. If (*A*_{p} + *B*_{p}) is less than 1, then all the *p*_{10} values will be also less than 1. If it is not, then all the *p*_{10} values that result higher than 1 will be automatically set to 1.

### d. Case 11: Hit

Given a satellite estimate greater than or equal to the threshold, there is a probability *p*_{11} that the gauge analysis also records a value greater than or equal to *th*:

However, the two estimates can differ in their amount (hit error). A multiplicative error model is chosen because of its ability to produce superior predictions of the error characteristics compared to a traditional additive model (Tian et al. 2013):

where the random error is a multiplicative factor, and the residual systematic error is defined by *a*_{11} and *b*_{11}. If we perform a natural logarithm transformation, the multiplicative model becomes

Two different approaches have been investigated to estimate the actual precipitation distribution for the hit case. Past studies have assumed the random error *ε*_{11} to be normally distributed (on the log scale) with zero mean and variance *σ*^{2} and have adopted a simple OLS procedure to estimate parameters (Hossain and Anagnostou 2006; Tian et al. 2013). However, Gebremichael and Krajewski (2005) demonstrated that it is unrealistic to assume the satellite precipitation error to be lognormally distributed since the errors of precipitation are still not necessarily symmetrically distributed around zero, even on the log scale. Alternatively, the hit error can be modeled with a gamma distribution (Sapiano et al. 2006). Specifically, the natural logarithm of the conditional mean precipitation given the satellite estimate *x*, that is will be linearly related to the explanatory variable *x*, and the parameters will be estimated using a more flexible regression approach, the generalized linear model (GLM; Nelder and Wedderburn 1972). The GLM is a generalization of OLS linear regression that allows for response variables that have arbitrary distributions. The traditional OLS regression predicts the expected value of a given variable as a linear combination of a set of observations. The GLM allows the linear model to be related to the response variable through a link function, rather than assuming that the response variable itself must vary linearly. Tian et al. (2013) showed that precipitation should be modeled using a multiplicative model, which has been proven to be superior to the additive one in precipitation error estimation. The link function is therefore chosen to be the logarithmic function [see Eq. (9)]. The GLM formulation is the following:

where Gamma[.] is the gamma distribution and represents the chosen model formulation. The estimated precipitation, given the model, is assumed to follow a gamma distribution with scale parameter *μ*_{11} and shape parameter *ν*_{11}, and the logarithm of the mean precipitation is linearly related to the logarithm of the satellite estimate *x* (i.e., the explanatory variable) by parameters . The implication of using the logarithmic link is that the relationship between the explanatory variable and the response variable becomes multiplicative. The shape parameter *ν*_{11} is constant in the original GLM formulation, which would also imply a constant coefficient of variation (i.e., the ratio of the mean and the standard deviation). However, in order to allow more flexibility in the gamma distribution, the shape parameter is defined as a function of the satellite estimate *x*. Specifically, a moving window is used to fit the GLM for different ranges of precipitation, and a shape parameter is assigned to each satellite precipitation magnitude bin. The definition of the two parameters *μ*_{11} and *ν*_{11} refers to the definition given by GLM, and not to the traditional formulation of the gamma distribution.

Figure 3 shows the comparison between the OLS approach and the gamma-GLM approach for the hit case. Figure 3a shows a density scatterplot of CPC-UNI observations (linear scale) versus TMPA-V7 (log scale) for the entire time series. The 25th, 50th, and 75th quantiles of the precipitation distribution estimated using the two different approaches are superimposed. The semilogarithmic scale is chosen because, on one hand, the predictor is the logarithm of the satellite retrieval, but, on the other hand, the model aims to estimate the reference precipitation in the actual space (i.e., mm day^{−1}). The two 75th quantiles evaluated with the OLS and GLM look close to each other, but important differences emerge in the median and 25th quantiles. The OLS model underestimates the reference precipitation, especially at high values of satellite rain rates (>20 mm day^{−1}). As already highlighted by Gebremichael and Krajewski (2005), this behavior degrades the predictive power of the OLS model at higher rain rates. Figures 3b and 3c illustrate the implication of this problem: predicted precipitation from the OLS and GLM techniques are shown versus the CPC-UNI reference. The estimates from the GLM model converge toward the 1:1 line better than the estimates obtained with the OLS approach. As previously noticed, the performance of OLS deteriorates at high rain rates, making it less reliable in the estimation of the error for high precipitation values. Some statistics have been computed to verify what is shown in Fig. 3. The mean of the residuals is 0.01 (5.15) mm day^{−1} using GLM (OLS), and the standard deviation of residuals is 12.0 and 13.1 mm day^{−1}, when using GLM and OLS respectively, showing less dispersion of the residuals around the mean when GLM is implemented. According to the computed statistics and the graphic analysis, it can be concluded that the GLM model outperforms the traditional OLS, especially at high satellite rain rates.

### e. The estimated precipitation distribution

When the satellite estimate becomes available, there are two possible outcomes: either the satellite reports no precipitation (i.e., the value is below a threshold—case 0) or it detects precipitation (case 1). For each of the two cases, the proposed framework can be used to compute an estimate of the reference precipitation distribution and, hence, the error of the satellite estimate. First, to estimate the PDF of *y*_{0} for the case in which the satellite estimate is lower than the threshold th, the two components of the error *y*_{00} and *y*_{01} are combined through a weighted average, where weights are given by the corresponding probabilities of occurrence *p*_{00} and *p*_{01}:

Figure 4a shows the histogram of the uniform distribution that models the no-precipitation detection error *y*_{00}, the gamma distribution that models the missed precipitation *y*_{01}, and the combination of the two *y*_{0}. Table 1 lists the 25th, 50th, and 75th quantiles of the estimated error PDF for case 0.

Second, to provide a formulation of the error when the satellite estimate is greater than or equal to th, the false alarm distribution *y*_{10} and the hit error distribution *y*_{11} are combined into a weighted average with weights *p*_{10} and *p*_{11}:

Figure 4b shows the estimated empirical distribution of actual precipitation for the given satellite rain rate of 2.5 mm day^{−1} (histogram), together with the two single components: the normal distribution of the falsely detected precipitation and the gamma distribution that models the reference precipitation for the hit cases. Unlike case 0, here the quantiles are no longer discrete, but a function of the satellite rain rate, as shown in Table 1 for selected values of satellite precipitation. For example, if the satellite reports 5 mm day^{−1}, the expected value is 6.24 mm day^{−1}, the estimated median is 3.55 mm day^{−1}, the 25th quantile is 0.65 mm day^{−1}, and the 75th quantile is 8.95 mm day^{−1}.

The proposed method inherently treats both the systematic and the random errors. In fact, PUSH is built to reproduce the PDF of the reference precipitation. The error is computed as the ratio (difference on the log scale) between the satellite retrievals and the estimated reference precipitation, and it represents a combination of both systematic and random error. TMPA-V7 is gauge adjusted and so it is expected that the bias error ought to be very near zero, and the error is primarily random. The error can then be evaluated based on comparisons between the gauge analysis and the satellite product. Once calibration is completed based on the 1998–2011 time series (including all the months), the goodness of fit of the model is assessed by using a subset of the calibration period, that is, a dependent period (2004–06 for verification) and an independent period (2012 for validation). The verification scenario tests whether the model and the calibrated parameters are working as intended, whether the model introduces any biases, or whether any other design issue is observed. The verification scenario also sets the standards to assess the error model performance in the validation scenario, when the framework is applied to an independent dataset.

One of the goals of this work is to accommodate different user demands. For instance, a user could require the error as a single estimate (regular user), such as the expected value (or the median) of the estimated reference precipitation. On the other hand, a user might be interested in the whole distribution of the reference precipitation (expert user), and in this case, an estimate of the quantiles of the empirical PDF of *y* would be provided.

The model described so far is based on the assumption that the reference gauge dataset is free from errors. In reality, this is not the case, as shown in several past studies (e.g., Amitai et al. 2012; Villarini et al. 2008; Villarini and Krajewski 2007; Gebremichael et al. 2003). To reduce that error, one could consider only the grid cells that contain gauge data, but a portion of uncertainty would still be present because of the area-point difference. A common method to quantify the rain gauge area-point difference variance is based on the variance reduction factor (VRF), proposed by Rodriguez-Iturbe and Mejia (1974). This index measures the reduction of the uncertainty due to the approximation of the true precipitation with the areal average of point observations (Morrissey et al. 1995; Krajewski et al. 2000; Ciach and Krajewski 2006). All the studies mentioned provide assessments of the gauge errors for specific regions of the world and mostly at the basin scale, which makes it difficult to generalize those estimates to another area with a different temporal and spatial resolution. For this reason, PUSH defines the error as difference (or ratio) between the satellite product and the gauge analysis, with the knowledge that the estimated error is an overestimation of the actual error, which should be reduced by the error in the gauge observations.

## 4. Results and discussion

The PUSH model is calibrated based on a 14-yr-long time series (1998–2011) of TMPA-V7 daily precipitation estimates and CPC-UNI data, used as reference. The framework is then applied to estimate errors in TMPA-V7 and validated using CPC-UNI.

### a. Model calibration and cross validation

The study region is characterized by a continental climate, with hot summers dominated by convective systems and cold winters that are mainly subject to frontal systems. Since seasonality plays an important role in the area, a parameterization that depends on the season might be necessary. To test whether different seasons require a different fitted model, three model calibrations were performed: one using winter months only [December–February (DJF)], another using summer months only [June–August (JJA)], and a third using the full dataset (all months). Satellite precipitation errors are then estimated for winters and summers separately using different calibrations: 1) the correct corresponding calibration parameters (i.e., parameters calibrated using DJF only are applied to winters, and parameters calibrated using JJA only are applied to summers), 2) the reverse-season calibration (i.e., parameters calibrated for the DJF are applied to JJA, and parameters calibrated for the JJA are applied to DJF), and 3) the full dataset calibration (i.e., parameters calibrated based on all months are applied to winters and summers). The scope of this analysis is to verify whether the model calibration needs to be performed by separating the two main seasons or by using all months.

Two common statistical metrics are used to evaluate the performance of all the model calibrations experiments: the correlation coefficient and the root-mean-square deviation (RMSD) between the estimated precipitation and the CPC-UNI observations. These two metrics are widely used to capture the degree of mismatch between an estimate and the corresponding reference (Barnston 1992). Specifically, Entekhabi et al. (2010) showed that the RMSD is highly sensitive to biases, whereas the correlation coefficient is indifferent to any bias in mean or amplitude of variations. Figures 5a and 5b illustrate these metrics for winter and summer seasons by adopting the three different calibration datasets. As expected, the seasonal calibration applied to the corresponding season yields the highest correlation coefficient and the lowest RMSD. However, for both seasons, the difference between the seasonal calibration and the reverse-season calibration, or the full dataset one, is minimal. It can be concluded that the model parameterization does not show an important dependence on seasonality: correlation coefficients are about 0.70, and RMSDs are ~5.0 mm day^{−1}. Therefore, no seasonal separation is needed and the parameters calibrated by considering the full dataset can be applied to any season without considerably degrading the performance of the proposed error scheme. This result also suggests that the calibrated framework can apply in other locations as well.

A cross-validation analysis was computed to verify how results of the model calibration could be generalized to an independent dataset. The entire time series (1998–2012) is partitioned into subsets by removing 1 year each time. Model parameters are then fitted to all the remaining years and applied to the independent year (the one that was removed by the time series). Figures 5c and 5d show the two metrics used previously to evaluate the calibration experiments (correlation coefficient and RMSD). Both metrics show a limited range of variations across the 14 samples. Specifically, the smallest recorded correlation is 0.71 and the largest is 0.78, with a mean of 0.75 and a standard deviation of 0.03 across the different subsets. The minimum RMSD is 4.78 mm day^{−1} and the maximum 7.37 mm day^{−1}, with an average of 5.83 mm day^{−1} and a standard deviation of 0.79 mm day^{−1}. Therefore, the minimum and the maximum values are within the average plus/minus twice the standard deviation. Figures 5e and 5f show the correlation coefficient and the RMSD as a function of the number of years included in the calibration dataset. The correlation coefficient slightly improves by expanding the calibration period, and the RMSD stays almost constant. This shows that the training dataset is sufficient and the calibrated model is stable enough to apply those parameters to future realization of TMPA-V7 to estimate the corresponding precipitation distributions.

In conclusion, it was shown that seasonality does not play a role in the model calibration; therefore, all months are included for calibrating the model parameters. Specifically, the full 1998–2011 time series is used, while the year of 2012 is employed as an independent period for validation purposes. Results from the cross-validation analysis showed that the proposed error scheme and the calibrated parameters can be generalized to future realizations of TMPA-V7, since the model fitting is practically independent of the calibration dataset.

### b. Model performance

A first step to evaluate the performance of the PUSH model is to compare the estimated probability distribution function with the reference precipitation distribution. The left panels in Fig. 6 show gauge precipitation probability density histograms for both the dependent and independent datasets, for TMPA-V7 estimate lower than the threshold th (i.e., case 0). The corresponding modeled distributions are superimposed, and they quite faithfully reproduce the reference precipitation distributions both in terms of shape and magnitude. Specifically, the expected value of the reference precipitation is estimated to be 0.35 mm day^{−1}, which is exactly equal to the observed mean during the calibration period (Table 1). During both dependent and independent periods, PUSH slightly underestimates (overestimates) the error distribution at low (high) values of satellite precipitation, as shown by the top panel in Fig. 6c, which reports differences between estimated and observed PDFs, normalized by the observed probability density. This is due to the fact that, during the verification (2004–06) and validation (2012) periods, missed events are less frequent than during the full calibration time series, that is, the spike is smaller and the tail is larger than during 1998–2011. As a matter of fact, during the independent and dependent period, the precipitation mean for case 0 is 0.25 mm day^{−1} and 0.22 mm day^{−1}, respectively, which is smaller than the expected value of the modeled reference precipitation (0.35 mm day^{−1}). It is worth noting that this component of the proposed error scheme, which assesses errors when no precipitation is detected by the satellite, is an improvement on the traditional multiplicative models, which give an error estimate of zero every time zero precipitation is recorded (Reichle et al. 2007).

Figure 6 also shows cases in which the satellite estimates are larger than the threshold (i.e., case 1) during the dependent (Fig. 6a) and the independent period (Fig. 6b). The probability distribution functions are shown for the satellite rain rates: 2.5, 5, 10, and 20 mm day^{−1}. It can be noted how the model is able to reproduce shape and magnitude of the observed error histograms for all precipitation ranges. In particular, the empirical density function adequately models the spike (between zero and th) due to the false alarm component, as well as the tails of the distribution. The false alarm component decreases in intensity going to higher satellite retrievals: when *x* = 20 mm day^{−1}, the spike is no longer visible. Fig. 6c, which reports differences between estimated and observed precipitation PDFs, normalized by the reference probability density, shows no systematic difference in the model performance during the dependent (gray histograms) and independent period (black histograms). This substantiates the hypothesis that the calibrated parameters, based on the 13-yr dataset, can be applied to future realizations of TMPA-V7 without degrading the error estimate. Table 1 corroborates what was observed in Fig. 6, by listing the numerical values of means and quantiles of the *estimated* precipitation distributions together with corresponding means and quantiles of the *observed* distributions for all the cases shown in Fig. 6. As expected, the estimated distributions are very close to the observed values during the calibration period, both in terms of expected values and quantiles. When the dependent period is considered, the estimated distributions are close to the observed precipitation distribution, with no systematic under- or overestimation by the model, as also shown by Fig. 6c. Analogously, estimated and observed values are similar in magnitude during the independent period.

As mentioned above, PUSH aims to answer diverse users’ needs. The expert user, interested in the error distribution, can be given a full set of quantiles to build the empirical PDF of the estimated reference precipitation. Meanwhile, the regular user can be provided with a single estimate of the error, but the question is whether the expected value or the median should be conveyed. The minimum mean square error (MMSE) estimator is chosen as the criterion to answer this question. Since the expected value minimizes the mean square error by definition, the regular user will be most likely to prefer an estimate of the expected value rather than the median. The RMSD between the CPC-UNI data and the estimated expected value (or median) is computed to evaluate the errors that the user would encounter if using the median in place of the mean. During the dependent period (2004–06), the RMSD is 5.27 and 5.54 mm day^{−1} when considering the mean and median, respectively. During the independent period (2012), the RMSD is equal to 5.27 mm day^{−1} when using the expected value and 5.50 mm day^{−1} when using the median of the distribution.

An additional analysis to evaluate the model performance is shown in Fig. 7, where scatterplots of estimated quantiles versus observed quantiles are presented for both the verification and validation period. For the independent period, less data are available (365 points) and a larger scatter is observed, in comparison to the dependent period (3-yr time series). However, in both cases, most of the points are distributed around the 1:1 line, with no consistent under- or overestimation, which demonstrates that the model does not introduce any substantial bias. Moreover, Fig. 7 proves the ability of PUSH to reproduce the reference precipitation PDF, showing a good agreement between the quantiles of the estimated and observed distributions. In other words, the proposed error model is asymptotically (or statistically) valuable, even though it may not be perfect at each single time/location.

The PUSH framework provides an estimate of the reference precipitation *y*, from which the satellite error can be computed as either the difference or the ratio between *x* and *y*. To evaluate the estimate of the satellite error produced by the proposed model, Fig. 8 shows a density scatterplot of observed errors, defined as the ratio between TMPA-V7 and CPC-UNI (percentage error), as a function of the satellite estimate for both the dependent and independent period. The expected value and the modeled quantiles (25th, 50th, and 75th) are superimposed. The plots in Fig. 8 show the relationship that, given the satellite observation, provides an estimate of the percentage error, by using both mean and quantiles of the estimated reference precipitation probability distribution. The expected value lies on top of the most populated area of the scatterplots during both the verification and validation periods. This corroborates what has already been demonstrated: the expected value is a good estimate of the percentage error for the regular user, who is interested in a single estimate of the satellite error. Moreover, most of the points are bounded by the estimated 25th and 75th quantiles, which confirms that the proposed framework is able to reproduce the PDF of the reference precipitation and, therefore, the PDF of the satellite error.

### c. Case studies

Figures 9 and 10 summarize the results of tests for two precipitation events, one during the dependent period (4 July 2005) and one during the independent period (4 February 2012). These two events were intentionally chosen during two different seasons (summer and winter), but randomly picked among all the events, which presented a wide range of daily precipitation values across the study area. Specifically, events with recorded values >50 mm day^{−1} were chosen in order to verify the ability of the model to capture errors at high satellite rain rates. The top panels in the figures show daily precipitation maps of the reference CPC-UNI (left), the TMPA-V7 product (middle), and the mean of modeled reference precipitation distribution (right). The bottom panels show the observed error (left), computed as difference between CPC-UNI and TMPA-V7; the estimated error (middle), computed as difference between the CPC-UNI and the mean of the modeled distribution; and a scatterplot of estimated versus observed errors (right). In both precipitation events, the proposed model adequately reproduces the pattern of the error. The scatterplots also show a good agreement between observed and estimated errors, with some overestimation of the error at low values of satellite rain rates. As previously noted, because of the way it was conceived, the error model is statistically correct, but it may not be perfect at each single day and pixel. The relative RMSD (i.e., the ratio of RMSD to the mean reference precipitation) between observed and estimated precipitation for the case of 4 July 2005 is 0.49 and the correlation coefficient is 0.84. For the event on 4 February 2012, the relative RMSD is 0.53, whereas the correlation coefficient is 0.65. Both the visual analysis and the computed statistics illustrated that there is no remarkable difference in the model performance during the two precipitation events, intentionally chosen during two different seasons (summer and winter), which indicates a good performance of the model that is independent from the season.

It is important to note that the PUSH framework always assigns an error, that is, at each pixel and at each time step, even when the satellite reports 0 mm day^{−1}. As stated previously, this is different (and more accurate) than the traditional purely multiplicative error models, which would assign an error equal to zero every time the satellite reports no precipitation (Reichle et al. 2007). Although a multiplicative error structure is parsimonious in the input parameters and numerically convenient, it cannot holistically represent the significant precipitation detection uncertainties in satellite retrievals (Hossain and Anagnostou 2006; Maggioni et al. 2011). Furthermore, previous studies have shown that the contribution of missed events to total precipitation can be significant, especially toward higher latitudes and within the subtropical high-pressure regions (Behrangi et al. 2012). An additional aspect, that is often neglected in traditional rainfall error modeling but that is taken into consideration by the proposed model, is the false rain detection or false alarm. On 4 February 2012, TMPA-V7 detects some precipitation in the southwestern part of the study region (two pixels), where the reference measures a zero (Fig. 10). The proposed error framework estimates errors larger than zero to model the false alarm, with patterns that well reproduce the false alarms on the actual error map.

## 5. Summary and conclusions

The proposed PUSH framework generates error estimates at each grid box (0.25°) and time step (1 day) for the multisatellite precipitation product TMPA-V7. In particular, the model proposes a methodology to estimate the PDF of reference precipitation given the satellite observation, and the error is then computed as difference (or ratio) between the satellite product and the estimated reference. We separate the satellite precipitation error into four cases and propose a different error modeling approach for each of those: 1) case 00 is correct no-precipitation detection, where both the satellite and gauge estimates are less than the threshold th; 2) case 01 is missed precipitation, where the satellite estimate is less than th but the reference detects precipitation; 3) case 10 is false alarm, where the satellite estimate is ≥th, but the gauge analysis measures no precipitation; and 4) case 11 is hit, where both satellite and gauges detect precipitation, but they might differ in the estimated amount. The estimated error is a weighted combination of all four possibilities. Model calibration and validation are based on comparison with the gauge CPC-UNI analysis over the Oklahoma region.

As seasonality plays an important role in Oklahoma, an analysis was performed to investigate whether a model parameterization dependent on the season was necessary. Results showed that there was no considerable difference in adopting a full-year calibration (i.e., by including all months) versus separating the two main seasons (winter and summer). Therefore, model parameters were calibrated by considering the full 1998–2011 dataset. Moreover, a cross-validation analysis showed that the model fitting is nearly independent of the calibration dataset and can be generalized to independent datasets. Given the TMPA-V7 satellite estimate, the PUSH framework proposed in this study can be used to estimate any reference precipitation distribution once model parameters are calibrated based on available ground data.

This work has been conceived to accommodate different user demands: the regular user, who requires the error to be attached to the satellite precipitation product as one single estimate (i.e., the expected value), and the expert user, interested in the full distribution of the error distribution (i.e., a set of quantiles of the reference precipitation PDF). Results from this study are promising, as the estimated probability distributions are able to reproduce the PDFs of the benchmark precipitation (the CPC-UNI analysis) in terms of both expected values and quantiles of the distribution. Additionally, density scatterplot of observed errors, as a function of the satellite estimate, were analyzed by superimposing the expected value and the modeled quantiles (25th, 50th, and 75th). The expected value lies on top of the most populated area, and most of the points are bounded by the estimated 25th and 75th quantiles, which demonstrates that the expected value and the presented quantiles are good estimates of the error for the regular and the expert user, respectively.

Two case studies were selected to test the viability of the PUSH model for specific precipitation events. The pattern of the error was adequately reproduced by the proposed framework, and good agreement between the observed and estimated errors was observed. The model was also able to capture both missed precipitation errors and false detection uncertainties. The power of this model is its comprehensive approach that includes false alarms and missed events, too often neglected in traditional error models, and whose contribution to the total error could be significant.

The novel technique described here will benefit the scientific community, particularly since missed events and false alarms are of great importance when satellite precipitation products are used in a stochastic ensemble-based representation to simulate floods through hydrological models. Unlike previous satellite rainfall error models, such as SREM2D, the PUSH model does not require features of the error (bias, spatial correlation, false alarm probability, etc.) as input parameters. Given the satellite retrieval, the system provides an estimate of the reference precipitation PDF, which can be sampled for possible realizations of the reference observation. The necessity of an ensemble prediction system for flood forecasting and the ability of error corrections to improve runoff simulations driven by satellite precipitation have been proved in several studies (Maggioni et al. 2013; Verbunt et al. 2007; Atger 2001). An estimate of the error that takes into account false detection uncertainties and missed events, like the one proposed here, has the potential to enhance flood and flash flood forecasting (Pappenberger et al. 2005; Gouweleeuw et al. 2005).

The proposed error model is not as simple as exponential probability family models, which make Bayesian data fusion and data assimilation more tractable, but it is more realistic in reflecting the error characteristics. Recent studies, such as Nearing et al. (2013), demonstrated that more representative error models could improve data assimilation systems as well as produce more realistic model predictions. In addition, more generalized techniques, such as particle filtering (Doucet et al. 2000) have recently been developed to handle nonconventional error distributions, and the PUSH model should not pose insurmountable difficulties in this regard. Furthermore, the proposed error framework could find application in land data assimilation systems (LDAS). The quality of the assimilation estimates has been demonstrated to depend on the realism of the error estimates associated with the model (e.g., model formulation, forcings, and parameters). Arguably, the way in which model errors are currently handled in LDAS is very simplistic. This is particularly true for precipitation errors, whose multidimensional character at fine space and time scales is typically ignored in standard assimilation systems (Maggioni et al. 2012). PUSH could be used to perturb the forcing precipitation fields to generate equally probable realizations of precipitation. Thus, this improved error modeling technique may enhance the efficiency of data assimilation systems.

Future work on PUSH will look at other regions of the world, characterized by more complex terrain and different climatology, and at different satellite precipitation products and higher time resolutions (i.e., 3 h). In particular, variables such as topography and climate type will be investigated, as they have been shown to have an impact on the estimate of the satellite precipitation error (Gebregiorgis and Hossain 2013). Moreover, this framework will be parameterized (by surface, convective regime, etc.) for use over both land and ocean and to provide estimates of the error that could be attached to the TMPA-V7 precipitation estimates.

Future work should also investigate the applicability of this model to satellite precipitation products that are biased with respect to the gauge analysis, such as the TMPA 3B42-RT and its skill in estimating systematic errors. Moreover, the approach developed here could be a key contribution to the forthcoming National Aeronautics and Space Administration (NASA) Global Precipitation Measurement (GPM) mission (http://pmm.nasa.gov/GPM; Hou et al. 2008).

This methodology will be of particular use in regions of the world where gauges are sparse, since satellite retrievals represent the only available precipitation estimate on which hydrological applications (e.g., flood forecasting) and water resources management can rely. Future work will evaluate how model parameters calibrated over one region can be translated to regions with similar characteristics that have sparse or no rain gauge coverage.

## Acknowledgments

CPC US Unified Precipitation data were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, United States, from their website (http://www.esrl.noaa.gov/psd/). TRMM Product 3B42 (V7) data were provided by NASA Precipitation Processing System (PPS) from their website (ftp://trmmopen.pps.eosdis.nasa.gov/trmmdata/).

## REFERENCES

*Extended Abstracts, Fifth Global Precipitation Measurement (GPM) Int. Planning Workshop,*Tokyo, Japan, Japan Aerospace Exploration Agency. [Available online at http://www.eorc.jaxa.jp/GPM/ws5/en/materials/6.8.3_abstract_Arkin.pdf.]

*Hydrol. Earth Syst. Sci.,*

**9,**365–380, doi:.

**110,**D06115, doi:.

*Precipitation: Advances in Measurement, Estimation and Prediction,*S. Michaelides, Ed., Springer-Verlag, 131–170.

*J. Hydrometeor.,*

**8,**38–55, doi:.

*Satellite Rainfall Applications for Surface Hydrology,*Springer, 3–22.

*Drought Assessment, Management, and Planning: Theory and Case Studies,*D. A. Wilhite, Ed., Springer, 31–47, doi:.

*Climatology of Global Precipitation.*Publications in Climatology, Vol. 40, University of Delaware, 85 pp.

*Bull. Amer. Meteor. Soc.,*

**86,**1567–1570, doi:.

*J. Hydrometeor.,*

**14,**1194–1211, doi:.

*J. Climate,*

**19,**4154–4166, doi:.

*J. Geophys. Res.,*

**113,**D11102, doi:.

*Tropical Rainfall Measurements,*J. S. Theon and N. Fugono, Eds., A. Deepak, 377–385.

*J. Hydrometeor.,*

**13,**1268–1284, doi:.

*Atmos. Res.,*

**92,**481–488, doi:.