Precipitation estimates from reanalyses and satellite observations are routinely used in hydrologic applications, but their accuracy is seldom systematically evaluated. This study used high-resolution gauge-only daily precipitation analyses for Australia (SILO) and South and East Asia [Asian Precipitation—Highly-Resolved Observational Data Integration Towards Evaluation (APHRODITE)] to calculate the daily detection and accuracy metrics for three reanalyses [ECMWF Re-Analysis Interim (ERA-Interim), Japanese 25-yr Reanalysis (JRA-25), and NCEP–Department of Energy (DOE) Global Reanalysis 2] and three satellite-based precipitation products [Tropical Rainfall Measuring Mission (TRMM) 3B42V6, Climate Prediction Center morphing technique (CMORPH), and Precipitation Estimation from Remotely Sensed Imagery Using Artificial Neural Networks (PERSIANN)]. A depth-frequency-adjusted ensemble mean of the reanalyses and satellite products was also evaluated. Reanalyses precipitation from ERA-Interim in southern Australia (SAu) and northern Australasia (NAu) showed higher detection performance. JRA-25 had a better performance in South and East Asia (SEA) except for the monsoon period, in which satellite estimates from TRMM and CMORPH outperformed the reanalyses. In terms of accuracy metrics (correlation coefficient, root-mean-square difference, and a precipitation intensity proxy, which is the ratio of monthly precipitation amount to total days with precipitation) and over the three subdomains, the depth-frequency-adjusted ensemble mean generally outperformed or was nearly as good as any of the single members. The results of the ensemble show that additional information is captured from the different precipitation products. This finding suggests that, depending on precipitation regime and location, combining (re)analysis and satellite products can lead to better precipitation estimates and, thus, more accurate hydrological applications than selecting any single product.
The accuracy of precipitation estimates to a great extent determines the accuracy of hydrological model outputs (Fekete et al. 2004; Fernandes et al. 2008; Voisin et al. 2008; Pan et al. 2010; Getirana et al. 2011; Van Dijk and Renzullo 2011; Yong et al. 2012). Gridded precipitation analysis based on gauging can be of dubious quality in areas where gauge or radar networks do not exist or are sparse, for example, in much of the tropics. Several precipitation estimates derived from satellite data or modeled through retrospective weather forecast model analysis (reanalysis) provide estimates that are independent from gauge networks. Both types of precipitation estimates have been increasingly used in hydrological applications [e.g., for reanalysis (Dedong et al. 2007; Li et al. 2009; Yan et al. 2010; Miguez-Macho and Fan 2012) and for satellite (Shrestha et al. 2008; Behrangi et al. 2011; Khan et al. 2012; among many others)].
Previous studies evaluating reanalyses and satellite precipitation estimates in areas with dense gauge or radar coverage suggest that convective precipitation (more typical of warmer seasons and lower latitudes) is better characterized by satellite precipitation, whereas frontal system precipitation (more typical of cooler seasons and higher latitudes) is better characterized by reanalysis (e.g., Gottschalck et al. 2005; Ebert et al. 2007; Ruane and Roads 2007; Tian et al. 2009; Sapiano and Arkin 2009; Vila et al. 2010). Estimates from these products can be very different, particularly over tropical areas with high precipitation (Bosilovich et al. 2008; Tian and Peters-Lidard 2010). The incorporation of rain gauge data to correct magnitudes and frequencies can reduce total errors and bring the intensity distribution for heavy precipitation closer to the gauge data (Ebert et al. 2007). It is also noted that more recent reanalyses have improved precipitation estimates for tropical areas, although notable biases still exist (Betts et al. 2006, 2009; Bosilovich et al. 2008; Uppala et al. 2007).
The above summary suggests that reanalysis and satellite datasets can be complementary. This would be particularly relevant in areas where adjustments are difficult or impossible because of the scarcity of rain gauge networks. The aims of this paper are to 1) evaluate and compare daily satellite and reanalysis precipitation estimates routinely used in large-scale hydrologic model applications against precipitation analysis based on dense ground networks in Australia and South and East Asia and 2) evaluate and compare a depth-frequency-adjusted ensemble mean of the products (see definition in section 2 below). The performance metrics are chosen to establish which precipitation product performs best for detection (occurrence) and estimation accuracy for daily precipitation (i.e., how close to the observed magnitude and/or frequency) in three subdomains with different precipitation regimes. Section 2 introduces the reanalyses, satellite, and evaluation precipitation datasets and performance metrics. Section 3 presents results of the performance evaluation experiments. Section 4 discusses the results and draws conclusions.
2. Data and methodology
Three recent reanalysis precipitation datasets with global coverage are considered in this paper: 1) the National Centers for Environmental Prediction–Department of Energy Global Reanalysis 2 (NCEP–DOE; Kanamitsu et al. 2002), 2) the European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis Interim (ERA-Interim; Dee et al. 2011), and 3) the Japanese 25-yr Reanalysis (JRA-25; Onogi et al. 2007). These reanalyses build and improve on earlier reanalysis versions by improving the forecasting model physics and incorporating new satellite and other conventional data.
Also included are three quasi-global satellite-based precipitation products that combine multiple microwave and infrared sensors: 1) the bias-corrected Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (Huffman et al. 2007) 3B42V6, which uses monthly gauge observations to scale precipitation estimates; 2) the Climate Prediction Center (CPC) morphing technique (CMORPH; Joyce et al. 2004); and 3) the Precipitation Estimation from Remotely Sensed Imagery using Artificial Neural Networks (PERSIANN; Sorooshian et al. 2000).
An ensemble of the six products was derived by calculating the simple mean daily precipitation and adjusting it to the depth-frequency distribution function of gauge-only daily precipitation analyses (used as evaluation data) by mapping the full spatiotemporal distribution of the ensemble estimates to that of the gauge analysis. In other words, if prob* = prob(Pens,i) is the probability of the ensemble mean precipitation Pens on day i, then the depth-frequency-adjusted precipitation estimate Padj is Padj = Pobs(prob*), with Pobs being the gauge analysis time series used. The adjustment is performed using all daily data for the grid cells selected from the gauge analysis products; consequently, the ensemble is not fully independent of the evaluation data.
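The adjustment Padj = Pobs(prob*) can be illustrated with a minimal empirical quantile-mapping sketch. The function name `depth_frequency_adjust`, the rank-based probability estimate, and the linear interpolation between observed quantiles are assumptions made for illustration; the text does not specify how the probabilities were estimated.

```python
import numpy as np

def depth_frequency_adjust(p_ens, p_obs):
    """Map ensemble-mean precipitation onto the observed depth-frequency
    distribution (empirical quantile mapping).

    p_ens, p_obs: arrays of daily precipitation pooled over all days and
    grid cells. Pooling over the whole domain follows the text; the
    plotting position rank / (n - 1) is an illustrative assumption.
    """
    p_ens = np.asarray(p_ens, dtype=float)
    p_obs = np.asarray(p_obs, dtype=float).ravel()
    flat = p_ens.ravel()

    # prob* for each ensemble value: its rank scaled to [0, 1].
    # Ties are broken by position, so equal values get different ranks.
    ranks = np.argsort(np.argsort(flat, kind="stable"), kind="stable")
    prob = ranks / max(flat.size - 1, 1)

    # P_adj = P_obs(prob*): the observed depth at the same probability.
    p_adj = np.quantile(p_obs, prob)
    return p_adj.reshape(p_ens.shape)
```

Because only ranks are used, the adjusted series keeps the day-to-day ordering of the ensemble mean while taking its depths from the gauge analysis, which is why the adjusted ensemble is not independent of the evaluation data.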
The evaluation is performed at a daily temporal scale using two high-resolution gauge-only daily precipitation analyses available in Australia (SILO; Jeffrey et al. 2001) and South and East Asia [Asian Precipitation—Highly-Resolved Observational Data Integration Towards Evaluation (APHRODITE; Yatagai et al. 2012)]. All data were resampled to 1° resolution (simple averaging) as a compromise between the spatial resolutions of the different products (satellite estimate resolution data were 0.25°, whereas reanalyses ranged from 0.7° to 2.5°). Only grid cells with a density of more than one gauge per 500 km2 were considered (see Fig. 1 for location of the grid cells). The common period for all data was 2003–07, and the time series of each precipitation product had less than 5% of days with no data. A threshold of 1 mm day−1 was used to discriminate between “rain” and “no rain” in order to eliminate very light intensity “drizzle” that does not significantly contribute to daily precipitation but could have an undue impact on detection metrics. To account for differences in precipitation regime, the geographical domain was divided into three regions: southern Australia (SAu), mostly dominated by synoptic system precipitation during austral winter; northern Australasia (NAu), mostly dominated by convective precipitation during summer; and South and East Asia (SEA), mostly dominated by monsoon precipitation.
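As an illustration of the preprocessing described above, the following sketch averages a 0.25° field onto a 1° grid and applies the 1 mm day−1 rain/no-rain threshold. The function names are hypothetical, treating the threshold as inclusive is an assumption, and the sketch covers only the fine-to-coarse case; how the coarser reanalyses were brought to 1° is not stated in the text.

```python
import numpy as np

RAIN_THRESHOLD = 1.0  # mm per day, as stated in the text

def block_average(field, factor):
    """Average a 2D (lat, lon) field over non-overlapping
    factor x factor blocks, e.g. factor=4 for 0.25 deg -> 1 deg."""
    field = np.asarray(field, dtype=float)
    nlat, nlon = field.shape
    if nlat % factor or nlon % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = field.reshape(nlat // factor, factor, nlon // factor, factor)
    # nanmean ignores missing cells inside a block (an assumption).
    return np.nanmean(blocks, axis=(1, 3))

def is_rain(precip, threshold=RAIN_THRESHOLD):
    """Classify values as rain (True) or no rain (False).
    Inclusive comparison is an assumption; the text gives only the value."""
    return np.asarray(precip, dtype=float) >= threshold
```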
First, precipitation bias error estimates on annual and monthly time scales are computed following Adler et al. (2012). The standard deviation σ of the six products is used as a measure of the bias error. The dispersion among the product estimates captured in σ showcases the different physical assumptions and nature of both satellite and reanalyses precipitation retrievals. Subsequently, detection and accuracy metrics were computed for each grid cell. Every day in the estimated and gauge analysis was classified following Ebert et al. (2007) as a hit (H, observed precipitation correctly detected), miss (M, observed precipitation not detected by product), or false alarm (F, precipitation detected but none observed). The probability of detection, POD = H/(H + M), gives the fraction of precipitation occurrences correctly detected (range 0–1 and a perfect score of 1). The false alarm ratio, FAR = F/(H + F), gives the wrongly detected precipitation (range 0–1 and a perfect score of 0). The frequency bias, FB = (H + F)/(H + M), gives the ratio of the estimated to observed precipitation frequency (range 0–∞ and a perfect score of 1). The equitable threat score (ETS), used as an overall performance metric, gives the fraction of precipitation that was correctly detected, adjusted for correct detections (He) that would be expected because of random chance: ETS = (H − He)/(H + M + F − He), where He = (H + M)(H + F)/N and N is the total number of estimates (range −⅓–1, a perfect score of 1 and 0 indicating no skill).
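The detection metrics defined above can be computed for a single grid cell as in the following sketch. The function name `detection_metrics`, the inclusive 1 mm day−1 threshold, and the return of NaN for undefined ratios are illustrative assumptions; the formulas themselves follow the definitions in the text.

```python
import numpy as np

def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den != 0 else float("nan")

def detection_metrics(est, obs, threshold=1.0):
    """POD, FAR, FB, and ETS for one daily series against the gauge analysis."""
    est_rain = np.asarray(est, dtype=float) >= threshold
    obs_rain = np.asarray(obs, dtype=float) >= threshold
    n = obs_rain.size  # N, total number of estimates

    h = int(np.sum(est_rain & obs_rain))    # hits
    m = int(np.sum(~est_rain & obs_rain))   # misses
    f = int(np.sum(est_rain & ~obs_rain))   # false alarms

    # Hits expected from random chance: He = (H + M)(H + F) / N
    he = (h + m) * (h + f) / n

    return {
        "POD": _ratio(h, h + m),
        "FAR": _ratio(f, h + f),
        "FB": _ratio(h + f, h + m),
        "ETS": _ratio(h - he, h + m + f - he),
    }
```

For example, a 10-day series with three hits, one miss, and one false alarm gives He = 1.6, so ETS = 1.4/3.4 ≈ 0.41, while POD = 0.75, FAR = 0.25, and FB = 1.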
Accuracy metrics used were correlation r, root-mean-square difference (RMSD), and a precipitation intensity proxy, namely, the percentage difference of the ratio of monthly precipitation amount to the total number of days with precipitation (MPDR). Both detection and accuracy metrics were mapped for spatial patterns and examined. Results were also stratified by season to assist in interpretation. Finally, monthly and subdomain aggregated time series were plotted to detect any evidence for drifts or step changes.
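A minimal sketch of the three accuracy metrics for one grid cell and one month of daily data is given below. The function name `accuracy_metrics` is hypothetical, and two details are assumptions because the text does not spell them out: the monthly amount is taken as the sum over all days, and days with precipitation are counted using the same 1 mm day−1 threshold as for detection.

```python
import numpy as np

def accuracy_metrics(est, obs, threshold=1.0):
    """Return (r, RMSD, MPDR) for one month of daily precipitation.

    MPDR is the percentage difference between estimated and observed
    intensity, where intensity = monthly amount / number of rain days.
    Both series need at least one rain day for MPDR to be defined.
    """
    est = np.asarray(est, dtype=float)
    obs = np.asarray(obs, dtype=float)

    r = np.corrcoef(est, obs)[0, 1]            # correlation coefficient
    rmsd = np.sqrt(np.mean((est - obs) ** 2))  # root-mean-square difference

    intensity_est = est.sum() / np.count_nonzero(est >= threshold)
    intensity_obs = obs.sum() / np.count_nonzero(obs >= threshold)
    mpdr = 100.0 * (intensity_est - intensity_obs) / intensity_obs
    return r, rmsd, mpdr
```

A product that reproduces the monthly total but spreads it over too many rain days yields a negative MPDR, which is the behavior the intensity proxy is meant to expose.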
3. Results
The mean annual precipitation of the six precipitation products used here is shown in Fig. 1a. The measure of the bias error, the mean annual precipitation standard deviation σ of the products, is shown in Fig. 1b. As expected, higher σ values occur in grid cells with higher precipitation, with the highest values (>1500 mm yr−1) occurring in grid cells located in the intertropical convergence zone (ITCZ), particularly in insular Southeast Asia. The mean annual σ in SEA is 554 mm yr−1, whereas it is 324 and 170 mm yr−1 in NAu and SAu, respectively. The months of January and July are used as an example of monthly bias. The ITCZ moves southward, and during January grid cells in northern Australia have the highest precipitation (>200 mm month−1) and σ (>100 mm month−1) (Figs. 1c,d). The mean January σ in SEA is 26 mm month−1, whereas it is 76 and 15 mm month−1 in NAu and SAu, respectively. In July, the ITCZ shifts northward, and many grid cells in SEA are affected by the Asia–Pacific monsoon, with higher July precipitation occurring in grid cells in Japan, Nepal, southern China, and Southeast Asia (>400 mm month−1) (Fig. 1e). Higher July σ (>100 mm month−1) is observed not only in these grid cells, but also in southwest Australia and Tasmania (Fig. 1f).
Figure 2 shows percentage frequency of exceedance curves for the six products (for daily precipitation >1 mm), the simple ensemble mean, and the depth-frequency-adjusted ensemble (data are aggregated over the whole geographical domain). All satellite products have lower frequencies than the reference for mean precipitation depths <10 mm day−1, whereas reanalyses agree reasonably well (Fig. 2a). The exceptions are NCEP–DOE, which exceeds reference depths almost across the range, and the simple ensemble mean, in which the simple averaging of all products enhances light precipitation depths (Fig. 2a). Conversely, only the bias-corrected TRMM 3B42V6 and the depth-frequency-adjusted ensemble show good agreement for mean precipitation depths >50 mm day−1 (Fig. 2b). Not surprisingly, the depth-frequency-adjusted ensemble shows this good agreement across the whole precipitation depth range.
In terms of ETS computed for the full time series, ERA-Interim performed best in SAu and parts of NAu close to SAu (Fig. 3a). NCEP–DOE performed best in parts of western and southern Australia, and JRA-25 performed best in most of Japan and South Korea. CMORPH and TRMM performed best in Southeast Asia. Results in continental Asia were mixed, with satellite products and the ensemble performing best in the tropics and reanalyses performing best in midlatitudes. From June to August (JJA), satellite data and the ensemble performed best in most of continental and southeast Asia and in areas in Japan most affected by the monsoon (Fig. 3b). JRA-25 and the ensemble generally performed best in December–February (DJF), except in insular Southeast Asia, where satellite products outperformed reanalyses and the ensemble (Fig. 3c).
Box plots in Fig. 4 highlight the superior detection performance of reanalyses for all geographical subdomains, with the exception of JJA (monsoon) in SEA (Figs. 4a–c). ERA-Interim performs better than satellite data in NAu during DJF. The ensemble shows performance somewhat intermediate to both product types. Seasonal variation in performance was not observed in SAu, but there was an improvement in CMORPH and TRMM ETS during DJF in NAu (Fig. 4b). In SEA, ETS for ERA-Interim and JRA-25 were higher than satellite, except for JJA, where CMORPH and TRMM were better (Fig. 4a).
In terms of accuracy metrics, the spatial results did not show clear seasonal variations; thus, results are presented for all months combined. For r, JRA-25 performed best in most of southeast and parts of southwest Australia, whereas NCEP–DOE did so in parts of southern Australia, Tasmania, and southwest Australia (Fig. 5a).
ERA-Interim performed best in north Australia, whereas a combination of satellite and the ensemble performed best in the tropics. Box plots for all months show that r for ERA-Interim in both SAu and NAu were better than for satellite precipitation, with higher r for NAu (Fig. 6a). The ensemble performed best in Nepal, close to the coastline in China, and in part of Japan. JRA-25 and ERA-Interim had better performance in inland north China and also in some parts of Japan. For SEA, mean r is substantially higher (0.62) than in SAu (0.17) and NAu (0.35), with TRMM being superior and CMORPH comparable to JRA-25 and ERA-Interim (Fig. 6a). The ensemble outperformed the other products in all subdomains.
ERA-Interim had the lowest RMSD in most of SAu and NAu (Fig. 5b). In China, satellite data generally performed best close to the coastline and reanalyses in the north, the ensemble in Nepal, and JRA-25 in most of Japan and South Korea. Box plots show that RMSD was slightly lower for ERA-Interim and JRA-25 in all subdomains. Errors in NCEP–DOE were systematically higher than the other datasets, whereas the ensemble RMSD was comparable to the best results in all subdomains (Fig. 6b).
MPDR results in SAu and NAu were mixed, but overall, the best performer was JRA-25, followed by the ensemble and ERA-Interim (Figs. 5c, 6c). Results were mixed in Japan. In China, satellite products and the ensemble generally performed best inland, whereas JRA-25 did so close to the coastline (Fig. 5c). Generally (besides NCEP–DOE), all products had less than 20% difference with observed MPDR with the exception of CMORPH and PERSIANN in SEA, which underestimated MPDR by 22% and 23%, respectively (Fig. 6c). ERA-Interim and PERSIANN systematically underestimated and TRMM and NCEP–DOE systematically overestimated MPDR (Fig. 6c).
Time series of monthly averaged ETS over the whole domain showed some seasonal variation, with an increase roughly during JJA and DJF, and a step change to reduced ETS for PERSIANN precipitation after 2005 (Fig. 7a). No clear patterns are evident for reanalysis data. The same step change in PERSIANN is present in the r time series, with again no obvious patterns for the other precipitation datasets (Fig. 7b). An analysis of PERSIANN FAR and POD over the subdomains revealed that an increase in false detections in SAu and a decrease in correct detections in SAu and SEA were the cause for the step change (not shown). This likely affected r, but only for small precipitation depths, as RMSD and MPDR appear not much affected. For all products, RMSD time series showed an increase in errors during JJA and less so during DJF, with NCEP–DOE having the largest errors (Fig. 7c). NCEP–DOE and, surprisingly, the bias-corrected TRMM produced high positive MPDR values through the analysis period; CMORPH mostly produced positive values, and the rest of the datasets mostly produced low negative MPDR values.
Table 1 shows the product ranking for detection and accuracy metrics over the whole geographical domain and for all months. The depth-frequency-adjusted ensemble mean outperformed both satellite and reanalyses for most metrics. Among individual products, JRA-25 outperformed the others in most metrics, but its high FB suggested that it tends to overpredict precipitation occurrence. CMORPH agreed better with observed MPDR; however, this is possibly because of compensating underprediction in SEA and overprediction in SAu and NAu (Fig. 6c).
4. Discussion and conclusions
Three reanalyses (ERA-Interim, JRA-25, and NCEP–DOE) and three satellite-based precipitation products (TRMM 3B42V6, CMORPH, and PERSIANN) were systematically evaluated, along with a depth-frequency-adjusted ensemble of the products, against analysis data in relatively well gauged areas in Australia and South and East Asia. Large bias errors (in terms of standard deviation of the products) indicated areas in which choice of precipitation estimates used in hydrologic applications should be carefully considered. Bias errors were large in some areas of high precipitation, such as the ITCZ, and also in high latitudes during winter months (southern Australia and Tasmania).
Analysis of precipitation ETS showed that reanalyses generally outperformed satellite precipitation estimates in all subdomains, except for JJA in SEA, that is, the months affected by the Asia–Pacific monsoon (Fig. 4a) (Wang and LinHo 2002). This was expected because of the better capability of satellites to detect convective precipitation. The seasonal patterns observed in SAu are consistent with those reported by Ebert et al. (2007) and are attributed to the capabilities of reanalyses to capture synoptic precipitation (Fig. 4c). Reanalysis ETS in NAu outperformed satellite on an annual basis, and surprisingly, ERA-Interim was better than satellite precipitation during DJF and JJA. Ebert et al. (2007) attributed the better performance of reanalysis in NAu during JJA to remnant frontal systems brought in from midlatitudes or orographic lifting of moist ocean air during this season. Additional cause for better reanalysis performance in JJA and DJF may be due to the many grid cells in NAu close to SAu (Fig. 3a). Similar results to those of ETS were observed for r, where the ensemble showed an equal or superior performance. Its ETS values were between both types of products because of the lower POD of satellite products especially during winter months (Tian et al. 2009). RMSD was similar for all products, with the exception of NCEP–DOE, which had a higher RMSD (particularly in NAu and SEA). NCEP–DOE also had higher positive MPDR than the other products. Large positive precipitation biases in the tropics have been reported for NCEP–DOE in other studies as well (Fekete et al. 2004; Bosilovich et al. 2008; Getirana et al. 2011). Surprisingly, although gauge-scaled, TRMM had systematically higher MPDR values in all geographical domains, and its RMSD was comparable to that of other satellite products. It has been argued that monthly scaling can propagate errors over space and time and that these may be reflected in RMSD (Gao and Liu 2012). 
In addition, a climatological undercatch correction is applied to TRMM (Huffman et al. 2007; Su et al. 2008), which is not present in precipitation analysis used for evaluation herein. Satellite product precipitation under- or overestimation appeared to be location dependent (e.g., Ebert et al. 2007; Nesbitt et al. 2008; Romilly and Gebremichael 2011; Vernimmen et al. 2012), even for the gauge-scaled TRMM 3B42V6 product (Nair et al. 2009; Stampoulis and Anagnostou 2012). Demaria et al. (2011) found that there was no clear gain of TRMM 3B42V6 over satellite products that are not bias corrected for precipitation exceeding 30 mm day−1. They also showed that TRMM 3B42V6 would not necessarily improve estimates in areas with sparse gauges or if scaling introduces gauge data noise. Furthermore, because of data provider policies, it is not possible to know if some of the gauges used to calibrate TRMM 3B42V6 are also part of the analysis data used here (Scheel et al. 2011).
Over all months combined and over the whole geographical domain, reanalysis outperformed satellite data on detection metrics and agreement metrics. Our results did, however, confirm the strength of satellite data in detecting and estimating convective precipitation.
By combining reanalyses and satellite products in an ensemble, known strengths of both retrieval systems resulted in a reduction of system-specific and random errors (e.g., Bosilovich et al. 2009). Issues associated with simple averaging of the products, such as a large bias in precipitation area and a corresponding reduction in mean and maximum precipitation depth (Ebert 2001), were addressed using a procedure that adjusts the probability distribution of the ensemble to the observed precipitation depth frequency. Although the depth-frequency-adjusted ensemble is not fully independent of the evaluation data, our results provide strong evidence that the inclusion of gauge information is valuable by adjusting both high and low precipitation depths. The dependence was limited, since the adjustment was performed over the whole geographical domain rather than by region or even by grid cell. An adjustment by subdomain or climate type could well improve estimates even further.
The authors gratefully acknowledge funding from the National Water Commission and Microsoft Research. We are also grateful for the assistance and/or correspondence to the providers of satellite and reanalyses data. Tim Raupach from CSIRO Land and Water and Beth Ebert from the Bureau of Meteorology are also thanked for reviewing the manuscript and for providing valuable comments and suggestions. Tim also provided formatted Australian gauge and satellite data later used in this study.
Current affiliation: Fenner School of Environment and Society, College of Medicine, Biology and Environment, Australian National University, Canberra, ACT, Australia.