The last several years have seen the development of a number of new satellite-derived, globally complete, high-resolution precipitation products with a spatial resolution of at least 0.25° and a temporal resolution of at least 3-hourly. These products generally merge geostationary infrared data and polar-orbiting passive microwave data to take advantage of the frequent sampling of the infrared and the superior quality of the microwave. The Program to Evaluate High Resolution Precipitation Products (PEHRPP) was established to evaluate and intercompare these datasets at a variety of spatial and temporal resolutions with the intent of guiding dataset developers and informing the user community regarding the error characteristics of the products. As part of this project, the authors have performed a subdaily intercomparison of five high-resolution datasets [Climate Prediction Center morphing (CMORPH) technique; Tropical Rainfall Measuring Mission Multisatellite Precipitation Analysis (TMPA); Naval Research Laboratory (NRL) blended technique; National Environmental Satellite, Data, and Information Service Hydro-Estimator; and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN)] with existing subdaily gauge data over the United States and the Pacific Ocean. Results show that these data are effective at representing high-resolution precipitation, with correlations against 3-hourly gauge data as high as 0.7 for CMORPH, which had the highest correlations with the validation data. Biases are relatively high for most of the datasets over land (apart from the TMPA, which is gauge adjusted) and ocean, with a general tendency to overestimate warm season rainfall over the United States and to underestimate rainfall over the tropical Pacific Ocean. Additionally, all the products studied faithfully resolve the diurnal cycle of precipitation when compared with the validation data.
The measurement and analysis of precipitation present multiple challenges for describing and understanding its role in the climate system. Although precipitation is a vital element of the global hydrological and energy cycles, it is discontinuous and highly variable in both space and time and is intimately associated with the changes of phase of water between vapor, liquid, and solid. Comprehensive coverage of the entire surface of the earth can only be obtained with observations from meteorological satellites. This capability, combined with the need for near-global precipitation data, has led to numerous attempts to estimate precipitation from remotely sensed information and to use the estimates in large-scale analyses. However, remote sensing of precipitation from satellites relies for the most part on inferring amounts from observing properties of cloud tops in visible or infrared imagery, or from the effects of raindrops or large ice particles on microwave radiation. Such inferences are difficult, and large and complex errors are typical. Satellite-derived estimates also suffer from gaps in temporal sampling. The most successful space-based precipitation products at present are based on combinations of infrared and microwave observations (Ebert et al. 2007).
There are two broad categories of satellite precipitation estimates: 1) those based on geosynchronous infrared (IR) measurements and 2) those derived from polar-orbiting microwave observations. The geosynchronous nature of the IR observations permits a high sampling frequency (full globe scans are currently available every 15 min), but their relationship to precipitation is indirect: outgoing longwave radiation is used to infer the position and cloud-top temperature of cloud masses, and precipitation rates are inferred from these attributes (Arkin and Meisner 1987). Microwave imagers and sounders afford a more direct inference of precipitation, although they suffer from poor temporal sampling; most polar-orbiting satellites provide a near-complete full scan of the earth fewer than once per day. Reviews of methodology and specific examples of estimation of precipitation are provided by Arkin and Ardanuy (1989) and Levizzani et al. (2007). The best precipitation datasets for scientific applications requiring broad, complete, and consistent coverage in space and time have been merged IR/microwave (MW) records, such as the Climate Prediction Center (CPC) Merged Analysis of Precipitation (CMAP; Xie and Arkin 1997a) and the Global Precipitation Climatology Project (GPCP; Huffman et al. 1997; Adler et al. 2003) analyses. Initially, these datasets were available as 2.5° monthly and pentad (Xie and Arkin 1997b; Xie et al. 2003) versions. These analyses are relatively coarse in time and space but have proven extremely useful for a wide range of climate studies, enabling comprehensive descriptions of oceanic precipitation variability for the first time. A 1° daily version of the GPCP (Huffman et al. 2001) has been produced that uses data from the Tropical Rainfall Measuring Mission (TRMM).
More recently, several new merged high-resolution (at least 0.25° and three hourly) precipitation products have emerged. In general, these datasets strive to use the high-quality (but poorly sampled) microwave estimates in tandem with the high-frequency IR data, with a variety of approaches for combining the MW with the IR. An ad hoc international collaboration called the Program to Evaluate High Resolution Precipitation Products (PEHRPP) was established under the auspices of the International Precipitation Working Group (Turk et al. 2008) to evaluate, intercompare, and validate the various high-resolution precipitation algorithms currently available and is intended to build on previous intercomparison studies, such as The WetNet (Dodge and Goodman 1994) Precipitation Intercomparison Projects (PIPs; Barrett et al. 1994; Kniveton et al. 1994; Smith et al. 1998; Adler et al. 2001) and the Global Precipitation Climatology Project (GPCP) Algorithm Intercomparison Projects (AIP; Arkin and Xie 1994; Ebert et al. 1996; Ebert and Manton 1998).
Data produced from five high-resolution precipitation product (HRPP) algorithms are used in this study: the TRMM Multisatellite Precipitation Analysis (TMPA), the CPC morphing technique (CMORPH), the National Environmental Satellite, Data, and Information Service (NESDIS) Hydro-Estimator, the Naval Research Laboratory (NRL) blended technique, and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN). In particular, we will examine the performance of an ensemble HRPP derived from the original products available for this study. We will also compare the performance of numerical weather prediction model forecasts to see whether there is any indication that they could be used as an estimate of precipitation in the regions examined.
Short algorithm descriptions are provided in section 2. The high-resolution precipitation products will be evaluated with respect to several 3-hourly rainfall observation datasets from gauges that are discussed in section 3. Results from the intercomparison of the HRPPs are presented in section 4, with results from model data in section 5. Finally, conclusions and discussion are in section 6.
2. High-resolution products
The five high-resolution gridded datasets used in this study have a spatial resolution of at least 0.25° latitude/longitude and a temporal resolution of at least every three hours. Table 1 lists these datasets and summarizes their inputs. For data at higher resolutions, simple totals are taken over a standard grid and over standard 3-h periods as defined by the data providers, although the definition of the 3-h periods used by TMPA differs from that used by the others by 1.5 h, which is accounted for by using the same averaging period for the validation data. All of the HRPPs use roughly the same inputs as described in Table 1, and most combine passive microwave (PMW) data from the Special Sensor Microwave Imager (SSM/I), TRMM Microwave Imager (TMI), Advanced Microwave Sounding Unit (AMSU), and Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E) PMW, with precipitation estimates obtained using the Goddard profiling (GPROF; Kummerow et al. 1996) algorithm, except for AMSU, which uses an operational algorithm based on Zhao and Weng (2002) and Weng et al. (2003). This is augmented by the higher-frequency estimates from geosynchronous IR estimates. Most HRPPs use the CPC merged IR data of Janowiak et al. (2001).
The TMPA (also known as 3B42; Huffman et al. 2003; Huffman et al. 2007) is a gauge-adjusted combination of two interim products: the PMW and the PMW-calibrated IR. The PMW data are first calibrated using the combined TMI and precipitation radar (PR) product and then used to calibrate the IR input. The PMW and IR are thus considered comparable with each other and are combined by using the PMW data where available and IR elsewhere. The TMPA is unique among the HRPPs in that a gauge correction is applied over land. The 3-hourly PMW/IR estimate is summed to monthly resolution and merged with the monthly gauge data following the techniques described in Huffman et al. (1997). The ratio of this merged product to the non-gauge-adjusted monthly average is used to define scaling factors between 0.2 and 2, which are used to scale the 3-hourly real-time data and obtain the 3B42 version-6 precipitation estimates. Note that a real-time version of TMPA (referred to here as TMPA-RT) is also available without the gauge correction and is often referred to as 3B42-RT. The highest resolution of the TMPA (3B42) data is 3-hourly at 0.25° spatial resolution between 60°N and 60°S, and the record starts in January 1998.
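The monthly scaling step can be illustrated with a short sketch. It is not the operational TMPA code: the function names, the treatment of a zero satellite total, and the sample values are assumptions made for illustration, and only the 0.2 and 2 bounds are taken from the description above.

```python
def monthly_scale_factor(gauge_merged_monthly, satellite_monthly,
                         lower=0.2, upper=2.0):
    """Ratio of the gauge-merged to the satellite-only monthly total,
    limited to the range [lower, upper]."""
    if satellite_monthly <= 0.0:
        # Assumption: with no satellite rainfall there is nothing to scale.
        return 1.0
    ratio = gauge_merged_monthly / satellite_monthly
    return min(max(ratio, lower), upper)


def scale_three_hourly(three_hourly, factor):
    """Apply one monthly factor to every 3-hourly estimate in that month."""
    return [value * factor for value in three_hourly]


# Hypothetical grid box: the satellite-only monthly total is 120 mm and the
# gauge-merged monthly total is 90 mm, so 3-hourly values are scaled down.
factor = monthly_scale_factor(gauge_merged_monthly=90.0, satellite_monthly=120.0)
adjusted = scale_three_hourly([0.0, 1.5, 4.0, 0.5], factor)
```

Because the factor is monthly, it changes the amounts but not the timing or occurrence of the 3-hourly estimates.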
A number of recent studies have used the TMPA for different applications. Hong et al. (2006, 2007) discussed the establishment of a real-time system for the prediction of landslides based on real-time precipitation estimates from the TMPA and surface characteristics related to landslide susceptibility. Curtis et al. (2007b) used the TMPA to investigate daily extremes associated with ENSO. Harris et al. (2007) used the real-time IR component of the TMPA for flood prediction over a single river basin in Kentucky. Several studies have focused on the regional validation of the TMPA against other sources. Katsanos et al. (2004) compared a single year of the TMPA-RT (which lacks the gauge adjustment) with a 12-hourly rain gauge network over the Mediterranean region. They found that the TMPA-RT tended to overestimate the gauge values, particularly for heavier rainfall amounts. Curtis et al. (2007a) examined the total precipitation from the TMPA associated with Hurricane Floyd over three river basins in North Carolina versus that obtained from a gauge network and radar estimates and found that the TMPA overestimated the total precipitation of extreme events, although the overestimation was much smaller for very heavy events. Villarini and Krajewski (2007) compared a single 0.25° grid box of the TMPA with a dense rain gauge network over Oklahoma. They found that the TMPA had higher correlations with the gauge data during the warm season but tended to underestimate low precipitation values.
The CMORPH (Joyce et al. 2004) is constructed from inputs similar to those of the TMPA, but the original product is created on an 8-km grid at half-hourly time resolution. The different PMW records are calibrated to match the TMI using a histogram matching technique. At times and locations when PMW data are unavailable, the PMW estimates are propagated/interpolated using motion vectors derived from the IR data (see Joyce et al. 2004). A consequence of this combination method is that the analysis does not rely on the IR data for rainfall estimates. CMORPH data are available at half-hourly, 8-km (0.07277°) resolution between 60°N and 60°S from December 2002. The 3-hourly, 0.25° averages are also distributed and are used in this study.
The Hydro-Estimator (Scofield and Kuligowski 2003) is based on the NESDIS Auto-Estimator algorithm described in Vicente et al. (1998). In the Hydro-Estimator, pixels are defined as raining if their value is below the average value for the surrounding area. A standard rainfall distribution derived from more than 6000 collocated radar and satellite pixels is adjusted according to the difference between the pixel and the surrounding area, so that the highest rain rates are assigned to areas that are coldest relative to their surroundings. Atmospheric information from NWP model output is used to make two adjustments: 1) model-estimated precipitable water is used to adjust the rain-rate curve based on moisture availability, which improves the estimation of precipitation from stratiform events, and 2) relative humidity is used to derive a bulk bias adjustment. The highest resolution of these data is every 15 min, with a 4-km spatial resolution between 60°N and 60°S. The record starts in 2003 over the United States only; global estimates are only available from 2008 onward, so comparisons are limited to stations over the United States for this study. It should also be noted that the algorithm has been continuously improved; however, no reprocessing of older data has been performed, so older data are likely to be of lower quality than data for more recent periods.
The NRL blended technique (NRL blended; Turk and Miller 2005) is constructed by calibrating the PMW to the PR between 40°S and 40°N and to the SSM/I outside this range. Histogram matching is then used to make the IR consistent with the PMW data. Winds at 850 hPa from the Navy Operational Global Atmospheric Prediction System (NOGAPS) are used along with topographic information to correct for orographic effects. The PMW and calibrated IR are then combined according to a weighted mean at each grid box. Weights are constant for the PMW, whereas weights for the IR data are smaller when closer to a PMW overpass. The record starts in January 2003, although there are missing periods (notably September–December 2003). As with the Hydro-Estimator, reprocessing of the historical series has not been carried out, so the older parts of the dataset were produced with older versions of the algorithm and might be of lower quality.
The PERSIANN (Hsu et al. 1997; Sorooshian et al. 2000) dataset uses a neural network technique to estimate rainfall rates from IR data. The neural network is calibrated with TMI, SSM/I, and AMSU data and used to obtain precipitation estimates using the Janowiak et al. (2001) IR data as a basis. Surface type is also included in the neural network, and an adaptive process is used to iteratively adjust network parameters based on the error of the output compared to observations. Data are produced globally at a 4-km resolution for every 30 min and were aggregated to 0.25°, 3-hourly resolution for this study. Sorooshian et al. (2000) compared precipitation estimates from PERSIANN (trained with TMI data) with gauges and radar (as well as several TRMM products) between 30°S–30°N and 90°E–30°W. Over land, they report relatively high correlations against gauges and radar when aggregated to coarse resolutions (1° and 5° grids). PERSIANN also was used by Gupta et al. (2002) to predict flash floods over the United States. They used a combination of remotely sensed precipitation, runoff, land surface, and model data. They found the satellite data were useful in the identification of features, but they do not have adequate lead time required for flash flood forecasting from large, convective thunderstorm systems.
Obtaining suitable validation data for subdaily precipitation is challenging because most standard observations are daily at best. The first issue is that suitable data must be at least three hourly, although higher time resolutions are preferred because the start times of the HRPPs are not uniform; for example, the TMPA measurements are centered on 0000 UTC, 0300 UTC, and so on, which is 1.5 h offset from the other products, and higher-resolution validation data allows for the calculation of different 3-h periods that are coincident with all the HRPPs. A second issue is that the validation data were required to have undergone quality-control checks before being used. This is crucial for subdaily data because they are subject to many sources of error, which are normally eradicated through averaging for lower-frequency time scales. Most datasets with sufficient quality control and subdaily sampling cover a very limited period, which is a problem for precipitation validation because there is high natural variability (noise), and results can be variable when taken from short periods. Therefore, at least half-hourly quality-controlled gauge data are required over several years. Matched HRPP series were constructed for each gauge location by combining the surrounding four grid points using bilinear interpolation. Statistics are presented using these 3-hourly matched series.
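The matching of a gridded product to a gauge location can be sketched as follows. This is a minimal illustration of bilinear interpolation from the four surrounding grid points, not the code used in the study; the grid layout (rows as latitude, columns as longitude, regular spacing), the function name, and the sample values are assumptions, and missing grid values are not handled.

```python
import math


def bilinear_at_gauge(grid, lat0, lon0, step, gauge_lat, gauge_lon):
    """Interpolate a regular grid to a gauge location.

    grid[i][j] holds the value at latitude lat0 + i * step and
    longitude lon0 + j * step (all in degrees).
    """
    y = (gauge_lat - lat0) / step   # fractional row index
    x = (gauge_lon - lon0) / step   # fractional column index
    # Lower-left corner of the enclosing cell, clamped to stay in the grid.
    i = min(max(int(math.floor(y)), 0), len(grid) - 2)
    j = min(max(int(math.floor(x)), 0), len(grid[0]) - 2)
    wy = y - i
    wx = x - j
    return ((1 - wy) * (1 - wx) * grid[i][j]
            + (1 - wy) * wx * grid[i][j + 1]
            + wy * (1 - wx) * grid[i + 1][j]
            + wy * wx * grid[i + 1][j + 1])


# Hypothetical 0.25-degree cell with a gauge at its centre: the result is
# the mean of the four corner values.
cell = [[0.0, 1.0],
        [2.0, 3.0]]
matched = bilinear_at_gauge(cell, lat0=36.0, lon0=-98.0, step=0.25,
                            gauge_lat=36.125, gauge_lon=-97.875)
```

Applying the function to each 3-hourly field in turn gives a matched series for the gauge.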
Applying these principles, two precipitation validation datasets were employed in this study: 1) precipitation from the Atmospheric Radiation Measurement Program (ARM) sites over the southern Great Plains (SGP) and 2) precipitation from the Tropical Atmosphere Ocean (TAO)/Triangle Trans-Ocean Buoy Network (TRITON) Autonomous Temperature Line Acquisition System (ATLAS) II buoy gauges over the tropical Pacific Ocean. The locations of the validation gauges used in this study are shown in Fig. 1. Both of these validation datasets are available at a sub-3-hourly resolution and have been aggregated to 3-hourly totals corresponding to the periods available from the high-resolution precipitation products. Figure 2 shows the data coverage time series for each of the sites used in the study. We required sites to have at least one year of data, although all of the SGP sites have data for nearly the whole 3.5-yr study period, and most of the TAO buoy data have two years or more of coverage.
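The aggregation of higher-frequency gauge data to 3-hourly totals, including the 1.5-h shift between the TMPA periods and those of the other products, can be sketched as below. The function name, the use of None for missing values, and the rule that a window with any missing value is itself missing are assumptions for illustration; the study's actual treatment of partial windows is not stated.

```python
def three_hourly_totals(half_hourly, offset_steps=0, steps_per_window=6):
    """Sum consecutive half-hourly accumulations (mm) into 3-h totals.

    offset_steps is the number of leading half-hour steps skipped so that
    the windows line up with a given product. A window containing a missing
    value (None) is returned as None; a trailing partial window is dropped.
    """
    totals = []
    last_start = len(half_hourly) - steps_per_window
    for start in range(offset_steps, last_start + 1, steps_per_window):
        window = half_hourly[start:start + steps_per_window]
        if any(value is None for value in window):
            totals.append(None)
        else:
            totals.append(sum(window))
    return totals


# Hypothetical half-hourly gauge series beginning at 0000 UTC (7.5 h long).
gauge = [0.0, 0.5, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0,
         0.5, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

# Windows 0000-0300 and 0300-0600 UTC.
aligned = three_hourly_totals(gauge, offset_steps=0)
# Windows shifted by 1.5 h (0130-0430 and 0430-0730 UTC), i.e. centred on
# 0300 and 0600 UTC as described for the TMPA.
shifted = three_hourly_totals(gauge, offset_steps=3)
```

The same gauge record yields different 3-hourly totals for the two window definitions, which is why validation data finer than 3-hourly are needed to match all of the products.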
The SGP site was established in early 1992 and was the first field site under the ARM project (Stokes and Schwartz 1994; Ackerman and Stokes 2003). The facility is made up of a number of sites in Oklahoma and Kansas that were chosen for their homogeneity. Most of the sites have surface measurements including precipitation from tipping-bucket rain gauges measuring every minute that were aggregated to give a half-hourly accumulation that was downloaded for use in this study.
The TAO/TRITON network is maintained by the NOAA Pacific Marine Environmental Laboratory (PMEL) and provides a range of real-time and archived data from a set of moored buoys in the tropical Pacific Ocean, primarily used for monitoring El Niño–Southern Oscillation (ENSO) conditions (Hayes et al. 1991). In addition to collecting information on oceanographic variables, many of the buoys also record meteorological variables, including precipitation. Serra et al. (2001) describe the instrument and processing procedures required for these data. A further correction to remove noise in the data was applied based on the work of Huffman and Lehman (2006), who suggested using only values over a threshold of 2 mm h−1 to remove some residual noise from the 10-minute data. A second correction for gauge undercatch also was applied. Serra et al. (2001) summarized some of the factors that might lead to errors in the TAO buoy gauges and found that the only significant problem was undercatch as a result of wind effects around the gauge. They discussed a number of studies that examine the gauge undercatch issue and ultimately suggested using the correction outlined by Koschmieder (1934), which corrects the precipitation amount according to a polynomial relationship with wind speed.
Two further datasets are included in the analysis in the same manner as the HRPPs. First, the National Centers for Environmental Prediction (NCEP) stage IV (Baldwin and Mitchell 1998; Lin and Mitchell 2005) radar–gauge product is used over the United States. The stage IV data are available on a 4-km grid from January 2002, so results are shown for the native grid as well as for an aggregated version on a 0.25° grid. Second, to evaluate the PEHRPP hypothesis that NWP forecast precipitation might be a useful estimate of precipitation in some cases, forecast precipitation from the NCEP Global Forecast System (GFS) NWP model has been obtained and compared to the gauge observations. The model is run every 6 h, and the 12- and 15-h precipitation forecasts are used to avoid model spin-up issues. GFS model data are available globally at 1° resolution starting in March 2004. It is expected that this coarser resolution might leave the GFS at a slight disadvantage compared to the HRPPs because of the increased spatial smoothing; therefore, the HRPPs are degraded to 1° resolution for the comparison with GFS precipitation.
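One simple way to degrade a 0.25° field to 1° is to average non-overlapping 4 × 4 blocks of grid boxes, as in the sketch below. The text does not state which method was used, so the block mean, the function name, and the sample field are assumptions; the sketch also ignores the variation of grid-box area with latitude and any missing values.

```python
def block_average(grid, factor=4):
    """Average non-overlapping factor x factor blocks of a 2D grid.

    Rows or columns left over when the grid size is not a multiple of
    factor are dropped.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    coarse = []
    for bi in range(n_rows):
        row = []
        for bj in range(n_cols):
            cells = [grid[bi * factor + di][bj * factor + dj]
                     for di in range(factor)
                     for dj in range(factor)]
            row.append(sum(cells) / len(cells))
        coarse.append(row)
    return coarse


# Hypothetical 0.25-degree field covering 1 degree of latitude by 2 degrees
# of longitude: one intense grid box is spread over the whole 1-degree box.
fine = [[1.0] * 4 + [0.0] * 4 for _ in range(4)]
fine[0][0] = 17.0
coarse = block_average(fine)
```

The example shows the smoothing effect referred to in the text: a single 17-mm grid box becomes a 2-mm average at 1° resolution.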
4. Results from product comparisons
A number of other comparisons of the HRPPs have been conducted (Gottschalck et al. 2005; Brown 2006; Ebert et al. 2007; Ruane and Roads 2007; Tian et al. 2007), although none have examined the performance of the datasets over the oceans, and most have not assessed the subdaily performance. Here, we focus almost exclusively on the subdaily time scale although some results for daily data are shown for reference.
Comparisons between the SGP gauges and the high-resolution datasets were performed separately for 6-month warm and cold seasons (April–September and October–March, respectively). The high-resolution data were matched with the validation data and any missing data were removed (as was the case with the TAO buoy data; see Fig. 2), and the correlation of the 3-hourly pairs for each site was obtained. Figure 3 shows box plots of these correlations between each 3-hourly SGP gauge series and the matched 3-hourly HRPP series (e.g., there are 16 SGP stations, hence the box plots depict 16 three-hourly correlations). The box plots show the distribution of the correlations over all sites, and the mean correlation is included as a reference. Overall, the HRPPs are better correlated with the validation data in the warm season (Fig. 3b) than in the cold season (Fig. 3a), as shown by the generally higher mean values and smaller spread of correlations. There are also minor differences in the skew of the distributions, but these are very small compared to the spread of correlations. CMORPH has the highest 3-hourly correlations in both seasons, and NRL blended has the lowest mean correlations. In the cold season (Fig. 3a), TMPA, the Hydro-Estimator, and PERSIANN are similarly correlated with the validation data, with mean values around 0.45 compared with about 0.55 for CMORPH. In the warm season (Fig. 3b), however, the correlations for TMPA exceed those for the Hydro-Estimator and PERSIANN and are almost as large as those for CMORPH, indicating greater skill for convective precipitation over the Great Plains, a phenomenon also observed by Villarini and Krajewski (2007) over Oklahoma. Perhaps unsurprisingly, the spread of the correlations is somewhat similar for each of the HRPPs, with the exception of the Hydro-Estimator in winter. This general similarity suggests that differences between the HRPPs and the gauge data are systematic over these geographically close sites and that these differences are similar among the HRPPs.
The circles in Fig. 3 indicate the mean of the correlations between daily accumulations of the SGP gauges and the HRPPs, and the crosses indicate the mean of the correlations based on the 3-hourly accumulations. In all cases, the daily correlations are higher than those obtained from the 3-hourly data, although the improvement is smaller in the warm season when more short-lived convection is dominant. These daily results are similar to those obtained by Ebert et al. (2007) and Tian et al. (2007).
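The per-site correlation and its daily counterpart can be sketched as follows. The Pearson coefficient is assumed here as the correlation measure, and the function names, the use of None for missing values, and the synthetic series are illustrative only. The example is constructed so that the product places each rain event one 3-h period late, which is one way (among others) that daily correlations can exceed 3-hourly ones.

```python
import math


def pearson(x, y):
    """Pearson correlation of paired values, skipping pairs with a None.

    Assumes at least two valid pairs and that neither series is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def daily_totals(three_hourly):
    """Sum each complete block of eight 3-hourly values into a daily total."""
    n_days = len(three_hourly) // 8
    return [sum(three_hourly[8 * d:8 * d + 8]) for d in range(n_days)]


# Three synthetic days of 3-hourly totals (mm) at one site.
gauge = [0.0] * 24
product = [0.0] * 24
gauge[2], gauge[20] = 4.0, 8.0       # observed events
product[3], product[21] = 4.0, 8.0   # same amounts, one period late

r_3h = pearson(gauge, product)
r_daily = pearson(daily_totals(gauge), daily_totals(product))
```

In this constructed case the 3-hourly correlation is close to zero while the daily correlation is 1, because the timing error disappears once the values are summed over the day.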
The stage IV data are included as a benchmark, and correlations are shown for estimates at 0.25° and 4-km spatial resolution. The latter exhibits excellent agreement with the validation data and has mean correlations of around 0.8 in both seasons. Correlations are slightly lower for the 0.25° resolution, at around 0.7 in both seasons, which shows the effect of the larger averaging area. The daily analysis of Tian et al. (2007) showed that correlations against gauge and radar data for the whole United States were highest east of the Rocky Mountains and were slightly lower over the West Coast during the warm season. In the cold season, they showed that daily correlations in the midwest and southeast United States were still high, but they were extremely low in the west and at all points north of about 40°N. These 3-hourly results are consistent with those of Tian et al. (2007) and show that the HRPPs are still inferior to the stage IV data at this resolution.
Figure 4 shows similar box plots of 3-hourly correlations but with the TAO/TRITON buoys as the validation data. Note that the Hydro-Estimator has not been processed over this area for this period and is excluded. There were a number of remaining issues in using these data, the most significant of which is that the gauges cover a large area and are quite inhomogeneous. They have, therefore, been split into two groups: gauges west and east of 150°W, as indicated by the dashed line in Fig. 1. This split was arbitrarily chosen, as it is approximately in the middle of the gauges, but the rainfall characteristics on either side are quite different. Figure 1 includes the mean annual precipitation over the study period from the GPCP (Adler et al. 2003) 2.5°-resolution dataset and shows that locations west of 150°W generally receive more precipitation than locations east of 150°W. As with the SGP gauges, the TAO buoy gauges show that CMORPH generally exhibits the highest correlations of the HRPPs on both sides of the Pacific and that NRL blended lags behind the other HRPPs with a far larger spread. West of 150°W (Fig. 4a) the spread of correlations is small, reflecting the homogeneous nature of these gauges. There is a larger spread of correlations east of 150°W (Fig. 4b), and some sites even have correlations close to zero. The heterogeneity of the sites east of 150°W arises from the mix of sites receiving heavy rainfall under the ITCZ and those receiving very little rainfall under the subtropical high-pressure zone south of the equator. Once again, the daily correlations (indicated on the figure by circles) are higher than the 3-hourly correlations, and those for the TMPA are as high as those for CMORPH, indicating the TMPA has higher daily skill (even in the absence of the gauge correction). This suggests the improvement in skill is related to issues in the detection of 3-hourly events, which are remedied by the smoothed daily resolution.
The validity of the gauge undercatch correction is unknown, so it is important to compare the results using the correction with those obtained from the data without the undercatch correction. Triangles in Fig. 4 show the mean of the correlations against the TAO/TRITON buoy estimates without the undercatch correction. Generally, similar results are found with the uncorrected data and the CMORPH, NRL blended, and PERSIANN datasets. However, the TMPA now yields higher 3-hourly correlations that are of a similar magnitude to those obtained with CMORPH. The exact reasons for this are unclear, but the improvement is likely to be in the estimation of amounts under windy conditions because the undercatch correction does not change the probability of precipitation and is larger under windy conditions.
Of all of the HRPPs considered in this study, only TMPA has a gauge bias correction (applied as a monthly-mean correction over land only). The rest of the datasets simply rely on the satellite estimates of precipitation amount. The efficacy of this bias adjustment is clear in both seasons from Fig. 5, which shows that the TMPA performs as well as the gauge-adjusted stage IV data and has less spread in the warm season. There is still some spread in the gauge-adjusted datasets because the SGP gauges were not used in the gauge adjustment and also as a result of the differences between point and area-averaged estimates already discussed. Studies by Katsanos et al. (2004), Gottschalck et al. (2005), and Harris et al. (2007) all found biases in the unadjusted TMPA, so it is clear that the gauge correction is of great benefit.
In the cold season (Fig. 5a), CMORPH, NRL blended, and PERSIANN have small positive biases mostly below 50% of the mean (which is relatively low for noisy satellite estimates). However, these three datasets have large positive biases in the warm season (Fig. 5b) as a result of the overestimation of convective events. In particular, CMORPH overestimates precipitation at all sites by between 50% and 175%, despite having the highest 3-hourly correlations with these data. Conversely, the U.S.-only version of the Hydro-Estimator has a small positive bias in the warm season (with a median value of 25%) and a larger bias than the other datasets in the cold season (median ∼40%). This may reflect the fact that the Hydro-Estimator was constructed for flood nowcasting and is tuned for accuracy in the summer months. Clearly, the bias correction of the TMPA offers a significant advantage compared to the other datasets, implying that this sort of correction should be considered where possible for all HRPPs. These results are similar to those obtained for the whole of the United States by Tian et al. (2007), who showed that the CMORPH summer bias extends over much of the central United States but also found a slight negative bias in the winter for CMORPH (particularly over the West Coast).
Figure 6 shows the bias of the HRPPs compared to the TAO buoys with the wind undercatch correction (triangles indicate the mean percentage bias without the correction). The TMPA (which is not gauge corrected over the ocean) and CMORPH both underestimate precipitation by about 25% in the west Pacific (Fig. 6a) and by 40%–80% in the east Pacific (Fig. 6b). Distinctly different results are obtained for NRL blended and PERSIANN; both underestimate precipitation in the east Pacific, but NRL blended has a smaller underestimate of precipitation in the west Pacific and PERSIANN has a near-zero median bias. Both NRL blended and PERSIANN have far larger spreads of percentage biases in the west Pacific than TMPA and CMORPH.
Figure 7 is similar to Fig. 6, but it shows absolute values of the bias rather than percentage biases. As would be expected, the broad patterns are very similar. However, the mean bias over all sites is somewhat smaller in the east Pacific, and the percentage bias is inflated because of the small mean value at most of the sites. There is still substantial spread in Fig. 7b as a result of some sites in the east Pacific being located under the ITCZ, which has higher mean rainfall and is more disposed to higher biases. This is illustrated by Fig. 8, which shows the absolute biases from the east Pacific sites plotted against the mean at each site for each HRPP. There is a very strong relationship, with larger (smaller) biases occurring at sites with larger (smaller) mean values for all HRPPs, which indicates that the spread in Figs. 6b and 7b is due to the heterogeneity of the sites rather than errors with the HRPPs or the buoy gauges. The two squares in the top left represent small positive biases in NRL blended at two of the sites. These small values occurred at locations with a particularly small mean rainfall. When expressed as percentage biases, they become extremely large values and cause the strange behavior of NRL blended seen in Fig. 6b.
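The inflation of percentage bias at dry sites can be shown with a small numerical sketch. The definitions below (mean difference between product and gauge, and that difference as a percentage of the gauge mean) are assumed for illustration, as are the function names and the two synthetic sites.

```python
def mean_bias(product, gauge):
    """Mean of (product - gauge) over matched periods, in mm per period."""
    return sum(p - g for p, g in zip(product, gauge)) / len(gauge)


def percentage_bias(product, gauge):
    """Mean bias as a percentage of the gauge mean (gauge mean must be > 0)."""
    gauge_mean = sum(gauge) / len(gauge)
    return 100.0 * mean_bias(product, gauge) / gauge_mean


# Synthetic wet site: the product gives half of the observed rainfall.
wet_gauge = [0.0, 2.0, 0.0, 2.0]
wet_product = [0.0, 1.0, 0.0, 1.0]

# Synthetic dry site: a far smaller absolute error, but a tiny gauge mean.
dry_gauge = [0.0, 0.0, 0.0, 0.1]
dry_product = [0.0, 0.0, 0.1, 0.2]

wet_abs = mean_bias(wet_product, wet_gauge)
wet_pct = percentage_bias(wet_product, wet_gauge)
dry_abs = mean_bias(dry_product, dry_gauge)
dry_pct = percentage_bias(dry_product, dry_gauge)
```

The dry site has an absolute bias one-tenth the size of that at the wet site, yet its percentage bias is four times larger in magnitude, which is the effect described for the low-rainfall sites.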
The triangles in Fig. 6 show the median percentage bias of the TAO buoy gauges without the undercatch correction. In the east Pacific, similar results are obtained regardless of the use of the undercatch correction because the rainfall amounts are generally smaller than in the west and thus have less influence. However, in the west Pacific (Fig. 6a), the median percentage biases are all much smaller than in the east. For CMORPH and TMPA, the median percentage bias in the west Pacific is near zero, with a similar spread as obtained from the corrected data. In the case of NRL blended and PERSIANN, the median percentage bias is greater than zero, suggesting that they overestimate western Pacific precipitation.
Bowman (2005) used the TAO gauges to evaluate the long-term mean rainfall rates from the TMI as well as the TRMM PR and found that the TMI was in good agreement with the corrected gauges, whereas the PR underestimated the corrected gauges. Despite this result, it is likely that the bias in the HRPPs comes from the PMW sensors because the IR is tuned to match the PMW data, and several of the datasets match estimates to TMI.
The results show that the HRPPs underestimate precipitation in the east Pacific, but the conclusion is less clear in the west Pacific (under the ITCZ and the South Pacific Convergence Zone). There are some questions regarding the adequacy of the gauge correction, but there is little doubt that some correction is required, even if the magnitude of the correction is uncertain. Given this, both TMPA and CMORPH seem to underestimate precipitation in the west Pacific, and PERSIANN and NRL blended might be closer to the buoys, although it should be noted that NRL blended has a larger spread of percentage bias values than the other datasets.
c. Other statistics
Several statistics based on contingency tables are commonly used in the verification of precipitation. To compute them, the data are reduced to binary form, with an event defined as a nonzero 3-hourly rainfall total. Here, we show the probability of detection (POD), the false alarm ratio (FAR), and the Heidke skill score [HSS; refer to Jolliffe and Stephenson (2003) for further details].
Figure 9 shows the POD, FAR, and HSS for the SGP sites for the cold and warm seasons. The POD gives the number of correctly identified events as a proportion of the total number of observed events. The FAR gives the number of events falsely identified as raining as a proportion of the total number of identified events (both correctly and incorrectly identified). A high POD and a low FAR are desirable. The TMPA has a lower POD and lower FAR than the other HRPPs in both the warm and cold seasons, which is indicative of fewer total raining events (in simpler terms, there were fewer to get wrong but also fewer to get right). Of the other HRPPs, CMORPH has a high POD in both seasons but, together with PERSIANN and the Hydro-Estimator, also has the highest FAR. The values of the FAR are similar in each season for all HRPPs, but the POD is generally higher during the convective warm season. This suggests the HRPPs do indeed capture these events, even if they tend to overestimate the rainfall (as in Fig. 5b). Interestingly, the 4-km stage IV product has a slightly lower POD than the 0.25° version, reflecting issues with spatial sampling at such high resolutions.
The HSS gives an estimate of the number of correctly identified events and nonevents as a proportion of the total number of cases. This “proportion correct” estimate is adjusted to use a random forecast as a baseline: an HSS of unity indicates a perfect forecast, and an HSS of zero indicates a forecast that is no better than random chance. The Hydro-Estimator gives the highest HSS in the cold season, which is surprising given the skill exhibited in the warm season. The TMPA gives the highest in the warm season and is slightly superior to the stage IV data at 0.25° resolution. The other HRPPs have broadly similar HSS.
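The three statistics can be sketched in Python using the standard 2 × 2 contingency-table definitions. This is an illustrative sketch rather than the code used in this study: the function names and the sample values are assumptions, an event is taken to be any total above a threshold of zero as described above, and no guard is included for tables in which a denominator is zero.

```python
def contingency_counts(estimates, gauges, threshold=0.0):
    """Count hits, false alarms, misses, and correct negatives."""
    hits = false_alarms = misses = correct_negatives = 0
    for est, obs in zip(estimates, gauges):
        est_rain = est > threshold
        obs_rain = obs > threshold
        if est_rain and obs_rain:
            hits += 1
        elif est_rain and not obs_rain:
            false_alarms += 1
        elif obs_rain:
            misses += 1
        else:
            correct_negatives += 1
    return hits, false_alarms, misses, correct_negatives


def pod(hits, false_alarms, misses, correct_negatives):
    """Probability of detection: hits / observed events."""
    return hits / (hits + misses)


def far(hits, false_alarms, misses, correct_negatives):
    """False alarm ratio: false alarms / estimated events."""
    return false_alarms / (hits + false_alarms)


def hss(hits, false_alarms, misses, correct_negatives):
    """Heidke skill score: proportion correct relative to random chance."""
    numerator = 2.0 * (hits * correct_negatives - false_alarms * misses)
    denominator = ((hits + misses) * (misses + correct_negatives)
                   + (hits + false_alarms) * (false_alarms + correct_negatives))
    return numerator / denominator


# Hypothetical 3-hourly totals (mm) at one site; illustrative only.
estimates = [1.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.2]
gauges = [0.8, 0.0, 0.0, 0.3, 1.5, 0.0, 0.0, 0.0]

counts = contingency_counts(estimates, gauges)
print(counts, pod(*counts), far(*counts), hss(*counts))
```

For these sample values the table contains 2 hits, 2 false alarms, 1 miss, and 3 correct negatives, giving a POD of 2/3, a FAR of 0.5, and an HSS of 0.25.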
Over the tropical Pacific (Fig. 10), there is much higher spread in the POD, FAR, and HSS than observed over the SGP sites, which reflects heterogeneities caused by the large area covered by these data. As with the SGP sites, the TMPA has lower POD and FAR in the west Pacific but is more in line with the other HRPPs over the east Pacific, although there are fewer rain events in the east Pacific. The differences between the HRPPs are surprisingly small for POD, FAR, and HSS. The daily statistics (an event is a rainy day) are far superior for POD and FAR, showing the advantage of averaging. However, the daily and 3-hourly estimates have very similar HSS because they are both assessed relative to random chance.
d. Representation of the diurnal cycle
The diurnal cycle of precipitation is currently poorly represented by numerical models and is unavailable from most gauges because they lack subdaily measurements. This is, therefore, a crucially important trait for the HRPPs because they provide one of the few possible sources of real data for the global diurnal cycle. Janowiak et al. (2005) studied the mean seasonal diurnal cycle of CMORPH and pointed out that IR estimates alone are not well suited to the study of the diurnal cycle as a result of the lag between the detection of clouds and the occurrence of rainfall at the surface. Sorooshian et al. (2002) evaluated the diurnal cycle of PERSIANN (which is IR based but trained with PMW data) against gauge, radar, and other data sources in the tropics and found lags of no more than 1–2 h. However, they also acknowledged that contamination from cold anvil cirrus might degrade the performance of PERSIANN with respect to both its mean performance and the diurnal cycle. Additionally, Hong et al. (2005) presented a version of PERSIANN that was adjusted using TMI data to enhance the accuracy of the diurnal cycle. They reported that the adjustment reduced the time lag of the PERSIANN diurnal cycle from 2–3 h to 1–2 h and also improved the bias.
Passive microwave estimates also have some difficulties in estimating the diurnal cycle, although these are more direct measurements so the lag is expected to be shorter. Sanderson et al. (2006) studied methods for detecting the diurnal cycle using TRMM data (because this satellite has a precessing orbit that makes it suitable for the assessment of the mean diurnal cycle without any other sensors given a sufficiently long averaging time). They found that significant differences can be found between algorithms, and that a careful approach to combination is required.
Figure 11 shows the mean seasonal diurnal cycle averaged over all sites for the cold and warm seasons for the SGP sites. The diurnal cycle is quite flat in the cold season (Fig. 11a) with low rainfall values, and all of the HRPPs capture this pattern. There is some sign that the Hydro-Estimator has a higher mean value than the gauges, as was seen in the percentage bias summaries (Fig. 5a). In the warm season, there is a strong diurnal cycle, and all HRPPs capture the basic shape of the cycle quite accurately with a nocturnal maximum as would be expected in this region (Balling 1985). Both the TMPA and the Hydro-Estimator provide very accurate estimates of the amplitude of the diurnal cycle. In the case of the TMPA, this is probably due to the monthly correction. Assuming that the TMPA overestimates the warm season precipitation in a similar fashion to the other datasets, which preferentially use passive microwave estimates, the scaling factors would reduce all 3-hourly estimates within the summer months and would hence reduce the amplitude of the estimate. For the Hydro-Estimator, this is probably related to tuning for flood forecasting. CMORPH, NRL blended, and PERSIANN all overestimate the amplitude of the warm-season diurnal cycle, which is consistent with their large positive biases (Fig. 5b). There is some indication that all HRPPs, aside from the Hydro-Estimator, are slightly out of phase with the gauges (and the stage IV estimates). NRL blended is the worst of these and seems to be several hours out of phase, which might explain its lower 3-hourly correlations. This possible lag requires further investigation using hourly or half-hourly data, which is beyond the scope of this investigation.
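A mean diurnal cycle of the kind described above can be sketched in Python by averaging all values that share the same time of day. The record layout, the function names, the definition of amplitude as maximum minus minimum, and the sample values are all assumptions made for illustration. With 3-hourly data the phase difference can only be resolved in 3-h steps, which is why finer data would be needed to examine a possible lag properly.

```python
from collections import defaultdict


def mean_diurnal_cycle(records):
    """Average rainfall by time of day.

    records: iterable of (hour_of_day, rain_mm) pairs, e.g. hours 0, 3, ..., 21.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, rain in records:
        totals[hour] += rain
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


def peak_hour(cycle):
    """Time of day with the largest mean rainfall."""
    return max(cycle, key=cycle.get)


def amplitude(cycle):
    """Amplitude taken here as maximum minus minimum of the mean cycle."""
    return max(cycle.values()) - min(cycle.values())


def phase_lag_hours(estimate_cycle, gauge_cycle):
    """Difference in peak time (estimate minus gauge), wrapped to (-12, 12]."""
    lag = (peak_hour(estimate_cycle) - peak_hour(gauge_cycle)) % 24
    return lag - 24 if lag > 12 else lag


# Hypothetical records for three time-of-day bins; illustrative only.
gauge_records = [(0, 2.0), (0, 4.0), (3, 1.0), (3, 1.0), (6, 0.0), (6, 0.0)]
estimate_records = [(0, 1.0), (0, 1.0), (3, 5.0), (3, 3.0), (6, 0.0), (6, 0.0)]

gauge_cycle = mean_diurnal_cycle(gauge_records)
estimate_cycle = mean_diurnal_cycle(estimate_records)
print(amplitude(gauge_cycle), amplitude(estimate_cycle),
      phase_lag_hours(estimate_cycle, gauge_cycle))
```

In this toy case the estimate has a larger amplitude than the gauges (4.0 versus 3.0) and peaks one 3-h bin later, mirroring the kind of amplitude overestimate and phase offset discussed for some of the HRPPs.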
e. Prospects for ensembles of existing datasets
The satellite products being evaluated in this study generally use similar inputs, so differences between them are most likely related to the way in which they combine the different datasets. One of the prime goals of PEHRPP is to explore ways in which the HRPPs can be improved—perhaps by combining different techniques or even products. The latter would be of use if differences between the HRPPs were systematic rather than simply random error. For example, the Hydro-Estimator uses only IR data to estimate precipitation, whereas CMORPH uses only PMW data for its estimate (the IR is only used to morph the PMW). If the information contained in these two datasets is sufficiently different, then a simple combination might yield extra overall skill in the same way as is obtained by monthly 2.5°, merged datasets, such as GPCP (Adler et al. 2003) and CMAP (Xie and Arkin 1997a). Figure 12 shows the mean correlation for each month for CMORPH and the Hydro-Estimator over all SGP sites. CMORPH usually has higher correlations against the validation data, although there are months when the Hydro-Estimator is superior. Additionally, the Hydro-Estimator had a lower bias in the summer months (without the use of gauge corrections), although it performed relatively poorly in the winter with lower correlations and a higher absolute bias than the other datasets.
As an example, CMORPH and the Hydro-Estimator are relatively dissimilar, and it is possible that some combination of these two datasets could yield an improvement compared to the Hydro-Estimator. Figure 12 also shows the mean monthly correlation of a mixture of the Hydro-Estimator and CMORPH. CMORPH was used to identify the occurrence of precipitation for each 3-hourly period at each of the validation sites. Then the precipitation amount was taken directly from the Hydro-Estimator for the 3-hourly periods with rain. This approach simply applies a rain/no rain mask, derived from CMORPH, to the Hydro-Estimator. The use of this mask slightly improves the correlation of the Hydro-Estimator with the validation data, but it still has lower correlations in most places than the original CMORPH data. One notable exception is during February 2006, which was a very dry month in which the mixture gives a higher correlation. This improvement in skill is probably mostly due to a reduced probability of precipitation in the combined dataset. Both CMORPH and the Hydro-Estimator tend to overestimate the probability of precipitation (not shown).
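The rain/no rain masking described above can be sketched in a few lines of Python. The function name, the zero threshold for defining rain, and the sample values are assumptions for illustration only; the sketch also assumes that the two series are already matched in time and location.

```python
def apply_rain_mask(mask_source, amount_source, threshold=0.0):
    """Keep amounts only where the masking dataset indicates rain.

    mask_source: 3-hourly totals used solely to decide rain/no rain (CMORPH here).
    amount_source: 3-hourly totals that supply the amounts (Hydro-Estimator here).
    """
    return [amount if mask > threshold else 0.0
            for mask, amount in zip(mask_source, amount_source)]


# Hypothetical 3-hourly totals (mm) at one validation site; illustrative only.
cmorph = [0.0, 1.2, 0.0, 3.0]
hydro_estimator = [0.4, 2.0, 0.0, 5.5]

mixture = apply_rain_mask(cmorph, hydro_estimator)
print(mixture)
```

In the first period the Hydro-Estimator reports rain but CMORPH does not, so the mixture is set to zero. This shows how the mask can only remove rain events from the Hydro-Estimator, which is consistent with the reduced probability of precipitation noted for the combined dataset.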
Our results indicate that differences between the HRPPs are minor and that they are somewhat interdependent. A consequence of this is that the combination of existing techniques is unlikely to lead to noticeable improvement. This does not imply that the application of different techniques would yield no gain in skill. The TMPA, for example, currently uses a simplistic merging technique and might benefit from a more complex technique, such as the weighted combination employed by NRL blended. However, because the datasets all have similarly high correlations against the validation data, it seems unlikely that a combination of techniques would lead to more than minor improvements in performance.
5. Performance of GFS model data
One PEHRPP hypothesis posits that a possible alternative or complement to using satellite precipitation estimates with high time and space resolution is to use atmospheric model forecasts of precipitation, which can be produced on fine time and space scales. Such forecasts are derived from atmospheric observations of temperature, pressure, winds, and moisture using the physical laws governing the behavior of the system, and thus might be considered quasi observations. Studies have shown that in some circumstances, model-derived precipitation may be more accurate than any routinely observed or estimated values (Serreze et al. 2005). Several of the validation efforts organized by the IPWG and included in PEHRPP regularly evaluate NWP forecasts in addition to satellite-derived estimates (Ebert et al. 2007). Additionally, Gottschalck et al. (2005) found that daily model-based estimates of precipitation performed well over the United States compared with satellite data and gauges during the cold season, although satellite estimates (including TMPA and PERSIANN) were slightly superior in the warm season on the daily time scale, and they speculated that the satellite estimates would be superior at subdaily time scales. Numerous practical issues must be confronted before any such use can be made, however. Among them is the question of whether the skill of model forecasts is comparable to that of satellite-derived estimates in situations such as those discussed in this paper.
Although a full evaluation of the performance of model data is beyond the scope of this study, we will examine the performance of 3-hourly precipitation forecasts from the GFS model from March 2004 to September 2006 in comparison to the performance of the HRPPs as an example of model performance.
Figure 13 shows correlations and percentage bias of the GFS precipitation with the SGP data along with similar summaries for TMPA, CMORPH, and stage IV during the same restricted period and at 1° resolution. The GFS precipitation has lower correlations than the TMPA, CMORPH, and stage IV in both warm and cold seasons. In the cold season (Fig. 13a), the GFS has a mean correlation of around 0.3 compared to ∼0.4 for CMORPH. The circles indicate the mean correlation obtained using daily data and show that the GFS outperforms the other datasets in the cold season at the daily resolution, which reflects the common model issues with correctly capturing the diurnal cycle. In the warm season (Fig. 13b), the GFS data have far lower correlations, indicating that the model poorly resolves the convection that dominates precipitation during those months. This is most likely due to the effective spatial scale imposed by the parameterizations in the GFS, as was noted by Janowiak et al. (2007). The percentage bias of the GFS precipitation is relatively low in both seasons, although it is more often positive than negative. A significant success is that it does not hugely overestimate warm-season convective precipitation (as CMORPH does), although this is likely because it fails to correctly forecast large events at all, as is suggested by its poor warm-season correlation with the SGP sites.
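The comparison between 3-hourly and daily correlations can be sketched in Python as follows. The sketch assumes eight 3-hourly values per day, complete records with no missing data, and the Pearson correlation coefficient; the function names are hypothetical and the code is not the analysis code used in this study.

```python
import math


def to_daily(values, steps_per_day=8):
    """Sum consecutive 3-hourly totals into daily totals (complete days only)."""
    n_days = len(values) // steps_per_day
    return [sum(values[d * steps_per_day:(d + 1) * steps_per_day])
            for d in range(n_days)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlations(model_3h, gauge_3h):
    """Return (3-hourly correlation, daily correlation) for one site."""
    return (pearson(model_3h, gauge_3h),
            pearson(to_daily(model_3h), to_daily(gauge_3h)))
```

A model that places rainfall on the correct day but at the wrong time of day would score poorly on the first value and well on the second, which is the pattern described for the GFS in the cold season.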
A common use of satellite precipitation estimates is in the validation of model output over the ocean, so a comparison of the performance of model data with satellite estimates has useful ramifications. The GFS model precipitation performs quite poorly over both the west and east Pacific (Figs. 14a and 14b), with median correlations around 0.15 and daily correlations around 0.3–0.4. Three-hourly correlations this low are indicative of extremely poor performance, even with the high levels of noise present. There is more skill in the daily precipitation, but it is still far inferior to both TMPA and CMORPH. In contrast to the TMPA and CMORPH, the GFS precipitation tends to overestimate the precipitation in both the west and east Pacific and has a generally large spread in values of percentage bias. These results suggest that the GFS data performs generally poorly over the tropical Pacific, although it performs surprisingly well over the SGP area—particularly in the cold season when large-scale precipitation dominates.
In this study, we have examined the performance of five independently developed high-resolution precipitation products against quality-controlled, subdaily gauge estimates from two geographically distinct areas of the globe for four years between 2003 and 2006. Generally speaking, CMORPH shows the highest correlations with the validation data with 3-hourly, 0.25° correlations averaging around 0.55. The other datasets are also well correlated with the validation data. Results between the sites in the United States were relatively consistent with each other, whereas greater disparity was shown among the ocean buoy gauges as a result of their much larger spatial spread. Over the U.S. land, the TMPA had the lowest biases as a result of the application of a monthly gauge correction, whereas the other HRPPs overestimated precipitation amount by as much as 125% in the warm season. However, the TMPA exhibits similar biases to CMORPH over the tropical Pacific. In this region, CMORPH, TMPA, and NRL blended underestimate the precipitation, whereas PERSIANN has a near-zero percentage bias on average. Without the gauge wind correction, PERSIANN and NRL blended overestimate the mean precipitation, whereas CMORPH and the TMPA both have a near-zero mean bias. Questions remain regarding the suitability of the wind correction (which is based on land measurements with a different gauge type) for the buoy gauge data, but it seems likely that some kind of correction is required, implying that CMORPH and TMPA probably underestimate the true mean of precipitation in the western tropical Pacific.
A natural question for users of these data would be which of the HRPPs should be used for practical applications. This is a challenging question because the available datasets have different strengths and weaknesses. Furthermore, the correlations reported in this study are only marginally different from each other, and this should be remembered when interpreting results. We have attempted to maximize the validation period so as to avoid spurious results and believe that our results are stable (in fact, several other data sources were excluded because of insufficient record length). This analysis shows that CMORPH and the TMPA appear to offer slightly greater accuracy than the other 3-hourly datasets when compared to this limited set of observations. The monthly bias correction used in the TMPA makes it a good candidate for use in studies over land, where the precipitation amount is of primary interest. CMORPH, however, yields slightly higher correlations at the 3-hourly resolution with comparable biases over the ocean, and it might be a better choice of dataset for use in studies in which the variations of precipitation are more important. There are also other attributes that must be considered, such as the large-scale patterns or the representation of precipitation during severe precipitation events. Efforts to evaluate the performance of these data under these situations are ongoing through the PEHRPP project and will provide further useful information for interested users.
A goal of the PEHRPP project is to understand differences between similar algorithms so as to improve the datasets, and we have a number of suggestions that arise from this study. First, our two validation sites show that HRPPs that use more PMW data are generally more accurate than those that rely more heavily on the frequently sampled IR data. It is becoming increasingly clear that the advantages gained because of the quality of PMW estimates outweigh the disadvantages of their sparser sampling. The methods that incorporate PMW data will most likely continue to thrive as the available data increase with the upcoming Global Precipitation Measurement (GPM) mission. Second, it is also clear from this work that the gauge adjustments of the TMPA are a significant advantage over land and that such an approach would be a boon for all of the HRPPs, particularly those used for hydrology. However, it should be remembered by users that the advantage of gauge adjustments exists only over land, not over the ocean.
The authors gratefully acknowledge the sources for the HRPP data used, including Bob Joyce, Bob Adler, Bob Kuligowski, Kuo-Lin Hsu, and Joe Turk as well as their development teams. The IPWG co-chairs Chris Kidd and Ralph Ferraro as well as Beth Ebert made important contributions to the PEHRPP analysis. This work is a project of the Cooperative Institute for Climate Studies at the University of Maryland and has received support from the Climate Change Data and Detection program of the NOAA Climate Program Office and the NASA Energy and Water Cycle Study. The continuing support of Jared Entin is particularly appreciated.
Corresponding author address: M. R. P. Sapiano, Earth System Science Interdisciplinary Center, University of Maryland, College Park, 5825 University Research Court, Suite 4001, College Park, MD 20740-3823. Email: email@example.com