This study evaluates the performance of a newly developed daily precipitation climate data record, called Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks–Climate Data Record (PERSIANN-CDR), in capturing the behavior of daily extreme precipitation events in China during the period of 1983–2006. Different extreme precipitation indices, in the three categories of percentile, absolute threshold, and maximum indices, are studied and compared with the same indices from the East Asia (EA) ground-based gridded daily precipitation dataset. The results show that PERSIANN-CDR depicts precipitation behavior similar to that of the ground-based EA product in terms of capturing the spatial and temporal patterns of daily precipitation extremes, particularly in the eastern China monsoon region, where the intensity and frequency of heavy rainfall events are very high. However, the agreement between the datasets in dry regions such as the Tibetan Plateau in the west and the Taklamakan Desert in the northwest is not strong. An important factor that may have influenced the results is that the ground-based stations from which EA gridded data were produced are very sparse in these regions. In the station-rich regions in eastern China, PERSIANN-CDR agrees well with the EA data, although it slightly underestimates the values of extreme heavy precipitation.
Precipitation is a key component of the hydrological cycle and a primary input for hydrometeorological and climate models (Sorooshian et al. 2011). Therefore, accurate estimation of the rainfall amount at sufficient temporal and spatial resolutions is a prerequisite for a wide range of applications from global climate modeling (Dai 2006) to local weather and flood forecasting (Demargne et al. 2014). Historically, precipitation datasets based on ground-based rain gauge observations have served as the main source of precipitation measurements for various hydrological, hydrometeorological, and climatological applications because of their relatively long record lengths (Yatagai et al. 2009). However, in many regions of the world, ground-based measurement networks (radar and/or rain gauges) are sparse and inadequate for capturing the spatial and temporal variability of precipitation systems; in some cases, these measurement networks are nonexistent. This lack of adequate precipitation data limits the ability to conduct hydroclimatological investigations and the use of physical and statistical hydrological models for water resources management.
From a statistical point of view, the longer the duration of the data time series, the more valid the results of the analysis of the characteristics of climate extremes will be (Klein Tank et al. 2009). Based on the World Meteorological Organization (WMO) report, at least 30 years of historical data are needed for the purpose of conducting climate studies (Burroughs 2003). Lack of a consistent record of more than 30 years has been the motivation for exploring the utility of satellite-based precipitation products that will provide full global coverage at relatively high temporal and spatial resolutions for studying extremes and long-term climate variability. Until recently, the Global Precipitation Climatology Project (GPCP) has provided long-standing, globally complete precipitation data by merging the highest-quality satellite and gauge estimates (Huffman et al. 1997). In particular, GPCP has been providing three precipitation datasets, including monthly (Adler et al. 2003; Huffman et al. 2009) and 5-day (Xie et al. 2003) datasets at 2.5° resolution covering the period from 1979 to present, and a daily dataset at 1° resolution covering the period from 1996 to present (Huffman et al. 2001).
However, the coarseness of the GPCP precipitation datasets, unsuitable for the study of extremes, has provided an opportunity to consider certain alternatives. Being an infrared (IR)-based model, the Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks (PERSIANN; Hsu et al. 1997; Sorooshian et al. 2000) model can estimate global precipitation at half-hourly temporal and 0.25° spatial resolutions. The National Climatic Data Center (NCDC) Climate Data Record (CDR) program of the National Oceanic and Atmospheric Administration (NOAA) has established a new retrospective satellite-based precipitation dataset, called PERSIANN-CDR, for long-term hydrological and climate studies (Ashouri et al. 2015). PERSIANN-CDR is a multisatellite, high-resolution precipitation product that provides daily precipitation estimates at 0.25° spatial resolution from 1 January 1983 to 31 March 2014. PERSIANN-CDR uses the archive of Gridded Satellite (GridSat-B1) IR data (Knapp 2008) as the input to the PERSIANN model. The retrieval algorithm uses IR satellite data from global geosynchronous satellites as the primary source of precipitation information. To meet the calibration requirement of PERSIANN, the model is pretrained using the National Centers for Environmental Prediction (NCEP) stage IV hourly precipitation data. Then, the parameters of the model are kept fixed and the model is run for the full historical record of GridSat-B1 IR data. To reduce the biases in the estimated precipitation, while preserving the temporal and spatial patterns in high resolution, the resulting estimates are then adjusted using the GPCP monthly 2.5° precipitation products (Ashouri et al. 2015).
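The final adjustment step described above can be illustrated with a minimal sketch. It assumes a simple multiplicative correction applied within one coarse cell for one month; the function name and data layout are hypothetical, and the operational procedure documented by Ashouri et al. (2015) may differ in detail.

```python
def adjust_to_monthly_total(daily_by_pixel, reference_monthly_mm):
    """Scale the daily estimates inside one coarse cell so that their
    cell-mean monthly total matches a reference monthly total (mm).

    daily_by_pixel: list of per-pixel lists of daily precipitation (mm/day)
                    for the fine pixels that fall inside the coarse cell.
    Assumption: a single multiplicative factor per cell and month.
    """
    pixel_totals = [sum(days) for days in daily_by_pixel]
    cell_mean_total = sum(pixel_totals) / len(pixel_totals)
    if cell_mean_total == 0:
        # No estimated rain in this cell: there is nothing to rescale.
        return [list(days) for days in daily_by_pixel]
    factor = reference_monthly_mm / cell_mean_total
    # Every daily value gets the same factor, so the fine-scale spatial and
    # temporal pattern is preserved while the monthly total is corrected.
    return [[d * factor for d in days] for days in daily_by_pixel]
```

Because one factor is shared by all days and pixels in the cell, the relative day-to-day and pixel-to-pixel variations are unchanged, which is the property the text describes as preserving the high-resolution patterns.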
PERSIANN-CDR provides the opportunity to study the behavior of extreme precipitation patterns on a global scale over the past three decades. However, an important step in determining the efficacy of a dataset for such applications is independent testing and comparison with available ground-based observations over a given time period. The primary objective of the present study is therefore to evaluate the efficacy of PERSIANN-CDR in capturing the behavior of extreme precipitation events over China.
The reason for focusing on China is that, as the most populated nation (Piao et al. 2010) and one of the fastest-growing economies in the world (Hubacek et al. 2007), its requirements for hydroclimatological information and availability of reliable, long-term, and relatively high-resolution precipitation information are of critical importance. Such information is required for a range of applications, including but not limited to statistical flood frequency analysis and water resources planning, design, and system operation. Almost every year, floods in China cause considerable economic losses and serious damage to both urban and rural areas (farms). In 2013, floods in China affected 120 million people and an agriculture production area 1.19 × 10⁵ km² in size, causing economic damage totaling more than $50.7 billion (http://politics.people.com.cn/n/2014/0109/c70731-24072825.html). Therefore, the ability to characterize and study extreme climate events in China, with its diverse conditions of geography and topography and its susceptibility to monsoons, depends on data that provide better coverage than sparse rain gauge networks. The availability of remotely sensed precipitation datasets, as an alternative to in situ observations, is evaluated in the present paper.
2. Data and analysis
We obtained the observations from the gauge-based analysis of daily precipitation over East Asia (EA; ftp://ftp.cpc.ncep.noaa.gov/precip/xie/EAG/EA_V0409/; Xie et al. 2007). The EA dataset contains observations from ~1400 ground-based stations across China. Figure 1 shows the distribution of rain gauge stations in the EA dataset and the elevation map of China. Rain gauge distribution in terms of areal coverage is highly uneven. There are approximately 1.8 gauges per 10 000 km² in the eastern monsoon region of China, but the number reduces to ~0.4 per 10 000 km² in the western and northwestern regions of China. Xie et al. (2007) developed the gridded EA dataset by interpolating point measurements into 0.5° × 0.5° grid boxes using the optimal interpolation (OI) method. The EA dataset provides daily rainfall data over China for the period from 1 January 1962 to 31 May 2007. This is used as the reference dataset in this study.
The PERSIANN-CDR dataset was provided by the NOAA NCDC and the Center for Hydrometeorology and Remote Sensing (CHRS) at the University of California, Irvine (ftp://data.ncdc.noaa.gov/cdr/persiann/files/). For comparison purposes, PERSIANN-CDR 0.25° precipitation estimates are aggregated into the same resolution as that of the observed data (0.5°). It is also noteworthy that both PERSIANN-CDR and EA daily rainfall datasets correspond to the 0000–2400 UTC time frame.
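The aggregation from 0.25° to 0.5° can be sketched as a block average over non-overlapping 2 × 2 groups of pixels. This is a minimal illustration that assumes the two grids are aligned and that an unweighted mean is adequate; the function name and the use of None for missing values are assumptions, not details given in the text.

```python
def aggregate_2x2(grid):
    """Average non-overlapping 2x2 blocks of a 0.25-degree grid into
    0.5-degree cells.

    grid: list of rows (lists of floats); None marks a missing value.
    Missing pixels are skipped; a block with no valid pixel yields None.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % 2 or n_cols % 2:
        raise ValueError("grid dimensions must be even")
    coarse = []
    for i in range(0, n_rows, 2):
        row = []
        for j in range(0, n_cols, 2):
            block = [grid[i + di][j + dj] for di in (0, 1) for dj in (0, 1)]
            valid = [v for v in block if v is not None]
            row.append(sum(valid) / len(valid) if valid else None)
        coarse.append(row)
    return coarse
```

An unweighted mean ignores the small latitude-dependent difference in area between the two rows of a block, which is a simplification made here for brevity.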
To evaluate the performance of PERSIANN-CDR in capturing the behavior of precipitation extremes over China, 11 extreme precipitation indices, in three general categories, are calculated and compared respectively with the same indices from the EA gauge-based dataset. Seven of these indices (SDII, R20mm, R10mm, Rx1d, Rx5d, CWD, and CDD, all described in Table 1) are defined by the Expert Team on Climate Change Detection and Indices (ETCCDI; Klein Tank et al. 2009). The work of the ETCCDI was sponsored jointly by the WMO Commission for Climatology (CCI), the Joint Technical Commission for Oceanography and Marine Meteorology (JCOMM), and the World Climate Research Programme (WCRP) on Climate and Ocean: Variability, Predictability and Change (CLIVAR). Moreover, we looked at the four other extreme precipitation indices (RR99p, RR95p, R20mmTOT, and R10mmTOT). The definitions of these 11 extreme precipitation indices are presented in Table 1. Because the EA datasets are only available up to May 2007, all of the indices are calculated using daily precipitation data from PERSIANN-CDR and EA dataset products for the 1983–2006 period.
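As an illustration of the percentile category, the sketch below computes a percentile of one pixel's daily precipitation series. The restriction to wet days, the 1 mm wet-day threshold, and the linear interpolation between ranks are all assumptions made for this example; Table 1 holds the definitions actually used.

```python
import math

WET_DAY_MM = 1.0  # assumed wet-day threshold (mm/day); not stated in the text


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation
    between neighboring ranks of the sorted sample."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def percentile_index(daily_mm, q, wet_days_only=True):
    """Hypothetical helper for RR95p / RR99p at one pixel.

    Returns None when no day qualifies (for example, a pixel with no wet days).
    """
    if wet_days_only:
        sample = [p for p in daily_mm if p >= WET_DAY_MM]
    else:
        sample = list(daily_mm)
    return percentile(sample, q) if sample else None
```

Calling percentile_index(series, 95) and percentile_index(series, 99) on each pixel's series would give the two maps compared in the percentile category, under the stated assumptions.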
In addition to our evaluation over China, we also conducted a more localized test of PERSIANN-CDR’s performance by zooming in on a “box” surrounding the Yellow River (YR) basin, where there is a higher density of rain gauges (see Fig. 1). The purpose of this test was to determine the sensitivity of the results to the number of gauges available and how well PERSIANN-CDR compares with the increase in gauge density, which is anticipated to capture the heterogeneity of precipitation more accurately.
Besides evaluating the spatial pattern of the mean value of the 11 extreme indices, we also calculated the Pearson correlation coefficient R in each pixel between the PERSIANN-CDR and EA datasets during the 1983–2006 period, as well as the root-mean-square error (RMSE) and R for the 24-yr average of the 11 extreme indices:
where O and P are the extreme indices from EA and PERSIANN-CDR datasets in yth year or pth pixel, respectively; n is the length of the annual extreme indices (equal to 24 in this research); and m is the pixel number over China.
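For reference, the standard forms of these two statistics, written with the symbols defined above, are sketched below. The subscripts and overbars are notational assumptions, since the authors' exact formulation is not shown in the text.

```latex
R = \frac{\sum_{y=1}^{n} (O_{y} - \bar{O})(P_{y} - \bar{P})}
         {\sqrt{\sum_{y=1}^{n} (O_{y} - \bar{O})^{2}}\,
          \sqrt{\sum_{y=1}^{n} (P_{y} - \bar{P})^{2}}},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{p=1}^{m} (P_{p} - O_{p})^{2}}
```

In the per-pixel correlation, O_y and P_y are the annual index values at one pixel and the overbars denote their means over the n = 24 years. The spatial R for the 24-yr averages has the same form, with the sums taken over the pixels p = 1, ..., m.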
a. Percentile indices
Figure 2 shows the performance of PERSIANN-CDR in capturing the 99th (RR99p) and 95th (RR95p) percentile indices of the daily precipitation at each 0.5° × 0.5° grid box over China. High values of annual RR99p and RR95p appear in southeastern China. In general, PERSIANN-CDR captures a spatial distribution of RR99p and RR95p similar to that of the EA dataset, with values increasing from north to south and west to east. However, the disagreement between PERSIANN-CDR and the EA dataset is relatively obvious for percentile indices in western and northwestern China. As far as the Pearson correlation coefficients between PERSIANN-CDR and the EA dataset are concerned, the significance of correlation coefficient values was tested at the 0.05 level using the two-tailed Student's t test. As shown in Fig. 2, high positive and statistically significant correlations are found in most regions for both indices, especially over the southeastern regions, where extreme precipitation events occur frequently. Over all of China, the significant correlations at the 95% confidence level for RR99p and RR95p cover 28% and 45% of the country, respectively. However, the percentages increase to 32% and 64%, respectively, in the monsoon area. The scatterplot shows that PERSIANN-CDR has good agreement with gridded-gauge data. PERSIANN-CDR tends to underestimate the low-value percentile indices for the dry and arid regions in western and northwestern China.
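The per-pixel significance test can be sketched as follows. The sketch assumes the usual conversion of a Pearson coefficient to a t statistic with n − 2 degrees of freedom; the critical value 2.074 is the tabulated two-tailed 0.05 value for 22 degrees of freedom (n = 24 years), and the function names are illustrative only.

```python
import math

# Two-tailed 0.05 critical value of Student's t for 22 degrees of freedom,
# taken from standard tables; valid only for n = 24 paired annual values.
T_CRIT_DF22 = 2.074


def pearson_r(obs, est):
    """Pearson correlation of two equal-length series; None if undefined."""
    n = len(obs)
    if n != len(est) or n < 3:
        raise ValueError("series must have equal length of at least 3")
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    s_oe = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    s_oo = sum((o - mean_o) ** 2 for o in obs)
    s_ee = sum((e - mean_e) ** 2 for e in est)
    if s_oo == 0 or s_ee == 0:
        return None  # a constant series has no defined correlation
    return s_oe / math.sqrt(s_oo * s_ee)


def is_significant(r, n=24, t_crit=T_CRIT_DF22):
    """True if r differs from zero at the level implied by t_crit.

    t_crit must correspond to n - 2 degrees of freedom.
    """
    if r is None:
        return False
    if abs(r) >= 1.0:
        return True  # perfect correlation; the t statistic is unbounded
    t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    return abs(t_stat) > t_crit
```

With n = 24, this criterion corresponds to an absolute correlation of roughly 0.40 or larger, so the percentages quoted above can be read as the share of pixels whose annual index series exceed that value.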
b. Absolute threshold indices
The five absolute threshold indices for PERSIANN-CDR and the EA dataset are calculated and plotted in Fig. 3. As noted in the previous section, PERSIANN-CDR exhibits good agreement with the EA gridded-gauge data in depicting the spatial distributions of the absolute threshold indices. With respect to the correlation coefficients between PERSIANN-CDR and the EA dataset, high correlations are observed in the southern and eastern regions of the country, where rain gauge networks are much denser and where most of the heavy rainfall events occur. Over all of China, significant correlations at the 95% confidence level for SDII, R20mm, R10mm, R20mmTOT, and R10mmTOT cover 63%, 60%, 70%, 62%, and 70% of total area, respectively. However, the percentages increase to 85%, 76%, 95%, 79%, and 95%, respectively, in the monsoon area. The daily precipitation in parts of northwestern China is below the threshold (20 and 10 mm), resulting in blank spaces in the map of annual correlation analysis. Compared to percentile indices, PERSIANN-CDR shows closer agreement with the EA dataset in the eastern China monsoon region for the absolute threshold indices. The correlation of annual absolute threshold indices between PERSIANN-CDR and the EA dataset is significant in most of the eastern China monsoon regions.
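A minimal sketch of the five absolute threshold indices for one pixel and one year is given below. It assumes the usual ETCCDI conventions (a wet day has at least 1 mm; SDII is the mean precipitation on wet days; R10mm and R20mm are day counts) and further assumes that R10mmTOT and R20mmTOT are the precipitation totals on days at or above each threshold. Table 1 holds the authoritative definitions.

```python
WET_DAY_MM = 1.0  # assumed wet-day threshold (mm/day)


def sdii(daily_mm):
    """Simple daily intensity index: mean precipitation on wet days.

    Returns None if the series has no wet day.
    """
    wet = [p for p in daily_mm if p >= WET_DAY_MM]
    return sum(wet) / len(wet) if wet else None


def days_at_or_above(daily_mm, threshold_mm):
    """Number of days with precipitation >= threshold_mm (R10mm, R20mm)."""
    return sum(1 for p in daily_mm if p >= threshold_mm)


def total_at_or_above(daily_mm, threshold_mm):
    """Total precipitation on days >= threshold_mm (assumed R10mmTOT, R20mmTOT)."""
    return sum(p for p in daily_mm if p >= threshold_mm)


def absolute_threshold_indices(daily_mm):
    """Collect the five indices for one pixel-year in a dictionary."""
    return {
        "SDII": sdii(daily_mm),
        "R10mm": days_at_or_above(daily_mm, 10.0),
        "R20mm": days_at_or_above(daily_mm, 20.0),
        "R10mmTOT": total_at_or_above(daily_mm, 10.0),
        "R20mmTOT": total_at_or_above(daily_mm, 20.0),
    }
```

In a pixel where daily precipitation rarely or never reaches 10 or 20 mm, the count and total indices stay at or near zero in most years. Such a series has little or no variance, which is consistent with the blank areas in the correlation maps described above.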
c. Maximum indices
As for the third category of the extreme precipitation indices, we look at the maximum indices of precipitation, both with respect to the value of the daily rainfall and to the duration of the rain/no-rain period. Figure 4 shows the Rx1d, Rx5d, CWD, and CDD statistics derived from the PERSIANN-CDR and EA datasets. As shown, the agreement between PERSIANN-CDR and the EA dataset in replicating the behavior of the maximum 5-day precipitation (Rx5d) index is better than that of the maximum daily precipitation (Rx1d). As for the CWD and CDD indices, in general, PERSIANN-CDR depicts patterns similar to those of the EA dataset, with a better performance in replicating the CDD index over the dry and arid regions in western China. The scatterplots also illustrate that the PERSIANN-CDR results agree well with those obtained from the EA dataset.
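The four maximum indices can be sketched for a single pixel-year as follows. The sketch assumes the ETCCDI conventions (Rx5d is the largest total over five consecutive days; CWD and CDD are the longest runs of days at or above, and below, 1 mm) and, for simplicity, does not let spells continue across the ends of the series. Function names are illustrative.

```python
WET_DAY_MM = 1.0  # assumed wet-day threshold (mm/day)


def rx1d(daily_mm):
    """Maximum 1-day precipitation."""
    return max(daily_mm)


def rx5d(daily_mm):
    """Maximum precipitation total over any 5 consecutive days."""
    if len(daily_mm) < 5:
        raise ValueError("need at least 5 days")
    return max(sum(daily_mm[i:i + 5]) for i in range(len(daily_mm) - 4))


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def cwd(daily_mm):
    """Consecutive wet days: longest spell with precipitation >= 1 mm."""
    return longest_run(p >= WET_DAY_MM for p in daily_mm)


def cdd(daily_mm):
    """Consecutive dry days: longest spell with precipitation < 1 mm."""
    return longest_run(p < WET_DAY_MM for p in daily_mm)
```

Rx5d sums over five days, so an estimate that places a heavy-rain day one day early or late still contributes to the same window. This may help explain why Rx5d agrees better than Rx1d, although the text does not state the reason.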
As already discussed above, PERSIANN-CDR is bias corrected using the GPCP monthly 2.5° product. In brief, GPCP monthly data use the global monthly gauge information provided by the Global Precipitation Climatology Centre (GPCC; Rudolf 1993; Rudolf et al. 1994; Schneider et al. 2008), as well as the merged satellite precipitation estimates, to produce its product. We note that the EA dataset used in our study is based on daily precipitation information from ~1400 gauging stations across China, 700 of which are from meteorological stations and the rest from hydrological observation networks that are not included in the GPCC dataset. Furthermore, GPCC is a 2.5° monthly product while EA is a 0.5° daily observation dataset. Therefore, the combination of the above factors minimizes the likelihood of dependency between PERSIANN-CDR and the EA dataset.
To adequately discuss the results, China is separated into two distinct regions, namely, western and northwestern China and the “monsoon” region (see Fig. 1; Liu et al. 2014). The monsoon region receives more precipitation as compared to the arid and the semiarid regions in western and northwestern China. The density of rain gauges is greater in the monsoon region (approximately two gauges per 10 000 km²) than in the western and northwestern regions (approximately one gauge per 25 000 km²). Within the monsoon region, a separate study was also conducted, focusing on the Yellow River basin (see Fig. 1), where there is a higher concentration of rain gauges (approximately eight gauges per 10 000 km²).
a. Evaluation results for western and northwestern China
The agreement between PERSIANN-CDR and the EA dataset in western and northwestern China is relatively weak as measured by the 11 extreme precipitation indices. The corresponding correlation coefficients are close to zero and, in some pixels, they are even negative (Figs. 2–4). There are three possible explanations for the lack of stronger agreement between the two datasets: the satellite-based estimates are inaccurate, the ground-based observations are inaccurate, or both are. It could be argued that satellite estimates of precipitation are not replacements for high-quality, ground-based observations obtained from a relatively dense rain gauge network that captures the spatial pattern of the rain events. However, western and northwestern China contain few ground-based stations, leading to uncertainty and potential errors in representing the spatial heterogeneity of rainfall for this vast region. Figure 5 shows the probability density function (PDF) of the relative error for the 11 precipitation indices, for different values of rain gauge density (the number of stations within a pixel). Based on the PDFs for all 11 extreme precipitation indices, the large relative errors are more likely to come from regions with sparse rain gauge stations (e.g., western and northwestern China).
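The kind of analysis summarized in Fig. 5 can be sketched as below. The sketch assumes that the relative error of an index at a pixel is (P − O)/O, with O from the EA dataset and P from PERSIANN-CDR, and that pixels are selected by a minimum number of gauges; neither the formula nor the binning is given in the text, and the function names are hypothetical.

```python
def relative_errors(obs, est, gauges, min_gauges):
    """Relative error (est - obs) / obs for pixels that contain at least
    min_gauges stations. Pixels with obs <= 0 are skipped because the
    ratio is undefined there."""
    return [
        (e - o) / o
        for o, e, g in zip(obs, est, gauges)
        if g >= min_gauges and o > 0
    ]


def empirical_pdf(values, edges):
    """Histogram of values over the bins defined by edges, normalized so
    that the bar areas sum to 1. Values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [
        c / (total * (edges[k + 1] - edges[k]))
        for k, c in enumerate(counts)
    ]
```

Repeating the calculation for increasing values of min_gauges and comparing the resulting curves would show whether the error distribution narrows as gauge density increases.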
Moreover, the effects of topography on precipitation during the interpolation process from point measurements to gridded networks are not fully taken into account. The highest and most expansive highland in the world, the Tibetan Plateau, which is located in western China, has an average elevation of over 4000 m above mean sea level. However, the interpolation method, the OI scheme, used in preparation of the EA dataset attempts to minimize the total error of all the observations by placing different “optimal” weights on individual observations according to statistical information (Gandin 1965). As is often the case, rain gauge locations tend to lie at lower elevations relative to the surrounding terrain (Tong et al. 2014), and therefore, simple interpolation from sparse rain gauge stations may not capture the influence of orographic lifting on precipitation, especially in topographically complex regions like the Tibetan Plateau. Adam et al. (2006) suggest that the correction for orographic effects leads to a 20.2% increase in net precipitation in orographically influenced regions. Similarly, one cannot rule out the influence of the complex topography on the quality of satellite rainfall estimates.
b. Evaluation results for the eastern China monsoon region
The agreement between PERSIANN-CDR and the EA dataset in China’s monsoon region, as measured by the 11 extreme precipitation indices, demonstrates that PERSIANN-CDR is capable of representing extreme precipitation events in eastern China, which is highly susceptible to monsoon-induced flooding. As can be observed from Fig. 5, the PDF curves of relative errors for all 11 extreme precipitation indices become more peaked, tighter, and symmetric in regions with higher densities of rain gauges (e.g., the eastern China monsoon region), suggesting smaller relative errors between PERSIANN-CDR and the EA dataset in data-rich regions. Among the 11 extreme precipitation indices, it is found that PERSIANN-CDR tends to slightly underestimate the maximum daily precipitation (Rx1d), maximum 5 days of consecutive precipitation (Rx5d), the heavy precipitation (precipitation ≥10 mm), and the extreme heavy precipitation (precipitation ≥20 mm) in the eastern China monsoon region.
c. Evaluation results for the Yellow River region
Examination of this smaller region, which has the highest density of rain gauges, shows that the relative error for the 11 extreme precipitation indices is relatively small compared to that for western and northwestern China and the eastern China monsoon region. As can be seen from Fig. 6, the tails of the relative error PDFs shorten with increasing rain gauge density. This confirms that a larger rain gauge density is likely to lead to smaller relative errors. By comparing all the PDFs, it can be seen that the larger relative errors are associated with regions with sparse rain gauge density. It is also noteworthy that the performance of PERSIANN-CDR in the Yellow River basin remains stable and does not differ significantly when the threshold (i.e., the minimum number of rain gauges in a given pixel) increases from four to eight. This suggests a consistent performance for PERSIANN-CDR, which is a promising result, especially for regions that do not have dense rain gauge networks.
Availability of the newly released daily, 0.25° satellite-based PERSIANN-CDR covering the period from 1983 to present provides the opportunity to study the behavior of extreme precipitation patterns over three decades at a finer resolution than previously possible. In this study, we evaluated the PERSIANN-CDR performance in capturing the behavior of extreme precipitation events over China when compared to ground-based observations. The period selected for the study was 1983–2006, and the ground-based gridded EA daily precipitation dataset was used as the reference observation set. Based on the results from the 11 extreme precipitation indices, PERSIANN-CDR has the capability to represent the extreme precipitation events in the eastern China monsoon region. Among the three categories of extreme indices (percentile indices, absolute threshold indices, and maximum indices), PERSIANN-CDR shows the best performance for the five absolute threshold indices relative to the other two categories. In the case of the western and northwestern regions of China with very sparse rain gauge density, the agreement between PERSIANN-CDR and the EA dataset was weaker. This result does not suggest that PERSIANN-CDR is less accurate than the EA dataset over this region. On the contrary, it is a fair argument to suggest that the most likely reason for the lack of better agreement between the datasets is that the rain gauge density fails to capture the heterogeneity of precipitation and that the OI scheme does not capture the influence of the complex topography of western China. In the case of the Yellow River basin, the larger relative errors are associated with regions with sparse rain gauge density. The comparison shows that the agreement between the PERSIANN-CDR dataset and the EA dataset does not change significantly when the rain gauge number increases from four to higher numbers in each pixel.
This observation provides additional confidence in the robustness of the PERSIANN-CDR dataset for climate studies and other applications. Based on the good agreement between PERSIANN-CDR and the EA dataset in capturing the behavior of extreme precipitation events, specifically in the eastern China monsoon region, we conclude that the PERSIANN-CDR dataset can serve as a valuable observation dataset for various applications, such as statistical hydrology and other hydroclimate-related investigations, including studies of changing patterns of extremes and nonstationarity behaviors at fine regional scales.
The authors appreciate the partial financial support provided by the National Oceanic and Atmospheric Administration’s (NOAA) Cooperative Institute for Climate and Satellites (CICS) and NOAA’s National Climatic Data Center (NCDC) Climate Data Record program (Prime Award NA09NES440006 and NCSU CICS Subaward 2009-1380-01), NOAA’s Climate Change Data and Detection (CCDD; NA10DAR4310122), and the National Aeronautics and Space Administration’s (NASA) Earth and Space Science Fellowship (NESSF) award (NNX12AO11H). We are also grateful to NOAA’s Climate Prediction Center (CPC) and NOAA’s Earth System Research Laboratory (ESRL) for providing the observed climate data over the East Asia (EA) and Global Precipitation Climatology Project (GPCP) datasets, respectively, as well as the NOAA NCDC and the Center for Hydrometeorology and Remote Sensing (CHRS) at the University of California, Irvine, for providing the PERSIANN-CDR data.