1. Introduction
Gridded precipitation data are important for a variety of studies. These data can be used not only to force hydrological models but also to evaluate or correct regional climate model (RCM) outputs (Daly et al. 2017; Herrera et al. 2019; Huffman et al. 2001). Beguería et al. (2016) found that the recognized good quality of gridded datasets for mean values cannot be extended to the variance. Thus, it is preferable to assess the quality of gridded products with statistical distributions because they capture the mean, variance and quantiles together. In fact, the statistical distribution properties of gridded precipitation have been used in many applications. For example, the daily mean value has been used to evaluate numerical model outputs (Evans et al. 2012), variance-related indices have been used to monitor extreme events (Wu et al. 2017) and quantiles have been used to correct numerical model outputs (Themeßl et al. 2012; Lafon et al. 2013). An accurate statistical distribution is essential to understand hydrological and meteorological processes from the watershed scale to the global scale.
However, the statistical distribution of gridded precipitation can be affected by many factors, including the station data and the interpolation method. Precipitation measurements are commonly affected by measurement undercatch, which refers to the measurement loss induced by wind, trace losses and evaporative loss (Ma et al. 2015; Pan et al. 2016; Yang et al. 1993; Yang and Ohata 2001). Due to the large proportion of solid precipitation in alpine and cold regions such as the Tibetan Plateau (TP), the measurements are more susceptible to wind than those in low-elevation areas (Goodison et al. 1998; Yang et al. 1993). As pointed out by previous studies, the underestimation can reach 50%, especially on snow days with strong winds (Goodison et al. 1998). The mean value of the final gridded precipitation will be underestimated when the measurement undercatch is neglected.
The interpolation method can also have a substantial impact on the statistical distribution of the final gridded data. Director and Bornn (2015) pointed out that interpolation often provides a rational mean but a problematic distribution. Hofstra et al. (2010) studied the European land-only daily high-resolution gridded datasets (E-OBS) and found that when a limited number of stations were used for interpolation, the variance was small, and an oversmoothed pattern appeared. Moreover, the relationship between precipitation and altitude and how this relationship is considered can have important implications (Hofstra et al. 2008; Roe 2005; Wagner et al. 2012).
Most of the existing gridded precipitation datasets ignore the distributional uncertainties caused by the measurement undercatch and the interpolation method, which may compromise studies that rely on them. Prein and Gobiet (2017) compared three types of daily European precipitation datasets and seven global datasets and found that the differences between those datasets were of the same magnitude as the errors in the RCMs. Beguería et al. (2016) found that the variance in gridded datasets crucially depends on the spatial density of the station network and pointed out that the bias in gridded datasets may cause incorrect conclusions in climate change assessments. Kotlarski et al. (2019) found that the uncertainties in the reference datasets could influence the evaluation of RCMs even at a large scale, with precipitation being more sensitive than temperature. In addition, the uncertainties of the gridded quantiles have been found to propagate to the correction results in quantile-based statistical downscaling methods (Gudmundsson et al. 2012; Laflamme et al. 2016; Maraun 2013; Piani et al. 2010; Thrasher et al. 2012).
In this study, by exploring the statistical distribution error caused by measurement undercatch and the interpolation method, we developed a new method to acquire a gridded precipitation product with an accurate statistical distribution in the TP.
This paper is organized as follows: In section 2, we introduce how we identify the best interpolation method and compensate for the measurement undercatch. Then in section 3, we describe the datasets we used. In section 4, we analyze the changes in the gridded precipitation after the measurement undercatch correction and the optimization of the interpolation algorithm. We discuss the improvement and validity of our proposed method and the impact on the statistical distribution caused by the station network density and altitude in section 5. The conclusions are provided in the final section.
2. Methodology
The purpose of this study is to combine the optimized interpolation algorithm with gauge undercatch correction to obtain a suitable gridded precipitation product for the TP. The first step is to study how the interpolation algorithms affect the gridded precipitation to determine the optimum interpolation algorithm for the construction of the gridded precipitation product for the TP. Then, based on the optimum interpolation method, the impacts on the statistical distribution caused by the wind-induced undercatch of the station data are explored. In the final step, gridded precipitation constructed by the optimum interpolation method with undercatch correction is compared with two existing datasets to demonstrate the advantages of the proposed method.
a. Finding the optimum interpolation method
We design six interpolation methods to grid the station precipitation and perform an independent validation to estimate the statistical distribution uncertainties of the different interpolation methods. The six methods can be classified as single-step methods, including the direct inverse distance weighted (IDW) method and ordinary kriging (OK), and multistep methods such as TPSD3D_OK. The TPSD3D_OK method uses thin-plate splines (TPS) and the ordinary kriging algorithm; in addition, the “D3D” in “TPSD3D_OK” indicates that altitude is considered as a covariate to estimate the daily precipitation value. Table 1 describes the abbreviations and detailed steps of each method. These interpolation methods are commonly used for existing gridded precipitation datasets. Two interpolation methods make up a comparison group, which is used to explore a specific problem. For example, a group in which one method considers altitude as a covariate while the other does not is used to determine whether the consideration of orography can reduce the statistical distribution error.
Table 1. Descriptions of the six interpolation methods.
1) Interpolation method settings
TPS3D_OK and TPS2D_OK are set to study the effects of altitude on precipitation. TPS3D_OK considers altitude as a covariate to interpolate monthly precipitation totals, while TPS2D_OK does not. The relationship between altitude and precipitation is relatively more significant than that between precipitation and other orographic effects, such as slope and aspect (Daly et al. 2008). Therefore, we do not consider any orographic effects other than altitude.
TPS3D_OK and TPSD3D_OK are set to compare the two mainstream interpolation methods in daily precipitation gridding; the first interpolation method considers altitude as a covariate to estimate the monthly precipitation totals (Haylock et al. 2008), and the latter considers altitude to estimate the daily climatology, which is defined as the 365-calendar-day time series of the mean daily precipitation averaged over the entire study period (Isotta et al. 2014). For example, the background field of 1 January is the mean value of the precipitation records on 1 January in all study years.
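As a minimal sketch of the daily climatology defined above, the following Python snippet averages a daily series by calendar day. The function name `daily_climatology`, the input layout (a dict mapping dates to precipitation) and the choice to skip 29 February so that 365 calendar days remain are illustrative assumptions, not details taken from the gridding procedure itself.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_climatology(series):
    """Mean precipitation for each (month, day) over all years in `series`.

    `series` maps datetime.date -> daily precipitation (mm).
    29 February is skipped (an assumption) so that 365 calendar days remain.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in series.items():
        if (day.month, day.day) == (2, 29):
            continue
        key = (day.month, day.day)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Synthetic three-year record: 1 January is wet in two of the three years.
series = {}
day = date(1980, 1, 1)
while day <= date(1982, 12, 31):
    series[day] = 0.0
    day += timedelta(days=1)
series[date(1980, 1, 1)] = 3.0
series[date(1982, 1, 1)] = 6.0

clim = daily_climatology(series)
print(len(clim), clim[(1, 1)])
```

In this synthetic record, the background value for 1 January is the average of the three 1 January values, which mirrors the example given in the text.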
TPS3D_IDW and TPS3D_OK are set to compare the methods for the interpolation of anomalies, which are defined as the ratio between the daily value and the monthly estimate. Both methods use the same interpolation algorithm to grid the monthly precipitation, but different algorithms are used for anomalies; TPS3D_IDW utilizes the inverse distance weighted algorithm, which is representative of deterministic methods, and TPS3D_OK utilizes ordinary kriging, which is representative of geostatistical methods (Ly et al. 2011).
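The anomaly interpolation step can be illustrated with a small inverse-distance-weighted estimator. This is only a sketch: the function name `idw`, the power of 2, the planar coordinates in kilometres and the anomaly values are all hypothetical choices made for illustration.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    `stations` is a list of (x, y, value) tuples; `power` is an assumed
    distance exponent.
    """
    num = 0.0
    den = 0.0
    for x, y, value in stations:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # target coincides with a station
        weight = dist ** -power
        num += weight * value
        den += weight
    return num / den


# Hypothetical daily anomalies (daily value / monthly estimate) at three
# stations: two dry stations and one wet station.
anomalies = [(0.0, 0.0, 0.00), (10.0, 0.0, 0.12), (0.0, 10.0, 0.00)]
estimate = idw(anomalies, target=(5.0, 5.0))
print(estimate)
```

Because the estimate is a weighted mean of the station values, a grid point equidistant from two dry stations and one wet station receives a small nonzero anomaly that is lower than the wet-station value.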
Finally, IDW and OK are set as simple references to be compared with the above sophisticated methods.
2) Interpolation algorithm
We utilize ANUSPLIN to perform the TPS smoothing algorithm. ANUSPLIN is a software package based on the theory of thin-plate smoothing splines, which can be thought of as an extension of linear regression in which the parametric model is replaced by a smooth nonparametric function (Hutchinson and Gessler 1994). The first steps in the TPS3D_OK, TPS3D_IDW, TPS2D_OK, and TPSD3D_OK methods adopt the TPS algorithm.
A single isotropic variogram model is used in the OK algorithm for all days. In the OK algorithm, the variogram model describes the spatial variation of the target variable. Haylock et al. (2008) pointed out that a single semivariogram model for all days is better than a separate model for each day because more data are used to estimate the variogram parameters. In addition to OK, the second steps of TPS3D_OK, TPS2D_OK, and TPSD3D_OK use this algorithm.
3) Statistical distribution validation
We use the Wasserstein distance (WD), also named the Earth mover’s distance, to measure the distribution similarity between the gridded data and station data. Although the mean, variance and quantile can reflect the statistical distribution, they are only partial reflections and cannot be considered as integrated scores.
In the calculation of the WD, not all of the values are involved. Generally, a threshold is needed to determine the occurrence of rainfall. On account of drizzle effects (Gutowski et al. 2003), this threshold usually does not equal 0. As the minimum value is 0.1 mm in the station data, we use 0.1 mm as the threshold for a wet day, which indicates that rainfall does occur. The comparisons for the frequency distribution cover the period 1 January 1980–31 December 2009, for a total of 10 958 days. This sample size is large enough to maintain the stability of the statistical characteristics of the different datasets from different interpolation methods.
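The validation score can be sketched in a few lines of Python. The snippet below assumes the first-order, one-dimensional Wasserstein distance between empirical samples, computed as the area between the two empirical cumulative distribution functions, and it assumes that the 0.1-mm wet-day threshold is inclusive. The function names and the sample values are hypothetical.

```python
from bisect import bisect_right

WET_DAY_THRESHOLD_MM = 0.1  # wet-day threshold stated in the text


def wet_days(values, threshold=WET_DAY_THRESHOLD_MM):
    """Keep only values at or above the wet-day threshold (assumed inclusive)."""
    return [v for v in values if v >= threshold]


def wasserstein_1d(u, v):
    """First-order Wasserstein distance between two empirical 1-D samples.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F is the
    empirical cumulative distribution function of each sample.
    """
    u, v = sorted(u), sorted(v)
    grid = sorted(u + v)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Hypothetical daily values (mm) at a station and at the collocated grid box.
station = [0.0, 0.0, 0.4, 2.0, 11.0, 0.0, 5.5]
gridded = [0.05, 0.02, 0.9, 2.5, 7.0, 0.3, 4.0]
wd = wasserstein_1d(wet_days(station), wet_days(gridded))
print(round(wd, 3))
```

A smaller value means that the wet-day distribution of the gridded series is closer to that of the station series; identical samples give a distance of zero.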
The spatial resolution of the grid is initially set to 1 km. No gridded value at the 1-km scale can be considered to be bias free in the statistical distribution, so we hypothesize that the statistical distribution is homogeneous within a 1-km grid box because the decorrelation length of the daily precipitation is usually greater than 10 km (Auer et al. 2005). Thus, if one gridded dataset is more biased than the other one at the 1-km scale, it will still be more biased when downscaled to the point scale. Based on this hypothesis, the station data at the point scale can be used to examine the statistical distribution for different interpolation schemes.
b. Undercatch correction for the measured precipitation
The catch ratio K is related to the threshold used to distinguish solid and liquid precipitation. Commonly used temperature thresholds for distinguishing solid and liquid precipitation are −2° and 2°C (Yang and Ohata 2001; Ye et al. 2007; Zhang et al. 2004). Ding et al. (2014) noted that higher temperature thresholds appear in high-altitude regions. Kang et al. (1997) obtained thresholds of 2.8° and 5.5°C through field observations at the Daxigou station on the northern slope of the Tianshan Mountains. Ma et al. (2015) suggested that the thresholds of 2.8° and 5.5°C are more suitable for the TP than the previous thresholds because of the similar geographic conditions of the two areas. Here, we adopt their view and use the thresholds of 2.8° and 5.5°C to distinguish between solid and liquid precipitation. Parameters ΔPw, ΔPe, and ΔPt are all constants and are determined only by the type of rain gauge. Here, we also adopt the correction parameters gathered by Ma et al. (2015), which describe the six classes of rain gauges in the countries surrounding the TP.
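The role of the two temperature thresholds can be sketched as follows. The snippet assumes one common reading of such a threshold pair: snow below the lower threshold, rain above the upper threshold and mixed precipitation in between. The function name and the handling of temperatures exactly at a threshold are hypothetical, and the catch ratio itself is not computed here.

```python
SNOW_MAX_C = 2.8  # lower threshold quoted in the text (degC)
RAIN_MIN_C = 5.5  # upper threshold quoted in the text (degC)


def precipitation_phase(air_temp_c):
    """Classify the precipitation phase from the 2-m air temperature.

    Assumed reading of the two thresholds: snow below 2.8 degC, rain above
    5.5 degC, mixed precipitation in between.
    """
    if air_temp_c < SNOW_MAX_C:
        return "snow"
    if air_temp_c > RAIN_MIN_C:
        return "rain"
    return "mixed"


for temp in (-3.0, 4.0, 9.0):
    print(temp, precipitation_phase(temp))
```

With the lower pair of thresholds (−2° and 2°C), a day at 1°C would be treated as mixed precipitation, whereas with the thresholds adopted here it is treated as snow.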
The undercatch correction algorithm requires the wind speed at 10-m height and the air temperature at 2-m height as inputs, in addition to the precipitation. Because more than 98% of the measurements contain complete wind speed and temperature records, the undercatch correction can be conducted without problems. In cases for which the air temperature or wind speed records are missing, which account for only a small number of records, the undercatch correction was not applied. The data are mainly missing for India along the southern Himalayan foothills, where the proportion of snowfall is very small. Therefore, ignoring the undercatch effect for these records would not have a significant impact on the overall results.
c. Evaluation of the new gridded precipitation
The station data with undercatch correction were gridded by the best interpolation method and compared with the existing datasets. We named our dataset Gridded Precipitation for Quantile Mapping (GPQM). Because of the different spatial resolutions of the GPQM and the other two datasets, we use mean value aggregation to unify the spatial resolution to 0.25°. Then, comparisons between the three are conducted. We mainly explore the differences in the mean, 98th quantile and variance among the three datasets because there is no gridded precipitation that can represent the true value at 0.25° for a direct comparison of the WD.
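Mean value aggregation can be sketched as a block average over non-overlapping groups of fine grid cells. The snippet below is an illustration under simplifying assumptions: the grid is a plain list of rows, all cells are given equal weight (no latitude-dependent cell area) and there are no missing values. The function name and the sample grid are hypothetical.

```python
def aggregate_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid.

    `grid` is a list of equally long rows; equal cell weights are assumed.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid shape must be divisible by factor")
    coarse = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            block = [
                grid[a][b]
                for a in range(i, i + factor)
                for b in range(j, j + factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Hypothetical 4 x 4 field of daily precipitation (mm) on a fine grid.
fine = [
    [1.0, 3.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, 4.0],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 2.0, 8.0, 8.0],
]
coarse = aggregate_mean(fine, 2)
print(coarse)
```

Block averaging preserves the areal mean of the field but removes variability within each coarse cell, which is why all datasets are brought to a common resolution before their variances are compared.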
3. Study area and datasets
a. Study area
The TP (Fig. 1) has an average elevation of more than 4000 m and an area of more than 2.3 million km² (Liu and Chen 2000). The world’s highest mountains, such as the Himalayas, Pamir, and Karakorum, are all located in the TP. Because of its extent and height, the TP can influence the general global circulation through mechanical and thermal forcing (An et al. 2001). The distribution of precipitation in the TP is very complicated. The annual precipitation ranges from tens to thousands of millimeters. Precipitation in mountainous areas is affected not only by large-scale and mesoscale circulation but also by orography (Higuchi et al. 1982; Maeno et al. 2009; Wulf et al. 2010).
b. Station data
In this study, daily station observations were acquired from the China Meteorological Administration (CMA) and the Global Surface Summary of the Day (GSOD). In total, 164 stations were used in this study. A total of 154 stations were used for interpolation, while the remaining 10 were used for validation (Fig. 1, Table 2). The period of the collected datasets ranges from 1 January 1980 to 31 December 2009. Seven countries with different classes of rain gauges were included in the study area. The CMA dataset has fewer missing precipitation records than the GSOD dataset, as indicated by its lighter color (Fig. 2). Most of the stations in the CMA dataset have a record length that covers the whole study period.
Table 2. Stations for independent validation.
Several steps were used to eliminate apparent errors in the station data before the interpolation. The first was a value check in which missing values, values that were duplicated more than 5 times and unreasonable values, such as more than 500 mm day−1 of precipitation and temperatures greater than 70°C, were removed (Isotta et al. 2014). The monthly precipitation totals were then summed from the daily series. If a monthly total was less than one-tenth of its average value, the month was dropped because missing values may be recorded as zero, and such zero values are ubiquitous in the GSOD dataset.
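Two of the checks described above can be sketched as follows. The snippet is a simplified illustration: the function names are hypothetical, negative values are rejected as an added assumption, the "average value" of a monthly total is assumed to be the mean over all years of the same calendar month, and the duplicate-value check is not implemented.

```python
MAX_PRECIP_MM = 500.0  # daily precipitation above this value is removed


def keep_precip(value_mm, max_mm=MAX_PRECIP_MM):
    """Value check: reject missing, negative (assumption) or excessive values."""
    return value_mm is not None and 0.0 <= value_mm <= max_mm


def suspicious_months(monthly_totals, ratio=0.1):
    """Return (year, month) keys whose total is below `ratio` times the mean
    total of the same calendar month (assumed definition of the average)."""
    by_month = {}
    for (year, month), total in monthly_totals.items():
        by_month.setdefault(month, []).append(total)
    means = {m: sum(t) / len(t) for m, t in by_month.items()}
    return sorted(
        key for key, total in monthly_totals.items()
        if total < ratio * means[key[1]]
    )


# Hypothetical July totals (mm); the zero in 1981 is likely unrecorded data.
totals = {(1980, 7): 120.0, (1981, 7): 0.0, (1982, 7): 150.0, (1983, 7): 130.0}
dropped = suspicious_months(totals)
print(dropped)
```

In this example the July mean is 100 mm, so the July 1981 total of 0 mm falls below one-tenth of the mean and is flagged for removal.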
c. Existing gridded precipitation dataset
Here, we adopt two widely used gridded precipitation datasets to compare with our proposed GPQM. The first gridded precipitation dataset is the NOAA Climate Prediction Center, Unified Gauge-Based Analysis of Daily Precipitation (CPC-UNI) (Chen et al. 2008; Xie et al. 2010, 2007), which is a global daily dataset. This dataset has a spatial resolution of 0.5°. More than 30 000 stations are used to produce this dataset, but only dozens of stations are located in our study area. The time range is from 1979 to the present.
The second dataset is the Asian Precipitation–Highly Resolved Observational Data Integration Towards the Evaluation of Water Resources (APHRODITE) (Yatagai et al. 2012, 2009), with a spatial resolution of 0.25° and daily temporal resolution. This dataset covers only Asia. The version we used here is V1101, which has a time range from 1950 to 2007. The number of stations used in APHRODITE in the study area is about the same as the number we used, which is approximately 3 times that used in CPC-UNI.
4. Results
a. The influences of the interpolation methods on the statistical distribution
The consideration of altitude as a covariate can reduce the statistical distribution error in most cases. The average WDs of TPS3D_OK and TPS2D_OK are 3.47 and 3.40, respectively. Although TPS2D_OK has a smaller WD on average, the WD of TPS3D_OK is smaller than that of TPS2D_OK for 8 of the 10 sites. TPS3D_OK has a larger WD than TPS2D_OK only at Fugong and Jiacha (Table 3). The WD of TPS3D_OK at Fugong is 3 times larger than that of TPS2D_OK, which raises the overall average WD of TPS3D_OK.
Table 3. WD for different methods at 10 independent validation sites. The bold terms indicate the best method for statistical distribution validation.
Considering the relationship between altitude and precipitation when estimating the monthly precipitation totals can provide a better statistical distribution than considering it when estimating the daily climatology. Table 3 shows that the average WD of TPS3D_OK is lower than that of TPSD3D_OK; the average WDs are 3.47 and 3.62, respectively. Additionally, the WD of TPS3D_OK is less than that of TPSD3D_OK for 8 of the 10 sites, the exceptions being Gorakhpur and Fugong.
The geostatistical interpolation method shows a better distribution than the deterministic interpolation method. In the comparison of TPS3D_OK and TPS3D_IDW, the WDs at 8 sites increased significantly after using IDW to interpolate the anomalies. The comparison of the reference methods OK and IDW indicates that the WDs from OK are less than those from IDW at all sites. Figure 3 shows that the ratio for the IDW method is above 1 at all sites when the precipitation amount is low. The ratios of IDW begin to decrease as the precipitation amount increases. Finally, for heavy rainfall, the ratios fall to their lowest values, below 1. This result suggests that IDW produces more drizzle at low values and reduces heavy precipitation.
b. The statistical distribution change due to undercatch correction
After the undercatch correction, the frequency distribution of gridded precipitation shows a noticeable improvement. At most sites, the correction significantly reduces the WD, with the maximum decline reaching more than 30% (Table 4). Except for the Wuqia and Fugong stations, after the undercatch correction, the quantile–quantile (QQ) plot becomes slightly closer to the 1:1 line, which indicates that the statistical distribution exhibits improved agreement with the station data (Fig. 4). Undercatch correction essentially adds the loss back to the observation. This addition is the restoration of reality, not a redistribution of the statistics. Therefore, after the undercatch correction, the QQ plot undergoes only a small translation, while its shape remains almost unchanged (Fig. 4).
Table 4. WDs before and after the undercatch correction.
The undercatch correction does not always make the results more realistic. In Fugong, the values in the QQ plot after the correction become larger than before. The WD increases from 3.87 to 4.68 (Table 3). This result means that the correction makes the overestimation more distinct when the interpolation results are already overestimated, although it brings the station precipitation closer to the real value (Fig. 4).
c. Comparison of the GPQM dataset with APHRODITE and CPC-UNI
The daily mean value in the GPQM dataset is higher than those in the CPC-UNI and APHRODITE datasets (Figs. 5a,d,h and Figs. 6a,d). For most of the study area, the results in the GPQM dataset are larger than those in both the CPC-UNI and APHRODITE datasets. The increase mainly ranges from 0% to 50%. The average increase is 7.1% and 38.9% relative to the CPC-UNI and APHRODITE datasets, respectively (Table 5).
Table 5. The average difference between the GPQM dataset and the CPC-UNI and APHRODITE datasets. Q98 means the 98th quantile, and Var means the temporal variance.
For the 98th quantile, the results from the GPQM dataset are also larger than the corresponding results in the CPC-UNI and APHRODITE datasets (Figs. 5b,e,i and Figs. 6b,e). The 98th quantile in the GPQM dataset is 12.3% larger than that in the CPC-UNI dataset and 69.7% larger than that in the APHRODITE dataset. The largest differences are concentrated in the central-eastern part of the plateau.
The temporal variance in the GPQM dataset is also larger than that in the CPC-UNI and APHRODITE datasets (Figs. 5c,f,j and Figs. 6c,f), which suggests that GPQM has a weaker smoothing effect than CPC-UNI and APHRODITE. The variance in GPQM is approximately 68% larger than that in CPC-UNI and 335% larger than that in APHRODITE (Table 5). A smoothing effect means that low precipitation values are increased and large precipitation values are reduced.
5. Discussion
a. The improvement of the proposed method
This study reveals two problems existing in gridded precipitation data that have not been considered in previous products. One problem is the station error caused by wind-induced undercatch, and the other is the distributional error caused by interpolation methods. Thus, we present a new method to reduce the statistical distribution error in gridded precipitation over the TP.
The proposed method compensates for the wind-induced undercatch in the station data and is more suitable for the TP where precipitation is susceptible to wind due to the large proportion of solid precipitation (Goodison et al. 1998; Yang et al. 1993). Most previous studies have ignored wind-induced undercatch, although they have used dense station networks or sophisticated interpolation models (Chen et al. 2010; Hijmans et al. 2005). The compensation for the wind-induced undercatch in the statistical distribution aims to acquire an accurate statistical mean value.
In addition, the independent validation of the statistical distribution was used to identify the optimum interpolation method. Our validation is different from the validation in previous studies. We compare the WD, which directly indicates the statistical distribution similarity between gridded precipitation and observation data, whereas previous studies mainly compared the correlation coefficient or root-mean-square error (Wagner et al. 2012).
b. How station network density affects the statistical distribution
Previous studies on how the station density affects the gridded estimates have drawn some common conclusions. When the observation network is sparse, the estimated variance and mean will be underestimated compared with the gridded truth for both spatial and temporal statistics (Herrera et al. 2019). The lower the station density, the more serious the smoothing effect. Figure 7 shows a negative trend between the station network density and the WD value regardless of the interpolation algorithm used. This means that a sparse network can increase the statistical distribution error. As a whole, the slopes of the fit lines in IDW and TPS3D_IDW are the largest among the six interpolation methods. This means that IDW and TPS3D_IDW need more stations than TPS2D_OK, TPS3D_OK, TPSD3D_OK, and OK to obtain the same statistical distribution accuracy. This result demonstrates that it is possible to improve the validity of the gridded precipitation in sparse network areas by improving the interpolation method.
Gridded statistical distributions for dense networks can be influenced by two factors. First, in places with a high station density, local precipitation features, such as extreme precipitation and thunderstorms, are more easily captured (Maraun and Martin 2018). The high precipitation values caused by extreme precipitation and thunderstorms will reduce the smoothing effect by enlarging the variance. Second, in dense networks, the stations used to estimate the gridded value are close to each other, which means a larger shared variance compared to sparse networks (Herrera et al. 2019). A large shared variance induces a large variance at the grid box, which can reduce the smoothing effect and lead to a more accurate statistical distribution.
c. The validity of the gridded precipitation
The probability density function of the time series of the precipitation intensity at a single grid box is usually described by a gamma distribution. This distribution function is unique when the mean and variance are fixed. Thus, the validity of the new gridded precipitation is equivalent to the validity of the long-term mean and temporal variance in the gridbox estimates.
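The uniqueness argument can be made concrete with the method of moments. For a two-parameter gamma distribution with shape k and scale θ, the mean is kθ and the variance is kθ², so k = mean²/variance and θ = variance/mean. The sketch below applies these relations; the function name and the sample moments are hypothetical, and the two-parameter shape and scale form is an assumption about the parameterization.

```python
def gamma_from_moments(mean, variance):
    """Shape and scale of the two-parameter gamma distribution with the
    given mean and variance (method of moments)."""
    if mean <= 0.0 or variance <= 0.0:
        raise ValueError("mean and variance must be positive")
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


# Hypothetical wet-day mean (mm) and temporal variance (mm^2) at a grid box.
shape, scale = gamma_from_moments(4.0, 8.0)
print(shape, scale)
```

Because each (mean, variance) pair maps to exactly one (shape, scale) pair, an error in either moment changes the whole fitted distribution, including its upper quantiles.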
Measurement undercatch (Prein and Gobiet 2017) and interpolation, especially in sparse observation networks, will decrease the estimated mean at grid boxes (Hofstra et al. 2010; Herrera et al. 2019). In section 4c, our results show that after undercatch correction, the mean value of GPQM is 7.1% and 38.9% higher than those of CPC-UNI and APHRODITE, respectively. Ma et al. (2015) indicated that the annual precipitation would increase by approximately 27% after considering the measurement loss in the TP. Ye et al. (2007) also found that the annual precipitation increased by more than 30% when precipitation loss was considered in western China. Thus, it can be inferred that the increase in the mean is within a reasonable range.
For temporal variance, previous studies have pointed out that interpolation could lead to a decreased variance in the grid boxes (Herrera et al. 2019). In section 4c, the results show that the variance in GPQM is larger than that of both CPC-UNI and APHRODITE. Because bias in variance will be propagated to other statistics, such as quantiles and extreme weather indices, a larger variance that is no greater than the true value indicates that the smoothing effect is more effectively suppressed, and thus, the probability of reaching incorrect conclusions about changes in the climate variability and extremes is lower.
To further clarify the rationality of the increase in the mean and variance, we chose a relatively dense network area in the northern Himalayas. This area has six meteorological stations within a 150 km × 150 km square (Fig. 8). To match this resolution, we use mean value aggregation to upscale the GPQM, CPC-UNI and APHRODITE datasets and calculate their variance and mean. In addition, an areal average (AA), which serves as the true estimate, is obtained by averaging the station values within the chosen area. The results show that both the mean and variance in the GPQM dataset are less than those of the AA but larger than those of the CPC-UNI and APHRODITE datasets (Table 6). It can be inferred that the mean and variance in the GPQM dataset will still be less than the true values in the remaining area because a sparser network will result in a more dramatic smoothing effect (Haylock et al. 2008; Hofstra et al. 2010). This result supports the rationality of our method because the mean and variance in the GPQM dataset are not greater than the areal averages (gridded truth) in the dense network.
Table 6. Comparison with existing gridded datasets for a dense network. Var means the temporal variance.
d. Altitude impact
In the validation of the interpolation schemes, a larger WD is found at the Fugong station after considering altitude as a covariate (Table 3), which is different from the results at most other sites. This result suggests that the positive impacts from the consideration of altitude have limitations.
In ANUSPLIN, cokriging or advanced methods such as the Parameter-Elevation Regressions on Independent Slopes Model (PRISM), the covariate term is simply linearly added to the estimated values (Daly et al. 2008; Hutchinson and Gessler 1994; Ly et al. 2011). However, there is not a simple linear relationship between altitude and precipitation amount (Higuchi et al. 1982). Thus, this relationship cannot be adequately described in the interpolation scheme, especially in regions with scarce observation data and large height differences. For example, consider only two points A and B, which are located at 1000 and 2000 m, respectively. If the precipitation at point B is higher than that at point A, the interpolation method will assume that the precipitation continues to increase with altitude. Even above the height of maximum precipitation, the estimated precipitation will still increase. However, if station C exists at an altitude of 3000 m and the precipitation at C is less than that at B, then the interpolation method can correctly deal with the relationship between precipitation and altitude.
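The two-station example can be illustrated numerically. In the sketch below, the altitudes of A, B and C follow the text, whereas the precipitation amounts are hypothetical, and a straight line through A and B stands in for the linear covariate term; it is not the spline fit used by ANUSPLIN.

```python
def linear_through(p1, p2):
    """Return the linear function passing through two (altitude, value) points."""
    (z1, v1), (z2, v2) = p1, p2
    slope = (v2 - v1) / (z2 - z1)
    return lambda z: v1 + slope * (z - z1)


# (altitude in m, annual precipitation in mm); precipitation values are hypothetical.
A = (1000.0, 400.0)
B = (2000.0, 800.0)
C = (3000.0, 600.0)  # lies above the assumed height of maximum precipitation

predict = linear_through(A, B)
extrapolated = predict(C[0])
overestimate = extrapolated - C[1]
print(extrapolated, overestimate)
```

With only A and B available, the linear relationship keeps increasing above B and overestimates the value at the altitude of C; once C is included, the fit is constrained by the decrease above the maximum precipitation height.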
The reality is often more complicated. Bookhagen and Burbank (2006) found that the southern slope of the Himalayas exhibited more than one precipitation maximum. However, in other places, the relationships between precipitation and altitude are often simple. Aizen et al. (1997) found that the maximum precipitation height was 4000–5000 m in the areas surrounding Khan Tengri in the Tianshan Mountains. Kang et al. (1999) found that the maximum precipitation height in the Qilian Mountains could reach 6000 m. All of these heights are higher than the maximum precipitation height on the southern slope of the Himalayas. Therefore, the precipitation in those areas is less likely to be overestimated. For those areas with relatively small altitude differences, rainfall will not be overestimated regardless of whether the relationship between precipitation and altitude is significant.
e. Limitations and challenges
Our study has some limitations that cannot be neglected. The first limitation is that we considered only the relationship between altitude and statistical distribution characteristics. To obtain a comprehensive understanding of the influences of orography on the statistical distribution, other orographic factors, such as slope, topographic facet orientation, and coastal proximity, should also be studied (Daly et al. 2017, 2008).
In addition, we used only independent validation, not cross validation. Independent validation with a single set of validation stations may induce sampling errors. In cross validation, one portion of the data is used for validation in each run, and the remaining data are used for interpolation. The interpolation is then evaluated by averaging the performance measures across all runs. This method can significantly reduce the sampling error.
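The cross-validation procedure described above can be sketched as a k-fold loop. This is an illustration only: the number of folds (10), the way stations are assigned to folds and the placeholder `dummy_score` are hypothetical; in practice the score would be a measure such as the WD between the interpolated and observed series at the withheld stations.

```python
def k_fold_splits(station_ids, k):
    """Yield (training, validation) lists; each station is validated once."""
    folds = [station_ids[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation


def cross_validate(station_ids, k, score):
    """Average a user-supplied score(training, validation) over all k runs."""
    scores = [score(train, valid) for train, valid in k_fold_splits(station_ids, k)]
    return sum(scores) / len(scores)


def dummy_score(training, validation):
    """Placeholder score: the fraction of stations withheld in this run."""
    return len(validation) / (len(training) + len(validation))


stations = list(range(164))  # 164 station identifiers, matching the station count
mean_score = cross_validate(stations, 10, dummy_score)
print(round(mean_score, 3))
```

Because every station serves as a validation site in exactly one run, the averaged score does not depend on a single choice of validation stations.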
Interpolation is limited in its ability to connect the statistical distributions at stations and in grid boxes. The results in section 4a show that even at a spatial resolution of 1 km, the statistical distribution of the gridded precipitation still differs from that of the station data regardless of the interpolation method used. For further study, a new method that bridges the gap between stations and grid boxes should be used. Osborn and Hulme (1997) developed a theoretical model that described the rain day frequencies at stations and in grid boxes. Since their target was the evaluation of climate models, the details used to infer the gridbox statistical distribution from station data were not included.
6. Conclusions
We designed a new method to reduce the statistical distribution error in gridded precipitation over the Tibetan Plateau. This method compensates for the wind-induced undercatch in the original observations and optimizes the interpolation method, thus providing a more accurate statistical distribution compared to previous products. This method results in an effective suppression of the smoothing effect.
Our results show that performing an undercatch correction for station precipitation can improve the statistical distribution. The maximum decline in the statistical distribution error reaches more than 30%. The thin-plate splines algorithm that separates the gridding of the monthly precipitation totals and daily anomalies and considers altitude as a covariate of the monthly totals has the minimum distributional error in most locations. However, for those areas where the relationship between altitude and precipitation is not simply linear and the station network is not sufficiently dense, considering altitude may result in a poor statistical distribution. In addition, deterministic interpolation methods, such as inverse distance weighted, should be avoided due to their serious smoothing effect.
The comparison with the existing gridded precipitation data for the TP shows that the gridded precipitation from our method has a higher mean value, 98th percentile and temporal variance on average, indicating a more accurate precipitation amount and a weaker smoothing effect compared to existing datasets.
This method will help to improve the accuracy of the statistical distribution of gridded precipitation, thus providing a reliable reference for hydrological and meteorological studies on the TP.
Acknowledgments
This work is supported by the Chinese Academy of Sciences through the Strategic Priority Research Program under grant XDA19070302, the National Key Research and Development Program of China under Grant 2019YFC1510503, the National Natural Science Foundation of China under Grant 41971399, and the Natural Science Foundation of Qinghai Province under Grant 2020-ZJ-731. We are extremely grateful to the data providers. The station data in China were provided by the China Meteorological Administration. Meanwhile, the data from China and the CPC-UNI gridded precipitation were provided by the National Climatic Data Center (NCDC)/National Oceanic and Atmospheric Administration (NOAA). The APHRODITE gridded precipitation data were provided by the National Center for Atmospheric Research (NCAR).
Data availability statement
The resulting dataset, GPQM, is available from the National Tibetan Plateau Data Center (https://data.tpdc.ac.cn/en/data/0648c1db-ea04-4cc5-8a31-2ca4a017a258/).
REFERENCES
Aizen, V. B., E. M. Aizen, J. M. Melack, and J. Dozier, 1997: Climatic and hydrologic changes in the Tien Shan, Central Asia. J. Climate, 10, 1393–1404, https://doi.org/10.1175/1520-0442(1997)010<1393:CAHCIT>2.0.CO;2.
An, Z., J. E. Kutzbach, W. L. Prell, and S. C. Porter, 2001: Evolution of Asian monsoons and phased uplift of the Himalaya–Tibetan plateau since Late Miocene times. Nature, 411, 62–66, https://doi.org/10.1038/35075035.
Auer, I., R. Böhm, A. Jurković, and A. Orlik, 2005: A new instrumental precipitation dataset for the greater alpine region for the period 1800–2002. Int. J. Climatol., 25, 139–166, https://doi.org/10.1002/joc.1135.
Beguería, S., S. M. Vicente-Serrano, M. Tomás-Burguera, and M. Maneta, 2016: Bias in the variance of gridded data sets leads to misleading conclusions about changes in climate variability. Int. J. Climatol., 36, 3413–3422, https://doi.org/10.1002/joc.4561.
Bookhagen, B., and D. W. Burbank, 2006: Topography, relief, and TRMM-derived rainfall variations along the Himalaya. Geophys. Res. Lett., 33, L08405, https://doi.org/10.1029/2006GL026037.
Chen, D., T. Ou, L. Gong, C. Xu, W. Li, C. Ho, and W. Qian, 2010: Spatial interpolation of daily precipitation in China: 1951–2005. Adv. Atmos. Sci., 27, 1221–1232, https://doi.org/10.1007/s00376-010-9151-y.
Chen, M., W. Shi, P. Xie, V. B. S. Silva, V. E. Kousky, R. Wayne Higgins, and J. E. Janowiak, 2008: Assessing objective techniques for gauge-based analyses of global daily precipitation. J. Geophys. Res., 113, D04110, https://doi.org/10.1029/2007JD009132.
Daly, C., M. Halbleib, J. I. Smith, W. P. Gibson, M. K. Doggett, G. H. Taylor, J. Curtis, and P. P. Pasteris, 2008: Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. Int. J. Climatol., 28, 2031–2064, https://doi.org/10.1002/joc.1688.
Daly, C., M. E. Slater, J. A. Roberti, S. H. Laseter, and L. W. Swift, 2017: High-resolution precipitation mapping in a mountainous watershed: Ground truth for evaluating uncertainty in a national precipitation dataset. Int. J. Climatol., 37, 124–137, https://doi.org/10.1002/joc.4986.
Ding, B., K. Yang, J. Qin, L. Wang, Y. Chen, and X. He, 2014: The dependence of precipitation types on surface elevation and meteorological conditions and its parameterization. J. Hydrol., 513, 154–163, https://doi.org/10.1016/j.jhydrol.2014.03.038.
Director, H., and L. Bornn, 2015: Connecting point-level and gridded moments in the analysis of climate data. J. Climate, 28, 3496–3510, https://doi.org/10.1175/JCLI-D-14-00571.1.
Evans, J. P., M. Ekström, and F. Ji, 2012: Evaluating the performance of a WRF physics ensemble over South-East Australia. Climate Dyn., 39, 1241–1258, https://doi.org/10.1007/s00382-011-1244-5.
Goodison, B. E., P. Y. T. Louie, and D. Yang, 1998: WMO solid precipitation measurement intercomparison. Instruments and Observing Methods Rep. 67, WMO/TD-872, 212 pp., https://www.wmo.int/pages/prog/www/IMOP/publications/IOM-67-solid-precip/WMOtd872.pdf.
Gudmundsson, L., J. B. Bremnes, J. E. Haugen, and T. Engen-Skaugen, 2012: Technical Note: Downscaling RCM precipitation to the station scale using statistical transformations - A comparison of methods. Hydrol. Earth Syst. Sci., 16, 3383–3390, https://doi.org/10.5194/hess-16-3383-2012.
Gutowski, W. J., S. G. Decker, R. A. Donavon, Z. Pan, R. W. Arritt, and E. S. Takle, 2003: Temporal–spatial scales of observed and simulated precipitation in central U.S. climate. J. Climate, 16, 3841–3847, https://doi.org/10.1175/1520-0442(2003)016<3841:TSOOAS>2.0.CO;2.
Haylock, M. R., N. Hofstra, A. M. G. Klein Tank, E. J. Klok, P. D. Jones, and M. New, 2008: A European daily high-resolution gridded data set of surface temperature and precipitation for 1950–2006. J. Geophys. Res., 113, D20119, https://doi.org/10.1029/2008JD010201.
Herrera, S., S. Kotlarski, P. M. M. Soares, R. M. Cardoso, A. Jaczewski, J. M. Gutiérrez, and D. Maraun, 2019: Uncertainty in gridded precipitation products: Influence of station density, interpolation method and grid resolution. Int. J. Climatol., 39, 3717–3729, https://doi.org/10.1002/joc.5878.
Higuchi, K., Y. Ageta, T. Yasunari, and J. Inoue, 1982: Characteristics of precipitation during the monsoon season in high-mountain areas of the Nepal Himalaya. IAHS Publ., 138, 21–30.
Hijmans, R. J., S. E. Cameron, J. L. Parra, P. G. Jones, and A. Jarvis, 2005: Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol., 25, 1965–1978, https://doi.org/10.1002/joc.1276.
Hofstra, N., M. Haylock, M. New, P. Jones, and C. Frei, 2008: Comparison of six methods for the interpolation of daily, European climate data. J. Geophys. Res., 113, D21110, https://doi.org/10.1029/2008JD010100.
Hofstra, N., M. New, and C. McSweeney, 2010: The influence of interpolation and station network density on the distributions and trends of climate variables in gridded daily data. Climate Dyn., 35, 841–858, https://doi.org/10.1007/s00382-009-0698-1.
Huffman, G. J., R. F. Adler, M. M. Morrissey, D. T. Bolvin, S. Curtis, R. Joyce, B. McGavock, and J. Susskind, 2001: Global precipitation at one-degree daily resolution from multisatellite observations. J. Hydrometeor., 2, 36–50, https://doi.org/10.1175/1525-7541(2001)002<0036:GPAODD>2.0.CO;2.
Hutchinson, M. F., and P. E. Gessler, 1994: Splines—More than just a smooth interpolator. Geoderma, 62, 45–67, https://doi.org/10.1016/0016-7061(94)90027-2.
Isotta, F. A., and Coauthors, 2014: The climate of daily precipitation in the Alps: Development and analysis of a high-resolution grid dataset from pan-Alpine rain-gauge data. Int. J. Climatol., 34, 1657–1675, https://doi.org/10.1002/joc.3794.
Kang, E., Y. Shi, D. Yang, Y. Zhang, and G. Zhang, 1997: An experimental study of runoff formation in the mountains basin of the Urumqi River. Quat. Sci., 17, 44–51.
Kang, E., G. Cheng, Y. Lan, and H. Jin, 1999: A model for simulating the response of runoff from the mountainous watersheds of inland river basins in the arid area of northwest China to climatic changes. Sci. China, 42D, 52–63, https://doi.org/10.1007/BF02878853.
Kotlarski, S., and Coauthors, 2019: Observational uncertainty and regional climate model evaluation: A pan-European perspective. Int. J. Climatol., 39, 3730–3749, https://doi.org/10.1002/joc.5249.
Laflamme, E. M., E. Linder, and Y. Pan, 2016: Statistical downscaling of regional climate model output to achieve projections of precipitation extremes. Wea. Climate Extremes, 12, 15–23, https://doi.org/10.1016/j.wace.2015.12.001.
Lafon, T., S. Dadson, G. Buys, and C. Prudhomme, 2013: Bias correction of daily precipitation simulated by a regional climate model: A comparison of methods. Int. J. Climatol., 33, 1367–1381, https://doi.org/10.1002/joc.3518.
Liu, X., and B. Chen, 2000: Climatic warming in the Tibetan Plateau during recent decades. Int. J. Climatol., 20, 1729–1742, https://doi.org/10.1002/1097-0088(20001130)20:14<1729::AID-JOC556>3.0.CO;2-Y.
Ly, S., C. Charles, and A. Degré, 2011: Geostatistical interpolation of daily rainfall at catchment scale: The use of several variogram models in the Ourthe and Ambleve catchments, Belgium. Hydrol. Earth Syst. Sci., 15, 2259–2274, https://doi.org/10.5194/hess-15-2259-2011.
Ma, Y., Y. Zhang, D. Yang, and S. B. Farhan, 2015: Precipitation bias variability versus various gauges under different climatic conditions over the Third Pole Environment (TPE) region. Int. J. Climatol., 35, 1201–1211, https://doi.org/10.1002/joc.4045.
Maeno, K., H. Ohmori, J. Matsumoto, and T. Hayashi, 2009: Characteristics of daily precipitation during the monsoon season in Nepal. J. Geogr., 113, 512–523, https://doi.org/10.5026/jgeography.113.4_512.
Maraun, D., 2013: Bias correction, quantile mapping, and downscaling: Revisiting the inflation issue. J. Climate, 26, 2137–2143, https://doi.org/10.1175/JCLI-D-12-00821.1.
Maraun, D., and M. Widmann, 2018: Statistical Downscaling and Bias Correction for Climate Research. Cambridge University Press, 92 pp., https://doi.org/10.1017/9781107588783.
Mueller, J., and T. Jaakkola, 2015: Principal differences analysis: Interpretable characterization of differences between distributions. Adv. Neural Inf. Process. Syst., 28, 1702–1710.
Osborn, T. J., and M. Hulme, 1997: Development of a relationship between station and grid-box rainday frequencies for climate model evaluation. J. Climate, 10, 1885–1908, https://doi.org/10.1175/1520-0442(1997)010<1885:DOARBS>2.0.CO;2.
Pan, X., and Coauthors, 2016: Bias corrections of precipitation measurements across experimental sites in different ecoclimatic regions of western Canada. Cryosphere, 10, 2347–2360, https://doi.org/10.5194/tc-10-2347-2016.
Piani, C., G. P. Weedon, M. Best, S. M. Gomes, P. Viterbo, S. Hagemann, and J. O. Haerter, 2010: Statistical bias correction of global simulated daily precipitation and temperature for the application of hydrological models. J. Hydrol., 395, 199–215, https://doi.org/10.1016/j.jhydrol.2010.10.024.
Prein, A. F., and A. Gobiet, 2017: Impacts of uncertainties in European gridded precipitation observations on regional climate analysis. Int. J. Climatol., 37, 305–327, https://doi.org/10.1002/joc.4706.
Roe, G. H., 2005: Orographic precipitation. Annu. Rev. Earth Planet. Sci., 33, 645–671, https://doi.org/10.1146/annurev.earth.33.092203.122541.
Sevruk, B., 1982: Methods of correction for systematic error in point precipitation measurement for operational use. WMO Operational Hydrology Rep. 21, 91 pp.
Themeßl, M. J., A. Gobiet, and G. Heinrich, 2012: Empirical-statistical downscaling and error correction of regional climate models and its impact on the climate change signal. Climatic Change, 112, 449–468, https://doi.org/10.1007/s10584-011-0224-4.
Thrasher, B., E. P. Maurer, C. McKellar, and P. B. Duffy, 2012: Technical Note: Bias correcting climate model simulated daily temperature extremes with quantile mapping. Hydrol. Earth Syst. Sci., 16, 3309–3314, https://doi.org/10.5194/hess-16-3309-2012.
Wagner, P. D., P. Fiener, F. Wilken, S. Kumar, and K. Schneider, 2012: Comparison and evaluation of spatial interpolation schemes for daily rainfall in data scarce regions. J. Hydrol., 464–465, 388–400, https://doi.org/10.1016/j.jhydrol.2012.07.026.
Wu, J., X. Gao, F. Giorgi, and D. Chen, 2017: Changes of effective temperature and cold/hot days in late decades over China based on a high resolution gridded observation dataset. Int. J. Climatol., 37, 788–800, https://doi.org/10.1002/joc.5038.
Wulf, H., B. Bookhagen, and D. Scherler, 2010: Seasonal precipitation gradients and their impact on fluvial sediment flux in the Northwest Himalaya. Geomorphology, 118, 13–21, https://doi.org/10.1016/j.geomorph.2009.12.003.
Xie, P., M. Chen, S. Yang, A. Yatagai, T. Hayasaka, Y. Fukushima, and C. Liu, 2007: A gauge-based analysis of daily precipitation over East Asia. J. Hydrometeor., 8, 607–626, https://doi.org/10.1175/JHM583.1.
Xie, P., M. Chen, and W. Shi, 2010: CPC unified gauge-based analysis of global daily precipitation. 24th Conf. on Hydrology, Atlanta, GA, Amer. Meteor. Soc., 2.3A, https://ams.confex.com/ams/90annual/techprogram/paper_163676.htm.
Yang, D., and T. Ohata, 2001: A bias-corrected Siberian regional precipitation climatology. J. Hydrometeor., 2, 122–139, https://doi.org/10.1175/1525-7541(2001)002<0122:ABCSRP>2.0.CO;2.
Yang, D., J. R. Metcalfe, B. E. Goodison, and E. Mekis, 1993: “True Snowfall”-An evaluation of the double fence intercomparison reference gauge. 61st Annual Western Snow Conf., Quebec City, QC, Canada, Western Snow Conference, 105–111, https://westernsnowconference.org/node/554.
Yatagai, A., O. Arakawa, K. Kamiguchi, H. Kawamoto, M. I. Nodzu, and A. Hamada, 2009: A 44-year daily gridded precipitation dataset for Asia based on a dense network of rain gauges. SOLA, 5, 137–140, https://doi.org/10.2151/SOLA.2009-035.
Yatagai, A., K. Kamiguchi, and O. Arakawa, 2012: APHRODITE: Constructing a long-term daily gridded precipitation dataset for Asia based on a dense network of rain gauges. Bull. Amer. Meteor. Soc., 93, 1401–1415, https://doi.org/10.1175/BAMS-D-11-00122.1.
Ye, B., D. Yang, Y. Ding, and T. Han, 2007: A bias-corrected precipitation climatology for China. Acta Geogr. Sin., 62, 3–13.
Zhang, Y., T. Ohata, and D. Yang, 2004: Bias correction of daily precipitation measurements for Mongolia. Hydrol. Processes, 18, 2991–3005, https://doi.org/10.1002/hyp.5745.