Meteorological data are fundamental to understanding global water and energy cycles (Trenberth et al. 2007, 2009; Schneider et al. 2013; L’Ecuyer et al. 2015; Rodell et al. 2015). A variety of geoscientific and operational applications, such as hydrological modeling and climate studies, need gridded meteorological inputs of variables such as precipitation, temperature, and humidity (Clark et al. 2015a,b; Hamman et al. 2018; Nguyen et al. 2018).
Many gridded meteorological datasets have been developed with different spatiotemporal coverage and resolution, different meteorological variables, different application objectives, and different data sources (Maggioni et al. 2016; Sun et al. 2018). Ground stations are the most important data source used to produce and validate gridded meteorological datasets (Sheffield et al. 2006; Livneh et al. 2015; Fick and Hijmans 2017; Tang et al. 2020b; Harris et al. 2020) given their high accuracy and long temporal coverage in many station networks (Menne et al. 2012). However, station networks have disadvantages such as limited spatial coverage in remote/undeveloped regions and temporal discontinuities caused by missing records and incomplete observation periods (Eischeid et al. 2000; Kidd et al. 2017; Tang et al. 2020a). Remote sensing techniques and numerical weather prediction models provide additional information to produce gridded meteorological datasets with global coverage at high spatiotemporal resolutions (Sorooshian et al. 2000; Huffman et al. 2007; Hou et al. 2014; Gelaro et al. 2017; Hersbach et al. 2020). Most remote sensing and model datasets rely on station observations for bias correction, data assimilation, and quality validation (Adler et al. 2003; Ashouri et al. 2015; Ma et al. 2018; Beck et al. 2019).
Existing meteorological datasets are typically deterministic, providing a single estimate of a given meteorological variable for a specific location and time step. Stations suffer from point-to-area interpolation uncertainties and measurement errors such as precipitation evaporation/wetting loss and undercatch/overcatch (Goodison et al. 1998; Yang et al. 2005; Scaff et al. 2015; Kochendorfer et al. 2018). Remote sensing techniques face challenges of imperfect retrieval algorithms, instrument limitations, insufficient sampling, and signal attenuation (Dinku et al. 2002; Adler et al. 2017; Beck et al. 2017; Tang et al. 2020b). Numerical weather prediction models are limited by imperfect model representations of physical processes and observational constraints (Donat et al. 2014; Parker 2016). Applications based on deterministic datasets may ignore these uncertainties. This is particularly true for remote regions and complex climatic/topographic conditions where almost all meteorological datasets exhibit large uncertainties due to sparse/unreliable measurements and imperfect models/algorithms (Gao et al. 2012; Henn et al. 2018; Newman et al. 2020).
Probabilistic datasets have advantages in estimating uncertainties and representing extremes (Kirstetter et al. 2015; Mendoza et al. 2017; Frei and Isotta 2019). Yet probabilistic datasets are often large, may be difficult to interpret, and require more computational resources to use. Currently, a few datasets provide explicit uncertainty estimates. Such probabilistic datasets include the HadCRUT4 global temperature dataset with 100 members (Morice et al. 2012), the Spatially Coherent Probabilistic Extended Climate dataset (SCOPE Climate) with 25 members in France (Caillouet et al. 2019), the ensemble precipitation and temperature datasets with 100 members in the United States and parts of Canada (Newman et al. 2015, 2019, 2020), and the Ensemble Meteorological Dataset for North America (EMDNA) with 100 members (Tang et al. 2021). More recently, several deterministic datasets have offered probabilistic realizations, such as the ensemble version (Cornes et al. 2018) of the Europe-wide E-OBS temperature and precipitation dataset (Haylock et al. 2008), and the High-Resolution Ensemble Precipitation Analysis (Khedhaouiria et al. 2020) as the ensemble version of the Canadian Precipitation Analysis (CaPA; Mahfouf et al. 2007; Fortin et al. 2015). However, a global probabilistic meteorological dataset has not yet been developed.
Here we develop the Ensemble Meteorological Dataset for Planet Earth (EM-Earth), which provides estimates of precipitation, mean daily temperature (Tmean), daily temperature range (Trange), and dewpoint temperature (Tdew) at 0.1° resolution from 1950 to 2019 for global land areas. Precipitation, Tmean, and Tdew estimates are also available at the hourly scale. EM-Earth utilizes station data from the Serially Complete Earth (SC-Earth) dataset developed by Tang et al. (2021a), a global station dataset without temporal discontinuities. SC-Earth is merged with ERA5 to generate global gridded meteorological estimates and uncertainties. To meet the requirements of diverse applications, EM-Earth provides two types of datasets: the daily and hourly deterministic dataset, and the daily probabilistic dataset with 25 members (Fig. 1).
Datasets
Datasets used in this study (summarized in Table 1) can be divided into input sources, auxiliary data, validation stations, and intercomparison datasets.
Table 1. Datasets used in the production and validation of EM-Earth.
Input data.
The two major inputs of EM-Earth are SC-Earth and ERA5. SC-Earth was developed to address the temporal discontinuities of station measurements caused by occasional/seasonal missing records, values that failed quality control, and incomplete observation periods (Eischeid et al. 2000; Feng et al. 2004; Wang et al. 2017). Gap filling of station data can improve the quality of gridded meteorological estimates, as shown by Longman et al. (2020) in Hawaii and Tang et al. (2021b) in North America. To illustrate this point, Tang et al. (2021b) compared the performance of raw station observations and a gap-filled dataset in gridded precipitation and temperature estimation over North America from 1979 to 2018 using various interpolation methods. They showed that gap filling improves the accuracy and trend of gridded estimates, and that the improvements are more notable for lower station densities.
SC-Earth uses station data from the Global Historical Climatology Network Daily (GHCN-D; Menne et al. 2012) and the Global Surface Summary of the Day (GSOD; https://data.gov/index.html). Raw station observations have undergone strict quality control, and only stations with at least 8-yr records were used (Tang et al. 2021a). Tmean and Trange were calculated, respectively, as the mean of and difference between daily maximum and minimum air temperature from station data. That is desirable as some stations only provide Tmean data, and also, constraining Trange to be positive is easier than constraining the maximum temperature to be larger than the minimum; the potential skewness of diurnal temperature cycle was not considered. SC-Earth imputes missing values using 15 strategies based on four methods including quantile mapping, spatial interpolation/regression, machine learning, and multisource merging. A climatological correction based on quantile mapping and quantile delta mapping is applied to further improve the SC-Earth estimates. SC-Earth includes 64,399, 35,925, 34,851, 12,310, and 12,872 stations for precipitation, Tmean, Trange, Tdew, and wind speed, respectively. The station density of SC-Earth is constant from 1950 to 2019 and higher than many raw station datasets. For example, GHCN-D has ∼113,000 precipitation stations in total, while 90% of years before 2020 have fewer than 40,000 stations. Wind speed is not a target variable of EM-Earth because of the potential inhomogeneities in raw station data (Tang et al. 2021a). The average fractions of filled data are ∼55% for precipitation, Tmean, and Trange, and ∼59% for Tdew. The fractions are higher in early years due to sparse station networks.
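The Tmean/Trange definition above, and its inverse, can be written as a short sketch. The function names are illustrative only and are not part of SC-Earth or EM-Earth; as in the text, the potential skewness of the diurnal cycle is ignored.

```python
def to_mean_range(tmin, tmax):
    """Return (Tmean, Trange) from daily minimum and maximum temperature (deg C)."""
    if tmax < tmin:
        raise ValueError("tmax must not be smaller than tmin")
    return (tmax + tmin) / 2.0, tmax - tmin


def to_min_max(tmean, trange):
    """Invert the definition: Trange is split symmetrically about Tmean."""
    if trange < 0:
        raise ValueError("trange must be non-negative")
    return tmean - trange / 2.0, tmean + trange / 2.0
```

Working with (Tmean, Trange) means that a single non-negativity check on Trange replaces the pairwise constraint that the maximum exceed the minimum.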
ERA5 provides hourly near-surface estimates of precipitation, minimum temperature, maximum temperature, and Tdew. Tmean and Trange are inferred from minimum and maximum temperatures. We use ERA5 because it is the reanalysis product with the highest spatial and temporal resolution during 1950–2019, and its estimates can complement SC-Earth station data in sparsely gauged regions. In addition, ERA5 air temperature estimates at different pressure levels are used as auxiliary data to calculate near-surface temperature lapse rate to support spatial downscaling of ERA5 near-surface Tmean estimates. ERA5 ensemble estimates are not used because the Ensemble Data Assimilation (EDA) of ERA5 has coarse spatial resolution and incomplete uncertainty representation. Instead, EM-Earth uncertainties are estimated directly by using station data for cross validation. Other reanalysis products are not used because their spatiotemporal resolutions and temporal coverage do not meet the requirement of this study.
Auxiliary datasets.
Elevation data are from the Multi-Error-Removed Improved-Terrain (MERIT) digital elevation model (DEM) at 3″ (∼90 m at the equator) resolution (Yamazaki et al. 2017). MERIT DEM is spatially averaged to the 0.1° resolution to provide auxiliary information for spatial interpolation of station data. The MERIT DEM is also used as the land–sea mask. Monthly temperature (Tmean and Trange are inferred from minimum and maximum temperature) and vapor pressure data from WorldClim V2.1 (Fick and Hijmans 2017) are used to downscale ERA5 from the 0.25° to the target 0.1° resolution because the 1-km WorldClim data contain the topographic information at a very high resolution. WorldClim vapor pressure estimates are converted to Tdew estimates using the Tetens equation (Tetens 1930; Fick and Hijmans 2017). The bias-corrected WorldClim V2 data from the Precipitation Bias Correction (PBCOR) dataset (Beck et al. 2020) are used to correct the undercatch error of EM-Earth precipitation estimates. PBCOR infers the “true” long-term precipitation amount from global streamflow observations using the Budyko curve. The PBCOR WorldClim estimates are spatially aggregated from the raw 0.05° resolution to 0.1° resolution.
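As an illustration of the vapor pressure to Tdew conversion, the sketch below inverts a Tetens-type equation. The constants are those of the widely used FAO-56 form (kPa and °C) and are an assumption here; the exact constants and units used for EM-Earth are not stated in this section, and the function names are hypothetical.

```python
import math

# Tetens-type constants over water in the FAO-56 form (kPa, deg C).
# Assumed for illustration; not confirmed as the EM-Earth values.
A_KPA, B, C_DEGC = 0.6108, 17.27, 237.3


def vapor_pressure_from_tdew(tdew_c):
    """Actual vapor pressure (kPa): saturation vapor pressure at the dewpoint."""
    return A_KPA * math.exp(B * tdew_c / (tdew_c + C_DEGC))


def tdew_from_vapor_pressure(e_kpa):
    """Invert the Tetens equation to get dewpoint (deg C) from vapor pressure (kPa)."""
    if e_kpa <= 0:
        raise ValueError("vapor pressure must be positive")
    g = math.log(e_kpa / A_KPA)
    return C_DEGC * g / (B - g)
```

The same pair of functions also covers the reverse use case of inferring vapor pressure from Tdew estimates.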
Evaluation and comparison datasets.
Some GHCN-D and GSOD stations are not used in SC-Earth and EM-Earth for reasons such as short observation periods, insufficient samples, and failure in gap filling. Those stations have passed quality control and are used as independent data sources to validate EM-Earth ensemble estimates. The station numbers are 33,548, 10,721, and 5,425 for precipitation, Tmean/Trange, and Tdew, respectively. EM-Earth data are also compared to three widely used climate datasets, including the Global Precipitation Climatology Centre (GPCC) dataset (Schneider et al. 2013), the Climatic Research Unit gridded Time Series (CRU TS; Harris et al. 2020), and the University of Delaware Air Temperature and Precipitation Dataset (UDEL; Matsuura and Willmott 2017).
Methodology
Theory of probabilistic estimation.
Deterministic estimation.
Station-based estimates.
Locally weighted linear regression is used as a spatial interpolation method: the topographic attributes at station locations are used as predictor variables in the regression equation, and the meteorological variables at station locations on a given day are the predictand (Clark and Slater 2006). Elevation is a common predictor variable, yet we found that elevation can lead to large bias in regions where the station network is too sparse to represent local topographic variation (e.g., mean temperature is largely overestimated in the Andes Mountains and in the Arctic Archipelago). Therefore, we utilized climatologically aided interpolation (CAI; Willmott and Robeson 1995), which uses WorldClim climatology as the background and uses locally weighted linear regression to interpolate anomalies (ratio for precipitation and difference for temperature variables). The predictors used in the locally weighted linear regression are latitude and longitude. WorldClim is spatially averaged to the 0.1° spatial resolution and contributes to the inclusion of topographic information in CAI estimates.
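A minimal sketch of CAI with locally weighted linear regression is given below. It is not the EM-Earth implementation: the tricube weight, the search radius, the flat distance in degrees, and all function names are assumptions made for illustration.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lwr_anomaly(target, stations, anomalies, radius_deg=5.0):
    """Locally weighted linear regression of anomalies on latitude and longitude.

    Predictors are centered on the target, so the intercept is the prediction.
    Needs at least three non-collinear stations inside the radius.
    """
    lat0, lon0 = target
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for (lat, lon), y in zip(stations, anomalies):
        d = math.hypot(lat - lat0, lon - lon0)  # crude distance in degrees
        if d >= radius_deg:
            continue
        w = (1.0 - (d / radius_deg) ** 3) ** 3  # tricube weight (assumed)
        x = (1.0, lat - lat0, lon - lon0)
        for i in range(3):
            xtwy[i] += w * x[i] * y
            for j in range(3):
                xtwx[i][j] += w * x[i] * x[j]
    return solve_linear(xtwx, xtwy)[0]


def cai_estimate(target, target_clim, stations, obs, station_clim, ratio=False):
    """Interpolate anomalies and apply them to the climatology at the target."""
    if ratio:  # precipitation: anomaly is observation / climatology (> 0)
        anomalies = [o / c for o, c in zip(obs, station_clim)]
        return target_clim * lwr_anomaly(target, stations, anomalies)
    anomalies = [o - c for o, c in zip(obs, station_clim)]  # temperature
    return target_clim + lwr_anomaly(target, stations, anomalies)
```

Because the climatology carries the topographic signal, the regression only has to describe a smooth anomaly field, which is why latitude and longitude can suffice as predictors.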
The leave-one-out strategy is used to obtain estimates for every station to enable independent evaluation and provide estimates of uncertainty that are used in the optimal interpolation merging strategy (described later). We use the leave-one-out method since it retains the density of stations better than other data withholding methods (e.g., drop 10%) and does not need station selection.
Reanalysis estimates.
The ERA5 estimates are adjusted in time (time shift) to better match the local station data, and adjusted in space (downscaling) to provide estimates on the 0.1° grid.
The time shift of ERA5 optimizes the merging of ERA5 and station data. ERA5 estimates are in UTC, while daily station measurements are often recorded at local time (Yatagai et al. 2020). For example, daily precipitation measurements at manual stations are accumulations referring to the 24 h before the reporting time (e.g., the 24 h before 0700 local time). To account for the temporal mismatch, the hourly ERA5 series is adjusted (i.e., shifted forward or backward) to achieve the optimal agreement between daily reanalysis and station series (appendix B). The temporal adjustment can improve the reanalysis–station-merged estimates, particularly for precipitation that is most affected by the temporal mismatch.
Downscaling the ERA5 data is necessary to provide information at finer spatial resolutions than the original ERA5 grid, which is given at the 0.25° resolution. We downscale ERA5 estimates to obtain 1) 0.1° gridded estimates and 2) point-scale estimates corresponding to stations. Bilinear interpolation is used for precipitation because daily precipitation shows strong spatial variability, and advanced statistical downscaling methods would be necessary to obtain authentic spatial details. The high-resolution WorldClim is used as the background to downscale Tmean, Trange, and Tdew (appendix C).
Station–reanalysis merging.
Optimal interpolation (OI) is an effective method for multisource merging and can improve the accuracy of gridded estimates (Mahfouf et al. 2007; Xie and Xiong 2011; Fortin et al. 2015; Shen et al. 2018). OI-based merging of station data and ERA5 estimates follows the framework of Tang et al. (2021) with a novel design to calculate spatiotemporally distributed merging weights based directly on observation and background errors (appendix D). OI merging provides gridded meteorological estimates and uncertainty estimates. The leave-one-out strategy is used to validate OI-merged estimates.
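The idea of weights based directly on observation and background errors can be illustrated in its simplest scalar form. The sketch below merges one background value (e.g., ERA5) with one observation-based value at a single point, assuming unbiased and mutually uncorrelated errors. It is a textbook minimum-variance combination, not the spatially distributed scheme of appendix D, and the function name is hypothetical.

```python
def oi_merge(background, background_var, observation, observation_var):
    """Minimum-variance merge of one background value and one observation estimate.

    Returns the merged value and its error variance, assuming the two errors
    are unbiased and uncorrelated.
    """
    total = background_var + observation_var
    if background_var < 0 or observation_var < 0 or total == 0:
        raise ValueError("error variances must be non-negative and not both zero")
    weight = background_var / total  # weight given to the observation increment
    merged = background + weight * (observation - background)
    merged_var = background_var * observation_var / total
    return merged, merged_var
```

For example, with equal error variances the merged value is the midpoint and the error variance is halved; as the observation error variance shrinks toward zero, the merged value approaches the observation.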
We also merge station and reanalysis information to estimate the probability of precipitation using locally weighted logistic regression, which is implemented similarly to the locally weighted linear regression. The predictand is the binary precipitation occurrence (0 or 1) from station data, and the predictor is the daily precipitation amount from ERA5 data. The reanalysis–station-merged estimates provide the deterministic version of EM-Earth.
Probabilistic estimation.
The parameters of the probability distributions in Eq. (1) are generated during the deterministic estimation step. To sample from the probability distributions, we generate the spatiotemporally correlated random field (SCRF; appendix E). The SCRF contains random numbers for every grid and time step. The spatial correlation structure is based on a two-parameter power-exponential correlation function (e.g., Papalexiou and Serinaldi 2020). The temporal correlation structure is based on the lag-1 autocorrelation or intervariable correlation.
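A toy construction of a spatiotemporally correlated random field is sketched below for a handful of points. It assumes the common form exp(-(d/scale)^shape) for the power-exponential correlation, a dense Cholesky factor for the spatial structure, and a lag-1 autoregressive update in time. The parameter names are illustrative, intervariable correlation is not covered, and a dense factorization would not scale to a global 0.1° grid.

```python
import math
import random


def power_exponential(distance, scale, shape):
    """Two-parameter power-exponential correlation: exp(-(d / scale) ** shape)."""
    return math.exp(-((distance / scale) ** shape))


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low


def correlated_fields(points, n_steps, scale, shape, lag1, seed=0):
    """Standard-normal fields correlated in space (Cholesky) and time (AR(1))."""
    rng = random.Random(seed)
    n = len(points)
    corr = [[power_exponential(math.dist(p, q), scale, shape) for q in points]
            for p in points]
    low = cholesky(corr)
    fields, previous = [], None
    for _ in range(n_steps):
        white = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Impose the spatial correlation on independent random numbers.
        noise = [sum(low[i][k] * white[k] for k in range(i + 1)) for i in range(n)]
        if previous is None:
            current = noise
        else:
            # AR(1) update keeps unit variance and lag-1 correlation lag1.
            current = [lag1 * p + math.sqrt(1.0 - lag1 ** 2) * e
                       for p, e in zip(previous, noise)]
        fields.append(current)
        previous = current
    return fields
```

Each returned field holds one standard-normal random number per point and time step, which is the form needed to sample from normal distributions with known mean and spread.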
Probabilistic estimates are obtained by using the SCRF to sample from the probability distributions (appendix F). We generate 25 ensemble members, which compose the probabilistic version of EM-Earth. A much larger number of ensemble members can be generated, yet to restrict the total size of the EM-Earth dataset we created 25 members.
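The sampling step can be sketched as follows, under assumptions that are not confirmed by this section (the actual procedure is in appendix F, which is not reproduced here). Temperature-like variables are drawn as mean plus spread times the random number. For precipitation, one common scheme in ensemble gridding is assumed: the random number first decides occurrence against the probability of precipitation and is then rescaled to draw the amount in transformed space. All names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_temperature(mean, sd, z):
    """Map a standard-normal random number z to a temperature-like member value."""
    return mean + sd * z


def sample_precipitation(pop, mean, sd, z):
    """Sample occurrence and amount from one random number (assumed scheme).

    pop is the probability of precipitation; mean and sd describe the amount
    distribution in transformed (normal) space. Returns None for a dry step,
    otherwise the amount in transformed space.
    """
    u = STD_NORMAL.cdf(z)              # uniform score of the random number
    p_dry = 1.0 - pop
    if u <= p_dry:
        return None
    u_wet = (u - p_dry) / pop          # rescale within the wet part of the range
    u_wet = min(u_wet, 1.0 - 1e-12)    # keep inv_cdf inside its open domain
    return mean + sd * STD_NORMAL.inv_cdf(u_wet)
```

Because neighbouring grid cells and time steps receive correlated random numbers, members sampled this way inherit realistic spatial and temporal structure rather than independent noise.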
Postprocessing of EM-Earth.
The postprocessing addresses two problems caused by the limitation of raw station observations: 1) the undercatch of precipitation; and 2) the local reporting time. Station measurements underestimate precipitation amounts due to undercatch of precipitation, particularly for snowfall during windy conditions (Rasmussen et al. 2012). Overcatch may also exist but is ignored here because overcatch is uncommon. We correct EM-Earth deterministic and probabilistic precipitation estimates using PBCOR WorldClim as the background (appendix G). The correction ensures that EM-Earth and PBCOR WorldClim have the same precipitation climatology during their overlapped period from 1970 to 2000, and thus accounts for the undercatch bias and other possible biases.
The raw daily EM-Earth estimates match the local reporting time of meteorological stations, while large-domain applications require data corresponding to 0000–2400 UTC. We use a temporal disaggregation and aggregation method to adjust the reporting time of daily estimates (appendix H). Hourly EM-Earth estimates are obtained during the disaggregation step for all variables except Trange for which hourly data are scarcely used in research and applications. The released version of EM-Earth contains hourly and daily deterministic estimates before and after temporal adjustment, and daily probabilistic estimates after temporal adjustment.
Evaluation of EM-Earth.
Two common metrics, i.e., the correlation coefficient (CC) and the root-mean-square error (RMSE), are used to evaluate EM-Earth deterministic estimates (i.e., OI merging) based on the leave-one-out strategy. We use precipitation estimates before bias correction to match the raw station data. The evaluation uses raw station observations from SC-Earth, although both raw observations and gap-filled estimates are used in the production of EM-Earth. The evaluation of probabilistic estimates uses independent GHCN-D and GSOD stations. The Brier skill score (BSS; Brier 1950) and the continuous ranked probability skill score (CRPSS; Hersbach 2000) are applied to evaluate probabilistic precipitation and temperature estimates, respectively (appendix I). The perfect value for both metrics is one. EM-Earth deterministic estimates are also compared to several widely used deterministic datasets, including GPCC, CRU TS, UDEL, and ERA5 to validate the spatial distributions and climate trends of EM-Earth.
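For reference, the four metrics can be sketched in a few lines. The definitions below are the standard ones; the choice of reference forecast for the skill scores (here, the observed event frequency for BSS and a user-supplied reference score for CRPSS) is an assumption, since appendix I is not reproduced in this section.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(estimates, observations):
    """Root-mean-square error."""
    n = len(estimates)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimates, observations)) / n)


def brier_skill_score(probabilities, occurred):
    """BSS relative to a constant forecast of the observed event frequency.

    Undefined (division by zero) when the event always or never occurs.
    """
    n = len(probabilities)
    bs = sum((p - o) ** 2 for p, o in zip(probabilities, occurred)) / n
    base_rate = sum(occurred) / n
    bs_ref = sum((base_rate - o) ** 2 for o in occurred) / n
    return 1.0 - bs / bs_ref


def ensemble_crps(members, observation):
    """CRPS of an ensemble: mean |x - y| minus half the mean |x_i - x_j|."""
    m = len(members)
    accuracy = sum(abs(x - observation) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2.0 * m * m)
    return accuracy - spread


def skill_score(score, reference_score):
    """Generic skill score for negatively oriented scores; 1 is perfect."""
    return 1.0 - score / reference_score
```

A forecast that matches the reference yields a skill score of zero, and negative values indicate performance worse than the reference.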
Results
Evaluation of EM-Earth deterministic estimates.
The CC and RMSE of EM-Earth deterministic estimates are shown in Figs. 2 and 3, respectively. The global mean CC values for precipitation, Tmean, Trange, and Tdew are 0.77, 0.97, 0.83, and 0.97, respectively. Natural variability and station density are factors affecting the quality of EM-Earth estimates, particularly for precipitation and Trange that vary more than Tmean and Tdew. All variables show lower CC in the tropics and oceanic islands because of the strong climate variability and lower station density. The CC of precipitation and Trange is lower in Africa, South America, and central and north Asia compared to other regions due to insufficient ground observations.
The global mean RMSE values are 4.50 mm day−1, 1.49°C, 2.50°C, and 1.48°C for precipitation, Tmean, Trange, and Tdew, respectively. The RMSE of precipitation shows higher values in regions where precipitation is larger such as the Amazon rain forest and Malay Archipelago (Fig. 3). Tmean, Trange, and Tdew show lower RMSE in Europe and higher RMSE in central and northeast Asia and the Rocky Mountains in North America. In addition, Tdew shows relatively high RMSE in the Sahara Desert and Arabian Peninsula with a very dry climate. The global mean errors are −0.07 mm day−1, −0.01°C, −0.1°C, and 0.01°C for precipitation, Tmean, Trange, and Tdew, respectively. Overestimation and underestimation coexist in most regions of the world, while a few regions show systematic errors (e.g., Tmean is underestimated in the Sahara Desert and Arabian Peninsula; Fig. ES1 in the online supplemental material). The global mean relative biases of precipitation and Trange are both ∼1%, with spatial patterns similar to those of the mean errors (Fig. ES2).
Evaluation of EM-Earth probabilistic estimates.
The spread of probabilistic estimates depends on the uncertainty of deterministic estimates, which shows a spatial distribution generally consistent with that of RMSE (Fig. 4). The large uncertainty can result in large spread of probabilistic estimates in regions such as the tropics, South America and Africa for precipitation, and central and northeast Asia for Tmean. The magnitude of uncertainties is related to the precipitation and temperature range. For example, compared to the eastern United States, the western United States shows lower uncertainty in the precipitation magnitude due to the drier climate (Fig. 4), yet it shows a higher ratio of uncertainties due to precipitation estimation issues in the mountains (Fig. ES3). It is challenging for deterministic meteorological datasets to obtain accurate estimates in those regions, while probabilistic estimates can capture the true values through ensemble realizations.
The reliability diagram shows the conditional probability of observed precipitation events given the probability of probabilistic estimates (Fig. 5). The reliability for the 0 mm day−1 threshold is high for all continents. The observed probability is slightly overestimated for the high estimated probability because gridded estimates tend to have more wet events than point-scale station observations. The reliability performance decreases as the rain–no-rain threshold increases from 0 to 50 mm day−1. The rank of the six continents from high to low is North America, Oceania, Europe, Asia, South America, and Africa.
The reliability of heavy precipitation estimates is notably worse for Asia, South America, and Africa. In these regions, extreme precipitation events are typically caused by small convective systems. These convective systems are not well captured by the sparse station networks or well simulated by numerical weather models. EM-Earth deterministic estimates have higher accuracy than station-based regression estimates and ERA5 estimates, but could still show a high false alarm ratio and low hit rate for extreme events in Africa, Asia, and South America. The limited number of validation stations is another reason for the poor performance because the reliability performance for large rain–no-rain thresholds may be affected by a few stations located in specific regions. For example, the weak reliability in Asia mainly occurs along the southern slopes of the Himalayas, where the complex topography and climate make accurate precipitation estimation difficult.
Almost all deterministic datasets have low accuracy of heavy precipitation estimates, but probabilistic estimates have advantages in capturing extremes. For example, if EM-Earth deterministic estimates substantially underestimate an extreme precipitation event, most EM-Earth ensemble members will inherit the systematic underestimation, resulting in low reliability (e.g., Fig. 5). However, several members may receive a large positive perturbation and thus encompass the true value.
Validation based on BSS and CRPSS metrics (Fig. 6) also shows that EM-Earth precipitation and temperature probabilistic estimates are much better in North America, Oceania, and Europe than in Asia, South America, and Africa. The CRPSS values for Tmean, Trange, and Tdew estimates are particularly high and stable over North America and Europe. For Oceania, the lower rank of temperature estimates compared to precipitation estimates (Figs. 5 and 6) is caused by the larger number of precipitation stations than temperature stations in Australia. South America shows large variation for the 25%–50% CRPSS values (Figs. 6b,d) mainly due to the degraded performance of EM-Earth estimates in the Amazon rain forest.
Comparison between EM-Earth and other datasets.
EM-Earth precipitation and Tmean are compared to several popular meteorological datasets. Trange and Tdew are not included in this comparison since they are not always provided by existing datasets. We present two types of EM-Earth precipitation estimates: EM-Earth raw and EM-Earth final after bias correction (Fig. 7). The latitudinal curves of precipitation are shown in Fig. ES4. The correction results in a substantial increase in precipitation amounts in high-latitude regions, particularly Greenland. The Malay Archipelago, India, and northern part of South America also see a notable increase of precipitation after correction, indicating that the raw EM-Earth estimates in those regions may be deficient. For example, SC-Earth does not have any stations in the northern corner of South America, resulting in the relatively low quality of EM-Earth and thus obvious correction impact. Compared to GPCC, CRU TS, and UDEL, EM-Earth final shows higher precipitation over most of Asia (particularly the Himalayas), high-latitude North America (particularly Greenland), and Andes Mountains in South America, but slightly lower precipitation in Africa, eastern South America, and Oceania. The phenomenon is consistent with the findings based on PBCOR WorldClim in Beck et al. (2020). ERA5 shows the highest precipitation among all datasets in most parts of the world, particularly South America, East Africa, and mainland Southeast Asia, while in Greenland, ERA5 shows much lower precipitation than EM-Earth final. Nevertheless, EM-Earth final precipitation estimates may not be reliable in Greenland where the PBCOR dataset does not use any streamflow measurement (Beck et al. 2020).
For Tmean, EM-Earth, CRU TS, and UDEL show the largest difference in Greenland where all datasets lack sufficient observations (Fig. 8). In the Himalayas, EM-Earth Tmean is lower than CRU TS and UDEL but higher than ERA5. In the East Siberian Mountains, EM-Earth is notably higher than CRU TS, slightly higher than UDEL, and slightly lower than ERA5. In the Andes Mountains, EM-Earth is slightly lower than CRU TS, notably lower than UDEL, and slightly higher than ERA5. Overall, all datasets are comparable for Tmean except in some regions with complex topography or very few stations.
GPCC and CRU TS have very similar time series of global annual precipitation and precipitation anomalies from 1950 to 2019 (Fig. 9). ERA5 shows the highest global precipitation and the largest interannual variability. EM-Earth raw precipitation is closer to GPCC, CRU TS, and UDEL after 1970 compared to earlier years (Fig. 9c) because the effect of ERA5 on EM-Earth seems larger in early years when SC-Earth has a high fraction of gap-filled estimates. As a result, EM-Earth shows a larger increasing trend of precipitation compared to other datasets. The mean annual precipitation estimates for global land (excluding Antarctica) are 794, 793, 783, 803, and 862 mm yr−1 for GPCC, CRU TS, UDEL, EM-Earth raw, and EM-Earth final, respectively. EM-Earth final shows the highest precipitation because the PBCOR WorldClim, as the reference for correction, has a global precipitation estimate of 862 mm yr−1, which is close to the Global Precipitation Climatology Project (GPCP) estimate of 853 mm yr−1 (Beck et al. 2020). GPCP uses a different undercatch correction method compared to GPCC and might be closer to the true precipitation in high-latitude regions (Behrangi et al. 2018). For Tmean, all products show consistent interannual variability and trend, except that ERA5 shows a slightly lower Tmean, particularly before 1979, probably because the retrospective ERA5 estimates from 1950 to 1978 are still at a preliminary stage.
Many datasets use GPCC or CRU TS estimates as the reference for monthly correction to achieve higher accuracy and consistent spatiotemporal variations (e.g., Huffman et al. 2007; Abatzoglou et al. 2018; Beck et al. 2019). We do not apply such a correction, to preserve the independence of EM-Earth, which already integrates information from stations and reanalysis models. Nevertheless, EM-Earth is also suitable for water and energy budget studies that mainly rely on long-term mean climatology data. EM-Earth precipitation data show the same climatology as PBCOR WorldClim due to the bias correction procedure (appendix G). EM-Earth temperature data are close to CRU TS and UDEL globally (Fig. 9) and on all continents except South America, where all datasets show notable discrepancies.
Conclusions
We developed the EM-Earth version 1 dataset with both deterministic and probabilistic estimates of precipitation, Tmean, Trange, and Tdew at the 0.1° resolution for global land areas from 1950 to 2019. Daily minimum and maximum temperature can be estimated from Tmean and Trange since Trange is symmetric about Tmean. Humidity variables such as vapor pressure can be inferred from Tdew estimates. The deterministic estimates are at the hourly and daily scales. The probabilistic estimates are at the daily scale and have 25 ensemble members.
The EM-Earth methodology features several advantages compared to existing meteorological datasets: 1) a serially complete station dataset (i.e., SC-Earth) was developed and used to reduce the effect of stations’ temporal discontinuities on gridded estimates; 2) the temporal mismatch between reanalysis estimates and station observations is considered, which improves the accuracy of EM-Earth estimates (particularly for precipitation); 3) a novel implementation of optimal interpolation is used to merge reanalysis and station data and achieves higher accuracy than using a single input; 4) distributed parameters of spatial and temporal correlation structures are estimated to generate global spatiotemporally correlated random fields; 5) EM-Earth provides both deterministic and probabilistic estimates, and the probabilistic estimates explicitly consider meteorological uncertainties which can complement the inadequacy of existing datasets.
The validation shows that EM-Earth version 1 has reasonable accuracy and comparable distributions and trends with several widely used datasets over the globe. The quality of deterministic estimates is less reliable in regions with complex climate/topography and sparse stations, where probabilistic estimates can provide valuable information of meteorological uncertainties. The dataset can be used in diverse hydrological, meteorological, and climate studies.
Future work will focus on improving precipitation estimates in sparsely gauged regions, including more meteorological variables, and utilizing more data sources. For example, currently, gridded uncertainty estimates are directly interpolated from stations, and EDA ensemble reanalysis estimates could benefit uncertainty estimation in regions with sparse stations. Merging other reanalysis products and satellite products can further improve the quality compared to merely merging ERA5 and station data.
Data availability statement.
The EM-Earth version 1 dataset is available at the Federated Research Data Repository (FRDR) website (https://doi.org/10.20383/102.0547). A link to the entire EM-Earth dataset will be provided upon acceptance of the manuscript.
Acknowledgments.
The study is funded by the Global Water Futures project. SMP acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC Discovery Grant: RGPIN-2019-06894).
Appendix A: Precipitation transformation
We evaluated alternative probabilistic precipitation estimation methods using the lognormal distribution to replace the normal distribution. The lognormal distribution can directly obtain probabilistic precipitation estimates due to its explicit mathematical expression of the mean and standard deviation. However, probabilistic estimates based on the lognormal distribution were worse than those based on the normal distribution enabled by the Box–Cox transformation (not shown). Future studies can investigate the applicability of parametric probability distributions such as the generalized gamma distribution (Papalexiou 2018; Papalexiou and Serinaldi 2020).
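For readers unfamiliar with the transformation, a minimal sketch of the Box–Cox transform and its inverse is given below. The exponent used for EM-Earth is not stated in this appendix, so `lam` is left as a free parameter, and the handling of values below the transform of zero is an assumption.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a non-negative precipitation amount (lam >= 0)."""
    if x < 0:
        raise ValueError("precipitation must be non-negative")
    if lam == 0:
        return math.log(x)  # limiting case; requires x > 0
    return (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Back-transform to physical space (lam >= 0).

    Values at or below the transform of zero are mapped to zero precipitation.
    """
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0.0:
        return 0.0
    return base ** (1.0 / lam)
```

The transform reduces the skewness of precipitation so that a normal distribution becomes a more reasonable description, and the back-transform returns sampled values to physical units.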
Appendix B: Temporal adjustment of ERA5
The temporal adjustment of ERA5 estimates aims to achieve the highest correlation with station data (Beck et al. 2019). There are four steps. First, hourly ERA5 estimates are matched with stations using the nearest neighbor method. Second, for each station, the matched hourly ERA5 series is shifted by −48, −47, …, 0, …, 47, 48 h, and the daily estimates are obtained for each shift hour. The correlation coefficient between station observations and shifted ERA5 estimates is calculated. The shift hour with the highest correlation is adopted. Third, gridded shift hours are generated by adopting the mode of shift hours of all stations within every country. This is because stations within the same country often adopt the same reporting time, although sometimes different station networks in the same country may have different routines. We do not implement direct interpolation of station-based shift hours because this may introduce large regional uncertainties. Finally, gridded hourly ERA5 estimates are converted to daily estimates based on the gridded shift hours. There is an additional postprocessing step (described in appendix H) to convert the final dataset back to the ERA5 time.
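The second and third steps above can be sketched as follows. This is a simplified illustration on synthetic data rather than the production code: the helper names, the sign convention of the shift, and the padding of the hourly series by 48 h on each side are all assumptions.

```python
import math
import random
from collections import Counter


def pearson(a, b):
    """Pearson correlation; returns -1.0 when either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else -1.0


def daily_totals(hourly, shift, n_days, pad):
    """Sum 24-h windows after moving the day boundary by `shift` hours.

    `hourly` carries `pad` extra hours on both sides of the n_days period.
    """
    start = pad + shift
    return [sum(hourly[start + 24 * d:start + 24 * (d + 1)])
            for d in range(n_days)]


def best_shift(hourly, station_daily, max_shift=48):
    """Shift hour in [-max_shift, max_shift] with the highest correlation."""
    n_days = len(station_daily)
    return max(
        range(-max_shift, max_shift + 1),
        key=lambda s: pearson(
            daily_totals(hourly, s, n_days, max_shift), station_daily),
    )


def country_shift(station_shifts):
    """Mode of the station shift hours within one country."""
    return Counter(station_shifts).most_common(1)[0][0]


# Synthetic check: the "station" reports days that start 7 h late.
rng = random.Random(0)
n_days, pad = 30, 48
hourly = [rng.random() for _ in range(24 * n_days + 2 * pad)]
station_daily = daily_totals(hourly, 7, n_days, pad)
found = best_shift(hourly, station_daily)
mode = country_shift([7, 7, 0, -3, 7, 0])
```

Taking the mode rather than interpolating keeps one consistent shift per country, in line with the rationale given above.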
We do not adjust the daily station data using the shift hours because 1) the shift hours may not represent the actual reporting time of stations due to the bias in reanalysis estimates and 2) adjusting raw station data could introduce uncertainties such as inflated wet days and decreased extreme records. The possible historical change of station reporting time (such as in Canada; Hopkinson et al. 2011) is not considered in the study.
Appendix C: Spatial downscaling of ERA5
The downscaling of Tmean, Trange, and Tdew estimates is based on the delta method using WorldClim as the background. First, WorldClim data are averaged from the 1-km to the 0.25° resolution. The difference between ERA5 and WorldClim data is then calculated at the 0.25° resolution and bilinearly interpolated to the 1-km resolution. Finally, WorldClim data are adjusted with the interpolated difference at the 1-km resolution and averaged to the 0.1° resolution.
The daily 0.1° ERA5 estimates are further downscaled to match meteorological stations. For precipitation, Trange, and Tdew, the nearest neighbor interpolation is used. For Tmean, an additional downscaling step is used because mean temperature generally has a reliable relationship with elevation. We first estimate the long-term monthly temperature lapse rate at the 0.25° resolution based on the regression relationship between vertical ERA5 air temperature and geopotential heights (Tang et al. 2018, 2021). The lapse rate is then bilinearly interpolated to the 0.1° resolution and used to downscale gridded ERA5 estimates to the elevation of the matched stations.
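The delta method described above can be illustrated with a one-dimensional toy example. The sketch below is a simplification under stated assumptions: linear interpolation along one axis stands in for bilinear interpolation, the grids are short lists, and all function names are hypothetical.

```python
def block_mean(fine, factor):
    """Average consecutive groups of `factor` fine cells into coarse cells."""
    return [sum(fine[i:i + factor]) / factor
            for i in range(0, len(fine), factor)]


def interp_linear(coarse, factor):
    """Interpolate coarse cell-center values to fine cell centers.

    Fine cells outside the outermost coarse centers take the edge value.
    """
    n = len(coarse)
    out = []
    for j in range(n * factor):
        pos = min(max((j + 0.5) / factor - 0.5, 0.0), n - 1.0)
        i0 = min(int(pos), max(n - 2, 0))
        i1 = min(i0 + 1, n - 1)
        w = pos - i0
        out.append((1.0 - w) * coarse[i0] + w * coarse[i1])
    return out


def delta_downscale(era5_coarse, clim_fine, factor):
    """Add the interpolated coarse-scale difference to the fine climatology."""
    clim_coarse = block_mean(clim_fine, factor)
    delta = [e - c for e, c in zip(era5_coarse, clim_coarse)]
    delta_fine = interp_linear(delta, factor)
    return [c + d for c, d in zip(clim_fine, delta_fine)]


# Toy case: ERA5 is uniformly 2 degrees warmer than the climatology.
clim_fine = [10.0, 11.0, 12.0, 13.0, 8.0, 9.0, 10.0, 11.0]
era5_coarse = [13.5, 11.5]
downscaled = delta_downscale(era5_coarse, clim_fine, factor=4)
```

In this toy case the fine-scale pattern of the climatology is preserved and only the coarse-scale offset is transferred; `block_mean` could be reused for the final averaging to the target resolution.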
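The lapse-rate step for Tmean can be sketched as a least squares slope followed by a linear elevation adjustment. This is a minimal illustration with assumed names and toy numbers; the lapse rate is defined here as the change of temperature with height (negative when temperature decreases upward), which is a convention chosen for the example.

```python
def lapse_rate(temperatures, heights):
    """Least squares slope of temperature against height (K per m)."""
    n = len(heights)
    mh = sum(heights) / n
    mt = sum(temperatures) / n
    num = sum((h - mh) * (t - mt) for h, t in zip(heights, temperatures))
    den = sum((h - mh) ** 2 for h in heights)
    return num / den


def adjust_to_station(t_grid, z_grid, z_station, gamma):
    """Move a grid temperature to the station elevation along the lapse rate."""
    return t_grid + gamma * (z_station - z_grid)


# Toy vertical profile: temperatures (K) at three heights (m).
gamma = lapse_rate([288.0, 281.5, 275.0], [0.0, 1000.0, 2000.0])

# Grid cell at 500 m and 10 degC; matched station at 1500 m.
t_station = adjust_to_station(10.0, 500.0, 1500.0, gamma)
```

With this toy profile the slope is -6.5 K per km, so a station 1 km above the grid cell elevation is estimated to be 6.5 K colder.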
Appendix D: Optimal interpolation-based merging
We calculate two types of OI estimates: gridded estimates and leave-one-out estimates at station locations. The second type is used to 1) obtain gridded uncertainty estimates by interpolating the squared difference between the leave-one-out OI estimates and raw station observations using the inverse distance weighting method and 2) perform independent validation of the OI-merged estimates.
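The interpolation of squared leave-one-out differences can be sketched as follows. The example is a hedged illustration: it uses planar distances and a weighting power of 2, both of which are assumptions, and the function names are hypothetical.

```python
import math


def squared_errors(observations, loo_estimates):
    """Squared difference between station observations and leave-one-out estimates."""
    return [(o - e) ** 2 for o, e in zip(observations, loo_estimates)]


def idw(points, values, target, power=2.0):
    """Inverse distance weighted average of `values` at `target`.

    Returns the station value directly when the target coincides with a station.
    """
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


# Two toy stations with observations and leave-one-out OI estimates.
stations = [(0.0, 0.0), (2.0, 0.0)]
errors = squared_errors([5.0, 3.0], [4.0, 5.0])   # [1.0, 4.0]
midpoint_error = idw(stations, errors, (1.0, 0.0))
station_error = idw(stations, errors, (0.0, 0.0))
```

A grid cell midway between the two stations receives the average of their squared errors, while a cell that coincides with a station inherits that station's value.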
Appendix E: Generating spatiotemporally correlated random fields
The temporal correlation between SCRFs at two successive time steps is based on the lag-1 autocorrelation for temperature (i.e., Tmean, Trange, and Tdew) and the cross-correlation between precipitation and Trange (Newman et al. 2015, 2019). The correlation coefficient values are calculated and interpolated in the same way as the parameters of the spatial correlation functions.
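One common way to impose a prescribed correlation between successive random fields is a first-order autoregressive blend of the previous field with fresh noise. The sketch below shows that construction under stated assumptions: it is not claimed to be the exact formulation used here, the noise is drawn independently per cell (a real SCRF would use spatially correlated noise), and the function names are hypothetical.

```python
import math
import random


def ar1_step(previous, noise, rho):
    """Blend the previous field with new noise.

    For standard normal inputs the result keeps unit variance and has
    correlation `rho` with `previous`.
    """
    scale = math.sqrt(1.0 - rho * rho)
    return [rho * p + scale * e for p, e in zip(previous, noise)]


def cross_correlated(reference, noise, rho_cross):
    """Same blend, used to tie one variable's field to a reference field."""
    return ar1_step(reference, noise, rho_cross)


def lag1_correlation(series):
    """Sample correlation between a series and itself shifted by one step."""
    a, b = series[:-1], series[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Single-cell demonstration with an assumed lag-1 autocorrelation of 0.8.
rng = random.Random(42)
rho = 0.8
series = [rng.gauss(0.0, 1.0)]
for _ in range(20000):
    series.append(ar1_step([series[-1]], [rng.gauss(0.0, 1.0)], rho)[0])
sample_rho = lag1_correlation(series)
```

The square-root factor on the noise term is what keeps the variance of the field constant from one time step to the next.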
Appendix F: Generating probabilistic estimates
The estimate y is back transformed to the raw precipitation space using Eq. (A1) to obtain the final probabilistic precipitation estimate (xP).
Appendix G: Undercatch correction of precipitation estimates
Many empirical correction functions have been developed to correct the undercatch bias for different types of rain gauges with or without various windshields (Yang et al. 2005; Kochendorfer et al. 2018; Zhang et al. 2019). However, the correction of global precipitation stations is challenging due to the lack of station metadata for SC-Earth and the lack of wind speed observations in many regions. Therefore, we use a simple climatology-based correction method. First, we calculate the long-term monthly mean precipitation from 1970 to 2000 based on the EM-Earth deterministic estimates and the ensemble mean of the 25 probabilistic members. Then, the ratio between EM-Earth and PBCOR WorldClim precipitation is calculated for the period 1970–2000 and used to scale the EM-Earth deterministic and probabilistic estimates from 1950 to 2019.
PBCOR infers precipitation amounts using the water balance method based on the Budyko curve and is affected by the uncertainties in streamflow data (Hamilton and Moore 2012; Kiang et al. 2018), the Budyko curve (Gerrits et al. 2009), parameterization (Beck et al. 2020), noncontributing areas of river basins, and the scarcity of streamflow data in some regions (especially Greenland). The correction scheme used in this study is therefore imperfect. In addition, there are other global undercatch correction factors (Adam and Lettenmaier 2003; Yang et al. 2005; Adam et al. 2006) that are not tested in this study. Further efforts are needed to achieve better global undercatch correction by comparing all available methods (water balance–based methods and station-based methods) and datasets.
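The climatology-based scaling can be sketched for a single grid cell as follows. This is a minimal illustration under assumptions: the factor is taken as the PBCOR climatology divided by the EM-Earth climatology for each calendar month, cells with a zero EM-Earth climatology are left unscaled, and the function names and numbers are hypothetical.

```python
def monthly_factors(em_clim, pbcor_clim):
    """Twelve scaling factors (bias-corrected / original monthly climatology).

    A factor of 1 is used where the original climatology is zero.
    """
    return [p / e if e > 0 else 1.0 for e, p in zip(em_clim, pbcor_clim)]


def apply_correction(daily_values, months, factors):
    """Scale each daily value by the factor of its calendar month (1-12)."""
    return [v * factors[m - 1] for v, m in zip(daily_values, months)]


# Toy 1970-2000 monthly climatologies (mm/month) for one grid cell.
em_clim = [40.0, 50.0] + [30.0] * 10
pbcor_clim = [50.0, 50.0] + [30.0] * 10
factors = monthly_factors(em_clim, pbcor_clim)

# Two January days and one February day (mm/day).
corrected = apply_correction([8.0, 2.0, 4.0], [1, 1, 2], factors)
```

Because the same monthly factor is applied to every day in that calendar month, the correction changes amounts but not the timing of wet and dry days.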
Appendix H: Temporal adjustment of EM-Earth estimates
The temporal disaggregation and aggregation used to obtain EM-Earth estimates in UTC comprise three steps. First, EM-Earth daily estimates are disaggregated to the hourly scale using the diurnal information of hourly ERA5 estimates after the temporal shift (appendix B). The disaggregation uses the multiplicative method for precipitation and the additive method for temperature. Second, the disaggregated hourly estimates are shifted back to the raw ERA5 time routine using the opposite of the gridded shift hours obtained in appendix B. Finally, the hourly estimates are aggregated (accumulation for precipitation and average for temperature) to the daily scale corresponding to 0000–2400 UTC. The three steps are implemented for the deterministic estimates and for each member of the probabilistic estimates.
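The multiplicative and additive disaggregation in the first step can be sketched as follows. The sketch makes several assumptions: the function names are hypothetical, a four-value "day" is used to keep the example short, and a day with zero ERA5 precipitation is split uniformly, which is a choice made only for this illustration.

```python
def disaggregate_precip(daily_total, era5_hourly):
    """Multiplicative method: keep the ERA5 diurnal fractions, match the daily total."""
    total = sum(era5_hourly)
    n = len(era5_hourly)
    if total <= 0:
        # Assumed fallback: no diurnal information, so split uniformly.
        return [daily_total / n] * n
    return [daily_total * h / total for h in era5_hourly]


def disaggregate_temp(daily_mean, era5_hourly):
    """Additive method: keep the ERA5 diurnal shape, match the daily mean."""
    offset = daily_mean - sum(era5_hourly) / len(era5_hourly)
    return [h + offset for h in era5_hourly]


# Toy sub-daily ERA5 values for one day.
precip_hourly = disaggregate_precip(8.0, [0.0, 1.0, 3.0, 0.0])
temp_hourly = disaggregate_temp(13.0, [10.0, 12.0, 14.0, 12.0])
dry_era5 = disaggregate_precip(4.0, [0.0, 0.0, 0.0, 0.0])
```

Both methods preserve the daily value by construction: the disaggregated precipitation sums to the daily total, and the disaggregated temperature averages to the daily mean.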
Appendix I: Probabilistic evaluation metrics
References
Abatzoglou, J. T., S. Z. Dobrowski, S. A. Parks, and K. C. Hegewisch, 2018: TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015. Sci. Data, 5, 170191, https://doi.org/10.1038/sdata.2017.191.
Adam, J. C., and D. P. Lettenmaier, 2003: Adjustment of global gridded precipitation for systematic bias. J. Geophys. Res., 108, 4257, https://doi.org/10.1029/2002JD002499.
Adam, J. C., E. A. Clark, D. P. Lettenmaier, and E. F. Wood, 2006: Correction of global precipitation products for orographic effects. J. Climate, 19, 15–38, https://doi.org/10.1175/JCLI3604.1.
Adler, R. F., and Coauthors, 2003: The version-2 Global Precipitation Climatology Project (GPCP) monthly precipitation analysis (1979–present). J. Hydrometeor., 4, 1147–1167, https://doi.org/10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2.
Adler, R. F., and Coauthors, 2017: Global precipitation: Means, variations and trends during the satellite era (1979–2014). Surv. Geophys., 38, 679–699, https://doi.org/10.1007/s10712-017-9416-4.
Ashouri, H., K.-L. Hsu, S. Sorooshian, D. K. Braithwaite, K. R. Knapp, L. D. Cecil, B. R. Nelson, and O. P. Prat, 2015: PERSIANN-CDR: Daily precipitation climate data record from multisatellite observations for hydrological and climate studies. Bull. Amer. Meteor. Soc., 96, 69–83, https://doi.org/10.1175/BAMS-D-13-00068.1.
Beck, H. E., and Coauthors, 2017: Global-scale evaluation of 23 precipitation datasets using gauge observations and hydrological modeling. Hydrol. Earth Syst. Sci., 21, 6201–6217, https://doi.org/10.5194/hess-21-6201-2017.
Beck, H. E., and Coauthors, 2019: MSWEP V2 global 3-hourly 0.1° precipitation: Methodology and quantitative assessment. Bull. Amer. Meteor. Soc., 100, 473–500, https://doi.org/10.1175/BAMS-D-17-0138.1.
Beck, H. E., and Coauthors, 2020: Bias correction of global high-resolution precipitation climatologies using streamflow observations from 9372 catchments. J. Climate, 33, 1299–1315, https://doi.org/10.1175/JCLI-D-19-0332.1.
Behrangi, A., A. Gardner, J. T. Reager, J. B. Fisher, D. Yang, G. J. Huffman, and R. F. Adler, 2018: Using GRACE to estimate snowfall accumulation and assess gauge undercatch corrections in high latitudes. J. Climate, 31, 8689–8704, https://doi.org/10.1175/JCLI-D-18-0163.1.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1–3, https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
Caillouet, L., J.-P. Vidal, E. Sauquet, B. Graff, and J.-M. Soubeyroux, 2019: SCOPE Climate: A 142-year daily high-resolution ensemble meteorological reconstruction dataset over France. Earth Syst. Sci. Data, 11, 241–260, https://doi.org/10.5194/essd-11-241-2019.
Clark, M. P., and A. G. Slater, 2006: Probabilistic quantitative precipitation estimation in complex terrain. J. Hydrometeor., 7, 3–22, https://doi.org/10.1175/JHM474.1.
Clark, M. P., and Coauthors, 2015a: A unified approach for process-based hydrologic modeling: 1. Modeling concept. Water Resour. Res., 51, 2498–2514, https://doi.org/10.1002/2015WR017198.
Clark, M. P., and Coauthors, 2015b: A unified approach for process-based hydrologic modeling: 2. Model implementation and case studies. Water Resour. Res., 51, 2515–2542, https://doi.org/10.1002/2015WR017200.
Cornes, R. C., G. van der Schrier, E. J. M. van den Besselaar, and P. D. Jones, 2018: An ensemble version of the E-OBS temperature and precipitation data sets. J. Geophys. Res. Atmos., 123, 9391–9409, https://doi.org/10.1029/2017JD028200.
Dinku, T., E. N. Anagnostou, and M. Borga, 2002: Improving radar-based estimation of rainfall over complex terrain. J. Appl. Meteor., 41, 1163–1178, https://doi.org/10.1175/1520-0450(2002)041<1163:IRBEOR>2.0.CO;2.
Donat, M. G., J. Sillmann, S. Wild, L. V. Alexander, T. Lippmann, and F. W. Zwiers, 2014: Consistency of temperature and precipitation extremes across various global gridded in situ and reanalysis datasets. J. Climate, 27, 5019–5035, https://doi.org/10.1175/JCLI-D-13-00405.1.
Eischeid, J. K., P. A. Pasteris, H. F. Diaz, M. S. Plantico, and N. J. Lott, 2000: Creating a serially complete, national daily time series of temperature and precipitation for the western United States. J. Appl. Meteor., 39, 1580–1591, https://doi.org/10.1175/1520-0450(2000)039<1580:CASCND>2.0.CO;2.
Feng, S., Q. Hu, and W. Qian, 2004: Quality control of daily meteorological data in China, 1951–2000: A new dataset. Int. J. Climatol., 24, 853–870, https://doi.org/10.1002/joc.1047.
Fick, S. E., and R. J. Hijmans, 2017: WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol., 37, 4302–4315, https://doi.org/10.1002/joc.5086.
Fortin, V., G. Roy, N. Donaldson, and A. Mahidjiba, 2015: Assimilation of radar quantitative precipitation estimations in the Canadian Precipitation Analysis (CaPA). J. Hydrol., 531, 296–307, https://doi.org/10.1016/j.jhydrol.2015.08.003.
Frei, C., and F. A. Isotta, 2019: Ensemble spatial precipitation analysis from rain gauge data: Methodology and application in the European Alps. J. Geophys. Res. Atmos., 124, 5757–5778, https://doi.org/10.1029/2018JD030004.
Gao, L., M. Bernhardt, and K. Schulz, 2012: Elevation correction of ERA-Interim temperature data in complex terrain. Hydrol. Earth Syst. Sci., 16, 4661–4673, https://doi.org/10.5194/hess-16-4661-2012.
Gelaro, R., and Coauthors, 2017: The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2). J. Climate, 30, 5419–5454, https://doi.org/10.1175/JCLI-D-16-0758.1.
Gerrits, A. M. J., H. H. G. Savenije, E. J. M. Veling, and L. Pfister, 2009: Analytical derivation of the Budyko curve based on rainfall characteristics and a simple evaporation model. Water Resour. Res., 45, W04403, https://doi.org/10.1029/2008WR007308.
Goodison, B. E., P. Y. Louie, and D. Yang, 1998: WMO solid precipitation measurement intercomparison: Final report. WMO/TD-872, IOM Rep. 67, 318 pp., https://library.wmo.int/doc_num.php?explnum_id=9694.
Hamilton, A. S., and R. D. Moore, 2012: Quantifying uncertainty in streamflow records. Can. Water Resour. J., 37, 3–21, https://doi.org/10.4296/cwrj3701865.
Hamman, J. J., B. Nijssen, T. J. Bohn, D. R. Gergel, and Y. Mao, 2018: The Variable Infiltration Capacity model version 5 (VIC-5): Infrastructure improvements for new applications and reproducibility. Geosci. Model Dev., 11, 3481–3496, https://doi.org/10.5194/gmd-11-3481-2018.
Harris, I., T. J. Osborn, P. Jones, and D. Lister, 2020: Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Sci. Data, 7, 109, https://doi.org/10.1038/s41597-020-0453-3.
Haylock, M. R., N. Hofstra, A. M. G. Klein Tank, E. J. Klok, P. D. Jones, and M. New, 2008: A European daily high-resolution gridded data set of surface temperature and precipitation for 1950–2006. J. Geophys. Res., 113, D20119, https://doi.org/10.1029/2008JD010201.
Henn, B., A. J. Newman, B. Livneh, C. Daly, and J. D. Lundquist, 2018: An assessment of differences in gridded precipitation datasets in complex terrain. J. Hydrol., 556, 1205–1219, https://doi.org/10.1016/j.jhydrol.2017.03.008.
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570, https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.
Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803.
Hopkinson, R. F., D. W. McKenney, E. J. Milewska, M. F. Hutchinson, P. Papadopol, and L. A. Vincent, 2011: Impact of aligning climatological day on gridding daily maximum–minimum temperature and precipitation over Canada. J. Appl. Meteor. Climatol., 50, 1654–1665, https://doi.org/10.1175/2011JAMC2684.1.
Hou, A. Y., and Coauthors, 2014: The Global Precipitation Measurement mission. Bull. Amer. Meteor. Soc., 95, 701–722, https://doi.org/10.1175/BAMS-D-13-00164.1.
Huffman, G. J., and Coauthors, 2007: The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-global, multiyear, combined-sensor precipitation estimates at fine scales. J. Hydrometeor., 8, 38–55, https://doi.org/10.1175/JHM560.1.
Khedhaouiria, D., S. Bélair, V. Fortin, G. Roy, and F. Lespinas, 2020: High-resolution (2.5 km) ensemble precipitation analysis across Canada. J. Hydrometeor., 21, 2023–2039, https://doi.org/10.1175/JHM-D-19-0282.1.
Kiang, J. E., and Coauthors, 2018: A comparison of methods for streamflow uncertainty estimation. Water Resour. Res., 54, 7149–7176, https://doi.org/10.1029/2018WR022708.
Kidd, C., A. Becker, G. J. Huffman, C. L. Muller, P. Joe, G. Skofronick-Jackson, and D. B. Kirschbaum, 2017: So, how much of the Earth’s surface is covered by rain gauges? Bull. Amer. Meteor. Soc., 98, 69–78, https://doi.org/10.1175/BAMS-D-14-00283.1.
Kirstetter, P.-E., J. J. Gourley, Y. Hong, J. Zhang, S. Moazamigoodarzi, C. Langston, and A. Arthur, 2015: Probabilistic precipitation rate estimates with ground-based radar networks. Water Resour. Res., 51, 1422–1442, https://doi.org/10.1002/2014WR015672.
Kochendorfer, J., and Coauthors, 2018: Testing and development of transfer functions for weighing precipitation gauges in WMO-SPICE. Hydrol. Earth Syst. Sci., 22, 1437–1452, https://doi.org/10.5194/hess-22-1437-2018.
L’Ecuyer, T. S., and Coauthors, 2015: The observed state of the energy budget in the early twenty-first century. J. Climate, 28, 8319–8346, https://doi.org/10.1175/JCLI-D-14-00556.1.
Livneh, B., T. J. Bohn, D. W. Pierce, F. Munoz-Arriola, B. Nijssen, R. Vose, D. R. Cayan, and L. Brekke, 2015: A spatially comprehensive, hydrometeorological data set for Mexico, the U.S., and southern Canada 1950–2013. Sci. Data, 2, 150042, https://doi.org/10.1038/sdata.2015.42.
Longman, R. J., A. J. Newman, T. W. Giambelluca, and M. Lucas, 2020: Characterizing the uncertainty and assessing the value of gap-filled daily rainfall data in Hawaii. J. Appl. Meteor. Climatol., 59, 1261–1276, https://doi.org/10.1175/JAMC-D-20-0007.1.
Ma, Y., and Coauthors, 2018: Performance of optimally merged multisatellite precipitation products using the dynamic Bayesian model averaging scheme over the Tibetan Plateau. J. Geophys. Res. Atmos., 123, 814–834, https://doi.org/10.1002/2017JD026648.
Maggioni, V., P. C. Meyers, and M. D. Robinson, 2016: A review of merged high-resolution satellite precipitation product accuracy during the Tropical Rainfall Measuring Mission (TRMM) era. J. Hydrometeor., 17, 1101–1117, https://doi.org/10.1175/JHM-D-15-0190.1.
Mahfouf, J.-F., B. Brasnett, and S. Gagnon, 2007: A Canadian precipitation analysis (CaPA) project: Description and preliminary results. Atmos.–Ocean, 45, 1–17, https://doi.org/10.3137/ao.v450101.
Matsuura, K., and C. J. Willmott, 2017: Terrestrial air temperature: 1900–2017 gridded monthly time series. University of Delaware, http://climate.geog.udel.edu/~climate/html_pages/Global2017/README.GlobalTsT2017.html.
Mendoza, P. A., A. W. Wood, E. A. Clark, E. Rothwell, M. P. Clark, B. Nijssen, L. D. Brekke, and J. R. Arnold, 2017: An intercomparison of approaches for improving predictability in operational seasonal streamflow forecasting. Hydrol. Earth Syst. Sci., 21, 3915–3935, https://doi.org/10.5194/hess-21-3915-2017.
Menne, M. J., I. Durre, R. S. Vose, B. E. Gleason, and T. G. Houston, 2012: An overview of the Global Historical Climatology Network-Daily database. J. Atmos. Oceanic Technol., 29, 897–910, https://doi.org/10.1175/JTECH-D-11-00103.1.
Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones, 2012: Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 data set. J. Geophys. Res., 117, D08101, https://doi.org/10.1029/2011JD017187.
Newman, A. J., and Coauthors, 2015: Gridded ensemble precipitation and temperature estimates for the contiguous United States. J. Hydrometeor., 16, 2481–2500, https://doi.org/10.1175/JHM-D-15-0026.1.
Newman, A. J., and Coauthors, 2019: Use of daily station observations to produce high-resolution gridded probabilistic precipitation and temperature time series for the Hawaiian Islands. J. Hydrometeor., 20, 509–529, https://doi.org/10.1175/JHM-D-18-0113.1.
Newman, A. J., M. P. Clark, A. W. Wood, and J. R. Arnold, 2020: Probabilistic spatial meteorological estimates for Alaska and the Yukon. J. Geophys. Res. Atmos., 125, e2020JD032696, https://doi.org/10.1029/2020JD032696.
Nguyen, P., A. Thorstensen, S. Sorooshian, K. Hsu, A. Aghakouchak, H. Ashouri, H. Tran, and D. Braithwaite, 2018: Global precipitation trends across spatial scales using satellite observations. Bull. Amer. Meteor. Soc., 99, 689–697, https://doi.org/10.1175/BAMS-D-17-0065.1.
Papalexiou, S. M., 2018: Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency. Adv. Water Resour., 115, 234–252, https://doi.org/10.1016/j.advwatres.2018.02.013.
Papalexiou, S. M., and F. Serinaldi, 2020: Random fields simplified: Preserving marginal distributions, correlations, and intermittency, with applications from rainfall to humidity. Water Resour. Res., 56, e2019WR026331, https://doi.org/10.1029/2019WR026331.
Parker, W. S., 2016: Reanalyses and observations: What’s the difference? Bull. Amer. Meteor. Soc., 97, 1565–1572, https://doi.org/10.1175/BAMS-D-14-00226.1.
Rasmussen, R., and Coauthors, 2012: How well are we measuring snow: The NOAA/FAA/NCAR winter precipitation test bed. Bull. Amer. Meteor. Soc., 93, 811–829, https://doi.org/10.1175/BAMS-D-11-00052.1.
Rodell, M., and Coauthors, 2015: The observed state of the water cycle in the early twenty-first century. J. Climate, 28, 8289–8318, https://doi.org/10.1175/JCLI-D-14-00555.1.
Scaff, L., D. Yang, Y. Li, and E. Mekis, 2015: Inconsistency in precipitation measurements across the Alaska–Yukon border. Cryosphere, 9, 2417–2428, https://doi.org/10.5194/tc-9-2417-2015.
Schneider, U., A. Becker, P. Finger, A. Meyer-Christoffer, M. Ziese, and B. Rudolf, 2013: GPCC’s new land surface precipitation climatology based on quality-controlled in situ data and its role in quantifying the global water cycle. Theor. Appl. Climatol., 115, 15–40, https://doi.org/10.1007/s00704-013-0860-x.
Sheffield, J., G. Goteti, and E. F. Wood, 2006: Development of a 50-year high-resolution global dataset of meteorological forcings for land surface modeling. J. Climate, 19, 3088–3111, https://doi.org/10.1175/JCLI3790.1.
Shen, Y., Z. Hong, Y. Pan, J. Yu, and L. Maguire, 2018: China’s 1 km merged gauge, radar and satellite experimental precipitation dataset. Remote Sens., 10, 264, https://doi.org/10.3390/rs10020264.
Sorooshian, S., K. L. Hsu, X. Gao, H. V. Gupta, B. Imam, and D. Braithwaite, 2000: Evaluation of PERSIANN system satellite-based estimates of tropical rainfall. Bull. Amer. Meteor. Soc., 81, 2035–2046, https://doi.org/10.1175/1520-0477(2000)081<2035:EOPSSE>2.3.CO;2.
Sun, Q., C. Miao, Q. Duan, H. Ashouri, S. Sorooshian, and K.-L. Hsu, 2018: A review of global precipitation data sets: Data sources, estimation, and intercomparisons. Rev. Geophys., 56, 79–107, https://doi.org/10.1002/2017RG000574.
Svoboda, V., P. Máca, M. Hanel, and P. Pech, 2015: Spatial correlation structure of monthly rainfall at a mesoscale region of north-eastern Bohemia. Theor. Appl. Climatol., 121, 359–375, https://doi.org/10.1007/s00704-014-1241-9.
Tang, G., A. Behrangi, Z. Ma, D. Long, and Y. Hong, 2018: Downscaling of ERA-interim temperature in the contiguous United States and its implications for rain–snow partitioning. J. Hydrometeor., 19, 1215–1233, https://doi.org/10.1175/JHM-D-18-0041.1.
Tang, G., M. P. Clark, A. J. Newman, A. W. Wood, S. M. Papalexiou, V. Vionnet, and P. Whitfield, 2020a: SCDNA: A serially complete precipitation and temperature dataset for North America from 1979 to 2018. Earth Syst. Sci. Data, 12, 2381–2409, https://doi.org/10.5194/essd-12-2381-2020.
Tang, G., M. P. Clark, A. J. Newman, A. W. Wood, S. M. Papalexiou, V. Vionnet, and P. Whitfield, 2020b: Have satellite precipitation products improved over last two decades? A comprehensive comparison of GPM IMERG with nine satellite and reanalysis datasets. Remote Sens. Environ., 240, 111697, https://doi.org/10.1016/j.rse.2020.111697.
Tang, G., M. P. Clark, S. M. Papalexiou, A. J. Newman, A. W. Wood, D. Brunet, and P. H. Whitfield, 2021: EMDNA: Ensemble meteorological dataset for North America. Earth Syst. Sci. Data, 13, 3337–3362, https://doi.org/10.5194/essd-13-3337-2021.
Tang, G., A. Behrangi, Z. Ma, D. Long, and Y. Hong, 2021a: SC-Earth: A station-based serially complete Earth dataset from 1950 to 2019. J. Climate, 34, 6493–6511, https://doi.org/10.1175/JCLI-D-21-0067.1.
Tang, G., A. Behrangi, Z. Ma, D. Long, and Y. Hong, 2021b: The use of serially complete station data to improve the temporal continuity of gridded precipitation and temperature estimates. J. Hydrometeor., 22, 1553–1568, https://doi.org/10.1175/JHM-D-20-0313.1.
Tang, G., A. Behrangi, Z. Ma, D. Long, and Y. Hong, 2021c: SC-Earth: A station-based serially complete Earth dataset from 1950 to 2019. J. Climate, 34, 6493–6511, https://doi.org/10.1175/JCLI-D-21-0067.1.
Tetens, O., 1930: Über einige meteorologische Begriffe. Z. Geophys., 6, 297–309.
Trenberth, K. E., L. Smith, T. Qian, A. Dai, and J. Fasullo, 2007: Estimates of the global water budget and its annual cycle using observational and model data. J. Hydrometeor., 8, 758–769, https://doi.org/10.1175/JHM600.1.
Trenberth, K. E., J. Fasullo, and J. Kiehl, 2009: Earth’s global energy budget. Bull. Amer. Meteor. Soc., 90, 311–324, https://doi.org/10.1175/2008BAMS2634.1.
Wang, X. L., H. Xu, B. Qian, Y. Feng, and E. Mekis, 2017: Adjusted daily rainfall and snowfall data for Canada. Atmos.–Ocean, 55, 155–168, https://doi.org/10.1080/07055900.2017.1342163.
Willmott, C. J., and S. M. Robeson, 1995: Climatologically aided interpolation (CAI) of terrestrial air temperature. Int. J. Climatol., 15, 221–229, https://doi.org/10.1002/joc.3370150207.
Xie, P., and A.-Y. Xiong, 2011: A conceptual model for constructing high-resolution gauge-satellite merged precipitation analyses. J. Geophys. Res., 116, D21106, https://doi.org/10.1029/2011JD016118.
Yamazaki, D., and Coauthors, 2017: A high-accuracy map of global terrain elevations. Geophys. Res. Lett., 44, 5844–5853, https://doi.org/10.1002/2017GL072874.
Yang, D., D. Kane, Z. Zhang, D. Legates, and B. Goodison, 2005: Bias corrections of long-term (1973–2004) daily precipitation data over the northern regions. Geophys. Res. Lett., 32, L19501, https://doi.org/10.1029/2005GL024057.
Yatagai, A., M. Maeda, S. Khadgarai, M. Masuda, and P. Xie, 2020: End of the Day (EOD) judgment for daily rain-gauge data. Atmosphere, 11, 772, https://doi.org/10.3390/atmos11080772.
Zhang, Y., Y. Ren, G. Ren, and G. Wang, 2019: Bias correction of gauge data and its effect on precipitation climatology over mainland China. J. Appl. Meteor. Climatol., 58, 2177–2196, https://doi.org/10.1175/JAMC-D-19-0049.1.