We present Multi-Source Weighted-Ensemble Precipitation, version 2 (MSWEP V2), a gridded precipitation P dataset spanning 1979–2017. MSWEP V2 is unique in several aspects: i) full global coverage (all land and oceans); ii) high spatial (0.1°) and temporal (3 hourly) resolution; iii) optimal merging of P estimates based on gauges [WorldClim, Global Historical Climatology Network-Daily (GHCN-D), Global Summary of the Day (GSOD), Global Precipitation Climatology Centre (GPCC), and others], satellites [Climate Prediction Center morphing technique (CMORPH), Gridded Satellite (GridSat), Global Satellite Mapping of Precipitation (GSMaP), and Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) 3B42RT)], and reanalyses [European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-Interim) and Japanese 55-year Reanalysis (JRA-55)]; iv) distributional bias corrections, mainly to improve the P frequency; v) correction of systematic terrestrial P biases using river discharge Q observations from 13,762 stations across the globe; vi) incorporation of daily observations from 76,747 gauges worldwide; and vii) correction for regional differences in gauge reporting times. MSWEP V2 compares substantially better with Stage IV gauge–radar P data than other state-of-the-art P datasets for the United States, demonstrating the effectiveness of the MSWEP V2 methodology. Global comparisons suggest that MSWEP V2 exhibits more realistic spatial patterns in mean, magnitude, and frequency. Long-term mean P estimates for the global, land, and ocean domains based on MSWEP V2 are 955, 781, and 1,025 mm yr−1, respectively. Other P datasets consistently underestimate P amounts in mountainous regions. Using MSWEP V2, P was estimated to occur 15.5%, 12.3%, and 16.9% of the time on average for the global, land, and ocean domains, respectively. MSWEP V2 provides unique opportunities to explore spatiotemporal variations in P, improve our understanding of hydrological processes and their parameterization, and enhance hydrological model performance.
MSWEP V2 is the first fully global precipitation dataset with a 0.1° resolution derived by optimally merging a range of gauge, satellite, and reanalysis estimates.
Precipitation P drives the terrestrial hydrological cycle (Oki and Kanae 2006; Trenberth et al. 2007). It is also among the most difficult meteorological variables to estimate because of its high spatiotemporal heterogeneity (Daly et al. 1994; Adler et al. 2001; Roe 2005; Stephens et al. 2010; Herold et al. 2016; Prein and Gobiet 2017). A plethora of regional, quasi-global, and fully global gridded P datasets have been developed over the past decades (for an overview, see Maggioni et al. 2016; Beck et al. 2017c; Levizzani et al. 2018; Sun et al. 2018; http://ipwg.isac.cnr.it; http://reanalyses.org). These datasets differ in terms of design objective (instantaneous accuracy, temporal homogeneity, record length, or combinations thereof), data source (gauge, ground radar, satellite, analysis, reanalysis, or combinations thereof), spatial resolution (from 0.05° to 2.5°), and temporal resolution (30 min to monthly).
Multi-Source Weighted-Ensemble Precipitation (MSWEP) is a recently released global P dataset with a 3-hourly temporal resolution, covering the period 1979 to the near present (Beck et al. 2017b). The dataset is unique in that it takes advantage of the complementary strengths of gauge-, satellite-, and reanalysis-based data to provide reliable P estimates over the entire globe. Since the release of version 1 (0.25° spatial resolution) in May 2016, MSWEP has been successfully applied at global scales for a variety of purposes, such as modeling soil moisture and evaporation (Martens et al. 2017), estimating plant rooting depth (Yang et al. 2016), water resources reanalysis (Schellekens et al. 2017), and evaluating climatic controls on vegetation (Papagiannopoulou et al. 2017a,b). MSWEP has also been successfully used for several purposes regionally, for example, to analyze diurnal variations in rainfall (Chen and Dirmeyer 2017; Chen et al. 2017), investigate lake dynamics (Satgé et al. 2017; Wang et al. 2017), evaluate root-zone soil moisture patterns (Zohaib et al. 2017), and drive a dynamic ecohydrological model (Liu et al. 2016). In addition, MSWEP has been included in at least four regional P dataset evaluation studies focusing, respectively, on the Amazon (Correa et al. 2017), Chile (Zambrano-Bigiarini et al. 2017), India (Nair and Indu 2017), and the Sahel (Zhang et al. 2017).
Since the release of MSWEP version 1 (MSWEP V1), considerable improvements were implemented, resulting in MSWEP version 2 (MSWEP V2), the focus of the present study. Improvements include i) the introduction of cumulative distribution function (CDF) and P frequency corrections, to account for spurious drizzle, attenuated peaks, and temporal discontinuities evident in version 1 (Nair and Indu 2017; Zhang et al. 2017); ii) increasing the spatial resolution from 0.25° to 0.1° to increase the local relevance of the P estimates (especially important for high-water-yield mountainous regions); iii) the inclusion of ocean areas to enable oceanic studies and terrestrial hydrology studies for coastal areas and small islands; iv) the addition of P estimates derived from Gridded Satellite (GridSat) thermal infrared (IR) imagery (Knapp et al. 2011) for the pre-TRMM era to supplement the reanalysis and gauge data; v) the use of a daily gauge correction scheme that accounts for regional differences in reporting times, to minimize timing mismatches when applying the daily gauge corrections; vi) the use of a large database of daily gauge observations compiled from several sources to replace the 0.5° Climate Prediction Center (CPC) unified dataset (Xie et al. 2007; Chen et al. 2008); and vii) extension of the data record to 2017 (MSWEP V1 finished in 2015).
MSWEP V2 is the first fully global P dataset with a spatial resolution of 0.1° (11 km at the equator), supporting global-scale land-surface modeling at hyperresolution (Wood et al. 2011; Bierkens et al. 2015). Other P datasets with a high spatial resolution (≤0.1°) include Climate Hazards Group Infrared Precipitation with Stations (CHIRPS; 0.05°; Funk et al. 2015b), CPC morphing technique (CMORPH; 0.07°; Joyce et al. 2004), Global Satellite Mapping of Precipitation (GSMaP; 0.1°; Ushio et al. 2009; Mega et al. 2014), Integrated Multisatellite Retrievals for Global Precipitation Measurement (IMERG; 0.1°; Huffman et al. 2014), and Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks–Cloud Classification System (PERSIANN-CCS; 0.04°; Hong et al. 2004). However, these datasets are limited to latitudes ≤60°N/S (≤50°N/S for CHIRPS), do not take advantage of river discharge Q observations for bias correction, and do not incorporate reanalysis-based P estimates (with the arguable exception of CHIRPS, which uses them to temporally disaggregate from 5-day to daily estimates). Additionally, none of these datasets apply P gauge corrections at the daily time scale, with the exception of GSMaP, although it fails to account for differences in gauge reporting times. Moreover, CHIRPS and PERSIANN-CCS do not integrate passive microwave-based P retrievals, and the daily temporal resolution of CHIRPS renders it less suitable in highly dynamic P environments. Finally, with the exception of CHIRPS, these datasets span ≤20 years, which is less optimal to assess long-term hydrological changes/trends (Weatherhead et al. 1998).
Here, we describe the data and methodology underlying MSWEP V2, evaluate the performance of the dataset for the conterminous United States (CONUS), and assess spatiotemporal P patterns globally.
DATA AND METHODS.
MSWEP V2 methodology.
Daily P gauge observations were used for three purposes: i) to determine the merging weights for the six 3-hourly non-gauge-based P datasets incorporated in MSWEP V2 [CMORPH, European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-Interim), GridSat, GSMaP, Japanese 55-year Reanalysis (JRA-55), and Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) 3B42RT; see Table 1 for details on the datasets], ii) to calculate the wet-day biases for the reanalyses (ERA-Interim and JRA-55), and iii) to correct the P estimates near gauge stations. Initially, 117,759 P gauges were compiled from various global and national databases. Extensive quality control was applied, for example, to remove erroneous zeros frequently present in records from the Global Summary of the Day (GSOD) database (https://data.noaa.gov; Fig. 2a). After quality control, a final gauge dataset composed of 76,747 gauges remained (Fig. 2b). See “Gauge data quality control” appendix section for details.
Information about gauge reporting times is crucial to avoid timing mismatches when applying daily gauge corrections but is generally not provided. We developed a procedure to infer reporting times for all gauges based on correlations with four non-gauge-based P datasets (CMORPH, ERA-Interim, GSMaP, and JRA-55). See “Inferring gauge reporting times” appendix section for details.
MSWEP V1 relied entirely on reanalysis and gauge data during the pre-TRMM era (prior to 2000; Beck et al. 2017b). For MSWEP V2, we supplemented the reanalysis and gauge data during the pre-TRMM era with rainfall estimates based on IR data from the GridSat B1 archive (0.07° resolution; Knapp et al. 2011) to improve the P estimates in convection-dominated regions. Rainfall was estimated using a parsimonious CDF-matching approach. See “Rainfall estimation using thermal infrared imagery” in the appendix for details.
- To assess the individual performance of the six non-gauge-based P datasets incorporated in MSWEP V2, we calculated, for each of the 76,747 gauges, Pearson correlation coefficients between 3-day mean gauge and gridded P time series (r3day). In addition, since reanalyses tend to consistently overestimate the P frequency and underestimate the intensity (Zolina et al. 2004; Sun et al. 2006; Lopez 2007; Stephens et al. 2010; Skok et al. 2015; Herold et al. 2016), for ERA-Interim and JRA-55, we calculated the bias in the number of wet days per year, using the gauge observations as reference, according to Akinremi et al. 1999; Haylock et al. 2008; Driouech et al. 2009; Trenberth and Zhang 2018). See “Gauge-based assessment of satellite and reanalysis P datasets” appendix section for details.
Global weight maps were derived for each of the six non-gauge-based P datasets incorporated in MSWEP V2 based on the r3day values calculated in the preceding step. The r3day values were squared to yield the coefficient of determination and subsequently interpolated to yield gap-free global weight maps. Similarly, gap-free global maps of βWD were produced for the reanalyses to correct the P frequency prior to the data merging. See “Global maps of weights and wet-day biases” in the appendix for details.
MSWEP V1 used Climate Hazards Group’s precipitation climatology (CHPclim; 0.05° resolution; Funk et al. 2015a) to determine the long-term mean over the land surface. For MSWEP V2, we used WorldClim (1-km resolution; Fick and Hijmans 2017) because of the better P gauge coverage. Systematic P underestimation over land due to gauge undercatch and orographic effects was corrected similarly to MSWEP V1, by inferring the “true” P using river discharge Q observations. See “Determination of long-term mean P” appendix section for details.
To correct the P frequency of the reanalyses, we subtracted, for each grid cell, a small amount of P calculated using the interpolated βWD values from step 5 (Fig. 1b). In addition, the six non-gauge-based P datasets incorporated in MSWEP V2 were resampled to 0.1° and rescaled to minimize the presence of temporal discontinuities after merging. See “P frequency correction and dataset harmonization” appendix section for details.
Three-hourly reference P distributions were calculated by weighted averaging of the distributions of five non-gauge-based P datasets (CMORPH, ERA-Interim, GSMaP, JRA-55, and TMPA 3B42RT) using the interpolated weight maps from step 5. See “Reference P distributions” in the appendix for details.
The six non-gauge-based P datasets incorporated in MSWEP V2 were merged for every possible P dataset combination by weighted averaging using the interpolated weight maps from step 5. The merged P estimates of each dataset combination were subsequently CDF matched to the reference P distributions derived in step 8, after which we selected, for each 3-hourly time step and 0.1° grid cell, the merged and CDF-corrected p value from the dataset combination with the highest cumulative weight (Figs. 1c and 1d). The CDF matching corrects the spurious drizzle and attenuated peaks and ensures that temporal transitions from one dataset combination to another are largely unnoticeable. See “Merging of satellite and reanalysis P datasets” in the appendix for details.
The 3-hourly merged P estimates were corrected using daily and monthly P gauge observations through a multiplicative approach. For each grid cell, we looped over the five closest gauges and corrected the 3-hourly merged P data at the daily time scale. When applying the daily corrections, we accounted for the gauge reporting times derived in step 2 to reduce temporal mismatches (Figs. 1e and 1f). We subsequently applied monthly gauge corrections using the Global Precipitation Climatology Centre (GPCC) Full Data Reanalysis, version 7 (FDR7), dataset (0.5° resolution; U. Schneider et al. 2014), which incorporates a more extensive collection of gauges, following the same procedure but without accounting for gauge reporting times, to yield the final gauge-corrected MSWEP V2. See “Gauge correction scheme” in the appendix for details.
Evaluation using Stage IV gauge–radar data for the CONUS.
We evaluated the performance of MSWEP V2 and, for the sake of comparison, MSWEP V1, a widely used satellite-based dataset (CMORPH), a widely used reanalysis (ERA-Interim), and a state-of-the-art reanalysis corrected using daily gauge observations [Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2); Table 1]. The evaluation was performed at a 3-hourly temporal and 0.1° spatial resolution for 2002–15. As reference, we used the National Centers for Environmental Prediction (NCEP) Stage IV dataset (Lin and Mitchell 2005), which has a 0.04° spatial and hourly temporal resolution and merges data from 140 radars and ∼5,500 gauges for the CONUS. Stage IV provides high-quality P estimates and has therefore been widely used as a reference for the evaluation of P datasets (e.g., Hong et al. 2006; Habib et al. 2009; AghaKouchak et al. 2011, 2012; Zhang et al. 2018). To reduce systematic biases, the Stage IV dataset was rescaled such that its long-term mean matches that of the Parameter-Elevation Regressions on Independent Slopes Model (PRISM) dataset (Daly et al. 2008) for the evaluation period (2002–15).
As a performance metric, we used the Kling–Gupta efficiency (KGE; Gupta et al. 2009; Kling et al. 2012), an objective performance metric combining correlation, bias, and variability, introduced in Gupta et al. (2009) and modified in Kling et al. (2012). The KGE is calculated as follows:
where the correlation component r is represented by the (Pearson’s) coefficient of correlation, the bias component β by the ratio of estimated and observed means, and the variability component γ by the ratio of the estimated and observed coefficients of variation:
where μ and σ are the distribution mean and standard deviation, respectively, and the subscripts s and o indicate estimate and reference, respectively. Three-hourly accumulations were calculated for the P datasets with a temporal resolution <3 h (CMORPH, MERRA-2, and Stage IV). The P datasets with a spatial resolution >0.1° (MSWEP V1, ERA-Interim, and MERRA-2) were downscaled to 0.1° using nearest neighbor, while the dataset with a spatial resolution <0.1° (CMORPH) was upscaled to 0.1° using bilinear interpolation.
RESULTS AND DISCUSSION.
Gauge reporting times.
For the Global Historical Climatology Network-Daily (GHCN-D) database, we found marked differences in reporting times between neighboring countries (e.g., between Canada and the United States and between Portugal and Spain) and sometimes within countries (e.g., Mexico, Namibia, and South Africa; Fig. 2c), reflecting differences in reporting practices among hydrological and meteorological agencies. Our reporting times correspond well with published times available for Australia (Viney and Bates 2004), Brazil (Liebmann and Allured 2005), the eastern CONUS (DeGaetano 2000), India (Yatagai et al. 2012), the Netherlands (Holleman 2006), and Japan (Yatagai et al. 2012). Although the GSOD gauges represent automated gauges with reporting times officially at around 2400 UTC (Menne et al. 2012), our analysis yielded considerably earlier reporting times averaging at around −12 h (UTC) (except for eastern Australia; Fig. 2d). A potential explanation for this discrepancy could be that satellites represent radiation from an atmospheric column rather than P that has reached the surface. However, Villarini and Krajewski (2007) obtained timing differences ranging from 30 to 90 min for TMPA 3B42 using 5-min rain gauge data for a single 0.25° grid cell in Oklahoma, suggesting that this explanation is insufficient to account for the full 12-h difference. Additionally, the differences are also found in high latitudes (>60°N/S), where the reporting times were inferred using reanalysis data. An alternative, more likely explanation is that the daily GSOD values incorporate a significant portion of P from the previous day. Overall, these results highlight the importance of accounting for reporting times in time-critical applications relying on daily gauge observations.
Gauge-based assessment of satellite and reanalysis datasets.
Figures 3a and 3b present r3day (temporal correlation) values obtained for CMORPH and ERA-Interim, respectively. Since the results were very similar for all satellite datasets (with the exception of GridSat) and for all reanalysis datasets, we only present results for one dataset of each kind. ERA-Interim is most skillful in mid- and high-latitude coastal regions in the path of the prevailing westerlies (notably along the Pacific coast of North America, in southern Chile, and in western Europe; Fig. 3a), whereas CMORPH performs best in moist midlatitude regions with mild winters (e.g., the southeastern United States, eastern South America, and eastern China; Fig. 3b). When we calculate the difference in r3day values between the datasets, a clear picture emerges: CMORPH consistently performs better at low latitudes and ERA-Interim at high latitudes (Fig. 3d). These results underscore the long-recognized but sometimes overlooked complementary P estimation performance of satellites and weather models (e.g., Janowiak 1992; Huffman et al. 1995; Xie and Arkin 1997; Adler et al. 2001; Ebert et al. 2007; Massari et al. 2017). MSWEP is the only P dataset besides CMAP (Xie and Arkin 1997) to exploit this complementary relationship.
Figure 3c presents r3day values for the GridSat IR-based rainfall dataset, which has been produced to complement the gauge and reanalysis data during the pre-TRMM era (“Rainfall estimation using thermal infrared imagery” appendix section). The r3day values for GridSat are consistently lower than those obtained for CMORPH (Fig. 3a), which was expected since cloud-top IR brightness temperatures are only indirectly related to surface rainfall (Adler and Negri 1988; Vicente et al. 1998; Scofield and Kuligowski 2003). Compared to r3day values obtained using the IR-based PERSIANN dataset (Sorooshian et al. 2000) presented in Beck et al. (2017b, their Fig. 3c), the Gridsat-based r3day values are slightly lower in some regions, suggesting there may still be some opportunity for improving the GridSat-based rainfall estimates. We refer to Beck et al. (2017c) for a more comprehensive evaluation of the GridSat rainfall.
Figures 3e and 3f present βWD (bias in the number of wet days per year) values for CMORPH and ERA-Interim, respectively. The results were again similar among satellite datasets and among reanalysis datasets, and therefore, we again present results for only one of each. Globally, CMORPH represents the P frequency substantially better than ERA-Interim. CMORPH slightly overestimates (underestimates) the P frequency at low (high) latitudes (Fig. 3e). Conversely, ERA-Interim strongly overestimates the P frequency across the entire globe (Fig. 3f), because of deficiencies in the parameterization of the processes controlling P generation (Zolina et al. 2004; Sun et al. 2006; Lopez 2007; Stephens et al. 2010; Skok et al. 2015; Herold et al. 2016). These findings highlight the importance of the P frequency corrections implemented in MSWEP V2 (“P frequency correction and dataset harmonization” appendix section). When interpreting these results, it must be kept in mind that point observations from gauges tend to underestimate the number of wet days compared to similar estimates from gridded data from satellites and reanalyses (as the former samples a much smaller area; Osborn and Hulme 1997; Ensor and Robeson 2008).
Global patterns in weights.
Figure 4 shows global maps of the relative weights assigned to the gauge-, satellite-, and reanalysis-based P estimates for three periods: i) 1979–82, ii) 1983–99, and iii) 2000–17. The gauge weights were calculated as a function of distance to surrounding gauges (“Gauge correction scheme” appendix section), whereas the satellite and reanalysis weights were calculated based on the performance of the respective satellite and reanalysis datasets at surrounding gauges (“Global maps of weights and wet-day biases” appendix section). Gauge-based P estimates provide the main contribution over the terrestrial surface for all periods (Fig. 4). GridSat data are introduced in 1980 and represent the only satellite-based source of P estimates until 2000, when passive microwave-based estimates are introduced (CMORPH, GSMaP, and TMPA 3B42RT). Prior to 1982, however, GridSat provides limited coverage over South Asia and particularly Africa, and a horizontal striping pattern can be observed in some regions caused by gaps in the GridSat data (Fig. 4a). In regions without rain gauges, reanalyses provide the dominant contribution over most of the globe until 1999, while from 2000 onward, the dominant contribution comes from satellite data at low and midlatitudes and reanalysis data at high latitudes (Fig. 4).
Evaluation using Stage IV gauge–radar data for the CONUS.
Beck et al. (2017c) evaluated MSWEP V2 and 20 other P datasets globally using observations from 76,086 gauges and hydrological modeling for 9,053 catchments at daily and monthly time steps. However, evaluation at the 3-hourly time step was lacking. We therefore evaluated MSWEP V2 and, for comparison purposes, MSWEP V1, CMORPH, ERA-Interim, and MERRA-2 (details provided in Table 1) at the 3-hourly time step for the CONUS using the Stage IV gauge–radar P dataset (Lin and Mitchell 2005) as a reference. Consistent with the global evaluation by Beck et al. (2017c), MSWEP V2 was found to perform best overall, yielding a median KGE score of 0.70 (Fig. 5a). The second- and third-best performing P datasets were MSWEP V1 (Beck et al. 2017b; Fig. 5e) and MERRA-2 (Reichle et al. 2017; Fig. 5q), exhibiting median KGE scores of 0.53 and of 0.41, respectively. Similar to MSWEP V2, MSWEP V1 and MERRA-2 include daily gauge corrections (based on the CPC unified dataset; Xie et al. 2007; Chen et al. 2008). However, in contrast to MSWEP V2, they did not account for gauge reporting times (“Gauge reporting times” section), which has resulted in temporal mismatches when applying the corrections (Figs. 1e and 1f). CMORPH (Joyce et al. 2004; Fig. 5i) and ERA-Interim (Dee et al. 2011; Fig. 5m) obtained lower median KGE scores of 0.36 and 0.35, respectively. Performance was markedly worse for all datasets in the western CONUS, because of the more complex topography and greater spatiotemporal heterogeneity of P (Daly et al. 2008).
Global patterns in long-term mean P.
Figure 6a presents a global map of long-term mean P from MSWEP V2 (“Determination of long-term mean P” appendix section). Figures 6b–f, respectively, present the difference in long-term mean P between MSWEP V2 and five other P datasets (Table 1): i) MSWEP V1 (1979–2015; 3 hourly; 0.25°; Beck et al. 2017b), ii) GPCC, version 2015 (GPCC2015; 1951–2000; monthly; 0.5°; U. Schneider et al. 2014, 2017), iii) Global Precipitation Climatology Project, version 2.3 (GPCP2.3; 1979–2013; monthly; 2.5°; Adler et al. 2003, 2017, 2018), iv) Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data, version 3.2 (HOAPS3.2; 1987–2008; 0.5°; 6 hourly; Schlosser and Houser 2007; Andersson et al. 2010), and v) MERRA-2 (1980–2017; ∼50 km; hourly; Reichle et al. 2017). The differences between MSWEP V1 and MSWEP V2 (Fig. 6b) primarily reflect the change from CHPclim to WorldClim in version 2. Compared to MSWEP V2, the fully gauge-based GPCC2015 dataset shows consistently lower mean P at high northern latitudes (Fig. 6c), whereas the gauge- and satellite-based GPCP2.3 dataset exhibits lower mean P only in northern North America and northeastern Asia but generally higher mean P in Europe and northwestern Asia (Fig. 6d). These differences probably reflect the use of different gauge undercatch correction schemes; GPCC2015 (Legates and Willmott 1990) and GPCP2.3 (Legates 1988) employ more conventional approaches using World Meteorological Organization (WMO) gauge undercatch correction equations in combination with daily observations of P, Ta, and wind speed from a relatively sparse station network. Conversely, MSWEP V2 infers the “true” P using Q observations and Pe estimates from 13,762 catchments globally (Beck et al. 2017b). The gauge- and reanalysis-based MERRA-2 dataset exhibits good agreement with MSWEP V2 at high latitudes but shows substantially lower P over tropical regions (except in Africa; Fig. 6f). Compared to MSWEP V2, the other P datasets (GPCC2015, GPCP2.3, and MERRA-2) exhibit substantially less P at high elevations (e.g., in the Rocky Mountains, the southern Andes, and most Asian mountainous regions; Figs. 6c, 6d, and 6f, respectively). This is attributable to their coarser resolutions (0.5°, 2.5°, and 0.5°, respectively) and lack of explicit orographic corrections. The differences between MSWEP V2 and the other P datasets over the equatorial oceans are probably at least partly because MSWEP V2 computes the long-term mean using satellite data from 2000 to 2017, during which the meridional location of the maximum intertropical convergence zone (ITCZ) convection was more northerly (T. Schneider et al. 2014).
Using MSWEP V2, we obtained a long-term mean global P estimate of 955 mm yr−1 (Table 2) or 488,100 km3 yr−1. This estimate is based on terrestrial P data representative of 1970–2000 (i.e., the range of the WorldClim gauges; Fick and Hijmans 2017) and oceanic P data representative of 1979–2017 (i.e., the range of the satellite and reanalysis datasets). The long-term mean P of MSWEP V2 over land (excluding Antarctica) is 839 mm yr−1, corresponding to 113,100 km3 yr−1. The same estimate for MSWEP V1 is 858 mm yr−1, slightly (2.3%) higher because of the switch from CHPclim to WorldClim and the reduction of the Chilean and Iranian bias correction factors in version 2 (“Determination of long-term mean P” in the appendix). The estimate for GPCP2.3 is 853 mm yr−1, also slightly (1.7%) higher than the MSWEP V2 estimate. For GPCC2015, the corresponding estimate is 793 mm yr−1, which is considerably (5.5%) lower for the reasons previously explained. The estimate for MERRA-2 is 785 mm yr−1, also considerably (6.4%) lower than the MSWEP V2 estimate, mainly because of the aforementioned differences in tropical and mountainous regions. The long-term mean P for ocean areas based on MSWEP V2 amounted to 1,025 mm yr−1 (Table 2), corresponding to 373,200 km3 yr−1. Arguably, the most comprehensive P datasets with ocean coverage currently available are the satellite-based GPCP2.3 and HOAPS3.2 datasets. Compared to our estimate, GPCP2.3 yields a 3.1% higher estimate of 1,057 mm yr−1 (Fig. 6d). Over the area for which HOAPS3.2 has continuous data (coastal areas are missing, and there are seasonal gaps at high latitudes), the dataset yields a 2.9% lower long-term mean P than MSWEP V2 (1,037 vs 1,068 mm yr−1; Fig. 6e). Another estimate of 1,074 mm yr−1 for the entire ocean area that was derived from satellite radar reflectivities (2007–09) by the TRMM and CloudSat instruments (Behrangi et al. 2014) is 4.8% higher than our estimate. In summary, our P estimate is close to the average of previous estimates (Table 2).
Global patterns in P extremes.
Figure 7 presents global maps of 99.99th-percentile 3-hourly P amounts (equivalent to a return period of 3.42 years) for MSWEP V2 and, for illustrative purposes, CMORPH and ERA-Interim. CMORPH agrees well with MSWEP V2 in the tropics but appears to overestimate the 99.99th-percentile P with respect to MSWEP V2 in some midlatitude regions (e.g., in the central CONUS and in Argentina). Indeed, Beck et al. (2017c) recently found CMORPH to overestimate the 99th-percentile daily P magnitude in precisely these regions, and Tian et al. (2009) found CMORPH to overestimate summer P extremes strongly in the CONUS. As expected, ERA-Interim fails to resolve small-scale orographic features because of its coarse (∼0.7°) resolution and consistently estimates lower 99.99th-percentile P amounts because of the model parameterization challenges mentioned. Compared to a global map (1°) of 99th-percentile daily P amounts (equivalent to a return period of 100 days) derived from the Expert Team on Climate Change Detection and Indices (ETCCDI) P dataset (Dietzsch et al. 2017, their Fig. 5d), our 99.99th-percentile 3-hourly P map (Fig. 7a) exhibits more plausible patterns. Most importantly, Dietzsch et al.’s (2017) map fails to represent small-scale P variations, mainly because of its coarse resolution, and shows unrealistically low values over land compared to the oceans, reflecting the use of different P data sources for land and ocean areas (the gauge-based GPCC and satellite-based HOAPS datasets, respectively). The presence of slight discontinuities in the MSWEP V2 map at approximately 50°S (Fig. 7a) suggests that there are still inhomogeneities among the incorporated datasets, despite the frequency correction and harmonization applied. The higher 99.99th-percentile amounts near gauge locations (most noticeable in the Amazon in Fig. 7a) reflect the loss of variance between P gauges due to interpolation (Hutchinson 1998; Haberlandt 2007).
Global patterns in P occurrence.
Figure 8 presents global maps of the percentage of time without P for MSWEP V2, CMORPH, and ERA-Interim. CMORPH agrees fairly well overall with MSWEP V2 in the tropics, although it exhibits less frequent P at mid- and high latitudes (notably in southern Chile and along the Pacific coast of North America), in agreement with our P gauge-based assessment (Fig. 3e). This reflects the inability of current-generation satellites to detect P signals at high latitudes (Ebert et al. 2007; Tian et al. 2009; Tian and Peters-Lidard 2010; Behrangi et al. 2012; Massari et al. 2017; Beck et al. 2017c). Also in agreement with our P gauge-based assessment (Fig. 3f), ERA-Interim severely overestimates the P frequency across the entire globe. Our P frequency map (Fig. 8a) visually compares well with an equivalent map for the land surface derived from gauge observations during the period 1840 to 2001 produced by Sun et al. (2006, their Fig. 1). Additionally, our map agrees closely with ocean maps based on CloudSat data from 2006 to 2007 (Ellis et al. 2009, their Fig. 3a) and CloudSat, TRMM, and Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E) data from 2007 to 2009 (Behrangi et al. 2014, their Fig. 1a). We did, however, obtain a somewhat higher P frequency over the Southern Ocean, possibly because of uncertainties in the P frequency corrections caused by the near-complete absence of gauges south of 60°S (Fig. 2b).
Trenberth and Zhang (2018) examined how often it rains (or snows) worldwide for latitudes ≤60°N/S, using a gauge-corrected version of CMORPH (hourly, 0.25° resolution) and a 0.02 mm h–1 threshold, and found that P occurs 11.0% of the time on average (8.2% over land and 12.1% over oceans). Using 3-hourly accumulations and a 0.06 mm (3 h)−1 threshold (triple the hourly threshold), they found that P occurs 13.8% of the time on average (10.7% over land and 15.0% over oceans). The averages calculated using 3-hourly data are thus ∼25% higher than the ones calculated using hourly data. Based on MSWEP V2 (3-hourly, 0.1° resolution), using the same 0.06 mm (3 h)−1 threshold, we found that P occurs 15.0% of the time on average (11.5% over land and 16.2% over oceans) for the same region (≤60° latitude). Therefore, our estimates are similar to but slightly (∼9%) higher than those of Trenberth and Zhang (2018) but possibly more accurate given that the corrected CMORPH exhibits difficulties in detecting northern P (Beck et al. 2017c, their Fig. 2b). For the entire globe, based on MSWEP V2, P occurs 15.5% of the time on average, while P occurs 12.3% of the time over the land surface (excluding Antarctica) and 16.9% of the time over ocean areas. All estimates should, however, be interpreted with some caution because of the detection limits of satellite sensors (∼0.8 mm h−1 over land and ∼0.02 mm h−1 over ocean; Wolff and Fisher 2008) and rain gauges (∼0.25 mm; Kuligowski 1997), as well as the scale discrepancy between point observations from rain gauges and gridded data from satellites and reanalyses (Osborn and Hulme 1997; Ensor and Robeson 2008).
Trends in mean annual P.
Figure 9 presents global maps of the linear trend in mean annual P for MSWEP V2 and MSWEP V1; CHIRPS, version 2.0 (CHIRPS2.0); Climate Prediction Center (CPC) Merged Analysis of Precipitation, version 1707 (CMAP1707); GPCC FDR7; GPCP2.3; and HOAPS3.2 (details in Table 1). The trends were estimated at each grid cell using simple linear regression (Kenney and Keeping 1962). Over land, the datasets exhibit good agreement overall (with the exception of CMAP1707 and MERRA-2), which was expected since all datasets use similar gauge data sources. MERRA-2 exhibits suspect trend patterns over tropical land areas (Fig. 9h), which could be related to the bias adjustment using CMAP and CPC unified (Reichle et al. 2017). The small differences in trends between MSWEP V1 and MSWEP V2 (e.g., over the Amazon and the southwest Indian Ocean islands; Figs. 9a and 9b) are attributable to the new daily gauge data (“Gauge correction scheme” in the appendix). The correspondence in trends is considerably less over the oceans, presumably because of the lack of gauge observations to constrain uncertainty (Fig. 2b). CMAP1707 (and MERRA-2, which has been bias adjusted using CMAP over the oceans) generally tends more toward negative trends (Fig. 9d), which Yin et al. (2004) attributed to discontinuities caused by changes in gauge coverage and satellite input data. HOAPS3.2 exhibits a substantially noisier trend pattern and more pronounced trends overall (Fig. 9g), which are both likely attributable to its shorter data record. In the Southern Ocean, HOAPS3.2 not only shows P underestimation (Fig. 6e) but also a spurious upward trend, as reported in previous studies (Romanova et al. 2010; Liu et al. 2011). We refer to Adler et al. (2017) and Schneider et al. (2017) for a more comprehensive overview of the current state of knowledge with respect to trends in P worldwide.
Any P trend estimates should, however, be interpreted with caution because of the potential presence of temporal inhomogeneities. For gauge data, inhomogeneities tend to be caused by measurement errors and changes in station coverage (Sevruk et al. 2009; U. Schneider et al. 2014); for satellite estimates, by instrument changes, sensor degradation, and algorithm changes (Kummerow et al. 1998; Biswas et al. 2013); and for reanalyses, by production stream transitions and changes in the observing systems (Dee et al. 2011; Trenberth et al. 2011; Kang and Ahn 2015; Kobayashi et al. 2015). Additionally, agreement in trends among different P datasets does not necessarily imply less uncertainty because the input data may be the same. MSWEP V2 trends are likely subject to much less uncertainty after the year 2000, because of the relative stability of the observing systems and the addition of multiple passive microwave-based P datasets. Beck et al. (2017c) recently evaluated 22 P datasets using observations from 76,086 gauges worldwide covering 2000–16 and found that MSWEP V2 exhibits more reliable trends overall than MSWEP V1 as well as other P datasets.
We presented MSWEP V2, a gridded P dataset spanning 1979–2017, which has several unique aspects: i) fully global coverage including all land and oceans (most satellite-based datasets are limited to 50° or 60° latitude); ii) high spatial (0.1°) and temporal (3 hourly) resolution, increasing the local relevance of the P estimates; iii) optimal merging of a wide range of gauge, satellite, and reanalysis P datasets, to obtain the best possible P estimates at any location; iv) correction for distributional biases, to eliminate spurious drizzle and restore attenuated peaks; v) correction of systematic terrestrial P biases due to gauge undercatch using observed Q from 13,762 catchments worldwide; vi) corrections using daily observations from 76,747 gauges across the globe; and vii) a gauge correction scheme that accounts for gauge reporting times. The main findings are as follows:
There are marked differences in reporting times between neighboring countries and sometimes within countries. Contrary to expectations, the automated GSOD gauges exhibited reporting times averaging at around 1200 UTC rather than at 2400 UTC. These findings underscore the importance of accounting for reporting times when applying daily gauge corrections.
The gauge-based assessment of the satellite and reanalysis P datasets revealed that the reanalyses strongly overestimate the P frequency across the globe. Confirming previous studies, we found that reanalyses exhibit lower skill than the satellite estimates in the (sub)tropics, whereas the opposite was the case at high latitudes. MSWEP is the only high-resolution P dataset to date that exploits this complementary relationship.
For the CONUS, we evaluated MSWEP V2 and four other P datasets at a 3-hourly time scale using Stage IV gauge–radar P data as reference. MSWEP V2 provided the best overall performance, followed in order by MSWEP V1, MERRA-2, CMORPH, and ERA-Interim. These results confirm the effectiveness of the MSWEP V2 methodology.
Long-term mean P estimates for the global, land, and ocean domains based on MSWEP V2 are 955, 781, and 1,025 mm yr−1, respectively. This is in close agreement with previously published estimates, yet importantly for hydrological applications, other datasets appear to consistently underestimate P amounts in mountainous regions because of a lack of orographic corrections and coarser spatial resolutions.
Compared to other state-of-the-art P datasets, MSWEP V2 shows more plausible spatial patterns in mean, magnitude, and frequency. Using MSWEP V2, we estimated that P occurs 15.5%, 12.3%, and 16.9% of the time on average for the global, land, and ocean domains, respectively, slightly more frequent than previous estimates based on CMORPH.
Trends in 1979–2017 mean annual P among state-of-the-art P datasets generally agree over land, at least partly because of the use of common input datasets. Over oceans, the agreement is considerable less, possibly reflecting the lack of marine gauge observations.
MSWEP V2 is available online (via www.gloh2o.org). We thank the four anonymous BAMS reviewers and the BAMS editorial team for constructively critical comments. We gratefully acknowledge the P dataset developers for producing and making available their datasets. The Water Center for Arid and Semi-Arid Zones in Latin America and the Caribbean (CAZALAC) and the Centro de Ciencia del Clima y la Resiliencia (CR) 2 (FONDAP 15110009) are thanked for sharing the Mexican and Chilean gauge data, respectively. We also acknowledge the gauge data providers in the Latin American Climate Assessment and Dataset (LACA&D) project: IDEAM (Colombia), INAMEH (Venezuela), INAMHI (Ecuador), SENAMHI (Peru), SENAMHI (Bolivia), and DMC (Chile). We further wish to thank Ali Alijanian, Koen Verbist, and Piyush Jain for providing additional gauge data and Francesco Lin, Kevin Trenberth, Mauricio Zambrano Bigiarini, Noemi Vergopolan, Vincenzo Levizzani, and Yuting Yang for comments and suggestions on earlier versions of the manuscript. Hylke E. Beck was supported by the U.S. Army Corps of Engineers’ International Center for Integrated Water Resources Management (ICIWaRM), under the auspices of UNESCO. Albert I. J. M. van Dijk was supported under Australian Research Council’s Discovery Projects funding scheme (Project DP140103679). Diego G. Miralles acknowledges support from the European Research Council (ERC) under Grant Agreement 715254 (DRY-2-DRY).
APPENDIX: DETAILED DESCRIPTION OF THE MSWEP V2 METHODOLOGY.
Here, we describe in detail the different processing steps involved in the production of MSWEP V2 (Fig. 1a).
Gauge data quality control.
Daily gauge observations were used to determine the merging weights and wet-day biases for the individual P datasets (“Global maps of weights and wet-day biases” appendix section) and to improve P estimates near gauge stations (“Gauge correction scheme” appendix section). Our initial database comprises 117,759 gauges worldwide compiled from the GHCN-D database (Menne et al. 2012), the GSOD database (https://data.noaa.gov), the Latin American Climate Assessment and Dataset (LACA&D) database (http://lacad.ciifen.org/), the Chile Climate Data Library (www.climatedatalibrary.cl), and national databases for Mexico, Brazil, Peru, and Iran.
Gauge data can have considerable measurement errors, and therefore, quality control is important (Goodison et al. 1998; Viney and Bates 2004; Sevruk et al. 2009; U. Schneider et al. 2014). For example, GSOD records frequently contain long series of erroneous zero rainfall (Durre et al. 2010; Funk et al. 2015b). To identify and discard these periods, we developed an automated procedure entailing the following steps: i) for each month, we computed the fraction of days without P (fD); ii) we excluded months without any P (fD = 1) and computed the distribution mean μ and standard deviation σ; iii) if the CDF of the normal distribution with μ and σ evaluated at fD = 0.9 exceeds 0.85, the gauge was considered to be sufficiently “wet” for detecting the erroneous zeros, and we proceeded to the next step; iv) a year was marked as erroneous if the median of the 12 monthly fD values exceeded 0.9; and v) the 6 months preceding and following each erroneous year were also marked as erroneous. Figure 2a illustrates the procedure for an arbitrarily selected GSOD gauge with the described issue.
Additionally, we eliminated all days with P > 2,000 mm (approximately the maximum recorded 24-h rainfall; Cerveny et al. 2007) and discarded gauges with record length <4 years during 1979–2017. From the remaining set of 81,047 gauges, we also discarded those matching one or more of the following criteria (percentage of remaining gauges satisfying the criteria reported between parentheses): i) 3-day Pearson correlation coefficient r3day with all five non-gauge-based P datasets (CMORPH, ERA-Interim, GSMaP, JRA-55, and TMPA 3B42RT; Table 1) <0.4 and r3day with the nearest gauge also <0.4 (1.01%); ii) more than half of the 3-day intervals contain missing values (1.62%); iii) fewer than 15 unique values in the entire record (1.02%); iv) the highest and/or second highest values were present >3 times in the record, indicative of truncated peaks (0.60%); and v) >99.5% of the record is dry (<0.5 mm day−1; 3.05%). In total, 4,300 (5.31%) of the remaining gauges fulfilled one or more of these criteria and hence were discarded; the resultant dataset comprised 76,747 gauges (Fig. 2b).
Inferring gauge reporting times.
Information about gauge reporting times is crucial to avoid timing mismatches when applying daily gauge corrections but is generally not provided. We developed a procedure to infer gauge reporting times using four gridded 3-hourly non-gauge-based P datasets (CMORPH, ERA-Interim, GSMaP, and JRA-55; Table 1). Specifically, we calculated, for each gauge, Spearman rank correlation coefficients ρ between daily grid- and gauge-based time series, with the grid-based time series shifted by offsets of −36, −33, −30, …, +30, +33, and +36 h, resulting in 4 × 25 = 100 ρ values for each gauge. The dataset and temporal-offset combination yielding the highest ρ value was subsequently taken to reflect the UTC boundary of the 24-h accumulation period for the gauge under consideration. It should be kept in mind, however, that the inferred estimates are subject to a rounding error of at most 1.5 h and on average 45 min because of the 3-hourly temporal resolution of the P datasets. In addition, the estimates are affected by the fact that satellites represent radiation from an atmospheric column, whereas gauges represent P that has reached the surface (Villarini and Krajewski 2007). Furthermore, the approach relies on the assumption of a temporally constant reporting time, which may not be true for every gauge (Viney and Bates 2004).
Rainfall estimation using thermal infrared imagery.
MSWEP V1 relied exclusively on reanalysis and gauge data during the pre-TRMM era (before 1998; Beck et al. 2017b). For MSWEP V2, we supplemented the reanalysis and gauge data with rainfall estimates based on cloud-top IR temperatures during the pre-TRMM era to improve the accuracy in convection-dominated regions. Although several IR-based rainfall datasets already exist [e.g., CHIRP, Hydro-Estimator, PERSIANN, PERSIANN-CCS, PERSIANN–Climate Data Record (PERSIANN-CDR), and Tropical Applications of Meteorology using Satellite Data and Ground-Based Observations (TAMSAT)], none of these meet all of our requirements: i) quasi-global coverage over land and ocean, ii) temporal coverage from the 1980s to the near present, iii) spatial resolution ≤0.1°, iv) temporal resolution ≤3 h, and v) no gauge corrections. We therefore produced a new 3-hourly 0.1° rainfall dataset based on the GridSat B1 IR archive (V02R01; 3-hourly, 0.07° resolution; 1980 to the near present) containing IR imagery from various intercalibrated geostationary satellites (Knapp et al. 2011).
Although the GridSat archive has already had some quality control applied, we still observed numerous navigation, calibration, and masking errors (particularly prior to 1983). To ensure that the data were robust, several additional quality control steps were applied. First, all grid cells with values <173 K (the record minimum; Ebert and Holland 1992) were assumed to be erroneous and discarded. Additionally, if the percentage of grid cells with temperature <173 K exceeded 1%, the entire image was discarded. Furthermore, if the spatial (Pearson) correlation between the current image and the previous image (both resampled to 1° using bilinear interpolation) was <0.75, both images were discarded. Finally, assuming that sudden isolated changes in the record are indicative of errors, images were discarded if the global mean deviated >3 K from the 24-h running global mean. Note that prior to 1998, there are extensive periods of missing data because of a poorer spatial coverage.
IR data can be used to estimate rainfall in several ways (Scofield and Kuligowski 2003; Stephens and Kummerow 2007; Michaelides et al. 2009; Kidd and Levizzani 2011). Hydro-Estimator, for example, employs an empirical equation calibrated using ground radar data to obtain an initial rain-rate estimate, which is subsequently corrected using precipitable water and relative humidity outputs from an atmospheric analysis model (Scofield and Kuligowski 2003). Conversely, CHIRP employs cold-cloud duration (CCD) values derived from IR data using a fixed 235-K threshold to estimate 5-day rain rates, where the CCD–rain relationship is established by linear regression against TMPA 3B42 data (Funk et al. 2015b). Similarly, the African TAMSAT dataset uses IR-based CCD values to estimate 10-day rainfall but uses gauge observations to determine the regression parameters and temperature thresholds (Tarnavsky et al. 2014). CCD-based methods are, however, unsuitable for our purposes as they would require IR data with a temporal resolution <3 h to derive 3-hourly CCD values. PERSIANN-CCS employs a more elaborate method using artificial neural networks and IR data patterns to distinguish between cloud types, which are subsequently related to specific rainfall intensities (Ashouri et al. 2015).
Here, we used a parsimonious method entailing the following steps: i) resampling the GridSat IR data to 0.1° using bilinear interpolation; ii) rejecting IR data when daily mean Ta is <5°C, due to the difficulty of detecting P signals in cold conditions (Kidd and Levizzani 2011; Beck et al. 2017b); iii) reversing the sign of the values, since lower IR radiances correspond to higher rainfall intensities (Adler and Negri 1988); and iv) converting the values to rain rates by CDF matching against the warm-period reference P distribution produced in the “Reference P distributions” appendix section. Our approach bears some resemblance to that of Karbalaee et al. (2017), who CDF matched the IR-based PERSIANN-CCS rainfall dataset to a passive microwave-based reference. The method used here may not perform well in regions with a marked temporal variability in storm type and, correspondingly, in the relationship between IR radiance and rainfall. Any such deficiencies would be reflected in low weights in the merging process (“Global maps of weights and wet-day biases” appendix section).
Gauge-based assessment of satellite and reanalysis P datasets.
MSWEP V2 incorporates six gridded non-gauge-based P datasets (CMORPH, ERA-Interim, GridSat, GSMaP, JRA-55, and TMPA 3B42RT; Table 1). To assess the individual performance of these datasets, we calculated, for each P gauge, Pearson correlation coefficients between 3-day mean gauge- and grid-based P time series (r3day) for 2000–17 (the start date is limited by the GSMaP and TMPA 3B42RT datasets). To minimize timing mismatches between the gauge- and grid-based time series, prior to calculating the r3day values, the records of gauges with reporting times >+12 h UTC were shifted backward by −1 day, while the records of gauges with reporting times <−12 h UTC were shifted forward by +1 day (“Inferring gauge reporting times” appendix section). The use of 3-day rather than daily averages has two benefits: first, it minimizes the impact of any remaining temporal mismatches in the 24-h accumulation period between the gridded datasets and the gauges, and second, it reduces the influence of days with potentially erroneous gauge measurements. The r3day values were calculated for the full period of contemporaneous gauge- and grid-based data, as well as for “cold” and “warm” conditions, distinguished using a daily mean air temperature Ta threshold of 5°C. MSWEP V1 employed a 1°C threshold, which we increased in version 2 to further reduce the likelihood of incorporating potentially unreliable satellite data. For Ta, we used ERA-Interim (Dee et al. 2011) downscaled to 0.1° using nearest-neighbor resampling and offset to match the long-term mean of the high-resolution, station-based WorldClim, version 2.0 (WorldClim2.0), dataset (Fick and Hijmans 2017). We only calculated an r3day value if >1 year of simultaneous gauge and gridded 3-day means were available. The r3day values range from −1 to 1, with higher values corresponding to better performance.
Reanalyses tend to overestimate the P frequency and underestimate the intensity because of deficiencies in the parameterization of the physical processes controlling P generation (Zolina et al. 2004; Sun et al. 2006; Lopez 2007; Stephens et al. 2010; Skok et al. 2015; Herold et al. 2016). To quantify and correct for this, we calculated the bias in the number of wet days per year, using the P gauge observations as reference, according to Eq. (1). Wet days were identified using a 0.5 mm day−1 threshold, similar to several previous studies (e.g., Akinremi et al. 1999; Haylock et al. 2008; Driouech et al. 2009; Trenberth and Zhang 2018). The βWD values range from 0 to ∞, with values closer to unity corresponding to better performance.
Global maps of weights and wet-day biases.
Global weight maps were derived for the entire period and for warm and cold conditions for each of the non-gauge-based satellite and reanalysis P datasets (Table 1) from the gauge-based r3day values (“Gauge-based assessment of satellite and reanalysis P datasets” appendix section). The r3day values were truncated at zero, squared to yield the coefficient of determination, and subsequently interpolated to yield gap-free global weight maps by calculating, for each 0.1° grid cell, the median of the 10 nearest gauges. The cold-condition weights were set to zero for the satellite datasets. Similarly, gap-free global maps of βWD were produced for the reanalyses to correct the P frequency prior to the merging.
Because of a lack of gauges over ocean areas, the use of the 10 nearest gauges in the interpolation frequently resulted in strong discontinuities in the middle of oceans as a result of contrasting values on opposite sides of the oceans. To eliminate these discontinuities, we applied an exponential smoothing kernel with a bandwidth of 1,000 km over the ocean areas of the interpolated weight and βWD maps.
Determination of long-term mean P.
The long-term mean P over the land surface was determined in version 2 using the WorldClim dataset (1-km resolution; version 2.0; Fick and Hijmans 2017) rather than the CHPclim dataset (0.05° resolution; Funk et al. 2015a). We switched from CHPclim to WorldClim because of the better gauge coverage in South America, Scandinavia, India, Australia, and New Zealand. Systematic P underestimation over land due to gauge undercatch and orographic effects (Kauffeldt et al. 2013; Beck et al. 2015, 2017a; Prein and Gobiet 2017) was corrected similarly to MSWEP V1 by inferring catchment-average P using the Zhang et al. (2001) relationship in combination with river discharge Q observations and potential evaporation Ep estimates (Beck et al. 2017b). However, for MSWEP V2, the correction factors inferred for Chilean and Iranian catchments were set to 1 prior to the interpolation, because of suspected issues with the observed Q data.
The long-term mean P over the oceans was estimated by weighting the long-term means of five satellite and reanalysis datasets (CMORPH, ERA-Interim, GSMaP, JRA-55, and TMPA 3B42RT; Table 1). The weights for the satellite datasets ws were set to 1 for latitudes <20° and 0 for latitudes >40°, decreasing linearly from 1 at 20° to 0 at 40°. The weights for the reanalyses wr were set to 1 − ws. Thus, wr was set to 0 at latitudes <20°, due to the tendency of reanalyses to overestimate tropical P amounts (Trenberth et al. 2011; Kang and Ahn 2015).
P frequency correction and dataset harmonization.
The following three steps were implemented to reduce the P frequency of the two reanalyses and harmonize the six non-gauge-based P datasets incorporated in MSWEP V2 (CMORPH, ERA-Interim, GridSat, GSMaP, JRA-55, and TMPA 3B42RT; Table 1):
The datasets with spatial resolutions higher or lower than 0.1° (CMORPH, ERA-Interim, JRA-55, and TMPA 3B42RT) were resampled to 0.1 using nearest-neighbor resampling, and 3-hourly means were calculated for the datasets with temporal resolutions <3 h (CMORPH and GSMaP).
The Water and Global Change (WATCH; Weedon et al. 2011) and WATCH Forcing Data ERA-Interim (WFDEI; Weedon et al. 2014) datasets [derived respectively from the 40-yr ECMWF Re-Analysis (ERA-40) and ERA-Interim] were corrected for overestimations in P frequency by progressively removing the smallest events until the P frequency matched that of the gauge-based CRU dataset. However, this approach results in P distributions with a lack of light P events. We therefore employed an alternative approach to correct the P frequency of the reanalyses (ERA-Interim and JRA-55). First, for grid cells with interpolated βWD values >1, we calculated the “correct” annual number of wet days (WDobjective ) according to WDobjective = WDgridded /βWD, where WDgridded was calculated from daily accumulations and βWD represents the interpolated value (“Global maps of weights and wet-day biases” appendix section). Next, we iteratively carried out the following steps: i) subtract d mm (3 h)−1 from the original 3-hourly time series, starting with d = 0.01 mm (3 h)−1; ii) truncate the resulting values to zero and rescale them to restore the original long-term mean; iii) calculate the annual number of wet days from daily accumulations (WDnew); and iv) return to step i, increasing d in 0.01 mm (3 h)−1 increments, until WDnew ≤ WDobjective. Figure 1b illustrates the procedure for ERA-Interim.
The reanalysis datasets, which are valid for the entire period, and the satellite datasets, which are only valid for warm conditions, were rescaled to minimize the presence of spurious temporal discontinuities after merging. For this purpose, we first rescaled the reanalyses to match the long-term P estimates derived in the “Determination of long-term mean P” appendix section. Next, means were calculated for the entire period and for warm and cold conditions based on the rescaled reanalyses, using the full-period weight maps derived in the “Global maps of weights and wet-day biases” appendix section. Finally, the satellite datasets were rescaled to match the rescaled warm-condition reanalysis mean.
Reference P distributions.
In MSWEP V2, the 3-hourly merged satellite and reanalysis P estimates were CDF matched to reference P distributions (Fig. 1) to correct the spurious drizzle and attenuated peaks evident in version 1 (Nair and Indu 2017; Zhang et al. 2017). Two separate 3-hourly reference distributions (0.1° resolution) were calculated: one representing warm conditions and one representing cold conditions (as before distinguished using a daily mean Ta threshold of 5°C). The reference distribution for warm conditions was calculated by weighted-median averaging of the distributions of five satellite and reanalysis P datasets (CMORPH, ERA-Interim, GSMaP, JRA-55, and TMPA 3B42RT; Table 1). The GridSat dataset was excluded because it does not represent an independent estimate, being derived using the reference distributions (“Rainfall estimation using thermal infrared imagery” appendix section). For cold conditions, the reference distribution was calculated by weighted-mean averaging of only the two reanalysis P datasets (ERA-Interim and JRA-55). Prior to the averaging, the P frequency of the reanalyses was corrected and the datasets were homogenized as described in the previous section. We only used data observed since 2000 to derive the reference distributions for two reasons: i) to avoid inconsistencies between the warm- and cold-condition reference distributions due to the much longer temporal coverage of the reanalyses and ii) because satellite data prior to 2000 are subject to more uncertainty (Xie et al. 2017).
Merging of satellite and reanalysis P datasets.
Six 3-hourly non-gauge-based P datasets (CMORPH, ERA-Interim, GridSat, GSMaP, JRA-55, and TMPA 3B42RT; Table 1) were merged through the following steps:
For cold and warm conditions separately, and for every possible P dataset combination, the 3-hourly estimates were merged by weighted-mean averaging using the interpolated weight maps (“Global maps of weights and wet-day biases” appendix section). The total number of combinations comprising two or more P datasets equals 57 for warm conditions, while just one combination (containing both reanalyses) is valid for cold conditions (the satellite data were discarded). Prior to the merging, the P frequency of the reanalyses was corrected, and the datasets were harmonized (“P frequency correction and dataset harmonization” appendix section). Satellite data were discarded prior to 2000 and for grid cells with daily mean Ta ≥ 5°C less than 10% of the time.
Averaging multiple P datasets tends to result in spurious drizzle and attenuated peaks, as was the case for MSWEP V1 (Nair and Indu 2017; Zhang et al. 2017). To correct for this, we CDF matched the merged P estimates from 2000 to 2017 of each dataset combination, for cold and warm conditions separately, to the reference P distributions (which represent 2000–17; see the “Reference P distributions” appendix section). Similar CDF-matching approaches have been used to correct other P datasets, including CMORPH (Xie et al. 2017), Global Ensemble Forecast System (GEFS; Zhu and Luo 2015), and PERSIANN-CCS (Karbalaee et al. 2017). To obtain consistent time series for the entire 1979–2017 period, we first calculated the change in the P estimates due to the CDF corrections for different P magnitudes, after which we applied the same magnitude-specific changes to the P estimates from 1979 to 1999.
A side effect of the implemented CDF corrections is that they result in regionally amplified trends. These corrections essentially increase (decrease) the magnitude of large (small) P events, inadvertently causing the trends associated with large events to become not just stronger but also more prominent in the overall record. We therefore rescaled the merged CDF-corrected estimates for cold and warm conditions separately and for each dataset combination such that their trends match those of the merged non-CDF-corrected estimates. Trends were calculated using simple linear regression (Kenney and Keeping 1962).
For cold and warm conditions separately, and for each possible dataset combination, we subsequently summed the interpolated weights of the incorporated datasets, yielding the cumulative interpolated weight, which roughly reflects the total information content of the dataset combination in question. Next, we selected, for each 3-hourly time step and 0.1° grid cell, the merged and corrected P value from the dataset combination with the highest cumulative weight. The applied CDF corrections ensure that temporal transitions from one dataset combination to another are largely unnoticeable. Figures 1c and 1d illustrate the merging procedure for a single grid cell.
Gauge correction scheme.
The merged 3-hourly satellite- and reanalysis-based P data (referred to as pmerge; “Merging of satellite and reanalysis P datasets” appendix section) were corrected using gauge P observations through an iterative, multiplicative approach that accounts for variability in the reporting times of gauges (“Inferring gauge reporting times” appendix section). We used a multiplicative rather than an additive correction method (Vila et al. 2009) to preserve the subdaily distribution of pmerge. The approach assumes that the long-term mean of pmerge, being based on the gauge-corrected WorldClim dataset (“Determination of long-term mean P” appendix section), is already reliable and therefore only adjusts the temporal variability of pmerge using the gauge data. The approach entails the following steps:
For each 0.1° grid cell, very small P amounts were added to pmerge, to avoid a high gauge estimate from yielding a zero estimate after the correction when pmerge = 0, which occurs frequently in MSWEP V2 because of the P frequency and CDF corrections. Specifically, we added an almost negligible amount (0.1%) of the non-CDF-matched (and thus drizzly) merged satellite- and reanalysis-based P data. The resulting estimate will be referred to as pdrizzly.
The five nearest (as the crow flies) gauges were selected (“Gauge data quality control” appendix section), and each gauge record was rescaled such that its mean equals that of pmerge for the period of overlap.
pdrizzly was corrected at the daily time scale in an iterative manner by looping through the five nearest gauges. During each loop, daily P accumulations of pdrizzly were calculated for the 24-h period ending at the reporting time, after which a blended estimate was calculated by weighted-mean averaging of the daily pdrizzly and gauge accumulations. The 3-hourly pdrizzly data were subsequently rescaled to match this blended estimate and passed on to the next loop iteration. The gauge weight wg (unitless) was calculated according to wg = 4exp(−di /d0), where di (km) represents the distance from the grid-cell center to the gauge and d0 (km) represents the range of influence (set to 25 km using trial and error). The pdrizzly weight was calculated as the sum of the weights assigned to the incorporated gridded P datasets (step 3 of the “Merging of satellite and reanalysis P datasets” appendix section) and the gauge weights from the previous loop iterations. Figures 1e and 1f illustrate the importance of accounting for reporting times when applying daily gauge corrections.
To take advantage of the wider availability of monthly gauge data, we subsequently corrected pdrizzly using the monthly 0.5° GPCC FDR7 dataset (U. Schneider et al. 2014, 2017) following the same procedure but without accounting for gauge reporting times to yield the final gauge-corrected MSWEP V2.