A Global Probabilistic Dataset for Monitoring Meteorological Droughts

Accurate and timely drought information is essential to move from postcrisis to preimpact drought-risk management. A number of drought datasets are already available. They cover the last three decades and provide data in near-real time (using different sources), but they are all "deterministic" (i.e., single realization), and input and output data partly differ between them. Here we first evaluate the quality of long-term and continuous climate data for timely meteorological drought monitoring considering the standardized precipitation index. Then, by applying an ensemble approach, mimicking weather/climate prediction studies, we develop Drought Probabilistic (DROP), a new global land gridded dataset, in which an ensemble of observation-based datasets is used to obtain the best near-real-time estimate together with its associated uncertainty. This approach makes the most of the available information and brings it to the end users. The high-quality and probabilistic information provided by DROP is useful for monitoring applications, and may help to develop global policy decisions on adaptation priorities in alleviating drought impacts, especially in countries where meteorological monitoring is still challenging.

AFFILIATIONS: Turco and Jerez-Regional Atmospheric Modeling Group, Department of Physics, University of Murcia, Murcia, Spain; Donat-Earth Science Department, Barcelona Supercomputing Center, Barcelona, Spain; Toreti-European Commission, Joint Research Centre, Ispra, Italy; Vicente-Serrano-Instituto Pirenaico de Ecología, Consejo Superior de Investigaciones Científicas, Zaragoza, Spain; Doblas-Reyes-Earth Science Department, Barcelona Supercomputing Center, and ICREA, Barcelona, Spain

Ecosystems and human societies are strongly impacted by drought (Wilhite 2000; Vicente-Serrano et al. 2013; Turco et al. 2017a; Toreti et al. 2019). While most natural hazards are rapid-onset events (e.g., floods), drought is considered a "creeping disaster": It is usually a slow-onset phenomenon (Gillette 1950; Schwalm et al. 2017), which also means that there may be more time to prepare and implement an adequate response (Wilhite 2012). Accurate and timely information of evolving drought conditions is crucial to take early action to alleviate their impacts (Pozzi et al. 2013; Hao et al. 2017). For instance, using the best and updated drought information available in drought-monitoring systems, authorities and water managers may establish better practices to optimize water use, improve control of environmental systems (e.g., forest-fire incidence) and plan measures for agriculture. To enable a proactive response, near-real-time observed data are of paramount importance (Wilhite et al. 2014; Wilhite and Pulwarty 2017). In addition, the best available, up-to-date information on drought can be used as inputs to generate better forecasts (e.g., Hao et al. 2014; Dutra et al. 2014b; Mo and Lyon 2015; Turco et al. 2017b), and to develop impact-risk prediction models (Turco et al. 2018).
There are several operational drought-monitoring systems that operate at regional scale, such as the U.S. Drought Monitor (Lawrimore et al. 2002;Svoboda et al. 2002), the European Drought Observatory (Sepulcre-Canto et al. 2012), the Drought Observatory for the Tuscany Region and the Mediterranean (Magno et al. 2018), an experimental tool for Africa (Sheffield et al. 2008), and a recently developed dataset for South Asia (Aadhar and Mishra 2017). On a global scale, a few drought-monitoring tools are also available, including the global drought-monitoring system based on the standardized precipitation-evapotranspiration index (Beguería et al. 2014), the Global Integrated Drought Monitoring and Prediction System (GIDMaPS; Hao et al. 2014), and the Global Precipitation Climatology Centre (GPCC) drought index (Ziese et al. 2014).
Drought is a complex phenomenon that involves different natural and eventually also human drivers. Based on both physical and socioeconomic contributing factors, drought is usually classified into four types: meteorological, agricultural, hydrologic, and socioeconomic (Wilhite and Glantz 1985). Meteorological droughts are apparent after a period of time with a deficiency of precipitation (Wilhite et al. 2014). Agricultural drought is generally identified by soil-moisture deficit, while hydrologic drought is related to surface or subsurface water deficit. Finally, the socioeconomic type considers drought in terms of supply and demand, evaluating the impacts of a water deficit on socioeconomic systems (Van Loon et al. 2016). In this study we focus on precipitation deficits through the standardized precipitation index (SPI; McKee et al. 1993), suggested by the World Meteorological Organization as a starting point for meteorological drought monitoring (WMO 2012).
The quality of the data available in real time is still a constraint in global drought monitoring, and a common shortcoming of all the available datasets is that they usually do not quantify the inherent uncertainty (with few exceptions, see, for example, Hersbach et al. 2020; Cornes et al. 2018; Frei and Isotta 2019). All the real-time climate products have inherent uncertainties, originating mainly from data-quality issues, periods of data unavailability, and/or poor spatial coverage of observations (Tapiador et al. 2017; Sun et al. 2018). These challenges are especially pronounced when observations are needed for drought monitoring in near-real time, as less time is available to retrieve and control the observations. Moreover, gridded datasets, which are the main source for drought monitoring, have a number of potential inaccuracies and errors (Dunn et al. 2014; Beck et al. 2017; Sun et al. 2018; Beck et al. 2019b). Generally, interpolation errors and uncertainties increase as the network density decreases, especially for variables with shorter spatial decorrelation scales (e.g., precipitation), and the quality degrades in areas of complex terrain. For instance, although the GPCC uses the largest number of stations worldwide (more than 85,000 from different sources) to produce different gridded precipitation products (including the GPCC drought index), it has spatial representativeness problems since the stations are heterogeneously distributed and their number is not constant over time (Ziese et al. 2014). Satellite observations provide new opportunities for climate monitoring with a more homogeneous and consistent spatial coverage. However, as precipitation cannot be directly measured by satellites, the estimates are also affected by uncertainties related to conversion/transfer functions (AghaKouchak et al. 2015).
In addition, satellite data are affected by retrieval error and biases (exacerbated when considering extremes) and their relatively short lengths of records can limit their applications (Mu et al. 2013;Maidment et al. 2017;Ceglar et al. 2017). To try to alleviate these problems, a number of datasets combine rain gauge analyses with satellite estimates in order to reduce the biases. Note, however, that gridded precipitation estimates themselves are subject to substantial uncertainty (Herold et al. 2016). Finally, reanalyses (here referring to atmospheric models that assimilate a set of observations) provide physically consistent estimates of climate variables with temporal continuity and spatial homogeneity (Kalnay et al. 1996). However, uncertainties in the assimilated observations and the model limitations influence the quality of the estimates generated by the reanalyses (Parker 2016;Buizza et al. 2018).
Despite the growing number of climate datasets and studies analyzing these data [for instance the review of Sun et al. (2018) compares 30 currently available global precipitation datasets], a global assessment of meteorological datasets covering the last three decades and providing data in near-real time is still missing. A careful assessment of these datasets could help to characterize the uncertainties relevant to meteorological drought, a crucial step in translating data into actionable information for making decisions. A strategy to deal with these uncertainties comes from weather/climate prediction studies: using an ensemble of observations/reanalysis to quantify the observational agreement among its members (Dutra et al. 2014a;Mo and Lettenmaier 2014;Massonnet et al. 2016).
The objectives of this paper are as follows: (i) to assess the quality of the datasets available in near-real time for meteorological drought monitoring at the global scale (gridded observations, state-of-the-art reanalyses, and mixed products obtained by merging gauge observations with satellite estimates); and (ii) to describe the development of Drought Probabilistic (DROP), a new probabilistic monitoring tool, in which an ensemble of observation-based datasets is used to obtain real-time estimates and their associated uncertainties.
Data and methods
DROP underlying data and methodology. Figure 1 schematically illustrates the three main steps in producing the DROP dataset. First, we search for all precipitation datasets that are currently publicly available, that cover at least the last three decades (drought monitoring requires an extended record of observations in order to calculate anomalies that can be used to identify drought events), and that are updated every month, that is, that are available in near-real time. Ten datasets satisfy these requirements (Table 1). Three of these datasets are based exclusively on interpolated station observations [CPC, GPCC, and Precipitation Reconstruction over Land (PREC/L)], four are based exclusively on reanalysis data (ERA5, JRA-55, NCEP, and MERRA-2), and three datasets combine gauge and satellite data [Climate Anomaly Monitoring System-OLR Precipitation Index (CAMS_OPI), Climate Hazards Group Infrared Precipitation with Station data (CHIRPS), and GPCP].
Next, we calculate the SPI (McKee et al. 1993) for each dataset. SPI is a transformation of the accumulated precipitation values over a specific time scale (here over 1, 3, 6, and 12 months) into a standard Gaussian distribution with mean 0 and standard deviation 1. Positive values indicate surplus of rainfall, whereas negative values identify dry conditions relative to the long-term climatology. The SPI has been widely used for meteorological drought studies and is recommended for this purpose by the World Meteorological Organization (WMO 2012). This index has several advantages compared to other indicators: it is easy to compute, since it requires only precipitation as an input variable; it is flexible, since it can be computed for different time scales; and it is spatially consistent, because the product is standardized and so may be compared equally well anywhere in space. Besides, since the standardization in the SPI uses its own climatology, it adjusts for bias in mean and variability of precipitation. However, it also has a weakness that the user needs to take into account: SPI cannot directly identify the role of other variables (e.g., temperature, humidity, solar radiation, or wind speed) in drought conditions. The SPI transformation is applied to each dataset, resulting in an ensemble of 10 SPI estimates.
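As an illustration of the SPI transformation described above, the following is a minimal Python sketch, not the implementation used to produce DROP (which relies on the R package SPEI; see the appendix). It assumes a gap-free monthly series that starts in January, fits a two-parameter gamma distribution separately for each calendar month, and treats zero accumulations with a mixed distribution; the function name `spi` and these choices are assumptions made for the example.

```python
import numpy as np
from scipy import stats


def spi(precip, scale=12):
    """Sketch of an SPI calculation for one grid cell.

    precip : 1-D array of monthly precipitation, assumed to start in January.
    scale  : accumulation period in months (e.g., 1, 3, 6, or 12).
    Returns an array of the same length; the first scale-1 values are NaN.
    """
    precip = np.asarray(precip, dtype=float)
    # Running sum over `scale` months.
    acc = np.convolve(precip, np.ones(scale), mode="valid")
    # Calendar month (0-11) of the last month in each accumulation window.
    months = np.arange(scale - 1, precip.size) % 12

    values = np.full(acc.shape, np.nan)
    for m in range(12):
        idx = months == m
        x = acc[idx]
        wet = x[x > 0]
        if wet.size < 4:  # too few data to fit a distribution
            continue
        p_zero = 1.0 - wet.size / x.size  # probability of a zero total
        shape, _, scl = stats.gamma.fit(wet, floc=0)
        cdf = p_zero + (1.0 - p_zero) * stats.gamma.cdf(x, shape, loc=0, scale=scl)
        # Avoid infinite values at the extremes of the fitted distribution.
        cdf = np.clip(cdf, 1e-6, 1.0 - 1e-6)
        values[idx] = stats.norm.ppf(cdf)

    out = np.full(precip.shape, np.nan)
    out[scale - 1:] = values
    return out
```

Calling `spi(series, scale=12)` on each of the 10 precipitation datasets would give one SPI12 member per dataset for that grid cell.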
Finally, each dataset constitutes a member of the ensemble. DROP uses the calculated SPI for all ensemble members to provide drought information in a variety of ways. DROP is a new global land gridded dataset that provides different layers: the ensemble mean, the ensemble spread, the drought warning, and the confidence level of drought. The most basic information is the ensemble mean for users interested in the magnitude of the SPI. Importantly, in order to guarantee the same statistical characteristics of the SPI, the ensemble mean of the different SPI estimates has been rescaled (obtained by defining an anomaly by subtracting the long-term mean from the original series and dividing the anomaly by its long-term standard deviation) to retain the unit standard deviation (as recommended by Dutra et al. 2014b). An estimation of the uncertainty around the ensemble mean is provided by the ensemble spread (i.e., the standard deviation). DROP also provides a simple color-coded drought warning based on a combination of uncertainty and severity as indicated in Table 2. This approach, based on the guidelines for disaster management of the European Commission (EC; EC 2010), allows users to focus on areas where there is high confidence of severe drought (highlighted in red colors), or where there is high confidence of moderate drought/moderate confidence of severe drought (orange) or where there is either high confidence of abnormally dry conditions or low confidence of severe drought (yellow). In addition to this simple colored warning system, DROP also provides the full confidence level of having (at least) moderate drought, thus giving a more detailed illustration of the drought uncertainty. Specifically, the confidence level to observe moderate drought is determined from the fraction of members having SPI < −0.8 [i.e., the SPI threshold to have moderate drought according to Svoboda et al. (2002); see also Table ES1 in the online supplement (https://doi.org/10.1175/BAMS-D-19-0192.2)]. It is worth noting that several thresholds can be implemented and tested. High percentages of drought occurrence indicate a high degree of confidence of observing drought among all the available datasets.
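The three numerical layers described above (rescaled ensemble mean, ensemble spread, and confidence level) can be sketched for a single grid cell as follows. This is a minimal illustration under stated assumptions, not the DROP production code: the function name `drop_layers` is hypothetical, the input is assumed to be free of missing values, and the use of the sample standard deviation (`ddof=1`) is a choice made here. The color-coded warning level is not reproduced because it depends on the matrix in Table 2.

```python
import numpy as np


def drop_layers(spi_members, threshold=-0.8):
    """Ensemble layers for one grid cell.

    spi_members : array of shape (n_members, n_time) with the SPI of each dataset.
    threshold   : SPI value below which a member indicates (at least) moderate drought.
    Returns (rescaled ensemble mean, ensemble spread, confidence level).
    """
    spi_members = np.asarray(spi_members, dtype=float)

    # Ensemble mean, rescaled to zero mean and unit standard deviation over time.
    raw_mean = spi_members.mean(axis=0)
    ens_mean = (raw_mean - raw_mean.mean()) / raw_mean.std(ddof=1)

    # Spread: standard deviation across members at each time step.
    spread = spi_members.std(axis=0, ddof=1)

    # Confidence level: fraction of members below the drought threshold.
    confidence = (spi_members < threshold).mean(axis=0)
    return ens_mean, spread, confidence
```

For example, if 7 of the 10 members have SPI < −0.8 in a given month, the confidence level of moderate drought for that month is 0.7.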
DROP evaluation. Different drought types (e.g., agricultural, hydrological) are often linked with below-normal levels of groundwater, soil moisture, and streamflow and, although different processes are involved, some degree of correlation is expected between SPI (as a measure of meteorological drought) and these variables (Dai 2011a). Thus, following Dai et al. (2004) and Dai (2011b), we evaluate the quality of the DROP dataset against the independent data of the European Space Agency soil moisture dataset (Gruber et al. 2019; Dorigo et al. 2017; Gruber et al. 2017), of the terrestrial water storage changes from the Gravity Recovery and Climate Experiment (GRACE; Zhao et al. 2017) and of the Global Streamflow and Metadata Archive (GSIM; Do et al. 2018; Gudmundsson et al. 2018), a global collection of streamflow time series.
We also consider the Multi-Source Weighted-Ensemble Precipitation (MSWEP; v2.1; Beck et al. 2019a) dataset for the evaluation, but not as a DROP ensemble member as it is not available in near-real time. Recent studies comparing several precipitation datasets have shown that it provides the best overall performance (Beck et al. 2017, 2019b; Xu et al. 2019), which is also confirmed by our assessment that compares all the datasets against the independent data (see "Results and discussion" section). Currently, MSWEP is the only global precipitation dataset at high spatial resolution (0.1° resolution) obtained by merging rain gauge, satellite, and reanalysis estimates (Beck et al. 2019a).
To compare the various gridded datasets, their precipitation values are remapped (first-order conservative remapping; Jones 1999) from their original resolution to the coarsest grid of the available datasets, defined by GPCP (2.5° × 2.5°).
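To give an idea of what conservative remapping does, the sketch below treats only the simplest special case: a regular latitude-longitude grid whose cells nest exactly inside the coarser target cells, so that the first-order conservative estimate reduces to an area-weighted block average. This is a simplification for illustration; the general algorithm of Jones (1999) handles arbitrary overlapping cells and is not reproduced here. The function name `coarsen_conservative` and its arguments are hypothetical.

```python
import numpy as np


def coarsen_conservative(field, lat_edges, factor):
    """Area-weighted block average on a regular lat-lon grid.

    field     : array (nlat, nlon) on the fine grid.
    lat_edges : array (nlat + 1,) with the latitude cell edges in degrees.
    factor    : integer number of fine cells per coarse cell in each direction.
    """
    field = np.asarray(field, dtype=float)
    nlat, nlon = field.shape
    if nlat % factor or nlon % factor:
        raise ValueError("factor must divide both grid dimensions")

    # Cell area on a sphere is proportional to the difference of sin(latitude)
    # at the cell edges (all cells share the same longitude width).
    w = np.abs(np.diff(np.sin(np.deg2rad(lat_edges))))

    blocks = (field * w[:, None]).reshape(
        nlat // factor, factor, nlon // factor, factor
    )
    weighted_sum = blocks.sum(axis=(1, 3))
    area_sum = w.reshape(nlat // factor, factor).sum(axis=1)[:, None] * factor
    return weighted_sum / area_sum
```

With this weighting the area-weighted mean of the field is the same before and after coarsening, which is the property that motivates conservative remapping for precipitation.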
We evaluate different attributes (Murphy 1993) of the DROP quality: degree of association between the ensemble mean and the reference datasets through the Pearson's r correlation coefficient; accuracy by the mean absolute error; reliability through the reliability diagram (Weisheimer and Palmer 2014); and resolution by the relative operating characteristic (ROC) plot (Mason and Graham 2002). The details of the verification strategy are provided in the appendix.

Results and discussion
Deterministic assessment. Figure 2 shows the strong agreement between the DROP dataset and the MSWEP precipitation dataset, with most of the grid cells showing high and statistically significant correlations (but note that there are extended regions in Africa and South America where the correlation values are lower, although still statistically significant). Furthermore, DROP shows extensive regions with statistically significant correlations against soil moisture, with the highest values mainly clustered in the United States, southern and eastern Europe, central and southern Asia, northern and southern Africa, eastern and southern South America, and Australia. Over most land areas, DROP is positively and significantly correlated with the variations in GRACE total water storage. Furthermore, around 90% (336 of 378) of the streamflow data show positive and statistically significant correlations with DROP.
Of course, the links between precipitation-related indices and other hydrological variables are limited due to the complex processes involved (nature and human-based). However, the comparison between all the SPI12 gridded datasets and these variables offers insight on the quality of the former data. Figure 3 summarizes the results for all datasets and variables considered. Generally, all datasets show relatively high correlation values in most grid cells against the MSWEP dataset (Fig. 3a and Fig. ES1), with more than half of the grid cells having correlation values above 0.7. Of the individual members, CHIRPS, GPCC, and GPCP demonstrate the best performance, with more than half of the grid cells having correlation values above 0.8. DROP systematically outperforms the individual products, especially reducing the spread in the verification metrics, improving, in particular, the agreement in regions with the lowest skill (Fig. ES1). The high quality of DROP is also confirmed by the mean absolute error (MAE) metric (Fig. ES2). Generally, all the datasets show relatively low MAE values in most grid cells, with more than half of them having MAE less than 0.6. DROP reduces the MAE of the individual members, with more than half of the grid cells having MAE less than 0.35. This verification has been performed using the SPI12, but similar results have been obtained with SPI1, SPI3, and SPI6 (see Fig. ES2).

Table 2. This drought-risk matrix indicates the colors used to plot the drought warning levels. For example, yellow could reflect either a high confidence of abnormally dry conditions or a low confidence of a severe drought.

[Figure panels: (a) DROP correlation with MSWEP; (b) DROP correlation with ESA CCI soil moisture; (c) DROP correlation with GRACE; (d) DROP correlation with GSIM]
Considering the independent data (Figs. 3b-d and Figs. ES3-ES5), CHIRPS, ERA5, GPCC, GPCP, and DROP show the highest correlations, with more than half of the grid cells having correlation values above 0.4 considering soil moisture and GRACE data, and above 0.6 considering streamflow series. In conclusion, the magnitudes of the correlations with the various hydrological variables clearly show that DROP is capable of monitoring drought conditions in different systems. The performance of DROP is very similar to that of MSWEP, but regular updates allow DROP to be used as a well-performing monitoring product.
Probabilistic evaluation. Due to inherent, large uncertainties in monitoring drought, a key added value of DROP is its provision of an estimate of these uncertainties. Figure 4a shows the reliability diagram for Australia for moderate drought events (SPI < −0.8; Svoboda et al. 2002) as an illustrative example. It compares the observed (MSWEP) relative frequency against the monitored (DROP) confidence of occurrence of this kind of event, providing a quick visual assessment of the reliability. A perfectly reliable system should draw a line as close as possible to the diagonal (slope equal to 1), indicating that the observed relative frequency and the DROP confidence of occurrence are similar. In this example the uncertainty range of the reliability line does not include the perfect reliability line. Actually, the slope is larger than 1, indicating that DROP overestimates low confidence levels (e.g., there are no events when DROP indicates 10% drought confidence) and underestimates high confidence levels [e.g., observed frequencies in the top 10% (above 0.9) do not occur in DROP]. However, it is inside the skillful Brier skill score (BSS) area (BSS > 0 and slope > 0), indicating that DROP is still very useful for decision-making (Weisheimer and Palmer 2014) in this region. In fact, the ROC diagram for the same region and type of event (Fig. 4b) shows that DROP has skill in terms of identifying drought occurrence probabilistically, since its curve is well above the identity line (where the hit rate equals the false alarm rate). Figures 4c and 4d show the same analysis for the Amazon basin. In this case the spread in the reliability diagram is closer to the perfect reliability line, but the ROC diagram indicates a lower skill compared to the previous case. This was somewhat expected, since the Amazon basin is a more challenging area for drought monitoring than Australia, partly due to a lack of station observations (Sun et al. 2018).
To summarize the results for all the regions, we calculate the slopes of the reliability diagrams and the ROC area skill scores (and associated uncertainties) and show them as boxplot distributions in Fig. 5. Figure 5a shows that the slopes are always positive, and in few regions the spread includes the perfect reliability line. In most of the areas the slope is higher than 1, indicating that DROP, although reliable, tends to be overconfident in those regions. Figure 5b shows that the ROC area skill score (ROCSS) is always larger than 0.5, indicating skill, and, not surprisingly, that the higher values are obtained in areas with good observational coverage (Sun et al. 2018): Australia, North America, Europe, and East and North Asia (also confirmed in Fig. ES6 that shows the correlation of DROP against MSWEP aggregated for the same regions). Similar results have been obtained considering SPI1, SPI3, and SPI6 and considering severe instead of moderate droughts (Figs. ES7 and ES8).
Case studies. The skill estimates based on the performance of the system in the past may guide end users on the expected performance of the system in monitoring drought events. As illustrative applications, we compare the ability of DROP in monitoring two extreme events: Africa in 1984 and the United States in 2012. They represent two illustrative tests of the system, the former in a data-poor region and the latter in an area with very good observational coverage.
Wide areas in Africa experienced extreme drought conditions in 1984, including many regions in eastern, western, and southern Africa, resulting in one of the worst humanitarian disasters of the twentieth century (Naumann et al. 2014; Masih et al. 2014). The SPI estimated by the MSWEP dataset (Figs. 6a,b) indicates extended drought areas with SPI reaching values below −2, that is, exceptional drought conditions [according to Svoboda et al. (2002) and Table ES1]. The spatial pattern of the drought severity depicted by the DROP ensemble mean (Fig. 6c) resembles the pattern of MSWEP (Fig. 6a). The DROP ensemble spread (Fig. 6d) clearly points to large uncertainty in most of the regions, thus indicating that normal and drought conditions could be equally likely and highlighting the challenge of monitoring and alert systems. Figure 6e shows the DROP warning-level map. Most of the continent shows yellow to red color (i.e., low, medium, and high drought warning levels, respectively), indicating the large spatial extent of this drought event. This map allows users to focus on the areas where there is higher confidence in the severity of the drought conditions. For instance, red colors indicate high agreement among the ensemble members of having SPI < −1.3, that is, a severe drought. The warning-level "medium" instead reflects either high confidence of moderate drought or low-to-medium confidence of a severe drought. Finally, the DROP confidence level to have SPI lower than −0.8 is plotted in Fig. 6f. The areas where the values are larger than 60% (brown colors) are generally consistent with the areas where MSWEP indicates the most extreme drought conditions. Thus, a generally consistent pattern emerges from the analysis of the available data, further supporting the robustness of DROP. Clearly, results for individual grid points should be considered with caution. Figure 7 shows the drought evolution associated with a pixel in central Africa (identified by an arrow in Fig. 6c), elucidating how the uncertainties can be very large (see also Fig. ES9, which shows large uncertainties also when aggregating the data over a larger region instead of a single pixel). While MSWEP estimates moderate drought conditions for the entire period considered here, DROP does so only for some months. Among the individual members, the spread is very large, with some datasets indicating extreme droughts and others indicating wet conditions. Finally, a large uncertainty affects the estimated end of this drought event. The consistent increase in SPI values between March and September 1985 among the majority of the datasets (the exception being MSWEP, which shows moderate drought conditions during this period) indicates that the attenuation of the drought likely started in mid-1985. A recently developed global database of meteorological drought events (Spinoni et al. 2019) also shows that in the middle of 1985 SPI12 tended toward normal conditions. This case study illustrates that MSWEP, despite being a dataset with an overall good quality, is also affected by uncertainties, especially in areas with limited observations. This further emphasizes the importance of ensembles of monitoring datasets for drought-alert systems.
… Table ES1], (c) DROP ensemble mean, (d) SPI ensemble spread, (e) warning-level map defined according to Table 2, and (f) the confidence level for moderate drought occurrence (i.e., SPI < −0.8). The time series derived for the pixel indicated by an arrow in (c) is shown in Fig. 7.

A second case study is shown in Fig. 8. The central United States experienced in 2012 one of the most severe droughts since the start of the observational records, with estimated losses of $12 billion and severe impacts on agricultural production (Hoerling et al. 2014; Spinoni et al. 2019). The agreement between MSWEP and DROP is very high as both show a vast area of central United States under severe to exceptional drought conditions. The low spread reflects the fact that individual members show similar values in the identified region with "high" warning level (highlighted in red colors) and with the confidence of more than 80% to have drought conditions. Also the temporal evolution of this drought event in a particular grid point shows a relatively low uncertainty (Fig. 9), at least compared with the previous African case study (see also Fig. ES9). However, uncertainties also affect the estimated onset of the event, with some spread between April and June 2012. According to Hoerling et al. (2014)

Conclusions
We present DROP, a new global probabilistic precipitation-based dataset for monitoring and early warning of meteorological drought events. DROP is operationally updated every month and provides probabilistic information in near-real time, that is, up to the previous month. DROP uses an ensemble approach, similar to that of weather/climate prediction studies, in which the members represent the available observation-based products. We have shown the importance of having an ensemble-based probabilistic approach in near-real-time monitoring systems, aimed at providing the best possible information for planning and acting to reduce the potential impacts (e.g., crop losses, increased fire risk). Indeed, DROP could become an important tool to inform end users across a range of socioeconomic sectors (e.g., energy and water management, insurance, agriculture, fire risk).
DROP is publicly available online (www.um.es/gmar/projects/predfire.html). Users can retrieve the estimated SPI indices of the ensemble mean of DROP and drought confidence levels. All codes used in the production of DROP are also freely available, via the DROP archive, which ensures adherence to the Enabling Findable, Accessible, Interoperable and Reusable (FAIR) Data Project for Earth-science research (www.copdess.org/enabling-fair-data-project/). In the future, we may see development of more refined specific monitoring systems at regional scales where high-resolution data are available, and of more sophisticated methods (including other climatic variables, such as evaporative demand or soil moisture) to monitor other types of drought (e.g., agricultural drought). For this, it is worth noting that the recently developed ERA5 (Hersbach et al. 2020) offers an ensemble that could be beneficial in estimating uncertainty of different climate variables relevant for drought monitoring (while for DROP we use the ensemble mean of only one variable, the precipitation).
In addition, DROP may serve as a basis for model evaluation (it is worth noting that verification of climate predictions generally neglects uncertainties in observations; Massonnet et al. 2016; Bellprat et al. 2017) and to provide initial conditions for improving seasonal forecasts.
The powerful impacts of droughts necessitate a move from postcrisis to preimpact drought-risk management (Wilhite et al. 2014) and point to the need for innovative solutions providing the full range of information. We believe DROP contributes to addressing this need.

… Table 1 for providing access to these datasets. Special thanks to Dr. Meng Zhao for providing the GRACE data and Dr. Hong Xuan Do for providing R scripts to read and process the GSIM data.

Appendix: Details of the DROP development and verification
Here we describe in more detail the approach followed to develop and evaluate the DROP dataset.
We computed SPI using the implementation of the R package SPEI (Beguería and Vicente-Serrano 2013), that is, by fitting the precipitation series to a gamma distribution in accordance with the procedure recommended by the World Meteorological Organization (WMO 2012).
We computed the Pearson correlation coefficients between all the gridded datasets available in near-real time (including the ensemble mean of the 10 individual datasets) and the reference datasets for each grid point. Specifically, we compare (i) the annual mean soil moisture (Gruber et al. 2019; Dorigo et al. 2017; Gruber et al. 2017) and the SPI12; (ii) the monthly GRACE drought severity index (Zhao et al. 2017) and the SPI12 over the common period April 2002-December 2016 (following Cammalleri et al. 2019); and (iii) the annual observed GSIM streamflow (Do et al. 2018; Gudmundsson et al. 2018) and the basin-averaged SPI12 (rescaled in order to preserve the unit standard deviation). The GSIM collects 30,959 time series. We consider only the stations with over 90% of valid data during the study period and whose basins contain at least one pixel of the 2.5° grid, resulting in a set of 378 gauges. We estimated the significance level of correlation using the Student's t distribution with N degrees of freedom, N being the effective number of independent data calculated following the method described in Von Storch and Zwiers (2001). We also compute the MAE between all the gridded datasets available in near-real time (including the ensemble mean of the 10 individual datasets) and the MSWEP data for each grid point.
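The deterministic metrics above can be sketched for a single grid point as follows. This is an illustration under assumptions, not the code used for DROP: the effective sample size is estimated here with a common lag-1 autocorrelation adjustment, N_eff = N (1 − r1x r1y) / (1 + r1x r1y), which may differ in detail from the procedure taken from Von Storch and Zwiers (2001), and the test uses the conventional N_eff − 2 degrees of freedom. The function names are hypothetical.

```python
import numpy as np
from math import sqrt
from scipy import stats


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))


def corr_with_effective_n(x, y):
    """Pearson r, effective sample size, and two-sided p value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])

    # Assumed adjustment for serial correlation in both series.
    rho = lag1_autocorr(x) * lag1_autocorr(y)
    n_eff = float(np.clip(n * (1.0 - rho) / (1.0 + rho), 3.0, n))

    # t statistic for the correlation, using the reduced sample size.
    r2 = min(r * r, 1.0 - 1e-12)
    t = r * sqrt((n_eff - 2.0) / (1.0 - r2))
    p = float(2.0 * stats.t.sf(abs(t), df=n_eff - 2.0))
    return r, n_eff, p


def mae(x, y):
    """Mean absolute error between two series."""
    return float(np.mean(np.abs(np.asarray(x, float) - np.asarray(y, float))))
```

Because SPI12 values in consecutive months share 11 months of precipitation, the series are strongly autocorrelated, which is why a reduced sample size matters for the significance test.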
We computed the reliability diagrams, a common diagnostic of probabilistic forecasts that shows for a specific event (e.g., moderate drought) the correspondence of the DROP confidence levels with the reference frequency (here based on MSWEP data) of occurrence of that event (Weisheimer and Palmer 2014). We also included the weighted linear regression through the points in our diagrams using the number of DROP values in each confidence bin as weights (following Weisheimer and Palmer 2014).
We consider the ROCSS based on the ROC diagram. ROC shows the hit rate (i.e., the relative number of times a DROP event actually occurred, based on MSWEP) against the false alarm rate (i.e., the relative number of times an event was indicated by DROP but did not eventuate) for different potential decision thresholds (Mason and Graham 2002).
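A minimal sketch of the ROC calculation described above is given below. It assumes that `confidence` holds the DROP confidence levels and `observed` the reference (e.g., MSWEP-based) drought occurrences, pooled over a region, and that both events and non-events are present. The function names are hypothetical, and the area under the curve is obtained with the trapezoidal rule; how the area is converted into a skill score is a convention that is not fixed here.

```python
import numpy as np


def roc_points(confidence, observed, thresholds):
    """Hit rate and false alarm rate for a set of decision thresholds.

    confidence : array of confidence levels in [0, 1].
    observed   : boolean array, True where the event occurred in the reference.
    thresholds : confidence levels at or above which a warning is issued.
    Returns (false_alarm_rate, hit_rate), both ordered from (0, 0) to (1, 1).
    """
    confidence = np.asarray(confidence, dtype=float).ravel()
    observed = np.asarray(observed, dtype=bool).ravel()
    n_event = observed.sum()
    n_nonevent = (~observed).sum()

    far, hit = [0.0], [0.0]
    for thr in sorted(thresholds, reverse=True):
        # Small tolerance so that, e.g., 0.3 is matched by a 0.3 threshold.
        warned = confidence >= thr - 1e-9
        hit.append(np.sum(warned & observed) / n_event)
        far.append(np.sum(warned & ~observed) / n_nonevent)
    far.append(1.0)
    hit.append(1.0)
    return np.array(far), np.array(hit)


def roc_area(far, hit):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(far) * (hit[1:] + hit[:-1]) / 2.0))
```

With 10 ensemble members, a natural choice of thresholds is 0.1, 0.2, ..., 1.0. An area of 0.5 corresponds to the identity line (no discrimination) and 1.0 to perfect discrimination.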
To increase the sample size, the reliability diagrams and the ROC are computed by pooling all the grid points together over large regions (see Dutra et al. 2014a), following the procedure recommended by the WMO (WMO 2010). Specifically, for moderate drought events (i.e., SPI < −0.8), we calculate the confidence levels using the ensemble members' distribution. Then, we group these levels into bins (here five of width 0.2) and count the observed occurrences and non-occurrences. Finally, we sum these counts for all area-weighted grid points in each region. We estimate the uncertainties in the reliability slopes and the ROCSS using bootstrap resampling, where the DROP and MSWEP pairs are drawn randomly with replacement 1,000 times and new confidence levels and observed occurrences are calculated. The confidence interval is defined by the 2.5th and the 97.5th percentiles of the ensemble of the 1,000 bootstrap replications. To account for the spatial dependence structure of the data, we use the same resampling sequence for all grid points within each bootstrap iteration [see also Turco et al. (2017b) for an application of the verification approach applied to seasonal drought forecasts].
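The binning, weighted regression, and bootstrap steps described above can be sketched as follows. This is a simplified illustration, not the verification code used for DROP: grid points are pooled without area weighting, bin centers are used as the abscissa of the reliability points, and the function names are hypothetical. The arrays are assumed to have shape (n_time, n_points), so that drawing time indices once per iteration applies the same resampling sequence to all grid points.

```python
import numpy as np


def reliability_points(confidence, observed, n_bins=5):
    """Observed relative frequency of the event in each confidence bin."""
    confidence = np.asarray(confidence, dtype=float).ravel()
    observed = np.asarray(observed, dtype=float).ravel()
    # Bins [0, 0.2), [0.2, 0.4), ..., [0.8, 1.0] for n_bins = 5.
    idx = np.minimum((confidence * n_bins + 1e-9).astype(int), n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    occurrences = np.bincount(idx, weights=observed, minlength=n_bins)
    centers = (np.arange(n_bins) + 0.5) / n_bins
    keep = counts > 0
    return centers[keep], occurrences[keep] / counts[keep], counts[keep]


def weighted_slope(x, y, w):
    """Slope of the weighted least-squares line through the points."""
    xm = np.average(x, weights=w)
    ym = np.average(y, weights=w)
    return float(np.sum(w * (x - xm) * (y - ym)) / np.sum(w * (x - xm) ** 2))


def bootstrap_slope_ci(confidence, observed, n_boot=1000, seed=0):
    """2.5th and 97.5th percentiles of the bootstrapped reliability slope."""
    confidence = np.asarray(confidence, dtype=float)
    observed = np.asarray(observed, dtype=float)
    rng = np.random.default_rng(seed)
    n_time = confidence.shape[0]
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        # Same resampled time steps for every grid point in the region.
        t = rng.integers(0, n_time, n_time)
        slopes[b] = weighted_slope(*reliability_points(confidence[t], observed[t]))
    return np.percentile(slopes, [2.5, 97.5])
```

A slope close to 1 with a confidence interval that includes 1 would indicate that the confidence levels match the observed frequencies within sampling uncertainty.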