Seasonal hydrologic extremes in the form of droughts and wet spells have devastating impacts on human and natural systems. Improving understanding and predictive capability of hydrologic extremes, and facilitating adaptations through establishing climate service systems at regional to global scales are among the grand challenges proposed by the World Climate Research Programme (WCRP) and are the core themes of the Regional Hydroclimate Projects (RHP) under the Global Energy and Water Cycle Experiment (GEWEX). An experimental global seasonal hydrologic forecasting system has been developed that is based on coupled climate forecast models participating in the North American Multimodel Ensemble (NMME) project and an advanced land surface hydrologic model. The system is evaluated over major GEWEX RHP river basins by comparing with ensemble streamflow prediction (ESP). The multimodel seasonal forecast system provides higher detectability for soil moisture droughts, more reliable low and high flow ensemble forecasts, and better “real time” prediction for the 2012 North American extreme drought. The association of the onset of extreme hydrologic events with oceanic and land precursors is also investigated based on the joint distribution of forecasts and observations. Climate models have a higher probability of missing the onset of hydrologic extremes when there is no oceanic precursor. However, an oceanic precursor alone is insufficient to guarantee a correct forecast; a land precursor is also critical in avoiding a false alarm for forecasting extremes. This study is targeted at providing the scientific underpinning for the predictability of hydrologic extremes over GEWEX RHP basins and serves as a prototype for seasonal hydrologic forecasts within the Global Framework for Climate Services (GFCS).
A multimodel global hydrologic forecasting system that provides hydroclimate prediction services is evaluated to understand the predictability of seasonal hydrologic extremes over global major river basins.
Persistent hydrologic extreme events such as droughts and wet spells (rainfall anomalies with long durations) have devastating impacts on human and natural systems and have caused economic losses totaling hundreds of billions of dollars in the United States (Smith and Katz 2013) and around the world (Below et al. 2007). Their occurrences are associated with anomalous atmospheric moisture transport that may be linked, through ocean–atmosphere teleconnections, to variations of large-scale climate phenomena such as El Niño–Southern Oscillation (ENSO), Pacific decadal oscillation (PDO), and Atlantic multidecadal oscillation (AMO) (Cayan et al. 1999; Hoerling and Kumar 2003; McCabe et al. 2004). Their severities and durations are also influenced by land–atmosphere coupling that can enhance existing extremes (Hong and Kalnay 2000; Schubert et al. 2004). They may be further exacerbated by anthropogenic climate change (Diffenbaugh et al. 2013) and human water consumption (Barnett et al. 2008; Wada et al. 2013).
According to the latest Intergovernmental Panel on Climate Change (IPCC) report, agricultural and hydrological droughts are projected to increase in intensity and duration in presently dry regions by the end of this century under a business-as-usual scenario [representative concentration pathway (RCP) 8.5 scenario], while heavy rainfall events are very likely to increase over most of the midlatitude landmasses and wet tropical regions (IPCC 2013). Such changes impose an increasing risk of hydrologic extremes in the future. Improving understanding of the processes that lead to extremes and establishing operational predictive capabilities that can provide skillful and reliable forecasting of the frequency and intensity of extreme events on regional to global scales are therefore science imperatives of the World Climate Research Programme (WCRP) and are being considered as WCRP grand challenges (Karoly 2012; Giorgi et al. 2012).
As a core project of the WCRP, the Global Energy and Water Cycle Experiment (GEWEX; Morel 2001; www.gewex.org) is responsible for facilitating research on quantifying, understanding, and predicting global and regional energy and water variations and extremes through improved observations and modeling of the land, atmosphere, and their interactions, thereby providing the scientific underpinnings of climate services. One of three major research foci of GEWEX is to demonstrate skill in predicting changes in water resources on time scales up to seasonal and annual as an integral part of the climate system [in particular at the regional scale through Regional Hydroclimate Projects (RHP); Coughlan and Avissar 1996; Raschke et al. 1998; Stewart et al. 1998; GEWEX 2012]. Therefore, predicting hydrologic extremes at seasonal scales and investigating their predictability over global major river basins in an integrated hydroclimate forecasting system will help address the WCRP grand challenges and GEWEX RHP research focus.
Seasonal predictability originates primarily from tropical oceans via sea surface temperature (SST) anomalies. The SST anomaly over the tropical Pacific Ocean (i.e., ENSO) has global impacts (Shukla 1998; Goddard et al. 2001), the SST anomaly over the tropical Atlantic Ocean plays a role in the hydroclimate of the Sahel region (Camberlin et al. 2001), and the Indian Ocean dipole (Saji et al. 1999) can contribute to the predictability over Australia, Africa, and southern Asia, somewhat independently from ENSO (Zhao and Hendon 2009; Doblas-Reyes et al. 2013). Other sources of seasonal predictability could come from stratospheric conditions (Ineson and Scaife 2009), soil moisture anomalies (Koster et al. 2011), and snow cover (Douville 2010), although these impacts are regional as compared with ENSO. With gradual improvements in observational data assimilation, computing resources that facilitate high-resolution numerical simulations, and the understanding of atmosphere–ocean–land physical processes that account for major seasonal predictability (e.g., ENSO), coupled atmosphere–ocean–land general circulation models (CGCMs) are now widely used for seasonal climate predictions (Weisheimer et al. 2009; Barnston et al. 2012; Kirtman et al. 2014). They have also shown improvement in predictive skill over the past decade, especially for large-scale climate features such as ENSO (Barnston et al. 2012) and the North Atlantic Oscillation (NAO; Scaife et al. 2014).
However, owing to deficiencies in land surface hydrologic parameterizations and/or land surface initializations of CGCMs, the hydrologic forecast products (e.g., soil moisture, runoff) from global seasonal prediction models cannot be directly used for applications. A typical solution is to bias correct the meteorological forecasts from the CGCMs and then drive advanced hydrologic models with refined initial land surface hydrologic conditions to produce seasonal hydrologic forecasts and extreme predictions (Wood et al. 2002; Luo and Wood 2007, 2008; Li et al. 2009; Mo et al. 2012; Yuan and Wood 2012a; Sinha and Sankarasubramanian 2013; Yuan et al. 2013a,b; Mo and Lettenmaier 2014b; Shukla et al. 2014). We refer to this as the CGCM-Hydrology forecasting approach.
Nevertheless, to our knowledge, most CGCM-Hydrology seasonal forecasting studies focus on the soil moisture and/or streamflow prediction over a single basin or continent, usually with a single CGCM (Luo and Wood 2008; Mo et al. 2012; Yuan et al. 2013b; Shukla et al. 2014), while a comprehensive investigation of predictability of global hydrologic extremes in a multi-CGCM framework has not been conducted. The multi-CGCM framework not only provides a more reliable assessment of hydrologic predictability but also offers an opportunity to help quantify the uncertainty. Another limitation of previous studies is that most of them assess the forecast skill for extreme indices that blend extreme conditions with normal conditions (Quan et al. 2012; Yoon et al. 2012; Sohn et al. 2013) while ignoring the analysis of hydrologic predictability specifically for individual hydrologic extreme events (e.g., the predictive skill for drought onset; Yuan and Wood 2013). Last, theoretical estimates of global hydrologic predictability indicate that initial hydrological conditions provide much of the potential hydrological predictability (Shukla et al. 2013; van Dijk et al. 2013; Yossef et al. 2013), depending on the location and season, but this has yet to be evaluated in terms of actual predictability within the CGCM-Hydrology framework.
This article presents the development and validation of a global seasonal hydrologic forecasting system that is based on multiple CGCMs participating in the North American Multimodel Ensemble (NMME) project (Kirtman et al. 2014) and the Variable Infiltration Capacity (VIC; Liang et al. 1996) land surface hydrologic model. An analysis of droughts and wet spells is carried out over global major river basins, and the CGCM-Hydrology approach is evaluated against the traditional ensemble streamflow prediction (ESP; Twedt et al. 1977) approach, which resamples from the historic record to provide an ensemble of meteorological forcings and relies on the persistence in initial land conditions to provide forecast skill. The association of the extreme event onset with antecedent oceanic and land conditions is discussed, based on the joint distribution of the forecast and observation.
THE POTENTIAL OF USING NMME FOR HYDROLOGIC APPLICATIONS OVER THE GEWEX HYDROCLIMATE PROJECT BASINS.
During the past four years, the Climate Forecast System, version 2 (CFSv2; Saha et al. 2014), developed by the National Oceanic and Atmospheric Administration (NOAA)’s National Centers for Environmental Prediction (NCEP) has been widely used for hydrologic applications (Yuan et al. 2011; Mo et al. 2012; Quan et al. 2012; Yoon et al. 2012; Yuan and Wood 2012a; Dirmeyer 2013; Yuan et al. 2013b,a; Kumar et al. 2013; Lang et al. 2014; Sheffield et al. 2014; Shukla et al. 2014; Tian et al. 2014). Within the NMME, CFSv2 has been shown to be the most reliable model for seasonal forecasting of global drought onset (Yuan and Wood 2013). But in terms of global gridded analyses, several studies find that combining CFSv2 with other climate forecast models can increase the predictive skill of precipitation (Yuan et al. 2011; Yuan and Wood 2012b; Becker et al. 2014; Kirtman et al. 2014), including extremes (Yuan and Wood 2013).
Here, we target our analysis on major global river basins that are the focus of the GEWEX RHP. Figure 1 shows the continuous ranked probability skill score (CRPSS; Wilks 2011; see appendix for details) for the seasonal mean, basin average precipitation, predicted by CFSv2 and NMME, using all hindcasts started at the beginning of each calendar month during 1982–2009. CFSv2 has 24 ensemble members, while NMME has 71 ensemble members, in total, from six climate models that are producing real-time seasonal climate forecasts. The six models are the National Center for Atmospheric Research (NCAR) Community Climate System Model, version 3 (CCSM3); Geophysical Fluid Dynamics Laboratory Climate Model, version 2.2 (GFDL CM2.2); National Aeronautics and Space Administration (NASA) Goddard Earth Observing System Model, version 5 (GEOS-5); NCEP CFSv2; Canadian Meteorological Centre (CMC) Third Generation Canadian Coupled Global Climate Model (CanCM3); and CMC CanCM4 [see Kirtman et al. (2014) for ensemble information and full references]. Figure 1 shows that NMME has higher probabilistic predictive skill than CFSv2 over 70% of the river basins selected in this study and has comparable skill to CFSv2 for the remaining basins. In particular, obvious improvement can be found over the Amazon and Parana basins in South America, the Colorado basin in North America, the Nile and Niger basins in Africa, the Yangtze and Pearl basins in China, and several high-latitude basins in Eurasia. 
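The CRPSS measures how much a probabilistic forecast improves on a reference forecast in terms of the continuous ranked probability score (CRPS). The exact formulation used for Fig. 1 is given in the appendix; the following is only a minimal sketch that assumes the standard ensemble estimator of the CRPS. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
def ensemble_crps(members, obs):
    """Standard ensemble CRPS estimator (assumed here, not the paper's code):
    mean |x_i - y| minus half the mean pairwise member spread."""
    m = len(members)
    err = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (2.0 * m * m)
    return err - spread


def crpss(forecast_ensembles, reference_ensembles, observations):
    """CRPSS = 1 - CRPS_forecast / CRPS_reference, averaged over all cases.
    Positive values mean the forecast beats the reference."""
    n = len(observations)
    crps_f = sum(ensemble_crps(f, o)
                 for f, o in zip(forecast_ensembles, observations)) / n
    crps_r = sum(ensemble_crps(r, o)
                 for r, o in zip(reference_ensembles, observations)) / n
    return 1.0 - crps_f / crps_r


# Toy case: a sharp, correct forecast against a wide reference ensemble.
toy_score = crpss([[1.0, 1.0]], [[0.0, 2.0]], [1.0])
```

In this toy case the forecast ensemble collapses onto the observation, so its CRPS is zero and the skill score reaches its upper bound of 1; a forecast identical to the reference would score 0.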
Figure S1 [find this and more information online (http://dx.doi.org/10.1175/BAMS-D-14-00003.2)] shows the statistics for each season, suggesting that the improvement from CFSv2 to NMME is not necessarily limited to the dry/low flow periods. For example, over the Yangtze, Pearl, Mekong (East Asia), and Niger (West Africa) basins during June–August (JJA) and over the Orange basin (South Africa) and Murray–Darling basin (Australia) during December–February (DJF), the NMME shows improvement against CFSv2 in the corresponding wet seasons. The improvement over low latitudes originates from better representation of oceanic forcings in other NMME models (Kirtman et al. 2014), while at high latitudes the improvement may be related to the enhancement in reliability from the multimodel ensemble (Yuan and Wood 2013) or a better description of cold season processes. In fact, Fig. 1 in Yuan and Wood (2013) shows that several NMME models (e.g., GFDL CM2.2 and CMC CanCM4) have higher drought onset detectability than the CFSv2 over high latitudes, but the underlying physical reasons for the improvement are unclear. Further diagnosis is needed on the estimation of solid precipitation and the representation of cryospheric processes (e.g., snow and/or frozen soil).
The improvement of NMME against CFSv2 in basin precipitation prediction [more examples can be found in Kirtman et al. (2014)] provides an opportunity to advance the hydrologic forecast over most GEWEX RHP basins. However, the initial condition also has a strong control on the seasonal hydrologic predictability (Shukla et al. 2013; van Dijk et al. 2013), and its spatiotemporal variation can result in quite different hydrologic predictability as compared with the predictability of precipitation. Therefore, this article focuses on a comparison between NMME-based and ESP-based forecasts of hydrologic extremes and explores to what extent the meteorological forecasts from state-of-the-art climate forecast models can improve the hydrologic forecasts relative to the traditional approach that only relies on the information from hydrologic initial conditions.
PRINCETON’S GLOBAL SEASONAL HYDROLOGIC FORECAST SYSTEM.
The global system draws from a legacy of national and continental systems (Luo and Wood 2007; Sheffield et al. 2014) developed by the Terrestrial Hydrology group at Princeton University. Its U.S. Drought Monitoring and Forecast System (Luo and Wood 2007; Yuan et al. 2013b) operates over the conterminous United States (CONUS) at 1/8° resolution and utilizes climate predictions from NCEP’s CFSv2 and observations from phase 2 of the North American Land Data Assimilation System (NLDAS-2; Xia et al. 2012a). The system has been transitioned to the NCEP Environmental Modeling Center (EMC) for operational drought prediction (www.emc.ncep.noaa.gov/mmb/nldas/forecast/TSM/perc/). Recently, a continental system for African flood and drought monitoring and forecasting (Sheffield et al. 2014; Yuan et al. 2013a) has been developed and installed at regional centers in sub-Saharan Africa.
The central part of a hydrologic forecast system is the land surface hydrologic model. As shown in Fig. 2, the hydrologic modeling part of the global system consists of the VIC land surface model and a global routing model. The VIC model (Liang et al. 1996), version 4.0.5, is used to predict soil moisture and runoff in this study. It is a semidistributed, grid-based hydrologic model with a mosaic representation of land cover and soil water storage capacity. The global routing model utilizes topographic data to derive flow velocity and direction and translates runoff from the VIC grid cells to the river network and routes the flows to the oceans or internal basins, therefore estimating streamflow globally. This first-order approximation of river routing allows for a globally continuous estimate of streamflow that is extremely computationally efficient and therefore can be utilized for both hydrologic monitoring and ensemble forecasting. The VIC model is calibrated over major river basins with 1° global resolution, using monthly streamflow data from Global Runoff Data Centre (GRDC) as compiled by Dai et al. (2009). The streamflow gauge locations and drainage areas for the basins and length of the records are listed in Table S1 (online supplemental material). The calibration is done using the shuffled complex evolution algorithm (Duan et al. 1994) based on long-term (1952–81) global historical model simulations forced by meteorological data from the Princeton Global Meteorological Forcing dataset (PGF; Sheffield et al. 2006). Except for some basins with heavy water resources management such as the Yellow and Murray–Darling basins, the Nash–Sutcliffe efficiency coefficients for the VIC monthly streamflow simulation vary between 0.6 and 0.9 for the calibration period (1952–81) and 0.5 and 0.8 for the validation period (1982–2006).
A preprocessor component bias corrects the monthly precipitation and temperature hindcasts from each NMME ensemble member using a simple quantile-mapping method (Wood et al. 2002) in a cross-validation mode (leaving the target year out for the climatological distribution). For each calendar month and each NMME model, all hindcasts during 1982–2009 (excluding the target year) with all ensemble members for the target month are used to construct cumulative distribution functions (CDFs) of the forecasts. The 62-yr (1948–2010 excluding the target year) PGF observations in that calendar month are used to construct CDFs of the observations. Both CDFs of forecasts and observations are then transformed to normal probability space through quantile mapping (Wood et al. 2002). Both individual members and NMME multimodel ensemble mean in the normal space are then transformed back to the original space using normal observations as the reference distribution. Finally, the bias-corrected individual members are adjusted according to the bias-corrected NMME multimodel ensemble mean to increase the sharpness (Yuan and Wood 2013). This differs from Princeton’s CONUS forecast system, which uses a Bayesian merging method to correct bias (Luo and Wood 2008; Yuan et al. 2013b). The reason is that it is difficult to obtain stable weights for different climate models that can outperform the arithmetic mean from a short sample of hindcast data (Doblas-Reyes et al. 2005; Yuan and Wood 2012b), especially for extremes. The bias-corrected forecasts are temporally downscaled to a daily time step by sampling from the historic PGF dataset and rescaling to match the monthly forecasts (“weather” generator). The daily meteorological forecasts are used to force the VIC model and the river routing model to produce soil moisture and streamflow forecasts, based on initial conditions taken from the historical offline simulation. 
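The core of the quantile-mapping step can be illustrated with a short sketch. It is a simplification: it maps a forecast value directly from the empirical hindcast CDF to the empirical observed CDF, whereas the procedure described above passes through normal probability space and then adjusts members toward the multimodel mean. The function names, the Weibull plotting position, and the toy samples are assumptions made for illustration only.

```python
from bisect import bisect_right


def leave_year_out(values_by_year, target_year):
    """Pool all values except those from the target year (cross validation)."""
    return [v for year, vals in values_by_year.items()
            if year != target_year for v in vals]


def empirical_prob(sorted_sample, value):
    """Nonexceedance probability with a Weibull plotting position rank/(n+1).
    The rank is clamped so the probability stays strictly inside (0, 1)."""
    n = len(sorted_sample)
    rank = min(max(bisect_right(sorted_sample, value), 1), n)
    return rank / (n + 1.0)


def empirical_quantile(sorted_sample, prob):
    """Inverse of empirical_prob, with linear interpolation between ranks."""
    n = len(sorted_sample)
    pos = prob * (n + 1) - 1.0          # fractional index into the sample
    if pos <= 0:
        return sorted_sample[0]
    if pos >= n - 1:
        return sorted_sample[-1]
    lo = int(pos)
    frac = pos - lo
    return sorted_sample[lo] + frac * (sorted_sample[lo + 1] - sorted_sample[lo])


def quantile_map(value, hindcasts, observations):
    """Replace a forecast value by the observed value at the same quantile."""
    prob = empirical_prob(sorted(hindcasts), value)
    return empirical_quantile(sorted(observations), prob)


# Toy example: hindcasts are twice as wet as the observed climatology.
corrected = quantile_map(4.0, [2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

Here a forecast of 4.0 sits at the same quantile as an observed value of 2.0, so the wet bias is removed. In a cross-validated setting, the hindcast and observation samples passed to `quantile_map` would first be filtered with `leave_year_out` so that the target year does not inform its own correction.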
The postprocessor removes the probabilistic bias (Yuan and Wood 2012a) in the soil moisture and streamflow forecasts and calculates extreme indices based on percentile thresholds.
EVALUATION OF HYDROLOGIC HINDCASTS.
The hydrologic hindcasts start from the first day of each calendar month during 1982–2009 and run out to six months, with 71 members for NMME VIC. These are compared with hindcasts based on ESP, which samples random 6-month sequences of daily meteorological data from the PGF dataset that are used to force VIC, to give 20 ensemble members for ESP VIC.
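The ESP sampling can be sketched as follows. Two details are assumptions rather than statements from the text: each member is taken from the same calendar window in a different historical year, and the target year is excluded from the pool. The function names, the seed, and the synthetic forcing dictionary are hypothetical.

```python
import random


def esp_member_years(target_year, first_year=1948, last_year=2010,
                     n_members=20, seed=0):
    """Draw distinct historical years to supply ESP forcing sequences.
    Excluding the target year is an assumption of this sketch."""
    pool = [y for y in range(first_year, last_year + 1) if y != target_year]
    return random.Random(seed).sample(pool, n_members)


def esp_forcing(daily_by_year, years, start_day, n_days):
    """Cut the same day-of-year window out of each selected year."""
    return [daily_by_year[y][start_day:start_day + n_days] for y in years]


# Synthetic daily record (one constant value per year) standing in for PGF.
fake_record = {y: [float(y)] * 365 for y in range(1948, 2011)}
years = esp_member_years(1995)
members = esp_forcing(fake_record, years, start_day=0, n_days=180)
```

Each resulting sequence would then force the hydrologic model from the same initial condition, so the ensemble spread reflects only the uncertainty in future meteorology.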
For each calendar month, the monthly soil moisture and streamflow are converted into percentiles. Droughts are then defined as monthly soil moisture or streamflow percentiles that are less than 20%, while wet spells are for those larger than 80%. The validation data for soil moisture and streamflow are from the VIC historical simulation.
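The percentile conversion and the 20%/80% thresholds can be written compactly. This is a minimal sketch; the plotting position used to compute the percentile and the function names are assumptions, and the climatology would in practice be built separately for each calendar month.

```python
def percentile_of(value, climatology):
    """Percentile of a value within a same-calendar-month climatology,
    using a Weibull plotting position 100 * rank / (n + 1) (an assumption)."""
    rank = sum(1 for c in climatology if c <= value)
    return 100.0 * rank / (len(climatology) + 1.0)


def classify(percentile, dry=20.0, wet=80.0):
    """Drought below the dry threshold, wet spell above the wet threshold."""
    if percentile < dry:
        return "drought"
    if percentile > wet:
        return "wet spell"
    return "normal"


# Toy climatology of nine values for one calendar month.
toy_climatology = [1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = [classify(percentile_of(v, toy_climatology)) for v in (1, 5, 9)]
```

With this toy climatology the three test values fall at the 10th, 50th, and 90th percentiles and are labeled drought, normal, and wet spell, respectively.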
Figure 3 shows the hit rate of 3-month soil moisture (agricultural) drought from ESP VIC and NMME VIC hindcasts at different lead times over the hindcast period (1982–2009). The hit rate is the fraction of actual events that are predicted (the detectability; see appendix for details). The initial hydrologic conditions (as shown by the ESP VIC results) have a strong control on agricultural drought over the Northern Hemisphere, high-latitude basins, and two of the African basins (Fig. 3, left column). The detectability is not negligible (e.g., >0.2) even in some midlatitude basins at short leads. For example, more than 30% of agricultural droughts over the Mississippi basin can be detected by ESP VIC one season ahead (Fig. 3, top-left panel), and the detectability can be as high as 50% during the dry season (Fig. S2, top-left panel). Unlike the precipitation forecast (Fig. 1), NMME VIC’s agricultural drought detectability is not necessarily limited to ENSO-affected regions and is actually a compromise between climatic and hydrologic predictability. For instance, NMME has high precipitation predictive skill over the Amazon basin (Fig. 1) because of the ENSO influence, but NMME VIC has low agricultural drought detectability because of lower predictability from the initial hydrologic conditions (Fig. 3). In contrast, NMME VIC has high detectability over high-latitude basins where the influence of initial condition is strong (Fig. 3), regardless of the low precipitation predictive skill (Fig. 1). As a result, NMME VIC improves drought detectability compared to ESP VIC over basins with moderate control from both remote large-scale oceanic conditions and local initial hydrologic conditions, which are mostly located in midlatitude areas (e.g., Mississippi and Yangtze). 
For those midlatitude basins, NMME VIC has consistently higher detectability than ESP VIC throughout different seasons, although both methods have higher detectability during dry seasons than during wet seasons (Fig. S2). Similar results can be found for wet spells (not shown), but in general wet spells are less detectable than droughts because of less influence from the initial condition and lower skill in precipitation prediction.
Hit rate (detectability) is one aspect of forecast quality. However, as indicated by Yuan and Wood (2013), the models with high meteorological drought hit rates usually have high false alarm ratios (the fraction of “yes” forecasts that turn out to be wrong; see the appendix for details). Therefore, a balanced index, the equitable threat score (ETS; Wilks 2011; see appendix) that considers both the hit rate and the false alarm ratio is used to quantify the contribution of NMME beyond ESP for the predictive skill of soil moisture extremes. Figure 4 shows the maximum forecast leads for which NMME VIC has significantly (p < 0.05) higher ETS than ESP VIC for the prediction of droughts and wet spells. The Student’s t test is used here, and samples come from ETS for each grid cell within the basin. For droughts that last for at least 1–2 months, NMME VIC prediction is significantly better than ESP VIC up to 3–6 months over the basins with short soil moisture memory, such as tropical and midlatitude basins (Figs. 4a,b). For the basins with long soil moisture memory, NMME VIC outperforms ESP VIC only if the NMME has high predictive skill for the precipitation (CRPSS > 0.1; Fig. 1), such as the Nile basin in Africa. For the Niger basin in West Africa with long soil moisture memory, NMME only has moderate precipitation predictive skill (Fig. 1), and so NMME VIC does not have significantly higher ETS than ESP VIC (Figs. 4a,b). For 3-month droughts, NMME VIC also has significantly higher ETS than ESP VIC up to 2–3 months for more than 40% of the basins.
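The three verification measures used above all derive from a 2 × 2 contingency table of forecast and observed events. The sketch below follows the standard definitions (e.g., Wilks 2011); the exact formulation used in this study is given in the appendix, and the function names and toy event series are hypothetical.

```python
def contingency_counts(forecast_events, observed_events):
    """Count hits (a), false alarms (b), misses (c), correct negatives (d)."""
    a = b = c = d = 0
    for f, o in zip(forecast_events, observed_events):
        if f and o:
            a += 1
        elif f:
            b += 1
        elif o:
            c += 1
        else:
            d += 1
    return a, b, c, d


def hit_rate(a, b, c, d):
    """Fraction of observed events that were forecast."""
    return a / (a + c) if (a + c) else float("nan")


def false_alarm_ratio(a, b, c, d):
    """Fraction of 'yes' forecasts that did not verify."""
    return b / (a + b) if (a + b) else float("nan")


def equitable_threat_score(a, b, c, d):
    """Threat score adjusted for the number of hits expected by chance."""
    n = a + b + c + d
    a_random = (a + b) * (a + c) / n
    denom = a + b + c - a_random
    return (a - a_random) / denom if denom else float("nan")


# Toy series of eight forecast/observation pairs (1 = event, 0 = no event).
counts = contingency_counts([1, 1, 0, 0, 1, 0, 0, 0],
                            [1, 0, 0, 0, 1, 1, 0, 0])
scores = (hit_rate(*counts), false_alarm_ratio(*counts),
          equitable_threat_score(*counts))
```

For the toy series there are two hits, one false alarm, one miss, and four correct negatives, giving a hit rate of 2/3 and a false alarm ratio of 1/3. The ETS is lower than the raw threat score because 1.125 of the hits would be expected by chance alone.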
The differences for wet spells (Figs. 4d–f) are much smaller than for droughts, which is expected because of lower skill of the climate models in predicting wet conditions. Similar to drought, the differences between NMME VIC and ESP VIC usually decrease with an increase of the wet spell duration. Some of the exceptions to this are the Columbia and Colorado basins in North America, and the Murray–Darling in Australia, where NMME’s advantage emerges as the duration increases.
The results of maximum forecast leads for individual seasons are shown in Fig. S3. Because of the insufficient number of samples for the 2- and 3-month droughts or wet spells, only the results for 1-month duration are plotted. Regardless of the season, NMME VIC drought prediction is significantly better than ESP VIC up to six months over the Mississippi basin. Even for the wet spells, the NMME VIC prediction has significantly higher skill than ESP VIC up to 4–5 months over the Mississippi during wet seasons [March–May (MAM) and JJA].
Given that several seasonal forecast systems based on CFSv1 or CFSv2 have been developed over the United States (Luo and Wood 2008; Mo et al. 2012; Yuan et al. 2013b), Fig. S4 compares the NMME/VIC system with the CFSv2/VIC for drought prediction. Note that CFSv2/VIC in this study has the same resolution as NMME VIC (i.e., 1°), which is coarser than previous studies. In fact, a high-resolution (1/8°) NMME VIC system is currently being developed at Princeton University by following the work of Yuan et al. (2013b). Similar to Fig. 4, Fig. S4 shows the maximum forecast leads when CFSv2/VIC is used as the reference forecast. There is no significant difference over the Columbia basin. The NMME VIC prediction is significantly better than CFSv2/VIC up to two months over the Colorado basin for 1- and 2-month droughts and over the Mississippi basin for 2-month droughts.
The streamflow forecasts at the outlet of each basin are also assessed. Figure 5 shows the CRPSS of monthly streamflow predicted by NMME VIC, using ESP VIC as the reference forecast, and Fig. S5 shows the results for different seasons. NMME VIC is generally more skillful than ESP VIC over most regimes. However, the biggest improvement does not necessarily occur in the first month of the forecast (e.g., Amazon, Yangtze, and Murray–Darling). This is another example of the compromise between predictive skill derived from the climate forcing and the initial hydrological conditions. The effects of initial condition on streamflow over these large basins are expected to be larger than the effects on grid-scale runoff because of the memory from upstream areas. This sometimes results in a negligible difference between NMME and ESP skill for the streamflow forecast at the beginning of the forecast, even though the precipitation predictive skill of NMME is significantly higher than ESP at short leads.
Figure 6 shows the probabilistic forecast quality for low, normal, and high flow conditions, using hindcast samples from all basins. ESP VIC has reliable predictions (results fall along the diagonal lines) for hydrologic droughts and wet spells (red and blue lines, respectively) at 0.5-month lead, but it becomes overconfident as the forecast proceeds. Although the ESP forcings (climatological precipitation and temperature) are very reliable, they are not necessarily reliable for predicting hydrologic extremes. This is especially true for forecasts that start with anomalously dry or wet initial conditions, for which the ensemble of historical forcings tends to bring the hydrological states to neutral conditions, thus degrading the reliability for extremes. In contrast, NMME VIC maintains the reliability for drought and wet conditions much better, even out to 5.5-month lead. Similar to the 2-m temperature forecasts illustrated in Kirtman et al. (2014), the reliability for neutral streamflow conditions is more difficult to maintain for longer forecast lead times (Fig. 6, gray lines).
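A reliability curve of the kind discussed here pairs the forecast probability of an event with the frequency at which the event is actually observed. The following sketch assumes that the forecast probability is the fraction of ensemble members below (or above) a percentile threshold and that probabilities are grouped into equal-width bins; the function names, the number of bins, and the toy values are hypothetical.

```python
def event_probability(member_percentiles, threshold=20.0, below=True):
    """Forecast probability = fraction of members in the event category."""
    if below:
        count = sum(1 for p in member_percentiles if p < threshold)
    else:
        count = sum(1 for p in member_percentiles if p > threshold)
    return count / len(member_percentiles)


def reliability_curve(forecast_probs, outcomes, n_bins=5):
    """Return (mean forecast probability, observed frequency, sample count)
    for each nonempty probability bin. A reliable forecast lies on the
    diagonal, i.e., the first two entries of each tuple are equal."""
    prob_sum = [0.0] * n_bins
    event_sum = [0] * n_bins
    counts = [0] * n_bins
    for p, occurred in zip(forecast_probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)   # p = 1.0 goes in the last bin
        prob_sum[k] += p
        event_sum[k] += 1 if occurred else 0
        counts[k] += 1
    return [(prob_sum[k] / counts[k], event_sum[k] / counts[k], counts[k])
            for k in range(n_bins) if counts[k]]


# Toy case: low probabilities never verify, high probabilities always do.
toy_curve = reliability_curve([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
```

In the toy case the forecasts are slightly underconfident: events forecast with 10% probability never occur and those forecast with 90% probability always occur. An overconfident forecast would show the opposite pattern, with observed frequencies closer to climatology than the forecast probabilities.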
REAL-TIME FORECASTING OF THE 2012 CENTRAL U.S. DROUGHT.
The 2012 summertime drought over the central United States was the most severe seasonal drought in the past 100 years (Hoerling et al. 2014). Most seasonal climate forecast models including CFSv2 failed to predict the meteorological drought well (Kumar et al. 2013; Hoerling et al. 2014). However, some NMME models such as GFDL CM2.2 did capture the 2012 drought (Kam et al. 2014). Kirtman et al. (2014) also showed that an NMME-based 6-month standardized precipitation index that blends antecedent observations with the seasonal forecasts had some skill for the 2012 drought. Here we test the capability of the NMME-based hydrologic forecast system in predicting the 2012 agricultural drought as an extension of the previous results for meteorological drought forecasts (Hoerling et al. 2014; Kirtman et al. 2014).
The approximate “real-time” forecast is done by bias correcting the NMME climate forcings for 2012 using all hindcast data during 1982–2009, which differs from the cross-validation mode of leaving out the target year during the hindcast period. Real-time observational data are used to run the VIC model up to the start of the forecast to produce the initial conditions. The real-time observational data are taken from the Climate Prediction Center (CPC) Unified Gauge-Based Analysis for precipitation (Chen et al. 2008) and the Climate Forecast System Reanalysis (CFSR; Saha et al. 2010) for other meteorological variables. These are used to extend the PGF data after 2010 and are adjusted to match the monthly climatology (1948–2010) of PGF through quantile mapping. The bias-corrected data are then used to force the VIC model from 2011 to 2012 to generate the initial conditions as well as reference soil moisture data to evaluate the forecasts. Note that an operational real-time forecast would be subject to biases in the real-time meteorological forcings that are likely to be high in regions with sparse gauge networks.
Figure 7 shows the 6-month soil moisture drought area forecasts for 20 ESP VIC members and 71 NMME VIC members initialized on two dates: February 2012 and June 2012. Before the drought onset, ESP VIC has some skill in the first two months (February and March forecast in Fig. 7a). After March, almost all ESP VIC ensemble members underestimate the drought area during 2012, especially during the summer when the drought is quite severe. The NMME VIC grand ensemble encompasses the evolution of the reference drought area (solid black line) quite well (Fig. 7b), and the ensemble mean (blue line) is also much closer to the reference drought area than ESP VIC. Nevertheless, the ensemble mean of NMME VIC shows an earlier drought recovery than the reference data, which indicates the difficulty of predicting drought recovery.
Besides evaluating the 2012 drought forecast with the VIC offline simulation, we also compared the forecast with two satellite-based estimates of the drought: the multidecadal (1979–2013) essential climate variable for soil moisture (ECV_SM) dataset that homogenizes and merges six microwave-based satellite soil moisture retrievals (Liu et al. 2011; Dorigo et al. 2015) and the 11-yr (2003–13) Gravity Recovery and Climate Experiment (GRACE) terrestrial water storage dataset (Wahr et al. 2004). Figure 7 shows that the ECV_SM (dashed black lines) matches the VIC offline simulation (solid black lines) quite well before the drought onset, but they diverge slightly as the drought emerges. This difference can be attributed in part to the representative depth of the satellite soil moisture retrieval, which is for a very thin surface layer (∼1 cm) due to the frequency of the sensors, and to the larger retrieval errors in more densely vegetated regions. The GRACE data (plus symbols) have larger seasonal variations and show a larger drought area than both the VIC simulation and the ECV_SM, because they represent changes in total water storage, which includes surface water bodies (e.g., lakes and reservoirs) and groundwater that are not represented by the other datasets. Additionally, GRACE has a short climatology (11 years) that contributes to a larger uncertainty in its seasonal climatology from which the percentiles are estimated. Nonetheless, the satellite data provide useful information for validating the hydrologic forecasts, especially if their corresponding time scales and uncertainties are well understood.
OCEANIC AND LAND PRECURSORS FOR HYDROLOGIC EXTREMES.
Because of the chaotic nature of the atmosphere, seasonal prediction relies heavily on the memory imparted by both the ocean and land, as do the predictions of hydrologic extremes. For instance, ENSO is recognized as the largest source of seasonal predictability, and tropical SST anomalies not only alter the Walker circulation and convection in the tropics because of the positive feedbacks between SSTs and wind (Walker and Bliss 1932; Smith et al. 2012) but also affect the climate in midlatitudes through Rossby wave trains (Hoskins and Karoly 1981; Trenberth and Caron 2000). To investigate the impact of ENSO on hydrologic prediction over the GEWEX basins, differences in composite soil moisture percentiles between selected El Niño and La Niña years (i.e., average soil moisture percentiles in El Niño years minus those in La Niña years) during 1982–2009 are shown in Fig. 8. The years are selected following Smith et al. (2012), who classified them by using a detrended 100-yr SST time series. NMME VIC reproduces the ENSO influence on soil moisture during wintertime very well (Figs. 8a,b) but underestimates its impact over the North American monsoon and East Asian monsoon regions during summertime (Figs. 8e,f). In general, the responses of seasonal soil moisture to ENSO are roughly captured by the CGCM-Hydrology forecast system. There are moderate differences for the summertime composite among individual models: GFDL CM2.2 and two Canadian models are better over North American basins, while NCEP CFSv2 is better over East Asian basins (not shown).
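The composite difference described above (mean soil moisture percentile in El Niño years minus the mean in La Niña years) can be sketched for a single grid cell or basin as follows. This is only a minimal illustration: the function name, the dictionary layout, and the sample years and percentile values are assumptions made for the example, not the Smith et al. (2012) year selection or data from this study.

```python
from statistics import mean


def composite_difference(percentiles_by_year, el_nino_years, la_nina_years):
    """Mean percentile over El Nino years minus mean over La Nina years.

    percentiles_by_year: dict mapping year -> seasonal soil moisture
    percentile (0-100) for one location. Years missing from the dict
    are skipped.
    """
    warm = [percentiles_by_year[y] for y in el_nino_years if y in percentiles_by_year]
    cold = [percentiles_by_year[y] for y in la_nina_years if y in percentiles_by_year]
    if not warm or not cold:
        raise ValueError("need at least one El Nino year and one La Nina year")
    return mean(warm) - mean(cold)


# Hypothetical values for illustration only (not the years used in the study).
soil_moisture = {1983: 70.0, 1987: 60.0, 1989: 30.0, 1999: 20.0}
diff = composite_difference(soil_moisture, [1983, 1987], [1989, 1999])
print(diff)  # 40.0
```

A positive value indicates wetter soil in the El Niño composite than in the La Niña composite at that location; applying the same function to forecast and reference percentiles gives the two fields that are compared in a composite map.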
Besides ENSO, the initial soil moisture is also thought to influence both the potential and actual subseasonal to seasonal climate predictability via land–atmosphere coupling (Koster et al. 2004, 2006, 2010). However, the association of oceanic and land precursors with model performance for individual hydrologic extreme events at the global scale is still unclear. Here we investigate the ENSO and soil moisture associations based on the joint distribution of the forecast and observation for the onsets of droughts and wet spells. We would like to answer the following question: What are the probability distributions of antecedent oceanic and land conditions for hit cases (observed extreme events that are captured by the models) and for false alarms (forecasted extreme events that do not occur in the observation)? The onset events of droughts or wet spells are defined as three continuous months when the soil moisture percentile is consistently below 20% or above 80%, respectively.
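The onset definition above can be illustrated with a short sketch that scans a monthly percentile series for runs of at least three consecutive months below 20% (drought) or above 80% (wet spell). Taking the first month of each qualifying run as the onset month is an assumption of this example, as are the function and parameter names.

```python
def find_onsets(percentiles, threshold=20.0, below=True, min_months=3):
    """Return indices of the first month of each qualifying run.

    A run qualifies when the percentile stays below `threshold`
    (below=True, drought) or above it (below=False, wet spell) for at
    least `min_months` consecutive months.
    """
    onsets = []
    run_start = None
    for i, p in enumerate(percentiles):
        extreme = p < threshold if below else p > threshold
        if extreme:
            if run_start is None:
                run_start = i
            # Record the run once, at the moment it reaches min_months.
            if i - run_start + 1 == min_months:
                onsets.append(run_start)
        else:
            run_start = None
    return onsets


# Hypothetical monthly soil moisture percentiles for one location.
series = [50, 15, 10, 18, 40, 19, 12, 55, 5, 8, 9, 11]
print(find_onsets(series))  # [1, 8]
print(find_onsets([85, 90, 95, 50], threshold=80.0, below=False))  # [0]
```

In the first series, the two-month dry run at indices 5 and 6 is not counted because it is shorter than three months.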
Figure 9 shows the spatial frequency of conditional mean, antecedent Niño-3.4 SST absolute anomaly and the initial soil moisture percentiles calculated over the GEWEX basins. For example, the green lines (from six NMME models) in Fig. 9a represent the frequency distributions of seasonal mean Niño-3.4 SST (three months before the onset of extreme events) averaged over those forecast events where the models issue a soil moisture drought onset forecast (fcst = T) but drought does not occur in the observation (obs = F), where T and F represent that drought occurs or does not occur, respectively, either for the forecast (fcst) or observation (obs). Red and blue curves are for detected and missed events, respectively. The higher peaks of the blue curves (fcst = F) compared to those of the green and red curves (fcst = T) around small SST anomaly values indicate that climate models have a higher chance of missing the agricultural drought onset when the antecedent SST anomaly is smaller (Fig. 9a). As the SST anomaly increases, climate models have a higher chance of issuing a drought forecast than missing a drought (i.e., the red and green curves show higher frequency than the blue curves), but there is also a higher chance of a false alarm (green curves). This is similar to the meteorological drought analysis in Yuan and Wood (2013). The association for the wet spell onset forecast (Fig. 9b) is similar to that for the drought onset, but the models differ moderately in their false alarms (green curves).
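The conditioning used in this analysis (grouping events by the joint outcome of forecast and observation, then examining the antecedent SST anomaly within each group) can be sketched as below. The tuple layout, category labels, and sample values are assumptions for illustration; the actual analysis builds frequency distributions across grid cells and models rather than simple lists.

```python
def classify(fcst, obs):
    """Label one forecast/observation pair of onset events."""
    if fcst and obs:
        return "hit"            # fcst = T, obs = T (detected)
    if fcst and not obs:
        return "false_alarm"    # fcst = T, obs = F
    if obs:
        return "miss"           # fcst = F, obs = T
    return "correct_negative"   # fcst = F, obs = F


def group_precursor(events):
    """Collect the absolute antecedent SST anomaly by outcome category.

    events: iterable of (fcst, obs, sst_anomaly) tuples.
    """
    groups = {"hit": [], "false_alarm": [], "miss": [], "correct_negative": []}
    for fcst, obs, sst_anomaly in events:
        groups[classify(fcst, obs)].append(abs(sst_anomaly))
    return groups


# Hypothetical events: (forecast onset?, observed onset?, Nino-3.4 anomaly in K).
events = [
    (True, True, 1.2),
    (True, False, -0.9),
    (False, True, 0.1),
    (False, True, -0.2),
    (False, False, 0.3),
]
groups = group_precursor(events)
print(groups["miss"])  # [0.1, 0.2]
```

Histograms of the values in each group correspond to the red (hit), green (false alarm), and blue (miss) curves discussed above; the same grouping applies to initial soil moisture percentiles if they replace the SST anomaly.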
This asymmetric performance for predicting soil moisture droughts and wet spells is more obvious in the analysis of land precursors (Figs. 9c,d). The spread of initial soil moisture percentiles for model-predicted drought events (red and green curves, Fig. 9c) is smaller than that for wet spell events (Fig. 9d), suggesting that there is less dependence of wet spell onset forecast on the initial land conditions than for the drought onset. In fact, as the drought occurs, initial soil moisture memory (anomaly) could persist for a period of time, while when a wet spell occurs, an individual rainfall event can sometimes erase all soil moisture memory. This interacts with the atmospheric asymmetry mentioned above and amplifies the difference between dry and wet conditions. Figures 9c,d also demonstrate that the missed drought (wet spell) events (blue curves) are associated with higher (lower) initial soil moisture (i.e., less information from the land precursor). Therefore, some droughts and wet spells occur without clear SST and soil moisture precursors (e.g., the 2012 central U.S. drought), and they are the most difficult to predict at seasonal time scales. The differences in the red and green curves in Figs. 9c,d are larger than those in Figs. 9a,b, suggesting that the oceanic precursor facilitates a higher probability that the climate models issue an extreme forecast (although it is sometimes difficult to determine whether the forecast is correct or a false alarm), while the land precursor will reduce the probability of false alarms.
BEYOND THE NORTH AMERICAN MULTI-MODEL ENSEMBLE.
Phase 2 of the NMME project (Kirtman et al. 2014) will provide higher-temporal-resolution datasets (e.g., three hourly) with more variables besides precipitation, 2-m surface air temperature, and SST. This will enable a more comprehensive diagnostic study that can provide feedback to model development. Nevertheless, in terms of applications (e.g., hydrologic forecasts), there are other concerns. As pointed out by Yuan and Wood (2012b), six of the seven original NMME models (without the two Canadian models) use the ocean model developed at GFDL, which to some extent may result in similarity or overconfidence in the seasonal climate forecasts. While combining those seven models does not gain much predictability in terms of a deterministic forecast, skill can be increased by including European models, which are considered to be more independently developed (Yuan and Wood 2012b).
To explore this, we briefly evaluate the benefit of combining the NMME models with those in the Climate-System Historical Forecast Project (CHFP; Kirtman and Pirani 2009). Figure 10 shows the CRPSS for basin-averaged, May–July (MJJ) mean precipitation predicted by NMME and NMME+CHFP. Because of data availability during 1982–2009, the CHFP models used here only include the European Centre for Medium-Range Weather Forecasts (ECMWF) Seasonal Forecast System 4 (S4; Molteni et al. 2011; Dutra et al. 2013), two models from Japan [Meteorological Research Institute Coupled Atmosphere–Ocean General Circulation Model, version 3 (MRI-CGCM3) and Model for Interdisciplinary Research on Climate, version 5 (MIROC5)], one model from Germany [Max Planck Institute Earth System Model (MPI-ESM)], and one from Australia [Predictive Ocean Atmosphere Model for Australia (POAMA-2)]. Figure 10 shows that including CHFP does not improve the MJJ precipitation prediction over the basins in North America (except for Columbia, but MJJ is a transition season between the snow-dominated winter and mostly dry summer), suggesting that the NMME is the best multimodel ensemble in predicting hydroclimate over North America. Nevertheless, improvement over the basins in East Asia and Australia is not negligible (CRPSS difference larger than 0.05) in basins such as the Yangtze, Mekong, Ganges, and Murray–Darling. Therefore, increasing the number of international models (and presumably the level of international collaboration) may be necessary to advance hydrological forecasting at the global scale.
An alternative multimodel ensemble is the ensemble of multiple land surface hydrologic models. In fact, work over the past decade has shown that hydrological models, even when forced with identical atmospheric boundary conditions, can produce results that are substantially different (Dirmeyer et al. 2004; Mitchell et al. 2004; Duan et al. 2007; Wang et al. 2009; Xia et al. 2012b). Recently, Mo and Lettenmaier (2014a) have reported the challenge of applying hydrologic models in monitoring the droughts with different severity categories, and Nijssen et al. (2014) have developed a prototype global drought information system based on multiple land surface models. These studies suggest that augmenting the NMME-based multimodel hydrologic forecasting system with multiple land surface models would enhance its capability in handling the hydrologic extremes with different severity levels.
A global seasonal hydrologic forecasting system based on the NMME climate forecast models and VIC land surface hydrologic model has been established, and its performance against the traditional ESP forecast approach in predicting droughts and wet spells is assessed over the GEWEX RHP basins for a 28-yr hydrologic hindcast experiment and a “real time” case study. The ESP forecast skill relies on the information from the initial hydrological conditions, and so the comparison between the output of the NMME VIC and ESP VIC provides an opportunity to quantify the origin of hydrological predictability from the ocean and land states.
NMME VIC improves drought detectability against ESP VIC mostly over midlatitude basins where the controls of both remote large-scale oceanic states and local initial hydrological conditions are moderate. It is found that NMME VIC has significantly (p < 0.05) higher ETS values than ESP VIC up to 3–6 months over basins with short soil moisture memory. In terms of the streamflow forecasts, the NMME VIC is superior to ESP VIC for accuracy and reliability, and the biggest improvement does not necessarily occur in the first month of the forecast. A real-time forecast of the 2012 central U.S. drought shows that none of the ESP VIC ensemble members is able to forecast the drought onset; however, the NMME VIC grand ensemble covers the evolution of drought area quite well, with an ensemble mean closer to the reference data. The association of the onsets of extreme hydrologic events with oceanic and land precursors is also investigated on the basis of the joint distribution of the forecast and observation. Climate models have a higher probability of missing the onset of hydrologic extremes when the antecedent SST anomaly is smaller. Larger SST anomalies offer a higher probability for the models to issue a forecast for extremes but also bring a higher probability of issuing a false alarm. The probability of such a false alarm can be reduced if there is a large anomaly in land surface conditions.
Overall, the global hydrologic forecast system established in this study shows encouraging performance when compared with the ESP approach for predicting hydrologic extremes, such as higher detectability for historical soil moisture droughts, more reliable streamflow ensemble forecasts for low or high flow conditions, and better prediction for the 2012 North American extreme drought in a real-time forecast mode. The system also shows the potential for successfully utilizing climate models to advance GEWEX RHP. A website is being established to make real-time hydrological forecasts available, drawing from the existing Princeton CONUS and African monitoring and seasonal forecast websites (http://hydrology.princeton.edu/forecast; http://hydrology.princeton.edu/adfm). Linking the real-time forecasting of soil moisture and streamflow with impact models for predicting reservoir inflow, crop yield, and wild fire, etc. will amplify the usefulness of the system. Therefore, the NMME VIC system can serve as a prototype system for the Global Framework for Climate Services (GFCS), both in providing hydroclimate information services and in contributing to the science underpinning the prediction and predictability of the terrestrial hydrologic systems, including droughts and wet spells.
However, initial tests also indicate that the superiority of climate model-based streamflow forecasts tends to diminish when the observed streamflow data are used for validation. There are a number of reasons for this, which include the often poor representation of water resources management (e.g., reservoir operation, irrigation) in land surface models and inadequate process parameterization in the hydrological models. Examples of the latter include surface–subsurface interactions; regional to continental surface water transportation (river routing); insufficient parameters for soil and vegetation properties; and inadequate hydrologic model calibration. Addressing these deficiencies requires statistical postprocessing (e.g., Yuan and Wood 2012a; Ye et al. 2014), an ensemble of multiple land surface hydrologic models, and/or model improvements, especially for simulating water resources management (Jaranilla-Sanchez et al. 2011; Wang et al. 2012). Uncertainties in the observed forcings (e.g., precipitation) can also be significant over basins with sparse in situ observations, which may be a nontrivial problem when implementing a hydrologic forecast system globally.
With the planned release of higher-temporal-resolution NMME datasets (phase 2), which will add land surface conditions and pressure level atmospheric variables to monthly precipitation, 2-m temperature, and SST, there will be opportunities to diagnose more completely the aspects of individual model performance that influence seasonal extreme predictability (e.g., stationary waves, land–atmosphere coupling) and to provide feedback to model development. Furthermore, the benefit of incorporating climate models from international centers for improving hydrological predictability globally calls for an international ensemble seasonal prediction system.
The research was supported by the NOAA Climate Program Office through Grants NA10OAR4310246 and NA12OAR4310090. The first author also acknowledges the Thousand Talents Program for Distinguished Young Scholars. We thank the International Research Institute for Climate and Society (IRI) for making the NMME forecast information available. We thank three anonymous reviewers for their comments, Antje Weisheimer for providing the link to the CHFP data, and Ming Pan for introducing the ECV_SM data. We acknowledge WCRP CLIVAR WGSIP and CIMA for the CHFP data and acknowledge PICSciE OIT at Princeton University for the supercomputing support.
APPENDIX: HIT RATE, FALSE ALARM RATIO, ETS, AND CRPSS.
The nonprobabilistic forecasts for discrete predictands (e.g., a drought event) can be verified by several measures that are based on a 2 × 2 contingency table. Taking the drought event forecast as an example, define a as the number of events when drought occurs in both the forecast and observation, b for when drought occurs in the forecast but not in the observation, c for when drought occurs in the observation but not in the forecast, and d for when drought does not occur in either the forecast or observation. Then, the hit rate is
$$\mathrm{HR} = \frac{a}{a + c},$$
which is also called the probability of detection. The false alarm ratio is
$$\mathrm{FAR} = \frac{b}{a + b},$$
which is the fraction of “drought” forecasts that turn out to be false. The ETS is
$$\mathrm{ETS} = \frac{a - a_{\mathrm{ref}}}{a + b + c - a_{\mathrm{ref}}},$$
where $a_{\mathrm{ref}} = (a+b)(a+c)/(a+b+c+d)$.
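As a worked illustration of the contingency-table measures, the sketch below computes the hit rate, false alarm ratio, and ETS from the four counts a, b, c, and d using their standard definitions, with the reference term a_ref as given above. The function name and the sample counts are hypothetical, and the sketch assumes that a + c, a + b, and the ETS denominator are all nonzero.

```python
def contingency_scores(a, b, c, d):
    """Return (hit rate, false alarm ratio, ETS) from a 2 x 2 table.

    a: event forecast and observed (hits)
    b: event forecast but not observed (false alarms)
    c: event observed but not forecast (misses)
    d: event neither forecast nor observed (correct negatives)
    """
    n = a + b + c + d
    hit_rate = a / (a + c)            # probability of detection
    false_alarm_ratio = b / (a + b)   # fraction of event forecasts that fail
    a_ref = (a + b) * (a + c) / n     # hits expected by chance
    ets = (a - a_ref) / (a + b + c - a_ref)
    return hit_rate, false_alarm_ratio, ets


# Hypothetical counts for illustration only.
hr, far, ets = contingency_scores(a=20, b=10, c=5, d=65)
print(round(hr, 3), round(far, 3), round(ets, 3))  # 0.8 0.333 0.455
```

Because a_ref removes the hits expected from random forecasts with the same event frequencies, the ETS is 0 for a forecast with no skill beyond chance and 1 for a perfect forecast.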
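The probabilistic forecasts for continuous predictands (e.g., precipitation) can be verified through the CRPSS. First, the CRPS is defined as
$$\mathrm{CRPS} = \int_{-\infty}^{\infty} \left[ F(y) - F_o(y) \right]^2 \, dy,$$
where $F(y)$ is the CDF of the forecast with a predictand value of $y$ and $F_o(y)$ is the CDF of the observation and
$$F_o(y) = \begin{cases} 0, & y < \text{observed value} \\ 1, & y \geq \text{observed value}. \end{cases}$$
Then, the CRPSS is defined as
$$\mathrm{CRPSS} = 1 - \frac{\mathrm{CRPS}}{\mathrm{CRPS}_{\mathrm{ref}}},$$
where $\mathrm{CRPS}_{\mathrm{ref}}$ is the CRPS from the reference forecast (e.g., the climatological forecast used in this study). So, a value of CRPSS = 0.2, for example, indicates that the probabilistic forecast error is 20% less than the climatological forecast error.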
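For an ensemble forecast, the CRPS can be evaluated without explicit integration when the forecast CDF is taken to be the empirical step function of the ensemble members. The sketch below uses the well-known identity CRPS = mean|x_i − y| − (1/2) mean|x_i − x_j| for that case, and averages the CRPS over forecast cases before forming the skill score. Both choices, along with the function names and sample values, are assumptions for illustration and not necessarily the exact procedure used in this study.

```python
def crps_ensemble(members, observation):
    """CRPS of one ensemble forecast, using the empirical ensemble CDF."""
    m = len(members)
    # Mean absolute error of the members against the observation.
    term_obs = sum(abs(x - observation) for x in members) / m
    # Mean absolute difference between all ordered pairs of members.
    term_spread = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return term_obs - 0.5 * term_spread


def crpss(forecasts, references, observations):
    """Skill score 1 - CRPS/CRPS_ref, with CRPS averaged over all cases."""
    cases = len(observations)
    crps_fcst = sum(crps_ensemble(f, o) for f, o in zip(forecasts, observations)) / cases
    crps_ref = sum(crps_ensemble(r, o) for r, o in zip(references, observations)) / cases
    return 1.0 - crps_fcst / crps_ref


# Hypothetical single case: a sharper forecast versus a broader reference.
print(round(crps_ensemble([1.0, 2.0, 3.0], 2.0), 4))             # 0.2222
print(round(crpss([[1.0, 2.0, 3.0]], [[0.0, 2.0, 4.0]], [2.0]), 4))  # 0.5
```

In this example the forecast ensemble has half the CRPS of the reference ensemble, so the CRPSS is 0.5, meaning that the probabilistic forecast error is 50% less than the reference error.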
A supplement to this article is available online (DOI:10.1175/BAMS-D-14-00003.2)