We examine reforecasts of flash droughts over the United States for the late spring (April–May), midsummer (June–July), and late summer/early autumn (August–September) with lead times up to 3 pentads based on the NOAA second-generation Global Ensemble Forecast System reforecasts version 2 (GEFSv2). We consider forecasts of both heat wave and precipitation deficit (P deficit) flash droughts, where heat wave flash droughts are characterized by high temperature and depletion of soil moisture and P deficit flash droughts are caused by lack of precipitation that leads to (rather than being caused by) high temperature. We find that the GEFSv2 reforecasts generally capture the frequency of occurrence (FOC) patterns. For late spring, the equitable threat score (ETS) of heat wave flash drought forecasts is statistically significant up to 2 pentads in the north-central and Pacific Northwest regions, where heat wave flash droughts are most likely to occur. The GEFSv2 reforecasts capture the basic pattern of the FOC of P-deficit flash droughts and also are skillful up to a lead of about 2 pentads. However, the reforecasts overestimate the P-deficit flash drought FOC over parts of the Southwest in late spring, leading to large false alarm rates. For autumn, the reforecasts underestimate P-deficit flash drought occurrence over California and Nevada. The GEFSv2 reforecasts are able to capture the approximately linear relationship between evaporation and soil moisture, but the lack of skill in precipitation forecasts limits the skill of P-deficit flash drought forecasts.
Flash droughts have received considerable attention since the rapidly evolving 2012 central U.S. event (Hoerling et al. 2014), and subsequent events in 2017 over the northern Great Plains, and 2019 over the southern states. While much remains to be learned about flash droughts (Pendergrass et al. 2020), a key feature is their rapid intensification. The quick onset and tendency in the United States to occur over the Great Plains and southern tier of states (Mo and Lettenmaier 2015, 2016) can cause large agricultural losses; for example, the 2017 event that lasted from mid-May to June resulted in an estimated $2.6 billion in agricultural losses (Jensco et al. 2019). The 2019 flash drought that lasted from 16 July to 12 August also demonstrated linkages between rapid drying of vegetation and subsequent wildfires (Di Liberto 2019).
Flash droughts differ from conventional drought, which is characterized by a persistent lack of precipitation P accompanied by soil moisture (SM) and/or runoff deficits, usually for 6 months or longer (Svoboda et al. 2002). Flash droughts have much shorter durations, typically a few weeks. Furthermore, while conventional droughts develop slowly, a key feature of flash droughts is their rapid onset and intensification (Pendergrass et al. 2020).
Mo and Lettenmaier (2015, 2016) studied flash droughts over the United States, and classified them into two categories based on their forcings: heat wave flash drought (Mo and Lettenmaier 2015) and precipitation deficit (P deficit) flash drought (Mo and Lettenmaier 2016). Heat wave flash droughts are initialized by high surface air temperatures Tair that cause evaporation (ET) to increase—especially in the Pacific Northwest and the north-central United States. That, in turn, leads to decreased SM (Mo and Lettenmaier 2015). In heat wave flash droughts, temperature Tair is the major forcing, and SM and ET respond.
Precipitation-deficit flash droughts are caused by the lack of precipitation, which in turn leads to a decrease of SM. In areas where SM and ET have a near-linear relationship, ET decreases as a response to decreased SM (Koster et al. 2009). Sensible heat increases to balance the decreases of ET. That in turn leads to high Tair. This type of flash drought is most prominent in the southern United States and the Great Plains (Mo and Lettenmaier 2016). In P-deficit flash droughts, dry conditions are the driving force and high Tair is a response. Recently, Wang and Yuan (2018) used definitions of flash droughts similar to those in Mo and Lettenmaier (2015, 2016) to study flash droughts over China and their connections with conventional drought.
A number of alternative definitions of flash drought have been developed. Ford and Labosier (2017) defined flash drought as a decrease in soil moisture from 40% to below 20% over a period of 20 days, where 20% is the threshold for category D2 drought according to the U.S. Drought Monitor (USDM; Svoboda et al. 2002). Later, Koster et al. (2019) adopted a similar definition of flash drought. Because rapid onset is a key aspect of flash drought (Pendergrass et al. 2020), some definitions of flash drought are based on the rapid rate of intensification of an index (e.g., Otkin et al. 2018) together with a soil moisture requirement. The index can be satellite-derived ET. For example, Otkin et al. (2018) proposed using the rapid change index (RCI; Otkin et al. 2013; Otkin et al. 2014), which encapsulates the magnitudes of moisture stress changes over a few weeks when soil moisture percentiles are less than 20%. Chen et al. (2019) studied flash drought events based on the USDM and related flash drought occurrence to cold ENSO events.
Definitions of flash drought may differ, but many include abnormally high Tair that can be either a forcing or a response and is associated with abnormally high ET and a decrease of SM. Furthermore, as indicated by Pendergrass et al. (2020), a key aspect of flash drought is its rapid intensification. While the quick onset can cause large agricultural losses as noted above, the rapid evolution (by most definitions, on the order of a week or two) suggests that there may be potential to predict flash droughts, which could help to mitigate, at least partially, their consequences.
Pendergrass et al. (2020) documented the physical processes that produce flash drought. Flash drought often involves P deficits and high Tair that start before or coincide with a rapid soil moisture decline on subseasonal time scales. This suggests that Tair and P are the main forcings of flash drought, and accurate forecasts of these two variables will be the key to forecasting flash droughts. ET plays an important role because it serves as a bridge that provides feedbacks between land and atmosphere, and it also controls the rate of change of SM.
Notwithstanding a rapidly evolving body of research on the topic, there is no commonly accepted definition of flash droughts. Pendergrass et al. (2020) proposed, for the United States, a definition based on change in the USDM categories, while Liu et al. (2020) argued that the rate of SM change is preferable to our heat wave and P-deficit definitions. Because the USDM is not objective, it is not clear how the Pendergrass et al. (2020) definition can be implemented in forecasts, although it could perhaps be linked to model-derived quantities such as soil moisture, given previous work by Chen et al. (2019) that linked USDM drought categories to soil moisture percentiles. The definitions suggested by Liu et al. (2020) and Pendergrass et al. (2020) could be readily adapted to the strategy we outline below and apply specifically to a set of medium-range weather reforecasts. In any event, given the lack of consensus surrounding flash drought definitions, we study flash drought predictability here in the context of heat wave and P-deficit flash droughts following Mo and Lettenmaier (2015, 2016).
2. Flash drought forecasts
As indicated by Pendergrass et al. (2020), prediction of flash droughts is a challenge because of their rapid onset and the fact that most land–atmosphere coupled models do not predict land–atmosphere interactions well. An alternative is to use statistical methods; for instance, Otkin et al. (2015) predicted flash drought intensification probabilities derived from their RCI. Here, we prefer to base our forecasts on an operational medium-range weather forecast model, linked with an offline macroscale hydrologic model, to produce the combination of physical (P and Tair) and hydrologic (primarily SM and ET) variables needed to produce forecasts of both heat wave and P-deficit flash droughts. We then compare the (ensemble) forecasts with equivalent quantities (analysis) produced by driving the same hydrologic model with observed P and Tair. While our skill evaluation depends on the definition of flash droughts, our strategy is readily adaptable to other definitions so long as the requisite quantities are produced by the weather forecast model and/or the hydrology model. In particular, we use the Variable Infiltration Capacity (VIC) land hydrology model (Liang et al. 1994), although any similar land surface model could be used.
The forecast skill of medium-range weather forecasts (typically defined as having lead times of about 2 weeks or less) has improved in recent years (Hughes 1992; Novak et al. 2014). For example, the U.S. Weather Prediction Center (https://www.wpc.ncep.noaa.gov/html/scorcomp.shtml) showed that the mean absolute error for day 7 minimum (maximum) temperature improved from 3.1°C (3.0°C) in 1998 to 2.4°C (2.8°C) in 2019. While P forecast skill is lower than that of temperature, Novak et al. (2014) showed that short lead (up to about 3 days) P forecast skill improved substantially over the period 1960–2012. On the other hand, Hamill et al. (2004, p. 1434) state that “skill has not improved much despite the investment in large new computers and despite the millions of person hours invested in model development.” Our own experience is that the skill of P forecasts at monthly lead times generally is poor (Mo and Lettenmaier 2014). Most of the skill in forecasts of SM and other related land surface hydrologic variables, such as runoff, comes from SM or runoff initial conditions. Shukla et al. (2012) found that replacing ensemble streamflow prediction (ESP)-based forecasts of SM and runoff, which are essentially model output forced with resampled climatology, with forecasts of P and Tair from NOAA’s Medium Range Weather Forecast (MRF) model for the first 15 days improved forecast skill for SM and runoff for 1-month leads. This implies that forecasts of both Tair and P have some useful skill and may be good candidates for flash drought prediction at the scale of a few pentads (5-day means), notwithstanding that most of the P forecast skill likely is in the first pentad. We note that our use of pentads here follows our previous work (Mo and Lettenmaier 2015, 2016) and is consistent with the scale of evolution of flash droughts, which is typically well under 1 month.
Skillful forecasting of Tair and P does not necessarily imply high forecast skill of flash droughts. For both heat wave and P-deficit flash droughts, the land surface model (VIC in our case) needs to be able to capture the physical mechanisms associated with flash droughts, and in particular the linkages between (observed) P and Tair and model-derived SM and ET. For heat wave flash droughts, the linkage between high temperatures and ET and in turn the relationship of ET with SM needs to be captured. For P-deficit flash drought, the model representation of the relationship between SM and ET is especially important.
The MRF model (which we use in a reforecast mode to provide P and Tair forcings to VIC) has evolved in recent years to become the current (operational) Global Ensemble Forecast System (GEFS) for NOAA. We are interested in NOAA/NCEP forecasts both because of the potential to implement flash drought forecast guidance operationally and because real-time forecasts are freely available. As with all operational models, a challenge is that the GEFS model changes with time, both as model parameterizations improve and as computing power increases to support finer spatial resolutions. Here, we required a stable forecast model so that our evaluation of forecast skill would not be time varying. Fortunately, ESRL/PSD has produced second-generation MRF reforecasts (Reforecast v2) (Hamill et al. 2013) using the 2012 version of GEFS. The dataset consists of an 11-member ensemble of forecasts, produced every day from 0000 UTC initial conditions. The reforecast period is December 1984 to present. We refer hereafter to the v2 reforecasts as GEFSv2. Given the above context, our objectives here are 1) to investigate whether GEFSv2 is able to capture the frequency of occurrence (FOC) and magnitude of either or both heat wave and P-deficit flash droughts, 2) to assess the forecast skill of flash drought events, and 3) to diagnose errors in reforecasts of flash droughts. These are important steps that could ultimately support development of a real-time forecast system for flash droughts.
In the remainder of the paper, we summarize the datasets and procedures used in section 3. In section 4, we assess the FOC of flash droughts in GEFSv2 reforecasts. In section 5, we evaluate flash drought forecast skill and diagnostics. We discuss some key aspects of our conclusions and sensitivities in section 6 and provide conclusions in section 7.
3. Datasets and procedures
Global forecasts from short- and medium-range weather forecasts (e.g., with lead times less than about 2 weeks) of daily Tair and P usually have spatial resolutions that are too coarse for hydroclimatic studies. Furthermore, the forecasts often have large systematic local biases. ET and SM usually are not archived by such operational models. To forecast flash droughts, we first need to correct systematic biases in Tair and P forecasts and then downscale them to a finer spatial resolution so a land surface model such as the VIC model (Liang et al. 1994) can be used to derive ET and SM. A bias correction and spatial downscaling procedure for doing so is outlined by Wood and Lettenmaier (2006). We follow that procedure here.
For analysis, Tair and P are forcing terms. For SM and ET, there are only sparse observations over the United States and virtually none with record lengths more than about 20 years, which is too short to derive reliable records directly. Therefore, we use a land surface model to obtain SM and ET, and these quantities consequently depend on the model used. Model uncertainties with respect to the analysis fields we used were documented by Mo et al. (2012).
As noted above, the land surface model we used is the VIC model version 4.0.6 (Liang et al. 1994). We used the same model for both analysis and forecasts for consistency. The model setup is documented in Maurer et al. (2002). In the setup we used, the VIC model has three soil layers. The first layer has a depth of 0.1 m. The second layer has depth ranging from 0.2 to 2.4 m, and the third layer has depth varying between 0.1 and 2.5 m.
The VIC model was driven by forcings derived from Tair and P, and other variables such as downward solar and longwave radiation, humidity, and wind using methods described by Bohn et al. (2013). We used retrospective forcings from the near-real-time University of California, Los Angeles (UCLA) surface water monitor, which were derived from roughly 2400 index stations across the conterminous United States (CONUS) using procedures outlined in Wood and Lettenmaier (2006). The 10-m wind speed was taken from the Climate Data Assimilation System (CDAS; Kalnay et al. 1996). We ran the VIC model in a water balance mode. That means the surface temperature is taken to be equal to the surface air temperature for purposes of surface energy balance closure. The spatial resolution of the model was 0.5° latitude–longitude. Our analysis run was from 1 January 1915 through 31 December 2013. The long run avoided spinup issues; as described below, our forecast evaluation was for the more recent period 1985–2012, for which the GEFSv2 outputs from Hamill et al. (2013) are available. We labeled this run as analysis, which was also used for verification.
We archived pentad means of four variables: P, Tair, ET, and SM from the analysis run for the base period 1985–2012. We computed pentad standardized temperature, ET anomaly, and P and SM percentiles for the base period. We used these four variables to identify heat wave and P-deficit flash drought events.
b. GEFSv2 reforecasts
As noted above, the reforecast dataset is from Hamill et al. (2013) and is archived at the Climate Prediction Center (CPC). The archived dataset has 1° latitude–longitude spatial resolution. For each initial date, there are a total of 11 ensemble members: one control run and 10 perturbed runs each day. Each run lasts for 16 days, but we only used the first 15 days so as to form 3 pentads. We used the reforecast data for the period from 1985 to 2012, which provided an adequate basis for statistical analysis. For each run, the reforecast archive has P, Tmax, and Tmin forecasts daily over the CONUS domain at the 1° spatial resolution. We calculated Tair as the average of Tmax and Tmin.
For a given initial starting date and each lead, we formed the ensemble mean as the equally weighted average of the 11 ensemble members. To interpolate forecast variables to the VIC grid, we adopted the same approach outlined in Shukla et al. (2012); specifically, we interpolated the reforecasts to 0.5° using the inverse squared distance interpolation scheme (Shepard 1984). We then bias corrected Tmax, Tmin, and P using a 45-day training period prior to the initial starting date (Alpert and Saha 1989).
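As a minimal sketch of inverse squared distance weighting, the following function estimates a value at one target grid point from surrounding coarse-grid values. The function name `idw_interpolate`, the planar distance in degrees, and the use of every supplied point (no search radius) are illustrative assumptions, not details taken from Shepard (1984) or Shukla et al. (2012).

```python
import math


def idw_interpolate(points, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    points: iterable of (lat, lon, value) tuples on the coarse grid.
    target: (lat, lon) of the fine-grid cell center.
    power:  distance exponent; 2.0 gives inverse squared distance weights.
    Assumption: distance is measured in degrees on a flat plane.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, value in points:
        dist = math.hypot(lat - target[0], lon - target[1])
        if dist == 0.0:
            # Target coincides with a source point: return its value directly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two source points along a parallel; the nearer point dominates the estimate.
estimate = idw_interpolate([(0.0, 0.0, 10.0), (0.0, 3.0, 40.0)], (0.0, 1.0))
```

With weights of 1 and 0.25 for the near and far points, the estimate is pulled toward the nearer value, which is the behavior the squared exponent is meant to provide.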
The bias correction procedure was as follows: The corrections were determined from the ensemble mean of the 11 ensemble members. We determined the parameters for correction based on the differences between forecasts and analysis in the training period T, where T = 45 days. For a given initial target date Dy and a given lead L, we took the mean of reforecasts started from (Dy − L − 1) to (Dy − L − 1 − T). For example, if the initial date is 30 May 2000, then for lead 1 and T = 45, the training period is from 14 April to 28 May 2000. We also took the mean of the corresponding analysis from VIC for the same period and corrected the forecast mean with the analysis mean. We tested 20-, 30-, and 50-day training periods and found that results are not sensitive to the length of the training period as long as it is longer than about 30 days. The corrected values were equally distributed to each ensemble member.
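The training window and mean correction can be sketched as follows. Two assumptions are made loudly here: the window is taken to contain T days inclusive, ending at (Dy − L − 1), so that it reproduces the worked example of 14 April to 28 May 2000; and the correction is applied as an additive shift, which the text does not specify (a multiplicative form may be more appropriate for P). The function names are hypothetical.

```python
from datetime import date, timedelta


def training_window(init_date, lead, train_days=45):
    """Return (first, last) dates of the training period.

    Assumption: the window ends at init_date - (lead + 1) days and spans
    `train_days` days inclusive, matching the example in the text.
    """
    last = init_date - timedelta(days=lead + 1)
    first = last - timedelta(days=train_days - 1)
    return first, last


def mean_bias_correct(forecast, train_forecasts, train_analysis):
    """Shift a forecast by the mean forecast-minus-analysis difference.

    Assumption: additive correction; the source only says the forecast
    mean was corrected with the analysis mean.
    """
    bias = (sum(train_forecasts) / len(train_forecasts)
            - sum(train_analysis) / len(train_analysis))
    return forecast - bias


first, last = training_window(date(2000, 5, 30), lead=1)
corrected = mean_bias_correct(20.0, [18.0, 22.0], [17.0, 21.0])
```

Because the shift is computed from the ensemble mean, the same value would be applied to every ensemble member, consistent with the statement that corrections were equally distributed.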
To obtain ET and SM, we used the same implementation of VIC as in the analysis. To force VIC, we need wind speed in addition to the error-corrected Tmin, Tmax, and P. We used the climatological wind speed at 10 m obtained from the CDAS (Kalnay et al. 1996) for all members (note that, ideally, we would have used wind speed from the reforecasts, but this was not an archived variable). Other land surface model forcings (e.g., downward solar and longwave radiation and humidity) were not archived in the reforecast dataset, so instead we computed them using algorithms summarized by Bohn et al. (2013). In general, these algorithms index downward solar radiation to the daily temperature range, downward longwave radiation to the daily mean temperature, and the dewpoint hence vapor pressure deficit to the daily minimum temperature with a correction for arid areas. Application of these procedures resulted in all the forcings at a daily time step needed to drive the VIC model.
For each year, there are a total of 36 cases with initial conditions one pentad (5 days) apart from 1 April to 28 September (e.g., initial dates of 1 April, 6 April, …, 28 September), and each case has 11 ensemble forecasts of duration 16 days each, which is the duration of the GEFSv2 reforecasts. For each case and each ensemble member, the bias-corrected P, Tmin, Tmax, and climatological 10-m wind speed were used to derive forcings, which were used to drive the VIC model to obtain daily ET and SM for each forecast period of length 16 days. Note that Livneh et al. (2015) showed that VIC is relatively insensitive to use of climatological versus time-varying wind speed. For each initial condition and each ensemble member, we had four forecasted variables (P, Tair, ET, and SM) to define flash droughts. From the daily forecast values (neglecting the last or sixteenth day), we computed 5-day (pentad) means (3 pentads for each forecast). We therefore had ensemble pentad means of P, Tair, ET, and SM from lead 1–3 pentads for each case.
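The reduction from a 16-day daily forecast to three pentad means can be sketched as below. The function name `pentad_means` is illustrative; the only logic taken from the text is that the sixteenth day is neglected and the remaining 15 days are averaged in consecutive 5-day blocks.

```python
def pentad_means(daily, n_pentads=3, pentad_len=5):
    """Average consecutive 5-day blocks of a daily forecast series.

    daily: sequence of daily values (e.g., 16 days of P, Tair, ET, or SM).
    Any days beyond n_pentads * pentad_len (here, day 16) are ignored.
    """
    needed = n_pentads * pentad_len
    if len(daily) < needed:
        raise ValueError("forecast is shorter than the requested pentads")
    return [
        sum(daily[i * pentad_len:(i + 1) * pentad_len]) / pentad_len
        for i in range(n_pentads)
    ]


# A 16-day series with values 1..16: lead 1-3 pentad means are 3, 8, and 13.
means = pentad_means(list(range(1, 17)))
```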
To compute ET anomalies, standardized Tair anomalies, and SM and P percentiles, we performed cross validations. Using the 1 April forecasts as an example, for each lead pentad and each variable, we had 28 years of ensemble mean reforecasts starting from 1 April 1985 through 1 April 2012. To form anomalies and percentiles, we removed 1 year (Tg) from the 28 years and assigned it as the target year, with the training period as the remaining 27 years. For a given variable (P, Tair, ET, or SM), the climatology was determined from data in the training period, and an anomaly is defined as the departure from that climatology. Similarly, we computed percentiles of a given variable for the target year using data in the training period to determine the profiles and distribution functions. In this way, we obtained ET anomalies, standardized Tair anomalies, and P and SM percentiles for each pentad. We then selected flash drought events from the record, which were identified based on the same criteria as were used in the analysis.
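The leave-one-year-out procedure can be illustrated with the short sketch below. The function names are hypothetical, the sample standard deviation is an assumed choice, and the percentile is computed as the empirical fraction of training values below the target value, which is only one of several possible ways to build the distribution function.

```python
from statistics import mean, stdev


def _training(values, target):
    """All years except the target year."""
    return values[:target] + values[target + 1:]


def loo_anomaly(values, target):
    """Departure of the target year from the training climatology."""
    return values[target] - mean(_training(values, target))


def loo_standardized_anomaly(values, target):
    """Anomaly divided by the training-period sample standard deviation."""
    train = _training(values, target)
    return (values[target] - mean(train)) / stdev(train)


def loo_percentile(values, target):
    """Percent of training years with values below the target year.

    Assumption: a simple empirical distribution, with no fitting.
    """
    train = _training(values, target)
    below = sum(1 for v in train if v < values[target])
    return 100.0 * below / len(train)


# One value per year for a fixed initial date and lead (toy numbers).
series = [1.0, 2.0, 3.0, 4.0, 10.0]
anom = loo_anomaly(series, target=4)   # 10.0 minus the mean of the other four
pct = loo_percentile(series, target=4)
```

In the setting described above, `values` would hold 28 entries (1985 to 2012) and the functions would be called once per target year.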
c. Flash drought events
High temperature is the major driver for heat wave flash drought, which requires that the Tair anomaly exceed one standard deviation. In Mo and Lettenmaier (2015) we tested many different criteria for SM and P. In particular, we tested the criterion that the lack of P is a condition for flash drought concurrent with extreme temperature (Tair > 1 standard deviation, ET > 0, and P anomaly < 0 with no SM requirement). In those tests, we found that the FOC patterns remain the same, but magnitudes of FOC change depending on the specific thresholds. Based on those results, we require the ET anomaly to be positive and SM below 40% in addition to Tair > 1 standard deviation for heat wave flash drought events.
For P-deficit flash droughts, we tested four different criteria with different thresholds in Mo and Lettenmaier (2016). We tested case 1: P < 40%, ET < 0 with no Tair requirement. The other three cases tested have the basic requirements of ET and Tair anomalies: ET anomaly < 0; Tair > one standard deviation. The differences are the criteria associated with P and SM anomalies. We tested case 2: P < 40%; case 3: SM < 40%; and case 4: P < 20%. The results showed similar FOC patterns, but the magnitudes change with the criteria. We decided here to use case 2: P below 40%, Tair anomaly exceeding one standard deviation, and ET anomaly being negative, without a specific requirement for SM (Mo and Lettenmaier 2016). ET change is not a requirement, but the composites of ET anomalies from 2 pentads before onset to 2 pentads after onset indicate intensification of ET from lag 1 pentad to onset (Mo and Lettenmaier 2016).
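The two sets of criteria adopted here can be written as simple predicates on the pentad quantities. The function and argument names are illustrative; the thresholds (one standard deviation, sign of the ET anomaly, and 40%) are those stated above.

```python
def is_heat_wave_flash_drought(tair_std_anom, et_anom, sm_pct):
    """Heat wave type: Tair > 1 std dev, positive ET anomaly, SM below 40%."""
    return tair_std_anom > 1.0 and et_anom > 0.0 and sm_pct < 40.0


def is_p_deficit_flash_drought(tair_std_anom, et_anom, p_pct):
    """P-deficit type (case 2): Tair > 1 std dev, negative ET anomaly,
    P below 40%, with no SM requirement."""
    return tair_std_anom > 1.0 and et_anom < 0.0 and p_pct < 40.0


# A hot pentad with enhanced ET and dry soil qualifies as the heat wave type;
# the same temperature with suppressed ET and low P qualifies as P-deficit.
hw = is_heat_wave_flash_drought(tair_std_anom=1.4, et_anom=0.3, sm_pct=25.0)
pd = is_p_deficit_flash_drought(tair_std_anom=1.4, et_anom=-0.2, p_pct=30.0)
```

Because the two types require ET anomalies of opposite sign, a given pentad at a given grid point cannot satisfy both sets of criteria.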
d. Forecast verification
We verified whether the forecasts were able to capture the pattern and magnitudes of the FOC of both types of flash droughts by comparing the FOCs between analysis and forecasts. The FOC is defined as the total number of flash drought events over the data period divided by the length of the data period.
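A minimal sketch of the FOC calculation at a single grid point follows. It assumes that each pentad meeting the criteria counts as one event and that the result is expressed as a percentage of pentads in the record; the function name is hypothetical.

```python
def frequency_of_occurrence(event_flags):
    """Percent of pentads in the record flagged as flash drought.

    event_flags: sequence of booleans, one per pentad in the data period.
    Assumption: every flagged pentad is counted as one event.
    """
    if not event_flags:
        raise ValueError("empty record")
    return 100.0 * sum(1 for flag in event_flags if flag) / len(event_flags)


# One flagged pentad out of four gives an FOC of 25%.
foc = frequency_of_occurrence([True, False, False, False])
```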
We then examined whether the forecasts were able to capture individual flash drought events. We use a contingency table approach because the forecasts are dichotomous: flash drought, or no flash drought. This approach was recommended by the World Weather Research Programme (WWRP)/Working Group on Numerical Experimentation (WGNE) joint working group (https://www.wmo.int/pages/summary/progs_struct_en.html).
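There are four possibilities: hit, miss, false alarm, and correct negative. At a given lead and initial conditions, we considered a hit to have occurred when both analysis and forecast indicate a flash drought. It is a miss if the analysis indicates a flash drought, but the forecast does not. It is a false alarm if the forecast indicates a flash drought, but the analysis does not. It is a correct negative (CN) if no flash drought occurs in both analysis and forecasts. CN is above 90% for the heat wave flash droughts and above 80% for the P-deficit flash droughts over areas where FOC > 1%. From the contingency table, we derived the equitable threat score (ETS), which we used to assess the forecast skill. In its standard form,

$$\mathrm{ETS} = \frac{\mathrm{hits} - \mathrm{hits}_{\mathrm{random}}}{\mathrm{hits} + \mathrm{misses} + \mathrm{false\ alarms} - \mathrm{hits}_{\mathrm{random}}}, \qquad \mathrm{hits}_{\mathrm{random}} = \frac{(\mathrm{hits} + \mathrm{misses})(\mathrm{hits} + \mathrm{false\ alarms})}{\mathrm{total}},$$

where total is the number of forecast–analysis pairs.
The ETS is adjusted for hits associated with random chance. It has a range from −1/3 to 1. A value below zero indicates no skill, while a perfect score is 1.0. To examine whether the GEFSv2 reforecasts tend to overestimate or underestimate the observed flash drought events, we computed biases. The bias, in its standard frequency-bias form, is defined as

$$\mathrm{bias} = \frac{\mathrm{hits} + \mathrm{false\ alarms}}{\mathrm{hits} + \mathrm{misses}}.$$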
A perfect score is 1.0. The bias ranges from 0 to infinity. If the bias is greater than 1.0, then the forecast overestimates flash drought events. If the bias is less than 1.0, then the forecast underestimates flash drought events. We performed the evaluation for late spring, midsummer, and late summer/early autumn by pooling cases with initial dates in late spring (April–May), midsummer (June–July), and late summer/early autumn (August–September). There are a total of 13 cases per year for late spring, 12 cases for midsummer, and 13 cases for late summer/early autumn.
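The contingency-table scores can be sketched as follows, assuming the standard definitions of the ETS and the frequency bias used in the forecast verification literature. The function names are illustrative, and returning NaN when the ETS denominator is zero is an assumed convention.

```python
def contingency_counts(forecast_flags, analysis_flags):
    """Count hits, misses, false alarms, and correct negatives."""
    hits = misses = false_alarms = correct_negatives = 0
    for fcst, obs in zip(forecast_flags, analysis_flags):
        if fcst and obs:
            hits += 1
        elif obs:
            misses += 1
        elif fcst:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """ETS: threat score adjusted for hits expected by random chance."""
    total = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denom = hits + misses + false_alarms - hits_random
    if denom == 0:
        return float("nan")  # assumed convention when the score is undefined
    return (hits - hits_random) / denom


def frequency_bias(hits, misses, false_alarms):
    """Ratio of forecast events to analysis events (undefined if none observed)."""
    return (hits + false_alarms) / (hits + misses)


# Toy table: 20 hits, 10 misses, 10 false alarms, 60 correct negatives.
ets = equitable_threat_score(20, 10, 10, 60)   # (20 - 9) / (40 - 9)
bias = frequency_bias(20, 10, 10)              # 30 forecast / 30 observed
```

In this toy table the forecast and analysis event counts are equal, so the bias is 1.0 even though a third of the forecast events are false alarms, which is why the bias is read together with the ETS.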
4. FOC of flash drought events
From the analysis, we computed FOC for heat wave and P-deficit flash droughts. In Fig. 1, we show the FOC for April–May, June–July, and August–September for the base period 1985–2012. For late spring, when heat wave flash droughts are most likely to occur, the FOC (Fig. 1a) indicates maxima located in the north-central and central-eastern United States with another band of maxima extending from northern California to the Pacific Northwest. These results are consistent with Mo and Lettenmaier (2015), who performed the same analysis except for the longer period from 1916 to 2013. The FOCs for heat wave flash droughts have large seasonal variations; specifically, the number of events decreases through the warm season. In midsummer, most events occur in the Midwest and the Pacific Northwest, but the number of events is much lower than the number in late spring (Fig. 1b). There are only a few scattered events in late summer and early autumn (Fig. 1c).
The FOCs for P-deficit flash drought (Figs. 1d–f) from analysis show a band of maxima over the southern United States and another band of maxima along the path from Texas to the northern Great Plains, but the locations and magnitudes of maxima vary with seasons. In late spring, there are very few events over the north-central United States. More events are located west of about 90°W with a band of maxima near the southwest U.S. border with Mexico. There are more P-deficit flash droughts in summer. The FOCs are highest across the southern United States, with some values exceeding 8%–10%. FOCs are slightly lower (6%–8%) along a path from the southern Great Plains northward to the Dakotas. In late summer and early autumn, the magnitudes of the FOCs decrease for the southern United States but increase in the inland Pacific Northwest and in California.
Desert areas [e.g., the Southwest (25°–35°N, 110°–123°W)] are not the focus of our study, and caution should be exercised in any event in interpreting results over areas with very low precipitation. This is so because the climatology of these areas is exceptionally dry, and relatively small anomalies can be reflected in apparent and spurious flash droughts. Furthermore, apparent events in these areas are often due to observation errors: these are usually areas with sparse population, meaning few precipitation gauges, and the gridding algorithms used to produce the precipitation dataset are prone to anomalies due to averaging in stations that are some distance away and may have different seasonal precipitation patterns than the central desert areas. For instance, consider the town of Mojave, California, versus Lebec some 80 km to the west, which gets much more (and more frequent) precipitation. Both stations are likely to be included in the weighted average for at least part of the Mojave Desert, where Fig. 3 shows an apparently anomalous tendency for flash droughts in the spring.
The FOCs for heat wave flash drought from the GEFSv2 reforecasts (Fig. 2) from lead 1 pentad (days 1–5) to lead 3 pentads (days 11–15) capture the seasonal variations of the FOCs. The forecasts for late spring indicate that heat wave flash droughts occur over the north-central United States and over the West from Northern California to the Pacific Northwest. The patterns of the FOCs are consistent with analysis, but magnitudes are weaker (Figs. 2a–c). For midsummer, the events are located over the northern part of the country and the Pacific Northwest. There are a few scattered events in late summer and autumn. Overall, the model is able to capture the patterns of the FOCs and the seasonal variations well.
The FOCs for P-deficit flash drought from the GEFSv2 reforecasts (Fig. 3) capture the overall patterns of the FOC, but the locations and magnitudes of the maxima differ from the analysis. For late spring, the FOCs (Figs. 3a–c) capture the minima over the north-central region but the forecasts overestimate the FOCs in the interior of the West (e.g., Nevada, Utah, and Idaho). In midsummer, the forecasts capture the band of maxima over the southern United States and another band of maxima extending from Texas to the northern Great Plains, but the model underestimates the FOCs in California and Arizona. In late summer and early autumn, the model underestimates FOCs over California and Nevada.
5. Forecast skill
We used the ETS score to evaluate forecast skill. The skill scores should be considered together with the FOC. In areas where the FOC is small, a few incorrectly forecasted events can lead to large ETS errors (e.g., the Mojave Desert example above), even though they have little practical impact. Therefore, we only performed our forecast evaluations for areas with FOCs (in our analysis) above 1%.
a. Heat wave flash droughts
Figure 4 shows the ETS scores for heat wave flash droughts. The ETS scores are from 0.2 to 0.5 for late spring for lead 1 pentad and decrease at lead times 2 and 3 pentads, but even at lead 3 pentads, ETS scores are mostly positive. The skill is not as high in summer as in late spring but ETS is nonetheless positive and between 0 and 0.3 over the northern United States where FOC from analysis is greater than 1%. At lead 3 pentads, ETS is negative over parts of the north-central United States. There were too few events to allow a meaningful evaluation for late summer and autumn.
We examined in more detail the skill for late spring, when heat wave flash droughts occur most frequently (Fig. 5). Even in late spring, CN is greater than 90% because heat wave flash droughts do not occur often in either forecasts or analysis. Biases are between 0 and 1 so overall the model does not have a tendency to overestimate or underestimate heat wave flash drought events. At lead 1 pentad, the hit rate is greater than 0.4 over the high FOC areas, but hit rates decrease and false alarm rates increase as the lead increases. After lead 1 pentad, there are more misses than hits. At lead 3 pentads, the hit rate is between 0.1 and 0.2 with the false alarm rate greater than 0.7. Large numbers of misses and large false alarm rates indicate that flash droughts may be forecasted, but they do not occur at the same time as in the analysis.
Heat wave flash droughts are driven by Tair; hence, forecast skill is influenced by the ability of the GEFSv2 to forecast high temperature anomalies greater than one standard deviation from climatology. Figure 6 shows the ETS score for such temperatures by season. For lead 1 pentad, skill is 0.2 or above for all seasons. Skill is higher in midsummer, with ETS > 0.4, and in late summer/autumn, with ETS > 0.3. Skill is lower in late spring because Tair in spring has larger variability (higher standard deviations) and is therefore more difficult to forecast. Skill decreases rapidly as the lead increases. For April–May, when heat wave flash droughts are most likely to occur, skill is above 0.2 for small areas near Wisconsin and Minnesota and Northern California at lead 2 pentads. For other areas, the skill is above 0. At lead 3 pentads, the highest score is only 0.1 over the northern United States except the Pacific Northwest. For June–July, the skill for lead 2 pentads is above 0.2–0.3 except for the western interior region and the Southwest, where temperature is influenced by monsoon rainfall. At lead 3 pentads, skill is above 0.2 over Texas. There is not a sufficient number of flash drought events in late summer/autumn to make any difference. Overall, the temperature forecasts are only skillful at lead 1–2 pentads, and that influences the skill of heat wave flash drought forecasts.
b. P-deficit flash droughts
In contrast to heat wave flash droughts, P-deficit flash droughts are more uniformly distributed across seasons. The forecasts are generally skillful at lead 1 pentad (Figs. 7a,d,g). In the southern United States and along the path from Texas to the northern Plains, where the FOC has a band of maxima, ETS values are between 0.2 and 0.4 at lead 1 pentad. The skill decreases quickly as lead increases. At lead 2 pentads, in the large FOC areas the ETS is between 0.1 and 0.2 for Texas and the southern states but only between 0 and 0.1 along the path from Texas to the northern Great Plains. For summer and autumn, the skill is low over the western region including California and Nevada, where the GEFSv2 reforecasts also underestimate the FOC. At lead 3 pentads, ETS is mostly just above zero except over Texas, where ETS is between 0.1 and 0.2. Over the West, ETS is negative for both midsummer and late summer/autumn but is above zero in late spring.
At lead 1 pentad, the CN rate is above 0.8 (Fig. 8) for areas with FOC > 0.1, which is smaller than the CN for heat wave flash droughts because the FOC for P-deficit flash droughts is generally higher than for heat wave flash droughts. For all seasons, the forecasts have higher hit rates over the southern United States, with low biases and low false alarm rates. However, the GEFSv2 forecasts show more events over the interior of the West (west of 90°W) than in the analysis in spring and summer. Most of the excess events in the forecasts are false alarms, and that causes biases larger than 1. For summer, in addition to the large false alarm rates over the West, biases greater than 1 also occur over the north-central region where there should be fewer P-deficit flash droughts as indicated by the analysis (Figs. 1d–f). Figure 9 shows the ETS for P. The skill of P forecasts is overall much lower than that of temperature forecasts (Fig. 6). The ETS for P < 40%, which is one of the conditions that define the P-deficit flash drought, is between 0.2 and 0.4 only for lead 1 pentad. At lead 2 pentads, ETS drops to 0.1, although it remains positive. At lead 3 pentads, ETS is below 0.1 or even negative.
Comparison of the forecast skill indicates that the ETS for the P-deficit flash drought events is higher than the ETS for P forecasts. Therefore, there is another source of skill. As indicated by Mo and Lettenmaier (2016), two conditions have to be satisfied for P-deficit flash droughts to occur: 1) lack of P and 2) high correlation between ET and SM. A P deficit can drive down the SM, but the SM anomalies are more persistent. Over areas where the SM deficit causes ET to decrease, the sensible heat will increase to balance the decrease of ET (latent heat), and that will cause the temperature anomaly to increase. If the Tair anomaly increases above 1 standard deviation, a P-deficit flash drought can be detected. From Koster et al. (2009) and Mo and Lettenmaier (2016), the areas where SM and ET have a near-linear relationship are over the southern United States and a swath from Texas to the northern Great Plains. These are also the areas where the P-deficit flash drought forecasts have high ETS (Fig. 7). Figure 10 shows the correlation between SM and ET from the GEFSv2 reforecasts. The correlation needs to be more than 0.14 to be statistically significant at the 95% level. The correlations are statistically significant over the southern states, and the significant correlations also extend from Texas to the northern Great Plains in spring and at lead 1 pentad for summer and autumn. If P anomalies at the initial time are negative, they may lead to negative SM anomalies. The GEFSv2 physics is able to capture the SM and ET relationship. This may explain why the P-deficit flash drought forecasts have higher skill than the P forecasts.
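To make the significance threshold for the SM and ET correlation concrete, the following sketch computes a Pearson correlation and an approximate two-sided critical value of |r| for a given number of independent samples. It uses the normal approximation to the t distribution, which is an assumption suitable only for large samples; the function names and the sample size in the usage note are hypothetical, and the sample size and test actually used for Fig. 10 are not restated in this section.

```python
from math import sqrt
from statistics import NormalDist

import numpy as np

def critical_correlation(n_samples, confidence=0.95):
    """Approximate two-sided critical |r| for n_samples independent pairs.

    Uses the normal approximation to the t distribution (an assumption
    that is reasonable only when n_samples is large).
    """
    if n_samples < 3:
        raise ValueError("need at least 3 samples")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z / sqrt(n_samples - 2 + z * z)

def sm_et_correlation(sm_anom, et_anom):
    """Pearson correlation between SM and ET anomaly series."""
    return float(np.corrcoef(sm_anom, et_anom)[0, 1])
```

Under these assumptions, a threshold close to 0.14 corresponds to roughly 200 independent samples (for example, `critical_correlation(200)`); serial correlation in SM would reduce the effective sample size and raise the threshold.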
As we note above, a number of alternative definitions of flash droughts have been proposed over the last few years. At this time, there is no accepted definition. Most alternative definitions are based on changes in ET or variables related to ET, and in some cases an additional SM percentile requirement (e.g., Otkin et al. 2018). We view ET changes as forced, and the forcings are in general associated with temperature and/or precipitation. In any event, the composites of ET before and during flash drought events selected based on Tair and P show that there is a rapid change in ET for both heat wave and P-deficit flash droughts (Mo and Lettenmaier 2016). More importantly, from the standpoint of flash drought forecast skill, our results indicate that the skill of the GEFSv2 forecasts for flash droughts is limited to one or two pentads. On average, and over most of our domain, the forcing Tair is skillful for about 2 pentads and P is only skillful for 1 pentad. Because ET and SM are forced by Tair and P, it stands to reason that if an alternative definition of flash drought requires a drought duration longer than about 2 pentads (e.g., Ford and Labosier 2017; Christian et al. 2019; Otkin et al. 2018), then flash droughts based on those definitions are not likely to be predictable by the GEFSv2. The approach we use could of course be applied to other flash drought indices, so long as they are based on physical variables that are predicted by the GEFS (or other medium-range weather forecast models). The suggestion of Pendergrass et al. (2020) that rapid evolution is a key element, and the application of that principle by Liu et al. (2020) using SM as the target variable, does suggest that there is potential for flash drought forecast skill so long as the period of evolution is a small number of pentads.
Our definitions are consistent with the basic concept that flash droughts are rapidly evolving events, which are distinguished from longer, more slowly evolving “classical” droughts. Nonetheless, examination of the implications of alternative definitions of flash droughts for their predictability is, we believe, a topic that merits further investigation.
Our objective was to assess the skill of the MRF in forecasting flash droughts. We examined GEFSv2-based reforecasts of flash droughts over the CONUS for the spring, summer, and autumn. We found the following:
The GEFSv2 reforecasts generally capture the spatial and seasonal patterns of flash drought FOCs, especially for heat wave flash droughts. The GEFSv2 reforecasts capture the basic pattern of the FOC of P-deficit flash droughts, aside from some anomalies in late spring over parts of the Southwest, which lead to large false alarm rates, and an underestimate of flash drought occurrence over California and Nevada.
ETS values for late spring for heat wave flash droughts (the season when they are most likely to occur) are greater than 0.2 for lead 1 and in some regions 2 pentads, which suggests usable (if not strong) forecast skill. At lead 1 pentad in spring, the hit rate is more than 0.4 and biases are low. As the lead increases, the hit rate decreases and the false alarm rate increases. At lead 3 pentads, the ETS is still mostly positive, although skill is probably too low to produce useful forecasts. For summer, the reforecasts have ETS values greater than about 0.1 for lead 1 pentad. At lead 3 pentads, ETS is below 0.1 and is negative in some places, so there is no useful skill. For P-deficit flash droughts, the GEFSv2 reforecasts generally are less skillful than for heat wave flash droughts. The P-deficit flash drought forecasts are poor because the GEFSv2 reforecasts of P have lower skill than those of temperature. However, the P-deficit flash drought forecasts are more skillful than the P forecasts, and the forecasts are able to capture the relationships between SM and ET. Even though the P forecasts are not skillful after lead 1 pentad, the negative SM anomalies are more persistent, leading to some skill in the P-deficit flash drought forecasts.
Overall, while the GEFSv2 reforecasts are (mostly) able to capture the general patterns of the seasonal variation of FOCs, the forecast skill for individual events beyond lead 1 to 2 pentads is low, primarily because there are too many false alarms. One interesting question that could be the subject of future research is whether the GEFSv2 reforecasts are able to forecast the onset of strong and persistent drought events, such as the 2012 central U.S. drought (which was triggered by a flash drought). The operational GEFS forecast in 2012 did not capture the onset of the drought. After the event was underway, the GEFS captured the event, but the forecasted magnitudes of Tair and SM percent were weaker than in the analysis.
This work is funded by MAPP/CPO/NOAA Grant GC-14-189A and NA17OAR4310146 (UCLA ref. 20172011).