Droughts and heat waves have important impacts on multiple sectors including water resources, agriculture, electricity generation, and public health, so it is important to understand how they will be affected by climate change. However, there is large uncertainty in the projected changes of these extreme events from climate models. In this study, historical biases in models are compared against their future projections to understand and attempt to constrain these uncertainties. Biases in precipitation, near-surface air temperature, evapotranspiration, and a land–atmospheric coupling metric are calculated for 24 models from phase 5 of the Coupled Model Intercomparison Project (CMIP5) against 2 models from phase 2 of the North American Land Data Assimilation System (NLDAS-2) as reference for 1979–2005. These biases are highly correlated across variables, with some models being hotter and drier and others wetter and cooler. Models that overestimate summer precipitation project larger increases in precipitation, evapotranspiration, and land–atmospheric coupling over important agricultural regions by the end of the twenty-first century (2070–99) under RCP8.5, although the percentage variance explained is low. Changes in the characteristics of droughts and heat waves are calculated and linked to historical biases in precipitation and temperature. A method to constrain uncertainty by ranking models based on historical performance is discussed but the rankings differ widely depending on the variable considered. Despite the large uncertainty that remains in the magnitude of the changes, there is consensus among models that droughts and heat waves will increase in multiple regions in the United States by the end of the twenty-first century unless climate mitigation actions are taken.
Droughts and heat waves are two of the most damaging natural hazards that affect water resources (Dawadi and Ahmad 2012), agriculture (Lesk et al. 2016), electricity generation (van Vliet et al. 2016), and public health (Anderson and Bell 2011). When these extreme events impact large expanses of cultivated areas, they can cause water and heat stress to plants and crops (Lobell et al. 2013; Hatfield and Prueger 2015), reducing yields and potentially leading to increases in food prices (World Bank 2012). Droughts and heat waves result from climate variability, but climate change may increase their frequency, severity, and other characteristics (IPCC 2013).
Multiple studies have explored the potential future changes in extreme events (Orlowsky and Seneviratne 2013; Sillmann et al. 2013; Maloney et al. 2014; Wuebbles et al. 2014), including droughts (Sheffield and Wood 2008; Dai 2011; Trenberth et al. 2013; Jeong et al. 2014; Cook et al. 2015; Touma et al. 2015) and heat waves (Abatzoglou and Barbero 2014; Russo et al. 2014) over North America. These were based on climate model experiments from phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012) that informed the Intergovernmental Panel on Climate Change Fifth Assessment Report (IPCC 2013). Past work has also looked at impacts of droughts and heat waves on agriculture (Mishra and Cherkauer 2010; Lobell et al. 2013, 2014) and how this sector may be affected over North America under different climate change scenarios (Parry et al. 2004). While common trends have been identified, such as the drying of the U.S. Southwest and rising air temperatures throughout North America, there is still high uncertainty in the future projections (Allen et al. 2000; Knutti et al. 2008; Knutti and Sedláček 2013), especially regarding extreme events (Burke and Brown 2008; Sheffield and Wood 2008).
This uncertainty results from a variety of sources, including internal variability of the climate system (Deser et al. 2014), the degree of future mitigation of anthropogenic greenhouse gases (Diffenbaugh and Giorgi 2012), and the climate models used (Knutti et al. 2010; Cheruy et al. 2014; Friedlingstein et al. 2014). The relative contribution of each of these uncertainty sources to the overall value depends on the time horizon of the projections. For example, internal variability dominates uncertainty in the present and can have important contributions even up to 50 years into the future (Thompson et al. 2015), while uncertainties regarding emissions and climate models play an increasing role through the end of the century (Hawkins and Sutton 2009). Internal variability is difficult to predict because it arises from complex interactions within the climate system. Similarly, it is challenging to predict greenhouse gas emissions because they depend on human development and mitigation efforts. Uncertainty from climate models occurs because they include different sets of physical processes, use different parameterizations, or have different spatial and vertical resolutions, even though they share significant components and seek to solve the same general physical equations (Knutti et al. 2013).
The work presented here focuses on the uncertainty in future projections of droughts and heat waves derived from the diversity of climate models in CMIP5, and seeks to constrain it by using information on the models’ historical biases. These biases are calculated using observationally constrained model output for key variables of the land surface, near-surface atmosphere, and their interactions, which are important in representing and controlling the occurrence of droughts, heat waves, and their feedbacks. Land–atmospheric (L–A) coupling is one of the main physical processes examined. It represents how much influence the land surface has on the lower part of the atmosphere and vice versa. The type of coupling determines whether a region is water limited (evapotranspiration is positively correlated with soil moisture), energy limited (evapotranspiration is negatively correlated with soil moisture), or in a transition zone (Seneviratne et al. 2010), with important implications for the occurrence of droughts and heat waves. L–A coupling can intensify droughts and increase their persistence (Wu and Kinter 2009; Roundy et al. 2013, 2014), and generate and strengthen local heat waves (Fischer et al. 2007a,b; Lorenz et al. 2010; Miralles et al. 2014; Berg et al. 2015). Compound events, in which droughts and heat waves take place simultaneously, cause extensive crop damage through both water and heat stress (Lesk et al. 2016). It is expected that L–A coupling will become more important in the future under climate change, especially for regions under transitional and dry regimes (Dirmeyer et al. 2013a,b). If this is the case, a stronger feedback between the land surface and the atmosphere may lead to increased drought persistence and intensity, and to more frequent compound events.
An accurate depiction of the historical climate is a necessary, albeit not sufficient, condition to have confidence in the projections of a given climate model (Tebaldi and Knutti 2007). For example, models that have positive temperature biases in the historical period have been shown to project larger increases in temperatures (Cheruy et al. 2014) because they generally overestimate incoming shortwave radiation due to misrepresentation of cloudiness. Furthermore, biases in L–A coupling strength can have an important impact on models’ future projections. If a model displays stronger coupling, more incoming radiation will heat the lower atmosphere, especially during dry soil moisture periods (Seneviratne et al. 2010; Jaeger and Seneviratne 2011). This may then lead future increases in net radiation from increased CO2 to be exaggerated. Conversely, if a model is too wet and coupling too weak, potential trends in desertification, droughts, and heat waves will be underestimated due to dampening of these land–atmospheric feedbacks.
The historical biases are used to develop model rankings based on the climate models’ historical performance. Many studies have analyzed historical biases in climate models (e.g., Reichler and Kim 2008; McCrary and Randall 2010; Sheffield et al. 2013a,b) but have generally not linked performance to uncertainty in future projections. Several studies have also sought to develop model rankings to inform ensemble means where the contribution of each model depends on its performance (Brekke et al. 2008; Gleckler et al. 2008; Santer et al. 2009) instead of the more common “one model, one vote” criterion. However, not much emphasis has been placed on constraining the uncertainty of droughts (Wehner et al. 2011) and heat waves in particular, nor do past studies examine how the choice of performance metrics affects the resulting uncertainty of future projections.
2. Data and methods
Data from 24 CMIP5 models from 14 modeling centers were used and are listed in Table 1 along with their general characteristics. The models were chosen based on the availability of variables needed for this study, in particular soil moisture at different layers, or with a total soil column under 2.5 m. The historical (~1850–2005) experiment simulations were used to evaluate the models’ biases, and the representative concentration pathway 8.5 (RCP8.5; van Vuuren et al. 2011) simulations (~2006–2100) to explore the biases’ relationships with future projections.
Observational data and observation-driven land surface hydrological model output were taken from phase 2 of the North American Land Data Assimilation System (NLDAS-2; Xia et al. 2012a,b). The NLDAS-2 runs multiple land surface models over the continental United States at 1/8° spatial and 1-h temporal resolution from 1979 to present, in support of understanding the land surface hydrological cycle, drought monitoring and forecasting, and initialization of weather models (Xia et al. 2012a). The NLDAS-2 data have been evaluated against a range of observations, including streamflow (Xia et al. 2012b), soil moisture (Xia et al. 2014), soil temperature (Xia et al. 2013), and evapotranspiration (Peters-Lidard et al. 2011). NLDAS-2 provides arguably the best estimate of land surface hydrology at high resolution for the contiguous United States, in particular for soil moisture and evapotranspiration, for which direct observations are lacking over large scales and long time periods (>10 yr) (Nearing et al. 2016). Data for two of the NLDAS-2 models were used to evaluate the CMIP5 historical run climatologies: the Variable Infiltration Capacity model (VIC; Liang et al. 1994) and the Noah model (Chen et al. 1996). These two models were chosen because they provided the best overall performance in the evaluation studies mentioned previously. Data from these two models were averaged to produce the NLDAS-2 estimates.
The common time period of 1979–2005 was chosen for the comparisons between the NLDAS-2 and the CMIP5 historical data. The future changes were calculated between the end of the twenty-first century, 2070–99, and this historical time period. Data from CMIP5 and NLDAS-2 models were interpolated to the grid with the lowest resolution among the models (i.e., 2.8° × 2.8°).
b. Definition of droughts and heat waves
There are multiple definitions of a drought (Wanders et al. 2010; Sheffield and Wood 2011; Lloyd-Hughes 2014), and the decision of which to use depends on the application. We focus on summer agricultural drought calculated from monthly soil moisture (SM; kg m−2 month−1) for June–August (JJA), for a standard depth of 2 m. Some of the models only report soil moisture for a total soil column depth of 1.5–2.5 m, which was used directly. Other models reported data for multiple soil layers, so these were interpolated to a 2-m level, assuming that soil moisture varied linearly between layers.
We carried out tests (not shown) using data from models that reported soil moisture at multiple layers to understand the impact of using values for slightly shallower (e.g., 1.5 m) or slightly deeper (e.g., 2.5 m) columns. Projected changes in drought frequency, duration, and severity (defined in Table 2) were calculated over the Crop Area (defined in Fig. 2) for 12 models at all their reported depths. All models projected increases in drought frequency overall. However, three models projected higher increases in the probability of a drought occurring as a function of depth at an average rate of 5% m−1. On the other hand, nine models showed decreased changes in drought frequency as a function of depth with an average rate of −8% m−1. Nine models projected more severe events as a function of depth, driven mainly by increased drought duration in deeper soil columns. This suggests that models with deeper soil columns will tend to underestimate changes in drought frequency and overestimate their severity compared to a 2-m baseline. How much so depends greatly on the model.
To quantify and understand the historic biases and future changes in soil moisture (and hence drought), three other variables were considered: monthly precipitation (Prcp; kg m−2 month−1), evapotranspiration (ET; kg m−2 month−1), and near-surface air temperature (Tas; K). JJA climatologies for the historic and future periods were calculated for all variables. The winter [December–February (DJF)] and spring [March–May (MAM)] climatologies were also calculated for precipitation because summer droughts are related to the previous seasons via snowpack and soil moisture persistence. Daily maximum near-surface air temperature (Tasmax; K) was used to identify heat waves (Lau and Nath 2012). Since heat waves usually last on the order of days to weeks, daily data between 1 June and 31 August were used.
Drought events were calculated from monthly soil moisture fields over a depth between 1.5 and 2.5 m, depending on the model. An empirical cumulative distribution function (ECDF) was calculated for each summer month (i.e., June, July, August) for the historical period for each grid cell, and was used to calculate a percentile value for each month throughout the record. A month was defined to be under drought if soil moisture was below the 20th percentile (Sheffield et al. 2009). For future projections, the ECDF of the historical period was used to calculate the equivalent percentile for the future soil moisture values, thus including any shifts in the climatology as well as changes in variability.
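The percentile-based drought definition above can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the input is assumed to be a 1-D series for one grid cell and one calendar month, and the ECDF convention (fraction of historical values less than or equal to the value) is an assumption, since several plotting-position conventions exist.

```python
import numpy as np

def ecdf_percentile(historical, values):
    """Percentile (0-100) of each value under the ECDF of the historical sample.

    Assumed convention: F(x) = fraction of historical values <= x.
    """
    hist_sorted = np.sort(np.asarray(historical, dtype=float))
    # Number of historical values less than or equal to each query value
    ranks = np.searchsorted(hist_sorted, np.asarray(values, dtype=float), side="right")
    return 100.0 * ranks / hist_sorted.size

def drought_mask(historical, values, threshold=20.0):
    """True where soil moisture falls below the given historical percentile."""
    return ecdf_percentile(historical, values) < threshold
```

Passing future soil moisture as `values` while keeping the historical sample fixed reproduces the idea that future percentiles are read off the historical ECDF, so shifts in the mean and changes in variability both affect the drought count.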
There are also several definitions of heat waves (Robinson 2001; Della-Marta et al. 2007; Fischer et al. 2007a; Anderson and Bell 2011; Lau and Nath 2012). It is common to use a fixed value threshold for a given number of consecutive days (e.g., above 30°C for 5 days) (Della-Marta et al. 2007). This has the advantage of being easily translated to agricultural impacts where these thresholds have been linked to reduced yields (e.g., Lobell et al. 2013). Nevertheless, this type of definition poses a challenge when using CMIP5 data because models have temperature biases, leading to under- or overestimation of heat waves relative to observations depending on the sign of the biases. Another definition of heat waves is based on percentiles (e.g., Anderson and Bell 2011), similar to our definition of soil moisture drought. This has the advantage of bypassing biases in temperature by defining the extreme events relative to the climatology of each model. For this reason, the latter method was chosen. Heat waves were calculated based on Tasmax, when values were above the 80th percentile (for consistency with the drought analysis) for five consecutive days, based on the ECDF for each day in JJA. For future heat waves, the historical ECDF was used to calculate the percentile values.
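The heat wave identification described above might be sketched as follows. The function name and array layout (years by JJA days) are hypothetical, "five consecutive days" is read here as runs of at least five days, and `np.percentile` with its default linear interpolation stands in for the per-day ECDF, which is an assumption rather than the study's exact procedure.

```python
import numpy as np

def heat_wave_days(tasmax, tasmax_hist, pct=80.0, min_len=5):
    """Mark days that belong to a heat wave.

    tasmax, tasmax_hist: arrays of shape (n_years, n_days) for JJA.
    Thresholds come from the historical sample for each calendar day.
    """
    # One threshold per calendar day (assumed: linear-interpolated percentile)
    thresholds = np.percentile(np.asarray(tasmax_hist, dtype=float), pct, axis=0)
    hot = np.asarray(tasmax, dtype=float) > thresholds
    in_wave = np.zeros_like(hot, dtype=bool)
    for y, row in enumerate(hot):
        start = None
        # Append a trailing False so a run ending on the last day is closed
        for d, is_hot in enumerate(np.append(row, False)):
            if is_hot and start is None:
                start = d
            elif not is_hot and start is not None:
                if d - start >= min_len:
                    in_wave[y, start:d] = True
                start = None
    return in_wave
```

For future heat waves, `tasmax` would hold the future daily values while `tasmax_hist` stays fixed to the historical period, mirroring the use of the historical ECDF in the text.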
Yearly frequency, mean duration, mean intensity, and mean severity were calculated for both droughts and heat waves. The respective equations are defined in Table 2.
c. Definition of land–atmosphere coupling strength
Land–atmosphere processes depend largely on the type and strength of the dependence of evapotranspiration on soil moisture (Seneviratne et al. 2010), and whether it is water limited, radiation limited, or transitional. The strength of the coupling is also modulated by the magnitude of evapotranspiration. For example, in dry regions, the correlation between soil moisture and evapotranspiration is large and positive. However, as evapotranspiration is generally low, there is little feedback with the atmosphere. Therefore, strong L–A interactions take place where there is a combination of strong positive correlation between soil moisture and evapotranspiration, and a relatively high evapotranspiration rate.
Several metrics have been proposed to quantify the type and strength of L–A coupling and how they are represented in climate models (Koster et al. 2002; Dirmeyer 2006; Dirmeyer et al. 2006). One metric commonly used in the literature (Dirmeyer et al. 2013b) is the interannual correlation between evapotranspiration (or latent heat flux) and soil moisture ρ(SM, ET), multiplied by the interannual standard deviation of evapotranspiration σ(ET), shown in Eq. (1):

$$\gamma = \rho(\mathrm{SM}, \mathrm{ET})\,\sigma(\mathrm{ET}). \tag{1}$$
The correlation identifies if a region is typically water or energy limited over a time span of decades. The standard deviation multiplier adds information about the variability of the evaporative flux throughout the data record. Thus, the metric quantifies the variability of land–atmospheric coupling strength within a region from year to year. However, for this study, it is more important to use the evapotranspiration climatology (the average of the flux’s strength) instead to capture the regions where, generally, the land surface has the capacity to impact the atmosphere within the summer season.
Figure 1 plots the mean JJA evapotranspiration against the interannual standard deviation of JJA evapotranspiration for water-limited, radiation-limited, and transition regions in the domain of the two NLDAS-2 models. Grid cells are defined as water limited when the correlation R between soil moisture and evapotranspiration is significant (p < 0.05) and larger than 0.3, as radiation limited when this correlation is significant and negative with a magnitude larger than 0.3, and as transition otherwise. While this threshold is arbitrary, the sensitivity of the grid cell classification to it is low until high thresholds (R = 0.5–0.8) are chosen. Figure 1 shows that there are water-limited regions with high mean values and low standard deviations, as well as regions with low mean values and high standard deviations. Therefore, to better account for seasonal L–A feedbacks, we modify the coupling metric by replacing the interannual standard deviation with the mean value μ(ET), and normalizing by the maximum evapotranspiration value throughout the domain, max(ET), as shown in Eq. (2):

$$\phi = \rho(\mathrm{SM}, \mathrm{ET})\,\frac{\mu(\mathrm{ET})}{\max(\mathrm{ET})}. \tag{2}$$
This normalization bounds the metric between −1 (strongly radiation limited) and 1 (strongly water limited).
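As an illustration of the two metrics and the regime classification, the following sketch computes γ, ϕ, and a regime label per grid cell. It reflects one reading of the text: γ is taken as ρ(SM, ET) σ(ET), ϕ as ρ(SM, ET) μ(ET)/max(ET), and max(ET) as the largest climatological mean ET among the cells supplied. The function name, argument layout, and use of `scipy.stats.pearsonr` for the significance test are assumptions made for the example.

```python
import numpy as np
from scipy.stats import pearsonr

def coupling_metrics(sm, et, r_min=0.3, alpha=0.05):
    """Coupling metrics per grid cell.

    sm, et: arrays of shape (n_years, n_cells) with one JJA mean per year.
    Returns (gamma, phi, regime) where regime is a list of labels.
    """
    sm = np.asarray(sm, dtype=float)
    et = np.asarray(et, dtype=float)
    n_cells = sm.shape[1]
    rho = np.empty(n_cells)
    regime = []
    for c in range(n_cells):
        r, p = pearsonr(sm[:, c], et[:, c])  # interannual correlation and p value
        rho[c] = r
        if p < alpha and r > r_min:
            regime.append("water limited")
        elif p < alpha and r < -r_min:
            regime.append("radiation limited")
        else:
            regime.append("transition")
    sigma = et.std(axis=0, ddof=1)   # interannual standard deviation of ET
    mu = et.mean(axis=0)             # ET climatology
    gamma = rho * sigma              # variability-weighted metric, as in Eq. (1)
    phi = rho * mu / mu.max()        # climatology-weighted metric, as in Eq. (2)
    return gamma, phi, regime
```

Because |ρ| ≤ 1 and μ(ET)/max(ET) ≤ 1 for non-negative ET, ϕ computed this way stays within [−1, 1], consistent with the bounds stated above.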
The spatial patterns of interannual correlations between JJA soil moisture and evapotranspiration are very similar for both Noah and VIC (not shown), with the largest difference over the Southeast, where Noah shows higher mean evaporative fluxes compared to VIC. To account for the uncertainty of these estimates, we averaged the model values to a single NLDAS-2 ensemble mean. The percentage errors of the difference between the models’ estimates with respect to the ensemble mean were calculated for climatologies in JJA SM, JJA ET, and JJA ρ(SM, ET) (both models have the same meteorological forcings). These were found to be 41%, 33%, and 43%, respectively, when averaged over the entire NLDAS-2 domain. Figure 2 shows the NLDAS-2 ensemble mean of the two coupling metrics given by Eqs. (1) and (2) (i.e., γ and ϕ, respectively). Note that γ was normalized by the maximum standard deviation value in the domain to allow for comparison between the two. Here, γ shows a lower coupling in the U.S. Southeast (a relatively wet region) than over the north of Mexico (a semiarid region), in contrast to ϕ. Given that land–atmospheric coupling depends heavily on the strength of evaporative fluxes, which in turn depend on water availability, one would expect higher coupling over the U.S. Southeast compared to the north of Mexico. For the rest of the study, the domain is split into seven subregions based on the spatial patterns of L–A coupling shown by ϕ.
L–A coupling is also a function of soil moisture depth, given that evapotranspiration takes place in the upper region of the soil column, depending on the distribution of the vegetation’s roots (Rodríguez-Iturbe and Porporato 2004). The 1.5–2.5-m depth generally encompasses the root zone and is deep enough to capture longer-lasting soil moisture memory beyond the frequency of individual storm events. This is in contrast to the upper soil layer (e.g., 10 cm), which experiences fluctuations at a higher frequency and therefore does not represent L–A coupling accurately at monthly time scales. However, soil columns between 2 and 2.5 m can be deep enough to dampen some of the coupling strength if the vegetation has shallower roots in a given region. As with the droughts statistics, we explored the sensitivity of ϕ to soil depth, and found different sensitivities across models. Nine models showed an expected decrease in coupling with an average change of 16% per meter relative to the 2-m value, while three models surprisingly showed an average increase in coupling with soil depth of 2% per meter.
d. Definition of subregions
We define a set of subregions to capture the spatial variation in L–A coupling. Figure 2 shows the coupling metric calculated from the average of the NLDAS-2 models. The Southeast shows the strongest coupling strength in the domain. The Northeast has negative coupling values as the region is wet and strongly radiation limited. The Northwest and Southwest have strong positive correlations between soil moisture and evapotranspiration since they are generally drier regions. However, L–A coupling is low since the seasonal evapotranspiration is also low. An important agricultural region “Crop Area” (Bagley et al. 2012) is further split into “Crop Upper” and “Crop Lower” because the difference in their coupling may have different implications for future changes.
e. Estimation of historical biases and relationship with future projections
Historical biases in each variable are calculated relative to the NLDAS-2 data, by averaging the data over each subregion and subtracting the NLDAS-2 estimates from the CMIP5 model estimates. In this study the focus is on relating the biases in JJA Prcp to future projected changes via linear regression across models for each subregion. This assumes that there is a linear relationship between the projected changes and the predictor. However, the biases in different variables are not independent: for example, biases in Prcp are associated with biases in ET in water-limited regions. We quantify this dependency by calculating the correlation matrices between the biases for each region across models.
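The bias calculation, the across-model regression, and the correlation matrix of biases can be sketched as below. This is a simplified illustration under stated assumptions: inputs are regional means that have already been averaged over a subregion, all function names are hypothetical, and `scipy.stats.linregress` is used as a convenient ordinary least squares fit rather than as the study's actual implementation.

```python
import numpy as np
from scipy.stats import linregress

def regional_bias(model_regional_means, reference_regional_mean):
    """Bias per model: CMIP5 regional mean minus the NLDAS-2 regional mean."""
    return np.asarray(model_regional_means, dtype=float) - reference_regional_mean

def bias_change_regression(bias, change):
    """Least squares fit of projected change against historical bias across models."""
    fit = linregress(bias, change)
    return {
        "slope": fit.slope,
        "intercept": fit.intercept,
        "r2": fit.rvalue ** 2,   # fraction of across-model variance explained
        "p": fit.pvalue,
    }

def bias_correlation_matrix(biases):
    """Correlation between variables across models.

    biases: array of shape (n_models, n_variables) for one subregion.
    """
    return np.corrcoef(np.asarray(biases, dtype=float), rowvar=False)
```

Each subregion would be processed separately, with one bias value and one projected change per model entering the regression.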
a. Historical biases in mean climate
Figures 3a–f show boxplots of the historical biases of MAM and JJA Prcp, JJA ET, JJA Tas, JJA ρ(SM, ET), and JJA land–atmospheric coupling metric ϕ across the 24 climate models averaged over each subregion. Additionally, Table 3 lists the biases for each model for DJF, MAM, and JJA Prcp, JJA ET, Tas, and ϕ for the Crop subregions. Of all regional biases, 61.3% were statistically significant (p < 0.05) using a two-sample t test. Over the Northeast, Northwest, Southwest, and Southeast the CMIP5 models show median positive biases for MAM Prcp amounting to median percentage errors of 30%, 33%, 76%, and 13%, respectively. Median biases were also found to be positive for JJA Prcp in these regions, with median percentage errors of 18%, 49%, 24%, and 8%, respectively. All four regions show median positive biases in ET (median percentage errors of 35%, 37%, 47%, and 18%, respectively). These four subregions also have small negative biases in Tas (median percentage errors of 0.44%, 0.84%, 0.55%, and 0.51%, respectively). The Northeast shows a median positive bias in ϕ (median percentage error of 532%), while the Northwest, Southwest, and Southeast show a median negative bias (median percentage errors of 24%, 32%, and 29%, respectively). In the Northeast, both components of ϕ are generally overestimated such that 21 of the models do not represent this region as being radiation limited, which explains the large percentage error. In the Northwest, Southwest, and Southeast, ϕ is underestimated by the median of the CMIP5 models because ρ(SM, ET) is underestimated. Here there are probably two competing effects: models with positive biases in Prcp represent these regions as being less water limited, decreasing ρ(SM, ET), while their positive biases in ET increase μ(ET). Biases in ϕ are then a result of these effects on each of its components over each subregion.
Except for the case of Tas for which the climate models represent the climatologies quite well, the hydrological variables are relatively poorly represented by the median of the models.
These biases are not independent from each other. The cross correlations between biases in each variable across the 24 models are shown in Figs. 3g–n. Biases in DJF Prcp are not shown but were positively correlated with those in MAM Prcp everywhere except for the Southeast, and with those in JJA Prcp and JJA ET in the Northwest. They were also negatively correlated with biases in Tas over the Southwest. Models that have higher JJA Prcp also tend to have higher MAM Prcp (except in the Southeast and Crop Area), higher JJA ET, lower JJA Tas (except in the Northwest and Southwest), and lower correlations between SM and ET (except in the Southwest). The lower temperatures are consistent with a wet bias that induces more ET and more evaporative cooling. Conversely, models with less summer Prcp also tend to experience a drier spring, lower ET, higher Tas, and a stronger dependence of ET on SM. Interestingly, no region showed a significant correlation (p < 0.05) between biases in JJA Prcp and biases in ϕ. This is probably because of the competing effects mentioned in the previous paragraph, whereby higher Prcp leads to higher ET rates but also lower ρ(SM, ET), thus having mixed effects on ϕ. Low correlations in other regions may also be related to how the models represent ET and SM dynamics, irrespective of the biases in Prcp. Overall, the correlations show that there are common climate regimes for the historical period across the models: models that are wetter (drier) during the summer, are also wetter (drier) in the spring, have higher (lower) ET, lower (higher) Tas, and weaker (stronger) relationships between ET and SM.
b. Relationship between historical biases and future projected changes in mean climate
The ranges of projected changes in MAM and JJA Prcp, JJA ET, Tas, ρ(SM, ET), and ϕ from the 24 climate models are shown in Figs. 4a–f. Furthermore, Table 4 lists the projected changes for each model for DJF, MAM, and JJA Prcp, JJA ET, Tas, and ϕ over the Crop subregions. The ranges are large and there is no absolute consensus on the sign of most of these changes across regions. The median of the models shows an increase in MAM Prcp in every subregion but the Southwest, while the median also shows slight decreases of JJA Prcp, albeit with several models showing no change or an increase. These two changes are positively correlated across models (Figs. 4g–n) because those that project the largest decreases in JJA Prcp also project decreases in MAM Prcp, and those that project no or positive changes in JJA Prcp project increases in MAM Prcp. Changes in ET are more uncertain in the Southeast and the Crop Area, although most models project increases in the Northeast and Northwest and decreases in the Southwest. All models and subregions show an increase in Tas with a median of 5.0°C across the domain. However, some models project an increase of up to 8.5°C over the Crop Upper region. This large disparity in projected changes in temperature has been partially attributed to the models’ historical biases in incoming shortwave radiation due to misrepresentation of clouds. Models with the highest deficiencies in depicting cloudiness tend to project the largest temperature increases in midlatitude areas globally (Cheruy et al. 2014). The median of the models shows projected increases in the correlation between SM and ET and in the coupling metric, except for the Southwest, although there is large disagreement on the signs of these changes.
To understand how the projected changes in each variable are related, their cross correlations were calculated across models for each subregion (Figs. 4g–n). Changes in DJF Prcp are not included as they were only positively correlated with changes in MAM Prcp over the Southwest, Southeast, and the Crop areas. There are several strong correlations for changes in temperature, which are negatively correlated with changes in JJA Prcp in the Northeast, Southwest, Southeast, and the Crop areas. This shows that by the end of the century, the models tend to fall into a range of climates over certain regions. On one hand, models with higher increases in JJA Prcp are likely to also have a wetter spring over the Southwest, Southeast, and Crop areas, higher JJA ET rates across regions, stronger ϕ (except in the Northeast and Northwest), and a dampened JJA Tas increase (except in the Northwest). Conversely, models that exhibit the highest increases in temperature also tend to experience the largest decreases in Prcp and ET, and a weakening of ϕ.
A linear regression was fitted between the historical biases in JJA Prcp and the projected changes in MAM and JJA Prcp, JJA ET, Tas, ρ(SM, ET), and ϕ across climate models and for each subregion. Figure 5 displays the regression slopes and R2 values (Fig. 5, left), and intercepts (Fig. 5, right). No significant relationships (p < 0.05) were found with changes in DJF Prcp, so they are not shown.
Figure 5 shows that for the Northeast, Northwest, and Crop Area, a positive bias in JJA Prcp is related to larger positive increases in MAM Prcp, amounting to 20%, 40%, and 18% of the variance in the model projections in each region, respectively. The same relationship is evident for changes in JJA Prcp over the Southeast and the Crop Area, although models with smaller bias (close to the regression intercept) project a decrease in Prcp in the Southeast (shown by the negative regression intercept) and no change over the Crop Area. The percentages of the variance explained by these relationships are 19% and 22%, respectively. For example, the regression slope and intercept of the projected changes in JJA Prcp against bias in JJA Prcp over the Crop Area are 0.26 mm month−1 (mm month−1)−1 and −1.7 mm month−1, respectively (p = 0.036). This positive relationship between historical bias in JJA Prcp and its projected changes means that a wetter model during the historical period will tend to project a wetter United States by the end of the twenty-first century if the bias is large, or little change in JJA Prcp if the bias is small.
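As a rough worked illustration of the Crop Area regression quoted above, the fitted line can be written out explicitly. The symbols ΔP and b are introduced here only for convenience (projected change and historical bias in JJA Prcp, both in mm month−1), and the example bias of 20 mm month−1 is hypothetical rather than taken from any model in the study.

$$\Delta P_{\mathrm{JJA}} \approx -1.7 + 0.26\, b_{\mathrm{JJA}}$$

$$\Delta P_{\mathrm{JJA}} = 0 \;\Rightarrow\; b_{\mathrm{JJA}} \approx \frac{1.7}{0.26} \approx 6.5, \qquad b_{\mathrm{JJA}} = 20 \;\Rightarrow\; \Delta P_{\mathrm{JJA}} \approx -1.7 + 5.2 = 3.5$$

Under this fit, an unbiased model would project a small decrease of about 1.7 mm month−1, and the projected change turns positive only for wet biases above roughly 6.5 mm month−1. Since the relationship explains only 22% of the across-model variance, these numbers describe the fitted line and not any individual model.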
Projected future changes in ϕ in the Northwest, Southeast, and the Crop Area also show significant positive relationships with biases in JJA Prcp, with percentage variances explained of 23%, 36%, and 28%, respectively. These are related to greater increases in ET rates [slope = 0.24 mm month−1 (mm month−1)−1 increase over the Crop Area] and greater strengthening of ρ(SM, ET) [slope = 0.002 (mm month−1)−1 increase in the Crop Area] in historically wetter models. This last relationship is particularly interesting since wetter models during the historical period were found to be associated with weaker ρ(SM, ET) (Figs. 3g–n). The relationships from Figs. 4g–n show that these same wetter models project increases in ρ(SM, ET), albeit with a very shallow slope. Thus, wetter models during the historical period project a strengthening of the coupling due to increases in both components of ϕ in the future within the Southeast and the Crop Area, although especially due to that in ET. A possible explanation for this is that higher future temperatures will drive increases in ET such that the regions become more water limited despite increases in Prcp. In turn, drier models during the historical period project a weakening of ϕ likely due to the decreases in ET associated with decreases in JJA Prcp, since the small slope of the correlation component suggests that it has little impact on the overall changes of ϕ. In the Northwest, models with little bias in JJA Prcp tend to project a decrease in JJA ET. A drier model in this region would then tend to project an even larger decrease and a wetter model a very small decrease, or even an increase in JJA ET if the JJA Prcp bias was large.
c. Implications for extreme events: Droughts and heat waves
Figures 6 and 7 show the changes in yearly frequency plotted against changes in mean severity of drought and heat wave events, respectively. There are strong positive relationships between changes in drought frequency and severity throughout every subregion, with R values ranging from 0.64 in the Northwest and Crop Area, to 0.82 in the Northeast. Therefore, models that show the highest increases in the number of droughts relative to the historical period also project larger increases in drought severity, which is to be expected given the use of a fixed percentile-based threshold.
For example, MIROC5, MIROC-ESM, and MIROC-ESM-CHEM project the largest increases in drought frequency over the Crop Area, together with soil moisture drying (not shown). This is likely driven by their projected reductions in JJA rainfall (−10.6, −12.6, and −10.2 mm month−1, respectively) over this area relative to changes in ET (−0.2, −1.6, and −0.6 mm month−1, respectively), as shown in Table 4. Additionally, Table 3 shows that two of them have large negative biases in JJA Prcp (−3.9, −18.8, and −15.1 mm month−1 for the three models, respectively).
Figure 7 shows that models exhibit a positive relationship between increases in heat wave frequency and severity throughout the domain, with the strongest correlation (R = 0.58) over the Northeast. Two models that project large increases in heat wave frequency and severity are MIROC-ESM and GFDL CM3. Both of these models project larger changes in daily maximum and monthly values of near-surface air temperature (not shown). Given the projected changes and biases in MIROC-ESM already discussed, its projected increases in heat waves are possibly due to the increased partitioning of incoming radiation into sensible heat flux. GFDL CM3, on the other hand, has a small positive bias in JJA Prcp of 5.9 mm month−1 and projects an increase in Prcp (8.3 mm month−1) and ET (21.5 mm month−1). In this case, it could be that larger-scale factors are responsible for the larger increases in temperature (7.2 K compared to the 24-model ensemble increase of 5.3 K). Another possible explanation is that the distribution of rainfall throughout the summer may change, leaving longer dry periods that favor the formation of heat waves.
A Spearman rank correlation was calculated between the absolute projected changes in each of the characteristics of droughts and the historical biases in JJA Prcp. This was repeated for the changes in heat waves and biases in JJA Tas. The results are shown in Fig. 8. Drought yearly frequency has significant (p < 0.05) negative correlations with biases in JJA Prcp over the Northeast, Southeast, Crop Area, and Crop Upper. Drought mean intensity has similar negative correlations with biases in JJA Prcp over the Northeast and the Southeast. Drought mean duration is also correlated with biases in JJA Prcp over the Northeast, Southeast, Crop Area, and Crop Lower. Finally, drought mean severity is negatively correlated with biases in JJA Prcp over every region except for the Northwest and Southwest. These results show that wetter models during the historical period tend to project less frequent, less intense, and shorter droughts, while drier models will produce more extreme projections of these drought characteristics in many of the subregions, particularly over those important for agriculture.
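The rank correlation used here can be sketched in a few lines of Python. This is a minimal illustration, assuming that ties receive average ranks; the function names and the usage values are hypothetical and synthetic, and the sketch does not compute the p values reported in Fig. 8 (in practice a library routine such as `scipy.stats.spearmanr` returns both the coefficient and a p value).

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic example (NOT the paper's data): wetter hypothetical models
# paired with smaller changes in drought frequency.
prcp_bias = [-15.0, -4.0, 2.0, 9.0, 20.0]
drought_freq_change = [1.8, 1.1, 1.2, 0.6, 0.3]
rho = spearman(prcp_bias, drought_freq_change)
```

Because the statistic depends only on ranks, it captures any monotonic relationship between bias and projected change, not just a linear one.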
Fewer significant Spearman rank correlations were found between biases in JJA Tas and changes in heat wave characteristics (and none with biases in JJA Prcp). These biases are correlated with changes in heat wave yearly frequency over the Northeast and Crop Upper. Changes in mean intensity are also correlated with these biases over Crop Upper and the Northwest. Significant relationships were found for the changes in heat wave mean duration and mean severity, but solely over the Northwest. While these relationships are fewer, given the correlation between biases in JJA Tas and JJA Prcp, we can infer (albeit rather weakly) that drier and hotter models produce more extreme projections for heat waves, mainly over the Northwest region, compared to those models that tend to be wetter and cooler over the historical period.
4. Discussion and conclusions
a. Potential constraints on the uncertainty of future projections
The question remains whether it is possible to use the information on the model biases to constrain the uncertainty of future projections. In this section we use the biases to rank the models, assuming that an accurate representation of the historical climate is necessary (albeit not sufficient) for trusting the projected changes in future hydroclimate and its extremes. There is an incentive to develop these model rankings because climate change impact studies often select a small subset of the climate models on which to base their analyses (e.g., Brekke et al. 2009; Schewe et al. 2014). Since small subsets of climate models are driving the community’s research on the potential impacts of climate change (e.g., Gerten et al. 2011; Hagemann et al. 2011; Warszawski et al. 2014; Frieler et al. 2015), it is desirable that the “best” models be used, while encompassing a realistic range of uncertainty for the time frame of interest (e.g., near-term, midcentury, end-of-century). Constraining the uncertainty that arises from model diversity is important because it represents the largest contribution to the overall uncertainty of climate change by the end of the century for a given RCP scenario (Hawkins and Sutton 2009).
The 24 models were ranked according to the absolute values of their biases in JJA Prcp, JJA Tas, JJA ET, and JJA ϕ. These rankings were done separately for each variable. A Spearman correlation analysis between the rankings showed positive significant correlations (p < 0.05) between those from JJA Prcp and JJA ET (R = 0.53), and those from JJA Tas and JJA ET (R = 0.45). A negative and significant correlation was found between the rankings derived from JJA ET and JJA ϕ (R = −0.57). More details can be obtained from Table 3. The lack of more correlated rankings is possibly because negative biases are treated the same as positive ones and because a discrete ranking may amplify the differences between models with statistically similar biases.
To understand the error in the uncertainty range that derives from selecting a subset of the 24 models, we randomly sampled subsets of models and compared their ranges of projections to those when selecting the top performing models according to the rankings. This was done using bootstrap sampling whereby a subset of models was selected at random 1000 different times from the ensemble of 24 climate models. The interquartile range of the projected changes in droughts and heat waves was calculated for each sample as a measure of uncertainty. This sampling was done for sample sizes from 5 to 23 models to quantify how this uncertainty range changes as a function of the sample size. In parallel, subsets of models (from 5 to 23 models) were selected according to the four rankings over the Crop Area and the interquartile range of their projected changes calculated. This allowed us to compare the uncertainty derived when selecting the “better performing” models as opposed to selecting the same number of models at random.
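The subsampling experiment described above can be sketched as follows. This is a hedged illustration, not the study's code: it assumes that each random subset is drawn without replacement, that the interquartile range is computed with the "inclusive" quantile method of Python's `statistics` module, and that "top performing" means smallest absolute bias. All function names are hypothetical and the 24 values are synthetic.

```python
import random
import statistics


def iqr(values):
    """Interquartile range (75th minus 25th percentile) of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def random_subset_iqrs(changes, subset_size, n_draws=1000, seed=0):
    """IQR of projected changes for `n_draws` random subsets of models.

    Assumption: subsets are drawn without replacement from the ensemble.
    """
    rng = random.Random(seed)
    return [iqr(rng.sample(changes, subset_size)) for _ in range(n_draws)]


def ranked_subset_iqr(changes, bias, subset_size):
    """IQR of projected changes for the models with the smallest |bias|."""
    order = sorted(range(len(changes)), key=lambda i: abs(bias[i]))
    return iqr([changes[i] for i in order[:subset_size]])


# Synthetic 24-model ensemble (NOT the paper's data).
rng = random.Random(42)
changes = [rng.gauss(1.0, 0.5) for _ in range(24)]  # projected change per model
bias = [rng.gauss(0.0, 10.0) for _ in range(24)]    # historical bias per model

full_range = iqr(changes)
for size in (5, 10, 15, 23):
    median_random = statistics.median(random_subset_iqrs(changes, size))
    ranked = ranked_subset_iqr(changes, bias, size)
    print(f"n={size:2d}  random median IQR={median_random:.2f}  "
          f"ranked IQR={ranked:.2f}  full IQR={full_range:.2f}")
```

Comparing the ranked-subset value with the median of the random draws at each subset size mirrors the comparison made between the ranking curves and the bootstrap median.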
The results of this uncertainty analysis are displayed in Fig. 9. The median of the bootstrap analysis shows that selecting a small sample of models (e.g., 5) at random will likely underestimate the variance of the projected changes compared to that from the 24 models. Selecting a small sample of models using the rankings based on JJA Prcp, Tas, and ϕ yields overall larger uncertainty ranges for the projected changes in drought yearly frequency than the median of the bootstrap analysis. Conversely, the ranking from JJA ET consistently produces a lower uncertainty range. For the changes in drought severity, all the rankings lead to lower uncertainty ranges for most of the model samples, although the rankings from JJA Prcp and ET approach the median value from the bootstrap for samples larger than 13 models. For the changes in heat wave frequency, all the rankings consistently yield higher uncertainty ranges than the bootstrap median. However, they all lie close to the bootstrap median when analyzing the projected changes in heat wave severity.
This analysis suggests that selecting small subsets of the CMIP5 models will most likely artificially reduce the uncertainty range of the projections in question (Knutti et al. 2010) regardless of how the models are chosen. It also reiterates the challenge of developing consistent model rankings (e.g., Gleckler et al. 2008), even with a particular application in mind (in this case, to study droughts and heat waves). Here we show that even when historical biases in hydroclimatic variables account for some of the variability of projections across models, it is not enough to generate consistent model rankings that can constrain the projections’ uncertainty ranges.
Without being able to determine the “best” models in a logical and rigorous way, it might be more appropriate to span the full range of model uncertainty, as we know it. This would allow for a more accurate characterization of the potential impacts of climate change. As shown by Fig. 9, it is possible in some cases to increase the likelihood of matching the full uncertainty range using a large enough subset (e.g., 10 models). However, the uncertainty may still be under- or overestimated depending on the subset. Studies that use a small number of climate models chosen arbitrarily should be cautious in their conclusions, since they are likely underestimating the range of possible outcomes resulting from climate change by artificially selecting a small subset. Model uncertainty is an important component of the overall uncertainty estimates of climate change both at short and long time scales, so it should not be neglected by arbitrarily choosing a small number of models.
While there are more models available in the CMIP5 archive than the 24 that were analyzed here, they were not selected because they did not report soil moisture content at different layers, had a total soil column deeper than 2.5 m, or were not readily available from the CMIP5 data portal. The selection of 24 models may underestimate the full uncertainty range from the CMIP5 models, as indicated by the subsampling experiments. Nevertheless, this underestimation is likely to be small since there are decreasing marginal returns in added uncertainty as more models are added after around 10–15 models (Knutti et al. 2010; Ferro et al. 2012). A key question is whether the full CMIP5 ensemble of models represents the true uncertainty, or whether further diversity in the models is needed in terms of which processes are represented and how (Tebaldi and Knutti 2007; Knutti et al. 2010).
Linear relationships were found between historical biases and future projections, although the percentage of variance explained was relatively low for most variables and regions. This shows that using the climatologies of hydroclimatic variables to generate model rankings is not effective enough to reduce the uncertainty ranges, since there are many other factors involved. Moreover, the historical biases considered here were calculated from the limited 27-yr period of 1979–2005, so decadal variability is not fully captured by these climatologies, which leads to uncertainty in the calculated biases.
Nevertheless, the relationships between the historical biases and the models’ future projections also show that simply removing the historical bias from the future projections will not be enough to remove the effects that a model’s historical biases have on its resulting projections. More advanced statistical bias correction methodologies (e.g., Li et al. 2010; Hagemann et al. 2011) take into account the full distribution of the variables using quantile matching. However, future bias correction studies should also take into account the relationships between historical biases and projected changes that were explored here.
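To make the distinction between the two kinds of correction concrete, the sketch below contrasts removing a mean bias with a generic empirical quantile mapping. It is an illustrative simplification under stated assumptions and does not reproduce the specific methods of the cited studies: the function names are hypothetical, the quantile is taken by nearest rank, and values outside the historical range are simply clamped to the observed extremes. The samples are synthetic.

```python
import math
from bisect import bisect_right


def mean_bias_removal(value, model_hist, obs_hist):
    """Shift a model value by the difference in historical means."""
    bias = sum(model_hist) / len(model_hist) - sum(obs_hist) / len(obs_hist)
    return value - bias


def quantile_map(value, model_hist, obs_hist):
    """Map a model value to the observed value at the same empirical quantile.

    Assumptions: nearest-rank quantiles; values beyond the model's
    historical range are clamped to the smallest or largest observation.
    """
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    # Non-exceedance probability of the value in the model's historical sample.
    p = bisect_right(model_sorted, value) / len(model_sorted)
    # Index of the observed value at the same probability, kept within bounds.
    idx = math.ceil(p * len(obs_sorted)) - 1
    idx = min(max(idx, 0), len(obs_sorted) - 1)
    return obs_sorted[idx]


# Synthetic historical samples (NOT the paper's data).
model_hist = [1.0, 2.0, 3.0, 4.0]
obs_hist = [10.0, 20.0, 30.0, 40.0]

shifted = mean_bias_removal(2.0, model_hist, obs_hist)  # uses only the means
mapped = quantile_map(2.0, model_hist, obs_hist)        # uses the full distribution
```

Neither function uses any information about how a model's bias relates to its projected change, which is the additional dependence highlighted in the paragraph above.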
c. The role of land–atmospheric coupling
We show that there are significant biases across models in our chosen coupling metric that manifest in misrepresentation of whether a region is water limited or radiation limited as well as the magnitude of evapotranspiration. This study agrees with previous ones that have found that L–A coupling may intensify in the future over a large part of the United States (Dirmeyer et al. 2013a,b). Depending on the main control of evapotranspiration in a region, the effect of strengthening L–A coupling would be different. For example, the projected increase in coupling strength in the Southeast and the Crop Area, which are already water limited, could help drive the increase in drought persistence and severity. It could also lead to higher local increases in near-surface air temperature, leading to more frequent and intense heat waves and compound events. These potential increases in extreme events pose high dangers to future agriculture in the region.
This study quantified the biases of 24 CMIP5 models for precipitation, evapotranspiration, near-surface air temperature, and land–atmospheric coupling over the United States. The ensemble of models tends to be biased wet and cool in most of the country and dry and warm in the Southeast for 1979–2005. These biases were linked to projected changes in the climatologies of hydrometeorological variables and extreme events under the RCP8.5 scenario by the end of the twenty-first century. The wetter the models are during the historical period, the wetter they tend to project the end of the century to be due to larger increases in precipitation, and vice versa. This study finds stronger relationships between historical biases and projected changes over the United States, compared to the results of Knutti et al. (2010), carried out at a global scale. However, in most cases the relationships found in this work only accounted for a small fraction of the variance across models.
Most models agree on a general drying trend in soil moisture by the end of the twenty-first century, and therefore more frequent and severe droughts are expected in the future. There is a wide range of projected changes that were often inversely correlated with historical biases in precipitation, such that wetter (drier) models projected smaller (larger) changes in drought characteristics. However, although changes in DJF Prcp were significantly correlated with changes in droughts (not shown), few relationships were found between these DJF Prcp changes and other changes or the historical biases, which indicates that other factors are involved in the projected changes in droughts. All models show a shift toward higher near-surface air temperatures by the end of the century. Given these changes, all models project increases in heat wave frequency and severity, with large uncertainty across models. To a lesser degree, this range of projected changes in heat wave characteristics was also related to historical biases in near-surface air temperature.
This work has reiterated the challenge of constraining the uncertainty of future projections of droughts and heat waves. Here the focus was on the United States, although it is likely that similar results would be obtained for other regions. There are, however, some changes on which most of the models in this study agree: there will be more frequent and severe droughts in the Southwest and the Southeast, and heat waves throughout the United States by the end of the century if we follow the path given by RCP8.5. The uncertainty lies mainly in the magnitude of these changes, rather than in their direction. Further attempts to constrain model uncertainty may focus instead on model performance at the process level, providing more insights into the origins of the biases in the climatologies used here. In the meantime, until a robust methodology to rank climate models is developed, researchers should aim to include more climate models in their impacts studies to characterize the possible range of projections more accurately.
We thank the members of the Terrestrial Hydrology Research Group at Princeton University, in particular Dr. Niko Wanders, and three anonymous reviewers for their comments, which helped improve this study. This work was supported by the NOAA Climate Program Office (NA11OAR4310097 and NA15OAR4310091) and the NASA Headquarters under the NASA Earth and Space Science Fellowship Program (NNX14AL08H).