This is the second part of a three-part paper on North American climate in phase 5 of the Coupled Model Intercomparison Project (CMIP5) that evaluates the twentieth-century simulations of intraseasonal to multidecadal variability and teleconnections with North American climate. Overall, the multimodel ensemble does reasonably well at reproducing observed variability in several aspects, but it does less well at capturing observed teleconnections, with implications for future projections examined in part three of this paper. In terms of intraseasonal variability, almost half of the models examined can reproduce observed variability in the eastern Pacific and most models capture the midsummer drought over Central America. The multimodel mean replicates the density of traveling tropical synoptic-scale disturbances but with large spread among the models. On the other hand, the coarse resolution of the models means that tropical cyclone frequencies are underpredicted in the Atlantic and eastern North Pacific. The frequency and mean amplitude of ENSO are generally well reproduced, although teleconnections with North American climate are widely varying among models and only a few models can reproduce the east and central Pacific types of ENSO and connections with U.S. winter temperatures. The models capture the spatial pattern of Pacific decadal oscillation (PDO) variability and its influence on continental temperature and West Coast precipitation but less well for the wintertime precipitation. The spatial representation of the Atlantic multidecadal oscillation (AMO) is reasonable, but the magnitude of SST anomalies and teleconnections are poorly reproduced. Multidecadal trends such as the warming hole over the central–southeastern United States and precipitation increases are not replicated by the models, suggesting that observed changes are linked to natural variability.
This is the second part of a three-part paper on phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012) model simulations for North America. This second part evaluates the CMIP5 models in their ability to replicate the observed variability of North American continental and regional climate, and related climate processes. Sheffield et al. (2013, hereafter Part I) evaluate the representation of the climatology of continental and regional climate features. The third part (Maloney et al. 2013, manuscript submitted to J. Climate, hereafter Part III) describes the projected changes for the twenty-first century.
The CMIP5 provides an unprecedented collection of climate model output data for the assessment of future climate projections as well as evaluations of climate models for contemporary climate, the attribution of observed climate change, and improved understanding of climate processes and feedbacks. As such, these data contribute to the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5) and other global, regional, and national assessments.
The goal of this study is to provide a broad evaluation of CMIP5 models in their depiction of North American climate variability. It draws from individual work by investigators within the CMIP5 Task Force of the U.S. National Oceanic and Atmospheric Administration (NOAA) Modeling Analysis and Prediction Program (MAPP) and is part of a Journal of Climate special collection on North America in CMIP5. The individual papers within the special collection provide more detailed analysis than can be presented in this synthesis paper.
We begin in section 2 by describing the CMIP5, providing an overview of the models analyzed, the historical simulations, and the general methodology for evaluating the models. Details of the main observational datasets to which the climate models are compared are also given in this section. The next five sections focus on different aspects of North American climate variability, organized by the time scale of the climate feature. Section 3 covers intraseasonal variability with focus on variability in the eastern Pacific Ocean and summer drought over the southern United States and Central America. Atlantic and east Pacific tropical cyclone activity is evaluated in section 4. Interannual climate variability is assessed in section 5. Decadal variability and multidecadal trends are assessed in sections 6 and 7, respectively. Finally, the results are synthesized in section 8.
2. CMIP5 models and simulations
a. CMIP5 models
We use data from multiple model simulations of the “historical” scenario from the CMIP5 database. The CMIP5 experiments were carried out by 20 modeling groups representing more than 50 climate models with the aim of further understanding past and future climate change in key areas of uncertainty (Taylor et al. 2012). In particular, experiments have been focused on understanding model differences in clouds and carbon feedbacks, quantifying decadal climate predictability and why models give different answers when driven by the same forcings. The CMIP5 builds on the previous phase [phase 3 of CMIP (CMIP3)] experiments in several ways. First, a greater number of modeling centers and models have participated. Second, the models are more comprehensive in terms of the processes that they represent and are run at higher spatial resolution, therefore hopefully resulting in better skill in representing current climate conditions and reducing uncertainty in future projections. Table 1 provides an overview of the models used. The specific models used vary for each individual analysis because of data availability at the time of this study, and so the model names are provided within the results section where appropriate.
b. Overview of methods
We evaluate data from the historical CMIP5 scenario, which is a coupled atmosphere–ocean model simulation that is forced by historical estimates of changes in atmospheric composition from natural and anthropogenic sources, volcanoes, greenhouse gases, and aerosols, as well as changes in solar output and land cover. Historical scenario simulations were carried out for the period from the start of the industrial revolution to near present: 1850–2005. Our evaluations are generally carried out for the last 30 yr of the simulations, depending on the type of analysis and the availability of observations. For some analyses the only, or best available, data are from satellite remote sensing, which restricts the analysis to the satellite period, generally from 1979 onward. In other cases the observational data are very uncertain for particular regions and time periods (e.g., precipitation in high latitudes in the first half of the twentieth century) and this is noted in the relevant subsection. For other analyses, multiple observational datasets are available and are used to capture the uncertainty in the observations. The observational datasets are summarized in Table 2 and further details of the datasets and data processing are given in the relevant subsections and figure captions. Where the comparisons go beyond 2005 (e.g., 1979–2008), data from the model representative concentration pathway 8.5 (RCP8.5) future projection scenario simulation (as this is regarded as closest to the business as usual trajectory) are appended to the model historical time series. About half the models have multiple ensemble members, but we select the first ensemble member for simplicity and discuss the variability in the results across the ensemble where appropriate.
3. Tropical intraseasonal variability
a. MJO-related variability over the eastern Pacific and adjoining regions
It has been well documented that convection over the eastern Pacific (EP) ITCZ and neighboring areas is characterized by pronounced intraseasonal variability (ISV) during boreal summer (e.g., Knutson and Weickmann 1987; Kayano and Kousky 1999; Maloney and Hartmann 2000a; Maloney and Esbensen 2003, 2007; de Szoeke and Bretherton 2005; Jiang and Waliser 2008, 2009; Jiang et al. 2011). ISV over the EP exerts broad impacts on regional weather and climate phenomena, including tropical cyclone activity over the EP and the Gulf of Mexico, the summertime gap wind near the Gulfs of Tehuantepec and Papagayo, the Caribbean low-level jet and precipitation, the midsummer drought over Central America and Mexico, and the North American monsoon (e.g., Magaña et al. 1999; Maloney and Hartmann 2000b,a; Maloney and Esbensen 2003; Lorenz and Hartmann 2006; Serra et al. 2010; Martin and Schumacher 2011).
Here, model fidelity in representing ISV over the EP and intra-American sea (IAS) region is assessed by analyzing daily output of rainfall and 850-hPa winds from 18 CMIP5 models. Figure 1 displays a Taylor diagram for summer-mean (May–September) precipitation from the CMIP5 models over the EP domain (5°S–30°N, 150°–80°W) compared to the TMPA precipitation (see Table 2 for expanded dataset names). While the two HadGEM models (HadGEM2-CC and HadGEM2-ES; see Table 1 for expanded model names) display the highest pattern correlations (~0.93), MRI-CGCM3 shows the smallest RMS error because of its better skill in simulating the spatial standard deviations of summer-mean rainfall over the EP. In addition, four models (MPI-ESM-LR, CSIRO Mk3.6.0, CanESM2, and CNRM-CM5) also exhibit relatively better pattern correlations than other models.
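The three statistics summarized by a Taylor diagram can be sketched as follows. This is a minimal illustration only, assuming model and observed fields have already been regridded to common grid points and flattened to lists; area weighting is omitted, and the function name `taylor_stats` is hypothetical rather than taken from the analysis itself. It shows why a model with a lower pattern correlation can still have the smallest centered RMS difference if its spatial standard deviation is closer to the observed value.

```python
import math

def taylor_stats(model, obs):
    """Return (pattern correlation, std ratio, centered RMS difference).

    model, obs: equally sized lists of grid-point values.
    Assumption: unweighted statistics (no latitude/area weighting).
    """
    n = len(obs)
    m_mean = sum(model) / n
    o_mean = sum(obs) / n
    # Remove the domain mean so that only the spatial pattern is compared
    m_anom = [m - m_mean for m in model]
    o_anom = [o - o_mean for o in obs]
    m_std = math.sqrt(sum(a * a for a in m_anom) / n)
    o_std = math.sqrt(sum(a * a for a in o_anom) / n)
    corr = sum(a * b for a, b in zip(m_anom, o_anom)) / (n * m_std * o_std)
    crms = math.sqrt(sum((a - b) ** 2 for a, b in zip(m_anom, o_anom)) / n)
    return corr, m_std / o_std, crms
```

The three quantities are linked by the relation crms² = σ_m² + σ_o² − 2 σ_m σ_o R, which is what allows them to be displayed on a single diagram.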
The leading ISV modes over the EP based on observed and simulated rainfall fields are identified using a complex empirical orthogonal function (CEOF) approach (Maloney et al. 2008). CEOF analyses were applied to 30–90-day bandpass filtered daily rainfall anomalies and the spatial amplitude and phase for the first CEOF mode (CEOF1) based on TMPA are illustrated in Figs. 2a,b. A single ensemble member was used for each model for 1981–2005. The TMPA data are available for a shorter time period (13 yr), but the sensitivity of the results to different sample sizes (based on data from a selected model) was found to be small. Similar to Maloney et al. (2008), the maximum amplitude of the observed rainfall CEOF1 occurs over the far eastern part of the EP. Figure 2b illustrates the pattern of spatial phase of observed rainfall CEOF1. In agreement with previous studies, the observed leading ISV mode associated with the CEOF1 largely exhibits an eastward propagation, while a northward component is also evident (e.g., Jiang and Waliser 2008; Maloney et al. 2008; Jiang et al. 2011).
Next, the fidelity of the CMIP5 models in simulating the leading EP ISV mode is assessed by calculating pattern correlations of the simulated rainfall CEOF1 against observations. To increase sampling, spatial patterns of rainfall anomalies associated with the CEOF1 based on both observations and model simulations are derived at two quadrature phases by multiplying the CEOF1 amplitude by the cosine and sine of spatial phase at each grid point, respectively. The pattern correlations are then calculated at both of these two quadrature phases. A final pattern correlation for a particular model is derived by averaging these two pattern correlation coefficients. Figure 2c illustrates pattern correlations in depicting the CEOF1 rainfall pattern for each model simulation versus domain-averaged CEOF1 amplitude relative to observations, which provide measures of model performance of variability in space and time, respectively. A majority of the CMIP5 models tend to underestimate the amplitude of the leading EP ISV mode associated with the rainfall CEOF1, except CNRM-CM5, MIROC5, MPI-ESM-LR, HadGEM2-CC, and HadGEM2-ES. Among the 18 models examined, 8 models exhibit relatively higher pattern correlations (>0.75).
The models with relatively better skill in representing the leading EP ISV mode also largely exhibit better skill for summer-mean rainfall (cf. Figs. 1, 2c) and 850-hPa wind patterns (not shown). A common feature among the more skillful models is the presence of westerly or very weak easterly mean low-level winds over the EP warm pool region, as in the observations. Most of the models with relatively lower skill exhibit a stronger easterly summer-mean flow (>4 m s−1). This suggests that realistic representation of the mean state could be crucial for improved simulations of the EP ISV, which is in agreement with a recent study by Rydbeck et al. (2013), and has also been discussed for Madden–Julian oscillation (MJO) simulations over the western Pacific and Indian Ocean (e.g., Kim et al. 2009). One hypothesis is that a realistic mean state produces the correct sign of surface flux anomalies relative to intraseasonal precipitation, which helps to destabilize the local intraseasonal disturbance (e.g., Maloney and Esbensen 2005). Extended analyses of the EP ISV in CMIP5 models are given in Jiang et al. (2012).
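The two-phase skill score described above can be sketched as follows. This is an illustrative outline, not the code used for Fig. 2c: it assumes the CEOF1 amplitude and spatial phase (in degrees) are available on a common grid as flat lists, that model and observed phases have already been referenced to the same point (complex EOFs are only defined up to an arbitrary phase rotation), and that no area weighting is applied. All function names are hypothetical.

```python
import math

def pattern_corr(a, b):
    """Centered (anomaly) pattern correlation between two flat fields."""
    n = len(a)
    a_mean, b_mean = sum(a) / n, sum(b) / n
    da = [x - a_mean for x in a]
    db = [x - b_mean for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den

def quadrature_patterns(amplitude, phase_deg):
    """Rainfall patterns at two phases a quarter cycle apart."""
    cos_pat = [amp * math.cos(math.radians(p)) for amp, p in zip(amplitude, phase_deg)]
    sin_pat = [amp * math.sin(math.radians(p)) for amp, p in zip(amplitude, phase_deg)]
    return cos_pat, sin_pat

def ceof_skill(obs_amp, obs_phase, mod_amp, mod_phase):
    """Average of the pattern correlations at the two quadrature phases."""
    obs_cos, obs_sin = quadrature_patterns(obs_amp, obs_phase)
    mod_cos, mod_sin = quadrature_patterns(mod_amp, mod_phase)
    return 0.5 * (pattern_corr(obs_cos, mod_cos) + pattern_corr(obs_sin, mod_sin))

def relative_amplitude(obs_amp, mod_amp):
    """Domain-averaged model amplitude relative to observations."""
    return (sum(mod_amp) / len(mod_amp)) / (sum(obs_amp) / len(obs_amp))
```

A model whose CEOF1 has the observed structure but half the observed amplitude would score a pattern correlation of 1 and a relative amplitude of 0.5, which is how the two axes of such a scatterplot separate spatial fidelity from variance.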
b. Midsummer drought over Central America
The rainy season in Central America and southern Mexico spans roughly May through October. For most of the region, the precipitation climatology features maxima in June and September and a period of reduced rainfall during July–August known as the midsummer drought (MSD; Portig 1961; Magaña et al. 1999). The MSD is regular enough to be known colloquially and plays an important role in farming practices (Osgood et al. 2009). A previous assessment of CMIP3 model performance at simulating the MSD and future projections (Rauscher et al. 2008) suggested that many models are capable of simulating the MSD despite an overall dry bias and that the MSD is projected to become stronger with an earlier onset. In this section, the CMIP5 performance at simulating summertime precipitation and the MSD is evaluated. We evaluate 23 CMIP5 models against the TMPA, GPCP, and UNAM observational datasets. A simple algorithm for detecting and quantifying the climatological MSD is used that does not assume a priori which months are maxima and which months constitute the MSD (Karnauskas et al. 2012).
Figure 3 shows the observational and CMIP5 estimates of the MSD and highlights the large uncertainties in its spatial distribution among observational datasets. The CMIP5 multimodel ensemble (MME) does reasonably well at representing the essence of the MSD over much of the inter-Americas region. The maximum strength of the MSD in the MME is found just offshore of El Salvador and represents a midsummer precipitation minimum that is ~2.5 mm day−1 less than the early and late summer peaks. Significant differences in the location and strength of the MSD between the various observational datasets preclude a definitive evaluation of the CMIP5 MME, but it is clear that the strength of the MSD is underestimated in some regions, including along the Pacific coast of Central America, the western Caribbean, the major Caribbean islands, and Florida. Figure 3 also shows the MME standard deviation and a histogram of the spatial correlations of individual models with the MME mean. The largest uncertainties are collocated with the regions of largest magnitude of the MSD indicating that much of the model disagreement is in the magnitude. Several models stand out as outliers in representing the spatial distribution of the MSD relative to the MME mean (Table 3), such as MIROC-ESM and MIROC-ESM-CHEM, while the Hadley Centre models do particularly well.
4. East Pacific and Atlantic tropical storm track and cyclone activity
a. Tropical storm track
The density of traveling synoptic-scale disturbances across the tropics, referred to in the literature as the tropical storm track (e.g., Thorncroft and Hodges 2001; Serra et al. 2008, 2010), is examined in this section. These systems serve as precursors to a majority of tropical storms and hurricanes in the Atlantic and eastern North Pacific and their frequency at 850 hPa over Africa and the eastern Atlantic has been shown to be positively correlated with Atlantic hurricane activity (Thorncroft and Hodges 2001). As global models better resolve these systems than tropical cyclones, they provide an advantage over direct tracking of tropical cyclones to assess model tropical storm activity (see section 4b). As in Serra et al. (2010), the tropical storm track density is calculated based on the method of Hodges (1995, 1999) using smoothed, 6-hourly, 850-hPa relative vorticity. Only positive vorticity centers with a minimum threshold of 0.5 × 10−6 s−1 that persist for at least 2 days and have tracks of at least 1000 km in length are included in the analysis. This method primarily identifies westward moving disturbances such as easterly waves (e.g., Serra et al. 2010), although more intense storms that could potentially reach hurricane intensity are not excluded. We analyze a single ensemble member from nine CMIP5 models and compare the track statistics to the ERA-Interim (Fig. 4, left). These models were selected based on whether the 6-hourly pressure level data were available at the time of the analysis. Mean track strength, the mean of the smoothed 850-hPa vorticity along the track, is also examined (Fig. 4, right).
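The track selection criteria stated above can be illustrated with a short filter. This sketch covers only the final screening step, not the Hodges feature-tracking method itself. It assumes each track is a time-ordered list of hypothetical `(hour, lat, lon, vorticity)` tuples at 6-hourly spacing, that the vorticity threshold must be met at every point along the track, and that track length is the sum of great-circle segment lengths; these interpretations are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def keep_track(track, min_vort=0.5e-6, min_hours=48.0, min_km=1000.0):
    """Apply the vorticity, lifetime, and length criteria to one track.

    track: time-ordered list of (hour, lat, lon, vorticity in s^-1).
    """
    if len(track) < 2:
        return False
    # Positive vorticity centers at or above the threshold only
    if any(vort < min_vort for _, _, _, vort in track):
        return False
    duration = track[-1][0] - track[0][0]
    length = sum(
        great_circle_km(a[1], a[2], b[1], b[2])
        for a, b in zip(track, track[1:])
    )
    return duration >= min_hours and length >= min_km

def mean_track_strength(track):
    """Mean of the vorticity along the track."""
    return sum(point[3] for point in track) / len(track)
```

Under these assumptions, a disturbance moving westward along 10°N at 2° of longitude per 6 h for 2 days covers roughly 1750 km and is retained, whereas a stationary center of the same lifetime is not.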
The multimodel mean track density is in good agreement with ERA-Interim; however, significant differences are seen with the individual models. The most apparent discrepancies are with the BCC-CSM1.1, CanESM2, and CCSM4 models, which strongly overestimate activity across the eastern Pacific and suggest a more longitudinally oriented track (CanESM2 and CCSM4) shifted south from what is observed. BCC-CSM1.1, HadGEM2-ES, and MIROC5 underestimate tracks in the west Atlantic, while GFDL-ESM2M underestimates tracks throughout the region except near 130°W. MPI-ESM-LR also underestimates tracks across the region as well as shifts their location southward. The track density maximum off the west coast of Mexico is best captured by HadGEM2-ES, while the overall smallest magnitude differences are seen with CNRM-CM5. The multimodel mean track strength maximum in the eastern Pacific lies along the west coast of Mexico similar to ERA-Interim; however, it is broader in scale and of larger magnitude than the observations (Fig. 4, right). On the other hand, the multimodel mean strength in the Gulf of Mexico and western Atlantic along the East Coast of the United States is strongly underestimated compared to ERA-Interim. Unlike for track density, these biases are fairly consistent among the models, with the exception of BCC-CSM1.1, which strongly overestimates mean strength across the region.
To better understand the biases in mean track density and strength, we examine the spatial correlations of 850- and 500-hPa winds and heights, as well as track density and strength with ERA-Interim. While all nine models have relatively good spatial correlations in the wind components and heights at 500 hPa (not shown), there is a wide spread in performance at the 850-hPa level that corresponds reasonably well with the rankings for the combined track density and strength correlations (Table 4). In particular, the top two models for the combined 850-hPa wind and height correlations (CNRM-CM5 and HadGEM2-ES) are also among the highest ranked for the combined track density and strength correlations. On the other hand, CanESM2 has a high ranking in the combined 850-hPa index but is one of the poorer models with respect to track density and spatial correlations, suggesting that there are other important factors contributing to the track statistics than just the large-scale low-level heights and winds across the region.
b. Tropical cyclones in the North Atlantic and eastern North Pacific
It has been well known since the 1970s that climate models are able to simulate tropical cyclone-like storms (e.g., Manabe et al. 1970; Bengtsson et al. 1982), which are generally formed at the scale of the model grid when conditions are unstable enough and other factors, such as vertical wind shear, are favorable. As the resolution of the climate models increases, the modeled storm characteristics become more realistic (e.g., Zhao et al. 2009). Analysis of CMIP3 models showed that the tropical cyclone-like storms produced still had many biases common to low-resolution models (Walsh et al. 2010). Therefore, various dynamical and statistical techniques for downscaling tropical cyclone activity using only the CMIP3 large-scale variables have been employed (Emanuel et al. 2008; Knutson et al. 2008). Recent studies suggest that when forced by observed SSTs and sea ice concentration, a global atmospheric model with a resolution ranging from 50 to 20 km can simulate many aspects of tropical cyclone (TC)–hurricane frequency variability for the past few decades during which reliable observations are available (e.g., Oouchi et al. 2006; Bengtsson et al. 2007; Zhao et al. 2009). The success is not only a direct evaluation of model capability but also an indication of the dominant role of SST variability on TC–hurricane frequency variability. When assuming a persistence of SST anomalies, some of the models were also shown to exhibit significant skill in hurricane seasonal forecast (e.g., Zhao et al. 2010; Vecchi et al. 2011).
Tropical storms and cyclones in this study are identified using the tracking method of Camargo and Zebiak (2002), which uses low-level vorticity, surface winds, surface pressure, and atmospheric temperature and considers only warm core storms. The method uses model- and resolution-dependent thresholds and storms have to last at least 2 days. Only a subset of the tropical disturbances examined in the previous section will intensify enough to be identified by this tracking method, and the percentage that do so will vary among different models. As will be shown, the CMIP5 standard models have trouble simulating the number of tropical cyclones, which can be attributed in part to their coarse resolution. Therefore, we also show results from the GFDL high-resolution model.
TC-type structures were tracked in five models for 1950–2005. We compare with observations from best-track datasets of the National Hurricane Center (Fig. 5). The number of TCs in all models is much lower than in observations, which is common to many low-resolution global climate models (e.g., Camargo et al. 2005, 2007). The HadGEM2-ES has the largest low bias, and the MPI-ESM-LR model has the most realistic tracks in the Atlantic basin. The MRI-CGCM3 model tracks in the Atlantic are mostly in the subtropical region, with very few storms in the deep tropics. In contrast, in the eastern North Pacific the MRI-CGCM3 has storm activity too near the equator. In the eastern North Pacific, very few storms (in all models) have westward tracks. The models seem to have an easier time in producing storms that are in the northwestward direction parallel to the Central American coast.
Figure 6 shows the mean number of TCs per month for the North Atlantic and eastern North Pacific. In some cases, the models produce too many storms in the offseason, while all models produce too few storms in the peak season. The bottom panels show the spread of the number of storms per year, emphasizing the low number of storms per year in all models. The highest-resolution model MRI-CGCM3 (1.1° × 1.1°) has the least bias relative to the observations and the highest bias is for the coarsest-resolution model (GFDL-ESM2M; 2.5° × 2.0°). However, resolution cannot explain the rankings for all models, with the HadGEM2-ES and MPI-ESM-LR models having relatively large and small biases, respectively, despite both having intermediate resolutions. The model dynamical core, convection scheme and their interactions are other factors that have been shown to be important (Camargo 2013). Examination of variability across ensemble members in producing tropical cyclones was carried out for five member runs of the MRI-CGCM3 model (not shown) but was much less than among different models.
Figure 7 shows results for the GFDL-C180-HIRAM model, which has a higher resolution (~50 km) than the standard coupled GFDL CM3 model and differs in some aspects of the physics such as the convection scheme. The model was run for a CMIP5 timeslice experiment forced by observed interannually and seasonally varying SSTs and sea ice concentration from HadISST (I. M. Held et al. 2013, unpublished manuscript). The tracking algorithm of Zhao et al. (2009) was used to identify TCs with near-surface wind speed reaching hurricane intensity. The model reproduces the observed statistics with the ratio of observed to model variances of interannual variability in both the North Atlantic and eastern Pacific not statistically different from one, according to an F test at the 5% significance level that assumes that the annual frequencies are normally distributed. Figure 7 also shows that the model captures the observed seasonal cycle in both the North Atlantic and eastern Pacific. The model can also reproduce the observed year-to-year variation of annual hurricane counts and the decadal trend for both basins for this period (Zhao et al. 2009; I. M. Held et al. 2013, unpublished manuscript). The quality of the model's present-day simulation increases our confidence in the future projections, although the uncertainty in the projections is dominated by uncertainty in projected changes in SST boundary conditions across the CMIP5 standard-resolution models (Part III). Although not analyzed here, MIROC4h has a similar spatial resolution (0.56°) to C180-HIRAM. Evaluations by Sakamoto et al. (2012) show that MIROC4h can reproduce the global number of TCs, in part because of realistic SSTs, but severely underestimates the frequency in the North Atlantic, suggesting that higher model resolution is necessary but not sufficient to reproduce observed frequencies.
5. Interannual to decadal variability
The El Niño–Southern Oscillation (ENSO) is the most important driver of global climate variability on interannual time scales. It impacts many regions worldwide through climate teleconnections (Ropelewski and Halpert 1987), which link the tropical Pacific to higher latitudes through shifts in midlatitude weather patterns. The impact of ENSO on North American climate is felt most strongly in the wintertime, with El Niño events bringing warmer temperatures to much of the northern part of the continent and wetter conditions in the southern United States and northern Mexico. La Niña events tend to bring drier weather to the southern United States. Evaluation of the ability of CMIP5 models to simulate ENSO is carried out for several aspects of ENSO variability and for teleconnections with North American climate.
1) Evaluation of ENSO teleconnections
We examine how well the historical simulations of CMIP5 models reproduce the composite near-surface air temperature (SAT) and precipitation patterns over North America during El Niño and La Niña episodes. In both model and observed data, we define ENSO episodes similarly to the Climate Prediction Center (CPC). A monthly ENSO index is calculated from detrended and high-pass filtered SSTs over the Niño-3.4 region (5°S–5°N, 170°–120°W) from ERSST.v3b observations and CMIP5 models. An El Niño (La Niña) episode is defined as any sequence of months where the 3-month running mean Niño-3.4 SST is >0.5°C (<−0.5°C) for at least 5 consecutive 3-month running seasons.
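The episode definition above can be written as a short routine. This is a sketch under stated assumptions: the input is a monthly Niño-3.4 SST anomaly series that has already been detrended and high-pass filtered, the 3-month running mean is centered, and the function names are hypothetical.

```python
def running_mean_3(values):
    """Centered 3-month running mean; the result has len(values) - 2 entries."""
    return [
        (values[i - 1] + values[i] + values[i + 1]) / 3.0
        for i in range(1, len(values) - 1)
    ]

def find_episodes(nino34_anom, threshold=0.5, min_seasons=5):
    """Return (start, end, kind) tuples for El Nino and La Nina episodes.

    start and end (inclusive) index the running-mean series. An episode
    needs at least `min_seasons` consecutive overlapping 3-month seasons
    beyond +threshold (El Nino) or -threshold (La Nina), in deg C.
    """
    seasons = running_mean_3(nino34_anom)
    episodes = []
    start, kind = None, None
    # A trailing neutral value closes any run that reaches the end of the series
    for i, value in enumerate(seasons + [0.0]):
        if value > threshold:
            current = "El Nino"
        elif value < -threshold:
            current = "La Nina"
        else:
            current = None
        if current != kind:
            if kind is not None and i - start >= min_seasons:
                episodes.append((start, i - 1, kind))
            start, kind = i, current
    return episodes
```

Applying the same routine to the observed and simulated indices keeps the episode counts, peak amplitudes, and peak seasons directly comparable.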
In observations, approximately 90% of El Niño and 89% of La Niña episodes feature peak amplitudes in fall or winter. In the CMIP5 ensemble of the historical simulations, however, only 68% of El Niño and 65% of La Niña episodes have peak amplitudes in fall or winter, although several of the models (CanESM2, CNRM-CM5, HadCM3, and NorESM1-M) do have fall–winter peak frequencies exceeding 80% for both El Niño and La Niña episodes. This finding suggests that CMIP5 models do not fully reproduce the phase locking of ENSO to the seasonal cycle, a deficiency noted in CMIP3 models as well (Guilyardi et al. 2009). The following analysis focuses on those episodes that do peak in fall or winter. In the ensemble mean, the frequency of ENSO episodes and the mean peak amplitude are similar to observed values (not shown).
Because the dynamics of extratropical ENSO teleconnections are tied to upper-tropospheric processes and because these teleconnections are strongest during boreal winter, we examine how well CMIP5 models reproduce the December–February (DJF) composite 300-hPa geopotential height patterns in the NCEP–NCAR reanalysis. In addition, we attempt to identify what characteristics distinguish higher from lower performance models, where performance is based on the El Niño (La Niña) composites of all height fields for which the detrended Niño-3.4 SST anomaly is >0.5°C (<−0.5°C). The high performance models are defined as those with a pattern correlation that exceeds 0.6 and an RMS difference less than 13 m between the model and observed composites for both El Niño and La Niña (Fig. 8). This subjective partitioning is used as a means of discerning general properties that distinguish higher from lower performance models. Overall, 10 (11) models are characterized as high (low) performance based on these criteria.
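The partitioning into higher and lower performance models amounts to a simple joint threshold test, sketched below. The dictionary layout, function name, and model labels are hypothetical; the thresholds (pattern correlation above 0.6 and RMS difference below 13 m for both ENSO phases) are those stated in the text.

```python
def classify_models(scores, min_corr=0.6, max_rms_m=13.0):
    """Split models into (high, low) performance lists.

    scores: {model: {"el_nino": (corr, rms_m), "la_nina": (corr, rms_m)}}
    A model is 'high' only if both composites pass both thresholds.
    """
    high, low = [], []
    for name, result in sorted(scores.items()):
        passes = all(
            result[phase][0] > min_corr and result[phase][1] < max_rms_m
            for phase in ("el_nino", "la_nina")
        )
        (high if passes else low).append(name)
    return high, low
```

Because the test is applied jointly, a model with an excellent El Niño composite but a poor La Niña composite falls into the lower performance group.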
Figure 9 shows the composites of 300-hPa geopotential height, SAT, precipitation, and tropical SST for El Niño. The corresponding composites for La Niña (not shown) are quite similar but of opposite sign. The higher performance ensemble performs rather well in capturing the basic El Niño geopotential height, SAT, and precipitation teleconnections over the North Pacific and North America, with the exception being the failure to capture the negative precipitation anomaly in the Tennessee and Ohio valleys. The lower performance ensemble features a much weaker teleconnection pattern and an Aleutian low anomaly that is shifted about 10° too far west. The composite El Niño SST anomalies (Figs. 9k,l), however, are quite similar.
To gain insight into possible reasons for the discrepancies between the higher and lower performance ensembles, Fig. 10a shows composite differences in tropical precipitation. The higher performance ensemble exhibits much higher precipitation anomalies in the central and eastern equatorial Pacific Ocean, which suggests that the enhanced convection in these regions could help to explain the stronger and eastward shifted teleconnection pattern relative to the lower performance ensemble. This enhanced convection may be explained in part by stronger SST anomalies in the higher performance ensemble (Fig. 10b), but most of the large precipitation differences actually occur where the SST anomaly differences are quite small. Instead, a more significant difference appears to be the difference in SST climatology, as the lower performance ensemble exhibits climatological SSTs more than 1°C cooler than the higher performance ensemble over the eastern Pacific cold tongue region (Fig. 10c). Indeed, the lower performance ensemble features a negative SST climatology bias of more than 1.5°C in the equatorial central Pacific (Fig. 10e), where the El Niño convection anomalies generally are strongest. The bias for the higher performance ensemble in this region (Fig. 10d) is much weaker. Thus, in the lower performance ensemble, the convection anomalies in the eastern Pacific likely are too insensitive to ENSO SST anomalies because the climatological SSTs are too low. This finding suggests that simulation of ENSO teleconnections in some climate models might benefit from improving climatological SSTs rather than interannually varying ENSO SST anomalies. As discussed in Li and Xie (2012), tropical SST biases in CMIP models are linked to model errors in cloud cover and ocean dynamics, with equatorial cold tongue biases closely tied to errors in thermocline depth and upwelling.
2) East Pacific–central Pacific ENSO and teleconnections with U.S. winter surface air temperature
It has been increasingly recognized that different types of ENSO occur in the tropical Pacific (e.g., Wang and Weisberg 2000; Trenberth and Stepaniak 2001; Larkin and Harrison 2005; Yu and Kao 2007; Ashok et al. 2007; Kao and Yu 2009; Kug et al. 2009). Two particular types that have been emphasized are the EP type that produces SST anomalies near the South America coast and the central Pacific (CP) type that produces anomalies near the international date line. While the EP ENSO is the conventional type of ENSO, the CP ENSO has gradually increased its occurrence during the past few decades (e.g., Lee and McPhaden 2010). Recent observational studies have indicated that the impacts produced by these two types of ENSO on North American climate can be different (e.g., Mo 2010; Yu et al. 2012; Yu and Zou 2013). Here the ENSO teleconnection over the United States simulated in the CMIP5 models is further examined according to the ENSO type. Following Kao and Yu (2009) and Yu and Kim (2010), a regression-EOF analysis is used to identify the CP and EP types from monthly SSTs. The SST anomalies regressed with the Niño-1+2 SST index were removed before the EOF analysis was applied to obtain the spatial pattern of the CP ENSO. Similarly, we subtracted the SST anomalies regressed with the Niño-4 SST index before the EOF analysis was applied to identify the leading structure of the EP ENSO. The principal components of the leading EOF modes represent the ENSO strengths and are defined as the CP ENSO index and the EP ENSO index. The observed winter (DJF) SAT anomalies regressed to these two indices are different over the United States (Fig. 11) with a warm northeast to cold southwest pattern for the EP El Niño and a warm northwest to cold southeast pattern for the CP El Niño. Adding these two impact patterns together results in a pattern that resembles the well-known warm north–cold south pattern of El Niño impact.
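The regression step that precedes the EOF analysis can be sketched as follows. This illustrates only the removal of the part of the SST anomaly field that is linearly related to an index (Niño-1+2 before extracting the CP pattern, Niño-4 before extracting the EP pattern); the EOF decomposition of the residual is not shown. The data layout (a list of time steps, each a list of grid-point anomalies) and the function name are assumptions made for the example.

```python
def regression_residual(field, index):
    """Remove the component of `field` linearly regressed on `index`.

    field: list of time steps, each a list of grid-point SST anomalies.
    index: list of index values, one per time step.
    Returns a new field of the same shape; at every grid point the
    residual has zero covariance with the index.
    """
    n_times = len(index)
    n_points = len(field[0])
    index_mean = sum(index) / n_times
    index_anom = [value - index_mean for value in index]
    index_var = sum(a * a for a in index_anom)

    residual = [row[:] for row in field]
    for j in range(n_points):
        point_mean = sum(row[j] for row in field) / n_times
        # Least-squares slope of the grid-point series on the index
        slope = sum(
            index_anom[t] * (field[t][j] - point_mean) for t in range(n_times)
        ) / index_var
        for t in range(n_times):
            residual[t][j] = field[t][j] - slope * index_anom[t]
    return residual
```

The leading EOF of `regression_residual(sst_anom, nino12)` would then correspond to the CP pattern, and that of `regression_residual(sst_anom, nino4)` to the EP pattern, where `sst_anom`, `nino12`, and `nino4` are placeholder names for the anomaly field and the two index series.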
The robustness of these two different impact patterns has been examined in Yu et al. (2012) using numerical model experiments and case studies. They showed that impact patterns similar to those shown in Fig. 11 can be reproduced in two ensemble AGCM experiments forced separately by the EP and CP ENSO SST anomalies (see their Fig. 1). The regressed impact patterns can also be identified in U.S. winter temperature anomalies during the four strongest EP El Niño events (i.e., 1997/98, 1982/83, 1972/73, and 1986/87) and three of the four strongest CP El Niño events (i.e., 2009/10, 1957/58, and 2002/03).
We repeated the EOF and regression analyses to evaluate how well the CMIP5 models reproduce the different U.S. impacts of the two types of ENSO, while recognizing the uncertainty in the observational impacts due to the limited number of events in the observational record. The regressed winter SAT anomaly patterns calculated from 22 CMIP5 models are shown in Fig. 11. The observed patterns are well simulated by some models, such as MIROC5 and MRI-CGCM3 for the EP ENSO and NorESM1-M and HadGEM2-ES for the CP ENSO. However, some models show an impact pattern that is almost opposite to that observed, such as HadCM3 for the CP ENSO and INM-CM4.0 for the EP ENSO. To quantify how well the impact patterns are simulated, pattern correlation coefficients were calculated between the model regressed patterns and the NCEP regressed patterns. As shown in Fig. 12a, there is a cluster of 11 CMIP5 models (CSIRO Mk3.6.0, GFDL CM3, GFDL-ESM2G, GFDL-ESM2M, HadGEM2-CC, HadGEM2-ES, IPSL-CM5A-MR, MIROC5, MPI-ESM-LR, MPI-ESM-P, and NorESM1-M) that have higher pattern correlation coefficients for both the EP ENSO and the CP ENSO than the rest of the models. We consider this group to be the models whose regressed U.S. winter temperature patterns are close to the observed patterns for the two types of ENSO. We also examine in Fig. 12b the intensities of the simulated EP and CP ENSO events, which are determined using an EOF-regression method (Yu and Kim 2010; Kim and Yu 2012). Models with realistically strong events are identified using the lower limit of the 95% confidence interval of the observed intensities (using an F test) as the criteria (0.78°C for EP and 0.51°C for CP). Based on these criteria, 10 of the 22 models simulate both EP and CP ENSO events with realistically strong intensities. Interestingly, 9 of these models are also among the 11 models that realistically produce U.S. winter temperature patterns for the two types of ENSO.
Therefore, at least 9 out of 22 models can more realistically produce the two types of ENSO with higher intensities and their different impacts on U.S. winter temperatures: GFDL CM3, GFDL-ESM2G, GFDL-ESM2M, HadGEM2-CC, HadGEM2-ES, MIROC5, MPI-ESM-LR, MPI-ESM-P, and NorESM1-M.
3) ENSO warm–cold events asymmetry
ENSO asymmetry refers to the fact that the two phases of ENSO are not mirror images of each other (Burgers and Stephenson 1999). The asymmetry shows up in both the surface and subsurface fields (Rodgers et al. 2004; Schopf and Burgman 2006; Sun and Zhang 2006; Zhang et al. 2009). Causes for such an asymmetry are not yet clearly understood, but accumulating evidence suggests that it is likely a consequence of nonlinearity of ocean dynamics (Jin et al. 2003; Sun 2010; Liang et al. 2012). Asymmetry is also linked to the time-mean effect of ENSO (Sun and Zhang 2006; Schopf and Burgman 2006; Sun 2010; Liang et al. 2012). Understanding the causes and consequences of ENSO asymmetry may hold the key to understanding decadal variability in the tropics and beyond (Rodgers et al. 2004; Sun and Yu 2009; Liang et al. 2012). Figure 13 shows the sum of the SST anomalies of the warm and cold phases of ENSO from HadISST observations and CMIP5 models. The threshold values used for defining the warm and cold phase anomalies are +0.5° and −0.5°C, respectively. This sum has also been called the SST anomaly residual and has been a common measure of the ENSO asymmetry in the SST field. All models underestimate the observed positive SST residual (and therefore the asymmetry) over the eastern Pacific. Measured by the skewness of Niño-3 SST anomalies (which is a more customary measure of asymmetry), all the models also underestimate the observed ENSO asymmetry (Fig. 14). The figure also shows that stronger ENSO variability (measured by variance) does not guarantee stronger ENSO asymmetry (measured by skewness).
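The two asymmetry measures used here can be illustrated with a minimal sketch. It assumes that the warm and cold phases are selected by applying the ±0.5°C thresholds to a Niño-3-like index and that the residual is the sum of the two composite means; the text does not specify these details, so both are assumptions, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def sst_residual(sst_anom, index, threshold=0.5):
    """Sum of warm-phase and cold-phase composite SST anomalies.

    sst_anom: (time, space) anomalies; index: (time,) ENSO index in degrees C.
    """
    warm = sst_anom[index > threshold].mean(axis=0)
    cold = sst_anom[index < -threshold].mean(axis=0)
    return warm + cold

def skewness(x):
    """Sample skewness: third central moment over variance to the power 3/2."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

# Toy index with stronger warm events than cold events (not real data).
nino3 = np.array([2.0, 1.0, -0.6, -0.8, 0.1, -0.7])
# Column 0 follows the asymmetric index; column 1 responds symmetrically.
sst_anom = np.column_stack([nino3, [1.0, 1.0, -1.0, -1.0, 0.0, -1.0]])
residual = sst_residual(sst_anom, nino3)
```

A positive residual or positive skewness indicates that warm events are stronger than cold events; a perfectly symmetric response gives zero for both measures.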
Lack of ENSO asymmetry remains a common bias in climate models that has continued since CMIP3 (van Oldenborgh et al. 2005) with implications for simulating tropical decadal variability. The causes are currently debated, but recent results indicate that it is related to the mean state and the excessive cold tongue in the models (D.-Z. Sun 2013, unpublished manuscript), which was also noted in CMIP3 models (Y. Sun et al. 2013), although there is evidence that the mean state could in turn be determined by the statistics of ENSO via nonlinearities in the system (Sun and Zhang 2006; Sun 2010; Liang et al. 2012; D.-Z. Sun et al. 2013, manuscript submitted to J. Climate; Ogata et al. 2013). On the other hand, both the bias in the mean state and the bias in the asymmetry may be a consequence of a more fundamental reason: a weak thermal forcing relative to the dissipation (Sun 2000; Liang et al. 2012). Together, these results raise the question of whether the coupled tropical system in the models is in a different dynamical regime from reality (Sun and Bryan 2010).
b. Persistent droughts and wet spells over the Great Plains and the southern-tier states
Persistent dry and wet summers are features of the U.S. Great Plains and southern United States. We evaluate how the CMIP5 models describe the processes that cause such persistent anomalies in terms of low-level circulation and moisture flux anomalies by comparing with the NCEP–NCAR reanalysis. This complements the evaluations of the average seasonal circulation in the region, such as the low-level southerly jet as shown in Part I. Persistent wet and dry summers are defined by June–August (JJA) precipitation anomalies averaged over the Great Plains region from 90° to 105°W and from 30° to 50°N during 1971–2000. Wet (dry) summers are identified as having normalized JJA precipitation larger (smaller) than 0.6 (−0.6) standard deviation. The reanalysis data identify 8 wet and 7 dry summers in 1971–2000, and the models identify between 7 and 12 wet or dry events, depending on the model. We show the composites of vertically integrated moisture from the surface to top of the troposphere, the 850-hPa geopotential height, and near-surface winds at 925 hPa for the wet and dry summers and their differences for the reanalysis (Fig. 15) and for a single model, CCSM4, as an example (Fig. 16).
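The selection of wet and dry summers and the compositing step can be sketched as follows. This is a minimal illustration under stated assumptions: the precipitation series is taken to be already averaged over the Great Plains box, the sample standard deviation is used for normalization (the text does not say which estimator was used), and the data, function names, and variable names are all hypothetical.

```python
import numpy as np

def classify_summers(jja_precip, threshold=0.6):
    """Boolean masks of wet and dry summers from normalized JJA precipitation."""
    z = (jja_precip - jja_precip.mean()) / jja_precip.std(ddof=1)
    return z > threshold, z < -threshold

def composite_difference(field, wet, dry):
    """Wet-minus-dry composite of a (year, space) field."""
    return field[wet].mean(axis=0) - field[dry].mean(axis=0)

# Synthetic 1971-2000 series (not real data).
years = np.arange(1971, 2001)
precip = 200.0 + 50.0 * np.sin(np.arange(years.size))
# Three toy "grid points": in phase, out of phase, and unrelated to precipitation.
field = np.column_stack([precip, -precip, np.ones(years.size)])

wet, dry = classify_summers(precip)
diff = composite_difference(field, wet, dry)
```

The same masks would be applied to each circulation field (moisture flux, geopotential height, winds) to build the wet, dry, and difference composites.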
Comparison of the two figures indicates some similarities but also very different processes causing the persistent wet or dry summers. The integrated moisture fluxes in both datasets indicate high moisture in an averaged cyclonic rotation in the troposphere in persistent wet summers (Figs. 15a, 16a) but anticyclonic rotation in dry summers (Figs. 15b, 16b) in the Great Plains. However, the sources of the moisture and the low-level dynamic structure are quite different. For the reanalysis, the convergence of moisture in the central Great Plains during wet summers results from southerly flow anomalies in the enhanced subtropical high pressure system in the North Atlantic and northerly flow anomalies in low pressure anomalies centered in the Midwest (Fig. 15d). These anomalies suggest a frontal system along the depression from the Midwest to the Southwest. A nearly reversed pattern of flow anomalies is shown during the dry summers (Figs. 15e,f). The model simulations show a different pattern of flow anomalies (Figs. 16d,e). In wet summers, the moisture is primarily from the east along the easterly and southeasterly quadrants of a high pressure anomaly center in the Great Lakes areas, instead of from the south as in the reanalysis result (Fig. 16a versus Fig. 15a). In dry summers, the model shows dry flows from the Mexican plateau off the Sierra Madre Oriental in Mexico. These contrasts are shown in Fig. 16f. The other CMIP5 models also simulate different tropospheric circulation patterns from those in the reanalysis for both wet and dry summers in the Great Plains.
Although the integrated moisture fluxes in the models resemble those in the reanalysis estimates in wet and dry summers, the sources of moisture differ considerably, suggesting that the models are not correctly representing the mechanisms that force variability in the Great Plains. Controls on summertime Great Plains precipitation have been found to depend strongly on moisture transport from the Gulf of Mexico via the Great Plains low-level jet (GPLLJ; e.g., Ruiz-Barradas and Nigam 2006; Cook et al. 2008; Weaver and Nigam 2008) whose variability in turn may be related to remote SST forcing in the Pacific (e.g., Schubert et al. 2004; Ruiz-Barradas and Nigam 2010; McCabe et al. 2008) and Atlantic (e.g., Enfield et al. 2001; Sutton and Hodson 2005; McCabe et al. 2008) with contrasting anomalies in each basin associated with extreme conditions in the Great Plains (e.g., Hoerling and Kumar 2003; Schubert et al. 2009). Some of the models have shown improvement, compared to the CMIP3 models, in simulating the GPLLJ and the seasonal transitions (see Part I), a result largely attributable to the higher spatial resolution of CMIP5 models, but most models struggle to represent observed teleconnections between precipitation and Atlantic SSTs (see section 6). Even so, moisture transport is not the whole story and local dynamic processes (e.g., Veres and Hu 2013), as well as land–atmosphere feedbacks (Ruiz-Barradas and Nigam 2006), are important to initiate and further organize regional circulations that can transform the moisture into precipitation. Notably, previous studies focused on climate models find that they tend to overestimate the role of recycled precipitation over advected moisture (e.g., Ruiz-Barradas and Nigam 2006) for the Great Plains with implications for the modeled precipitation variability.
6. Decadal variability
a. PDO and its influence on North American climate
On interdecadal time scales, variability in the tropical and extratropical North Pacific, particularly that of the Pacific decadal oscillation (PDO), has significant physical and ecological impacts over North America (Mantua et al. 1997; Higgins et al. 2000; Meehl et al. 2013). We examine the PDO and its relationships with North American temperature and precipitation for 21 CMIP5 models. We define the PDO as the leading empirical orthogonal function of extended winter (November–April) monthly-mean SST anomalies in the North Pacific poleward of 20°N (Zhang et al. 1997; Mantua et al. 1997) for 1900–93 and subtract the monthly global mean SST. We then calculate the PDO index by projecting monthly North Pacific SST anomalies onto the PDO pattern for all available months and then standardizing the resulting time series. Figure 17 illustrates the PDO patterns in both observations and the CMIP5 ensemble (see Table 5 for a list of models) obtained by regressing the unfiltered monthly SST anomalies onto the PDO index for all calendar months. As in the CMIP3 models (Oshima and Tanimoto 2009; Furtado et al. 2011), the CMIP5 models reproduce the basic PDO horseshoe SST pattern. The most notable difference is the westward shift of the North Pacific center of action in models with respect to observations (Fig. 17c). The regions with the largest differences also correspond with regions of relatively high intermodel variability (Fig. 17d).
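The PDO definition above (leading EOF over a base period, projection of all months onto that pattern, standardization, then regression onto the index) can be sketched as below. This is an illustrative sketch only: it assumes the global-mean SST has already been removed, ignores area weighting and the restriction to extended-winter months, and uses synthetic data; `pdo_pattern_and_index`, `regress_onto_index`, and `base_mask` are hypothetical names.

```python
import numpy as np

def pdo_pattern_and_index(sst_anom, base_mask):
    """Leading EOF over the base period and standardized index for all months.

    sst_anom: (time, space) North Pacific SST anomalies.
    base_mask: (time,) boolean mask selecting the months that define the EOF.
    """
    base = sst_anom[base_mask]
    base = base - base.mean(axis=0)
    _, _, vt = np.linalg.svd(base, full_matrices=False)
    pattern = vt[0]                              # unit-norm leading EOF
    raw = sst_anom @ pattern                     # project every month onto the pattern
    index = (raw - raw.mean()) / raw.std()       # standardize
    return pattern, index

def regress_onto_index(field, index):
    """Regression coefficient of each grid point onto a zero-mean index."""
    anomalies = field - field.mean(axis=0)
    return anomalies.T @ index / (index @ index)

# Synthetic stand-in for North Pacific SST anomalies (not real data).
rng = np.random.default_rng(1)
n_time, n_space = 300, 20
mode = np.sin(np.linspace(0.0, np.pi, n_space))
driver = rng.standard_normal(n_time)
sst_anom = np.outer(driver, mode) + 0.2 * rng.standard_normal((n_time, n_space))

base_mask = np.arange(n_time) < 200              # stands in for the 1900-93 base period
pattern, pdo_index = pdo_pattern_and_index(sst_anom, base_mask)
regression_map = regress_onto_index(sst_anom, pdo_index)
```

Because the sign of an EOF is arbitrary, the pattern and index would normally be oriented so that the positive phase matches the conventional PDO sign.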
For each set of seasonal temperature and precipitation regressions, we calculate the centered pattern correlations and RMS differences between the observed and CMIP5 model regressions (Table 5). Despite fairly low pattern correlations in many cases, for most models and most seasons the differences in the regression patterns are not statistically significant. This may be due to a combination of small effective sample size, large uncertainty in the regression coefficients, a relatively modest impact of the PDO on seasonal SAT and precipitation, and the ability of the models to capture the general PDO behavior during the winter and spring when the PDO impacts are strongest. In particular, the full ensemble performs well in capturing the winter and spring PDO SAT patterns, but substantial differences in the precipitation regressions are evident, particularly in spring.
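For reference, a centered pattern correlation and an RMS difference between two regression maps can be computed as in the following sketch. It assumes cosine-of-latitude area weighting on a regular latitude-longitude grid and an uncentered RMS difference; neither choice is stated in the text, and the function name and toy arrays are hypothetical.

```python
import numpy as np

def centered_pattern_stats(model, obs, lat):
    """Area-weighted centered pattern correlation and RMS difference.

    model, obs: (lat, lon) maps on the same grid; lat: (lat,) in degrees.
    """
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(obs, dtype=float)
    weights = weights / weights.sum()
    model_c = model - (weights * model).sum()    # remove area-weighted mean
    obs_c = obs - (weights * obs).sum()
    corr = (weights * model_c * obs_c).sum() / np.sqrt(
        (weights * model_c ** 2).sum() * (weights * obs_c ** 2).sum())
    rms = np.sqrt((weights * (model - obs) ** 2).sum())
    return corr, rms

# Toy regression maps (not real data).
lat = np.array([30.0, 40.0, 50.0])
obs = np.array([[0.1, 0.3, -0.2, 0.0],
                [0.4, 0.2, -0.1, -0.3],
                [0.5, 0.1, 0.0, -0.4]])
corr_same, rms_same = centered_pattern_stats(obs, obs, lat)
corr_shift, rms_shift = centered_pattern_stats(obs + 3.0, obs, lat)
corr_flip, _ = centered_pattern_stats(-obs, obs, lat)
```

The shifted case shows why the two statistics are reported together: a uniform offset leaves the centered correlation at 1 but produces a nonzero RMS difference.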
Figure 18 shows the DJF SAT and precipitation regressions in observations and the CMIP5 ensemble. The CMIP5 models do rather well in capturing the PDO influence on North American SAT, with positive (negative) SAT anomalies in northwestern (southeastern) North America during the positive phase of the PDO. Almost all local differences in the regression coefficients are not statistically significant. In contrast, the CMIP5 models perform somewhat poorly in reproducing the precipitation patterns over large parts of North America, although for high latitudes the observations are based on very sparse station data, especially before the 1950s (Zhang et al. 2000). Both observations (Fig. 18b) and CMIP5 ensemble (Fig. 18d) produce a tripole pattern of precipitation anomalies over the west coast of North America. Large differences, however, are found in eastern North America. In observations, the positive phase of the PDO is associated with reduced wintertime precipitation in the Tennessee and Ohio valleys, northeastern United States, and southeastern Canada (Fig. 18b), but the CMIP5 ensemble fails to discern this influence (Figs. 18d,f). Though of smaller magnitude, significant differences also occur in central North America (Fig. 18f). In spring [March–May (MAM)] the largest differences in the precipitation regressions occur along the coast of British Columbia, where observed regressions indicate positive anomalies but the CMIP5 ensemble produces a pronounced negative anomaly (not shown). Both observations and the CMIP5 ensemble reproduce positive precipitation anomalies along the West Coast and central plains of the United States.
b. AMO and its influence on North American climate
The Atlantic multidecadal oscillation (AMO) is an important mode of multidecadal climate variability manifesting in North Atlantic SSTs (e.g., Kerr 2000; Enfield et al. 2001). The AMO has significant regional and global climate associations, such as northeast Brazilian and Sahel rainfall (e.g., Folland et al. 1986; Rowell et al. 1995; Wang et al. 2012), hurricane activity in the North Atlantic and the eastern North Pacific (Goldenberg et al. 2001; Wang and Lee 2009), and North American and European summer climate (Enfield et al. 2001; McCabe et al. 2004; Sutton and Hodson 2005). In spite of its importance, the mechanism of the AMO is still unclear. Several studies have indicated the role of variations in the Atlantic meridional overturning circulation (AMOC) and associated heat transport fluctuations (Delworth and Mann 2000; Knight et al. 2005). Some modeling studies indicate that solar variability and/or volcanoes are important (Hansen et al. 2005; Otterå et al. 2003) or that aerosols can be a primary driver (Booth et al. 2012), although the robustness of the latter has been questioned (Zhang et al. 2013). A recent observational study shows that a positive feedback between SSTs and dust aerosols in the North Atlantic via Sahel rainfall variability may be a mechanism (Wang et al. 2012).
The AMO index is defined as the detrended North Atlantic SST during the Atlantic hurricane season of June–November (JJASON), averaged from the equator to 60°N and from 75° to 5°W and smoothed with an 11-yr running mean (e.g., Enfield et al. 2001; Knight et al. 2005). As shown in Fig. 19a, the individual models show highly varying amplitudes and phases, with a large spread across models. This is to be expected given that the AMO is likely of internal origin. All models show the warming in the last two decades when anthropogenic warming becomes influential. The MME mean tends to follow the main variations in the earlier part of the record, albeit subdued because of averaging across models, but fails to show the warm period during 1926–65. Compared to the CMIP3 results (Medhaug and Furevik 2011), the CMIP5 simulation of the AMO has generally improved, particularly after 1960. This may be due to higher resolution, improved parameterizations, and the addition of time-evolving land cover. Results for individual models (Table 6) indicate that the standard deviations are comparable to or slightly weaker than the observations with typical amplitudes ranging from 0.09° to 0.19°C as compared to about 0.18°C in the observations, which is an improvement from CMIP3 models (Ting et al. 2009).
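A minimal sketch of the index construction is given below. It assumes that the input is a yearly series of JJASON SST already averaged over the North Atlantic box and that detrending is linear; the text says only "detrended", so the linear fit is an assumption, and `amo_index` and the synthetic series are hypothetical.

```python
import numpy as np

def amo_index(seasonal_sst, window=11):
    """Linearly detrend a yearly SST series and apply a running mean.

    Returns only the years with a full window, so the output is shorter
    than the input by window - 1 values.
    """
    t = np.arange(seasonal_sst.size, dtype=float)
    slope, intercept = np.polyfit(t, seasonal_sst, 1)
    detrended = seasonal_sst - (slope * t + intercept)
    kernel = np.ones(window) / window
    return np.convolve(detrended, kernel, mode="valid")

# Synthetic series (not real data): a warming trend plus a slow oscillation.
years = np.arange(1901, 2000)
trend_only = 20.0 + 0.005 * (years - years[0])
oscillation = 0.2 * np.sin(2.0 * np.pi * (years - years[0]) / 66.0)

index_trend_only = amo_index(trend_only)
index_with_oscillation = amo_index(trend_only + oscillation)
```

With a pure trend the index is zero to within rounding, while the slow oscillation survives both the detrending and the 11-yr smoothing.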
The lagged autocorrelation of the AMO index for lags zero to 35 yr (Fig. 19b) shows that the models generally represent the quasi-periodic nature of the observed AMO, with the peak oscillation at 30–35 yr in the observations but generally shorter for the models. The persistence of the AMO index is defined as the maximum time lag when the autocorrelation first crosses the significance line at the 10% level and varies from 5 to 25 yr in the models, implying the potential for predicting future SSTs (Corti et al. 2012; H.-M. Kim et al. 2012). However, for most models the persistence is shorter (~12 yr), which is nevertheless an improvement over CMIP3 models, which have an average persistence of about 5 yr (Medhaug and Furevik 2011).
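The autocorrelation and persistence diagnostics can be illustrated with the following sketch. It interprets persistence as the first lag at which the autocorrelation falls below a given significance line, which is one reading of the definition above, and it takes the significance line as an input rather than deriving it from an effective sample size. The function names, the value 0.5, and the test series are hypothetical.

```python
import numpy as np

def lagged_autocorrelation(x, max_lag):
    """Autocorrelation for lags 0..max_lag, normalized by the lag-0 sum of squares."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    n = d.size
    denom = d @ d
    return np.array([(d[: n - k] @ d[k:]) / denom for k in range(max_lag + 1)])

def persistence(acf, significance_line):
    """First lag at which the autocorrelation drops below the significance line."""
    below = np.nonzero(acf < significance_line)[0]
    return int(below[0]) if below.size else None

# Synthetic oscillation with a 20-step period (not real data).
t = np.arange(200)
series = np.sin(2.0 * np.pi * t / 20.0)
acf = lagged_autocorrelation(series, 35)
lag = persistence(acf, significance_line=0.5)   # illustrative threshold only
```

For a smoothed index such as the AMO, the significance line itself depends on the reduced number of independent samples, which this sketch does not attempt to estimate.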
The ability of the models to represent the AMO and its impact on precipitation over North America is evaluated by regressing the AMO index on regional seasonal precipitation and SSTs for 1901–99. The results are shown for autumn in Fig. 20 and shown in more detail in Kavvada et al. (2013). The SST signature of the AMO is stronger in autumn than in summer and this is reflected in its impact on central U.S. precipitation in observations (not shown). In both seasons the SST anomalies reach a maximum over the mid-Atlantic, over the subpolar gyre region. The warm phase of the AMO induces drying conditions over the central United States and wet conditions over Florida and the U.S. Northeast in both seasons but with more intensity in autumn. However, there are seasonally contrasting conditions along the Gulf of Mexico states where decreased precipitation occurs in summer but increased precipitation occurs in autumn.
In general, the models do not capture the SST seasonality of the AMO well. The simulated SST anomalies are generally larger in summer than in autumn in the majority of the models (not shown). While all models tend to place the maximum SST anomalies over the mid-Atlantic Ocean, they do not replicate the observed maximum south of Greenland and its spatial structure. For example, CCSM4, GFDL-ESM, and MIROC5 emphasize anomalies over the Norwegian Sea and GFDL-ESM, GISS-E2-R, and INM-CM4.0 do not show a signal over the tropical Atlantic. The spatial correlation of the anomalies (Table 7) shows higher correlations for HadGEM2-ES and GISS-E2-R, although visually there are large discrepancies in the spatial patterns.
The precipitation impact of the AMO is a bigger challenge for the models (see Table 7 for individual model spatial correlations for precipitation), and they generally fail to represent the drier conditions over the central United States and the wet conditions along the coastal southern Atlantic U.S. States and southern Mexico. The initial drying over the south-central United States in summer is shown by a few models (BCC-CSM1.1, HadGEM2-ES, IPSL-CM5A-LR, and MRI-CGCM3), but the intensification of the drying into the autumn is not replicated by most of the models. The wet conditions over the southern Atlantic U.S. States in the autumn are captured by a few models but to varying degrees of agreement, and some models show regressions of the opposite sign (e.g., GISS-E2-R and HadGEM2-ES) despite their high SST correlations. The increased precipitation over southern Mexico in autumn is shown only by a handful of models (e.g., BCC-CSM1.1, CSIRO Mk3.6.0, IPSL-CM5A-LR, and NorESM1-M).
Numerous studies have shown the importance of the AMO in generating precipitation variability over the region (e.g., Enfield et al. 2001; Sutton and Hodson 2005; Wang et al. 2006; Schubert et al. 2009; Nigam et al. 2011), with a key role played by the lower-level circulation, which modulates the Great Plains low-level jet and the convergence–divergence of moisture fluxes (see section 5b). Thus, given the differences in the model simulated structure of the AMO SST footprint, their poor performance in the simulation of the hydroclimate impact over the central United States is not surprising: a situation that has not shown improvement since CMIP3 (Ruiz-Barradas et al. 2013).
7. Multidecadal trends
a. Trends in temperature and the “warming hole” over the southeastern United States
A unique feature of U.S. temperature change during the twentieth century is the so-called warming hole (WH) observed in the southeastern United States (Pan et al. 2004). While the globe has warmed over the twentieth century, the WH region experienced cooling, especially in summer during the latter half of the century. Studies have attributed this abnormal cooling (lack of warming) trend to large-scale decadal oscillations such as the PDO and AMO (Robinson et al. 2002; Kunkel et al. 2006; Wang et al. 2009; Weaver 2013; Meehl et al. 2013) and to regional-scale hydrological processes (Pan et al. 2004) and land surface interactions (Liang et al. 2007). Portmann et al. (2009) speculated that secondary organic aerosols during the growing season could contribute to the cooling in the WH region, while Christidis et al. (2010) emphasized the role of internal climate variability.
We evaluate whether the CMIP5 models show the warming hole as a forced response in Fig. 21, which shows the annual and seasonal trends in near-surface air temperature from the observations and the CMIP5 multimodel mean from 17 models (see Fig. 21 caption). Model and observation data are regridded to a common resolution of 2.5° × 2.5° using area averaging. Trends are calculated for the 1930–2004 period using the Theil–Sen approach (Theil 1950; Sen 1968). The choice of 1930–2004 gives a prominent WH signal in the observations starting from the warmest decade following the Dust Bowl drought. Only one ensemble member from each model is included in the analysis as ensemble members from the same model show similar spatial patterns of long-term trends (Kumar et al. 2013b). The MME mean shows neither a cooling trend in the eastern United States nor lesser warming relative to the western United States. This indicates that, similar to CMIP3 (Kunkel et al. 2006) simulations, the CMIP5 simulations do not show the WH as a forced response signal.
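The Theil–Sen estimator takes the median of the slopes between all pairs of points, which makes it robust to outliers. A minimal sketch using only the Python standard library is shown below; the function name `theil_sen_slope` and the synthetic temperature series are hypothetical. SciPy provides a comparable routine, `scipy.stats.theilslopes`, which also returns confidence bounds.

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(times, values):
    """Median of pairwise slopes (Theil-Sen trend estimate)."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i, j in combinations(range(len(times)), 2)
        if times[j] != times[i]
    ]
    return median(slopes)

# Synthetic annual temperatures for 1930-2004 (not real data):
# a 0.01 degC per year trend with one large outlier.
years = list(range(1930, 2005))
temps = [14.0 + 0.01 * (y - 1930) for y in years]
temps[10] += 5.0

trend_per_year = theil_sen_slope(years, temps)
trend_per_decade = 10.0 * trend_per_year
```

The single outlier leaves the estimated trend unchanged, whereas an ordinary least squares fit to the same series would be pulled away from the underlying slope.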
Figure 22 shows the temporal evolution of 30-yr moving window annual temperature trends over the eastern United States in the observational data and CMIP5 simulations, both in absolute terms and relative to the western United States. The multidecadal persistence of the WH is clearly visible in the observational data: that is, most negative temperature trends are clustered between 1925 and 1955. The 95% model spread range brackets the observed multidecadal variability in the eastern U.S. temperature trends and approximately 40% of the 95% model spread range is negative. The multimodel median captures the overall tendency of positive and negative trend evolution (r² = 0.58). Pan et al. (2013) found that 19 out of 100 CMIP5 historical “all forcings” simulations showed negative temperature trends in the Southeast United States, whereas simulations based on greenhouse gas emissions forcing only showed a strong warming in the central United States. These results suggest that there is some fidelity with observations via external forcings, but natural climate variability plays a major role. Kumar et al. (2013a) found that the 30-yr running temperature trend variability in the eastern United States is significantly correlated (r² = 0.76) with the AMO and models that have relatively higher skill in AMO simulations also have a higher chance of reproducing the WH in the eastern United States. There is essentially no skill in the models' representation of the difference between the eastern and western U.S. running trends (Fig. 22b).
b. Trends in DTR
Observed warming during the day and night has been asymmetric, with nocturnal minimum surface air temperature (Tmin) rising about twice as fast as daytime maximum temperature (Tmax) during the second half of the twentieth century, mostly during 1950–80 (Vose et al. 2005). Changes in cloud cover, atmospheric water vapor, soil moisture, and other factors account for 25%–50% of the diurnal temperature range (DTR) reduction (Dai et al. 1999). Cloud cover, soil moisture, precipitation, and atmospheric/oceanic teleconnections account for up to 80% of regional variance over 1901–2002. Over the United States, cloud cover alone accounts for up to 63% of regional annual DTR variability (Lauritsen and Rogers 2012). During 1950–2004, summer Tmax and Tmin over North America increased 0.07° and 0.12°C, respectively, resulting in a DTR change of −0.05°C (Vose et al. 2005). A similar change (−0.06°C) occurred in winter. Over the WH region, summer Tmax decreased sharply (−0.13°C) while Tmin increased slightly (0.05°C), yielding a DTR decrease of 0.18°C. Winter DTR also decreased by 0.13°C.
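As a check on the arithmetic, the DTR changes quoted above follow directly from the definition of DTR as Tmax minus Tmin. The notation below is introduced here for illustration only.

```latex
\begin{align}
\Delta\mathrm{DTR} &= \Delta T_{\max} - \Delta T_{\min} \\
\text{North America, summer:}\quad \Delta\mathrm{DTR} &= 0.07 - 0.12 = -0.05\ ^{\circ}\mathrm{C} \\
\text{WH region, summer:}\quad \Delta\mathrm{DTR} &= -0.13 - 0.05 = -0.18\ ^{\circ}\mathrm{C}
\end{align}
```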
Figure 23 shows a comparison of DTR magnitude and the linear trend in DTR from 17 models against the CRU TS3.1 observational dataset. The observed mean DTR (Tmax – Tmin) is characterized by high values over the western high mountainous regions in summer and low values in high latitudes (Fig. 23a). The MME mean simulates this general pattern but underestimates DTR in the mountains. The observed DTR trend is predominantly negative in the United States and Mexico and largely positive in Canada in both seasons (Fig. 23b). The largest decreasing DTR trend, of up to 0.2°C decade−1, is over the southeastern U.S. warming hole region in summer. The DTR trend is poorly reproduced by the models, which miss the extensive negative trend over the southeastern region and instead simulate an increasing DTR trend there (Fig. 23b, right). The pattern correlation between the observed and simulated DTR ranges from 0.40 to 0.82, with a mean of 0.67 for the 17 models, but the correlation of the DTR trend is much lower, ranging from −0.26 to 0.19 (mean = 0.03). The model skill in simulating DTR trends does not appear to have improved from CMIP3 (Zhou et al. 2010) and earlier model comparisons (e.g., Braganza et al. 2004); however, the role of anthropogenic forcings appears to be essential in producing a decline in DTR (Zhou et al. 2010), even if it is underestimated.
c. Trends in precipitation
Precipitation has generally increased over North America in the last half of the twentieth century (Karl and Knight 1998; Zhang et al. 2000). Trends in precipitation are positively correlated with streamflow trends, thereby affecting water resource availability and flood potential (Lettenmaier et al. 1994; McCabe and Wolock 2002; Kumar et al. 2009). Figure 24 shows the multimodel ensemble average precipitation trend for 1930–2004 from 17 models against the CRU observations. The multimodel average weakly captures the wetting trend in North America, particularly at higher latitudes. Note that the precipitation gauge density before the 1950s was very low, especially in high latitudes, so the observational trends are very uncertain, at least for the first part of the time period. However, the MME mean fails to capture the trend magnitude: for example, the higher wetting trend (>20 mm decade−1) in the eastern United States. Figures 25a,b show the 30-yr running trend during the twentieth century in the eastern and western United States, respectively. The 95% model spread brackets the observed precipitation trend magnitude in both regions. The higher wetting trend in the observations has slowed down in the last decade in the eastern United States. The muted magnitude of the trend in Fig. 24 seems to be a result of a low signal-to-noise ratio (the multimodel median line hovers around the zero line in Fig. 25), rather than a robust feature of CMIP5 climate models. Some individual models capture the observed trend magnitude very well. Drying in Mexico is a dominant but incorrect feature in the CMIP5 simulations, which is symptomatic of CMIP3 models as well (Pachauri and Reisinger 2007) and is likely driven by the inadequate connection between increasing precipitation and global SST warming, at least for summer, in the majority of models as shown by R. Fu et al. (2013, unpublished manuscript) for the southern United States.
8. Discussion and conclusions
This study has evaluated the simulated variability from the CMIP5 multimodel ensemble at intraseasonal to multidecadal time scales for North America and adjoining seas. The results show a mixture of performance, with some aspects of climate variability well reproduced (e.g., the spatial footprint of the PDO and its teleconnections), others reproduced well by some models but not others (e.g., ISV in the tropical Pacific; and ENSO teleconnections and types), and others reproduced poorly by most models (e.g., tropical cyclone frequency; ENSO asymmetry; teleconnections with the AMO; and long-term trends in DTR and precipitation). No one model stands out as better than the others, but certain models do perform much better for certain features. For example, the Hadley Centre models do well for the Central America midsummer drought and the SST footprint of the AMO; the MRI-CGCM3 model does relatively well for intraseasonal and interannual variability in the tropical Pacific and for tropical cyclone counts. In general, higher-resolution models do better for features such as tropical cyclones, but this does not appear to be a dominant factor for other aspects of climate variability. Furthermore, no model stands out as being particularly unskillful, bolstering the argument to consider all models irrespective of performance to encompass the uncertainties (Knutti 2010). In fact, the range of processes and metrics analyzed is a key advantage of this study, because skill in one aspect does not necessarily mean good performance in another. For example, NorESM1-M does very well at representing the two types of ENSO and their teleconnections but does poorly at representing ENSO asymmetry. As a consequence, an overall ranking of models, albeit seemingly attractive, is difficult given the challenges in quantitatively comparing performance across different types of analysis, as well as the logistical challenges of sampling the same set of models across all analyses.
For the climate features and models analyzed here, there does not appear to be a great deal of improvement since CMIP3. For example, CMIP5 models still cannot capture the seasonal timing of ENSO events, which tend to peak in the fall and winter, and the spurious drying signal in the southern United States and Mexico continues from CMIP3. However, some features continue to be well simulated, such as the SST pattern of the PDO, and features related to spatial resolution are likely to have improved, such as the representation of TCs. Overall, the models are less able to capture observed variability and long-term trends than they are the mean climate state as evaluated in Part I, although this may be a result of model tuning to observations (Räisänen 2007). This is understandable for decadal to multidecadal variability, which is dependent on the models' internal variability or the sensitivity to external forcing, for which the observations can be very uncertain. Some of the biases in variability, however, appear to be related to problems in simulating the mean state, and there are encouraging signs that improvements in the models or at least the understanding of the sources of errors can be made (e.g., biases in the depiction of the mean state of the tropical Pacific may be linked to biases in the ISV, the lack of asymmetry in ENSO phases, and to teleconnections with North American climate).
The results have implications for the interpretation and robustness of the model-projected future changes. Part III evaluates the model projections for a subset of the features analyzed here and in Part I. As noted in Part I, the accurate simulation of historical climate features is not sufficient for credible projections, although the depiction of large-scale climate features is necessary. Several studies of future projections show only small differences between models that do better at replicating observations and those that do worse (e.g., Brekke et al. 2008; Knutti et al. 2010), while others have found relationships between model performance and future projections that can be related to physical processes (e.g., Hall and Qu 2006; Boe et al. 2009). However, these types of studies are generally specific to certain climate features that do not necessarily provide confidence or pessimism in model skill in a broader sense.
The adequate depiction of the variability is nevertheless necessary because this is generally associated with the more extreme aspects of climate that impose the largest impacts. Furthermore, the depiction of the teleconnections associated with large-scale variability is especially important because the impacts of potential changes in the variability of, say, ENSO (van Oldenborgh et al. 2005; Muller and Roeckner 2008) are subject to uncertainties in the representation of teleconnections (Part III). Model variability can also have a large impact on future changes because the signal-to-noise ratio can be highly dependent on the model's natural variability, resulting in misleading assessments of future changes and uncertainties across models (Tebaldi et al. 2011). The ability of the models to reproduce the observed trends may be a better indicator of model reliability than depiction of the mean climate or even its variability, because this indicates the model's sensitivity to an external forcing that may continue into the future, such as greenhouse gas concentrations. The problem here is that the trend analyzed is subject to uncertainties in the observations, the complications of natural variability in the real world and models, and uncertainties in feedbacks and how they may change in the future (Räisänen 2007; Knutti 2010). The generally poor ability of the models to reproduce the trends in precipitation, DTR, and some features of regional temperature shown here is indicative of this.
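As a rough illustration of the signal-to-noise point above, the following Python sketch compares two synthetic series that share the same underlying trend but differ in the amplitude of their variability. It is not the method used in this study or in Tebaldi et al. (2011). The function names, the definition of the ratio (total fitted change divided by the standard deviation of the detrended residuals), and the synthetic "quiet" and "noisy" series are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def linear_trend(values):
    """Least-squares slope and intercept of values against time index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def signal_to_noise(values):
    """Assumed definition: |fitted change over the record| / std of residuals."""
    slope, intercept = linear_trend(values)
    residuals = [v - (intercept + slope * t) for t, v in enumerate(values)]
    noise = pstdev(residuals)
    signal = abs(slope) * (len(values) - 1)
    if noise == 0:
        return float("inf")  # a perfectly linear series has no noise
    return signal / noise


# Hypothetical series: same imposed trend (0.1 per step), different variability.
# The alternating term is a stand-in for a model's natural variability.
quiet_model = [0.1 * t + 0.1 * (-1) ** t for t in range(20)]
noisy_model = [0.1 * t + 0.5 * (-1) ** t for t in range(20)]

if __name__ == "__main__":
    print("quiet model S/N:", signal_to_noise(quiet_model))
    print("noisy model S/N:", signal_to_noise(noisy_model))
```

Under these assumptions, the series with larger variability yields a lower ratio even though the imposed trend is identical, which is the sense in which a model's natural variability can color the assessment of its forced change.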
We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output. For CMIP, the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. The authors acknowledge the support of the NOAA/Climate Program Office/Modeling, Analysis, Predictions and Projections (MAPP) program as part of the CMIP5 Task Force.
This article is included in the North American Climate in CMIP5 Experiments special collection.