1. Introduction
This is the second part of a three-part paper on phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012) model simulations for North America. This second part evaluates the CMIP5 models in their ability to replicate the observed variability of North American continental and regional climate, and related climate processes. Sheffield et al. (2013, hereafter Part I) evaluate the representation of the climatology of continental and regional climate features. The third part (Maloney et al. 2013, manuscript submitted to J. Climate, hereafter Part III) describes the projected changes for the twenty-first century.
The CMIP5 provides an unprecedented collection of climate model output data for the assessment of future climate projections as well as evaluations of climate models for contemporary climate, the attribution of observed climate change, and improved understanding of climate processes and feedbacks. As such, these data contribute to the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5) and other global, regional, and national assessments.
The goal of this study is to provide a broad evaluation of CMIP5 models in their depiction of North American climate variability. It draws from individual work by investigators within the CMIP5 Task Force of the U.S. National Oceanic and Atmospheric Administration (NOAA) Modeling Analysis and Prediction Program (MAPP) and is part of a Journal of Climate special collection on North America in CMIP5. We draw from individual papers within the special issue, which provide more detailed analysis than can be presented in this synthesis paper.
We begin in section 2 by describing the CMIP5, providing an overview of the models analyzed, the historical simulations, and the general methodology for evaluating the models. Details of the main observational datasets to which the climate models are compared are also given in this section. The next five sections focus on different aspects of North American climate variability, organized by the time scale of the climate feature. Section 3 covers intraseasonal variability with focus on variability in the eastern Pacific Ocean and summer drought over the southern United States and Central America. Atlantic and east Pacific tropical cyclone activity is evaluated in section 4. Interannual climate variability is assessed in section 5. Decadal variability and multidecadal trends are assessed in sections 6 and 7, respectively. Finally, the results are synthesized in section 8.
2. CMIP5 models and simulations
a. CMIP5 models
We use data from multiple model simulations of the “historical” scenario from the CMIP5 database. The CMIP5 experiments were carried out by 20 modeling groups representing more than 50 climate models with the aim of further understanding past and future climate change in key areas of uncertainty (Taylor et al. 2012). In particular, experiments have been focused on understanding model differences in clouds and carbon feedbacks, quantifying decadal climate predictability and why models give different answers when driven by the same forcings. The CMIP5 builds on the previous phase [phase 3 of CMIP (CMIP3)] experiments in several ways. First, a greater number of modeling centers and models have participated. Second, the models are more comprehensive in terms of the processes that they represent and are run at higher spatial resolution, therefore hopefully resulting in better skill in representing current climate conditions and reducing uncertainty in future projections. Table 1 provides an overview of the models used. The specific models used vary for each individual analysis because of data availability at the time of this study, and so the model names are provided within the results section where appropriate.
CMIP5 models evaluated and their attributes.


b. Overview of methods
We evaluate data from the historical CMIP5 scenario, a coupled atmosphere–ocean mode simulation forced by historical estimates of changes in atmospheric composition from natural and anthropogenic sources, volcanoes, greenhouse gases, and aerosols, as well as changes in solar output and land cover. Historical scenario simulations were carried out for the period from the start of the industrial revolution to near present: 1850–2005. Our evaluations are generally carried out for the last 30 yr of the simulations, depending on the type of analysis and the availability of observations. For some analyses the only, or best available, data are from satellite remote sensing, which restricts the analysis to the satellite period (generally from 1979 onward). In other cases the observational data are very uncertain for particular regions and time periods (e.g., precipitation in high latitudes in the first half of the twentieth century) and this is noted in the relevant subsection. For other analyses, multiple observational datasets are available and are used to capture the uncertainty in the observations. The observational datasets are summarized in Table 2 and further details of the datasets and data processing are given in the relevant subsections and figure captions. Where the comparisons go beyond 2005 (e.g., 1979–2008), data from the model representative concentration pathway 8.5 (RCP8.5) future projection scenario simulation (as this is regarded as closest to the business as usual trajectory) are appended to the model historical time series. About half the models have multiple ensemble members, but we select the first ensemble member for simplicity and discuss the variability in the results across the ensemble where appropriate.
Observational and reanalysis datasets used in the evaluations.


3. Tropical intraseasonal variability
a. MJO-related variability over the eastern Pacific and adjoining regions
It has been well documented that convection over the eastern Pacific (EP) ITCZ and neighboring areas is characterized by pronounced intraseasonal variability (ISV) during boreal summer (e.g., Knutson and Weickmann 1987; Kayano and Kousky 1999; Maloney and Hartmann 2000a; Maloney and Esbensen 2003, 2007; de Szoeke and Bretherton 2005; Jiang and Waliser 2008, 2009; Jiang et al. 2011). ISV over the EP exerts broad impacts on regional weather and climate phenomena, including tropical cyclone activity over the EP and the Gulf of Mexico, the summertime gap wind near the Gulfs of Tehuantepec and Papagayo, the Caribbean low-level jet and precipitation, the midsummer drought over Central America and Mexico, and the North American monsoon (e.g., Magaña et al. 1999; Maloney and Hartmann 2000b,a; Maloney and Esbensen 2003; Lorenz and Hartmann 2006; Serra et al. 2010; Martin and Schumacher 2011).
Here, model fidelity in representing ISV over the EP and intra-American sea (IAS) region is assessed by analyzing daily output of rainfall and 850-hPa winds from 18 CMIP5 models. Figure 1 displays a Taylor diagram for summer-mean (May–September) precipitation from the CMIP5 models over the EP domain (5°S–30°N, 150°–80°W) compared to the TMPA precipitation (see Table 2 for expanded dataset names). While the two HadGEM models (HadGEM2-CC and HadGEM2-ES; see Table 1 for expanded model names) display the highest pattern correlations (~0.93), MRI-CGCM3 shows the smallest RMS because of its better skill in simulating the spatial standard deviations of summer-mean rainfall over the EP. In addition, four models (MPI-ESM-LR, CSIRO Mk3.6.0, CanESM2, and CNRM-CM5) also exhibit relatively better pattern correlations than other models.

Taylor diagram for summer-mean (May–September) rainfall over the eastern Pacific (5°S–30°N, 150°–80°W) simulated in CMIP5 GCMs. The rainfall observations are based on TMPA data.
Citation: Journal of Climate 26, 23; 10.1175/JCLI-D-12-00593.1
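The three statistics summarized by a Taylor diagram (pattern correlation, ratio of spatial standard deviations, and centered RMS difference) can be illustrated with a minimal sketch. The function name `taylor_stats` and the flattened-list inputs are hypothetical, and area weighting of grid cells is omitted for brevity; this is not necessarily the procedure used to produce Fig. 1.

```python
import math

def taylor_stats(model, obs):
    """Return (pattern correlation, std ratio, centered RMS difference).

    `model` and `obs` are equal-length lists of summer-mean rainfall on a
    common grid, flattened to 1D. Grid cells are unweighted (an assumption).
    """
    n = len(obs)
    m_mean = sum(model) / n
    o_mean = sum(obs) / n
    # Remove the domain mean so that only the spatial pattern is compared.
    m_anom = [m - m_mean for m in model]
    o_anom = [o - o_mean for o in obs]
    m_std = math.sqrt(sum(a * a for a in m_anom) / n)
    o_std = math.sqrt(sum(a * a for a in o_anom) / n)
    corr = sum(a * b for a, b in zip(m_anom, o_anom)) / (n * m_std * o_std)
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(m_anom, o_anom)) / n)
    return corr, m_std / o_std, crmsd
```

Because the centered RMS difference depends on both the correlation and the two standard deviations, a model with a slightly lower pattern correlation can still have the smallest RMS if its spatial standard deviation is closer to the observed value.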
The leading ISV modes over the EP based on observed and simulated rainfall fields are identified using a complex empirical orthogonal function (CEOF) approach (Maloney et al. 2008). CEOF analyses were applied to 30–90-day bandpass filtered daily rainfall anomalies and the spatial amplitude and phase for the first CEOF mode (CEOF1) based on TMPA are illustrated in Figs. 2a,b. A single ensemble member was used for each model for 1981–2005. The TMPA data are available for a shorter time period (13 yr), but the sensitivity of the results to different sample sizes (based on data from a selected model) was found to be small. Similar to Maloney et al. (2008), the maximum amplitude of the observed rainfall CEOF1 occurs over the far eastern part of the EP. Figure 2b illustrates the pattern of spatial phase of observed rainfall CEOF1. In agreement with previous studies, the observed leading ISV mode associated with the CEOF1 largely exhibits an eastward propagation, while a northward component is also evident (e.g., Jiang and Waliser 2008; Maloney et al. 2008; Jiang et al. 2011).

Spatial distribution of (a) amplitude and (b) phase of the CEOF1 based on 30–90-day bandpass filtered Tropical Rainfall Measuring Mission (TRMM) rainfall during boreal summer (June–September) over the eastern Pacific. To make the spatial phase patterns of the CEOF1 based on the observations and simulations comparable to each other, the spatial phase of CEOF1 for each dataset is adjusted by setting the domain-averaged value to be 0 over a small box region of 10°–15°N, 110°–100°W. Contours are only displayed where the local variance explained by CEOF1 exceeds 8%. (c) On the x axis, pattern correlation coefficients of the CEOF1 mode between TRMM observations and CMIP5 GCM simulations; on the y axis, relative amplitudes of CEOF1 in model simulations to their observed counterparts. Both pattern correlations and amplitudes are derived by averaging over the area of 5°–25°N, 140°–80°W where the active ISV is observed. The black star represents the TMPA observations.
Next, the fidelity of the CMIP5 models in simulating the leading EP ISV mode is assessed by calculating pattern correlations of the simulated rainfall CEOF1 against observations. To increase sampling, spatial patterns of rainfall anomalies associated with the CEOF1 based on both observations and model simulations are derived at two quadrature phases by multiplying the CEOF1 amplitude by the cosine and sine of spatial phase at each grid point, respectively. The pattern correlations are then calculated at both of these two quadrature phases. A final pattern correlation for a particular model is derived by averaging these two pattern correlation coefficients. Figure 2c illustrates pattern correlations in depicting the CEOF1 rainfall pattern for each model simulation versus domain-averaged CEOF1 amplitude relative to observations, which provide measures of model performance of variability in space and time, respectively. A majority of the CMIP5 models tend to underestimate the amplitude of the leading EP ISV mode associated with the rainfall CEOF1, except CNRM-CM5, MIROC5, MPI-ESM-LR, HadGEM2-CC, and HadGEM2-ES. Among the 18 models examined, 8 models exhibit relatively higher pattern correlations (>0.75).
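The averaging of pattern correlations over the two phases in quadrature (cosine and sine of the spatial phase) can be sketched as follows. The function names are hypothetical, phases are assumed to be in radians, the correlation is assumed to be centered and unweighted, and all fields are assumed to be on a common grid flattened to 1D.

```python
import math

def pattern_correlation(x, y):
    """Centered, unweighted correlation between two flattened fields."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    var_x = sum((a - x_mean) ** 2 for a in x)
    var_y = sum((b - y_mean) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ceof_pattern_skill(amp_mod, phase_mod, amp_obs, phase_obs):
    """Average the pattern correlations at the two quadrature phases."""
    # Rainfall anomaly patterns: amplitude times cosine and sine of phase.
    cos_mod = [a * math.cos(p) for a, p in zip(amp_mod, phase_mod)]
    sin_mod = [a * math.sin(p) for a, p in zip(amp_mod, phase_mod)]
    cos_obs = [a * math.cos(p) for a, p in zip(amp_obs, phase_obs)]
    sin_obs = [a * math.sin(p) for a, p in zip(amp_obs, phase_obs)]
    r_cos = pattern_correlation(cos_mod, cos_obs)
    r_sin = pattern_correlation(sin_mod, sin_obs)
    return 0.5 * (r_cos + r_sin)
```

Using both patterns means that a model is rewarded only if it reproduces the amplitude structure and the propagation implied by the spatial phase, not just one snapshot of the mode.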
The models with relatively better skill in representing the leading EP ISV mode also largely exhibit better skill for summer-mean rainfall (cf. Figs. 1, 2c) and 850-hPa wind patterns (not shown). A common feature among the more skillful models is the presence of westerly or very weak easterly mean low-level winds over the EP warm pool region, as in the observations. Most of the models with relatively lower skill exhibit a stronger easterly summer-mean flow (>4 m s−1). This suggests that realistic representation of the mean state could be crucial for improved simulations of the EP ISV, which is in agreement with a recent study by Rydbeck et al. (2013), and has also been discussed for Madden–Julian oscillation (MJO) simulations over the western Pacific and Indian Ocean (e.g., Kim et al. 2009). One hypothesis is that a realistic mean state produces the correct sign of surface flux anomalies relative to intraseasonal precipitation, which helps to destabilize the local intraseasonal disturbance (e.g., Maloney and Esbensen 2005). Extended analyses of the EP ISV in CMIP5 models are given in Jiang et al. (2012).
b. Midsummer drought over Central America
The rainy season in Central America and southern Mexico spans roughly May through October. For most of the region, the precipitation climatology features maxima in June and September and a period of reduced rainfall during July–August known as the midsummer drought (MSD; Portig 1961; Magaña et al. 1999). The MSD is regular enough to be known colloquially and plays an important role in farming practices (Osgood et al. 2009). A previous assessment of CMIP3 model performance at simulating the MSD and future projections (Rauscher et al. 2008) suggested that many models are capable of simulating the MSD despite an overall dry bias and that the MSD is projected to become stronger with an earlier onset. In this section, the CMIP5 performance at simulating summertime precipitation and the MSD is evaluated. We evaluate 23 CMIP5 models against the TMPA, GPCP, and UNAM observational datasets. A simple algorithm for detecting and quantifying the climatological MSD is used that does not assume a priori which months are maxima and which months constitute the MSD (Karnauskas et al. 2012).
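To make the idea of detecting the MSD without prescribing the peak months concrete, the following is a minimal sketch. It is an illustrative assumption, not the Karnauskas et al. (2012) algorithm, whose details are not given here: the function `msd_strength` is hypothetical, it takes a monthly precipitation climatology in calendar order, finds the two largest local maxima, and measures how far the minimum between them falls below the mean of the two peaks.

```python
def msd_strength(clim):
    """Return an MSD strength (same units as `clim`) or 0.0 if no MSD.

    `clim` is a monthly climatology of precipitation (e.g., mm/day) in
    calendar order over the rainy season. Illustrative sketch only.
    """
    n = len(clim)
    peaks = []
    for i in range(n):
        left_ok = i == 0 or clim[i] > clim[i - 1]
        right_ok = i == n - 1 or clim[i] > clim[i + 1]
        if left_ok and right_ok:
            peaks.append(i)
    if len(peaks) < 2:
        return 0.0  # a single rainfall maximum: no midsummer drought
    # Keep the two wettest peaks; neither month is chosen in advance.
    first, second = sorted(sorted(peaks, key=lambda i: clim[i])[-2:])
    trough = min(clim[first + 1:second])
    return 0.5 * (clim[first] + clim[second]) - trough
```

For a hypothetical May–October climatology of `[4.0, 8.0, 5.5, 5.0, 9.0, 6.0]` mm/day, the peaks fall in June and September and the midsummer minimum lies 3.5 mm/day below their mean.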
Figure 3 shows the observational and CMIP5 estimates of the MSD and highlights the large uncertainties in its spatial distribution among observational datasets. The CMIP5 multimodel ensemble (MME) does reasonably well at representing the essence of the MSD over much of the inter-Americas region. The maximum strength of the MSD in the MME is found just offshore of El Salvador and represents a midsummer precipitation minimum that is ~2.5 mm day−1 less than the early and late summer peaks. Significant differences in the location and strength of the MSD between the various observational datasets preclude a definitive evaluation of the CMIP5 MME, but it is clear that the strength of the MSD is underestimated in some regions, including along the Pacific coast of Central America, the western Caribbean, the major Caribbean islands, and Florida. Figure 3 also shows the MME standard deviation and a histogram of the spatial correlations of individual models with the MME mean. The largest uncertainties are collocated with the regions of largest magnitude of the MSD indicating that much of the model disagreement is in the magnitude. Several models stand out as outliers in representing the spatial distribution of the MSD relative to the MME mean (Table 3), such as MIROC-ESM and MIROC-ESM-CHEM, while the Hadley Centre models do particularly well.

Summertime (June–September) MSD strength (mm day−1) for three observational estimates: (top) (left) TRMM 3B43 and (right) UNAM, and (middle left) GPCP. (middle right) The CMIP5 MME mean for 23 models (see Table 3). (bottom) (left) The MME standard deviation and (right) histogram of the pattern correlations between individual models and the MME mean. All model output and observational data were regridded onto a common 0.5° grid.
Spatial correlation of the MSD between the CMIP5 models and the MME mean, calculated for 1850–2005.


4. East Pacific and Atlantic tropical storm track and cyclone activity
a. Tropical storm track
The density of traveling synoptic-scale disturbances across the tropics, referred to in the literature as the tropical storm track (e.g., Thorncroft and Hodges 2001; Serra et al. 2008, 2010), is examined in this section. These systems serve as precursors to a majority of tropical storms and hurricanes in the Atlantic and eastern North Pacific and their frequency at 850 hPa over Africa and the eastern Atlantic has been shown to be positively correlated with Atlantic hurricane activity (Thorncroft and Hodges 2001). As global models better resolve these systems than tropical cyclones, they provide an advantage over direct tracking of tropical cyclones to assess model tropical storm activity (see section 4b). As in Serra et al. (2010), the tropical storm track density is calculated based on the method of Hodges (1995, 1999) using smoothed, 6-hourly, 850-hPa relative vorticity. Only positive vorticity centers with a minimum threshold of 0.5 × 10−6 s−1 that persist for at least 2 days and have tracks of at least 1000 km in length are included in the analysis. This method primarily identifies westward moving disturbances such as easterly waves (e.g., Serra et al. 2010), although more intense storms that could potentially reach hurricane intensity are not excluded. We analyze a single ensemble member from nine CMIP5 models and compare the track statistics to the ERA-Interim (Fig. 4, left). These models were selected based on whether the 6-hourly pressure level data were available at the time of the analysis. Mean track strength, the mean of the smoothed 850-hPa vorticity along the track, is also examined (Fig. 4, right).
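The three selection criteria stated above (vorticity threshold, 2-day lifetime, and 1000-km track length) can be sketched as a simple filter. This is not the Hodges (1995, 1999) tracking code: the function names and the track representation as `(hours, lat, lon, vorticity)` tuples are hypothetical, track length is assumed to be the sum of great-circle segments on a sphere of radius 6371 km, and the threshold is assumed to apply at every point along the track.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def keep_track(track, min_vort=0.5e-6, min_hours=48.0, min_km=1000.0):
    """Apply the vorticity, lifetime, and length criteria to one track.

    `track` is a time-ordered list of (hours, lat, lon, vorticity) tuples,
    with vorticity in s^-1 taken from the smoothed 850-hPa field.
    """
    if len(track) < 2:
        return False
    # Positive vorticity centers at or above the threshold (assumed at all points).
    if any(vort < min_vort for _, _, _, vort in track):
        return False
    # Lifetime of at least 2 days.
    if track[-1][0] - track[0][0] < min_hours:
        return False
    # Track length as the sum of consecutive great-circle segments.
    length = sum(
        great_circle_km(a[1], a[2], b[1], b[2])
        for a, b in zip(track[:-1], track[1:])
    )
    return length >= min_km
```

A track density field would then be built only from the tracks for which `keep_track` returns `True`.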

(left) Storm track density and (right) mean strength for ERA-Interim and seven CMIP5 models (CanESM2, CCSM4, GFDL-ESM2M, HadGEM2-ES, MIROC5, MPI-ESM-LR, and MRI-CGCM3) on facing pages. Tracks are based on 6-hourly 850-hPa relative vorticity smoothed to T42 spatial resolution to better capture the synoptic features of the vorticity field.
The multimodel mean track density is in good agreement with ERA-Interim; however, significant differences are seen with the individual models. The most apparent discrepancies are with the BCC-CSM1.1, CanESM2, and CCSM4 models, which strongly overestimate activity across the eastern Pacific and suggest a more longitudinally oriented track (CanESM2 and CCSM4) shifted south from what is observed. BCC-CSM1.1, HadGEM2-ES, and MIROC5 underestimate tracks in the west Atlantic, while GFDL-ESM2M underestimates tracks throughout the region except near 130°W. MPI-ESM-LR also underestimates tracks across the region and shifts their location southward. The track density maximum off the west coast of Mexico is best captured by HadGEM2-ES, while the overall smallest magnitude differences are seen with CNRM-CM5. The multimodel mean track strength maximum in the eastern Pacific lies along the west coast of Mexico similar to ERA-Interim; however, it is broader in scale and of larger magnitude than the observations (Fig. 4, right). On the other hand, the multimodel mean strength in the Gulf of Mexico and western Atlantic along the East Coast of the United States is strongly underestimated compared to ERA-Interim. Unlike for track density, these biases are fairly consistent among the models, with the exception of BCC-CSM1.1, which strongly overestimates mean strength across the region.
To better understand the biases in mean track density and strength, we examine the spatial correlations of 850- and 500-hPa winds and heights, as well as track density and strength with ERA-Interim. While all nine models have relatively good spatial correlations in the wind components and heights at 500 hPa (not shown), there is a wide spread in performance at the 850-hPa level that corresponds reasonably well with the rankings for the combined track density and strength correlations (Table 4). In particular, the top two models for the combined 850-hPa wind and height correlations (CNRM-CM5 and HadGEM2-ES) are also among the highest ranked for the combined track density and strength correlations. On the other hand, CanESM2 has a high ranking in the combined 850-hPa index but is one of the poorer models with respect to track density and spatial correlations, suggesting that there are other important factors contributing to the track statistics than just the large-scale low-level heights and winds across the region.
Spatial correlations of model fields with ERA-Interim for the months indicated and for 1979–2005. Correlations of the 850-hPa wind components and geopotential height have been combined into one index R_ZUV850, while 850-hPa track density and strength correlations have been combined into a second index R_TRK850 to simplify the comparisons. Values in boldface are the upper 25th percentile of the nine models shown.


b. Tropical cyclones in the North Atlantic and eastern North Pacific
It has been well known since the 1970s that climate models are able to simulate tropical cyclone-like storms (e.g., Manabe et al. 1970; Bengtsson et al. 1982), which are generally formed at the scale of the model grid when conditions are unstable enough and other factors, such as vertical wind shear, are favorable. As the resolution of the climate models increases, the modeled storm characteristics become more realistic (e.g., Zhao et al. 2009). Analysis of CMIP3 models showed that the tropical cyclone-like storms produced still had many biases common to low-resolution models (Walsh et al. 2010). Therefore, various dynamical and statistical techniques for downscaling tropical cyclone activity using only the CMIP3 large-scale variables have been employed (Emanuel et al. 2008; Knutson et al. 2008). Recent studies suggest that when forced by observed SSTs and sea ice concentration, a global atmospheric model with a resolution ranging from 50 to 20 km can simulate many aspects of tropical cyclone (TC)–hurricane frequency variability for the past few decades during which reliable observations are available (e.g., Oouchi et al. 2006; Bengtsson et al. 2007; Zhao et al. 2009). The success is not only a direct evaluation of model capability but also an indication of the dominant role of SST variability on TC–hurricane frequency variability. When assuming a persistence of SST anomalies, some of the models were also shown to exhibit significant skill in seasonal hurricane forecasts (e.g., Zhao et al. 2010; Vecchi et al. 2011).
Tropical storms and cyclones in this study are identified using the tracking method of Camargo and Zebiak (2002), which uses low-level vorticity, surface winds, surface pressure, and atmospheric temperature and considers only warm core storms. The method uses model- and resolution-dependent thresholds, and storms have to last at least 2 days. Only a subset of the tropical disturbances examined in the previous section will intensify enough to be identified by this tracking method, and the percentage for which this occurs will vary among different models. As will be shown, the CMIP5 standard models have trouble simulating the number of tropical cyclones, which can be attributed in part to their coarse resolution. Therefore, we also show results from the GFDL high-resolution model.
TC-type structures were tracked in five models for 1950–2005. We compare with observations from best-track datasets of the National Hurricane Center (Fig. 5). The number of TCs in all models is much lower than in observations, which is common to many low-resolution global climate models (e.g., Camargo et al. 2005, 2007). The HadGEM2-ES has the largest low bias, and the MPI-ESM-LR model has the most realistic tracks in the Atlantic basin. The MRI-CGCM3 model tracks in the Atlantic are mostly in the subtropical region, with very few storms in the deep tropics. In contrast, in the eastern North Pacific the MRI-CGCM3 has storm activity too near the equator. In the eastern North Pacific, very few storms (in all models) have westward tracks. The models more readily produce storms that track northwestward, parallel to the Central American coast.

Tracks of tropical cyclone-like storms in the CMIP5 historical runs in the period 1950–2005 [GFDL-ESM2M (1 ensemble member), HadGEM2 (1 ensemble member), MPI-ESM-LR (3 ensemble members), MRI-CGCM3 (5 ensemble members), and MIROC5 (1 ensemble member)] and in observations for the same period. The number of storms in each case is given in the bottom-right corner of each panel. One ensemble member is used for each model.
Figure 6 shows the mean number of TCs per month for the North Atlantic and eastern North Pacific. In some cases, the models produce too many storms in the offseason, while all models produce too few storms in the peak season. The bottom panels show the spread of the number of storms per year, emphasizing the low number of storms per year in all models. The highest-resolution model MRI-CGCM3 (1.1° × 1.1°) has the least bias relative to the observations and the highest bias is for the coarsest-resolution model (GFDL-ESM2M; 2.5° × 2.0°). However, resolution cannot explain the rankings for all models, with the HadGEM2-ES and MPI-ESM-LR models having relatively large and small biases, respectively, despite both having intermediate resolutions. The model dynamical core, convection scheme and their interactions are other factors that have been shown to be important (Camargo 2013). Examination of variability across ensemble members in producing tropical cyclones was carried out for five member runs of the MRI-CGCM3 model (not shown) but was much less than among different models.

Mean number of TCs per month in models [GFDL-ESM2M, HadGEM2-ES (in the figure HGEM2), MPI-ESM-LR, MRI-CGCM3, and MIROC5] and observations in (top left) the North Atlantic and (top right) eastern North Pacific, using only ensemble 1 for MRI-CGCM3. Number of TCs per year in the period 1950–2005 in models and observations for (bottom left) the North Atlantic and (bottom right) the eastern North Pacific. The blue box shows the 25th–75th percentile range, with the median shown as a red line. The whiskers and red crosses show the data outside of middle quartiles.
Figure 7 shows results for the GFDL-C180-HIRAM model, which has a higher resolution (~50 km) than the standard coupled GFDL CM3 model and differs in some aspects of the physics such as the convection scheme. The model was run for a CMIP5 timeslice experiment forced by observed interannually and seasonally varying SSTs and sea ice concentration from HadISST (I. M. Held et al. 2013, unpublished manuscript). The tracking algorithm of Zhao et al. (2009) was used to identify TCs with near-surface wind speed reaching hurricane intensity. The model reproduces the observed statistics: the ratio of observed to model variances of interannual variability in both the North Atlantic and eastern Pacific is not statistically different from one, according to an F test at the 5% significance level that assumes that the annual frequencies are normally distributed. Figure 7 also shows that the model captures the observed seasonal cycle in both the North Atlantic and eastern Pacific. In addition, the model reproduces the observed year-to-year variation of annual hurricane counts and the decadal trend for both basins for this period (Zhao et al. 2009; I. M. Held et al. 2013, unpublished manuscript). The quality of the model's present-day simulation increases our confidence in the future projections, although the uncertainty in the projections is dominated by uncertainty in projected changes in SST boundary conditions across the CMIP5 standard-resolution models (Part III). Although not analyzed here, MIROC4h has a similar spatial resolution (0.56°) to C180-HIRAM. Evaluations by Sakamoto et al. (2012) show that MIROC4h can reproduce the global number of TCs, in part because of realistic SSTs, but severely underestimates the frequency in the North Atlantic, suggesting that higher model resolution is necessary but not sufficient to reproduce observed frequencies.
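As an illustration of the variance-ratio comparison described above, the following is a minimal sketch of an F test on two series of annual hurricane counts. It is not the code used for the analysis. The function names (`f_cdf`, `variance_ratio_test`), the two-sided form of the test, and the numerical integration of the F distribution are assumptions made here so that the example needs only the Python standard library; the counts in the usage lines are invented.

```python
import math
from statistics import variance


def f_cdf(f, d1, d2, steps=4000):
    """CDF of the F distribution with (d1, d2) degrees of freedom.

    Uses P(F <= f) = I_x(d1/2, d2/2) with x = d1*f / (d1*f + d2), where I_x is
    the regularized incomplete beta function, evaluated here with Simpson's
    rule. Assumes d1 >= 3 and d2 >= 3 so the integrand vanishes at 0 and 1.
    """
    if f <= 0:
        return 0.0
    a, b = d1 / 2.0, d2 / 2.0
    x = d1 * f / (d1 * f + d2)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def density(t):
        # Beta(a, b) density; zero at the end points because a > 1 and b > 1.
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

    h = x / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return min(1.0, total * h / 3.0)


def variance_ratio_test(obs_counts, model_counts, alpha=0.05):
    """Two-sided F test of equal variances (assumes normally distributed counts).

    Returns (variance ratio, p value, True if equality is rejected at alpha).
    """
    d1, d2 = len(obs_counts) - 1, len(model_counts) - 1
    ratio = variance(obs_counts) / variance(model_counts)
    cdf = f_cdf(ratio, d1, d2)
    p_value = min(1.0, 2.0 * min(cdf, 1.0 - cdf))
    return ratio, p_value, p_value < alpha


# Invented annual counts, for illustration only.
observed = [5.0, 8.0, 6.0, 11.0, 4.0, 9.0, 7.0, 6.0]
simulated = [6.0, 7.0, 5.0, 10.0, 5.0, 8.0, 9.0, 4.0]
ratio, p_value, rejected = variance_ratio_test(observed, simulated)
```

A result with `rejected` equal to `False` corresponds to a variance ratio that is not statistically different from one at the chosen level.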

(top) Comparison of observed and C180-HIRAM (one realization) simulated hurricane tracks for the North Atlantic and eastern Pacific for 1981–2008. (middle) Comparison of observed and C180-HIRAM simulated annual hurricane count statistics. Blue boxes show the 25th–75th percentile range, with the median shown as a red line and the mean shown as a red star. The whiskers show the maximum and minimum values. The annual statistics are computed based on a 3-member ensemble mean for 1981–2008. (bottom) Observed and model simulated seasonal cycle (number of hurricanes per month) for (left) the North Atlantic and (right) eastern Pacific from the 3-member ensemble mean (1 = January; 12 = December).

5. Interannual to decadal variability
a. ENSO
The El Niño–Southern Oscillation (ENSO) is the most important driver of global climate variability on interannual time scales. It impacts many regions worldwide through climate teleconnections (Ropelewski and Halpert 1987), which link the tropical Pacific to higher latitudes through shifts in midlatitude weather patterns. The impact of ENSO on North American climate is felt most strongly in the wintertime, with El Niño events bringing warmer temperatures to much of the northern part of the continent and wetter conditions in the southern United States and northern Mexico. La Niña events tend to bring drier weather to the southern United States. Evaluation of the ability of CMIP5 models to simulate ENSO is carried out for several aspects of ENSO variability and for teleconnections with North American climate.
1) Evaluation of ENSO teleconnections
We examine how well the historical simulations of CMIP5 models reproduce the composite near-surface air temperature (SAT) and precipitation patterns over North America during El Niño and La Niña episodes. In both model and observed data, we define ENSO episodes similarly to the Climate Prediction Center (CPC). A monthly ENSO index is calculated from detrended and high-pass filtered SSTs over the Niño-3.4 region (5°S–5°N, 170°–120°W) from ERSST.v3b observations and CMIP5 models. An El Niño (La Niña) episode is defined as any sequence of months where the 3-month running mean Niño-3.4 SST is >0.5°C (<−0.5°C) for at least 5 consecutive 3-month running seasons.
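The episode definition above can be sketched as follows. This is an illustrative reading of the stated criterion, not the code used for the analysis: the input is assumed to be a monthly Niño-3.4 anomaly series that has already been detrended and high-pass filtered, the running mean is assumed to be centered, and the function names are hypothetical.

```python
def running_mean_3(values):
    """Centered 3-month running mean; the first and last months are dropped."""
    return [(values[i - 1] + values[i] + values[i + 1]) / 3.0
            for i in range(1, len(values) - 1)]


def find_episodes(nino34, threshold=0.5, min_seasons=5):
    """Find El Nino and La Nina episodes in a monthly Nino-3.4 anomaly series.

    Returns (kind, first, last) tuples, where first and last are inclusive
    indices into the running-mean series (index j is centered on month j + 1
    of the input).
    """
    seasons = running_mean_3(nino34)
    episodes = []
    start, kind = None, None
    # A neutral sentinel value at the end closes any run that is still open.
    for i, value in enumerate(seasons + [0.0]):
        if value > threshold:
            current = "El Nino"
        elif value < -threshold:
            current = "La Nina"
        else:
            current = None
        if current != kind:
            if kind is not None and i - start >= min_seasons:
                episodes.append((kind, start, i - 1))
            start, kind = i, current
    return episodes


# Synthetic series: a warm spell long enough to qualify and a cold spell that is not.
series = [0.0] * 3 + [1.0] * 7 + [0.0] * 3 + [-1.0] * 4 + [0.0] * 3
episodes = find_episodes(series)
```

In this synthetic series only the warm spell satisfies the requirement of at least five consecutive overlapping seasons; the cold spell spans four seasons and is discarded.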
In observations, approximately 90% of El Niño and 89% of La Niña episodes feature peak amplitudes in fall or winter. In the CMIP5 ensemble of the historical simulations, however, only 68% of El Niño and 65% of La Niña episodes have peak amplitudes in fall or winter, although several of the models (CanESM2, CNRM-CM5, HadCM3, and NorESM1-M) do have fall–winter peak frequencies exceeding 80% for both El Niño and La Niña episodes. This finding suggests that CMIP5 models do not fully reproduce the phase locking of ENSO to the seasonal cycle, a deficiency noted in CMIP3 models as well (Guilyardi et al. 2009). The following analysis focuses on those episodes that do peak in fall or winter. In the ensemble mean, the frequency of ENSO episodes and the mean peak amplitude are similar to observed values (not shown).
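The fall–winter peak frequency quoted above can be computed along the following lines. This is a sketch under stated assumptions: "fall or winter" is taken here to mean September through February, the peak of an episode is taken as the month of largest absolute index value, and the function name and input layout are hypothetical.

```python
FALL_WINTER = {9, 10, 11, 12, 1, 2}  # assumed definition: SON plus DJF


def fall_winter_peak_fraction(index, months, episodes):
    """Fraction of episodes whose peak amplitude falls in fall or winter.

    index:    monthly ENSO index values
    months:   calendar month (1-12) of each index value
    episodes: (first, last) inclusive position pairs into index
    """
    hits = 0
    for first, last in episodes:
        # Position of the largest absolute anomaly within the episode.
        peak = max(range(first, last + 1), key=lambda i: abs(index[i]))
        if months[peak] in FALL_WINTER:
            hits += 1
    return hits / len(episodes)


# Synthetic three-year series with one summer-peaking and one winter-peaking episode.
months = [(i % 12) + 1 for i in range(36)]
index = [0.0] * 36
index[3:9] = [0.6, 0.8, 1.4, 1.0, 0.8, 0.6]                 # peaks in June
index[20:27] = [-0.6, -0.9, -1.2, -1.6, -1.1, -0.8, -0.6]   # peaks in December
fraction = fall_winter_peak_fraction(index, months, [(3, 8), (20, 26)])
```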
Because the dynamics of extratropical ENSO teleconnections are tied to upper-tropospheric processes and because these teleconnections are strongest during boreal winter, we examine how well CMIP5 models reproduce the December–February (DJF) composite 300-hPa geopotential height patterns in the NCEP–NCAR reanalysis. In addition, we attempt to identify what characteristics distinguish higher from lower performance models, where performance is based on the El Niño (La Niña) composites of all height fields for which the detrended Niño-3.4 SST anomaly is >0.5°C (<−0.5°C). The high performance models are defined as those with a pattern correlation that exceeds 0.6 and an RMS difference less than 13 m between the model and observed composites for both El Niño and La Niña (Fig. 8). This subjective partitioning is used as a means of discerning general properties that distinguish higher from lower performance models. Overall, 10 (11) models are characterized as high (low) performance based on these criteria.
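The partitioning criteria above can be illustrated with a short sketch. It assumes that each composite has been flattened to a list of grid-point values on a common grid, uses an unweighted, centered pattern correlation and an uncentered RMS difference, and omits any area weighting; these choices and the function names are assumptions for illustration and may differ from the calculation behind Fig. 8.

```python
import math


def pattern_stats(model, obs):
    """Centered pattern correlation and RMS difference of two flattened fields."""
    n = len(obs)
    m_mean, o_mean = sum(model) / n, sum(obs) / n
    cov = sum((m - m_mean) * (o - o_mean) for m, o in zip(model, obs))
    m_var = sum((m - m_mean) ** 2 for m in model)
    o_var = sum((o - o_mean) ** 2 for o in obs)
    corr = cov / math.sqrt(m_var * o_var)
    rms = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / n)
    return corr, rms


def is_high_performance(el_nino, la_nina, min_corr=0.6, max_rms=13.0):
    """Apply both thresholds to both composites.

    el_nino and la_nina are (model_field, obs_field) pairs in meters; a model
    qualifies only if each pair has correlation > min_corr and RMS < max_rms.
    """
    stats = (pattern_stats(*el_nino), pattern_stats(*la_nina))
    return all(c > min_corr and r < max_rms for c, r in stats)


# Invented height anomalies (m) at four grid points.
obs = [10.0, -20.0, 30.0, -5.0]
good = [12.0, -18.0, 28.0, -7.0]
poor = [-10.0, 20.0, -30.0, 5.0]
flip = lambda field: [-v for v in field]
good_flag = is_high_performance((good, obs), (flip(good), flip(obs)))
poor_flag = is_high_performance((poor, obs), (flip(poor), flip(obs)))
```

Requiring both composites to pass means that a model with a realistic El Niño pattern but a poor La Niña pattern is still placed in the lower performance group.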

Taylor diagrams for (a) El Niño and (b) La Niña composite 300-hPa geopotential patterns over the region from East Asia to North America. Higher performance [pattern correlation > 0.6 and RMS difference < 13 m in both (a) and (b)] models are indicated in red, whereas lower performance models are indicated in blue. In (a) HadCM3, which falls outside of the plot, has a pattern correlation of −0.3 and RMS difference of 17.6 m. The points labeled ens in red, blue, and green represent the higher performance, lower performance, and total ensemble, respectively. The composites are normalized by the Niño-3.4 SST amplitude to focus on pattern differences independent of ENSO amplitude differences. The observational reference is based on the NCEP–NCAR reanalysis for 1950–2010, whereas the CMIP5 calculations are based on the full historical period (1850–2005) for one run of each model.

Figure 9 shows the composites of 300-hPa geopotential height, SAT, precipitation, and tropical SST for El Niño. The corresponding composites for La Niña (not shown) are quite similar but of opposite sign. The higher performance ensemble performs rather well in capturing the basic El Niño geopotential height, SAT, and precipitation teleconnections over the North Pacific and North America, except that it fails to capture the negative precipitation anomaly in the Tennessee and Ohio valleys. The lower performance ensemble features a much weaker teleconnection pattern and an Aleutian low anomaly that is shifted about 10° too far west. The composite El Niño SST anomalies (Figs. 9k,l), however, are quite similar.
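The figure captions describe the height, SAT, and precipitation composites as normalized by the Niño-3.4 SST anomaly. One plausible reading of that normalization, sketched below, divides the episode-mean anomaly field by the episode-mean Niño-3.4 anomaly, so that patterns can be compared independently of ENSO amplitude. The exact procedure is not spelled out in the text, so this form, the function name, and the input layout are assumptions.

```python
def normalized_composite(fields, nino34):
    """Episode-mean anomaly field per degree of Nino-3.4 anomaly.

    fields: list of DJF anomaly maps, each a flat list of grid-point values,
            one map per episode winter
    nino34: matching DJF Nino-3.4 SST anomalies (degC), one per episode winter
    """
    n = len(fields)
    mean_field = [sum(f[k] for f in fields) / n for k in range(len(fields[0]))]
    mean_sst = sum(nino34) / n
    return [value / mean_sst for value in mean_field]


# Invented two-point height anomalies (m) for two El Nino winters.
composite = normalized_composite([[10.0, -20.0], [30.0, -40.0]], [1.0, 3.0])
```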

Composites of (a)–(c) 300-hPa height (z300; m), (d)–(f) SAT (°C), (g)–(i) precipitation (mm day⁻¹), and (j)–(l) SST (°C) anomalies during DJF El Niño episodes in (left) observations and in (center) high and (right) low performance CMIP5 ensembles described in Fig. 8. The observational SAT and precipitation composites are based on the CRU TS3.1 land near-surface temperature and precipitation datasets for 1901–2009. The z300, SAT, and precipitation composites are normalized by the Niño-3.4 SST anomaly. Stippling in the observed (a) z300, (d) SAT, and (g) precipitation composites indicates anomalies that are statistically significant at the 5% level.