1. Introduction
This is the first part of a three-part paper on phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012) model simulations for North America. The first two papers evaluate the ability of the CMIP5 models to replicate the observed features of North American continental and regional climate and related climate processes for the recent past. This first part evaluates the models in terms of continental and regional climatology, and Sheffield et al. (2013, hereafter Part II) evaluates intraseasonal to decadal variability. Maloney et al. (2013, manuscript submitted to J. Climate, hereafter Part III) describes the projected changes for the twenty-first century.
The CMIP5 provides an unprecedented collection of climate model output data for the assessment of future climate projections as well as evaluations of climate models for contemporary climate, the attribution of observed climate change, and improved understanding of climate processes and feedbacks. As such, these data feed into the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5) and other global, regional, and national assessments. The goal of this study is to provide a broad evaluation of CMIP5 models in their depiction of North American climate and associated processes. The set of climate features and processes examined in this first part were chosen to cover the climatology of basic surface climate and hydrological variables and their extremes at daily to seasonal time scales, as well as selected climate features that have regional importance. Part II covers aspects of climate variability, such as intraseasonal variability in the tropical Pacific, the El Niño–Southern Oscillation (ENSO), and the Atlantic multidecadal oscillation, which play major roles in driving North American climate variability. This study draws from individual work by investigators within the CMIP5 Task Force of the National Oceanic and Atmospheric Administration (NOAA) Modeling Analysis and Prediction Program (MAPP). This paper is part of a Journal of Climate special collection on North America in CMIP5 models, and we draw from individual papers within the special collection, which provide detailed analysis of some of the climate features examined here.
We begin in section 2 by describing the CMIP5, providing an overview of the models analyzed, the historical simulations, and the general methodology for evaluating the models. We focus on a core set of 17 CMIP5 models that represent a large set of climate centers and model types and synthesize model performance across all analyses for this core set. Details of the observational datasets to which the climate models are compared are also given in this section. The next two sections focus on different aspects of North American climate and surface processes. Section 3 begins with an overview of climate model depictions of continental climate, including seasonal precipitation, air temperature, sea surface temperatures, and atmospheric and surface water budgets. Section 4 evaluates the model simulations of extremes of temperature and surface hydrology and temperature-based biophysical indicators such as growing season length. Section 5 focuses on regional climate features such as North Atlantic winter storms, the Great Plains low-level jet, and Arctic sea ice. The results are synthesized in section 6 and compared to results from CMIP3 models for selected variables.
2. CMIP5 models and simulations
a. CMIP5 models
We use data from multiple model simulations of the “historical” scenario from the CMIP5 database. The scenarios are described in more detail below. The CMIP5 experiments were carried out by 20 modeling groups representing more than 50 climate models, with the aim of furthering understanding of past and future climate change in key areas of uncertainty (Taylor et al. 2012). In particular, the experiments focus on understanding model differences in cloud and carbon feedbacks, quantifying decadal climate predictability, and determining why models give different answers when driven by the same forcings. The CMIP5 builds on the previous phase (CMIP3) of experiments in several ways. First, a greater number of modeling centers and models have participated. Second, the models are generally run at higher spatial resolution, and some are more comprehensive in terms of the processes they represent, which should improve their skill in representing current climate conditions and reduce uncertainty in future projections. Table 1 provides an overview of the models used.
CMIP5 models evaluated and their attributes. Model types are atmosphere–ocean coupled (AO), ocean–atmosphere–chemistry coupled (ChemOA), Earth system model (ESM), and Earth system model with coupled chemistry (ChemESM).
To provide a consistent evaluation across the various analyses, we focus on a core set of 17 models, which are highlighted in the table by asterisks. The core set was chosen to span a diverse set of modeling centers and model types [coupled atmosphere–ocean models (AOGCMs), Earth system models (ESMs), and models with atmospheric chemistry (ChemAO and ChemESM)] and includes an AOGCM and an ESM from the same modeling center for three centers [Geophysical Fluid Dynamics Laboratory (GFDL), Hadley Centre, and Atmosphere and Ocean Research Institute (AORI)/National Institute for Environmental Studies (NIES)/Japan Agency for Marine-Earth Science and Technology (JAMSTEC)]. The set was restricted by data availability and processing constraints, so for some analyses (in particular those requiring high-temporal-resolution data) a smaller subset of the core models was analyzed. When data for noncore models were available, these were also evaluated for some analyses, and the results are highlighted if they showed better (or particularly poor) performance. The specific models used for each individual analysis are provided within the results section where appropriate.
b. Overview of methods
Data from the historical CMIP5 scenarios are evaluated in this study. The historical simulations are run in coupled atmosphere–ocean mode, forced by historical estimates of changes in atmospheric composition from natural and anthropogenic sources, volcanoes, greenhouse gases (GHGs), and aerosols, as well as changes in solar output and land cover. Note that only anthropogenic GHGs and aerosols are prescribed as common forcings for all models; each model differs in the set of other forcings it uses, such as land-use change. For ESMs, the carbon cycle and natural aerosols are modeled and are therefore feedbacks, and we take this into consideration when discussing the results. For certain basic climate variables we also analyze model simulations from the CMIP3, which provided the underlying climate model data for the IPCC Fourth Assessment Report (AR4). Several models have contributed to both the CMIP3 and CMIP5 experiments, either with the same version of the model or with a newer version, and this allows a direct evaluation of changes in skill for individual models as well as for the model ensemble.
Historical scenario simulations were carried out for the period from the start of the industrial revolution to near the present: 1850–2005. Our evaluations are generally carried out for the most recent 30 yr, depending on the type of analysis and the availability of observations. For some analyses the only—or best available—data are from satellite remote sensing, which restricts the analysis to the satellite period, which is generally from 1979 onward. For other analyses, multiple observational datasets are used to represent the uncertainty in the observations. An overview of the observational datasets used in the evaluations is given in Table 2, categorized by variable. Further details of these datasets and any data processing are given in the relevant subsections and figure captions. Where the comparisons go beyond 2005 (e.g., 1979–2008), model data from the representative concentration pathway 8.5 (RCP8.5) future projection scenario simulation are appended to the model historical time series. Most of the models have multiple ensemble members and in general we use the first ensemble member. In some cases, the results for multiple ensembles are averaged where appropriate or used to assess the variability across ensemble members. Results are generally shown for the multimodel ensemble (MME) mean and for the individual models using performance metrics that quantify the errors relative to the observations.
Observational and reanalysis datasets used in the evaluations.
3. Continental seasonal climate
We begin by evaluating the seasonal climatologies of basic climate variables: precipitation, near-surface air temperature, sea surface temperature (SST), and atmosphere–land water budgets.
a. Seasonal precipitation climatology
Figure 1 shows the model precipitation climatology and Global Precipitation Climatology Project (GPCP; Adler et al. 2003) observations for December–February (DJF) and June–August (JJA) for 1979–2005. Table 3 shows the seasonal biases in precipitation for North America, the United States, and six regions. Most of the models do reasonably well in reproducing the essential large-scale precipitation features, and the bias in the MME mean seasonal precipitation over North America is about 12% and −1% for DJF and JJA, respectively. However, there are substantial differences among the models and with observations at the regional scale (Table 3), with precipitation generally overestimated in more humid and cooler regions and underestimated in drier regions. For the winter season (Fig. 1, left), the Pacific storm track is reasonably well placed in latitude as it approaches the coast. One important aspect of this, the angle of the storm track as it bends northward approaching the coast from roughly Hawaii to central California, is well reproduced in the models. The intensity of the storm tracks off the West Coast compares reasonably well to the GPCP product shown here. The model rainfall is not quite intense enough at the coast and spreads slightly too far inland, as might be expected for the typical model resolution, which does not fully resolve the mountain ranges; this may help explain the overestimation by all models for WNA (see Table 3 for region definitions). The East Coast storm tracks are well placed in DJF (see section 5a on wintertime extratropical cyclones), and the multimodel ensemble mean does a good job of replicating the eastern Pacific intertropical convergence zone (ITCZ), although northern Mexico receives too much rainfall. Figure 1c provides a model-by-model view of these features, using the 3 mm day−1 contour for each model to outline the major precipitation features.
If the models were in line with observations, all contours would lie exactly along the boundary of the shaded observations. Aside from the high-latitude precipitation excess in the Pacific storm track, individual models do quite well at reproducing each of the main features of the DJF climatology, including the arrival point at the North American west coast of the southern edge of the Pacific storm track. Only a few models exhibit the ITCZ extension feature that accounts for the northern Mexico precipitation excess.
DJF and JJA bias (% of observed mean) in CMIP5 continental and regional precipitation relative to the GPCP observations. The mean and standard deviation of the biases across the multimodel ensemble are also given. NA is 10°–72°N, 190°–305°E; CONUS is 25°–50°N, 235°–285°E; and the regions defined in the table are ALA, NEC, ENA, CNA, WNA, and CAM, as modified from Giorgi and Francisco (2000) and shown in supplementary Fig. S3.
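As an illustration, the percentage bias metric reported in Table 3 can be sketched as follows. The function name and the regional-mean values are hypothetical; a full calculation would use area-weighted means over each region's grid cells.

```python
from statistics import mean

def percent_bias(model_clim, obs_clim):
    """Seasonal bias as a percentage of the observed regional mean:
    100 * (model - obs) / obs, computed from regional-mean values."""
    m, o = mean(model_clim), mean(obs_clim)
    return 100.0 * (m - o) / o

# Hypothetical regional-mean DJF precipitation (mm/day): model vs. GPCP
model_djf = [2.4, 2.6, 2.2]
obs_djf = [2.1, 2.3, 2.0]
print(round(percent_bias(model_djf, obs_djf), 1))  # a wet bias, in percent
```

A positive value indicates the model is too wet relative to the observations, matching the sign convention of Table 3.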
For the summer season (JJA; Fig. 1, right), the ITCZ and the Mexican monsoon are reasonably well simulated in terms of position (see section 5d on the North American monsoon), although the precipitation magnitude in parts of the Caribbean is underestimated relative to GPCP. The East Coast storm track in the multimodel ensemble mean is too spread out and less coherent than observed. This is due to substantial differences in the placement of these storm tracks in the individual models (Fig. 1d). The majority of the models exhibit excessive precipitation in at least some part of the continental interior. While the bulk of the models do reasonably well at the poleward extension of the monsoon over Central America, Mexico, and the inter-American seas region, a few models underestimate this extent, putting a split between the poleward extension of the monsoon feature and the start of the East Coast storm track. Overall, the models underestimate JJA precipitation over the Central America (including Mexico) and central North America regions (Table 3).
b. Seasonal surface air temperature climatology
Figure 2 compares the model-simulated surface air temperature climatology to the observational estimates from the National Centers for Environmental Prediction (NCEP)–U.S. Department of Energy (DOE) Reanalysis 2 and the Climatic Research Unit (CRU) TS3.0 station-based analysis. The MME mean compares well to the observations in most respects. Differences from both observational estimates are less than or on the order of 1°C over most of the continent except for certain regions (see Table 4). The multimodel ensemble mean is cooler than both datasets over northern Mexico in DJF. In high latitudes, differences between the observational estimates are large enough that the error patterns in Figs. 2f and 2g differ substantially, especially in DJF [NCEP–DOE is also slightly warmer than the North American Regional Reanalysis (not shown) in this region and season]. Beyond the overall simulation of the north–south temperature gradient and seasonal evolution, certain regional features are well represented. In JJA, this includes the regions of temperatures exceeding 30°C over Texas and near the Gulf of California and the extent of temperatures above 10°C, including the northward extension of this region into the Canadian prairies. Individual model surface air temperature climatologies, shown in the supplementary material (Figs. S1 and S2) and as biases in Table 4, exhibit substantial regional scatter, including an excessive northward extent of the region above 30°C through the Great Plains in three of the models (CanESM2, CSIRO Mk3.6.0, and FGOALS-s2; see Table 1 for expanded model names). In DJF, the multimodel ensemble mean does a good job of representing the 0°C contour, while the 10°C contour extends slightly too far south, yielding slightly cool temperatures over Mexico, with 15 of the 18 models showing cold biases over the broader CAM region (Table 4).
The wintertime cold bias relative to both observational estimates in very high latitudes is more pronounced in certain models such as HadGEM2-ES, which has biases of −7.0° and −5.1°C over the ALA and NEC regions, respectively (Table 4). The intermodel scatter in surface temperature simulations is summarized in Fig. 2d for DJF and Fig. 2j for JJA using the intermodel standard deviation of the ensemble (i.e., the standard deviation at each grid point among the 18 model climatologies seen in Figs. S1 and S2). For DJF, the intermodel standard deviation is less than 2.5°C through most of the contiguous United States but increases toward high latitudes, exceeding 3.5°C over much of the area north of 60°N. In JJA, there is a region of high intermodel standard deviation, exceeding 3.5°C, roughly in the Great Plains region in the northern United States and southern Canada. This is a region with fairly high precipitation uncertainty in JJA (Fig. 1f), and changes in surface temperature in this region have been linked to factors affecting soil moisture, including preseason snowmelt (e.g., Hall et al. 2008), so this may be a suitable target for further study to reduce model uncertainty.
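The intermodel spread measure shown in Figs. 2d and 2j can be sketched as below. The tiny four-point grid and the temperature values are hypothetical stand-ins for the full model climatology fields.

```python
from statistics import stdev

def intermodel_std(model_fields):
    """Intermodel standard deviation at each grid point: the sample
    standard deviation across the model climatologies at that point."""
    n_points = len(model_fields[0])
    return [stdev(m[i] for m in model_fields) for i in range(n_points)]

# Hypothetical DJF temperature climatologies (deg C) from three models
# on a four-point grid; real fields would span the full North American domain.
fields = [
    [-10.0, 0.0, 15.0, 25.0],
    [-14.0, 1.0, 16.0, 24.0],
    [-6.0, -1.0, 14.0, 26.0],
]
print(intermodel_std(fields))  # largest spread at the cold high-latitude point
```

Mapping this quantity over the domain highlights where the models disagree most, such as the high latitudes in DJF noted above.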
DJF and JJA bias in CMIP5 continental and regional near-surface air temperature (°C) relative to the CRU TS3.0 observations. The mean and standard deviation of the biases across the multimodel ensemble are also given. NA is 10°–72°N, 190°–305°E; CONUS is 25°–50°N, 235°–285°E; and the regions are ALA, NEC, ENA, CNA, WNA, and CAM, as modified from Giorgi and Francisco (2000) and shown in supplementary Fig. S3.
c. Seasonal sea surface temperature
The annual cycle of sea surface temperature (SST) is shown in Fig. 3 as winter-to-spring (December–May) and summer-to-fall (June–November) means. We also show precipitation over land, which is generally associated with SST variations in adjoining ocean regions. Maps for individual models are shown in supplementary Figs. S4 and S5. The Western Hemisphere warm pool (WHWP), where temperatures equal or exceed 28.5°C, is usually absent from December to February; it appears in the Pacific from March to May and is present in the Caribbean and Gulf of Mexico from June to November (Wang and Enfield 2001). The cooler part of the year is characterized by a small extent of SSTs in excess of 27°C and a suggestion of a cold tongue in the eastern equatorial Pacific, while during the warmer part of the year the extent of SSTs in excess of 27°C is at its maximum and the cold tongue is well defined over the eastern Pacific. High precipitation along the Mexican coasts, Central America, the Caribbean islands, and the central–eastern United States is associated with the warm tropical SSTs during the warm half of the year. A decrease in the regional precipitation south of the equator is also evident in this warm half of the year.
The MME mean shows the observed change in SST from cold to warm around the WHWP region; however, the warm pool is absent over the Caribbean and Gulf of Mexico region. The change in precipitation from the cold to the warm parts of the year is represented by the MME mean, including the increase in precipitation over the central United States and Mexico as well as the decrease south of the equator. The eastern Pacific in the models is slightly cooler than observations in the cold part of the year, although not in the form of a weak cold tongue extending from the Peruvian coast but rather as a confined equatorial cold zone away from the coast. The cold tongue along the eastern equatorial Pacific and along the coast of Peru during the warmer part of the year is reasonably represented by the MME mean, although it extends too far to the west. Differences between the multimodel mean and observations indicate cool SST biases in the WHWP over the Pacific and intra-American seas in all models in both the cold and warm parts of the year. Warm biases are evident close to the coasts of the northeastern United States, western Mexico, and Peru. Precipitation biases indicate a wet/dry bias to the west/east of ∼97°W northward of 15°N over Mexico and the United States during both parts of the year (as well as an intense and extensive dry bias over South America to the east of the Andes); the cold bias over the intra-American seas and the dry bias over the Great Plains in the United States suggest a link between the two, since the former is a major source of moisture for the latter.
A set of error statistics for the mean annual SSTs is summarized in Table 5 for the individual CMIP5 models and the MME mean. The spatial correlations are >0.9 for all models and so cannot quantitatively distinguish the performance of the models. The MME mean maximizes the spatial correlation (0.97) and minimizes the RMSE (0.77°C) but not the bias (−0.54°C). Eight of the models have RMSE values of less than 1°C, and the largest biases (>1.3°C) are for CSIRO Mk3.6.0, HadCM3, INM-CM4.0, IPSL-CM5A-LR, and MIROC-ESM. The biases, except for INM-CM4.0 and CCSM4, are negative, with the smallest bias for INM-CM4.0 and the largest for CSIRO Mk3.6.0.
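A minimal sketch of the three error statistics in Table 5 (bias, RMSE, and spatial correlation) is given below. The grid-point SST values are hypothetical, and area weighting over the evaluation domain is omitted for brevity.

```python
from math import sqrt

def error_stats(model, obs):
    """Domain-mean bias, RMSE, and spatial (Pearson) correlation between
    a model climatology and observations, over flattened grid-point lists."""
    n = len(obs)
    bias = sum(m - o for m, o in zip(model, obs)) / n
    rmse = sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / n)
    mbar, obar = sum(model) / n, sum(obs) / n
    cov = sum((m - mbar) * (o - obar) for m, o in zip(model, obs))
    var_m = sum((m - mbar) ** 2 for m in model)
    var_o = sum((o - obar) ** 2 for o in obs)
    corr = cov / sqrt(var_m * var_o)
    return bias, rmse, corr

# Hypothetical annual-mean SSTs (deg C) at a handful of grid points
obs = [26.0, 27.5, 28.5, 24.0, 22.0]
mod = [25.5, 27.0, 27.8, 23.6, 21.8]
b, r, c = error_stats(mod, obs)
print(round(b, 2), round(r, 2), round(c, 3))
```

Note that, as in the table, a high spatial correlation can coexist with a systematic cool bias, which is why all three statistics are reported.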
CMIP5 error statistics for annual average SSTs. The mean and standard deviation of the statistics across the multimodel ensemble are given, as well as the statistics of the MME mean SSTs. Statistics are calculated over neighboring oceans to North America (170°–35°W, 10°S–40°N; domain displayed in Fig. 3) for average annual values for 1979–2004.
d. Seasonal atmospheric and land water budgets
We next evaluate the climatologies of the atmospheric and land water budgets. Seasonal changes in atmospheric water content are relatively small compared to the moisture fluxes, and so we focus on the latter. Variations in moisture divergence are generally correlated with seasonal precipitation and so may help explain biases in model precipitation. The vertically integrated moisture transport (vectors) and its divergence (contours) are shown in Fig. 4 for five CMIP5 models (the number of models was limited by the availability of the high-temporal-resolution model data required to calculate the moisture fluxes) and for observational estimates from the Twentieth-Century Reanalysis (20CR), for mean JJA and DJF for 1981–2000. In summer, the 20CR shows southerly transport from the North Atlantic anticyclone that splits into two distinct branches: one flanking the Atlantic seaboard, with large-scale convergence off the East Coast, and a second flowing into the interior central plains, with associated convergence over the Rocky Mountains. The western United States is dominated by divergence associated with the northerly component of the North Pacific anticyclone. The five models show the two branches of moisture transport, with associated convergence off the East Coast and divergence in the plains, albeit weaker. They also simulate the divergence over much of the west, but they do not simulate the strong convergence over the Rockies and Mexican Plateau seen in 20CR, which is associated with the low bias in precipitation over these regions (Table 3; mean biases for the five models shown here are −19.8% and −31.5% for the CNA and CAM regions, respectively). Spatial correlations for divergence in the North American region range from 0.08 to 0.42, with MIROC5 and CNRM-CM5 performing best of the five models according to this measure (Table 6).
In winter, the 20CR shows a more zonal transport than during summer, with weaker flow around the subtropical anticyclones and moisture convergence across much of the continent. The models represent both the moisture transport and divergence patterns well, including the stronger convergence in the Pacific Northwest and Northern California and the divergence in Southern California, although the magnitude of the divergence is too strong along the coasts, most notably for the CCSM4 and CNRM-CM5 models, and precipitation over western North America is overestimated by all five models examined here (WNA and ALA regions; Table 3), especially by the CCSM4. The improvement in winter over summer across the whole domain is evident in the spatial correlations, which range between 0.60 and 0.76 for winter, with a different set of models performing better than in summer (CanESM2, CCSM4, and CNRM-CM5; Table 6).
Spatial correlations between simulated and observed estimates of column integrated moisture divergence for summer (JJA) and winter (DJF) seasons for the North American region. The CMIP5 model data were regridded to the 20CR grid (∼200 km) for this calculation. The mean and standard deviation of the correlations across the multimodel ensemble are also given.
Evaluations of the model-simulated terrestrial water budget against the offline land surface model (LSM) simulations are shown in Figs. 5 and 6. Figure 5 shows the regional mean seasonal cycles of the components of the land surface water budget (precipitation, evapotranspiration, runoff, and change in water storage). In reality, water storage includes soil moisture; surface water such as lakes, reservoirs, and wetlands; groundwater; and snowpack, but in general the climate models simulate only the soil moisture and snowpack components. Figure 5 also separates out the snow component of the water budget in terms of the snow water equivalent (SWE). Most models have a reasonable seasonal cycle of precipitation and evapotranspiration but tend to overestimate precipitation in the more humid and cooler regions (WNA, ENA, ALA, and NEC), as noted previously, and overestimate evapotranspiration throughout the year, especially in the cooler months. Runoff is generally underestimated, particularly in the CNA and ENA regions and in NEC and CAM. Runoff also peaks earlier in the spring in some models (which can be linked to a shortened snow season; see below), although the models generally replicate the spatial variability in annual total runoff (Fig. 6 and Fig. S6 in the supplementary material). The majority of models overestimate total runoff over dry regions and high latitudes, particularly for the Pacific Northwest and Newfoundland. SWE is generally overestimated by the multimodel ensemble for western North America, underestimated in the east, and overestimated in the Alaska and western Canada region, which reflects the precipitation biases. These biases are also reflected in the change in storage, particularly for the Alaska region, where many of the models show a large negative change during late spring melt due to the overestimation of SWE.
Figure 6 (Fig. S6 for individual models) also shows the runoff ratio (runoff divided by precipitation) over North America, which indicates the fraction of precipitation that becomes runoff at the land surface and is subsequently potentially available as water resources. The remaining precipitation is partitioned into evapotranspiration (assuming that storage does not change much over long time periods). Overall, the MME mean replicates the spatial pattern of the observational estimate, with higher ratios in humid and cooler regions and lower ratios in dry regions. However, the MME mean tends to underestimate the ratios, especially for central North America and Central America, but overestimates the ratio in Alaska (Table 7). For North America overall, the models tend to underestimate the ratios. The biases in runoff are better explained by biases in runoff ratios than by biases in precipitation (not shown), especially at higher latitudes, highlighting the importance of the land surface schemes in the climate models and whether they realistically partition precipitation into runoff and evapotranspiration and accumulate and melt snow.
Bias (model minus observational estimates) in annual runoff ratio (total runoff/precipitation) averaged over 1979–2004 for the North American continent, the contiguous United States, and the six regions defined in Table 3. The mean and standard deviation of the statistics across the multimodel ensemble are also given. Observed runoff is estimated from VIC and GLDAS2 Noah; precipitation is from GPCP. All data were interpolated to 2.5° resolution.
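The runoff ratio bias reported in this table can be sketched as below, assuming long-term regional-mean totals; the function name and numerical values are hypothetical.

```python
def runoff_ratio_bias(model_runoff, model_precip, obs_runoff, obs_precip):
    """Bias in annual runoff ratio (total runoff / precipitation),
    model minus observational estimate, from long-term regional means."""
    return model_runoff / model_precip - obs_runoff / obs_precip

# Hypothetical 1979-2004 regional means (mm/yr): the model produces too
# little runoff per unit precipitation relative to the observed estimate.
print(round(runoff_ratio_bias(180.0, 700.0, 240.0, 650.0), 3))
```

A negative value indicates that the model converts too small a fraction of precipitation into runoff, consistent with the underestimation noted for central North America.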
4. Continental extremes and biophysical indicators
This section examines the performance of the models in representing observed temperature and hydrological extremes. We first focus on temperature extremes and temperature-dependent biophysical indicators and then on persistent seasonal hydrological extremes in precipitation and soil moisture. Regional extremes in temperature and precipitation are evaluated in section 5.
a. Temperature extremes and biophysical indicators
Temperature extremes have important consequences for many sectors, including human health, ecosystem function, and agricultural production. We evaluate the models' ability to replicate the observed spatial distribution over North America of the frequency of extremes (Fig. 7), namely the number of summer days with maximum temperature (Tmax) >25°C and the number of frost days with minimum temperature (Tmin) <0°C (Frich et al. 2002), as well as a set of temperature-related biophysical indicators: spring and fall freeze dates and growing season length. We define the growing season length following Schwartz and Reiter (2000) as the number of days between the last spring freeze of the year and the first hard freeze of the autumn in the same year. A hard freeze is defined as a day on which the minimum temperature drops below −2°C.
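These daily-threshold indicators are straightforward to compute from daily model output. The sketch below, with a hypothetical Tmin series and a simple midyear split between spring and autumn, illustrates one possible implementation; the function names and the splitting convention are assumptions of this sketch.

```python
def summer_days(tmax):
    """Number of days with Tmax > 25 deg C (Frich et al. 2002)."""
    return sum(1 for t in tmax if t > 25.0)

def frost_days(tmin):
    """Number of days with Tmin < 0 deg C."""
    return sum(1 for t in tmin if t < 0.0)

def growing_season_length(tmin, threshold=-2.0, midyear=182):
    """Days between the last spring hard freeze and the first autumn hard
    freeze (Tmin < threshold), following Schwartz and Reiter (2000).
    Splitting the year at day `midyear` is a simplification of this sketch."""
    spring = [d for d in range(midyear) if tmin[d] < threshold]
    autumn = [d for d in range(midyear, len(tmin)) if tmin[d] < threshold]
    last_spring = spring[-1] if spring else 0
    first_autumn = autumn[0] if autumn else len(tmin) - 1
    return first_autumn - last_spring

# Hypothetical daily Tmin series: hard freezes through day 100 and again
# from day 290 onward give a 190-day growing season.
tmin_series = [-5.0] * 101 + [5.0] * 189 + [-5.0] * 75
print(growing_season_length(tmin_series))  # prints 190
```

In practice these counts would be computed per grid cell and year on each model's native grid before interpolation for comparison with the observations.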
Overall, the models tend to underestimate the number of summer days by about 18 days over North America (Table 8), with regional underestimations of over 50 days in the western United States and Mexico and parts of the eastern United States, but otherwise they are within 20 days of the observations for most other regions. Several models (CanESM2, CCSM4, CNRM-CM5, MIROC, and MIROC-ESM) overestimate the number of summer days from the northeastern United States up to the Canadian northern territories but tend to have smaller underestimations in the western United States and Mexico (see supplementary Fig. S7). Nearly all other models have low biases of up to 50 days in these drier regions, which, at least for the western United States, may be related to the overestimation of precipitation and evapotranspiration (as shown in section 3d) and thus a reduction in sensible heating of the atmosphere. Several models have small biases for North America as a whole (Table 8), but often only because large regional biases cancel out; only the BCC-CSM1.1, CSIRO Mk3.6.0, and HadGEM2-ES models have reasonably low biases (<30 days) across all regions. The first two of these models also have relatively low runoff ratio biases for the WNA and central North America (CNA) regions (HadGEM2-ES was not evaluated for surface hydrology), suggesting that their simulation of warm summer days is not impeded by biases in the surface energy budget. The number of frost days is better simulated in terms of overall MME mean bias (−2.8 days), but there is a positive bias across the Canadian Rockies and down into the U.S. Rockies for most models, suggesting that the models' generally coarse resolution and the resulting differences in topographic heights are partly responsible (see supplementary Fig. S8). Some of the models are biased low in the central United States by over 50 days.
Models with the least bias in frost days also tend to be the least biased models for summer days, but again many of the regional biases cancel out for the North America values.
Bias and spatial correlation between the HadGHCND observations and the CMIP5 ensemble for number of summer days, number of frost days, and growing season length averaged over 1979–2005. The mean and standard deviation of the statistics across the multimodel ensemble are given, as well as the statistics of the MME mean. The frequencies were calculated on the model grid and then interpolated to 2.0° resolution for comparison with the observational estimates.
The models do reasonably well at depicting the spatial distribution of growing season length (MME mean bias = −8.5 days over North America). The largest biases, of 30–50 days, are in western Canada, where the models underestimate the length, and in the central United States, where they overestimate it. The former occurs mainly because the simulated last spring freeze is too late in western Canada; the latter results from biases in both the last spring freeze (too early) and the first autumn freeze (too late). The INM-CM4.0 model has the largest bias overall (−76 days), which is consistent over most of the continent (see Fig. S9). The MIROC5 and MIROC-ESM models have the largest overestimations, of 33 and 38 days, respectively, and these biases are also consistent over much of the continent.
b. Hydroclimate extremes
We examine the ability of the CMIP5 models to simulate persistent drought and wet spells in terms of precipitation and soil moisture (SM). We focus on the United States because of the availability of long-term estimates of SM from the University of Washington North American Land Data Assimilation System (NLDAS-UW) dataset. Meteorological drought and wet spells are characterized by the 6-month standardized precipitation index (SPI6; McKee et al. 1993). Agricultural drought and wet spells are evaluated in terms of soil moisture percentiles (Mo 2008). The record length Ntotal is defined as the total number of months from all ensemble simulations of a model or the total number of months of the observed dataset. At each grid point, an extreme negative (positive) event is selected when the SPI6 index is below (above) −0.8 (0.8) for a dry (wet) event (Svoboda et al. 2002). For SM percentiles, the threshold is 20% (80%) for a dry (wet) event. At each grid cell, the number of months in which extreme events occur, N, is 20% of the record length by construction (N/Ntotal = 20%). Because a persistent drought (wet) event implies persistent dryness (wetness), a drought (wet) episode is selected when the index is below (above) its threshold for three consecutive seasons (9 months) or longer. The frequency of occurrence of persistent drought or wet spells (FOC) is defined as FOC = Np/N, where Np is the number of months in which an extreme event persists for 9 months or longer.
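This selection procedure can be sketched as below for the dry case; the function names and the SPI6 series are hypothetical, and the wet case follows by flipping the comparisons.

```python
def persistent_months(index, threshold, min_run=9):
    """Np: number of months belonging to runs in which the index stays
    below `threshold` for at least `min_run` consecutive months."""
    np_months, run = 0, 0
    for x in index:
        if x < threshold:
            run += 1
        else:
            if run >= min_run:
                np_months += run
            run = 0
    if run >= min_run:  # close out a run ending at the end of the record
        np_months += run
    return np_months

def foc(index, threshold, min_run=9):
    """FOC = Np / N, where N counts all months below the threshold."""
    n = sum(1 for x in index if x < threshold)
    return persistent_months(index, threshold, min_run) / n if n else 0.0

# Hypothetical monthly SPI6 series at one grid cell: one 10-month drought
# plus scattered single dry months that do not persist.
spi6 = [0.5] * 12 + [-1.0] * 10 + [0.3, -1.2, 0.4, -0.9, 0.8] * 3
print(round(foc(spi6, -0.8), 2))
```

Here only the 10-month run counts toward Np, so the FOC isolates how much of the total extreme time occurs in long-lived episodes rather than isolated dry months.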
Figures 8 and 9 show the FOC averaged for persistent wet and dry events for SPI6 and SM, respectively, for 15 of the core models (the GFDL-ESM2M and INM-CM4.0 model datasets had only a single ensemble member, and the total record is therefore too short for the analysis). The most noticeable feature is the east–west contrast of the FOC for both SPI6 and SM, which is driven by the gradient in precipitation amount and variability (Mo and Schemm 2008). Persistent drought and wet spells are more likely to occur over the western interior region, while extreme events are less likely to persist over the eastern United States and the West Coast. The maxima of the FOC lie in two bands, one over the mountains and one extending from Oregon to Texas (Fig. 8a). Persistent events are also found over the Great Plains. The CanESM2, CCSM4, and MIROC5 models show the east–west contrast, although the magnitudes of FOC are too weak for the CanESM2 model. The center of maximum FOC for MIROC5 is too far south.
Table 9 shows the performance of the models in representing the east–west contrast in terms of a FOC index, defined as the difference between the western (32°–48°N, 92°–112°W) and eastern (32°–48°N, 70°–92°W) regions in the fraction of grid cells with FOC greater than a given threshold. The thresholds are 0.2 for SPI6 and 0.3 for SM. The FOC index values for the CCSM4 (0.35) and MIROC5 (0.34) models are closest to the observations (0.37) for SPI6. The MPI-ESM-LR model also shows the east–west contrast, with one maximum located over Utah and another over the Great Plains, but the second maximum is too spatially extensive. The MIROC-ESM, MRI-CGCM3, and NorESM1-M models all show a band of maxima over the Southwest, but the FOC north of 35°N is too weak. Other models, such as CSIRO Mk3.6.0, IPSL-CM5A-LR, CNRM-CM5, GISS-E2-R, and GFDL CM3, have the maxima located over the Gulf region, which is too far south. Finally, the HadCM3 and HadGEM2-ES (not shown) models do not have enough persistent events.
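The east–west contrast index can be written compactly. The sketch below is illustrative only: the function name and the gridded-field data layout are our own assumptions, and longitudes are taken in degrees east (negative for west).

```python
import numpy as np

def foc_index(foc, lat, lon, threshold=0.2):
    """West-minus-east fraction of grid cells with FOC above a threshold.

    foc: 2D array (lat x lon); lat, lon: 1D coordinate arrays.
    Western box: 32-48N, 112-92W; eastern box: 32-48N, 92-70W.
    """
    lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
    in_lat = (lat2d >= 32) & (lat2d <= 48)
    west = in_lat & (lon2d >= -112) & (lon2d < -92)
    east = in_lat & (lon2d >= -92) & (lon2d <= -70)
    frac_west = np.mean(foc[west] > threshold)
    frac_east = np.mean(foc[east] > threshold)
    return frac_west - frac_east
```

A large positive value indicates that persistent events are concentrated in the western interior, as in the observations (0.37 for SPI6).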
Frequency of occurrence of persistent extreme precipitation and soil moisture events over the United States for the CPC observations/NLDAS analysis and 15 CMIP5 models. The mean and standard deviation of the statistics across the multimodel ensemble are also given.
For SM (Fig. 9), the FOC from the NLDAS-UW shows that persistent anomalies are located west of 90°W over the western interior region, with a FOC index of 0.68. Many of the models, such as BCC-CSM1.1, HadCM3, and IPSL-CM5A-LR, do not have enough persistent events, and the CanESM2, GISS-E2-R, and MRI-CGCM3 models shift the maxima to the central United States. The CCSM4, GFDL CM3, and NorESM1-M models fail to replicate the east–west contrast because of their high FOC values throughout most of the United States. The best model for SM is the MPI-ESM-LR with a FOC index of 0.62, because it represents the east–west contrast and also has realistic magnitudes. The CSIRO Mk3.6.0 model also simulates the east–west contrast, but the maximum is located south of the NLDAS-UW analysis maximum.
5. Regional climate features
We next evaluate the CMIP5 models for a set of regional climate features that have important regional consequences, either directly such as extreme temperature and precipitation in the southern United States and the North American monsoon or indirectly such as western Atlantic cool season cyclones and the U.S. Great Plains low-level jet. The last analysis examines the simulation of Arctic sea ice, which is important locally but also has implications for North American climate and elsewhere (Francis and Vavrus 2012).
a. Cool season western Atlantic extratropical cyclones
Extratropical cyclones can have major impacts (heavy snow, storm surge, winds, and flooding) along the east coast of North America given the proximity of the western Atlantic storm track. The Hodges (1994, 1995) cyclone tracking scheme was used to track cyclones in 15 models (of which 12 were in the core set) for the cool seasons (November–March) of 1979–2004. The Climate Forecast System Reanalysis (CFSR) was used to estimate observed cyclone tracks. Six-hourly mean sea level pressure (MSLP) data were used to track the cyclones, since including 850-hPa vorticity tracking was found to yield too many cyclones. Because MSLP is strongly influenced by large spatial scales and strong background flows, a spectral bandpass filter was used to preprocess the data: wavelengths between 600 and 10 000 km were retained, and a MSLP anomaly had to persist for at least 24 h and move at least 1000 km to count as a cyclone. Colle et al. (2013) describe the details and validation of the tracking procedure.
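The persistence and displacement criteria at the end of the paragraph can be sketched as a track-level filter. This is a hedged illustration only: the spectral bandpass filtering and feature identification of the Hodges scheme are not shown, and the function names and track representation are our own.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (degrees) in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Spherical law of cosines; adequate for track-length screening
    arg = (math.sin(p1) * math.sin(p2)
           + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, arg)))

def keep_track(track, min_hours=24, min_dist_km=1000, step_hours=6):
    """Apply the persistence (>= 24 h) and displacement (>= 1000 km)
    criteria to a candidate track of (lat, lon) positions at 6-h intervals."""
    lifetime = (len(track) - 1) * step_hours
    if lifetime < min_hours:
        return False
    dist = sum(great_circle_km(*track[i], *track[i + 1])
               for i in range(len(track) - 1))
    return dist >= min_dist_km
```

A track of five 6-hourly positions (24 h lifetime) moving about 330 km per step would pass both criteria; a 12-h track would be rejected regardless of distance.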
Figure 10 shows the cyclone density during the cool season for the CFSR, the mean and spread of the 15 models (see the legend of Fig. 11 for a complete listing), and selected models for eastern North America and the western and central North Atlantic. In the CFSR there are maxima in cyclone density over the Great Lakes, over the western Atlantic from east of the Carolinas northeastward to east of Canada, and just east of southern Greenland (Fig. 10a). The largest maximum over the western Atlantic (6–7 cyclones per cool season per 50 000 km2) is located along the northern boundary of the Gulf Stream Current. The MME mean realistically simulates the locations of the three separate maxima (Fig. 10b), but the amplitude is underpredicted by 10%–20%. The cyclone density maximum over the western Atlantic does not conform to the boundary of the Gulf Stream as closely as observed. There is a large intermodel spread near the Gulf Stream, since some models, such as the CCSM4 and HadGEM2-CC, better simulate the western Atlantic density amplitude (Figs. 10e,f). However, the CCSM4 maximum is shifted a few hundred kilometers to the north.
The distributions of cyclone central pressure at maximum intensity were also compared (Fig. 11) between the CFSR, MME mean, and individual models for the dashed box region in Fig. 10b. There is a peak in cyclone intensity in both the CFSR and MME mean around 990–1000 hPa, and there is large spread in the model intensity distributions, by almost a factor of 2. The ensemble mean realistically predicts the number of average-strength to relatively weak cyclones; however, the intensity distribution is too narrow compared to the CFSR, especially for the deeper cyclones (<980 hPa).
Colle et al. (2013) verified the 15 models by calculating the spatial correlation and mean absolute errors of the cyclone track densities and central pressures. They ranked the models and showed that six of the seven best models were the higher-resolution models (top three: EC-Earth, MRI-CGCM3, and CNRM-CM5), since many lower-resolution models, such as GFDL-ESM2M (Fig. 10d), underpredict the cyclone density and intensity. The MME mean calculated using the 12 core models has verification scores within 5% of those from all 15 models (not shown), so it is likely that using all 17 core models in the cyclone analysis would not have much impact on the results.
b. Northeast cool season precipitation
We next examine regional precipitation in the highly populated northeast United States, which is expected to increase in the future (Part III). The focus is on the cool season, since extratropical cyclones provide much of the heavy precipitation in the northeast. Of the 17 core models (listed in Fig. 11), 14 models (daily precipitation data were not available for 3 models) were evaluated for the cool seasons (November–March) of 1979–2004. The model daily precipitation was compared with the Climate Prediction Center (CPC)-Unified daily precipitation at 0.5° and CPC Merged Analysis of Precipitation (CMAP) monthly precipitation at 2.5° resolution.
Figures 12a–c show the seasonal average precipitation for the two observational analyses and the MME mean and spread. The heaviest precipitation (700–1000 mm) is over the Gulf Stream and is associated with the western Atlantic storm track. This maximum is well depicted in the multimodel mean, although it is underestimated by 50–200 mm, and there is a moderate spread between models (100–200 mm). The precipitation over the northeast United States ranges from 375 mm in the northwestern part to around 500 mm at the coast. The finer-resolution CPC-Unified analysis has more variability downstream of the Great Lakes (lake-effect snow) as well as some terrain enhancements. The models cannot resolve these smaller-scale precipitation features, but the MME mean realistically represents the north-to-south variation. However, the MME mean overestimates precipitation by 25–75 mm (5%–20%) over northern parts, and much of this overestimation is for thresholds greater than 5 mm day−1 over land (Fig. 12d). The seasonal precipitation MME spread over the northeast is 100–150 mm (25%–40%), and much of this spread is reflected in the higher (>10 mm day−1) thresholds, with the BCC-CSM1.1 simulating fewer heavy events than the CPC-Unified analysis and a cluster of models, such as the INM-CM4.0 and MIROC5, having many more heavy precipitation events than observed.
The model precipitation was verified against the CPC-Unified analysis for the black box region over the northeast United States shown in Fig. 12b, and the models are ranked in terms of their mean absolute errors (MAE) (Table 10). The MME mean has the lowest MAE. There is little relationship with resolution, since some relatively higher-resolution models (e.g., MIROC5 and MRI-CGCM3) perform worse than many lower-resolution models. Most models have a 5%–15% high bias in this region. There is little correlation (∼0.22) between the high precipitation biases in this region and the cyclone overestimation along the U.S. East Coast, suggesting that the cyclone biases arise from processes other than diabatic heating errors associated with precipitation.
Error statistics for the CMIP5 model precipitation over the northeastern United States: the mean absolute error (millimeters per season), RMSE (mm day−1), and mean bias (model/observed) for 14 CMIP5 models verified using the daily CPC-Unified precipitation within the black box in Fig. 12b. The mean and standard deviation of the statistics across the multimodel ensemble are given, as well as the statistics of the MME mean precipitation.
c. Extreme temperature and rainfall over the southern United States
The southern regions of the United States are historically prone to extreme climate events such as extreme summer temperatures, floods, and dry spells. Previous CMIP and U.S. climate impact assessments (Karl et al. 2009) have projected a large increase in these extreme events over regions of the south [southwest (SW), south central (SC), and southeast (SE)], especially the SW and SC United States. However, the extent to which climate models can adequately represent the statistical distributions of these extreme events over these regions is still unclear. Figure 13 compares the model-simulated precipitation and temperature with observations as Taylor diagrams for 1) the annual number of heavy precipitation days (precipitation > 10 mm day−1) and 2) the number of hot days [Tmax > 32°C (90°F)]. The observations are derived from the GHCN daily Tmax and Tmin gauge data and the CPC United States–Mexico daily gridded precipitation dataset. Results are shown for 15 models, 11 of which are core models, in terms of the spatial correlation with the observations and the standard deviation normalized by the observations. Table 11 also shows the regional biases.
Annual bias in the number of heavy precipitation days (precipitation > 10 mm day−1) and hot days [Tmax > 32°C (90°F)] for the southern U.S. regions (defined in Fig. 13) for 1979–2005. The mean and standard deviation of the biases across the multimodel ensemble are given, as well as the statistics of the MME mean heavy precipitation and hot days. Observed actual values from the GHCN and CPC datasets are shown in parentheses. All data were regridded to 2.5° resolution.
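Both extreme-day counts reduce to simple annual threshold-exceedance counts. A minimal sketch follows; the function name and the year-keyed data layout are illustrative assumptions, not the original processing code.

```python
import numpy as np

def mean_annual_exceedance_days(daily_by_year, threshold):
    """Mean annual number of days exceeding a threshold.

    daily_by_year: dict mapping year -> 1D array of daily values
    (precipitation in mm/day, or Tmax in deg C).
    """
    counts = [np.sum(np.asarray(vals) > threshold)
              for vals in daily_by_year.values()]
    return float(np.mean(counts))

# Heavy precipitation days: threshold = 10.0 (mm/day)
# Hot days: threshold = 32.0 (deg C, i.e., 90 F)
```

Applied at each grid point, such counts give the fields whose spatial correlation and normalized standard deviation are summarized in the Taylor diagrams of Fig. 13.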
Overall, the spatial distribution of the number of heavy precipitation days is better simulated in the SW and SC than in the SE, for which the spatial correlations are below 0.5, with many models having negative correlations. The normalized standard deviations are less than observed, indicating that the models cannot capture the high spatial variability in this region. Part of the reason may be the severe underestimation of the number of tropical cyclones (Part II), although other factors are likely involved, such as biases in summertime convective precipitation. For the SW and SC regions, the models do reasonably well at replicating the spatial variation, although with some spread across models (correlation values of 0.56–0.91 and 0.59–0.97 for the SW and SC, respectively). The MME mean number of heavy precipitation days is biased slightly high for the SW (but note that the observed number of days, 8.5, is small) and low for the SC and SE. For individual models, the GISS-E2-R model has a large high bias in the SW, and the CanESM2, GFDL CM3, HadCM3, IPSL-CM5A-LR, and MIROC5 models have large low biases (>10 days) in the SC and SE. Several models (GFDL-ESM2G, GFDL-ESM2M, HadGEM2-CC, HadGEM2-ES, MIROC4h, MPI-ESM-LR, and MRI-CGCM3) do reasonably well for all regions in terms of their biases.
The number of hot days (Tmax > 32°C) is underestimated by the MME mean for all regions, by between about 12 and 19 days, which is consistent with the underestimation of summer days (Tmax > 25°C) shown for North America in Fig. 7. Again, the performance of the models in terms of spatial patterns, variability, and regional bias is generally worse for the SE. Interestingly, the three Hadley Center models considered here (HadCM3, HadGEM2-CC, and HadGEM2-ES) have the lowest biases for the SW and SC (except for the CCSM4) and in the SE (except for the MIROC models). The models tend to overestimate the spatial variability in the SW and underestimate it in the SE, and the spatial correlations decrease from the SW to the SC to the SE. The MIROC4h, a very-high-resolution model (0.56° grid), stands out for all regions and both variables as having high spatial correlation and low bias for heavy precipitation days, although it generally has too high spatial variability relative to the observations.
d. North American monsoon
The North American monsoon (NAM) brings rainfall to southern Mexico in May, expanding northward to the southwest United States by late June or early July. Monsoon rainfall accounts for roughly 50%–70% of the annual totals in these regions (Douglas et al. 1993; Adams and Comrie 1997), with the annual percentages decreasing northward where winter rains become increasingly important. The annual cycle of precipitation from the ITCZ through the NAM region is examined in Fig. 14. The MME mean from the 17 core models (averaged for longitudes 102.5°–115°W for 1979–2005) replicates the northward migration of precipitation in the NAM region during the warm season but is biased low. However, the MME mean precipitation begins later, ends later, and is stronger than the observed estimate from CMAP within the core monsoon region north of 20°N. Within the latitudes of the ITCZ (up to 12°N), the models strongly underestimate the precipitation and fail to show the northward migration from stronger precipitation in May south of 8°N to a maximum in July near 10°N. Instead, the models tend to place the spring maximum at 10°N and have a late buildup and late demise at all latitudes of the ITCZ through boreal summer. Table 12 shows the RMSE for individual models over the domain shown in Fig. 14 and indicates that the CanESM2, HadCM3, and HadGEM2-ES models have the lowest errors (<0.75 mm day−1) and the BCC-CSM1.1, NorESM1-M, and MRI-CGCM3 have the highest errors (>1.9 mm day−1).
Annual mean RMSE for precipitation (mm day−1) for each of the 17 core CMIP5 models compared with CMAP observed estimates for the North American monsoon region, 20°–35°N, 102.5°–115°W. The mean and standard deviation of the RMSE across the multimodel ensemble are also given. The CMAP and model data were regridded to T42 resolution (∼2.8°).
The seasonal cycle of monthly precipitation in the core NAM region of northwest Mexico (23.875°–28.875°N, 108.875°–104.875°W) is also examined in Table 13 and Fig. 15 for the core models plus four other models. Our core domain is similar to that used by the North American Monsoon Experiment (NAME; Higgins et al. 2006) and related studies (e.g., Higgins and Gochis 2007; Gutzler et al. 2009) but has been reduced in size to ensure consistency of the monsoon precipitation signal at each grid point. Following the methodology of Liang et al. (2008) for analysis of CMIP3 data, we calculate a phase and RMS error of each model's seasonal cycle, where the phase error is defined as the lag in months that gives the best correlation with the observations (Table 13). The observations used are the NOAA precipitation dataset (P-NOAA), a recently developed gauge-based dataset that is likely more accurate than CMAP for this region. We additionally calculate each model's annual bias as a percentage of the mean monthly climatological P-NOAA value (1.66 mm day−1). The seasonal cycles for models with small (lag = 0), moderate (lag = 1), and large (lag = 2–4) phase errors are shown in Figs. 15a–c. Figure 15d shows the MME mean for all phase errors, their spread, and the observations.
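The lag-correlation definition of phase error can be sketched in a few lines. This is a minimal illustration under our own assumptions (circular shifting of a 12-month climatology and a symmetric lag search window); it is not the original analysis code.

```python
import numpy as np

def phase_error(model_cycle, obs_cycle, max_lag=4):
    """Phase error of a simulated seasonal cycle: the shift in months of
    the model's 12-month climatology that maximizes its correlation with
    the observed climatology. Returns (best lag, best correlation)."""
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        shifted = np.roll(model_cycle, -lag)  # undo a candidate lag
        r = np.corrcoef(shifted, obs_cycle)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

A model whose monsoon onset and demise are both delayed by two months would score lag = 2 here, placing it in the large-phase-error group of Fig. 15c.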
CMIP5 model error statistics for the simulation of the North American monsoon in the core region of northwest Mexico (23.875°–28.875°N, 108.875°–104.875°W), calculated with respect to the P-NOAA observational dataset. The mean and standard deviation of the statistics across the multimodel ensemble are given also. The model data were regridded to the P-NOAA resolution (0.5°).
Overall, the small phase error models tend to overestimate rainfall in the core NAM region throughout the year compared to the two observational datasets, with the largest errors in fall, consistent with Fig. 14. The overestimation of rainfall beyond the end of the monsoon season is also apparent in the small and large phase error CMIP3 models (Liang et al. 2008). The similarity between the range of RMSE values (0.46–2.23 mm day−1) in their study of CMIP3 models and that of the CMIP5 models in this analysis indicates that there has been no improvement in the magnitude of the simulated annual cycle of monthly precipitation; indeed, the lowest and highest RMSE values have increased slightly since the previous generation of models. On the other hand, there does seem to be improvement in the timing of seasonal precipitation shifts, with 13 out of 21 (62%) CMIP5 models having a phase lag of zero months, as compared to 6 out of 17 (35%) CMIP3 models in Liang et al. (2008). The top-ranking models for phase, RMSE, and bias shown in Table 13 (HadCM3, HadGEM2-ES, CNRM-CM5, CanESM2, and HadGEM2-CC) are also the models with the highest spatial correlations of May–October 850-hPa geopotential heights and winds when compared with the Interim European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-Interim; Geil et al. 2013). The HadCM3, HadGEM2-ES, and CanESM2 also perform best over the larger monsoon region (Table 12). Geil et al. (2013) find that the models that best represent the seasonal shift of the monsoon ridge and the subtropical highs over the North Pacific and Atlantic tend to have the least trouble ending the monsoon, suggesting that simulations over the region could be improved through a better representation of the seasonal cycle in these large-scale features.
e. Great Plains low-level jet
An outstanding feature of the warm season (May–September) circulation in North America is the strong and channeled southerly low-level flow, the Great Plains low-level jet (LLJ), extending from the Gulf of Mexico to the central United States and the Midwest (Bonner and Paegle 1970; Mitchell et al. 1995). The LLJ emerges in early May during the transition of the circulation from the cold to the warm season, reaches its maximum strength in June and July, and then weakens after August, disappearing in September when the cold season circulation starts to set in. While many studies have examined specific processes associated with the LLJ (Blackadar 1957; Wexler 1961; Holton 1967), such as the nocturnal peak in its diurnal wind speed oscillation and its role in precipitation, the jet is part of the seasonal circulation shaped primarily by the orographic configuration of North America, particularly the Rocky Mountain Plateau (e.g., Wexler 1961). An important climatic role of the LLJ is transporting moisture from the Gulf of Mexico to the central and eastern United States (Benton and Estoque 1954; Rasmusson 1967; Helfand and Schubert 1995; Byerle and Paegle 2003). Because this moisture is essential for the development of precipitation, even though additional dynamic processes are also required (Veres and Hu 2013), correctly describing the LLJ and its seasonal cycle is critical for simulating and predicting warm season precipitation and climate in central North America.
Outputs from eight of the core models (CanESM2, CCSM4, CNRM-CM5, GFDL-ESM2M, HadGEM2-ES, MIROC5, MPI-ESM-LR, and MRI-CGCM3) were analyzed for their simulation of the LLJ. Figure 16 compares the MME mean with the NCEP–National Center for Atmospheric Research (NCAR) reanalysis in terms of the summer 925-hPa winds, the vertical structure of the summer meridional wind, and the seasonal cycle of the LLJ. While the overall features of the simulated LLJ compare well with the reanalysis, several details differ. First, the models produce a peak meridional wind around 925 hPa, whereas the reanalysis peaks around 850 hPa. This difference is unlikely to arise from differences in vertical resolution, because the models and the reanalysis share the same vertical resolution below 500 hPa; the few models with additional levels below 500 hPa show a similar peak at 925 hPa in their vertical profiles of meridional wind. The vertical extent of the simulated LLJ is also shallower than in the reanalysis, as suggested by the differences in Fig. 16f, which may be related to the peak wind being at a lower level in the troposphere. Second, the simulated LLJ extends much farther northward into the Great Plains than in the reanalysis (Figs. 16g–i). For the seasonal cycle, the models show strong southerly winds persisting from mid-May to near the end of July, whereas the reanalysis shows the LLJ weakening substantially in early July (Fig. 16i). Despite these differences in detail, the error statistics in Table 14 indicate that these eight models simulate the LLJ satisfactorily.
Error statistics for the simulation of the Great Plains low-level jet (GPLLJ): the RMSE and the index of agreement (Legates and McCabe 1999), calculated over the regions shown in Fig. 16. The mean and standard deviation of the statistics across the multimodel ensemble are also given. The data were regridded to 2.5° resolution before calculating the statistics.
f. Arctic/Alaska sea ice
Since routine monitoring by satellites began in late October 1978, Arctic sea ice has declined in all calendar months (e.g., Serreze et al. 2007). Trends are largest at the end of the summer melt season in September, with a rate of decline through 2012 of −14.3% decade−1. Regionally, summer ice losses have been pronounced in the Beaufort, Chukchi, and East Siberian Seas since 2002, causing a lengthening of the ice-free season. The presence of sea ice helps to protect Alaskan coastal regions from wind-driven waves and warm ocean water that can weaken frozen ground. As the sea ice has retreated farther from coastal regions and ice-free summer conditions have lasted longer (in some regions by more than 2 months over the satellite record), wind-driven waves, combined with permafrost thaw and warmer ocean temperatures, have led to rapid coastal erosion (Mars and Houseknecht 2007; Jones et al. 2009).
While the winter ice cover is not projected to disappear in the near future, all models that contributed to the IPCC 2007 report showed that, as temperatures rise, the Arctic Ocean would eventually become ice free in the summer (e.g., Stroeve et al. 2007). However, estimates differed widely, with some models suggesting a transition toward a seasonally ice-free Arctic may happen before 2050 and others sometime after 2100. To reduce the spread, some studies suggest using only models that are able to reproduce the historical sea ice extent (e.g., Overland 2011; Wang and Overland 2009).
Historical sea ice extent (1953–2005) from 26 models during September and March is presented as box and whisker plots (Fig. 17), constructed from all ensemble members of all models, with the width of the box representing the number of ensemble members. Table 15 shows the biases for the individual models. Five climate models (CanESM2, EC-EARTH, GISS-E2-R, HadGEM2-AO, and MIROC4h) have mean September extents that fall below the minimum observed value, with EC-EARTH, GISS-E2-R, and CanESM2 having more than 75% of their extents below the minimum observed value. Three models (CSIRO Mk3.6.0, FGOALS-s2, and NorESM1-M) have more than 75% of their extents above the maximum observed value. Overall, 14 models have mean extents below the observed 1979–2005 mean September extent. During March, several models fall outside the observed range of extents, with 16 models having more than 75% of their extents outside the observed maximum and minimum values (8 above and 8 below). Six models essentially straddle the mean observed March sea ice extent.
Biases in CMIP5 model Arctic sea ice extent and thickness. Biases are based on the ensemble mean for each model that has more than one ensemble member and computed relative to the observed value. The mean and standard deviation of the statistics across the multimodel ensemble are also given. September extent bias is in 106 km2. March ice thickness bias is in meters. For the calculation of March thickness bias, the model data for 1993–2005 were regridded to the ICESat resolution and compared to the ICESat data for 2003–09.
Spatial maps of March CMIP5 sea ice thickness averaged from 1993 to 2005 are shown in Fig. 18 together with thickness estimates from the Ice, Cloud, and Land Elevation Satellite (ICESat; 2003–09; Kwok and Cunningham 2008). Table 15 shows the biases for the individual models relative to the ICESat data. While we do not expect the models to be in phase with the observed natural climate variability and therefore accurately represent the magnitude of the ICESat thickness fields, it is important to assess whether or not the models are able to reproduce the observed spatial distribution of ice thickness. Data from ICESat and IceBridge, as well as earlier radar altimetry missions [European Remote Sensing Satellite-1 (ERS-1) and ERS-2], and submarine tracks indicate that the thickest ice is located north of Greenland and the Canadian Archipelago (>5 m thick), where there is an onshore component of ice motion resulting in strong ridging. Thicknesses are smaller on the Eurasian side of the Arctic Ocean, where there is persistent offshore motion of ice and divergence, leading to new ice growth in open water areas. Most models fail to show thin ice close to the Eurasian coast and thicker ice along the Canadian Arctic Archipelago and north coast of Greenland. Instead, many models show a ridge of thick ice that spans north of Greenland across the Lomonosov Ridge toward the East Siberian shelf, with thinner ice in the Beaufort/Chukchi and the Kara/Barents Seas. In large part, this is explained in terms of biases in the distribution of surface winds; for example, if a model fails to produce a well-structured Beaufort Sea High, this will adversely affect the ice drift pattern and hence the thickness pattern. Nevertheless, when we compare mean thickness fields from ICESat with thickness fields from the CMIP5 models, we find that, for the Arctic Ocean as a whole, the thickness distributions from the models overlap with those from the satellite product. 
However, for the North American side of the Arctic Ocean, model thicknesses tend to be smaller than those estimated from ICESat. This partly explains the low bias in September ice extent for some of the models, as thinner ice is more prone to melting out in summer. Models with extensively thick winter ice (e.g., NorESM1-M and MIROC5), on the other hand, tend to overestimate the observed September ice extent.
6. Discussion and conclusions
a. Synthesis of model performance
This study evaluates the CMIP5 models for a set of basic climate and surface hydrological variables, for annual and seasonal means and extremes, and for selected regional climate features. Evaluating model performance is not straightforward because of the broad range of uses of climate model data (Gleckler et al. 2008), and there is therefore no accepted universal set of performance metrics. Issues relevant to performance depend on several elements, including decadal variability; observational uncertainties; and the fact that some models are tuned to certain processes, often at the expense of other aspects of the climate. The performance metrics evaluated here generally focus on basic climate variables and standard statistical measures such as bias, RMSE, and spatial correlation. One of the strengths of this study is the broad range of evaluations, which test multiple aspects of the model simulations at various time and space scales and for specific important regional features that we do not necessarily expect coarse-resolution models to simulate well. Taken individually, these metrics indicate that certain models perform much better than the ensemble as a whole, while other models perform poorly, either because a feature is not simulated at all (such as the lack of persistence in extreme hydrological events) or because the errors are unacceptably large. However, it is not clear whether any one model or set of models performs better than the others across the full set of climate variables.
Figure 19 shows a summary ranking of model performance across all continental and U.S. domain analyses presented in sections 3 and 4 in terms of biases with the observational estimates. We choose not to show results for the regional processes as these are generally for fewer models and only provide one sample of important features of North American climate. Other metrics, such as the RMSE, could have been used, but the bias values were available for all continental analyses. Model performance is shown by two methods: the first is the normalized bias, calculated as the difference of the absolute model bias from the lowest absolute bias value, divided by the range in absolute bias values across all models. A value of 0.0 indicates the lowest absolute bias and a value of 1.0 indicates the highest value. The values reflect the distribution of values across models such that outlier models are still identified. The second method is the rank of the sorted absolute bias values, which is uniformly distributed.
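The two presentation methods can be stated compactly. The following sketch is illustrative (function names are ours); it reproduces the normalization and ranking described above for one metric across the model ensemble.

```python
import numpy as np

def normalized_bias(biases):
    """Normalize absolute biases across models for one metric:
    (|bias| - min|bias|) / (max|bias| - min|bias|), so the best model
    scores 0.0 and the worst 1.0, preserving the spread between models
    so that outliers remain visible."""
    ab = np.abs(np.asarray(biases, dtype=float))
    rng = ab.max() - ab.min()
    if rng == 0:
        return np.zeros_like(ab)
    return (ab - ab.min()) / rng

def bias_ranks(biases):
    """Uniformly distributed alternative: rank of the sorted absolute
    biases (1 = lowest absolute bias)."""
    ab = np.abs(np.asarray(biases, dtype=float))
    return np.argsort(np.argsort(ab)) + 1
```

The normalized values retain information about how far apart the models are, whereas the ranks discard it; this is why the two panels of Fig. 19 show different spreads for the same metrics.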
The first thing to note is the difference in spread between the two panels in Fig. 19 for individual metrics, which reflects the two types of distribution of values (model-ensemble dependent and uniform). The normalized metrics highlight outlier models that perform much better than the rest (e.g., GFDL-ESM2M for JJA temperature) or much worse (e.g., INM-CM4.0 for DJF precipitation or CanESM2 for JJA temperature). Another example is persistent precipitation events (P Persist for the United States), for which there is a cluster of eight models that do equally poorly compared to the rest. No single model stands out as better or worse across multiple metrics. Some models do relatively well for the same variable in one or both seasons across all regions, such as HadGEM2-ES and MIROC5 for precipitation.
The rankings in the bottom panel are more clustered across analyses, such as for the Hadley Centre models for DJF temperature, although the actual biases are generally not that different from those of the other models. The MRI-CGCM3 is consistently ranked low for runoff ratios in all regions and for the number of summer and frost days and growing season length (and likewise in terms of the normalized metrics). The INM-CM4.0 model is consistently ranked low for precipitation in both seasons and for DJF temperature, although again its normalized values are generally not very different from those of the other models. It is tempting to provide an overall ranking or weighted metric across all analyses for each model, but there is no obvious way of doing this for such a diverse set of metrics, although this has been attempted in other studies (e.g., Reichler and Kim 2008). Nevertheless, it is useful to identify those models that are ranked highly for multiple metrics. For example, the following models are ranked in the top five for at least 12 metrics (approximately one-third of the total number of metrics): MPI-ESM-LR (16 metrics), GISS-E2-R (15 metrics), CCSM4 (14 metrics), CSIRO Mk3.6.0 (14 metrics), and BCC-CSM1.1 (12 metrics). The bottom two models are GFDL CM3 (6 metrics) and INM-CM4.0 (4 metrics).
b. Changes in performance between CMIP3 and CMIP5 for basic climate variables
A key question is whether the CMIP5 results have improved since CMIP3 and, if so, why. As mentioned in the introduction, the CMIP5 models generally have higher horizontal resolution, improved parameterizations, and additional process representations relative to CMIP3. Several of the analyses presented here indicate improved results since CMIP3 (e.g., for the North American monsoon), by comparison with earlier studies. Here we show a direct comparison of CMIP5 with CMIP3 results for basic climate variables in Fig. 20, which shows RMSE values for CMIP5 and CMIP3 models for seasonal precipitation and surface air temperature over North America and for SSTs over the surrounding oceans. Of the 17 core CMIP5 models, 14 have an equivalent CMIP3 model that is either the same model (HadCM3), a newer version, or an earlier related version, and so a direct comparison of any improvements since CMIP3 is feasible.
Overall, the MME mean performance has improved slightly in CMIP5 for nearly all variables. For example, there is a reduction in the MME mean RMSE for summer precipitation (from 0.90 mm day−1 in CMIP3 to 0.86 mm day−1 in CMIP5) and for winter SSTs (from 1.72° to 1.55°C). The largest percentage reduction in RMSE for the MME mean is for summer temperatures (an 11.8% reduction). The spread in model performance (as quantified by the standard deviation) has remained about the same for precipitation, increased for temperature, and decreased for SSTs. The increase in spread for temperature is due to both increases and decreases in model performance relative to the CMIP3 models. Several models have improved considerably across nearly all variables and seasons, such as CCSM4, INM-CM4.0, IPSL-CM5A-LR, and MIROC5. Reductions in performance for individual models are less prevalent across variables but are large for CSIRO Mk3.6.0, HadCM3, and MRI-CGCM3 for SSTs in both seasons. The CanESM2 performs worse than its CMIP3 equivalent (CGCM3.1) for all variables, although it is unclear how the two models are related. Interestingly, the HadCM3 model, which is used for both the CMIP3 and CMIP5 simulations, appears to have degraded in performance for SSTs.
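The percentage reductions quoted above follow the usual relative-change convention (our notation):

```latex
\[
  \Delta \;=\; 100\% \times
  \frac{\mathrm{RMSE}_{\mathrm{CMIP3}} - \mathrm{RMSE}_{\mathrm{CMIP5}}}
       {\mathrm{RMSE}_{\mathrm{CMIP3}}},
\]
```

so, for example, the winter SST reduction from 1.72° to 1.55°C corresponds to \(\Delta \approx 9.9\%\), and the summer precipitation reduction from 0.90 to 0.86 mm day−1 corresponds to \(\Delta \approx 4.4\%\).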
c. Summary and conclusions
We have evaluated the CMIP5 multimodel ensemble for its depiction of North American continental and regional climatology, with a focus on a core set of models. Overall, the multimodel ensemble does reasonably well in representing the main features of basic surface climate over North America and the adjoining seas. Regional performance for basic climate variables is highly variable across models, however, and outlier models can bias the assessment of the ensemble mean; the median may therefore be a better representation of the central tendency of model performance (Liepert and Previdi 2012). No particular model stands out as performing better than others across all analyses, although some models perform much better for sets of metrics, mainly for the same variable across different regions. Higher-resolution models tend to do better for some aspects, especially the regional features, as expected, but not universally so and not for basic climate variables. ESMs that simulate the coupled carbon cycle, and therefore have different atmospheric GHG concentrations, do not stand out as performing better or worse than other models.
There are systematic biases in precipitation with overestimation for more humid and cooler regions and underestimation for drier regions. Biases in precipitation filter down to biases in the surface hydrology, although this is also related to the representation of the land surface in many models, with implications for assessment of water resources and hydrological extremes. The poor performance in representing observed seasonal persistence in precipitation and soil moisture is a reflection of this. As many of the errors are systematic across models, there is potential for diagnosing these further based on a multimodel analysis.
The models have a harder time representing extreme values, such as those based on temperature and precipitation. The biases in temperature means and extremes may be related to those in land hydrology that affect the surface energy balance and therefore can impact how much energy goes into heating the near-surface air during dry periods and in drier regions. Biases in precipitation and its extremes are likely related to differences in large-scale circulation and SST patterns, as well as problems in representing regional climate features. Hints of this are shown in some of the analyses presented here, such as the errors in regional moisture divergence over North America, but linkages between other regional climate features and terrestrial precipitation biases, such as for western Atlantic winter cyclones, are not apparent, and further investigation is required to diagnose these. Part II indicates that most models have trouble representing teleconnections between modes of climate variability (such as ENSO) and continental surface climate variables, and this may also reflect the representation of mean climate.
Overall, the performance of the CMIP5 models in representing observed climate features has not improved dramatically compared to CMIP3, at least for the set of models and climate features analyzed here. There are some models that have improved for certain features (e.g., the timing of the North American monsoon), but others that have become worse (e.g., continental seasonal surface climate).
The results of this paper have implications for the robustness of future projections of climate and its associated impacts. Part III evaluates the CMIP5 models for North America in terms of the future projections for the same set of climate features as evaluated for the twentieth century in this first part and in Part II. While good model performance for the historical period is not sufficient for credible projections, the depiction of at least large-scale climate features is necessary. Overall, the models do well in replicating the broad-scale climate of North America and some regional features, but biases in some aspects are of the same magnitude as the projected changes (Part III). For example, the low bias in daily maximum temperature over the southern United States in some models is similar to the future projected changes. Furthermore, the uncertainty in the future projections across models can also be of the same magnitude as the model spread for the historical period.
Acknowledgments
We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output. For CMIP, the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. The authors acknowledge the support of NOAA Climate Program Office Modeling, Analysis, Predictions and Projections (MAPP) program as part of the CMIP5 Task Force.
REFERENCES
Adams, D. K., and A. C. Comrie, 1997: The North American monsoon. Bull. Amer. Meteor. Soc., 78, 2197–2213.
Adler, R. F., and Coauthors, 2003: The version-2 Global Precipitation Climatology Project (GPCP) monthly precipitation analysis (1979–present). J. Hydrometeor., 4, 1147–1167.
Arora, V. K., and Coauthors, 2011: Carbon emission limits required to satisfy future representative concentration pathways of greenhouse gases. Geophys. Res. Lett., 38, L05805, doi:10.1029/2010GL046270.
Bao, Q., and Coauthors, 2012: The Flexible Global Ocean-Atmosphere-Land System model, spectral version 2: FGOALS-s2. Adv. Atmos. Sci., 30, 561–576.
Benton, G. S., and M. A. Estoque, 1954: Water-vapor transfer over the North American continent. J. Meteor., 11, 462–477.
Bi, D., and Coauthors, 2012: ACCESS: The Australian Coupled Climate Model for IPCC AR5 and CMIP5. Extended Abstracts, 18th AMOS National Conf., Sydney, Australia, Australian Meteorological and Oceanographic Society, 123.
Blackadar, A. K., 1957: Boundary layer wind maxima and their significance for the growth of nocturnal inversions. Bull. Amer. Meteor. Soc., 38, 283–290.
Bonner, W. D., and J. Paegle, 1970: Diurnal variations in boundary layer winds over the south-central United States in summer. Mon. Wea. Rev., 98, 735–744.
Byerle, L. A., and J. Paegle, 2003: Modulation of the Great Plains low-level jet and moisture transports by orography and large scale circulations. J. Geophys. Res., 108, 8611, doi:10.1029/2002JD003005.
Caesar, J., L. Alexander, and R. Vose, 2006: Large-scale changes in observed daily maximum and minimum temperatures: Creation and analysis of a new gridded data set. J. Geophys. Res., 111, D05101, doi:10.1029/2005JD006280.
Colle, B. A., Z. Zhang, K. Lombardo, P. Liu, E. Chang, M. Zhang, and S. Hameed, 2013: Historical evaluation and future prediction of eastern North American and western Atlantic extratropical cyclones in CMIP5 models during the cool season. J. Climate, 26, 6882–6903.
Collins, M., S. F. B. Tett, and C. Cooper, 2001: The internal climate variability of HadCM3, a version of the Hadley Centre Coupled Model without flux adjustments. Climate Dyn., 17, 61–81.
Compo, G. P., and Coauthors, 2011: The Twentieth Century Reanalysis Project. Quart. J. Roy. Meteor. Soc., 137, 1–28, doi:10.1002/qj.776.
Donner, L. J., and Coauthors, 2011: The dynamical core, physical parameterizations, and basic simulation characteristics of the atmospheric component AM3 of the GFDL global coupled model CM3. J. Climate, 24, 3484–3519.
Douglas, M. W., R. A. Maddox, K. Howard, and S. Reyes, 1993: The Mexican monsoon. J. Climate, 6, 1665–1677.
Dufresne, J.-L., and Coauthors, 2013: Climate change projections using the IPSL-CM5 Earth system model: From CMIP3 to CMIP5. Climate Dyn., 40, 2123–2165.
Fetterer, F., K. Knowles, W. Meier, and M. Savoie, 2002: Sea ice index. National Snow and Ice Data Center, Boulder, CO, digital media. [Available online at http://nsidc.org/data/G02135.html.]
Francis, J. A., and S. J. Vavrus, 2012: Evidence linking Arctic amplification to extreme weather in mid-latitudes. Geophys. Res. Lett., 39, L06801, doi:10.1029/2012GL051000.
Frich, P., L. V. Alexander, P. Della-Marta, B. Gleason, M. Haylock, A. M. G. Klein Tank, and T. Peterson, 2002: Observed coherent changes in climatic extremes during the second half of the twentieth century. Climate Res., 19, 193–212.
Geil, K. L., Y. L. Serra, and X. Zeng, 2013: Assessment of CMIP5 model simulations of the North American monsoon system. J. Climate, in press.
Gent, P. R., and Coauthors, 2011: The Community Climate System Model version 4. J. Climate, 24, 4973–4991.
Giorgi, F., and R. Francisco, 2000: Uncertainties in regional climate change prediction: A regional analysis of ensemble simulations with the HADCM2 coupled AOGCM. Climate Dyn., 16 (2–3), 169–182, doi:10.1007/PL00013733.
Gleckler, P. J., K. E. Taylor, and C. Doutriaux, 2008: Performance metrics for climate models. J. Geophys. Res., 113, D06104, doi:10.1029/2007JD008972.
Gutzler, D. S., and Coauthors, 2009: Simulations of the 2004 North American monsoon: NAMAP2. J. Climate, 22, 6716–6740.
Hall, A., X. Qu, and J. D. Neelin, 2008: Improving predictions of summer climate change in the United States. Geophys. Res. Lett., 35, L01702, doi:10.1029/2007GL032012.
Hazeleger, W., and Coauthors, 2010: EC-Earth: A seamless Earth-system prediction approach in action. Bull. Amer. Meteor. Soc., 91, 1357–1363.
Helfand, H. M., and S. D. Schubert, 1995: Climatology of the simulated Great Plains low-level jet and its contribution to the continental moisture budget of the United States. J. Climate, 8, 784–806.
Higgins, W., and D. Gochis, 2007: Synthesis of results from the North American Monsoon Experiment (NAME) process study. J. Climate, 20, 1601–1607.
Higgins, W., J. E. Janowiak, and Y.-P. Yao, 1996: A Gridded Hourly Precipitation Data Base for the United States (1963-1993). NCEP/Climate Prediction Center Atlas 1, 47 pp.
Higgins, W., and Coauthors, 2006: The NAME 2004 field campaign and modeling strategy. Bull. Amer. Meteor. Soc., 87, 79–94.
Hodges, K. I., 1994: A general method for tracking analysis and its application to meteorological data. Mon. Wea. Rev., 122, 2573–2586.
Hodges, K. I., 1995: Feature tracking on the unit sphere. Mon. Wea. Rev., 123, 3458–3465.
Holton, J. R., 1967: The diurnal boundary layer wind oscillation above sloping terrain. Tellus, 19, 199–205.
Jones, B. M., C. D. Arp, M. T. Jorgenson, K. M. Hinkel, J. A. Schmutz, and P. L. Flint, 2009: Increase in the rate and uniformity of coastline erosion in Arctic Alaska. Geophys. Res. Lett., 36, L03503, doi:10.1029/2008GL036205.
Jones, C. D., and Coauthors, 2011: The HadGEM2-ES implementation of CMIP5 centennial simulations. Geosci. Model Dev., 4, 543–570, doi:10.5194/gmd-4-543-2011.
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.
Kanamitsu, M., W. Ebisuzaki, J. S. Woollen, S.-K. Yang, J. J. Hnilo, M. Fiorino, and G. L. Potter, 2002: NCEP–DOE AMIP II Reanalysis (R-2). Bull. Amer. Meteor. Soc., 83, 1631–1643.
Karl, T. R., J. M. Melillo, and T. C. Peterson, 2009: Global Climate Change Impacts in the United States. Cambridge University Press, 188 pp.
Kim, D., A. H. Sobel, A. D. Del Genio, Y. Chen, S. Camargo, M.-S. Yao, M. Kelley, and L. Nazarenko, 2012: The tropical subseasonal variability simulated in the NASA GISS general circulation model. J. Climate, 25, 4641–4659.
Kwok, R., and G. F. Cunningham, 2008: ICESat over Arctic sea ice: Estimation of snow depth and ice thickness. J. Geophys. Res., 113, C08010, doi:10.1029/2008JC004753.
Legates, D. R., and G. J. McCabe, 1999: Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res., 35, 233–241.
Liang, X.-Z., J. Zhu, K. E. Kunkel, M. Ting, and J. X. L. Wang, 2008: Do CGCMs simulate the North American monsoon precipitation seasonal–interannual variations? J. Climate, 21, 3755–3775.
Liepert, B. G., and M. Previdi, 2012: Inter-model variability and biases of the global water cycle in CMIP3 coupled climate models. Environ. Res. Lett., 7, 014006, doi:10.1088/1748-9326/7/1/014006.
Mars, J. C., and D. W. Houseknecht, 2007: Quantitative remote sensing study indicates doubling of coastal erosion rate in past 50 yr along a segment of the Arctic coast of Alaska. Geology, 35, 583–586, doi:10.1130/G23672A.1.
Maurer, E. P., A. W. Wood, J. C. Adam, D. P. Lettenmaier, and B. Nijssen, 2002: A long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States. J. Climate, 15, 3237–3251.
McKee, T. B., N. J. Doesken, and J. Kleist, 1993: The relationship of drought frequency and duration to time scales. Proc. Eighth Conf. on Applied Climatology, Anaheim, CA, Amer. Meteor. Soc., 179–184.
Merryfield, W. J., and Coauthors, 2013: The Canadian seasonal to interannual prediction system. Part I: Models and initialization. Mon. Wea. Rev., 141, 2910–2945.
Mitchell, M. J., R. W. Arritt, and K. Labas, 1995: A climatology of the warm season Great Plains low-level jet using wind profiler observations. Wea. Forecasting, 10, 576–591.
Mitchell, T. D., and P. D. Jones, 2005: An improved method of constructing a database of monthly climate observations and associated high-resolution grids. Int. J. Climatol., 25, 693–712.
Mo, K. C., 2008: Model-based drought indices over the United States. J. Hydrometeor., 9, 1212–1230.
Mo, K. C., and J. E. Schemm, 2008: Droughts and persistent wet spells over the United States and Mexico. J. Climate, 21, 980–994.
Overland, J. E., 2011: Potential Arctic change through climate amplification processes. Oceanography, 24, 176–185, doi:10.5670/oceanog.2011.70.
Rasmusson, E. M., 1967: Atmospheric water vapor transport and the water balance of North America. Part I: Characteristics of the water vapor flux field. Mon. Wea. Rev., 95, 403–426.
Rayner, N. A., D. E. Parker, E. B. Horton, C. K. Folland, L. V. Alexander, D. P. Rowell, E. C. Kent, and A. Kaplan, 2003: Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. J. Geophys. Res., 108, 4407, doi:10.1029/2002JD002670.
Reichler, T., and J. Kim, 2008: How well do coupled models simulate today's climate? Bull. Amer. Meteor. Soc., 89, 303–311.
Rodell, M., and Coauthors, 2004: The Global Land Data Assimilation System. Bull. Amer. Meteor. Soc., 85, 381–394.
Rotstayn, L. D., M. A. Collier, Y. Feng, H. B. Gordon, S. P. O'Farrell, I. N. Smith, and J. Syktus, 2010: Improved simulation of Australian climate and ENSO-related rainfall variability in a GCM with an interactive aerosol treatment. Int. J. Climatol., 30, 1067–1088, doi:10.1002/joc.1952.
Saha, S., and Coauthors, 2010: The NCEP Climate Forecast System Reanalysis. Bull. Amer. Meteor. Soc., 91, 1015–1057.
Sakamoto, T. T., and Coauthors, 2012: MIROC4h—A new high-resolution atmosphere-ocean coupled general circulation model. J. Meteor. Soc. Japan, 90, 325–359.
Schwartz, M. D., and B. E. Reiter, 2000: Changes in North American spring. Int. J. Climatol., 20, 929–932.
Serreze, M. C., M. M. Holland, and J. Stroeve, 2007: Perspectives on the Arctic's shrinking sea ice cover. Science, 315, 1533–1536.
Sheffield, J., and E. F. Wood, 2007: Characteristics of global and regional drought, 1950–2000: Analysis of soil moisture data from off-line simulation of the terrestrial hydrologic cycle. J. Geophys. Res., 112, D17115, doi:10.1029/2006JD008288.
Sheffield, J., and Coauthors, 2013: North American climate in CMIP5 experiments. Part II: Evaluation of twentieth-century intraseasonal to decadal variability. J. Climate, in press.
Stroeve, J., M. M. Holland, W. Meier, T. Scambos, and M. Serreze, 2007: Arctic sea ice decline: Faster than forecast. Geophys. Res. Lett., 34, L09501, doi:10.1029/2007GL029703.
Svoboda, M., and Coauthors, 2002: The Drought Monitor. Bull. Amer. Meteor. Soc., 83, 1181–1190.
Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498.
Veres, M. C., and Q. Hu, 2013: AMO-forced regional processes affecting summertime precipitation variations in the central United States. J. Climate, 26, 276–290.
Voldoire, A., and Coauthors, 2013: The CNRM-CM5.1 global climate model: Description and basic evaluation. Climate Dyn., 40, 2091–2121, doi:10.1007/s00382-011-1259-y.
Volodin, E. M., N. A. Diansky, and A. V. Gusev, 2010: Simulating present-day climate with the INMCM4.0 coupled model of the atmospheric and oceanic general circulations. Izv. Atmos. Oceanic Phys., 46, 414–431.
Vose, R. S., R. L. Schmoyer, P. M. Steurer, T. C. Peterson, R. Heim, T. R. Karl, and J. Eischeid, 1992: The Global Historical Climatology Network: Long-term monthly temperature, precipitation, sea level pressure, and station pressure data. Oak Ridge National Laboratory Carbon Dioxide Information Analysis Center Rep. ORNL/CDIAC-53, NDP-041, 324 pp.
Wang, A., T. J. Bohn, S. P. Mahanama, R. D. Koster, and D. P. Lettenmaier, 2009: Multimodel ensemble reconstruction of drought over the continental United States. J. Climate, 22, 2694–2712.
Wang, C., and D. B. Enfield, 2001: The tropical Western Hemisphere warm pool. Geophys. Res. Lett., 28, 1635–1638.
Wang, M., and J. E. Overland, 2009: A sea ice free summer Arctic within 30 years? Geophys. Res. Lett., 36, L07502, doi:10.1029/2009GL037820.
Watanabe, M., and Coauthors, 2010: Improved climate simulation by MIROC5: Mean states, variability, and climate sensitivity. J. Climate, 23, 6312–6335.
Wexler, H., 1961: A boundary layer interpretation of the low level jet. Tellus, 13, 368–378.
Xie, P., and P. A. Arkin, 1997: Global precipitation: A 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs. Bull. Amer. Meteor. Soc., 78, 2539–2558.
Xie, P., M. Chen, and W. Shi, 2010: CPC unified gauge-based analysis of global daily precipitation. Preprints, 24th Conf. on Hydrology, Atlanta, GA, Amer. Meteor. Soc., 2.3A. [Available online at https://ams.confex.com/ams/90annual/webprogram/Paper163676.html.]
Xin, X., T. Wu, and J. Zhang, 2012: Introductions to the CMIP5 simulations conducted by the BCC climate system model (in Chinese). Adv. Climate Change Res., 8, 378–382.
Yukimoto, S., and Coauthors, 2012: A new global climate model of the Meteorological Research Institute: MRI-CGCM3—Model description and basic performance. J. Meteor. Soc. Japan, 90A, 23–64.
Zanchettin, D., A. Rubino, D. Matei, O. Bothe, and J. H. Jungclaus, 2012: Multidecadal-to-centennial SST variability in the MPI-ESM simulation ensemble for the last millennium. Climate Dyn., 39, 419–444, doi:10.1007/s00382-012-1361-9.
Zhang, Z. S., and Coauthors, 2012: Pre-industrial and mid-Pliocene simulations with NorESM-L. Geosci. Model Dev., 5, 523–533, doi:10.5194/gmd-5-523-2012.