Abstract

There were two major multiyear, Arctic-wide (60°–90°N) warm anomalies (>0.7°C) in land surface air temperature (LSAT) during the twentieth century, between 1920 and 1950 and again at the end of the century after 1979. Reproducing this decadal and longer variability in coupled general circulation models (GCMs) is a critical test for understanding processes in the Arctic climate system and increasing the confidence in the Intergovernmental Panel on Climate Change (IPCC) model projections. This study evaluated 63 realizations generated by 20 coupled GCMs made available for the IPCC Fourth Assessment for their twentieth-century climate in coupled models (20C3M) and corresponding control runs (PIcntrl). Warm anomalies in the Arctic during the last two decades are reproduced by all ensemble members, with considerable variability in amplitude among models. In contrast, only eight models generated warm anomaly amplitude of at least two-thirds of the observed midcentury warm event in at least one realization, but not its timing. The durations of the midcentury warm events in all the models are decadal, while that of the observed was interdecadal. The variance of the control runs in nine models was comparable with the variance in the observations. The random timing of midcentury warm anomalies in 20C3M simulations and the similar variance of the control runs in about half of the models suggest that the observed midcentury warm period is consistent with intrinsic climate variability. Five models were considered to compare somewhat favorably to Arctic observations in both matching the variance of the observed temperature record in their control runs and representing the decadal mean temperature anomaly amplitude in their 20C3M simulations. Seven additional models could be given further consideration. Results support selecting a subset of GCMs when making predictions for future climate by using performance criteria based on comparison with retrospective data.

1. Introduction

Climate changes are being experienced in the Arctic. These changes pose challenges to the resilience of Arctic life including humans (Symon et al. 2005; Overland and Wang 2005). Considering that the Arctic domain is a relatively small fraction of the earth, it is likely that larger anomalies of temperature and other variables would occur in the Arctic-wide mean compared to the global mean. On the other hand, large multiyear anomalies are also possible based on internal positive feedbacks in the Arctic associated with sea ice, ocean, and land processes. Due to the lack of pan-Arctic observations in the past and the complexity of processes involved, coupled atmosphere–ocean general circulation models (AOGCMs) are tools for studying Arctic climate and its response to changing external forcing.

To assess whether recent changes in the Arctic climate are outside the range of natural variability, it is helpful to compare observations from recent decades with those from earlier in the twentieth century. Surface air temperature (SAT) is related to regional energy budgets and is a robust climate parameter, in the sense that it can show large-scale anomaly patterns and is generally well observed (Lambert and Boer 2001). Previous studies show that the warming trends of the last two decades are greatest in the high latitudes of the Northern Hemisphere (Hansen et al. 1999; Jones et al. 1999) and are attributed to changes in anthropogenic forcing (Houghton et al. 2001; Broccoli et al. 2003). The two 20-yr periods of largest temperature anomalies in the Arctic for the twentieth century are 1925–44 (midcentury) and 1979 to present.

Although the midcentury warm anomalies have recently received attention from the polar research community, the mechanisms behind them are still debated (Polyakov et al. 2002; Overland et al. 2004). Bengtsson et al. (2004) suggest that the multiyear, midcentury warm anomalies are associated with considerable internal variations over several years, initiated by stochastic variations of the high-latitude atmospheric circulation and subsequently enhanced and maintained by sea ice feedbacks, particularly over the Barents Sea. Overland et al. (2004) support natural variability as the source of these warm anomalies, based on regional and temporal variability in the observed atmospheric circulation. Delworth and Knutson (2000) also propose that this warming was a manifestation of internal variability. By comparing index trends in observations and model simulations, Karoly et al. (2003) conclude that the observed warming from 1900 to 1949 over North America was likely due to natural climate variation, whereas the trend from 1950 to 1999 was consistent with simulations that include anthropogenic forcing from increasing atmospheric greenhouse gases and sulfate aerosols.

However, closure has not been reached on the ultimate cause of the midcentury warm event. The length and spatial coverage of the meteorological records in the Arctic, particularly upper air, do not allow a full analysis of the dynamics of the early/midtwentieth-century multiyear event, whereas results from AOGCMs provide useful information in studying the causes of these anomalies. In addition to model evaluation, we investigate the hypothesis that the midcentury multiyear warm event is based on intrinsic atmospheric variability amplified by internal Arctic feedback processes. Thus, we do not require the models to have a year-to-year correspondence to data, but the models should be able to replicate similar multiyear events.

A suite of scenario simulations was conducted with coupled AOGCMs by modeling groups worldwide for the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4). Section 2 summarizes these models and their ensemble runs together with the observed datasets used in the present study. Section 3 analyzes the simulation results for the Arctic in recent decades and in the early part of the twentieth century, with emphasis on the midcentury warm anomalies, based on two scenarios: the twentieth-century climate in coupled models (20C3M) and the corresponding control runs (PIcntrl). Spatial distributions of temperature anomalies from a subgroup of models are discussed in section 4, followed by the conclusions.

2. Coupled atmosphere–ocean models and observation data

For IPCC AR4, modeling groups provided simulations under several scenarios, covering the late nineteenth through twentieth centuries and projections for the twenty-first through twenty-third centuries. These state-of-the-art model outputs are archived by the Program for Climate Model Diagnosis and Intercomparison (PCMDI) at Lawrence Livermore National Laboratory (LLNL). In this study we analyze the 20C3M simulation, which includes 63 realizations produced by 20 models. The 20C3M ensemble runs are initialized from their corresponding control runs (PIcntrl), which have neither natural (solar, volcanic aerosol) nor anthropogenic external forcing, and then continued with prescribed anthropogenic and, in many cases, natural external forcings based on observations for the twentieth century. Global energy-related and industrial CO2 emissions were relatively low in the first half of the twentieth century; pronounced increases occur after the 1950s (Nakicenovic and Swart 2000, their Fig. 1–3). The resultant CO2 concentration was below 310 ppm before 1950 and then gradually increased to 368 ppm by the year 2000 (Watson et al. 2001). This set of model runs represents the various groups’ best effort to simulate the twentieth-century climate.

Table 1 briefly summarizes the features of the models that were contributed to the 20C3M intercomparison project, including their simulation length, number of realizations, and length of control runs (PIcntrl). A comparison of Table 1 with Table 8.1 from the IPCC Third Assessment Report (TAR: Houghton et al. 2001) shows that both the horizontal and vertical resolution of AOGCMs have improved since TAR. The spectral resolution of the atmosphere GCM was R15 to T47 (only 1) in TAR. Among the 20 models for AR4, 13 are spectral, and the remaining 7 are grid point. The lowest spectral resolution is T42, and the highest resolution is T106, close to 1° in longitude and latitude {Model for Interdisciplinary Research on Climate 3.2, high-resolution version [MIROC3.2(hires)]}. The gridpoint models have about the same spatial resolutions as in TAR. The vertical resolution ranges from 13 levels [Goddard Institute for Space Studies Model E-R (GISS-ER)] to 56 [MIROC3.2(hires)], with 14 models having more than 20 vertical levels. For TAR, 27 of 31 models had fewer than 20 vertical levels.

Table 1.

The spatial resolution of coupled atmosphere–ocean models, their analyzed ensemble runs, and length of control runs.

Studies of temperature change over land areas based on the meteorological station network are routinely made by groups such as the University of East Anglia (UEA), United Kingdom (Jones et al. 1982, 1999), GISS (Hansen and Lebedeff 1987; Hansen et al. 1999), and the National Climatic Data Center (NCDC) (Peterson and Vose 1997; Quayle et al. 1999). Because the groups use different methods for handling data issues, such as incomplete spatial and temporal coverage and urban influences, the magnitudes of the derived data fields vary to some degree. The Arctic region is especially sensitive because few stations are available and the length of the records varies considerably. The variance-adjusted 5° × 5° land SAT anomalies [Climate Research Unit temperature dataset version 2 (CRUTEM2v)] are widely used; however, a relatively large portion of the high Arctic is data void. Using an “anomaly” approach that attempts to maximize available station data in space and time, the group from the Climate Research Unit (CRU) of UEA constructed a dataset with mean monthly SAT at 0.5° horizontal grid resolution (New et al. 1999, 2000). The anomaly approach first defines the fields of monthly climate anomalies relative to the mean of a standard normal period (in this case, 1961–90). The anomaly grids are then combined with the high-resolution mean climatology to arrive at grids of monthly surface climate over the 100-yr period. This monthly SAT dataset (CRUTS2.0) covers the global land surface (excluding Antarctica) for 1901–2000. Our previous comparisons of this dataset with observed records show encouraging results for the Arctic, particularly after 1930 with increased station coverage (Wang and Overland 2004).

Figure 1 shows the comparison of the winter (November–March) averaged land SAT (LSAT) anomaly time series for the Arctic north of 60°N (solid), midlatitude (45°–60°N) (dashed), and global (dotted) based on CRUTS2.0. We will refer to “winter” as this 5-month average. The positive anomalies in the global mean SAT in recent decades are consequences of the warming in both the Arctic and midlatitude; however, polar amplification is more significant during the midcentury from late 1920s to early 1940s (Johannessen et al. 2004) when a 1901–80 base period is used. For comparison, the winter LSAT anomalies for the Arctic based on CRUTEM2v are also shown (dash–dotted line) in Fig. 1. Both datasets indicate prolonged Arctic-wide warm anomalies of more than 0.7°C at midcentury. They both show a “twin peak” structure in the midcentury warm anomalies, though with a slightly different amplitude. After 1950 there is little difference between the two datasets. The cold period of the 1960s–70s is distinct, and the warm anomalies since the 1980s show an upward trend. In the remainder of this study we focus on the Arctic, and contrast the two periods of warm anomalies as simulated by the AR4 climate models.
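The time series processing described above (a November–March winter average, anomalies relative to a 1901–80 base period, and a 5-yr running mean) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, a winter is labeled by the year of its January, and the running mean is assumed to be centered.

```python
def winter_means(monthly):
    """Average Nov-Mar values into one number per winter.

    `monthly` maps (year, month) -> temperature. A winter is labeled by the
    year of its January (an assumed convention), so winter 1940 covers
    Nov 1939 through Mar 1940. Winters with a missing month are skipped.
    """
    years = sorted({y for (y, _m) in monthly})
    out = {}
    for y in years:
        keys = [(y - 1, 11), (y - 1, 12), (y, 1), (y, 2), (y, 3)]
        if all(k in monthly for k in keys):
            out[y] = sum(monthly[k] for k in keys) / len(keys)
    return out


def anomalies(series, base_start=1901, base_end=1980):
    """Subtract the base-period mean from a {year: value} series."""
    base = [v for y, v in series.items() if base_start <= y <= base_end]
    base_mean = sum(base) / len(base)
    return {y: v - base_mean for y, v in series.items()}


def running_mean(series, window=5):
    """Centered running mean; years without a full window are dropped."""
    half = window // 2
    out = {}
    for y in series:
        span = [series.get(y + k) for k in range(-half, half + 1)]
        if None not in span:
            out[y] = sum(span) / window
    return out
```

With a monthly record that starts in January 1901, the first complete winter under this labeling is 1902, so the base-period mean is effectively taken over the winters of 1902–80.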

Fig. 1.

Winter land surface air temperature (LSAT) anomalies averaged over the Arctic (60°–90°N) (thick solid line), midlatitude (45°–60°N) (dashed), and globe (dotted) based on CRUTS2.0 in units of °C. Also shown are the Arctic averages based on the CRUTEM2v dataset (dash–dotted). The anomalies are relative to a 1901–80 base period. All curves are smoothed with a 5-yr running mean.

3. Simulations from climate models

Time series of winter-averaged LSAT for all model ensemble members and observations (thick yellow line) over the Arctic land area (60°–90°N) are shown in Fig. 2. To be consistent among the models, and to avoid the impact of late-twentieth-century warming, all the anomalies are calculated relative to the 1901–80 period mean in each realization. It is apparent that almost all model realizations are able to reproduce positive temperature anomalies in the last two decades. Some of the realizations have relatively large interannual to decadal variability, while others are less variable. All three runs from the Flexible Global Ocean–Atmosphere–Land System Model gridpoint version 1.0 (FGOALS-g1.0; magenta thin lines with open triangles in Fig. 2) started from relatively warm states, contrary to simulations from other models and observations. The sea ice simulation by this model apparently reflects an inappropriate initialization for simulating the climate of the twentieth century (Zhang and Walsh 2006). Another explanation is that the model is still in a nonequilibrium state (Y. Yu 2005, IPCC workshop, personal communication). Because of this, the results from FGOALS-g1.0 are excluded from the statistics and discussions in the next sections.
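An Arctic-wide land average on a regular latitude–longitude grid is commonly formed by weighting each cell by the cosine of its latitude, so that the shrinking cell area toward the pole is accounted for. The text does not state the weighting used, so the sketch below is an assumption; the function name and the `(latitude, value, is_land)` cell layout are hypothetical.

```python
from math import cos, radians


def area_weighted_land_mean(cells, lat_min=60.0, lat_max=90.0):
    """Cosine-latitude weighted mean over land cells in a latitude band.

    `cells` is an iterable of (lat_center_deg, value, is_land) tuples.
    Cells with value None (missing data) are ignored. The cos(lat)
    weighting is an assumed convention, not taken from the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, value, is_land in cells:
        if not is_land or value is None:
            continue
        if lat_min <= lat <= lat_max:
            w = cos(radians(lat))
            weighted_sum += w * value
            weight_total += w
    if weight_total == 0.0:
        raise ValueError("no valid land cells in the requested band")
    return weighted_sum / weight_total
```

Cells near 60°N carry far more weight than cells near the pole, so the regional mean is dominated by the southern part of the domain where most of the land lies.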

Fig. 2.

Time series of LSAT anomalies over the Arctic (60°–90°N) based on 63 realizations from 20 models investigated in their 20C3M simulations. The observed series based on CRUTS2.0 is shown by a thick orange line. The anomalies are relative to the mean of 1901–80. All curves are smoothed with 5-yr running mean, in units of °C.

a. Late-twentieth-century warm anomalies

Stott et al. (2000) demonstrate that global mean surface air temperature changes since 1979 have contributions from both natural and anthropogenic factors based on their Third Hadley Centre Coupled Ocean–Atmosphere General Circulation Model (HadCM3) simulations. Over the Arctic, the majority of the ensemble members in the 20C3M simulations show warm anomalies during the last two decades that are comparable with the observed (Fig. 2). Figure 3 (top) displays the averaged LSAT anomalies for the 1979–99 period for all ensemble members from 19 models. The period 1979–99 is chosen because nearly half of the 20C3M simulations (27 runs out of 63) ended in December 1999 (Table 1). Although there are differences among the models and among their realizations, all ensemble members from all models show positive anomalies for the last two decades to various degrees. The smallest amplitudes are from ensemble members of three models: the Geophysical Fluid Dynamics Laboratory Climate Model version 2.1 (GFDL-CM2.1; bars 22–24), the GISS-EH (bars 27–31), and the GISS-ER (bars 32–40). In addition, there are four realizations [ECHAM5/Max Planck Institute Ocean Model (MPI-OM) run 2, GFDL-CM2.0 run 3, ECHAM5 Hamburg Ocean Primitive Equation (ECHO-G) run 2, and the single run from the Met Office (UKMO-HadCM3)] in which the amplitudes of the averaged anomalies in these two decades are less than two-thirds of the observed value. On the other hand, there are 10 realizations in which the warm anomalies are one-third larger than observed. Among these, three are from the Community Climate System Model version 3 (CCSM3: bars 3, 5, and 8), two from the Meteorological Research Institute Coupled GCM version 2.3.2 (MRI-CGCM2.3.2: bars 53 and 54), and one realization each from the Commonwealth Scientific and Industrial Research Organisation Mark version 3.0 (CSIRO-Mk3.0: bar 14) and the Parallel Climate Model (PCM: bar 57). The remaining three are the single realizations provided by the Canadian Centre for Climate Modelling and Analysis Coupled GCM version 3.1 (CCCma-CGCM3.1-T47: bar 10), the Centre National de Recherches Météorologiques Coupled Global Climate Model version 3 (CNRM-CM3: bar 12), and MIROC3.2(hires) (bar 43).

Fig. 3.

Mean winter arctic LSAT anomalies for the 1979–99 period from observation (first bar in each panel, light shaded) and model simulations in the 20C3M scenario. (top) Individual realizations of each model (bars 2–61), and (bottom) the ensemble mean for models when more than one realization is provided, or the only realization available. The confidence limits are two standard deviations derived from the detrended control run time series. Due to an abrupt change in the GISS-EH control run, the confidence limit is not shown. The last bar in the bottom panel shows the ensemble mean from all runs of all models.

The bottom panel in Fig. 3 shows the model ensemble means of the anomalies for the last two decades. Confidence limits are estimated as ± two standard deviations of the detrended time series from the corresponding control run (PIcntrl) of each model. That all but two ensemble mean anomalies are different from zero, with the lower bounds of the confidence limits lying above the zero line, suggests that the warm anomalies in these two decades are beyond the range of natural variability. In other words, because differences caused by intrinsic variability essentially cancel each other out in the ensemble mean, the late-twentieth-century warm anomalies could be a consequence of long-term change in external forcing. Nine models show anomalies equal to or larger than the observed amplitude [CCSM3, CGCM3.1-T47, CGCM3.1-T63, CNRM-CM3, CSIRO-Mk3.0, the Institute of Numerical Mathematics Coupled Model version 3.0 (INM-CM3.0), MIROC3.2(hires), MRI-CGCM2.3.2, and PCM]. For another seven models the ensemble means are at least two-thirds of the observed value: ECHAM5/MPI-OM, GFDL-CM2.0, the Goddard Institute for Space Studies Atmosphere–Ocean Model (GISS-AOM), the L’Institut Pierre-Simon Laplace Coupled Model version 4 (IPSL-CM4), the Model for Interdisciplinary Research on Climate 3.2, medium-resolution version [MIROC3.2(medres)], ECHO-G, and UKMO-HadCM3. Ensemble means from the three models (GFDL-CM2.1, GISS-EH, and GISS-ER) that have small warm anomaly amplitudes in every realization are less than two-thirds of the observed. As a group, the multimodel mean of averaged winter Arctic LSAT anomalies is 0.62°C (rightmost bar in bottom panel in Fig. 3) for 1979–99, which is close to the observed value of 0.64°C (leftmost bar) from CRUTS2.0. This is encouraging. However, between-model differences are not small.
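The confidence limits described above (± two standard deviations of the detrended control run) can be sketched as follows. This is an illustration only: the function names are hypothetical, the trend is assumed to be linear and removed by least squares, and the control series is used at whatever temporal smoothing the caller supplies, since the text does not specify these details.

```python
from statistics import mean, stdev


def detrend(values):
    """Remove a least-squares linear trend (assumed form of detrending)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    sxy = sum((i - t_mean) * (v - v_mean) for i, v in enumerate(values))
    slope = sxy / sxx
    return [v - (v_mean + slope * (i - t_mean)) for i, v in enumerate(values)]


def confidence_limits(ensemble_mean, control_run, n_sigma=2.0):
    """Return (lower, upper) = ensemble_mean -/+ n_sigma * control-run std."""
    sigma = stdev(detrend(control_run))
    return ensemble_mean - n_sigma * sigma, ensemble_mean + n_sigma * sigma


def outside_control_variability(ensemble_mean, control_run):
    """True when the confidence interval does not include zero."""
    lower, upper = confidence_limits(ensemble_mean, control_run)
    return lower > 0.0 or upper < 0.0
```

An ensemble mean anomaly whose lower limit stays above zero corresponds to the case described in the text, where the warm anomaly lies beyond the range of the model's unforced variability.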

b. Midcentury warm anomalies

As discussed in section 2, there were prolonged warm anomalies of more than 0.7°C in the Arctic in the midcentury from the late 1920s to the 1940s. The decadal means of individual realizations from the IPCC models for 1939–49 display large variability in magnitude and sign (Fig. 4): among 60 realizations, 30 of the decadal mean SAT anomalies are positive, 21 are negative, and another 9 are near zero. None of the decadal mean anomalies from any model is greater than the observed value. In contrast to the warm anomalies simulated by models for the last two decades (Fig. 3), the large discrepancies with observations for the 1940s among the models and among their ensemble members indicate the potential for large internal variability within the models. It is interesting to note that, when multiple realizations are provided for a single model, at least one realization has a decadal mean anomaly of opposite sign to the other ensemble members (except for CSIRO-Mk3.0, which generated negative anomalies for all three realizations in this decade).

Fig. 4.

Decadal mean winter LSAT anomalies for the 1939–49 period based on individual realizations from each model over the region of 60°–90°N. The first bar on the left is the observed mean (CRUTS2.0). Units are in °C.

Based on Fig. 4, our hypothesis is that intrinsic natural variability is the main cause behind the large anomalies in the early/midpart of the century. Thus, the models should not necessarily replicate the year-to-year changes in the observations, but should produce events with the same type of multiyear behavior as the observations. Separate panels in Fig. 5 display the model simulated and observed (thick black solid line) winter LSAT over the Arctic from the late nineteenth to twentieth century. Each panel shows the ensemble members from one model, and all time series are presented with a 5-yr running mean. All realizations are different. For example, four of eight runs from CCSM3 (top-left panel of Fig. 5) have relatively sizable amplitudes of anomalies during the midcentury (red, blue, yellow, and black lines), with run 1 (thin red line) matching the observations in both amplitude and timing. One realization from GFDL-CM2.0 (blue line) matches the timing and amplitude of the observed time series, while the other two (red and green lines) have an amplitude similar to the warm anomaly in the 1950s. The amplitudes of anomalies from GFDL-CM2.1 are slightly weaker than the observed, but one realization has a long duration (red line). The two Canadian models CGCM3.1-T47 and CGCM3.1-T63 present similar results: both show twin peak warm anomalies in the midcentury with amplitude weaker than observations. Two realizations from CSIRO-Mk3.0 (red and blue lines) produce warm anomalies in the 1930s, which last about 10 years. ECHAM5/MPI-OM has one realization (red line) in which the warm anomalies are close to those observed around the 1940s, while other realizations show weaker amplitudes at later times. Similar situations are seen in MIROC3.2(medres). The warm anomalies from all PCM and ECHO-G realizations show comparable amplitude, but are not synchronized with observations. This is also true for the single realizations from CNRM-CM3 (late), INM-CM3.0 (late), and UKMO-HadCM3 (early). Two GISS models (GISS-EH and GISS-ER) have little variability through their entire runs, even at the end of the twentieth century. Another two models (GISS-AOM and IPSL-CM4) also have a rather flat curve for the first 100 years until the end of the twentieth century. Large warm anomalies are simulated by the high-resolution model developed by Japan [MIROC3.2(hires)] at the end of the twentieth century, but warm anomalies in the midcentury are weak. In many cases the warm anomalies simulated by models have comparable amplitude to the observed midcentury warm events, but with a shorter duration.

Fig. 5.

Winter LSAT anomalies over the Arctic for individual realizations of each model. Thick solid black line is the observed time series based on CRUTS2.0. (left), (middle) The models with natural forcing included, (right) the models without natural forcing in their simulations. All the time series are smoothed with 5-yr running mean. Units are in °C.

To provide a quantified estimate of model performance, we assess the models’ ability to reproduce a midcentury-type warm anomaly by applying the following criterion, labeled 2/3CRU. A 5-yr running window is applied to the simulated winter LSAT time series. A decadal mean is then calculated around the maximum value found in each realization at any time during the 50-yr period 1911–60. If the decadal averaged anomaly equals or exceeds two-thirds (2/3) of the observed decadal mean (i.e., a threshold of 0.36°C), then it is considered to be a comparable simulation. Although the 2/3CRU criterion is an arbitrary selection, it does provide a quantitative measure. Because we are interested in decadal and longer phenomena, continuous positive temperature anomalies for 10 years are a minimum requirement for examining warm events under the 2/3CRU criterion. The decadal mean of the SAT anomalies for each realization is shown in Fig. 6. Compared with Fig. 3, where 21 realizations are found to be equal to or larger than the observed at the end of the twentieth century, only 3 realizations (one each from three models: CCSM3, ECHO-G, and PCM) produced warm anomalies larger than the observed in the midcentury. Another 14 realizations from 8 models produced warm anomaly amplitudes of at least two-thirds of the observed value: CCSM3, CSIRO-Mk3.0, ECHAM5/MPI-OM, GFDL-CM2.0, GFDL-CM2.1, INM-CM3.0, ECHO-G, and PCM. Over 60% of the realizations (37 out of 60) do not produce midcentury warm anomalies greater than half of the observed CRUTS2.0 value (0.27°C). One run [run 2 from MIROC3.2(medres), bar 45] missed the 2/3CRU cutoff line by a small fraction. A summary of the success rate of the twentieth-century simulations (20C3M) under this criterion is provided in Table 2 (third column).
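The 2/3CRU procedure described above can be sketched as follows. This is a rough illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the decade is taken as the 10 years from 5 years before the peak to 4 years after it, the decadal mean is computed from the unsmoothed anomalies, and the requirement of continuous positive anomalies is not checked. The default threshold of 0.36°C is the value quoted in the text.

```python
def running_mean(series, window=5):
    """Centered running mean of a {year: value} series (full windows only)."""
    half = window // 2
    out = {}
    for y in series:
        span = [series.get(y + k) for k in range(-half, half + 1)]
        if None not in span:
            out[y] = sum(span) / window
    return out


def peak_decadal_mean(anoms, start=1911, end=1960, half_width=5):
    """Locate the smoothed maximum in [start, end]; average the decade around it.

    Returns (peak_year, decadal_mean). The decade spans peak_year - half_width
    through peak_year + half_width - 1 (an assumed centering).
    """
    smooth = running_mean(anoms, 5)
    candidates = {y: v for y, v in smooth.items() if start <= y <= end}
    peak_year = max(candidates, key=candidates.get)
    decade = [anoms[y]
              for y in range(peak_year - half_width, peak_year + half_width)
              if y in anoms]
    return peak_year, sum(decade) / len(decade)


def passes_two_thirds_cru(anoms, threshold=0.36):
    """True if the peak decadal mean reaches the 2/3CRU threshold (deg C)."""
    _peak_year, decadal_mean = peak_decadal_mean(anoms)
    return decadal_mean >= threshold
```

Because the peak is searched over the whole 1911–60 window, a realization can pass without matching the observed timing, which is consistent with the hypothesis that the event reflects intrinsic variability.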

Fig. 6.

LSAT anomalies averaged over a decade that is centered on the peak value detected during the 1910–60 period in the 20C3M simulation. The thin black line indicates a value that is two-thirds of the observed amplitude. The first gray bar is based on CRUTS2.0. Units are in °C.

Table 2.

Statistics of model runs that produced two-thirds the amplitude of the observed winter LSAT anomalies over 60°–90°N in midcentury and variance analysis of their corresponding control runs. Bold font indicates standard deviations that could not be excluded at the 90% confidence limit of the observed time series based on the χ² test. Yellow highlights the models passing both the 2/3CRU test in 20C3M simulations and the variance test in the control run, blue highlights the models passing the variance test but not the 2/3CRU test, and pink highlights the model passing the 2/3CRU test in both the 20C3M simulation and control runs, but not the variance test in control runs. Green highlights the model passing the test only in 20C3M.

A second test is to compare the variance of the control runs with the variance from observations. While there is almost no “error” in estimating the control run variance because of the length of the runs, one can consider estimated confidence limits of the standard deviation from observations. The standard deviation of CRUTS2.0 on an interannual time scale is computed from the detrended time series for 1902–59. The decadal and multidecadal scales are represented by the detrended time series with a 5-yr and 15-yr running mean, respectively. A simple test is whether the model standard deviation is less than the lower bound of the 90% confidence interval of the observed standard deviation, based on χ² estimates. The ratios of the model/observed standard deviations on time scales from interannual to multidecadal are shown in Fig. 7. The effective sample size is estimated based on a formula by Santer et al. (2000). As a result, the 90% normalized confidence limits are (0.83, 1.27), (0.58, 4.42), and (0.51, 15.95) for the three time scales. Nine models [ECHAM5/MPI-OM, GFDL-CM2.0, GFDL-CM2.1, GISS-AOM, GISS-ER, IPSL-CM4, MIROC3.2(hires), MIROC3.2(medres), and MRI-CGCM2.3.2] lie outside the range of the observed variability on decadal to interdecadal time scales (Figs. 7b and 7c). The GISS-EH model is excluded due to a large abrupt change in the time series of its control run. An autocorrelation analysis further revealed that there is no preferred time scale in any of the model control runs.
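The variance test can be sketched with the standard χ² confidence interval for a standard deviation, normalized by the sample value: the limits are sqrt(ν/χ²(1−α/2, ν)) and sqrt(ν/χ²(α/2, ν)) for ν degrees of freedom. The sketch below makes several assumptions that are not in the text: the effective sample size uses the lag-1 autocorrelation form n(1 − r1)/(1 + r1) commonly attributed to Santer et al. (2000), the χ² quantiles use the Wilson–Hilferty approximation so that only the standard library is needed, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a sequence."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def effective_sample_size(x):
    """n_e = n (1 - r1) / (1 + r1); assumed form of the adjustment."""
    r1 = lag1_autocorrelation(x)
    return len(x) * (1 - r1) / (1 + r1)


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * sqrt(c)) ** 3


def std_ratio_limits(dof, level=0.90):
    """Normalized (lower, upper) confidence limits on a standard deviation."""
    alpha = 1.0 - level
    lower = sqrt(dof / chi2_quantile(1.0 - alpha / 2.0, dof))
    upper = sqrt(dof / chi2_quantile(alpha / 2.0, dof))
    return lower, upper


def passes_variance_test(model_std, observed_std, dof, level=0.90):
    """Fail only when the model std falls below the lower limit."""
    lower, _upper = std_ratio_limits(dof, level)
    return model_std / observed_std >= lower
```

As an illustration, `std_ratio_limits(30)` gives limits of roughly (0.83, 1.27). The choice of 30 degrees of freedom is hypothetical and is not a number stated in the text, although the resulting limits are close to the interannual values quoted above. The limits widen quickly as smoothing reduces the effective sample size, which is why the upper bounds on the longer time scales are so large.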

Fig. 7.

The ratio of standard deviation of model control runs to the observed (CRUTS2.0) on (a) interannual, (b) decadal, and (c) interdecadal time scales. GISS-EH is excluded from the figure due to a large abrupt change found in its control run. All standard deviations are calculated after the time series is detrended and a (b) 5-yr and (c) 15-yr running mean applied, respectively. The dashed line indicates the lower range of the 90% confidence limit on the standard deviation normalized by CRUTS2.0.

Model standard deviations from their control runs are listed in Table 2 (last three columns), with values within the confidence limit range of the observations shown in bold. Five models (CCSM3, CSIRO-Mk3.0, INM-CM3.0, ECHO-G, and PCM) passed both the variance test in control runs and the 2/3CRU criterion in 20C3M simulations (highlighted in yellow). Another four models (CGCM3.1-T47, CGCM3.1-T63, CNRM-CM3, and UKMO-HadCM3) also passed the 90% confidence limit in their control runs, indicating that these models may have enough intrinsic variability from the interannual to interdecadal time scale, yet they fail the 2/3CRU criterion (highlighted in blue) in their single realization of the 20C3M simulations. It is therefore important to have multiple ensemble runs to evaluate a model’s performance. Three models (ECHAM5/MPI-OM, GFDL-CM2.0, and GFDL-CM2.1) show quite reasonable amplitude and duration of the midcentury warm events but do not have enough variance in their control runs based on the variance test. The reason behind this is unclear.

The warm event criterion (2/3CRU) was also applied to the control runs based on 100-yr segments. As the length of the control runs ranges from 100 to 500 years among the models, the number of truncated time series differs from model to model. The “yes” in column 4 of Table 2 indicates that at least one of the truncated control run time series passes the 2/3CRU criterion. All nine models that passed the variance test for decadal and interdecadal time scales also passed the 2/3CRU criterion. The 2/3CRU criterion shows good correspondence between the control runs and the 20C3M simulations, with the exception of the single-run simulations. The MIROC3.2(medres) model fails to reproduce the midcentury warm events in all three 20C3M simulations, even though it passed the 2/3CRU criterion in its control runs. However, this model shows enough variance only on the interannual time scale, not on longer time scales. Two more models (ECHAM5/MPI-OM and GFDL-CM2.0) passed the 2/3CRU criterion in the control run without passing the variance test on any time scale.
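The segment-based check described above can be sketched as follows. This is only an amplitude test under stated assumptions: anomalies are taken relative to each segment's own mean, a 5-yr running mean is applied, segments do not overlap, and a short remainder is dropped. The full definition of the 2/3CRU criterion is given earlier in the paper and may include conditions not modeled here. The names `observed_peak` and all functions are hypothetical.

```python
def running_mean(series, window):
    """Running mean over full windows only."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def split_into_segments(series, length=100):
    """Non-overlapping segments of the given length; a short remainder is dropped."""
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, length)]

def segment_passes(segment, observed_peak, window=5):
    """True if the smoothed anomaly peak reaches two-thirds of the observed peak."""
    mean = sum(segment) / len(segment)
    anomalies = [v - mean for v in segment]
    smoothed = running_mean(anomalies, window)
    return max(smoothed) >= (2.0 / 3.0) * observed_peak

def control_run_passes(series, observed_peak, length=100):
    """'yes' in the sense of Table 2: at least one segment passes."""
    return any(segment_passes(seg, observed_peak)
               for seg in split_into_segments(series, length))
```

Because longer control runs contribute more segments, they have more opportunities to pass, which is one reason the number of truncated series per model matters when comparing results.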

In summary, seven models neither have enough variance nor produce anomalies of a magnitude comparable to the midcentury warm event. These are GISS-AOM, GISS-EH, GISS-ER, IPSL-CM4, MIROC3.2(hires), MIROC3.2(medres), and MRI-CGCM2.3.2. The FGOALS-g1.0 model has an unrealistic initial condition in its 20C3M simulations and a large abrupt change in its control run, and is therefore excluded.

Rather than calculating across-model means when assessing projections for future climate, one should concentrate on those models that simulate reasonable results in the past. Based on the present study, we suggest a subgroup of 12 models, for further review, that passed either the criterion based on their 20C3M simulations or the variance test in their control runs. Five models are of special note, passing both criteria: CCSM3, CSIRO-Mk3.0, INM-CM3.0, ECHO-G, and PCM. Seven other models have, at best, limited applicability for projections of change relative to natural variability: CGCM3.1-T47, CGCM3.1-T63, CNRM-CM3, and UKMO-HadCM3 (which passed the standard deviation test), and ECHAM5/MPI-OM, GFDL-CM2.0, and GFDL-CM2.1 (which passed the 2/3CRU test). Figure 8 (top panel) shows the time series from sixteen 20C3M realizations from eight models that replicated a reasonable magnitude compared to the observed midcentury warm event. Almost all the realizations that replicate the midcentury warm anomaly amplitudes at random timing also produce a reasonable magnitude of warm anomalies at the end of the twentieth century. The bottom panel in Fig. 8 shows the truncated 100-yr time series from control runs from the nine models that passed the variance test. The maximum anomaly of each of these control runs is lined up at year 1937 with the CRUTS2.0 analysis. The bottom panel shows that the midcentury warm anomalies in the models can be reproduced under conditions of no external forcing, whereas the late-twentieth-century warming cannot.
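The subgroup counts in this paragraph follow from simple set operations on the two pass lists. The sketch below uses the model names exactly as given in the text; the variable names are illustrative only.

```python
# Models that passed the 2/3CRU criterion in their 20C3M simulations.
passed_20c3m = {
    "CCSM3", "CSIRO-Mk3.0", "INM-CM3.0", "ECHO-G", "PCM",
    "ECHAM5/MPI-OM", "GFDL-CM2.0", "GFDL-CM2.1",
}

# Models that passed the variance test in their control runs.
passed_variance = {
    "CCSM3", "CSIRO-Mk3.0", "INM-CM3.0", "ECHO-G", "PCM",
    "CGCM3.1-T47", "CGCM3.1-T63", "CNRM-CM3", "UKMO-HadCM3",
}

both = passed_20c3m & passed_variance      # models of special note
either = passed_20c3m | passed_variance    # subgroup suggested for further review
only_one = either - both                   # models of limited applicability

print(len(both), len(either), len(only_one))
print(sorted(only_one))
```

The intersection has five members, the union twelve, and the remainder seven, in agreement with the counts stated above.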

Fig. 8.

(top) Winter LSAT anomalies averaged over the Arctic based on model ensemble runs that passed the proposed 2/3CRU criterion in their 20C3M simulations. (bottom) The truncated 100-yr-long time series from control runs of the nine models that pass the variance test. All time series are smoothed with a 5-yr running mean. Models with natural forcing are shown in solid lines, while those without are shown in dashed lines. All models show an upward warming trend in the Arctic for the last two decades in the 20C3M scenario, while none is shown in the control runs. The range of variation during 1911–60 is about the same in the 20C3M simulations as in the control runs.

Because external forcing (either natural or anthropogenic) is not imposed on the control runs, we consider variability in Fig. 8b to be representative of intrinsic climate variability, including internal feedback processes from atmosphere–sea ice–ocean interactions. Although more than half of the twentieth-century simulations have natural forcing, that is, solar and volcanic aerosols (as shown in the last column of Table 1), comparison of the magnitude of SAT anomalies in the Arctic for twentieth-century simulations before 1980 with those of the control runs, and the random timing of midcentury warm events in the 20C3M simulations, support the conclusion that intrinsic variability is a first-order effect in Arctic climate. The similarity of midcentury events in the 20C3M simulations compared to the control runs and the qualitatively different behavior of the time series at the end of the twentieth century are evidence that the midcentury Arctic warming event in the observational data was due to different causes from those of the late-twentieth-century warming.

4. The spatial distribution of midcentury warm anomalies

Based on observations from 59 Arctic stations, Overland et al. (2004) found that warm anomalies in the midcentury were regional and episodic, that is, interannual to multiyear. An example of the spatial distribution of 5-yr-averaged winter temperature anomalies from CRUTS2.0 (Fig. 9, top left) shows that in the mid-1930s the warm anomalies were regional in extent: Eurasia and Greenland were dominated by warm anomalies with the largest amplitude in northern Scandinavia, while North America was occupied by cold anomalies. Around the 1940s (Fig. 9, bottom left), the largest-amplitude warm anomalies were found over central Alaska and east Siberia. During both periods there was an out-of-phase (seesaw) pattern between the eastern and western Arctic.

Fig. 9.

Spatial distribution of LSAT anomalies based on (left) CRUTS2.0 and (right) one realization each from the eight models that pass the 2/3CRU criterion. All patterns have a 5-yr running mean applied. The years selected are around maxima of the warm anomalies during 1911–60 for both the observations and the models. Contour interval is 0.5°C.

The spatial patterns from one realization for each of the eight models identified by the 2/3CRU criterion are shown in Fig. 9. All of them except ECHAM5/MPI-OM (bottom-right panel) display warm anomalies over Alaska. Five models (CSIRO-Mk3.0, ECHAM5/MPI-OM, ECHO-G, INM-CM3.0, and GFDL-CM2.0) produced the seesaw pattern. CCSM3 and both GFDL models produced a pattern similar to the mid-1930s of CRUTS2.0 (top-left panel in Fig. 9) with warm anomalies over the central Eurasian continent and cold anomalies over the central North American continent and along the east coast of Greenland. The ECHAM5/MPI-OM model also produced a similar feature; however, the center is over the continent instead of over Scandinavia. The pattern produced by CSIRO-Mk3.0 is similar to the 1940s observations (bottom-left panel in Fig. 9). The ECHO-G model displays the seesaw pattern, with the amplitudes of the warm anomalies weaker than observed and the position of the cold anomaly center shifted to the east. The PCM model has a wavenumber-1 pattern, but the region of the cold anomalies is smaller than observed. The contrast of observed warm anomalies in 60°–70°N and cold anomalies between 50° and 60°N over Eurasia is not seen in these models. Even though models agree on domain-averaged anomalies, their spatial distributions differ significantly, which also suggests the importance of regional intrinsic variability due to shifts in atmospheric circulation patterns.

5. Conclusions

The mid-twentieth-century warm event in the Arctic is an interesting phenomenon of relevance to current climate change issues. The simulated results by AOGCMs add to the understanding of this event. We are encouraged that all model ensembles for IPCC AR4 portray upward trends to various degrees for the Arctic in the last two decades of the twentieth century. Five models (CCSM3, CSIRO-Mk3.0, INM-CM3.0, ECHO-G, and PCM) reproduce somewhat reasonable amplitudes compared to the midcentury event and have comparable variance to Arctic temperature observations. Three other models (ECHAM5/MPI-OM, GFDL-CM2.0, and GFDL-CM2.1) also reproduced reasonable magnitudes compared to the midcentury warm event, even though the intrinsic variability in the control runs is small. However, none of these models reproduces the sustained duration of the observed midcentury event. Four additional models cannot be excluded based on the variance test of their control runs (CGCM3.1-T47, CGCM3.1-T63, CNRM-CM3, and UKMO-HadCM3), but they fail to reproduce the required magnitude of midcentury warm anomalies in their single realization in the 20C3M simulations. We consider that the eight models [GISS-AOM, GISS-EH, GISS-ER, IPSL-CM4, MIROC3.2(hires), MIROC3.2(medres), MRI-CGCM2.3.2, and FGOALS-g1.0] that passed neither criterion (magnitude in 20C3M simulations and control run variance) do not have enough intrinsic decadal variability to produce a reasonable magnitude for Arctic warm anomalies. Passing our criteria is not a complete acceptance of the models for climate projections in the Arctic, only an indication that they should be given priority in assessments of projected change relative to natural variability.

The random timing of the midcentury warm anomalies in the model 20C3M simulations, together with the similarity of midcentury events in the 20C3M simulations to the control runs (with neither natural nor anthropogenic external forcing) and the qualitative difference in the behavior of their time series between the early and the late twentieth century, is evidence that the midcentury Arctic warming event in the observational data was due to different causes from those of the late twentieth century. The intrinsic variability of the atmosphere together with the feedbacks between the atmosphere and other components of the climate system (e.g., sea ice, ocean, and land processes) are likely responsible for the observed warm anomalies in the midcentury, as also noted by Bengtsson et al. (2004).

Finally, in IPCC TAR, the ACIA report, and other documents, the projections from the climate models are often the averages from all of the models and their ensemble members. We suggest that the projection of the future climate should be based on a subgroup of models that perform reasonable simulations of the past based on fixed criteria. Here, eight models have serious limitations for near-term Arctic climate predictions (20–50 yr), because they lack the potential interplay of anthropogenic contributions and intrinsic variability. On the other hand, five models show promise in this respect, and another seven might be further considered with reservation. Our results are a step toward constraining the currently scattered projections of the Arctic climate (see, e.g., Symon et al. 2005).

Acknowledgments

We acknowledge the international modeling groups for providing their results for analysis, the PCMDI for collecting and archiving the model data, the JSC/CLIVAR Working Group on Coupled Modelling (WGCM) and their Coupled Model Intercomparison Project (CMIP) and Climate Simulation Panel for organizing the model data analysis activity, and the IPCC WG1 TSU for technical support. The IPCC data archive at LLNL is supported by the Office of Science, U.S. Department of Energy. We thank three anonymous reviewers for their thorough reviews and suggestions, which helped us keep good focus in the discussion. This research is supported by the NOAA/CMEP Project of the Office of Global Programs and the NOAA/Arctic Research Program. Kattsov and Pavlova were supported by the NSF via the IARC (Subaward UAF05-0074 of OPP-0327664). Zhang was supported by the Japan Agency for Marine-Earth Science and Technology. Preparation of this manuscript was supported by the NOAA/Arctic Research Office. This publication is partially completed through the Joint Institute for the Study of the Atmosphere and Ocean (JISAO) under NOAA Cooperative Agreement NA17RJ1232.

REFERENCES

Bengtsson, L., V. A. Semenov, and O. M. Johannessen, 2004: The early-twentieth-century warming in the Arctic—A possible mechanism. J. Climate, 17, 4045–4057.

Broccoli, A. J., K. W. Dixon, T. L. Delworth, and T. R. Knutson, 2003: Twentieth-century temperature and precipitation trends in ensemble climate simulations including natural and anthropogenic forcing. J. Geophys. Res., 108, 4798, doi:10.1029/2003JD003812.

Delworth, T. L., and T. R. Knutson, 2000: Simulation of early 20th century global warming. Science, 287, 2246–2250.

Hansen, J., and S. Lebedeff, 1987: Global trends of measured surface air temperature. J. Geophys. Res., 92, 13345–13372.

Hansen, J., R. Ruedy, J. Glascoe, and M. Sato, 1999: GISS analysis of surface temperature change. J. Geophys. Res., 104, 30997–31022.

Houghton, J. T., Y. Ding, D. J. Griggs, M. Noguer, P. J. van der Linden, X. Dai, K. Maskell, and C. A. Johnson, 2001: Climate Change: The Scientific Basis. Cambridge University Press, 881 pp.

Johannessen, O. M., and Coauthors, 2004: Arctic climate change: Observed and modelled temperature and sea-ice variability. Tellus, 56A, 328–341.

Jones, P. D., T. M. L. Wigley, and P. M. Kelly, 1982: Variations in surface air temperatures. Part I: Northern Hemisphere, 1881–1980. Mon. Wea. Rev., 110, 59–70.

Jones, P. D., M. New, D. E. Parker, S. Martin, and I. G. Rigor, 1999: Surface air temperature and its changes over the past 150 years. Rev. Geophys., 37, 173–199.

Karoly, D., K. Braganza, P. A. Stott, J. M. Arblaster, G. A. Meehl, A. J. Broccoli, and K. W. Dixon, 2003: Detection of a human influence on North American climate. Science, 302, 1200–1203.

Lambert, S. J., and G. J. Boer, 2001: CMIP1 evaluation and intercomparison of coupled climate models. Climate Dyn., 17, 83–106.

Nakicenovic, N., and R. Swart, 2000: Special Report on Emissions Scenarios. Cambridge University Press, 612 pp.

New, M., M. Hulme, and P. Jones, 1999: Representing twentieth-century space–time climate variability. Part I: Development of a 1961–90 mean monthly terrestrial climatology. J. Climate, 12, 829–856.

New, M., M. Hulme, and P. Jones, 2000: Representing twentieth-century space–time climate variability. Part II: Development of 1901–96 monthly grids of terrestrial surface temperature. J. Climate, 13, 2217–2238.

Overland, J. E., and M. Wang, 2005: The Arctic climate paradox: The recent decreases of the Arctic Oscillation. Geophys. Res. Lett., 32, L06701, doi:10.1029/2004GL021752.

Overland, J. E., M. C. Spillane, D. B. Percival, M. Wang, and H. O. Mofjeld, 2004: Seasonal and regional variation of Pan-Arctic air temperature over the instrumental record. J. Climate, 17, 3263–3282.

Peterson, T. C., and R. S. Vose, 1997: An overview of the global historical climatology network temperature database. Bull. Amer. Meteor. Soc., 78, 2837–2849.

Polyakov, I. V., and Coauthors, 2002: Observationally based assessment of polar amplification of global warming. Geophys. Res. Lett., 29, 1878, doi:10.1029/2001GL011111.

Quayle, R. G., T. C. Peterson, A. N. Basist, and C. S. Godfrey, 1999: An operational near-real-time global temperature index. Geophys. Res. Lett., 26, 333–335.

Santer, B. D., T. M. L. Wigley, J. S. Boyle, D. J. Gaffen, J. J. Hnilo, D. Nychka, D. E. Parker, and K. E. Taylor, 2000: Statistical significance of trends and trend differences in layer-average atmospheric temperature time series. J. Geophys. Res., 105, 7337–7356.

Stott, P. A., S. F. B. Tett, G. S. Jones, M. R. Allen, J. F. B. Mitchell, and G. J. Jenkins, 2000: External control of 20th century temperature by natural and anthropogenic forcings. Science, 290, 2133–2137.

Symon, C., L. Arris, and B. Hill, 2005: Arctic Climate Impact Assessment. Cambridge University Press, 1042 pp.

Wang, M., and J. E. Overland, 2004: Detecting Arctic climate change using Köppen climate classification. Climatic Change, 67, 43–62.

Watson, R. T., and Coeditors, 2001: Climate Change 2001: Synthesis Report. Cambridge University Press, 398 pp.

Zhang, X., and J. E. Walsh, 2006: Toward a seasonally ice-covered Arctic Ocean: Scenarios from the IPCC AR4 model simulations. J. Climate, 19, 1730–1747.

Footnotes

Corresponding author address: Muyin Wang, JISAO/PMEL, 7600 Sand Point Way NE, Seattle, WA 98115. Email: muyin.wang@noaa.gov

* Joint Institute for the Study of the Atmosphere and Ocean Contribution Number 1132 and Pacific Marine Environmental Laboratory Contribution Number 2804.