1. Introduction
A key problem in the detection of an anthropogenic signal in today’s climate is estimating the levels of variability one might expect to occur naturally. It is only when the anthropogenic signal differs significantly from that expected from natural variability that claims of detection can be taken seriously.
Unfortunately, the instrumental record is relatively short and so provides only a few realizations of natural variability with timescales of order decades, the timescales at which detection is most likely (e.g., Barnett et al. 1998; Hegerl et al. 1996; Santer et al. 1995). Given this situation, recent detection studies (Hegerl et al. 1997; Hasselmann et al. 1995; Santer et al. 1996) have taken their estimates of natural variability from long control runs of coupled global climate models (CGCMs). This would be a valid procedure if the internally generated variability in the models was a realistic estimate of natural variability. Whether this is true or not is at the moment uncertain (Barnett et al. 1996).
Another aspect of using CGCMs in control run mode to estimate natural variability is the degree to which their estimates are similar. Clearly, if comparable models give widely different estimates of natural variability, then the question arises as to which of them should be believed. The purpose of this paper is to intercompare estimates of internal variability in the near-surface air temperature in the control runs of 11 different CGCMs. The next section briefly describes the models and data analysis methods. Subsequent sections intercompare the variability of the different models, examine the levels of intramodel variability, and contrast variability from the instrumental record with that obtained from the models. Finally, the projection of an anticipated anthropogenic signal onto the variability most common to all the models is carried out.
2. Methods and data
a. Models and data
The models used in this study are listed in Table 1, as are their general properties. The appropriate references documenting each model may be found on the World Wide Web (www.pcmdi.llnl.gov/modeldoc/cmip/). The model data and their description are part of the CMIP (Coupled Model Intercomparison Project) described by Meehl et al. (1997). The project is being conducted under the Program for Climate Model Diagnosis and Intercomparison (PCMDI) (Gates 1992). The data were provided by Dr. C. Covey of PCMDI. We selected only models whose control run length was at least 100 yr. For longer runs, we used the first 100 yr of the simulation. Examples of the raw near-surface temperature from four of the models are displayed in Fig. 1 for a midlatitude and tropical site over the Pacific Ocean. The difference in long-term means at these locations between the models is large. These systematic differences are removed from the data prior to their use in this study (see below).
The model data, yearly averaged near-surface temperatures, were projected via linear interpolation to a common T42 grid. Note that the models generally have coarser resolution than T42 and the definition of “near-surface” temperature varies somewhat from model to model. A number of studies, many informal, suggest that higher resolution versions of the models used here will perform better, but such runs were not available for analysis. The gridded time series were trimmed in time to be exactly 100 yr long. One of the models required an additional 2 yr of data to come up to this length, and that was accomplished by adding its climatological mean field in years 99 and 100. Finally, the model data poleward of 40°S were omitted so that the mask for the data would match exactly the observed temperature data mask (below). The long-term mean for each model was subtracted from that model’s data on a gridpoint-by-gridpoint basis, so that the long-term mean over 100 yr at any model grid point was zero.
The observed temperature data used in this study were the combined land–sea set from the United Kingdom Meteorological Office and the Climatic Research Unit at the University of East Anglia, United Kingdom, and are referred to as the Hadley Centre’s observed Global Sea-Ice and Sea Surface Temperature Dataset (GISST) (Jones and Briffa 1992; Rayner et al. 1996). The ocean data in this set come mainly from ship weather reports and have been heavily processed, including corrections for suspected instrumental biases. By limiting the data to the region north of 40°S it was possible to obtain an observational dataset covering a 50-yr time span from 1946 to 1995. Some chronic holes in the dataset were filled by linear interpolation, but this action affected only a few percent of the total data. These data were then linearly interpolated to the same T42 grid used by the model data. The anomaly fields were calculated in a manner identical to that used on the model data.
The net result of the above operations was that all datasets used in this study were on the same space–time grid. Hence, the number of space points and time points were identical for each model run and the observed data. This is a critical point in the analysis to be described below.
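As an illustration only, the preprocessing steps described above (trimming to 100 yr, padding a short run with its own climatological mean, omitting points poleward of 40°S, and removing the long-term mean at each grid point) might be sketched in Python with NumPy as follows. The function name, the argument names, and the (time, lat, lon) array layout are assumptions made for this sketch and are not taken from the study. Interpolation to the T42 grid is assumed to have been done beforehand.

```python
import numpy as np

def to_anomalies(field, lat, n_years=100, south_limit=-40.0):
    """Hypothetical helper: yearly means (time, lat, lon) -> anomalies.

    lat holds latitudes in degrees north; the array layout is an assumption.
    """
    field = np.asarray(field, dtype=float)
    # Keep the first n_years of the run.
    trimmed = field[:n_years]
    # A run that is too short is padded with its own climatological mean field.
    if trimmed.shape[0] < n_years:
        clim = trimmed.mean(axis=0, keepdims=True)
        pad = np.repeat(clim, n_years - trimmed.shape[0], axis=0)
        trimmed = np.concatenate([trimmed, pad], axis=0)
    # Omit latitude rows poleward of 40S so the mask matches the observations.
    keep = np.asarray(lat) >= south_limit
    trimmed = trimmed[:, keep, :]
    # Subtract the long-term mean at each grid point.
    return trimmed - trimmed.mean(axis=0, keepdims=True)
```

Because the padded years equal the climatological mean of the available years, they become exactly zero anomalies after the mean is removed and so add no variance of their own.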
b. Analysis method: Common EOFs
As noted above, each model’s time series has been adjusted to have zero mean at each grid point. This means the array T′ can be immediately subjected to a standard EOF (empirical orthogonal function) analysis of its covariance matrix. Prior to estimation of the covariance matrix, each grid point’s time series is weighted by the cosine of its latitude.
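A minimal sketch of one way such an analysis could be organized is given below. It assumes, as one plausible reading of the common EOF approach, that the weighted anomaly arrays from all runs are stacked in time, that the EOFs are the eigenvectors of the pooled covariance matrix (obtained here through a singular value decomposition), and that the variance of each run's projection onto a given EOF serves as that run's partial eigenvalue. The function and variable names are hypothetical, and the weighting is applied exactly as stated in the text (cosine of latitude on each time series).

```python
import numpy as np

def common_eofs(runs, lat, n_modes=10):
    """Hypothetical sketch: runs is a list of anomaly arrays (time, lat, lon),
    all on the same grid and with zero time mean at every grid point."""
    # Cosine-of-latitude weight for each grid row, as described in the text.
    w = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))[None, :, None]
    weighted = []
    for r in runs:
        r = np.asarray(r, dtype=float) * w
        weighted.append(r.reshape(r.shape[0], -1))   # (time, space)
    pooled = np.concatenate(weighted, axis=0)        # all runs stacked in time
    # Rows of vt are eigenvectors of pooled.T @ pooled, i.e. the common EOFs.
    _, s, vt = np.linalg.svd(pooled, full_matrices=False)
    eofs = vt[:n_modes]
    explained = s[:n_modes] ** 2 / np.sum(s ** 2)
    # Partial eigenvalue (assumed definition): variance of each run's
    # principal component for each common EOF.
    partial = np.array([np.mean((x @ eofs.T) ** 2, axis=0) for x in weighted])
    return eofs, explained, partial
```

With this definition the partial eigenvalues of all runs sum, mode by mode, to the pooled eigenvalue, which is why identical space and time dimensions for every dataset make the spectra directly comparable.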
3. Results: Multiple CGCMs
The first two common EOFs for the 11-member CGCM ensemble are shown in Fig. 2. The leading EOF (13.7% of the variance) is of uniform sign over the study region, which means the signal it represents is spatially coherent and of like sign. The second mode (8.0%) shows a fairly strong asymmetry between the Northern and Southern Hemispheres. The clear exceptions to this statement are in the Bering Sea and Southeast Asia. Note there is substantial energy in the tropical oceans, especially the Pacific. Nonetheless, the second mode is not statistically distinct from the next higher mode.
The manner in which the energy of the common EOFs is partitioned among the models is shown by the partial eigenvalue spectrum (Fig. 3, Table 2). Highly energetic models are denoted on this illustration by their designator code given in Table 1. Figure 3 and Table 2 lead to several conclusions.
The BMRC and, to a lesser extent, the older National Center for Atmospheric Research (NCAR) and Meteorological Research Institute (MRI) models dominate the leading EOF. The same three models also dominate the second mode, although the separation from the others is not as dramatic.
Most of the other CGCMs group surprisingly close together, especially for modes at or above 4. Nevertheless, they all project to some degree on the leading, global modes shown in Fig. 2.
The older NCAR model contains considerably more variability than the other models down through at least mode 6.
The above points are illustrated in Figs. 4 and 5, which show the first two common principal components. The Bureau of Meteorology Research Centre (BMRC) position in the partial eigenvalue spectrum is associated with a slow oscillation over the 100-yr integration. Note that a simple detrending will not remove this signal. The role of the MRI and old NCAR models in mode 1 is due to a nearly linear trend or drift in their global temperature. Removing the trends in these three models and repeating the calculation brings them closer to the other models, but they still have between 6 and 10 times as much energy in their leading mode as their competitors. The other interesting feature of these illustrations is that the older NCAR model does indeed have more variability than the other CGCMs in the study. The detrending operation does not modify this conclusion.
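For concreteness, a simple linear detrending of the kind mentioned above could look like the following sketch, which fits and removes a least-squares straight line from every time series. The exact detrending procedure used in the study is not specified, so the function name and the choice of an ordinary least-squares fit are assumptions.

```python
import numpy as np

def remove_linear_trend(anom):
    """Hypothetical helper: anom is (time, space); returns the residuals after
    removing a least-squares straight line from every column."""
    anom = np.asarray(anom, dtype=float)
    t = np.arange(anom.shape[0], dtype=float)
    design = np.column_stack([t, np.ones_like(t)])      # slope and intercept
    coef, *_ = np.linalg.lstsq(design, anom, rcond=None)
    return anom - design @ coef
```

A pure drift is removed completely by this operation, whereas an oscillation with a period comparable to the record length leaves a substantial residual, which is consistent with the remark that simple detrending does not remove a slow oscillation.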
A qualitative attempt was made to see if the partial eigenvalue spectrum had a dependence on key simulation parameters. A weak relation between size of the leading partial eigenvalue and the longitudinal model resolution was apparent. However, no relation seems to exist between leading eigenvalue and the latitudinal model resolution. Similarly, it did not seem to matter if the model was or was not flux corrected.
In summary, the common EOF analysis of the 11 CGCM dataset suggests that there are rather wide differences in the interannual variability exhibited by the models. Approximately one-half of this difference at global scales is due to model drift or fluctuations long compared to the 100-yr integrations. However, several of the models demonstrate variability that is clearly different from the others at space scales comparable with those expected of an anthropogenic signal.
4. Intramodel variability
It is natural to wonder if the differences described above among the 11 CGCMs could be due to sampling and model variability alone. This was investigated using a 1000-yr integration of the Geophysical Fluid Dynamics Laboratory (GFDL) model (Manabe and Stouffer 1996; see footnote 1). That control run was divided into 10 nonoverlapping segments, each 100 yr long, and the analysis of section 3 repeated. It is assumed the segments are independent. The results were as follows.
The leading common EOF (Fig. 6, 7.3%) has many similarities to the second common mode from the 11-member ensemble; compare Fig. 6 with Fig. 2 (lower). It does not represent a coherent global mode of variations.
The partial eigenvalue spectrum (Fig. 7) suggests very little variation between different 100-yr segments of the Geophysical Fluid Dynamics Laboratory 1000-yr run. Since the space–time dimensions of all the datasets used here are identical to those used in section 3, it is possible to compare Figs. 3 and 7 directly. Clearly, the intramodel variation in the GFDL model is typically an order of magnitude less than the intermodel variation.
The common principal components (Fig. 8) are all visually similar, a result that perhaps better illustrates the results found in the partial eigenvalue spectrum.
In summary, the measures of variability studied in the 10 independent 100-yr-long segments of the GFDL control run are highly similar. This is in stark contrast to the wide difference in similar variance measures among the 11 CGCMs. Clearly, intermodel differences are far larger than intramodel differences, if the GFDL run is typical of the other models.
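The segmentation used in this section can be illustrated with a short sketch that cuts a long control run into nonoverlapping 100-yr pieces. Removing each segment's own time mean, so that every piece is treated like a separate zero-mean 100-yr run, is an assumption made here for consistency with the preprocessing described earlier. The names are hypothetical.

```python
import numpy as np

def split_into_segments(run, seg_len=100):
    """Hypothetical helper: run is (time, ...); returns an array of shape
    (n_segments, seg_len, ...) with each segment's time mean removed."""
    run = np.asarray(run, dtype=float)
    n_seg = run.shape[0] // seg_len
    # Discard any incomplete trailing segment, then cut into equal pieces.
    segs = run[: n_seg * seg_len].reshape(n_seg, seg_len, *run.shape[1:])
    return segs - segs.mean(axis=1, keepdims=True)
```

The resulting segments have the same space and time dimensions as the 100-yr runs of the multimodel ensemble, so they can be passed through the same analysis without modification.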
5. Other comparisons
a. Observations
Given that the models differ so much from each other, it is reasonable to ask which is most apt to be correct. While a definitive answer to this question, if it exists, is well beyond this study, it was a simple matter to compare the levels of CGCM air temperature variability with those found in nature. This was done by projecting the observed air temperature data onto the common EOFs from the models. The results are displayed in two ways.
1) The common principal components associated with the observed data are shown in the upper panels of Figs. 4 and 5 between the BMRC and CCCM curves. Although the observational record is only 50 yr long, inspection of the illustrations suggests that many of the CGCMs are less variable than the observations. This might be expected since the models have no forcing from possible solar variability and volcanoes. Many of the models also do not have realistic El Niño–Southern Oscillation (ENSO) variability.
2) The above impression was quantified by estimating the partial eigenvalue spectrum of the observations projected onto the common EOFs. Because of the identical structure of the original datasets as described above, it is possible to compare the observed partial eigenvalue spectrum directly with the model spectra previously shown in Figs. 3 and 7. The observed partial eigenvalue spectrum is shown by the heavy line on the above figures and the estimated 95% confidence limits by the vertical lines (after North et al. 1982). The various CGCMs bracket the observations with no one model faithfully reproducing them. Differences in energy between the models and observations frequently exceed a factor of 2. This is especially true for the lowest modes that represent the variability on scales comparable to those expected of the anthropogenic signal (see below). The same comparison with the intramodel variability of the GFDL run (Fig. 7) suggests the first two model modes are underestimated by order 50%, but the higher-order modes have energy levels quite comparable to those observed. The confidence limits on the observed spectral values suggest a good degree of correspondence between model and data.
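The projection and the confidence limits described above can be sketched as follows. The sampling error uses the rule of thumb of North et al. (1982), in which the uncertainty of an eigenvalue is approximately the eigenvalue times the square root of 2/N for N independent samples. Scaling that uncertainty by 1.96 to obtain approximate 95% limits, and the choice of N, are assumptions of this sketch rather than details given in the text. The function and argument names are hypothetical.

```python
import numpy as np

def observed_partial_spectrum(obs_weighted, eofs, n_eff=None, z=1.96):
    """Hypothetical helper.

    obs_weighted: (time, space) observed anomalies, weighted like the models.
    eofs: (modes, space) common EOFs with unit length.
    n_eff: assumed number of independent samples (defaults to the record length).
    """
    obs_weighted = np.asarray(obs_weighted, dtype=float)
    eofs = np.asarray(eofs, dtype=float)
    # Pseudo-principal components: observations projected on each common EOF.
    pcs = obs_weighted @ eofs.T
    # Observed partial eigenvalues: variance of each pseudo-principal component.
    lam = np.mean(pcs ** 2, axis=0)
    n = obs_weighted.shape[0] if n_eff is None else n_eff
    # North et al. (1982) rule of thumb, scaled to approximate 95% limits.
    half_width = z * lam * np.sqrt(2.0 / n)
    return pcs, lam, lam - half_width, lam + half_width
```

Since annual temperature anomalies are serially correlated, the number of independent samples is likely smaller than the record length, and a smaller `n_eff` would widen the limits accordingly.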
The results of the above comparison show the model variability to generally be scattered about the observed signal. Differences of a factor of 2 or more are common for even the best behaved models. Hence, the internal model variability and the natural variability seen in the observations are typically within a factor of 2 of each other. There is no model that consistently agrees well with the observations. Part of this may be due to the model resolution problems noted previously. In short, there is no consensus among the models regarding their relative levels of variability.
b. Anthropogenic signal
How similar are the common EOFs of the CGCM control runs and an estimate of the expected anthropogenic signal? The MPI kindly made an estimate of this signal available for this study (Fig. 9). The simulation that produced this estimate was made with the HAM3L model, which consists of ECHAM4, a T42 atmospheric GCM, and the Large-Scale Geostrophic global ocean model (Roeckner et al. 1992; Maier-Reimer et al. 1993; Cubasch et al. 1992; Hasselmann et al. 1995). The model was forced with projected amounts of greenhouse gases and aerosols (Voss et al. 1998). Figure 9 represents the leading EOF of this simulation computed over exactly the same space–time grid as described above for the CGCM study.
The relation between the anthropogenic signal and the common EOFs was estimated by computing the dot product between them as a function of mode number. The leading modes had a pattern correlation of 0.84. This means there is a high pattern similarity between the principal anthropogenic signal predicted by the HAM3L and the leading common EOF of the unforced, internal variability runs from the 11 CGCMs. This high similarity means the CGCMs can, without any anthropogenic forcing, produce patterns that resemble those expected from anthropogenic causes (at least as estimated from the Hamburg CGCM). This, in turn, will make it more difficult to apply a fingerprint strategy to detect an anthropogenic signal, since the “natural” variability estimated from the CGCMs and used in the detection scheme “looks like” the anthropogenic signal itself.
The second and third HAM3L modes pattern correlated with their common EOF counterparts at 0.54 and 0.11, respectively. Hence, even the second mode in the CGCM ensemble has a considerable resemblance to that from the anthropogenic simulation. This just adds to the detection problem noted immediately above.
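The dot-product comparison described above can be illustrated with a few lines of NumPy. Normalizing both patterns to unit length, so that the result lies between -1 and 1, is an assumption of this sketch, as are the function and argument names. For EOFs that already have unit length the normalization changes nothing.

```python
import numpy as np

def pattern_similarity(a, b):
    """Hypothetical helper: normalized dot product of two spatial patterns
    defined on the same grid (uncentered pattern correlation)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the sign of an EOF is arbitrary, only the magnitude of this quantity is meaningful when two EOF patterns are compared.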
6. Summary
The common variance between 100-yr control runs from 11 CGCMs has been studied by use of common EOFs. This is essentially equivalent to estimating the wavenumber spectrum of the various models. The results suggest that there is a considerable disparity between the CGCMs. About one-half of this difference can be attributed to model drift or other low-frequency variations in several of the models. However, even after accounting for this effect, it was found that the models’ energy levels can easily differ by a factor of 2 or more for different EOF mode (wave) numbers. Comparison with observations showed that no one model consistently reproduced the observed partial eigenvalue spectrum. Again, differences between observed and model energy levels were commonly a factor of 2 or more. It is speculated that part of these differences may result from the coarse resolution of many of the models used here.
Separate analysis of a 1000-yr control run of the GFDL model also suggested that intramodel variability is much smaller than intermodel variability. It was also found that an estimate of the anthropogenic signal due to greenhouse gases and aerosols from the Max Planck Institute had strong spatial similarities to the leading modes of the models’ common EOFs.
The above results raise problems for those attempting to detect an anthropogenic signal. For instance, which model should be used to estimate the natural variability, given that they are not all equivalent? It seems that a detailed, quantitative comparison of the model fields used in a detection scheme and their counterparts in the observations is required. Also, the anthropogenic signal appears to have much in common with the internal variability signal in the models. This suggests that use of an optimal fingerprint approach (Hasselmann 1993; North et al. 1995) is necessary to maximize the chances of detection.
Acknowledgments
This work was partially supported jointly by NOAA’s Office of Global Programs and the Department of Energy’s Climate Change Data and Detection Program Element, NSF Grant ATM93-14495, and the Scripps Institution of Oceanography. We are grateful to Curt Covey and the CMIP Program for providing us with the model data used in this study. Thanks are due to G. Hegerl for useful discussions on the project. We also gratefully acknowledge the use of the global air temperature dataset from the Climatic Research Unit (courtesy of Phil Jones) and use of version 2.2 of the Global Sea-Ice and Sea Surface Temperature dataset, 1903–1994 by N. A. Rayner, E. B. Horton, D. E. Parker, C. K. Folland, and R. B. Hackett of the Hadley Centre for Climate Prediction and Research, Meteorological Office, London Road, Bracknell, Berkshire, United Kingdom.
REFERENCES
Barnett, T. P., and R. Preisendorfer, 1987: Origins and levels of monthly and seasonal forecast skill for United States surface air temperatures determined by canonical correlation analysis. Mon. Wea. Rev., 115, 1825–1850.
Barnett, T. P., B. D. Santer, P. D. Jones, R. S. Bradley, and K. R. Briffa, 1996: Estimates of low frequency natural variability in near-surface air temperature. Holocene, 6, 255–263.
Barnett, T. P., G. Hegerl, B. Santer, and K. Taylor, 1998: The potential effect of GCM uncertainties on greenhouse signal detection. J. Climate, 11, 569–675.
Cubasch, U., K. Hasselmann, H. Höck, E. Maier-Reimer, U. Mikolajewicz, B. Santer, and R. Sausen, 1992: Time-dependent greenhouse warming computations with a coupled ocean-atmosphere model. Climate Dyn., 8 (2), 55–69.
Gates, W. L., 1992: AMIP: The Atmospheric Model Intercomparison Project. Bull. Amer. Meteor. Soc., 73, 1962–1970.
Hasselmann, K., 1993: Optimal fingerprints for the detection of time-dependent climate change. J. Climate, 6, 1957–1971.
Hasselmann, K., and Coauthors, 1995: Detection of anthropogenic climate change using a fingerprint method. Max-Planck-Institut für Meteorologie Rep. 168, 20 pp. [Available from MPI, Bundestrasse 55, D-20146 Hamburg, Germany.]
Hegerl, G. C., H. von Storch, K. Hasselmann, B. D. Santer, U. Cubasch, and P. D. Jones, 1996: Detecting anthropogenic climate change with an optimal fingerprint method. J. Climate, 9, 2281–2306.
Hegerl, G. C., K. Hasselmann, U. Cubasch, J. F. B. Mitchell, E. Roeckner, R. Voss, and J. Waszkewitz, 1997: On multi-fingerprint detection and attribution of greenhouse gas- and aerosol forced climate change. Climate Dyn., 13 (9), 613–634.
Jones, P. D., and K. Briffa, 1992: Global surface air temperature variations over the twentieth century. Part 1: Spatial, temporal, and seasonal details. Holocene, 2, 165–179.
Maier-Reimer, E., U. Mikolajewicz, and K. Hasselmann, 1993: Mean circulation of the Hamburg LSG OGCM and its sensitivity to the thermohaline surface forcing. J. Phys. Oceanogr., 23, 731–757.
Manabe, S., and R. J. Stouffer, 1996: Low frequency variability of surface air temperature in a 1000-year integration of a coupled ocean-atmosphere model. J. Climate, 9, 376–393.
Meehl, G. A., G. J. Boer, C. Covey, M. Latif, and R. J. Stouffer, 1997: Intercomparison makes for a better climate model. Eos, Trans. Amer. Geophys. Union, 78, 445–446, 451.
North, G. R., T. L. Bell, R. F. Callahan, and F. J. Moeng, 1982: Sampling errors in the estimation of the empirical orthogonal functions. Mon. Wea. Rev., 110, 699–706.
North, G. R., K. Y. Kim, S. P. Shen, and J. W. Hardin, 1995: Detection of forced climate signals. Part I: Filter theory. J. Climate, 8, 401–408.
Rayner, N. A., E. B. Horton, D. E. Parker, C. K. Folland, and R. B. Hackett, 1996: Version 2.2 of the Global Sea-Ice and Sea Surface Temperature dataset, 1903–1994. CRTN Rep. 74. [Available from Hadley Centre for Climate Prediction and Research, Meteorological Office, London Road, Bracknell, Berkshire, United Kingdom.]
Roeckner, E., and Coauthors, 1992: Simulation of the present-day climate with the ECHAM model: Impact of model physics and resolution. Max-Planck-Institut für Meteorologie Rep. 93, 171 pp. [Available from MPI, Bundestrasse 55, D-20146, Hamburg, Germany.]
Santer, B. D., K. E. Taylor, T. M. L. Wigley, J. E. Penner, P. D. Jones, and U. Cubasch, 1995: Towards the detection and attribution of an anthropogenic effect on climate. Climate Dyn., 12 (2), 77–100.
Santer, B. D., and Coauthors, 1996: A search for human influences on the thermal structure of the atmosphere. Nature, 382, 39–46.
Voss, R., R. Sausen, and U. Cubasch, 1998: Periodically synchronously coupled integrations with the atmosphere–ocean general circulation model ECHAM3/LSG. Climate Dyn., 14, 249–266.
Fig. 1. Two-meter annual average air temperatures (°C) for selected locations: (top) 45°N, 180°, (bottom) 5°N, 150°W.
Citation: Journal of Climate 12, 2; 10.1175/1520-0442(1999)012<0511:CONSAT>2.0.CO;2
Fig. 2. First (top) and second (bottom) common EOFs for the 11-member coupled global climate model ensemble. The leading mode accounts for 13.7% of the variance while the second mode accounts for 8% of the variance.
Fig. 3. Partial eigenvalue spectrum. The letter codes refer to models listed in Table 1. The heavy solid line represents the partial eigenvalue spectrum obtained by projecting the observed air temperature onto the CGCM common EOF basis set. The vertical bars show the approximate 95% confidence limits on the observed partial eigenvalues.
Fig. 4. First common principal component. The heavier line near the top of the illustration is the pseudoprincipal component obtained by projecting the observations onto the first common EOF.
Fig. 5. Same as Fig. 4 but for the second common EOF.
Fig. 6. First common EOF for the 10-member ensemble of 100-yr control run segments from the GFDL model. It accounts for 7.3% of the variance.
Fig. 7. Same as for Fig. 3, except for the 10 independent 100-yr control runs of the GFDL model.
Fig. 8. Same as Fig. 4 except for the GFDL ensemble.
Fig. 9. Leading EOF of the MPI anthropogenic run forced by greenhouse gases and aerosols. See Voss et al. (1998) for additional details.
Table 1. CGCM control run output received for CMIP1.
Table 2. Partial eigenvalues.
Footnote 1. A full investigation of the 1000-yr runs from GFDL, Max Planck Institute (MPI), and UKMO is presented in Stouffer et al. (1998, manuscript submitted to J. Climate).