## Abstract

An approach to identifying climate changes is presented that does not hinge on simulations of natural climate variations or anthropogenic changes. Observed interdecadal climate variations are decomposed into several discriminants, mutually uncorrelated spatiotemporal components with a maximal ratio of interdecadal-to-intradecadal variance. The dominant discriminants of twentieth-century variations in surface temperature exhibit large-scale warming in which, particularly in the Northern Hemisphere summer months, localized cooling is embedded. The structure of the large-scale warming is consistent with expected effects of increases in greenhouse gas concentrations. The localized cooling, with maxima on scales of 1000–2000 km over East Asia, eastern Europe, and North America, is suggestive of radiative effects of anthropogenic sulfate aerosols.

## 1. Introduction

Procedures for distinguishing anthropogenic climate changes from natural climate fluctuations can be divided into two parts. In the first part, an *exploratory data analysis* (Tukey 1962, 1977), one lifts components of slow interdecadal climate variations from the faster intradecadal climate variations. Anthropogenic climate changes that evolve slowly over decades contribute to the interdecadal climate variations, and inasmuch as there is a significant human influence on climate, signatures of anthropogenic changes should be discernible in the components of interdecadal variations. In the second part, a *confirmatory data analysis* (Tukey 1977), one then assesses in statistical comparisons between observations and simulations the probability with which features of the components of interdecadal climate variations represent anthropogenic changes or natural fluctuations. In the exploratory analysis, one extracts components of observed interdecadal climate variations that should by themselves, independent of specific climate simulations, be interpretable and suggest hypotheses about their origin. In the confirmatory analysis, one tests these hypotheses in statistical comparisons with climate simulations.

In climate change detection studies (Santer et al. 1996), more weight has been placed on the confirmatory part of the analysis than on the exploratory part, the latter often being limited to consideration of global and hemispheric means or local linear trends of climate variables. For example, it has been shown that local linear trends of twentieth-century surface temperatures are only statistically consistent with simulations if radiative effects of anthropogenic greenhouse gases and sulfate aerosols are taken into account in the simulations (Santer et al. 1996; Tett et al. 1999; Knutson et al. 1999). Yet components of interdecadal temperature variations that exhibit a clear signature of, say, radiative effects of anthropogenic sulfate aerosols have not been identified in observational data. Here we present an exploratory analysis that takes the spatially nonlocal covariance of climate variables into account and allows for interdecadal variations evolving nonlinearly in time. In this analysis, interdecadal climate variations are decomposed into spatiotemporal components that are more amenable to interpretation than spatial means or local linear trends.

## 2. Method

Spatiotemporal components of interdecadal climate variations can be lifted from intradecadal climate variations by discriminant analysis (Ripley 1996, chapter 3; Fukunaga 1990, chapter 10). A discriminant analysis identifies canonical variates, the linear combinations of variables in a multivariate dataset that best discriminate among predefined data groups. The canonical variates are mutually uncorrelated and are ordered according to decreasing discriminatory power, the discriminatory power being measured as the ratio of among-group variance to within-group variance. In this study, the variables represent earth surface temperatures at the nodes of a spatial grid, and data for approximately one decade form a group. The first canonical variate, discriminating among decadal data groups, is that linear combination of surface temperatures for which the ratio of interdecadal variance to intradecadal variance is maximized. Higher-order canonical variates maximize this ratio subject to the constraint that they are uncorrelated with lower-order canonical variates. Associated with the canonical variates are discriminating patterns, spatial patterns that consist of the regression coefficients of the temperature data onto the canonical variates (see the appendix, section c). The value of a canonical variate indicates, for any given time, the amplitude of the associated discriminating pattern in the data. Thus, a pair consisting of a discriminating pattern and a canonical variate, a pair we call a discriminant, characterizes interdecadal temperature variations spatially and temporally. In that the discriminant analysis extracts uncorrelated discriminants with successively decreasing ratios of interdecadal to intradecadal variance, it can be viewed as lifting off successive decoupled layers of interdecadal variations from the temperature data.

## 3. Results

We identified discriminants of interdecadal variations in the monthly mean surface temperatures for the years 1916–98 in the region between 25°S and 60°N. In this period and region, data coverage is sufficiently continuous for a multivariate analysis (see the appendix for details).

Figure 1 shows the first discriminants for January and July. (The first discriminants for the other months of the solstice seasons have a similar structure.) Since temporal correlations within or among the decadal data groups are not taken into account in the discriminant analysis, the fact that the time evolution of the first canonical variate is coherent over decades, for both January (Fig. 1b) and July (Fig. 1d), is evidence that the first discriminants are not artifacts of the analysis but represent significant interdecadal temperature variations. For the first discriminants for January and July, the ratio ℛ of interdecadal to intradecadal variance is approximately 7.^{1} For comparison, for the spatial mean temperatures in the period and region analyzed, the variance ratio ℛ is 0.7 for January and 1.1 for July. That is, the discriminant analysis separates interdecadal and intradecadal climate variations much more efficiently than a spatial mean.

The first discriminants are robust; their qualitative features do not depend on analysis parameters such as choice of data groups. The second and third canonical variates show time evolutions that are likewise coherent over decades, with variance ratios ℛ ≈ 2; canonical variates of yet higher order show no temporal coherence and variance ratios ℛ < 1. The time evolutions of the higher-order canonical variates are nonmonotonic, in contrast to the nearly monotonic increase of the first canonical variates (Figs. 1b,d) with which the higher-order canonical variates are uncorrelated. Some features of the second and third discriminants can be associated with known characteristics of observed temperature variations, such as temporary cooling of the North Pacific in the 1970s (Trenberth 1990; Graham 1994; Deser et al. 1996; Knutson et al. 1999). The structure of the second and third discriminants depends more strongly on analysis parameters, however, so we restrict the present discussion to the robust first discriminants.^{2}

The first discriminants for January and July indicate a relatively uniform (Figs. 1a,c) and steady (Figs. 1b,d) warming of the ocean surface, with a weak cooling of the North Atlantic as a prominent exception. Over the continents, the first discriminants exhibit a more structured distribution of warming and cooling, with cooling being more widespread in July than in January. Regional temperature averages confirm what the first discriminants indicate: The July mean temperature averaged over the three continental regions outlined in Fig. 1c, the regions in which the discriminating pattern indicates summertime cooling, did indeed decrease between the 1940s and 1970s, and the first discriminant captures a substantial portion of this temperature decrease (Fig. 2a). In contrast, the July mean temperature over the Northern Hemisphere land surfaces outside the regions outlined in Fig. 1c did not decrease significantly between the 1940s and 1970s (Fig. 2b). Neither is there a discernible temperature decrease over Northern Hemisphere land surfaces between the 1940s and 1970s in winter (see also Jones 1994). Hence, the oft-discussed decrease of the annually and spatially averaged Northern Hemisphere land temperatures between the 1940s and the 1970s (Jones 1994) can largely be attributed to summer cooling in North America, eastern Europe, and East Asia, the regions of maximal cooling in the first discriminating pattern.

## 4. Discussion

Many features of the temperature changes indicated by the first discriminants are suggestive of human influences on climate. The relatively uniform and steady warming of the ocean surface points to a gradual global increase in the radiative forcing of the earth’s surface, consistent with expected effects of the increase in greenhouse gas concentrations, or possibly, an increase in solar irradiance (Schimel et al. 1996; Tett et al. 1999). Since multidecadal internal fluctuations in the climate system can only be sustained through the large thermal and dynamic inertia of the oceans, multidecadal processes of internal climate variability usually involve the ocean circulation and lead to spatially structured variations in sea surface temperatures. A process of internal climate variability that leads to as uniform and steady an ocean warming as that indicated by the first discriminants is difficult to conceive. Deviations from uniform ocean warming, which are seen, for example, in linear trend analyses (Nicholls et al. 1996, Fig. 3.4), are in this analysis largely confined to higher-order discriminants with nonmonotonic time evolutions. In the first discriminants, the only prominent deviation from uniform ocean warming is the weak cooling of the North Atlantic, which may be part of an internal fluctuation of the climate system. In some climate simulations, the North Atlantic is the region of maximal interdecadal variability (Delworth et al. 1993; Delworth and Mann 2000), and strong multidecadal fluctuations have also been identified in observational data for this region (Kushnir 1994). Alternatively, part of the North Atlantic cooling might be attributable to the increase in greenhouse gas concentrations: in response to increases in greenhouse gas concentrations, most simulations show a minimum in the warming or even a cooling in the North Atlantic, caused by deep oceanic mixing in this area and/or a decrease in the intensity of the thermohaline circulation (Kattenberg et al. 1996).

The structure of the warming over continents is consistent with expected effects of an increase in radiative forcing. The warming is stronger in winter than in summer, because the atmosphere over continents is more strongly stratified in winter, so that in winter the local temperature response to an increase in radiative forcing is concentrated in lower layers of the atmosphere and has a stronger surface signature than it would have if the temperature response were more evenly distributed vertically (Wetherald and Manabe 1975). Snow-albedo feedback should further amplify this surface temperature response in high latitudes (Kattenberg et al. 1996).

The localized cooling over continents is suggestive of radiative effects of anthropogenic sulfate aerosols. The cooling in the Northern Hemisphere is stronger in July (Fig. 1c) than in January (Fig. 1a), consistent with the radiative forcing due to anthropogenic sulfate aerosols being greatest in summer (Mitchell and Johns 1997;Chuang et al. 1997). For January, the first discriminating pattern (Fig. 1a) shows the strongest cooling over the southeastern United States and over central Africa, weaker cooling over Southeast Asia, and over Europe a warming that is reduced when compared with other continental areas of similar latitude. In simulations of the effects of increased concentrations of greenhouse gases and sulfate aerosols on Northern Hemisphere winter climate, linear temperature trends show a similar pattern of warming and cooling (Knutson et al. 1999; Kattenberg et al. 1996). Only the cooling over central Africa does not appear in simulations; given that the sulfate aerosol loadings in this area are estimated to be small (Mitchell and Johns 1997), other processes such as changes in properties of the land surface or emissions from biomass burning might be responsible for the cooling. For July, the first discriminating pattern (Fig. 1c) shows strong cooling over eastern Europe and East Asia, industrial regions with large sulfate aerosol loadings (Mitchell and Johns 1997; Chuang et al. 1997). It also shows strong cooling over central North America, though the anthropogenic sulfate aerosol loadings for North America are estimated to be largest over the East Coast, roughly coincident with where the January cooling is maximal (Mitchell et al. 1995; Mitchell and Johns 1997; Chuang et al. 1997). Given the sensitivity with which, in regional climate simulations, summer temperatures in central North America depend on vegetation parameters, it is conceivable that changes in land use have been the dominant influence on summer temperatures in this region (Xue at al. 1996; Bonan 1997; Stohlgren et al. 1998).

## 5. Conclusions

Independent of specific climate simulations, the discriminants identified from observational data indicate in what way surface temperatures have distinctly changed over the course of the twentieth century. Both spatial and temporal features of the dominant discriminants are consistent with expected effects of anthropogenic greenhouse gases and sulfate aerosols. This new observational evidence for a human influence on climate complements the simulation-dependent evidence from climate change detection studies.

The probability with which features of the discriminants in Fig. 1 can be attributed to different natural and anthropogenic climate processes remains to be assessed in confirmatory analyses, that is, in statistical comparisons of features of the observational discriminants with simulations. The markedly different structures of the January and July discriminants identified in this exploratory analysis suggest that confirmatory analyses be carried out not with annually averaged data, but with data that resolve seasons or months. Furthermore, because much of the cooling seen in the discriminants is localized on scales of 1000–2000 km, confirmatory analyses need to consider climate variations on scales smaller than the scales of 5000 km or more considered in many climate change detection studies (e.g., Santer et al. 1996; Stott and Tett 1998; Tett et al. 1999). Improved statistical methods might be needed to distinguish anthropogenic changes from natural climate fluctuations on these smaller scales, methods based, for example, on comparisons of observational discriminants with ensembles of discriminants from simulations.

## Acknowledgments

T. Schneider was supported by a NASA Earth System Science Fellowship. We thank Keith Dixon for providing an ensemble of climate simulations with which we tested the analysis methods; Tom Delworth, Stephen Griffies, Tom Knutson, Paul Kushner, Thomas Müller, and Francis Zwiers for comments on drafts of this paper; and Heidi Swanson for editing the manuscript.

## REFERENCES

### APPENDIX

#### Methods

##### Data and preprocessing

We analyzed monthly mean surface temperatures arranged on a 5° × 5° grid (Jones 1994; Parker et al. 1994, 1995). We carried out a separate analysis for each month considered, restricting analyses to the years 1916–98 and to the region between 25°S and 60°N and excluding grid nodes for which more than 70% of the temperature values were missing. In the datasets with temperatures for the remaining nodes, approximately 15% of the values were missing. A regularized expectation–maximization (EM) algorithm was used to impute the missing values and to compute, for each month analyzed, an estimate **Σ** of the spatial covariance matrix of surface temperatures (Schneider 2001). To reduce the underestimation of the spatial variances and covariances, the residual covariance matrices in the regularized EM algorithm were multiplied with an inflation factor determined from climate simulations in which temperature values were deleted in the same manner in which they were missing in the observational data (Schneider 2001;Dixon and Lanzante 1999).

The completed datasets were centered by subtracting from the temperatures for each grid node and month the corresponding mean temperatures. The centered data for each month were assembled into an *n* × *p* data matrix **X**, with the *n* = 83 rows representing years and with the *p* columns representing nodes of the grid. The number of nodes *p* was about 1200, depending on how many nodes were excluded from the analysis for a given month.

Spatial mean temperatures were computed from the completed and centered datasets. For the discriminant analyses, the data were smoothed with a radial Gaussian smoothing filter **H** of standard deviation 300 km and scaled with a diagonal matrix **D** containing, for each node, a factor proportional to the square root of the area the node represents. The matrix **X**_{s} = **XHD** with the smoothed and scaled data and the correspondingly transformed covariance matrix estimate **Σ**_{s} = (**HD**)^{T}**Σ**(**HD**), with the superscript ^{T} indicating transposition, formed the basis of the discriminant analysis.

##### Data groups

We defined *g* = 6 data groups centered at the years *y*^{c} = (1918, . . . , 1993), with 15 years between successive group centers *y*^{c}_{k}. The groups overlap, with a fractional membership

in the *k*th group assigned to the year *y*_{j} to which the *j*th row of the data matrix **X**_{s} belongs. The fractional memberships *G*_{jk} form the *n* × *g* group matrix **G**. All row sums of the group matrix are one, Σ_{k} *G*_{jk} = 1, such that each year *y*_{j} carried the same weight in the analysis.

##### Discriminant analysis

We adapted Ripley’s (1996, chapter 3) formulation of linear discriminant analysis to the case of a group matrix **G** with fractional memberships and a covariance matrix estimate **Σ**_{s} not proportional to the sample covariance matrix of the data **X**_{s}. The discriminant analysis is based on an eigendecomposition of the matrix **Γ** = **Σ**^{+}_{s}**Σ**_{>}, where **Σ**^{+}_{s} is an approximate pseudoinverse of **Σ**_{s} (see the appendix, section d), and

is an estimate of the among-group, or interdecadal, covariance matrix. [The matrix **Γ**, however, need not be computed explicitly (Ripley 1996, chapter 3; Schneider and Griffies 1999).] The (*g* − 1) right eigenvectors **u** of **Γ** are the weight vectors that, multiplied with the data matrix **X**_{s}, give the canonical variates **c** = **X**_{s}**u**. What we call the discriminating patterns are the regression coefficients **v** = (**u**^{T}**Σ**_{s}**u**)^{−1}**D**^{−1}**Σ**_{s}**u** of the smoothed data **XH** = **X**_{s}**D**^{−1} onto the canonical variates **c**. Figure 1 shows the first discriminating patterns **v** and the first canonical variates **c** = **X**_{s}**u**, the weight vectors **u** being normalized such that the canonical variates have unit variance, var(**c**) = **u**^{T}**Σ**_{s}**u** = 1. For the first canonical variate, the ratio of interdecadal variance **u**^{T}**Σ**_{>}**u** to total variance **u**^{T}**Σ**_{s}**u** is maximal among the linear combinations **X**_{s}**u** for which the weight vector **u** lies in the range of **Σ**^{+}_{s} (see the appendix, section d; Ripley 1996, chapter 3). Maximizing the ratio of interdecadal variance to total variance is equivalent to maximizing the ratio ℛ = (**u**^{T}**Σ**_{>}**u**)/(**u**^{T}**Σ**_{<}**u**) of interdecadal variance to within-group, or intradecadal, variance **u**^{T}**Σ**_{<}**u**, where

is an estimate of the intradecadal covariance matrix (Fukunaga 1990, chapter 10).

This discriminant analysis can be viewed as a predictable component analysis (Schneider and Griffies 1999) of the regression of the data **X**_{s} onto the group matrix **G**. Upon identification of predictable components with canonical variates and of predictable patterns with discriminating patterns, the rationale of the predictable component analysis can be applied: the canonical variates are those linear combinations of surface temperatures that are most predictable given the group memberships *G*_{jk} as predictors. Like predictable components and predictable patterns, canonical variates and discriminating patterns can be contrasted with principal components and empirical orthogonal functions (cf. Schneider and Griffies 1999).

##### Regularization

Since the number of variables *p* exceeds the sample size *n,* the discriminant analysis is ill-posed and must be regularized (Hansen 1997). We performed the discriminant analysis in a truncated principal component basis, which amounts to using an approximate pseudoinverse **Σ**^{+}_{s} in place of the inverse **Σ**^{−1}_{s} that would usually appear in a discriminant analysis. The regularization restricts the weight vectors **u** to the range of **Σ**^{+}_{s}, that is, to linear combinations of the retained principal component vectors. The truncation parameter was chosen by generalized cross-validation (Hansen 1997) of a principal component regression **G** = **X**_{s}**A** + ;eR of the group matrix **G** onto the data **X**_{s}, the matrix **A** consisting of unknown regression coefficients and ;eR being a residual. This heuristic yielded, for the January data, a truncation after 24 principal components, which account for 90% of the January temperature variations; and for the July data, a truncation after 22 principal components, which account for 75% of the July temperature variations. Qualitatively, the results shown are insensitive to the choice of truncation parameter. However, the quoted values of the variance ratio ℛ are affected by selection bias and depend on the truncation parameter, with ℛ generally increasing with increasing truncation parameter. For example, with a truncation after 17 principal components, the variance ratio ℛ is approximately 5 for the first discriminants for January and July, whereas ℛ is approximately 7 with the truncation after 24 (January) and 22 (July) principal components.

##### Bias of imputed temperature values

Because changes in the availability of temperature data for the twentieth century are concomitant with multidecadal temperature changes, the missing temperature values are not missing at random and the imputed values are biased (cf. Little and Rubin 1987). Imputing deleted values in an ensemble of simulated twentieth-century data (Dixon and Lanzante 1999) in the same way as in the observational data and examining the imputation errors, we found significant biases of the imputed values primarily for the years before 1950. Because of the biases of the imputed values, the spatial-mean warming trend between 1916 and the 1940s was underestimated on average by 18% (January) and 10% (July). The local bias of the imputed temperature values was greatest in high northern latitudes in January, where it led to an underestimation of the warming trend. We found no evidence that the discussed prominent features of the discriminants might be due to biases of imputed temperature values.

## Footnotes

*Corresponding author address:* Tapio Schneider, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, NY 10012.

Email: tapio@cims.nyu.edu

* Current affiliation: Courant Institute of Mathematical Sciences, New York University, New York, New York.

^{2}

Plots of the second and third discriminants as well as the analyzed datasets are available from the authors upon request.