## Abstract

Empirical orthogonal function (EOF) analyses (rotated or not) are widely used in climate research. In recent years there have been several studies in which EOF analyses were used to highlight potential physical mechanisms associated with climate variability. For example, several SST modes were identified such as the “Tropical Atlantic Dipole,” the “Tropical Indian Ocean Dipole,” and different SLP modes in the Northern Hemisphere winter. In this note it is emphasized that caution should be used when trying to interpret these statistically derived modes and their significance. Indeed, from a synthetic example it is shown that patterns derived from EOF analyses can be misleading at times and associated with very little climate physics.

## 1. Introduction

In recent years the EOF technique has been widely used to identify potential physical modes. The problems that may arise from using EOFs or rotated EOFs are the subject of this note.

In North et al. (1982) and Richman (1986) the problem of statistical uncertainty in the estimation of the EOFs is discussed. Here we would like to focus on problems of the EOF technique that are not due to statistical uncertainties but are inherent to the method itself.

In order to derive the leading modes of variability in a multivariate dataset, the EOF and rotated EOF analyses work with some basic, subjective, but well-motivated, assumptions. In the standard EOF analysis it is assumed that the modes are orthogonal in space and time, and that the first mode is the mode that maximizes the explained variance over the total dataset. The VARIMAX rotation of EOFs finds modes that are more localized in space than the standard EOF modes.

It is often claimed that the VARIMAX method is more subjective than the EOF analysis itself because there are more free parameters that have to be defined. However, within the context of this study the VARIMAX method is as objective as the EOF analysis. The only difference between the two statistical analyses, as defined in this note, is the criterion with which we choose our spatial patterns. An overview of the different ways of rotating EOFs or defining the VARIMAX rotation can be found in Kaiser (1958), Richman (1986), and Mestas Nuñez (2000).

Although the assumptions made by the EOF and VARIMAX methods seem to be well motivated, the discussion of the examples presented below will show that these methods may lead to misinterpretations of the variability modes.

This paper is organized as follows. In the next section we shall present three examples of observed climate variability in which the interpretation by EOF, VARIMAX, and regression analyses leads to conflicting results in recent publications. In section 3 a simple low-dimensional example of multivariate data analysis is described that has, by construction, no statistical uncertainties in the determination of the EOF and VARIMAX patterns. In section 4, the main focus of this note, we shall then discuss the problems in interpreting the EOF, VARIMAX, and regression patterns in different examples and in general. We shall conclude this note by highlighting some caveats when interpreting the results of EOF or rotated EOF analyses.

## 2. Examples of EOF analyses

In the following text, the dominant modes of variability in three different observed datasets are shown as derived by EOF, VARIMAX, and regression analyses.

Unless otherwise noted, all patterns are derived from unnormalized data (the amplitudes are in units of the analyzed quantity), and each pattern is associated with a time series of unit standard deviation (normalized time series). We shall call the spatial pattern of the EOF or VARIMAX mode the pattern and refer to the normalized time series of the modes as the principal component (PC).
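The convention described above (patterns in the units of the data, PCs of unit standard deviation) can be sketched as follows. This is only a minimal illustration on synthetic random data, not on the datasets analyzed in this note; the array names (`X`, `patterns`, `pcs`) and the use of a singular value decomposition are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic anomaly field: 200 time steps x 5 grid points (hypothetical data).
n_time, n_space = 200, 5
X = rng.standard_normal((n_time, n_space)) * np.array([3.0, 2.0, 1.0, 1.0, 0.5])
X -= X.mean(axis=0)  # remove the time mean so that X holds anomalies

# Singular value decomposition of the anomaly matrix: X = U diag(s) Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# PCs normalized to unit standard deviation (sample estimate, ddof=1)
pcs = U * np.sqrt(n_time - 1)

# Patterns (one per column) carry the amplitude, in the units of X
patterns = Vt.T * (s / np.sqrt(n_time - 1))

# Fraction of the total variance explained by each mode
explained = s**2 / np.sum(s**2)
```

With this scaling the data are recovered as `pcs @ patterns.T`, so the amplitude of each pattern can be read directly in the units of the analyzed quantity.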

For all datasets the VARIMAX representation has been calculated by the rotation of the first 10 EOF patterns. We used the “raw” VARIMAX criterion instead of the “normal” criterion (see Kaiser 1958), in order to be consistent with the EOF analysis, in which we did not normalize the data (as would be necessary to obtain the normal VARIMAX criterion). However, we shall discuss the differences between the two VARIMAX criteria in section 4 when we discuss the differences between covariance- and correlation-matrix-based EOF analysis.

We would like to discuss here only the patterns obtained from the different analyses. We do not intend to present new evidence about the variability in the three domains. Since statistical uncertainties in the estimation of the EOF patterns do not matter in the following discussion, we assume that all spatial patterns are well defined.

For the analyses of the SST we have used monthly mean SST anomalies based on the dataset of Reynolds and Smith (1994), which covers the period from 1958 to 1998. For the SLP analysis we have taken the monthly mean anomalies from November to April from the National Centers for Environmental Prediction (NCEP) reanalysis dataset covering the period from 1958 to 1997 (Kalnay et al. 1996).

### a. SST in the tropical Atlantic

The first two EOFs and VARIMAX patterns of the tropical Atlantic SST anomalies are shown in Fig. 1. The EOF-1 pattern is more or less uniform over the entire domain, while the EOF-2 is an interhemispheric dipole pattern. In contrast to the two EOF patterns, the two leading VARIMAX patterns are more localized: each of them covers just one hemisphere, and the two patterns do not overlap significantly. Two regression patterns between the box-averaged SST of the centers of the dipole pattern and the SST field are shown additionally for comparison in Fig. 1.

The interhemispheric dipole pattern in the EOF-2 has received a lot of attention in terms of whether this pattern represents a potential physical mode of SST variability on decadal timescales (Weare 1977; Servain 1991; Nobre and Shukla 1996; Chang et al. 1997; Tourre et al. 1999), or is only an artifact of the EOF analysis (Houghton and Tourre 1992; Enfield et al. 1999; Dommenget and Latif 2000). Dommenget and Latif (2000) basically argue on the basis of coupled model results and observations that the dipole in the tropical Atlantic does not represent a physical mode.

### b. SST in the tropical Indian Ocean

A similar analysis is now repeated for the tropical Indian Ocean. The first two EOF and VARIMAX patterns and two regression patterns between box averaged SST and the SST field are shown in Fig. 2.

Again, the EOF-2 of the SST variability is characterized by a dipole. However, there are some significant differences compared to the tropical Atlantic. First, the EOF-1 of the Indian Ocean explains much more variance than the EOF-1 of the tropical Atlantic, and second, the EOF-1 also explains much more variance than the EOF-2 of the tropical Indian Ocean. Furthermore, the VARIMAX patterns do not pick up the two centers of EOF-2. The eastern center of the dipole does not show up in any of the four most dominant VARIMAX patterns (patterns 3 and 4 are not shown).

The first two EOF patterns have been interpreted in terms of potential physical processes by Saji et al. (1999). They point out that the EOF-1 has a strong correlation with the El Niño in the tropical Pacific and can therefore be interpreted as the Indian Ocean response to El Niño. A response of the Indian Ocean to ENSO is well known and has also been pointed out by others (e.g., Venzke et al. 2000; Reason et al. 2000). Since the EOF-2 has an orthogonal time evolution to EOF-1, they argue that the EOF-2 can be interpreted as an El Niño–independent mode of variability, which is unique to the tropical Indian Ocean. However, the VARIMAX representation and the regressions provide no indication for the existence of a dipole mode, as suggested by EOF-2.

### c. SLP variability in the Northern Hemisphere

We shall now analyze the Northern Hemisphere winter SLP variability. The following example is different in many aspects compared to the ones described above. In contrast to SST anomalies, SLP anomalies in one region are usually compensated by SLP anomalies of opposite sign in a nearby region at the same time. Therefore, the patterns of SLP have, in general, a dipole or multipole structure.

Furthermore, the standard deviation of the SLP is very inhomogeneous, with much stronger variance in higher latitudes compared to lower latitudes. In datasets with inhomogeneous standard deviations, the covariance-matrix-based EOF analysis can be very different from a correlation-matrix-based EOF analysis. It is therefore instructive to additionally calculate the correlation-matrix-based EOF analysis.

In Fig. 3, the first two covariance-matrix-based EOFs, correlation-matrix-based EOFs, VARIMAX modes, and two regression patterns are shown. Again, the different methods of representing the SLP variability in the Northern Hemisphere give quite different results with respect to the teleconnections. This may be one of the reasons why there is a scientific debate about which of these patterns best describes the dominant modes of SLP variability. For an overview of this controversy, see Ambaum et al. (2001; see also Barnston and Livezey 1987; Thompson and Wallace 2000; Wallace 2000).

## 3. A simple low-dimensional example

A simple three-dimensional example might help to understand the difficulty in interpreting the patterns of the former examples. The advantage of the following artificial example compared to the ones described above is that we discuss a low-dimensional problem that is well defined and in which statistical uncertainties do not exist.

We assume that our domain can be divided into three regions. We then define three different modes of variability, which are shown in the upper panel of Fig. 4. We have one mode that acts only in the left region, one only in the right region, and one that covers all three regions. The explained variance of each mode is shown in the title of each plot in Fig. 4. We assume that the time evolutions of these modes are uncorrelated and that the standard deviations of all time series of these modes amount to unity.

The structures of the physical modes are motivated by the analyses of the SST in the tropical Atlantic and Indian Oceans. The three modes may therefore yield some further insight into the modal structure in these regions.

For the SLP in the Northern Hemisphere, mode-1 could be interpreted as the North Atlantic oscillation (NAO) of the Atlantic–European region (similar to VARIMAX-1 in Fig. 3), mode-2 as the Pacific–North America (PNA) pattern (similar to VARIMAX-2 in Fig. 3), and the mode-3 would be an annular mode (similar to EOF-1 in Fig. 3, but much weaker and more zonal). The three regions of the simple example would then be interpreted as the Atlantic–European region (the left region in Fig. 4), the Pacific domain (the right region), and the rest of the Northern Hemisphere (the central region).

However, to keep the problem as simple as possible we represent each region by one point only. The values at these points are printed on top of the mode (see Fig. 4). We can therefore interpret each physical mode as a three-dimensional vector, where each component of this vector represents the variability of one region. The set of the three vectors defines a matrix $\mathbf{M}$. The actually observed variability in the three regions defines a vector $\mathbf{Y}$ that is related to $\mathbf{M}$ by

$$\mathbf{Y} = \mathbf{M}\,\mathbf{P}. \tag{1}$$

The coordinates $P_i$ of the vector $\mathbf{P}$ describe the time evolutions (PCs) of the basis modes. By construction, the variance–covariance matrix of $\mathbf{P}$ is $\boldsymbol{\Sigma}_{PP} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix.

The construction of our example allows us to calculate the covariance matrix of $\mathbf{Y}$ exactly, because the characteristics of the physical modes are known exactly:

$$\boldsymbol{\Sigma}_{YY} = \mathbf{M}\,\boldsymbol{\Sigma}_{PP}\,\mathbf{M}^{T} = \mathbf{M}\,\mathbf{M}^{T}. \tag{2}$$

Therefore, all structures that appear in the following statistical analysis are well defined.

The square root of the covariance matrix yields the regressions of one coordinate of the vector space (one region) with all coordinates (regions) of the vector space. The regression patterns and values are shown in the lower panel of Fig. 4.
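One possible reading of this step, assumed here, is that the regression pattern of an index region is its row of the covariance matrix divided by the standard deviation of the index, so that the values are in the units of the analyzed quantity. The sketch below uses hypothetical mode amplitudes (the matrix `M` is an illustrative choice, not the values printed in Fig. 4), with one mode in the left region, one in the right region, and a weaker domainwide mode.

```python
import numpy as np

# Hypothetical basis modes (one per column), NOT the values of Fig. 4.
# Rows are the regions: left, center, right.
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 2.2, 1.0]])

# Exact covariance matrix of the regions, because the mode time series
# are uncorrelated with unit variance
C = M @ M.T

# Local standard deviation of each region
sigma = np.sqrt(np.diag(C))

# Row i: regression of all regions onto the normalized index of region i,
# i.e. cov(Y_i, Y_j) / std(Y_i)  (assumed convention: unit-variance index)
regression = C / sigma[:, None]

# Row i: correlation of region i with all regions
correlation = C / np.outer(sigma, sigma)
```

Under this convention the value of a regression pattern at its own index region equals the local standard deviation of that region.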

Based on the covariance matrix we can also calculate the EOF vectors exactly. We therefore do not have to consider the sampling error problem, which can lead to unstable estimations of the EOF vectors (North et al. 1982). The EOFs are also shown in Fig. 4. The EOF vectors are not degenerate, because all eigenvalues of the covariance matrix are different (see explained variances of the EOFs in Fig. 4).
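The exact calculation of the EOFs from a known covariance matrix can be sketched as an eigenvalue problem. The mode amplitudes in `M` are again hypothetical (they are not the values of Fig. 4), and the scaling of the eigenvectors by the square root of the eigenvalues follows the convention of unit-variance PCs used in this note.

```python
import numpy as np

# Hypothetical basis modes (one per column), NOT the values of Fig. 4.
# Rows are the regions: left, center, right.
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 2.2, 1.0]])
C = M @ M.T  # exact covariance matrix, no sampling error

# eigh returns the eigenvalues in ascending order; reorder so EOF-1 is first
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Scale the unit eigenvectors so that the patterns carry the amplitude
# and the associated PCs have unit variance
Q = eigvecs * np.sqrt(eigvals)

# Fraction of the total variance explained by each EOF
explained = eigvals / eigvals.sum()
```

Because the three eigenvalues of this hypothetical covariance matrix are distinct, the EOF vectors are uniquely defined up to their sign.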

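The set of the three EOF vectors defines a matrix $\mathbf{Q}$. Similar to Eq. (1), the observed vector $\mathbf{Y}$ is related to $\mathbf{Q}$ by

$$\mathbf{Y} = \mathbf{Q}\,\mathbf{P}_Q. \tag{3}$$

The vector $\mathbf{P}_Q$ describes the time evolutions (PCs) of the EOFs. Using Eqs. (1) and (3) we can show that the vector $\mathbf{P}_Q$ can be represented by a linear combination of the components of the vector $\mathbf{P}$:

$$\mathbf{Q}^{T}\mathbf{Q}\,\mathbf{P}_Q = \mathbf{Q}^{T}\mathbf{M}\,\mathbf{P}; \tag{4}$$

with $\boldsymbol{\Lambda} = \mathbf{Q}^{T}\mathbf{Q}$, the diagonal matrix of the eigenvalues of the EOFs, we find

$$\mathbf{P}_Q = \boldsymbol{\Lambda}^{-1}\mathbf{Q}^{T}\mathbf{M}\,\mathbf{P} = \mathbf{A}\,\mathbf{P}. \tag{5}$$

Thus the matrix $\mathbf{A}$ describes the linear combination of the components of the vector $\mathbf{P}$ that constructs the vector $\mathbf{P}_Q$.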

The coefficients of $\mathbf{A}$ are listed in Table 1. A row in Table 1 describes the relative influence of the basis modes on a single EOF mode. For example, it can easily be seen (in Fig. 4 and correspondingly in Table 1) that the EOF-2 includes the time evolutions of mode-2 with positive loadings and of mode-1 with slightly smaller negative loadings. Please note that the EOF-2 represents a pattern that does not really exist in our simple example, so that it is completely artificial.
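The transformation between the PCs of the basis modes and the PCs of the EOFs can be sketched numerically. Equating the two representations of the observed vector (basis modes times their PCs, and EOFs times their PCs) and multiplying by the transposed EOF matrix gives the transformation matrix as the inverse eigenvalue matrix times the transposed EOF matrix times the mode matrix. The mode amplitudes in `M` are hypothetical, so the printed coefficients are not those of Table 1.

```python
import numpy as np

# Hypothetical basis modes (one per column), NOT the values of Fig. 4.
# Rows are the regions: left, center, right.
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 2.2, 1.0]])

# EOFs of the exact covariance matrix, ordered with EOF-1 first
eigvals, E = np.linalg.eigh(M @ M.T)
eigvals, E = eigvals[::-1], E[:, ::-1]
Q = E * np.sqrt(eigvals)  # patterns scaled for unit-variance PCs

# Diagonal matrix of the eigenvalues
Lam = Q.T @ Q

# Transformation matrix: P_Q = A P, with A = Lam^{-1} Q^T M
A = np.linalg.inv(Lam) @ Q.T @ M

# Row k of A: contribution of each basis mode to the PC of EOF-k
for k, row in enumerate(A, start=1):
    print(f"PC-{k}:", np.round(row, 2))
```

In this sketch `A` is an orthogonal matrix, which reflects that the PCs of the EOFs are again uncorrelated with unit variance; each of them is nevertheless a mixture of the basis-mode time series.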

Usually the VARIMAX representation is calculated by using the EOF patterns. Here we can directly calculate the VARIMAX representation from our basis vectors because the basis vectors are already given with orthogonal time evolutions, which is usually not the case in climatological datasets. Therefore, the VARIMAX vectors are well defined. The VARIMAX patterns and their explained variances are also shown in Fig. 4, and the corresponding transformation matrix $\mathbf{A}$ for the PCs of the VARIMAX vectors is listed in Table 2.
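A raw VARIMAX rotation of the basis vectors can be sketched with a commonly used SVD-based iteration. The algorithm actually used for this note is not stated, so the function `varimax_raw`, its stopping rule, and the hypothetical mode amplitudes in `M` are assumptions made for illustration only. The iteration may also stop at a local optimum of the criterion.

```python
import numpy as np

def varimax_raw(patterns, max_iter=500, tol=1e-10):
    """Raw (unnormalized) VARIMAX rotation via an SVD-based iteration.

    Returns the rotated patterns and the orthogonal rotation matrix R.
    """
    n_points, n_modes = patterns.shape
    R = np.eye(n_modes)
    crit_old = 0.0
    for _ in range(max_iter):
        L = patterns @ R
        # Gradient-like term of the raw VARIMAX criterion
        G = patterns.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / n_points)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt  # orthogonal matrix built from the SVD factors
        crit = s.sum()
        if crit_old != 0.0 and crit < crit_old * (1.0 + tol):
            break
        crit_old = crit
    return patterns @ R, R

# Hypothetical basis modes (one per column), NOT the values of Fig. 4.
# Rows are the regions: left, center, right.
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 2.2, 1.0]])

# Rotate the basis vectors directly; their PCs are already uncorrelated
V, R = varimax_raw(M)

# PCs of the rotated modes: P_V = R^T P, so the transformation matrix is R^T
A_varimax = R.T
```

Because the rotation is orthogonal, the rotated PCs stay uncorrelated and the covariance matrix of the regions is unchanged, while the rotated patterns are in general no longer orthogonal in space.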

Our simple three-dimensional example has an inhomogeneous distribution of the local standard deviation, with larger variability in the left and right region and less variability in the center region. It is therefore similar to the inhomogeneous standard deviation of Northern Hemisphere winter SLP variability.

In datasets with inhomogeneous standard deviations, the covariance-matrix-based analysis can be very different from a correlation-matrix-based analysis. We have therefore calculated the correlation-matrix-based EOFs and VARIMAX modes, and correlations between the different regions (Fig. 5). The correlation-matrix-based VARIMAX analysis is equivalent to the normal VARIMAX as stated in Kaiser (1958).
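The correlation-matrix-based EOF analysis can be sketched by normalizing the covariance matrix with the local standard deviations before the eigenvalue decomposition. The mode amplitudes in `M` are hypothetical and are not the values of Fig. 4 or Fig. 5.

```python
import numpy as np

# Hypothetical basis modes (one per column), NOT the values of Fig. 4.
# Rows are the regions: left, center, right.
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 2.2, 1.0]])
C = M @ M.T

# Correlation matrix: covariance normalized by the local standard deviations
sigma = np.sqrt(np.diag(C))
corr = C / np.outer(sigma, sigma)

# EOFs of the correlation matrix, ordered with EOF-1 first
eigvals, E = np.linalg.eigh(corr)
eigvals, E = eigvals[::-1], E[:, ::-1]

# Patterns in correlation units: entry (i, k) is the correlation between
# region i and the PC of EOF-k
Q_corr = E * np.sqrt(eigvals)

# The total normalized variance equals the number of regions
explained = eigvals / eigvals.sum()
```

Every region enters this analysis with unit variance, so the weakly varying central region of this sketch receives the same weight as the two strongly varying outer regions.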

The transformation matrices $\mathbf{A}$ for the PCs of the correlation-matrix-based EOFs and VARIMAX vectors are listed in Tables 3 and 4.

## 4. Discussion

We used three different statistical methods (EOF, VARIMAX, and regression analysis) to identify the different variability modes in different multivariate datasets.

In the following discussion we compare the results from the simple low-dimensional example with those from the other three examples using observed data. We do not consider statistical uncertainties because the problems due to statistical uncertainties in EOF analysis have already been discussed elsewhere (e.g., North et al. 1982; Richman 1986). Furthermore, the points that we make here are not related to statistical uncertainties.

Although the discussion will be mostly focused on the differences in the spatial patterns, one has to take into account that each pattern is related to a specific time series. Patterns that do show large differences in the spatial structures will, in general, also have large differences in the corresponding time series.

Additionally, we would like to mention that the whole discussion is focused on standing modes of variability. Statistical analysis of propagating signals is an entirely different issue and cannot be discussed within this framework. For propagating signals other statistical methods, such as principal oscillation patterns (POPs) or extended or complex EOFs, may be used (e.g., Hasselmann 1988; von Storch et al. 1990).

In the simple low-dimensional example we consider three variability modes. The three modes can be interpreted as the “real physical modes” of the domain. From a mathematical point of view all representations (e.g., EOF, VARIMAX) of the simple low-dimensional example are equally valid, but from a physical point of view we would like to find the representation that points most clearly toward the real physical modes of the problem.

We constructed the simple low-dimensional example by two local and spatially orthogonal modes, which should represent some simple internal modes (see Fig. 4). The third mode in this example represents a domainwide mode, which may be regarded, for instance, as the response of the domain to some kind of external influence. The third mode is not orthogonal in space with the other two, which will be important in the following discussion. By construction the simple low-dimensional example does not contain any statistical uncertainties, which allows us to determine the EOF and VARIMAX patterns exactly.

Although the mode-3 is the weakest one in the simple low-dimensional example, the EOF-1 is very similar to it (see Fig. 4). Despite the fact that it captures some features of the two other basis modes, it may be interpreted as the domain response to some kind of external influence, similar to how Saji et al. (1999) have interpreted the EOF-1 of the tropical Indian Ocean. Although the EOF-1 in the simple low-dimensional example is very similar to the mode-3, the PC-1 is a superposition of all three basis modes (see Table 1).

In the tropical Indian and Atlantic Ocean SSTs this kind of weak external influence may be the ENSO response or a greenhouse warming trend, as expressed by the leading EOFs (see Figs. 1 and 2). In the Northern Hemisphere SLP such an external influence might manifest itself as an annular mode such as the EOF-1 of Northern Hemisphere SLP (see Fig. 3).

On the other hand, we would like to clarify that the EOF-1 does not need to be a superposition of many modes. If we had chosen the mode-3 as the most dominant mode in our simple example, then the patterns of the EOF and VARIMAX analyses would not look much different compared to the ones shown in Fig. 4, but the PC-1 would clearly be dominated by the mode-3. It is, for example, well known that the EOF-1 of the tropical Pacific SST really represents the El Niño mode.

The orthogonality constraint in space forces the EOF-2 of the simple low-dimensional example to be a domainwide dipole, although the two centers of the dipole are not anticorrelated by construction (see Fig. 4). It can therefore be concluded that a domain that has an EOF-1 pattern with a shape of a domainwide monopole must have a dipole in the EOF-2. The dipole, however, is totally an artifact of the orthogonality constraint.
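The dipole artifact can be checked numerically in a sketch with hypothetical mode amplitudes (the matrix `M` is not taken from Fig. 4). The left and right regions are positively correlated through the domainwide mode, yet the EOF-2 has opposite signs in these two regions, because it has to be orthogonal to the single-signed EOF-1.

```python
import numpy as np

# Hypothetical basis modes (one per column), NOT the values of Fig. 4.
# Rows are the regions: left, center, right.
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 2.2, 1.0]])
C = M @ M.T

# eigh sorts the eigenvalues in ascending order
eigvals, E = np.linalg.eigh(C)
eof1, eof2 = E[:, -1], E[:, -2]

# Covariance between the left and right regions (positive by construction)
left_right_cov = C[0, 2]

# EOF-1 has the same sign everywhere (domainwide monopole)
monopole = bool(np.all(eof1 > 0) or np.all(eof1 < 0))

# EOF-2 has opposite signs in the left and right regions (dipole)
dipole = bool(eof2[0] * eof2[2] < 0)
```

A vector can only be orthogonal to a pattern with the same sign everywhere if it has components of both signs, so the sign change in the EOF-2 follows from the constraint alone and not from any anticorrelation in the data.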

The EOF-3 and VARIMAX-3 patterns in Fig. 4 are interesting because they indicate a kind of central mode that does not really exist. Interestingly, the time evolution of this mode is a superposition of all three basis modes. As a result, the PC-3 includes variability from the basis mode-1 and mode-2, which actually do not influence this region at all (see Tables 1 and 2).

By construction the EOF analysis maximizes the explained variance in the leading EOFs. As a result, generally only a few EOF patterns are needed to explain a large amount of variability. In the artificial example the two leading EOFs explain more than 95% of the total variance (see Fig. 4). However, our artificial example has three modes. This indicates that the EOF analysis will, in general, underestimate the complexity of the problem. This is also indicated in the tropical Indian Ocean SST analysis in which the two leading EOFs explain much more total variance than the two leading VARIMAX patterns (see Fig. 2).

Sometimes maps of explained local variances are shown in order to highlight certain regions in which a relatively high amount of variance is explained, indicating that these regions should be analyzed in greater detail. This approach will generally favor the VARIMAX method because VARIMAX optimizes the simplicity and therefore produces local patterns. Although the VARIMAX representation is often a very instructive representation of the data, it may often fail to find global modes, such as the mode-3, due to the optimization of the simplicity that favors localized modes.

In Fig. 5, we have repeated the analysis of our simple example but with correlation-matrix-based EOFs and VARIMAX analysis, and by computing the correlations between the different regions. The patterns are presented in terms of correlation values. These representations look quite different from the covariance-matrix-based analyses. Here, the VARIMAX analysis and the correlations are in very good agreement with the original modes, but the EOF patterns are again very different from the original modes.

This example and the example of the SLP variability in the Northern Hemisphere (see Fig. 3) may indicate that correlation-matrix-based analyses are more instructive than covariance-matrix-based analyses. However, we believe that this cannot be generalized. Whether a correlation- or a covariance-matrix-based analysis gives a better representation of the “physical modes” depends strongly on the spatial structure of the physical modes. Imagine, for instance, that the Pacific and the Atlantic pole in the covariance-matrix-based EOF-1 in Fig. 3 had the same spatial structure, but the Pacific pole had a larger amplitude than the Atlantic pole. In this case a correlation-matrix-based analysis would not be able to focus on one of the poles, as in Fig. 3, because the larger amplitude of the Pacific pole is not known to the correlation matrix. In this case the covariance-matrix-based analysis would be a better representation, and the EOF-1 would be focused on the stronger Pacific pole.
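That the amplitude of a region is not known to the correlation matrix can be illustrated with a short sketch. The mode amplitudes in `M`, the helper `correlation_from_cov`, and the amplification factor of 3 are hypothetical choices made for this illustration only.

```python
import numpy as np

def correlation_from_cov(C):
    """Normalize a covariance matrix by the local standard deviations."""
    sigma = np.sqrt(np.diag(C))
    return C / np.outer(sigma, sigma)

# Hypothetical basis modes (one per column), NOT the values of Fig. 4.
# Rows are the regions: left, center, right.
M = np.array([[2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 2.2, 1.0]])
C = M @ M.T

# Amplify the variability of the right region by a factor of 3
S = np.diag([1.0, 1.0, 3.0])
C_scaled = S @ C @ S

# The correlation matrix does not change under this rescaling
same_corr = np.allclose(correlation_from_cov(C), correlation_from_cov(C_scaled))

# The leading covariance-matrix-based EOF moves toward the amplified region
eigvals, E = np.linalg.eigh(C_scaled)
eof1_scaled = E[:, -1]  # eigh sorts the eigenvalues in ascending order
strongest_region = int(np.argmax(np.abs(eof1_scaled)))
```

In this sketch the correlation-matrix-based analysis is identical before and after the rescaling, whereas the covariance-matrix-based EOF-1 becomes focused on the amplified right region.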

In the artificial example the regression patterns seem to be most instructive in representing the dominant modes of variability. However, the disadvantage of the regression analysis is that the choice of the index region is highly subjective, and it is much easier to choose an index that is not instructive at all than to choose an adequate index. For the SLP in the Northern Hemisphere, for instance, we could have chosen an index region over the North Pole, and the regression would look very much like the covariance-matrix-based EOF-1 (the regression pattern is not shown, but see Fig. 3 for the EOF-1). Thus, the disadvantage of the regression analysis is its subjectivity, so that one always needs to argue why a certain index has been chosen.

Often regression indices are motivated by EOF analysis (e.g., the tropical Atlantic or Indian Ocean dipole indices), which seems to make the regression indices more objective. However, one has to consider that these indices are as limited in their interpretation as the EOF patterns from which they are derived.

In our simple example both the covariance-matrix-based EOF and VARIMAX analyses somehow fail to adequately represent the weak global mode (mode-3), and one can imagine that in many practical problems the correlation-matrix-based EOF and VARIMAX analyses will also fail to identify the weak global mode. It may therefore be a good approach to eliminate the weak global mode prior to the EOF analysis.

However, there is no simple way to determine the pattern and time series of such a weak global mode because one cannot derive these structures by analyzing the domain itself. This would again lead to a superposition of the local and weak global modes into one mode, as in the EOF-1 in Figs. 4 and 5. The structure of the weak global mode has to be determined by some additional knowledge about external influences such as global warming or ENSO.

## 5. Conclusions

We have shown that EOF and rotated EOF analyses have problems in identifying the dominant centers of action or the teleconnections between these centers of action in multivariate datasets. We therefore have to be very careful in interpreting the EOF or rotated EOF modes as potential physical modes.

The problems in interpreting the patterns derived from EOF and VARIMAX analyses arise because the basic assumptions made by these statistical methods are not identical to the assumptions that we make to derive the so-called physical modes of the problem. The EOF analysis always represents modes of variability that are orthogonal in time and space. The constraint of orthogonality in space is often not consistent with the real nature of the problem, as in the simple example, in which the basis modes are not orthogonal in space (see Fig. 4). The VARIMAX analysis looks for localized modes, which is also not adequate for our simple example because the mode-3 is highly nonlocal.

A good strategy for statistical analysis of climate data is to look at the data with different statistical tools, such as regressions, VARIMAX, or EOF analysis, and develop a hypothesis for the potential physical modes that is consistent with all representations of the data, instead of developing a hypothesis for the potential physical modes based on only one representation, which is often in contradiction with other representations.

We would like to conclude our discussion with the following caveats for the interpretation of the results of the EOF or VARIMAX methods.

The teleconnection patterns derived from the orthogonal analysis cannot necessarily be interpreted as teleconnections that are associated with a potential physical process (e.g., the dipole pattern in Fig. 4).

The centers of action derived from the EOF or VARIMAX methods do not need to be the centers of action of the real physical modes (see EOF-3 or VARIMAX-3 in Fig. 4).

The PCs of the dominant patterns are often a superposition of many different modes that are uncorrelated in time and that are often modes of remote regions that have no influence on the region in which the pattern of this PC has its center of action.

## Acknowledgments

We would like to thank Astrid Baquero, David Enfield, Ian Jolliffe, Ute Merkel, Alberto Mestas Nuñez, Holger Pohlmann, Hans von Storch, Francis Zwiers, and an anonymous reviewer for useful comments. This work was supported by the German Government's Ocean CLIVAR Programme.

## Footnotes

*Corresponding author address:* Dietmar Dommenget, Physical Oceanography Research Division, Scripps Institution of Oceanography, 9500 Gilman Dr., La Jolla, CA 92093-0230. Email: dommenget@ucsd.edu