## Abstract

The canonical correlation analysis (CCA) and singular value decomposition (SVD) approaches for estimating a time series from a time-dependent vector and vice versa are investigated, and their relationship to multiple linear regression (MLR) and to regression maps is discussed. Earlier findings are reviewed and combined with new aspects to provide a systematic overview. It is shown that regression maps are proportional to canonical patterns and to singular vectors and that the estimate of a time-dependent vector from a time series does not depend on whether CCA, SVD, or component-wise regressions are used. When a time series is linearly estimated from a time-dependent vector, it is known that CCA is equivalent to MLR. It is demonstrated that an estimate for the time series based on a time expansion coefficient of the regression map that is calculated by orthogonal projection is identical to an SVD estimate, but different from the CCA and MLR estimate. The two approaches also lead to different correlations between the time series and the time expansion coefficient of its signal.

The CCA–MLR and the SVD–regression map approaches are compared in an example where the January Arctic Oscillation index for the period 1948–2002 was estimated from extratropical Northern Hemispheric 850-hPa temperature. For CCA–MLR the leading principal components (PCs) of the temperature field were used as predictors, while for SVD the full field was employed. For more than seven retained PCs the skill in terms of correlations and mean squared error based on cross validation was for both approaches practically identical, but CCA–MLR showed a higher bias. For a smaller number of predictor PCs the SVD–regression map approach performed better. The discrepancy between the skill on the fitting data and on the independent data used for validation was in this example larger for the CCA–MLR approach.

## 1. Introduction

Regression maps have been used in many studies to find climate signals associated with a given time series, for instance, to identify temperature and precipitation signals associated with the North Atlantic, Arctic, or Antarctic Oscillations (Hurrell 1996; Thompson and Wallace 1998; Shindell et al. 1999; Reichert et al. 2001; Jones and Widmann 2003); to find geopotential height anomalies related to the leading principal components (PCs) of precipitation (Quadrelli et al. 2001); or to find the climatic response to solar forcing (Waple et al. 2002). When the link between the time series and the time-dependent field is formulated individually for each location, regression maps are a straightforward way to capture the signal of the time series in the field. A nonlocal interpretation of regression maps was given in Wallace et al. (1995), who showed for the special case of fields with a spatial mean of zero that the time expansion coefficient (TEC) of the regression map, defined by orthogonal projection of the field onto the regression map, has maximal covariance with the time series that was used to define the regression map and that, in this case, the results are thus identical to those of a singular value decomposition (SVD). TECs of regression maps between the Arctic Oscillation index (AOI) and temperature or geopotential height fields were also calculated by Thompson and Wallace (1998) using an orthogonal projection, whereas in Thompson et al. (2000) TECs of correlation maps were used. It was not discussed in these papers that calculating TECs of regression maps through orthogonal projection is related to SVD, and the TECs were compared to the AOI but not used to estimate the AOI or vice versa.

It is the purpose of this paper to clarify the relationship between regression maps, SVD, and canonical correlation analysis (CCA) and to discuss the various ways in which a time series can be linearly linked to a time-dependent field. All the derived properties follow directly from the basic definitions and general solutions of CCA and SVD. Some of the findings could also be easily inferred from relations listed in Bretherton et al. (1992), hereafter referred to as BSW92. However, the case in which one of the two fields is just a time series, which then must be proportional to the TEC of a canonical pattern or singular vector, was not explicitly discussed in BSW92, and thus the relationship between regression maps and CCA or SVD may have gone unnoted.

## 2. Some properties of canonical correlation analysis and singular value decomposition

In this section properties of CCA and SVD that are needed for the arguments in this paper are briefly reviewed. The nomenclature follows closely that of BSW92. Basic definitions and properties that are omitted for brevity can be found for instance in BSW92 and in von Storch and Zwiers (1999).

We consider *N*_{s}- and *N*_{z}-dimensional, real, time-dependent fields with zero temporal mean in each dimension, which are represented as column vectors **s**(*t*) and **z**(*t*). In statistical climatology *N*_{s} and *N*_{z} typically refer to a spatial index. Both CCA and SVD find coupled patterns in these fields, based on different optimization criteria. Let us first consider CCA and let **u**_{k} and **v**_{k} be the weight vectors or adjoint canonical patterns, and **p**_{k}, **q**_{k} the canonical patterns in the **s**(*t*) and **z**(*t*) field, respectively. In CCA the general solution for the adjoint canonical patterns **u**_{k} is given by the eigenvector equation

$$\mathbf{C}_{ss}^{-1}\,\mathbf{C}_{sz}\,\mathbf{C}_{zz}^{-1}\,\mathbf{C}_{sz}^{\mathrm{T}}\,\mathbf{u}_k = \lambda_k\,\mathbf{u}_k, \tag{1}$$

where 𝗖_{ss} and 𝗖_{zz} denote the covariance matrices of **s** and **z**, 𝗖_{sz} the cross-covariance matrix between the two fields, and *λ*_{k} the eigenvalues. A similar equation holds for the adjoint canonical patterns **v**_{k}. Pairs of adjoint patterns are related through

$$\mathbf{v}_k = \eta\,\mathbf{C}_{zz}^{-1}\,\mathbf{C}_{sz}^{\mathrm{T}}\,\mathbf{u}_k, \tag{2}$$

with a suitable normalization factor *η* determined by the normalization convention

$$\mathbf{u}_k^{\mathrm{T}}\,\mathbf{C}_{ss}\,\mathbf{u}_k = \mathbf{v}_k^{\mathrm{T}}\,\mathbf{C}_{zz}\,\mathbf{v}_k = 1. \tag{3}$$

Patterns and adjoint patterns are related through

$$\mathbf{p}_k = \mathbf{C}_{ss}\,\mathbf{u}_k, \qquad \mathbf{q}_k = \mathbf{C}_{zz}\,\mathbf{v}_k. \tag{4}$$

The weight vectors are used to define the TECs *a*_{k}(*t*) for the patterns **p**_{k} through

$$a_k(t) = \mathbf{u}_k^{\mathrm{T}}\,\mathbf{s}(t), \tag{5}$$

and analogously for **z**(*t*).

In SVD the set of patterns is obtained through a factorization of the cross-covariance matrix

$$\mathbf{C}_{sz} = \mathbf{P}\,\boldsymbol{\Sigma}\,\mathbf{Q}^{\mathrm{T}}, \tag{6}$$

where **P** ∈ ℝ^{Ns×Ns} and **Q** ∈ ℝ^{Nz×Nz} are orthogonal matrices, whose columns are the patterns or singular vectors **p**_{k} and **q**_{k}, and **Σ** holds the singular values on its diagonal. Because in contrast to CCA the set of singular vectors **p**_{k} (**q**_{k}) is orthonormal, the *k*th TEC of the pattern is obtained by orthogonal projection of the data at each time step onto the pattern, which means that the weight vectors **u**_{k} and **v**_{k} are identical to the patterns **p**_{k} and **q**_{k}.

CCA and SVD can be used to describe links between two fields in a symmetrical way through pairs of patterns (**p**_{k}, **q**_{k}) or weight vectors (**u**_{k}, **v**_{k}). A second application, which is used frequently for instance for statistical downscaling or for climate reconstructions, is the estimation of one field from the other. If **z** is estimated from **s**, the estimate **ẑ** based on the leading *n* pairs of CCA or SVD patterns is given by

$$\hat{\mathbf{z}}(t) = \sum_{k=1}^{n} \hat{b}_k(t)\,\mathbf{q}_k, \tag{7}$$

where the estimated TECs *b̂*_{k}(*t*) are obtained by linear regression from the TECs *a*_{k}(*t*) of the other field. Because in CCA and SVD the TEC *b*_{k}(*t*) is only correlated with *a*_{k}(*t*) and uncorrelated with all other TECs of the field **s**, the estimate is given by

$$\hat{\mathbf{z}}(t) = \sum_{k=1}^{n} \frac{\mathrm{cov}(a_k, b_k)}{\mathrm{var}(a_k)}\,[\mathbf{u}_k^{\mathrm{T}}\,\mathbf{s}(t)]\,\mathbf{q}_k. \tag{8}$$

For CCA-based reconstructions this can be rewritten using the canonical correlation *ρ*_{k}

$$\hat{\mathbf{z}}(t) = \sum_{k=1}^{n} \rho_k\,[\mathbf{u}_k^{\mathrm{T}}\,\mathbf{s}(t)]\,\mathbf{q}_k. \tag{9}$$

Note that Eqs. (8) and (9) include weight vectors of the predictor field and patterns of the predictand field. Because in CCA weight vectors and patterns are not identical, the best way to present the results of a CCA used for estimating one field from another is to show the adjoint canonical patterns of the predictor field and the canonical patterns of the predictand field since these are the terms actually used in the estimation. In SVD weight vectors and patterns are identical, and thus the question whether to present the former or the latter does not arise.
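The SVD route described in this section can be illustrated with a minimal NumPy sketch. The synthetic fields, the dimensions, and all variable names are assumptions made for illustration only; the economy-size factorization is used, so fewer singular vectors are returned than the full orthogonal matrices contain.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_s, n_z = 200, 3, 5          # time steps and field dimensions (arbitrary)

# Synthetic coupled fields; rows are time steps, columns are grid points.
s = rng.standard_normal((n_t, n_s))
z = s @ rng.standard_normal((n_s, n_z)) + 0.5 * rng.standard_normal((n_t, n_z))
s -= s.mean(axis=0)                # zero temporal mean in each dimension
z -= z.mean(axis=0)

# Cross-covariance matrix C_sz (N_s x N_z) and its factorization.
c_sz = s.T @ z / (n_t - 1)
p, sigma, q_t = np.linalg.svd(c_sz, full_matrices=False)
q = q_t.T                          # columns are the singular vectors q_k

# In SVD the weight vectors equal the patterns, so TECs are projections.
a = s @ p                          # a_k(t)
b = z @ q                          # b_k(t)
cov_ab = a.T @ b / (n_t - 1)       # diagonal: b_k covaries only with a_k

# Estimate z from s with the leading n pairs: regress b_k on a_k, then
# multiply the estimated TEC by the pattern q_k.
n = 2
slope = np.diag(cov_ab)[:n] / a[:, :n].var(axis=0, ddof=1)
z_hat = (a[:, :n] * slope) @ q[:, :n].T
```

The off-diagonal elements of `cov_ab` vanish up to rounding, which is the property that allows each TEC of the predictand to be regressed on a single TEC of the predictor.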

## 3. Solutions for one-dimensional CCA and SVD

When one of the two fields **s**(*t*) and **z**(*t*) is one-dimensional some of the matrices that define the solutions become considerably simpler, and there is only one pair of canonical vectors or singular vectors. Let for instance *s*(*t*) be the one-dimensional time series. Then 𝗖_{ss} is just a scalar given by 𝗖_{ss} = var[*s*(*t*)], while 𝗖_{sz} reduces to a row vector 𝗖_{sz} = cov[*s*(*t*), **z**^{T}(*t*)], whose components are the covariances between *s*(*t*) and the components of **z**(*t*).

For CCA the weight *u* is then a scalar, and the normalization constraint (3) gives *u* = var(*s*)^{−1/2}. As *u* is known, Eq. (2) and the normalization constraint (3) can be used to calculate the adjoint canonical pattern **v**, which is then given by

$$\mathbf{v} = \frac{\mathbf{C}_{zz}^{-1}\,\mathbf{C}_{sz}^{\mathrm{T}}}{\sqrt{\mathbf{C}_{sz}\,\mathbf{C}_{zz}^{-1}\,\mathbf{C}_{sz}^{\mathrm{T}}}}. \tag{11}$$

Employing (4) yields for the canonical patterns

$$p = \mathrm{var}(s)^{1/2}, \tag{12}$$

$$\mathbf{q} = \frac{\mathbf{C}_{sz}^{\mathrm{T}}}{\sqrt{\mathbf{C}_{sz}\,\mathbf{C}_{zz}^{-1}\,\mathbf{C}_{sz}^{\mathrm{T}}}}. \tag{13}$$

The singular value decomposition becomes trivial, because 𝗖_{sz} is just a vector, and yields

$$p = 1, \qquad \mathbf{q} = \frac{\mathbf{C}_{sz}^{\mathrm{T}}}{\sqrt{\mathbf{C}_{sz}\,\mathbf{C}_{sz}^{\mathrm{T}}}}, \tag{14}$$

which is proportional to the CCA solution (13). (To keep the notation simple the same variable names as in the CCA section are used, although the CCA and SVD solutions differ.)

Thus the patterns **q** derived from SVD and from CCA are both proportional to the regression map

$$\mathbf{m} = \mathrm{var}(s)^{-1}\,\mathbf{C}_{sz}^{\mathrm{T}}, \tag{15}$$

while the adjoint canonical pattern **v** is proportional to the weights

$$\mathbf{w} = \mathbf{C}_{zz}^{-1}\,\mathbf{C}_{sz}^{\mathrm{T}} \tag{16}$$

obtained from a multiple linear regression (MLR) with **z** as the predictors and *s* as the predictand. Note that the first proportionality could be derived from Table 1 in BSW92, where it is noted that singular vectors and canonical patterns are proportional to heterogeneous covariance maps, which are defined as the covariances between the TEC *a*_{k}(*t*) and the field **z**(*t*), or between *b*_{k}(*t*) and **s**(*t*). When one takes into account that for one-dimensional *s* the SVD and CCA TEC *a*(*t*) is just a multiple of *s*, the proportionality between **q** and **m** follows.
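The proportionalities stated above can be checked numerically. The following is a minimal sketch on synthetic data; the sample sizes, the random predictors, and the helper `cosine` are assumptions introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, n_z = 300, 4

# Correlated synthetic predictors z(t) and a scalar series s(t).
z = rng.standard_normal((n_t, n_z)) @ rng.standard_normal((n_z, n_z))
s = z @ rng.standard_normal(n_z) + rng.standard_normal(n_t)
z -= z.mean(axis=0)
s -= s.mean()

var_s = s.var(ddof=1)
c_sz = s @ z / (n_t - 1)                 # covariances between s and each z_i
c_zz = z.T @ z / (n_t - 1)

m = c_sz / var_s                         # regression map
q_svd = c_sz / np.linalg.norm(c_sz)      # singular vector of the z field
w = np.linalg.solve(c_zz, c_sz)          # MLR weights for predicting s from z
v = w / np.sqrt(c_sz @ w)                # adjoint canonical pattern, v^T C_zz v = 1
q_cca = c_zz @ v                         # canonical pattern of the z field

def cosine(x, y):
    """Cosine of the angle between two vectors (1 means proportional)."""
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# The MLR weights agree with an ordinary least-squares fit on centred data.
w_lstsq, *_ = np.linalg.lstsq(z, s, rcond=None)
```

Here `cosine(m, q_svd)` and `cosine(m, q_cca)` both equal one up to rounding, whereas `cosine(m, v)` is in general smaller than one because the adjoint pattern follows the MLR weights rather than the regression map.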

### a. Estimating a scalar from a vector

In this subsection the CCA, SVD, and MLR approaches for estimating a scalar time series *s*(*t*) from a time-dependent vector **z**(*t*) are discussed.

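The CCA estimate for *s* is given by an equation analogous to Eq. (9) as

$$\hat{s}(t) = \rho\,[\mathbf{v}^{\mathrm{T}}\,\mathbf{z}(t)]\,p = \mathbf{C}_{sz}\,\mathbf{C}_{zz}^{-1}\,\mathbf{z}(t), \tag{18}$$

where we have used Eqs. (11) and (12). The CCA estimate includes the adjoint canonical pattern **v**, which, as mentioned above, is proportional to the MLR weights, and the CCA estimate is identical to the MLR estimate. The coefficient of multiple determination in MLR is identical to the squared canonical correlation *ρ*^{2}. This equivalence of MLR and one-dimensional CCA is well known in statistical climatology (e.g., Glahn 1968).

Using the general form for an estimate (8) and the one-dimensional SVD solutions (14) with the roles of **z**(*t*) and **s**(*t*) interchanged, as well as the fact that *s* and the TEC *a* are identical, one obtains for the SVD estimate

$$\hat{s}(t) = \frac{\mathrm{cov}[s,\,\mathbf{q}^{\mathrm{T}}\mathbf{z}]}{\mathrm{var}[\mathbf{q}^{\mathrm{T}}\mathbf{z}]}\;\mathbf{q}^{\mathrm{T}}\,\mathbf{z}(t),$$

which can be rewritten as

$$\hat{s}(t) = \frac{\mathbf{C}_{sz}\,\mathbf{C}_{sz}^{\mathrm{T}}}{\mathbf{C}_{sz}\,\mathbf{C}_{zz}\,\mathbf{C}_{sz}^{\mathrm{T}}}\;\mathbf{C}_{sz}\,\mathbf{z}(t). \tag{21}$$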

This equation includes weights for the predictor field **z**(*t*) that are proportional to the regression map (15). The SVD-based estimate is usually not used in statistical climatology, but it should be noted that the normalized TECs of regression maps in Thompson and Wallace (1998) are in line with the idea of estimating *s* from a time series obtained by orthogonal projection of the data onto the regression map and only need to be scaled properly to obtain the SVD estimate *ŝ*(*t*). For estimating a time series *s*(*t*) from a multivariate predictor, often PC-prefiltered MLR, or equivalently CCA, are used, as they maximize the explained variance. However, this optimization criterion holds only for the fitting data, and it is not a priori clear whether CCA–MLR has a better skill than SVD on independent data.
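The two ways of estimating *s*(*t*) from **z**(*t*) discussed above can be compared in a short NumPy sketch. The data are synthetic and the variable names are assumptions; the closed-form scaling used for the SVD estimate follows from regressing *s* on the projection of **z** onto the covariance vector.

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, n_z = 300, 6

# Correlated synthetic predictors and a noisy scalar predictand.
z = rng.standard_normal((n_t, n_z)) @ rng.standard_normal((n_z, n_z))
s = z @ rng.standard_normal(n_z) + 2.0 * rng.standard_normal(n_t)
z -= z.mean(axis=0)
s -= s.mean()

c_sz = s @ z / (n_t - 1)
c_zz = z.T @ z / (n_t - 1)

# CCA / MLR estimate: weights C_zz^{-1} C_sz^T applied to z(t).
s_hat_mlr = z @ np.linalg.solve(c_zz, c_sz)

# SVD estimate: project z onto the singular vector, then regress s on the TEC.
b = z @ (c_sz / np.linalg.norm(c_sz))
s_hat_svd = (s @ b) / (b @ b) * b

# Same SVD estimate written with weights proportional to the covariance vector.
scale = (c_sz @ c_sz) / (c_sz @ c_zz @ c_sz)
s_hat_svd_closed = scale * (z @ c_sz)

# Mean squared errors on the fitting data.
mse_mlr = np.mean((s - s_hat_mlr) ** 2)
mse_svd = np.mean((s - s_hat_svd) ** 2)
```

On the fitting data `mse_mlr` cannot exceed `mse_svd`, because MLR minimizes the squared error over all linear combinations of the predictors; the sketch says nothing about the ranking on independent data.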

The SVD approach has the advantage that no PCs need to be calculated and no subjective decision on the number of retained PCs is required. BSW92 analyzed examples in which both fields were multivariate and obtained similar results with SVD and with prefiltered CCA, whereas CCA without prefiltering was uncompetitive due to high sampling variability. SVD was used later in several studies for investigating links between two fields (e.g., Qian et al. 2003; Loschnigg et al. 2003) or for estimating one field from another (Widmann et al. 2003). Another potential problem with CCA is that it includes the inversion of the within-field covariance matrices, which may lead to unstable results on small samples and requires using generalized inverses when the number of variables is higher than the number of time steps and, as a consequence, the results may be difficult to interpret (BSW92; Cherry 1996). However, this problem is partly accounted for by the PC prefiltering and, as pointed out by Cherry (1996), SVD can under certain circumstances also yield spurious coupled patterns. Thus in the multidimensional case no method performs generally better than the other, and therefore both the prefiltered CCA and the SVD approach may also be useful for estimating a scalar from a vector. A comparison in a practical example follows below.

### b. Estimating a vector from a scalar

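According to Eq. (9) the CCA estimate for **z** from *s* is given by

$$\hat{\mathbf{z}}(t) = \rho\,[u\,s(t)]\,\mathbf{q} = \mathrm{var}(s)^{-1}\,\mathbf{C}_{sz}^{\mathrm{T}}\,s(t), \tag{24}$$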

which is identical to the estimate obtained from individual linear regression equations for the components of **z** or, in other words, to the product of *s* and the regression map **m** [Eq. (15)].

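Using the general form for an estimate (8), the one-dimensional SVD solutions (14), and the fact that *s* and the TEC *a* are identical one obtains for the SVD estimate

$$\hat{\mathbf{z}}(t) = \frac{\mathrm{cov}[s,\,\mathbf{q}^{\mathrm{T}}\mathbf{z}]}{\mathrm{var}(s)}\;s(t)\,\mathbf{q},$$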

which can be shown to be identical to the CCA and component-wise regression estimate in Eq. (24). Thus the regression map var(*s*)^{−1 }**C**^{T}_{sz} can be interpreted as the signal of *s* in **z** from the CCA, the SVD, and the component-wise regression perspective.
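The identity of the three estimates of **z**(*t*) from *s*(*t*) can be verified numerically. The following sketch uses synthetic data and assumed variable names, with the CCA quantities normalized so that the TECs have unit variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n_t, n_z = 250, 5

# Scalar series s(t) imprinted on a noisy vector z(t).
s = rng.standard_normal(n_t)
z = np.outer(s, rng.standard_normal(n_z)) + rng.standard_normal((n_t, n_z))
s -= s.mean()
z -= z.mean(axis=0)

var_s = s.var(ddof=1)
c_sz = s @ z / (n_t - 1)
c_zz = z.T @ z / (n_t - 1)

# Component-wise regression: s(t) times the regression map.
m = c_sz / var_s
z_hat_reg = np.outer(s, m)

# SVD: regress the TEC b(t) on a(t) = s(t), multiply by the singular vector.
q_svd = c_sz / np.linalg.norm(c_sz)
b = z @ q_svd
z_hat_svd = np.outer((s @ b) / (s @ s) * s, q_svd)

# CCA: canonical correlation times the unit-variance TEC times the pattern.
norm = np.sqrt(c_sz @ np.linalg.solve(c_zz, c_sz))
q_cca = c_sz / norm
rho = norm / np.sqrt(var_s)
a = s / np.sqrt(var_s)
z_hat_cca = np.outer(rho * a, q_cca)
```

All three arrays agree to rounding error, although the CCA and SVD patterns `q_cca` and `q_svd` differ in length.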

### c. Coupling strength between a time series and a time-dependent vector

CCA and SVD allow one to express the strength of the linear coupling between a time series and a time-dependent vector through the correlation of the time series and the TEC of the canonical pattern or singular vector. When the time series *s*(*t*) is estimated from the time-dependent vector **z**(*t*), multiple regression analysis also yields a measure for the strength of the coupling through the coefficient of multiple determination while, in the case when **z**(*t*) is estimated from *s*(*t*), component-wise regression analysis does not.

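When *s*(*t*) is estimated from **z**(*t*), CCA and MLR are equivalent, and the coefficient of multiple determination is identical to the squared canonical correlation, which is given by

$$\rho^{2} = \mathrm{corr}^{2}[s(t),\,\mathbf{v}^{\mathrm{T}}\mathbf{z}(t)] = \frac{\mathbf{C}_{sz}\,\mathbf{C}_{zz}^{-1}\,\mathbf{C}_{sz}^{\mathrm{T}}}{\mathrm{var}(s)}, \tag{27}$$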

due to the proportionality of *s* and the TEC *a*, and the fact that *b* is obtained by weighting **z** with the adjoint canonical pattern **v** given in Eq. (11).

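The SVD approach yields in general another TEC *b* and another value for the correlation *r*, namely

$$r = \mathrm{corr}[s(t),\,\mathbf{q}^{\mathrm{T}}\mathbf{z}(t)] = \frac{\mathbf{C}_{sz}\,\mathbf{C}_{sz}^{\mathrm{T}}}{\sqrt{\mathrm{var}(s)\;\mathbf{C}_{sz}\,\mathbf{C}_{zz}\,\mathbf{C}_{sz}^{\mathrm{T}}}}. \tag{29}$$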

Here we have used the identity of patterns and weight vectors in SVD and Eq. (14). Given the same set of predictors, *r* will be less than or equal to the canonical correlation *ρ*, as CCA maximizes the correlation between the TECs.

When **z**(*t*) is estimated from *s*(*t*) by component-wise regression, the strength of the coupling between *s* and the individual components *z*_{i} of **z** can be expressed by means of local correlations, but no measure for the overall strength of the coupling is available. If one considers the entire regression map rather than the individual regression coefficients, one can calculate its TEC in the two ways described above, either according to CCA by weighting **z** with the adjoint canonical pattern or according to SVD by weighting **z** proportional to the regression map itself. Despite the fact that the CCA and SVD estimates for **z** are identical, this leads again to the two different correlations given in Eqs. (27) and (29).

Estimating *s*(*t*) from **z**(*t*) and estimating **z**(*t*) from *s*(*t*) are very different problems, and the formulation in terms of MLR or local regression analysis is indeed quite different. However, when considered as a one-dimensional case of CCA or SVD the problem becomes more symmetrical, and the strength of the linear link between *s*(*t*) and **z**(*t*) is the same regardless of whether *s*(*t*) is estimated from **z**(*t*) or vice versa.

As mentioned above, using a large number of predictors in MLR or CCA may lead to problems related to the inversion of 𝗖_{zz} and to overfitting. Therefore it is common practice to reduce the number of predictors by PC prefiltering. Note that a consistent prefiltered CCA approach would include a regression map derived from prefiltered data, and thus it is difficult to obtain the CCA TEC of the unfiltered regression map. The SVD-based correlation *r* can be used without prefiltering as an alternative measure of the strength of the coupling.
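The two coupling measures discussed in this subsection can be computed side by side. The sketch below uses synthetic data and assumed variable names; it evaluates the canonical correlation and the SVD-based correlation both directly from the TECs and from closed-form expressions in the covariance matrices.

```python
import numpy as np

rng = np.random.default_rng(5)
n_t, n_z = 400, 6

z = rng.standard_normal((n_t, n_z)) @ rng.standard_normal((n_z, n_z))
s = z @ rng.standard_normal(n_z) + 3.0 * rng.standard_normal(n_t)
z -= z.mean(axis=0)
s -= s.mean()

var_s = s.var(ddof=1)
c_sz = s @ z / (n_t - 1)
c_zz = z.T @ z / (n_t - 1)
w = np.linalg.solve(c_zz, c_sz)          # MLR weights, proportional to v

# Correlations between s(t) and the two differently defined TECs b(t).
rho = np.corrcoef(s, z @ w)[0, 1]        # CCA: weight z with the adjoint pattern
r = np.corrcoef(s, z @ c_sz)[0, 1]       # SVD: weight z with the regression map

# Closed forms in terms of the covariance matrices.
rho_closed = np.sqrt(c_sz @ w / var_s)
r_closed = (c_sz @ c_sz) / np.sqrt(var_s * (c_sz @ c_zz @ c_sz))

# Coefficient of multiple determination of the MLR fit.
r_squared = 1.0 - np.sum((s - z @ w) ** 2) / np.sum(s ** 2)
```

For the same predictors `r` never exceeds `rho`, and `r_squared` coincides with `rho ** 2`, in line with the equivalence of MLR and one-dimensional CCA.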

## 4. Example: The relationship between the AOI and the temperature field

We now compare the SVD and PC-prefiltered CCA approaches for linking a time series to a time-dependent vector in a typical climatological application. The time series *s*(*t*) is given by the January AOI calculated as PC1 of 1948–2002 January SLP means between 20° and 85°N from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis (Kalnay et al. 1996; Kistler et al. 2001). The vector **z**(*t*) represents the spatial field of January means of 850-hPa reanalysis temperature (*T*_{850}) given on a 2.5° × 2.5° grid between 20° and 85°N when regression maps or the SVD AOI estimate are calculated, and a PC-filtered version of this field for the PC-prefiltered CCA calculations. As mentioned in the introduction, estimating the AOI temperature signal has been part of several papers that have investigated circulation-induced temperature changes, while estimating the variability of the AOI or of other dominant circulation modes from the temperature field is for instance relevant in the context of proxy-based circulation reconstructions (e.g., Cook et al. 2002; D’Arrigo et al. 2003; Jones and Widmann 2003).

The AOI was estimated from the unfiltered temperature field by means of SVD according to Eq. (21), and by MLR or equivalently CCA according to Eq. (18) after PC prefiltering with a varying number of retained PCs. For cross validation the dataset was split into two parts, from 1948 to 1975 and from 1976 to 2002. The first two rows of panels in Fig. 1 refer to these two periods. The left-hand column shows regression maps, which give the temperature change for a positive change in the AOI of one standard deviation (left-hand color bar applies). As the weights for the grid cell temperatures used in the SVD-based AOI estimate are proportional to the regression map, the left-hand column can also be interpreted as these statistical weights (right-hand color bar applies). The effective weights for grid cell temperatures that result from the CCA equation are shown in the middle column for two retained PCs, which is the lowest number for which the reconstruction has good skill and in the right-hand column, as an example for a relatively large number, for 12 retained PCs. The estimate *ŝ*(*t*) does not change when **z**(*t*) is represented with respect to a different basis, and thus the effective weights can be expressed as the sum of the products of the retained EOFs with the statistical weights for the retained PCs that are obtained when the PCs are used as predictors for a CCA-based or, equivalently, MLR-based AOI estimate.

As the grid cells do not represent equal areas, the data were area weighted. For the SVD-based estimate this was done by weighting the temperature field with the cosine of the grid cell latitudes before projecting it onto the regression map derived from the unweighted data [which is equivalent to applying Eq. (21) with **z**(*t*) as the temperature field weighted with the square root of the cosine of the latitude]. For the CCA-based estimate the area weighting was included in the PCs by weighting the temperature field with the square root of the cosine of the latitude prior to performing the principal component analysis. The weight patterns in Fig. 1 do not include this area weighting. They are correct for an equal-area grid (up to an overall scaling factor that depends on the number of grid cells). Temperature data on a non-equal-area grid have to be weighted proportional to the area size and then multiplied by the displayed weights in order to obtain the correct AOI estimate (again, up to a scaling factor that depends on the number of grid cells). Regression maps and weights were calculated from detrended data and then applied to the undetrended data to obtain the AOI estimates.

The regression maps or SVD weights are similar to the CCA weights with two PCs retained, while the CCA weights with 12 retained PCs are in some areas substantially different from the two PC version. Similar differences between the two fitting periods occur in all three weight patterns. The NCEP–NCAR AOI and the various estimates are shown in the lower panel of Fig. 1, with the estimates for 1948–75 being based on the weights derived from the period 1976–2002 and vice versa. All estimates are in reasonable agreement with the true AOI. A detailed assessment of the skill in terms of correlation, rmse, and bias is given in Fig. 2. Solid lines refer to the reconstructions for independent data as presented in Fig. 1; dashed lines refer to the skill within the fitting periods. The figure shows the skill of the CCA estimates for 2 to 22 retained PCs and, as a horizontal line, the skill of the SVD estimates. The correlations for the cross-validated CCA estimate for less than seven retained PCs are slightly lower and the rmse slightly higher than for the SVD estimate, and practically identical when more PCs are retained. The number of effective degrees of freedom derived from the eigenvalue spectrum of the temperature covariance matrix (Bretherton et al. 1999) is about nine, and thus close to the number of retained PCs after which the cross-validation skill levels off. For all numbers of retained PCs the cross-validated CCA estimate has a higher bias than the SVD estimate. The overestimation of the true skill on independent data by the correlations and rmse calculated from the fitting data is for most numbers of retained PCs higher for CCA than for SVD (the bias during the fitting period is zero by definition). 
Note that the small differences between the SVD weights and the CCA weights with two retained PCs do noticeably affect the skill and that the substantial differences between the CCA weights for different numbers of retained PCs affect the skill during the fitting period more than the skill on independent data.
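The split-sample comparison described in this section can be outlined as follows. This is a simplified sketch on synthetic data, not the reanalysis calculation: the one-dimensional latitude grid, the signal pattern, the choice of five retained PCs, and all function names are assumptions, the same square-root-of-cosine weighting is applied to both methods, and only one direction of the cross validation is shown.

```python
import numpy as np

def fit_svd_weights(z, s):
    """Weights proportional to the regression map (SVD approach)."""
    za, sa = z - z.mean(axis=0), s - s.mean()
    c_sz = sa @ za / (len(s) - 1)
    b = za @ c_sz                          # projection onto the covariance map
    return c_sz * (c_sz @ c_sz) / b.var(ddof=1)

def fit_pc_mlr_weights(z, s, n_pcs):
    """Effective grid-cell weights of an MLR on the leading n_pcs PCs."""
    za, sa = z - z.mean(axis=0), s - s.mean()
    _, _, v_t = np.linalg.svd(za, full_matrices=False)
    eofs = v_t[:n_pcs].T                   # columns are the retained EOFs
    beta, *_ = np.linalg.lstsq(za @ eofs, sa, rcond=None)
    return eofs @ beta                     # sum of EOFs times PC weights

def predict(weights, z_fit, s_fit, z_new):
    """Apply weights to anomalies taken relative to the fitting-period means."""
    return s_fit.mean() + (z_new - z_fit.mean(axis=0)) @ weights

def skill(s_true, s_hat):
    err = s_hat - s_true
    return {"corr": np.corrcoef(s_true, s_hat)[0, 1],
            "rmse": np.sqrt(np.mean(err ** 2)),
            "bias": err.mean()}

# Synthetic stand-in for 55 winters on a hypothetical latitude grid.
rng = np.random.default_rng(4)
n_t, n_z = 55, 60
lat = np.linspace(20.0, 85.0, n_z)
s = rng.standard_normal(n_t)
pattern = np.sin(np.deg2rad(3.0 * lat))
z = np.outer(s, pattern) + rng.standard_normal((n_t, n_z))
z_w = z * np.sqrt(np.cos(np.deg2rad(lat)))   # area weighting

fit, val = slice(0, 28), slice(28, None)      # fitting and validation periods
w_svd = fit_svd_weights(z_w[fit], s[fit])
w_mlr = fit_pc_mlr_weights(z_w[fit], s[fit], n_pcs=5)

results = {}
for name, w in (("svd", w_svd), ("pc_mlr", w_mlr)):
    results[name] = {
        "fit": skill(s[fit], predict(w, z_w[fit], s[fit], z_w[fit])),
        "val": skill(s[val], predict(w, z_w[fit], s[fit], z_w[val])),
    }
```

Swapping the two slices gives the second half of the cross validation, and looping `n_pcs` over a range of values yields the dependence of the skill on the number of retained PCs. The bias in the fitting period is zero by construction for both methods.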

## 5. Summary and discussion

It was shown that regression maps calculated by regressing the components of a time-dependent vector **z**(*t*) on a time series *s*(*t*) are proportional to canonical patterns and singular vectors, and that CCA, SVD, and component-wise regressions lead to identical estimates for **z**(*t*) from *s*(*t*), whereas the estimate of *s*(*t*) from **z**(*t*) depends on whether CCA (or equivalently MLR) or SVD is used. The definition of the TEC of a regression map, and as a consequence the correlation between the TEC and *s*(*t*), depends on whether the CCA or the SVD perspective is adopted. Other authors have calculated the TEC of the regression map by orthogonal projection of **z**(*t*) onto the regression map, which in this paper was shown to be equivalent to performing SVD, while the calculation of the CCA TEC involves the adjoint pattern.

Although CCA minimizes by definition the mean square difference between *s*(*t*) and its estimate from **z**(*t*), it appears difficult to decide from a theoretical standpoint whether the CCA or the SVD approach yields better estimates for *s*(*t*) when applied to independent data. In the practical example considered in this paper a very similar skill on independent data was found for CCA and SVD when a sufficient number of PCs was retained in the prefiltering for CCA. Skills calculated from the fitting data overestimated the skill on independent data more strongly in the CCA (or MLR) than in the SVD model.

Calculating the TEC of the regression map by orthogonal projection and then using it as the predictor in a linear regression for the time series *s*(*t*) is thus conceptually well defined because it is equivalent to performing SVD, and may in climatological applications yield estimates for *s*(*t*) that have similar skill to those obtained from CCA or MLR. SVD has the advantage that the signal of *s*(*t*) in **z**(*t*), which is given by the regression map, and the statistical weights for **z**(*t*) used to estimate *s*(*t*) are proportional, whereas in the CCA approach the signal is given by the regression map but the statistical weights are proportional to the adjoint pattern. Therefore the SVD perspective may be particularly useful when both directions, estimating **z**(*t*) from *s*(*t*) and vice versa, are of interest. When in this case CCA is used, the signal pattern and its adjoint are relevant, which may complicate the discussion. Moreover, one would have to address the issue that it is often natural to define signals based on unfiltered data, but CCA often requires PC prefiltering.

## Acknowledgments

This research was supported by the Helmholtz Society under the KIHZ project (Klima in Historischen Zeiten, climate in historical times) and by the Federal Ministry of Education and Research under the DEKLIM program (Deutsches Klimaforschungsprogramm). The author thanks C. B. Bretherton, U. Callies, Y. Dmitriev, J. M. Jones, C. Matulla, H. von Storch, E. Zorita, and two anonymous reviewers for valuable comments.

## Footnotes

*Corresponding author address:* Dr. Martin Widmann, Institute for Coastal Research, GKSS Research Centre, D 21502 Geesthacht, Germany. Email: widmann@gkss.de