## Abstract

This paper shows that if a measure of predictability is invariant to affine transformations and monotonically related to forecast uncertainty, then the component that maximizes this measure for normally distributed variables is independent of the detailed form of the measure. This result explains why different measures of predictability such as anomaly correlation, signal-to-noise ratio, predictive information, and the Mahalanobis error are each maximized by the same components. These components can be determined by applying principal component analysis to a transformed forecast ensemble, a procedure called *predictable component analysis* (PrCA). The resulting vectors define a complete set of components that can be ordered such that the first maximizes predictability, the second maximizes predictability subject to being uncorrelated with the first, and so on. The transformation in question, called the whitening transformation, can be interpreted as changing the norm in principal component analysis. The resulting norm renders noise variance analysis equivalent to signal variance analysis, whereas these two analyses lead to inconsistent results if other norms are chosen to define variance. Predictable components also can be determined by applying singular value decomposition to a whitened propagator in linear models. The whitening transformation is tantamount to changing the initial and final norms in the singular vector calculation. The norm for measuring forecast uncertainty has not appeared in prior predictability studies. Nevertheless, the norms that emerge from this framework have several attractive properties that make their use compelling. This framework generalizes singular vector methods to models with both stochastic forcing and initial condition error. These and other components of interest to predictability are illustrated with an empirical model for sea surface temperature.

## 1. Introduction

It is well established that some components in the climate system are more predictable than others. For instance, large-scale structures tend to be more predictable than small-scale structures (Shukla 1981); sea surface temperatures in the tropical Pacific tend to be more predictable than those in the Atlantic (Schneider et al. 2003); rainfall in the tropical Pacific tends to be more predictable than rainfall in Europe (Palmer et al. 2004). It is natural, then, to seek the *most* predictable components. However, attempts to identify maximally predictable components based on such methods as principal component analysis and singular value decomposition generally lead to different results. The purpose of this paper is to suggest a framework in which familiar methods of predictability analysis lead to *consistent* results.

A clue as to how diverse statistical methods can produce consistent results lies in the fact that many methods depend implicitly or explicitly on a norm. For instance, principal component analysis determines components that maximize variance, as defined by some norm. Singular value decomposition depends on *two* norms: one for measuring “response” and another for constraining “initial condition.” Without a firm basis for choosing these norms, variance analysis could generate virtually any set of vectors by a suitable choice of norm.

One approach to these problems is to define predictability precisely and then to choose norms to ensure consistency with the associated measure of predictability. Unfortunately, no universally accepted measure of predictability exists. Therefore, this approach merely replaces the problem of the choice of norm with the problem of the choice of predictability measure.

Surprisingly, there is a path that gives a reasonable way out. The first step is to restrict attention to predictability measures that are invariant to affine transformations and monotonically related to forecast uncertainty. These measures are consistent in the following sense: measuring the predictability of the same system in two different coordinate systems gives the same result. We then show that components that maximize these measures are *independent of the details of the measure*. This result, proven in section 2, explains why different measures of predictability, such as signal-to-noise ratio, anomaly correlation, predictive information, and the Mahalanobis error all have the same maximally predictable component (DelSole and Tippett 2007). We then show that this component can be obtained by applying principal component analysis to transformed forecast variables. The transformation, called the whitening transformation, can be interpreted as specifying the norm in empirical orthogonal function (EOF) analysis. This result is generalized in section 3 to specify the norms in singular vector analysis to obtain the same predictable components. The norm for measuring forecast uncertainty has not appeared in previous predictability studies, but nonetheless these norms have several attractive properties that make their use compelling. Several components of interest to predictability are illustrated with an empirical stochastic model for sea surface temperatures in section 4. This paper concludes with a summary and discussion of results.

## 2. Predictable components

In this section we define a class of predictability measures and then show that components that maximize these measures are independent of the detailed form of the measure. The first step to defining predictability is to recognize that no forecast is complete without a description of its uncertainty in the form of a probability distribution. Following DelSole and Tippett (2007), the *forecast distribution* is defined as the conditional distribution of the state given antecedent observations of the system, and the *climatological distribution* is the unconditional distribution of the state. A system is deemed unpredictable if the forecast and climatological distributions are identical. Thus, a measure of predictability should indicate predictability only if the forecast and climatological distributions differ.

To ensure consistency, we propose that if the predictability of a system is measured in two different coordinate systems, then the measure should be the same. At the very least, then, the measure should be invariant to affine transformations, that is, to translations and to nonsingular, linear transformations. This property will be called the *invariance* property.

Finally, we consider measures of predictability that increase if and only if the uncertainty in the forecast decreases. This principle requires defining uncertainty. However, for normal distributions, any reasonable measure of uncertainty is an increasing function of variance. Thus, we assume that the predictability of normal distributions increases if and only if the forecast variance decreases, holding all other parameters constant.

Many measures of predictability satisfy the above properties, including signal-to-noise ratio, anomaly correlation, predictive information, and Mahalanobis error. The Ω index of Koster et al. (2000) also satisfies the above properties. However, mean square error does not satisfy the above properties because it is not invariant to linear transformation. Nevertheless, we retain linear invariance to ensure *consistency*. Similarly, not all measures increase if and only if forecast uncertainty decreases. For instance, measures of the “distance” between distributions, such as relative entropy (Kleeman 2002) or Bhattacharyya distance (Mardia et al. 1979), will change if the mean changes, even for constant forecast uncertainty. However, components that optimize the restricted class of measures can provide useful lower bounds on other measures.

We now show that if a measure satisfies the above properties and the distributions are Gaussian, then the component that maximizes the measure is independent of the details of the measure. Let **ν** be the state of the system and **q** be a projection vector. We seek the projection vector **q** such that the inner product **q**^{T}**ν** optimizes predictability, where the superscript T denotes the transpose. Let the forecast distribution have mean **μ**_{f} and covariance matrix **Σ**_{f}, and the climatological distribution have mean **μ**_{c} and covariance matrix **Σ**_{c}. Since the variables are normally distributed, any linear combination of them also is normal. Thus, the climatological and forecast distributions of the projected variable **q**^{T}**ν** have the following scalar means and variances:

$$\mu_f = \mathbf{q}^T \boldsymbol{\mu}_f, \qquad \sigma_f^2 = \mathbf{q}^T \boldsymbol{\Sigma}_f \mathbf{q}, \qquad \mu_c = \mathbf{q}^T \boldsymbol{\mu}_c, \qquad \sigma_c^2 = \mathbf{q}^T \boldsymbol{\Sigma}_c \mathbf{q}. \tag{1}$$

In predictability studies, *μ*_{f} and *σ*^{2}_{f} measure the signal and noise, respectively. The variable’s predictability depends only on its forecast and climatological distributions, which are described completely by the parameters *μ*_{f}, *μ*_{c}, *σ*^{2}_{f}, and *σ*^{2}_{c}. By the invariance property, the projected variable can be translated and rescaled without altering the value of predictability. Accordingly, we standardize the variable to have zero mean and unit variance under the climatological distribution. The mean and variance of the standardized forecast become (*μ*_{f} − *μ*_{c})/*σ*_{c} and *σ*^{2}_{f}/*σ*^{2}_{c}, respectively. By the property that predictability increases if and only if forecast uncertainty decreases, the parameter (*μ*_{f} − *μ*_{c})/*σ*_{c} can be dropped, because it does not affect uncertainty. Thus, maximum predictability can be found by minimizing *σ*^{2}_{f}/*σ*^{2}_{c}. The projection vector that minimizes *σ*^{2}_{f}/*σ*^{2}_{c} will be called a *predictable component*, following the equivalent usage of Déqué (1988), Renwick and Wallace (1995), and Schneider and Griffies (1999). Without loss of generality, we hereafter assume **μ**_{c} = **0**.

Predictable components minimize the parameter *σ*^{2}_{f}/*σ*^{2}_{c} and hence maximize the discrepancy between forecast and climatological spreads. The ratio of spreads can be written as

$$\frac{\sigma_f^2}{\sigma_c^2} = \frac{\mathbf{q}^T \boldsymbol{\Sigma}_f \mathbf{q}}{\mathbf{q}^T \boldsymbol{\Sigma}_c \mathbf{q}} = \frac{\mathbf{u}^T \tilde{\boldsymbol{\Sigma}}_f \mathbf{u}}{\mathbf{u}^T \mathbf{u}}, \tag{2}$$

where we have introduced the new variables

$$\mathbf{u} = \boldsymbol{\Sigma}_c^{1/2\,T} \mathbf{q} \qquad \text{and} \qquad \tilde{\boldsymbol{\Sigma}}_f = \boldsymbol{\Sigma}_c^{-1/2}\, \boldsymbol{\Sigma}_f\, \boldsymbol{\Sigma}_c^{-1/2\,T}. \tag{3}$$

The matrix **Σ**^{1/2}_{c} denotes the matrix square root of **Σ**_{c}, which satisfies **Σ**_{c} = **Σ**^{1/2}_{c}**Σ**^{1/2T}_{c}. It is well known that if the eigenvectors of **Σ̃**_{f} are ordered in ascending order of eigenvalue, then the first eigenvector minimizes the right-hand side of (2), the second minimizes the right-hand side of (2) subject to being orthogonal to the first, and so on (Noble and Daniel 1988, theorem 10.28). We call the above procedure *predictable component analysis* (PrCA). This procedure is equivalent to the procedures proposed independently by Déqué (1988) and Schneider and Griffies (1999). Following Renwick and Wallace (1995), this technique will be denoted PrCA, not to be confused with principal component analysis (PCA).
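To make the procedure concrete, the steps above can be sketched in a few lines of linear algebra. This is a minimal illustration with synthetic covariance matrices, not code from the paper; the variable names and the use of a Cholesky factor as the matrix square root **Σ**^{1/2}_{c} are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

def random_spd(n, rng):
    """Return a random symmetric positive-definite matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

Sig_c = random_spd(n, rng)        # climatological covariance
Sig_f = 0.5 * random_spd(n, rng)  # forecast covariance

# Matrix square root via Cholesky: Sig_c = L L^T, so L plays the role of Sig_c^{1/2}
L = np.linalg.cholesky(Sig_c)
Linv = np.linalg.inv(L)

# Whitened forecast covariance: Sig_c^{-1/2} Sig_f Sig_c^{-1/2 T}
Sig_f_tilde = Linv @ Sig_f @ Linv.T

# Eigenvectors in ascending order of eigenvalue (np.linalg.eigh's convention),
# so the first column is the most predictable component
evals, U = np.linalg.eigh(Sig_f_tilde)

# Projection vectors q_j and spatial patterns p_j
Q = Linv.T @ U   # q_j = Sig_c^{-1/2 T} u_j, yields the time series
P = L @ U        # p_j = Sig_c^{1/2} u_j, yields the spatial structure

# The variance ratio of the leading predictable component equals the smallest eigenvalue
q1 = Q[:, 0]
ratio = (q1 @ Sig_f @ q1) / (q1 @ Sig_c @ q1)
print(np.isclose(ratio, evals[0]))  # True
```

The components produced this way are uncorrelated with respect to both distributions, which can be confirmed by checking that `Q.T @ Sig_c @ Q` is the identity.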

Since **Σ̃**_{f} is a positive-definite, symmetric matrix, its eigenvectors form a complete set that satisfies the relations **u**^{T}_{j}**u**_{k} = 0 and **u**^{T}_{j}**Σ̃**_{f}**u**_{k} = 0 for all *j* ≠ *k*. These relations imply that the covariance between any two projected variables vanishes; that is, if *j* ≠ *k*, then

$$\left[ \left( \mathbf{q}_j^T \boldsymbol{\nu}' \right) \left( \mathbf{q}_k^T \boldsymbol{\nu}' \right) \right] = \mathbf{u}_j^T \tilde{\boldsymbol{\Sigma}}_f \mathbf{u}_k = 0 \qquad \text{and} \qquad \left\langle \left( \mathbf{q}_j^T \boldsymbol{\nu} \right) \left( \mathbf{q}_k^T \boldsymbol{\nu} \right) \right\rangle = \mathbf{u}_j^T \mathbf{u}_k = 0, \tag{4}$$

where relation (3) has been used, **q**_{j} = **Σ**^{−1/2T}_{c}**u**_{j}, **ν**′ = **ν** − **μ**_{f}, and the square brackets [ ] and angle brackets 〈 〉 denote an expectation with respect to the forecast and climatological distributions, respectively. The above orthogonality properties imply that the variables are uncorrelated with respect to both the forecast and climatological distributions. Accordingly, the predictable components define an uncorrelated set of components such that the first maximizes predictability, the second maximizes predictability subject to being uncorrelated with the first, and so on.

### a. Predictable components as EOFs

The predictable components defined above can be interpreted as the EOFs of the *whitened* variable

$$\tilde{\boldsymbol{\nu}} = \boldsymbol{\Sigma}_c^{-1/2} \boldsymbol{\nu}. \tag{5}$$

This equivalence follows from the fact that the EOFs of the random variable **ν̃** are the eigenvectors of the covariance matrix **Σ̃**_{f}, which in turn are the predictable components. That the predictable components are the EOFs of whitened variables also was noted by Schneider and Griffies (1999). The variable **ν̃** is said to be whitened because, following Fukunaga (1990, p. 28), its climatological covariance matrix equals the identity matrix:

$$\left\langle \tilde{\boldsymbol{\nu}} \tilde{\boldsymbol{\nu}}^T \right\rangle = \boldsymbol{\Sigma}_c^{-1/2}\, \boldsymbol{\Sigma}_c\, \boldsymbol{\Sigma}_c^{-1/2\,T} = \mathbf{I}. \tag{6}$$

We denote whitened variables by tildes. The transformation operator in (5) could be composed of the eigenvectors of **Σ**_{c}, arranged columnwise, each multiplied by the inverse square root of the corresponding eigenvalue; the resulting whitened variable specifies the state in a normalized EOF space.
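The eigenvector route to the whitening operator described above can be sketched as follows. This is an illustration with a synthetic climatological covariance; the names `W` and `nu` are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Sig_c = A @ A.T + 4 * np.eye(4)   # synthetic climatological covariance

# Transformation operator built from the EOFs of Sig_c: each eigenvector
# column is scaled by the inverse square root of its eigenvalue.
lam, E = np.linalg.eigh(Sig_c)
W = E / np.sqrt(lam)              # whitening operator (applied as W^T to states)

# Draw climatological samples and whiten them
nu = rng.multivariate_normal(np.zeros(4), Sig_c, size=20000)
nu_tilde = nu @ W                 # row-wise application of W^T

# The climatological covariance of the whitened variable is the identity
cov = np.cov(nu_tilde, rowvar=False)
print(np.allclose(W.T @ Sig_c @ W, np.eye(4)))  # True (exact relation)
```

The sample covariance `cov` approaches the identity as the number of samples grows, which is the numerical counterpart of (6).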

It is critical to understand why the predictable components are the EOFs of the whitened forecast, and not the EOFs of the forecast itself. The leading eigenvector of **Σ*** _{f}* explains the most spread and hence the most uncertainty. However, uncertainty of a component does not measure predictability. Rather, predictability of a component depends on the amount of spread relative to its climatological spread. For instance, the component with the most uncertainty could be the most predictable, if its climatological spread were sufficiently large. The virtue of whitening is that any projection of whitened variables has unit climatological variance, and thus the forecast variance of such a projection immediately measures relative variance, or equivalently relative uncertainty.
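A two-component toy example (ours, not from the paper) makes this point concrete: the component with the largest forecast spread can still be the most predictable when its climatological spread is large enough.

```python
import numpy as np

# Two uncorrelated components: the first has the larger forecast spread (4 vs 0.5)
# but a far larger climatological spread (100 vs 1)
Sig_f = np.diag([4.0, 0.5])
Sig_c = np.diag([100.0, 1.0])

most_uncertain = int(np.argmax(np.diag(Sig_f)))   # leading noise EOF direction
ratio = np.diag(Sig_f) / np.diag(Sig_c)           # relative spread sigma_f^2 / sigma_c^2
most_predictable = int(np.argmin(ratio))          # leading predictable component

print(most_uncertain, most_predictable)  # 0 0: the most uncertain component is also the most predictable
```

Here the first component has eight times the forecast uncertainty of the second, yet its forecast variance is only 4% of its climatological variance, so it is by far the more predictable of the two.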

Since predictable components form a complete and orthogonal set, any vector may be decomposed in terms of them. Specifically, if the eigenvectors **u**_{1}, **u**_{2}, . . . , **u**_{K} are normalized such that **u**^{T}_{j}**u**_{j} = 1 for all *j*, then **ν̃** may be written as a linear combination of these eigenvectors as

$$\tilde{\boldsymbol{\nu}} = \sum_{j=1}^{K} \mathbf{u}_j \left( \mathbf{u}_j^T \tilde{\boldsymbol{\nu}} \right). \tag{7}$$

Each term in the above sum is a projection **u**^{T}_{j}**ν̃** times a component **u**_{j}. In addition, each term is independent of the others, and each successive term explains decreasing predictability. Note that the vectors **u**_{j} in (7) play the dual role of defining the projection vector and defining the component, because they are orthogonal. When the whitening transformation is inverted, the two roles are played by two distinct vectors. In particular, the above decomposition takes the form

$$\boldsymbol{\nu} = \sum_{j=1}^{K} \mathbf{p}_j \left( \mathbf{q}_j^T \boldsymbol{\nu} \right), \tag{8}$$

where we have used (5) and the definitions

$$\mathbf{p}_j = \boldsymbol{\Sigma}_c^{1/2} \mathbf{u}_j \qquad \text{and} \qquad \mathbf{q}_j = \boldsymbol{\Sigma}_c^{-1/2\,T} \mathbf{u}_j. \tag{9}$$

In the context of predictability analysis, the vector **p**_{j} is associated with a spatial structure, and the inner product **q**^{T}_{j}**ν** gives the corresponding time series.

We now show that the whitening transformation can be interpreted as changing the norm in EOF analysis. The leading eigenvector of **Σ̃**_{f} is the vector **u** that minimizes the mean square difference between the whitened forecast anomaly and its projection:

$$\left[ \left\| \tilde{\boldsymbol{\nu}}' - \mathbf{u} \left( \mathbf{u}^T \tilde{\boldsymbol{\nu}}' \right) \right\|^2 \right]. \tag{10}$$

By substituting (5), this expression can be written equivalently as

$$\left[ \left( \boldsymbol{\nu}' - \mathbf{p}\,\mathbf{q}^T \boldsymbol{\nu}' \right)^T \boldsymbol{\Sigma}_c^{-1} \left( \boldsymbol{\nu}' - \mathbf{p}\,\mathbf{q}^T \boldsymbol{\nu}' \right) \right], \tag{11}$$

where **p** and **q** are defined in (9). The latter expression implies that the predictable components minimize the difference between the state and its projection onto the predictable components, but with the norm of the difference measured with respect to the metric **Σ**^{−1}_{c}. This norm is called the *Mahalanobis norm* in data assimilation (Swinbank and Lahoz 2003). In general, transformation of a variable prior to variance-based analysis is equivalent to changing the norm in the analysis.

### b. Average predictability

Predictable components characterize a forecast ensemble at an instant in time. We may also be interested in components that maximize the average predictability. We call the former *instantaneous* predictable components and the latter *average* predictable components.

In general, the average predictable components depend on the details of a measure and hence are not universal. However, many familiar measures of predictability that satisfy our set of properties are convex functions of the variance ratio *σ*^{2}_{f}/*σ*^{2}_{c}, including signal-to-noise ratio, anomaly correlation, predictive information, and the Mahalanobis error. By Jensen’s inequality (Cover and Thomas 1991), the average of these measures is bounded below by the measure evaluated at 〈*σ*^{2}_{f}〉/*σ*^{2}_{c}. By the property that predictability is a decreasing function of forecast uncertainty, minimizing 〈*σ*^{2}_{f}〉/*σ*^{2}_{c} is equivalent to maximizing the lower bound. Since the component that minimizes *σ*^{2}_{f}/*σ*^{2}_{c} is the trailing eigenvector of the whitened forecast covariance **Σ̃**_{f}, we can surmise that the component that minimizes 〈*σ*^{2}_{f}〉/*σ*^{2}_{c} is the trailing eigenvector of

$$\left\langle \tilde{\boldsymbol{\Sigma}}_f \right\rangle = \boldsymbol{\Sigma}_c^{-1/2} \left\langle \boldsymbol{\Sigma}_f \right\rangle \boldsymbol{\Sigma}_c^{-1/2\,T}. \tag{12}$$

Thus, the trailing eigenvector of the average whitened forecast covariance 〈**Σ̃**_{f}〉 can be used to construct a lower bound on the maximum of the average predictability. If the measure is a linear function of *σ*^{2}_{f}/*σ*^{2}_{c}, as in the case of signal-to-noise ratio and anomaly correlation, then the lower bound is an exact equality. Orthogonality of the predictable components implies that the second predictable component maximizes the lower bound of the maximum average predictability, out of all components that are uncorrelated with the first, and so on.

It is perhaps worth noting that if the forecast covariance **Σ**_{f} is constant, that is, independent of observation or initial condition, as occurs in the case of a linear, autonomous, stochastic model with stationary noise, then the instantaneous and average predictable components coincide.

### c. Signal EOFs, noise EOFs, and signal-to-noise EOFs

We now discuss the relation between predictable components and other components that can be derived from EOF methods. Let us call the EOFs of the forecast ensemble the *noise EOFs*, since they describe the variability of the forecast ensemble about the ensemble mean (recent examples include Yang et al. 1998 and Straus and Shukla 2002). An alternative approach is to compute the EOFs of the *ensemble mean* over time, which we call the *signal EOFs*; these can be interpreted as describing the predictable part of a forecast (recent examples include Sutton et al. 2000; Straus and Shukla 2002; Peng and Kumar 2005). The relevance of these components to predictability as defined here is not immediately evident, because these components optimize absolute measures of spread or signal, whereas predictability depends on relative measures of spread or signal.

The key to linking the signal EOFs and noise EOFs is the following: an unconditional covariance can be written as the sum of an average conditional covariance plus the covariance of conditional means (DelSole and Tippett 2007). Since the forecast and climatological distributions can be interpreted as a conditional and unconditional distribution, respectively, we have the identity

$$\boldsymbol{\Sigma}_c = \left\langle \boldsymbol{\Sigma}_f \right\rangle + \boldsymbol{\Sigma}_s, \tag{13}$$

where

$$\boldsymbol{\Sigma}_s = \left\langle \boldsymbol{\mu}_f \boldsymbol{\mu}_f^T \right\rangle. \tag{14}$$

We call the above identity the *signal*–*noise decomposition*, after DelSole and Tippett (2007). This equation implies that the average whitened forecast and signal covariances are related by

$$\mathbf{I} = \left\langle \tilde{\boldsymbol{\Sigma}}_f \right\rangle + \tilde{\boldsymbol{\Sigma}}_s, \qquad \tilde{\boldsymbol{\Sigma}}_s = \boldsymbol{\Sigma}_c^{-1/2}\, \boldsymbol{\Sigma}_s\, \boldsymbol{\Sigma}_c^{-1/2\,T}. \tag{15}$$

The above relation immediately implies that the signal EOFs of the whitened forecast (i.e., the eigenvectors of **Σ̃**_{s}) are identical to the average noise EOFs, *but with reversed ordering*. Thus, the average predictable components may be obtained either as the signal EOFs of whitened variables or as the average noise EOFs of whitened variables.
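This equivalence is easy to check numerically. The sketch below (illustrative, with synthetic covariances; not from the paper) builds a climatological covariance from the signal–noise decomposition, whitens both terms, and confirms that the whitened signal and average noise covariances share eigenvectors with reversed eigenvalue ordering.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
Sig_f_avg = B @ B.T + n * np.eye(n)   # average forecast (noise) covariance
C = rng.standard_normal((n, n))
Sig_s = C @ C.T                       # signal covariance
Sig_c = Sig_f_avg + Sig_s             # signal-noise decomposition

L = np.linalg.cholesky(Sig_c)
Linv = np.linalg.inv(L)
Sig_f_t = Linv @ Sig_f_avg @ Linv.T   # whitened average noise covariance
Sig_s_t = Linv @ Sig_s @ Linv.T       # whitened signal covariance

# Whitening turns the decomposition into: identity = noise + signal ...
assert np.allclose(Sig_f_t + Sig_s_t, np.eye(n))

# ... so the two matrices share eigenvectors, with eigenvalues summing
# pairwise to one: ordering by noise is the reverse of ordering by signal.
lam_f, U_f = np.linalg.eigh(Sig_f_t)  # ascending
lam_s, U_s = np.linalg.eigh(Sig_s_t)  # ascending
assert np.allclose(lam_f + lam_s[::-1], 1.0)
assert np.allclose(np.abs(np.sum(U_f * U_s[:, ::-1], axis=0)), 1.0, atol=1e-6)
```

The final assertion compares each noise eigenvector with the signal eigenvector at the opposite end of the spectrum, up to sign.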

The equivalence between the average noise EOFs and signal EOFs demonstrated above holds only for whitened variables. In general, the noise and signal EOFs of the original forecast variables differ; that is, the component that explains the most signal differs from the component that explains the least noise. This discrepancy is problematic when one attempts to identify one of these components as “most predictable,” since there is no compelling reason for choosing one over the other. The whitening transformation removes this discrepancy. Furthermore, as shown in section 2a, the whitening transformation can be interpreted as changing the norm used to measure “variance” in the EOF calculation. These considerations imply that consistency of signal analysis and error analysis constrains the norm to be the Mahalanobis norm.

Recently, the concept of a “signal-to-noise EOF” has emerged as a component of interest to predictability (Venzke et al. 1999; Sutton et al. 2000; Tippett and Giannini 2006; Hu and Huang 2007). For a forecast ensemble at an instant in time, the signal-to-noise ratio of a component is *μ*_{f}/*σ*_{c}, which can be optimized by fingerprint methods (DelSole and Tippett 2007). For an average over all forecast ensembles, the signal variance of the projection vector **q** is **q**^{T}**Σ**_{s}**q**, and the average noise variance is **q**^{T}〈**Σ**_{f}〉**q**, in which case the signal-to-noise ratio is

$$s = \frac{\mathbf{q}^T \boldsymbol{\Sigma}_s \mathbf{q}}{\mathbf{q}^T \left\langle \boldsymbol{\Sigma}_f \right\rangle \mathbf{q}}. \tag{16}$$

Using the signal–noise decomposition (13) to eliminate the signal term, we obtain

$$s = \frac{\mathbf{q}^T \boldsymbol{\Sigma}_c \mathbf{q}}{\mathbf{q}^T \left\langle \boldsymbol{\Sigma}_f \right\rangle \mathbf{q}} - 1 = \frac{\sigma_c^2}{\left\langle \sigma_f^2 \right\rangle} - 1. \tag{17}$$

Since the signal-to-noise ratio *s* is a monotonic function of 〈*σ*^{2}_{f}〉/*σ*^{2}_{c}, optimizing signal-to-noise ratio is equivalent to optimizing the ratio of variances, hence signal-to-noise EOFs are identical to the predictable components that optimize (or bound) the average predictability. Thus, the Mahalanobis norm renders the signal EOFs and the average noise EOFs identical to each other, identical to the signal-to-noise EOFs, and identical to the average predictable components.
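The algebra behind this equivalence can be verified in a couple of lines (synthetic matrices and an arbitrary projection vector; not from the paper): for any **q**, the signal-to-noise ratio equals the reciprocal of the variance ratio minus one.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
Sig_f_avg = A @ A.T + np.eye(n)   # average noise covariance
B = rng.standard_normal((n, n))
Sig_s = B @ B.T                   # signal covariance
Sig_c = Sig_f_avg + Sig_s         # signal-noise decomposition

q = rng.standard_normal(n)        # arbitrary projection vector
s = (q @ Sig_s @ q) / (q @ Sig_f_avg @ q)       # signal-to-noise ratio
ratio = (q @ Sig_f_avg @ q) / (q @ Sig_c @ q)   # <sigma_f^2> / sigma_c^2

# s is a monotonically decreasing function of the variance ratio
print(np.isclose(s, 1.0 / ratio - 1.0))  # True
```

Because the relation holds for every projection vector, optimizing one quantity necessarily optimizes the other.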

## 3. Singular vectors

We now show how predictable components can be obtained from a singular value decomposition (SVD) in a variety of problems. The SVD of a matrix 𝗚 will be denoted by

$$\mathsf{G} = \mathsf{U}\,\mathsf{S}\,\mathsf{W}^T, \tag{18}$$

where 𝗨 and 𝗪 are unitary matrices, and 𝗦 is a diagonal matrix with nonnegative diagonal elements. The columns of 𝗨 and 𝗪 are called the *left and right singular vectors*, respectively, and the diagonal elements of 𝗦 are the *singular values*. It is conventional to order the singular values in descending order. A standard fact is that for mappings of the form **u** = 𝗚**w**, where **w** is the “initial condition” and **u** is the “response,” the leading right singular vector gives the initial condition that maximizes **u**^{T}**u** out of all vectors that satisfy **w**^{T}**w** = 1, the second right singular vector gives the initial condition that maximizes **u**^{T}**u** *subject to being orthogonal to the first vector*, and so on.

In the discussion below, the state of the system at the initial time will be denoted by **i**. The initial state is assumed to be estimated from observations by a data assimilation system. In practice, the assimilation assumes a normal distribution for the initial state **i**. For ease of interpretation, we write the initial condition as **i** = **a** + **e**, where **a** is the mean of the initial condition, often called the *analysis*, and **e** is the *analysis error* with zero mean and covariance matrix **Σ**_{e}. The covariance of **a** over all initial conditions will be denoted **Σ**_{a}.

### a. Deterministic systems

We first consider a deterministic model with imperfect initial conditions. Lorenz (1965) showed that the difference between solutions of a dynamical system with slightly different initial conditions is governed by a *tangent linear model* of the form **ν**′ = 𝗚**e**, where 𝗚 is a square matrix called the *propagator*, which depends on time, and **e** and **ν**′ denote initial and final “errors” (perturbations about a solution of the full dynamical system). Lorenz then effectively computed the noise EOFs for this system, which are the eigenvectors of the forecast covariance matrix

$$\boldsymbol{\Sigma}_f = \mathsf{G}\,\boldsymbol{\Sigma}_e\,\mathsf{G}^T. \tag{19}$$

As is well known, the left singular vectors of 𝗚 are the eigenvectors of 𝗚𝗚^{T}. By analogy, the left singular vectors of 𝗚**Σ**^{1/2}_{e} are the eigenvectors of the forecast covariance matrix 𝗚**Σ**_{e}𝗚^{T}, as previously noted by Ehrendorfer and Tribbia (1997). Therefore, the left singular vectors of 𝗚**Σ**^{1/2}_{e} are the noise EOFs, and the right singular vectors give the initial conditions that excite these noise EOFs. To the extent that the noise EOFs optimally describe forecast errors, the singular vector method determines the fewest number of ensemble members with which to approximate the forecast spread (Palmer 1995; Ehrendorfer and Tribbia 1997).

Importantly, the noise EOFs are the singular vectors of the transformed propagator 𝗚**Σ**^{1/2}_{e}, not the singular vectors of 𝗚 itself. To interpret this result, let us define the new variables

$$\hat{\mathbf{e}} = \boldsymbol{\Sigma}_e^{-1/2}\, \mathbf{e} \qquad \text{and} \qquad \mathsf{G}^{*} = \mathsf{G}\,\boldsymbol{\Sigma}_e^{1/2}. \tag{20}$$

Then the governing equation becomes **ν**′ = 𝗚*ê. By analogy with the usual interpretation, the singular vectors of 𝗚* maximize the forecast error variance **ν**′^{T}**ν**′ subject to the constraint **ê**^{T}**ê** = 1. Thus, the noise EOFs maximize the forecast error variance **ν**′^{T}**ν**′ subject to the constraint

$$\mathbf{e}^T \boldsymbol{\Sigma}_e^{-1}\, \mathbf{e} = 1. \tag{21}$$

*ν*′This result shows that the singular vectors of the transformed propagator 𝗚**Σ**^{1/2}_{e} are the singular vectors of 𝗚 but with the initial norm (21). This result can be generalized to show that any linear transformation of the propagator is equivalent to changing the norm for measuring the initial or final vectors. The fact that constraint (21) compels the singular vectors to be the EOFs of forecast error was noted previously by Palmer (1995) and Ehrendorfer and Tribbia (1997).

A standard result in probability states that if **e** is normally distributed with zero mean and covariance matrix **Σ**_{e}, then **ê** defined in (20) also is normal with zero mean and covariance matrix 𝗜. The distribution of **ê** thus depends only on the distance from the origin **ê**^{T}**ê** and hence is isotropic. Measuring initial error using a norm based on the initial error covariance **Σ**_{e} is thus consistent with Lorenz’s consideration of an ensemble of initial errors such that “no direction in . . . [state] space is preferred over any other direction.” Constraint (21) also is consistent with the constraint typically used to compute ensemble-based estimates of forecast error with singular vectors (Houtekamer 1995; Molteni et al. 1996). Finally, since the distribution of **ê** is isotropic, all states satisfying (21) have equal probability density. Accordingly, we call (21) the *equal likelihood constraint*. Thus, the leading EOF of forecast error can be interpreted as the forecast with maximum error out of all forecasts with equally likely initial errors. This constraint immediately solves the problem of ensuring that singular vectors are “realistic,” since any vector that satisfies (21) is just as likely as any other vector to be drawn from the initial analyses.

Noise EOFs generally do not optimize predictability: the component with maximum absolute error is not necessarily the least predictable, in the sense that the error is close to the “saturation” or climatological error. Instead, components that optimize the relative error are the eigenvectors of the whitened forecast covariance matrix, which in the present problem is

$$\tilde{\boldsymbol{\Sigma}}_f = \boldsymbol{\Sigma}_c^{-1/2}\, \mathsf{G}\,\boldsymbol{\Sigma}_e\,\mathsf{G}^T\, \boldsymbol{\Sigma}_c^{-1/2\,T} = \tilde{\mathsf{G}}\tilde{\mathsf{G}}^T, \tag{22}$$

where

$$\tilde{\mathsf{G}} = \boldsymbol{\Sigma}_c^{-1/2}\, \mathsf{G}\, \boldsymbol{\Sigma}_e^{1/2}. \tag{23}$$

It follows from (22) that the predictable components (i.e., the eigenvectors of **Σ̃**_{f}) are the left singular vectors of 𝗚̃. We call 𝗚̃ a *whitened propagator*.

We showed earlier that the noise EOFs were the singular vectors of 𝗚*, and that the transformation from 𝗚 to 𝗚* implied a change in initial norm. We now show that the transformation from 𝗚 to 𝗚̃, for computing predictable components, implies a change in both the initial and final norms. To see this, note that the governing equation **ν**′ = 𝗚**e** can be transformed into

$$\tilde{\boldsymbol{\nu}}' = \tilde{\mathsf{G}}\,\hat{\mathbf{e}}, \qquad \tilde{\boldsymbol{\nu}}' = \boldsymbol{\Sigma}_c^{-1/2}\, \boldsymbol{\nu}'. \tag{24}$$

Therefore, by analogy with the usual interpretation of singular vectors, the singular vectors of the whitened propagator maximize **ν̃**′^{T}**ν̃**′ subject to **ê**^{T}**ê** = 1; that is, they maximize

$$\boldsymbol{\nu}'^{\,T}\, \boldsymbol{\Sigma}_c^{-1}\, \boldsymbol{\nu}' \tag{25}$$

subject to the constraint

$$\mathbf{e}^T\, \boldsymbol{\Sigma}_e^{-1}\, \mathbf{e} = 1. \tag{26}$$

The above norms have been discussed previously: the final norm (25) is the Mahalanobis norm of the forecast errors, while the constraint (26) is the equal likelihood constraint (21). Thus, the predictable components can be obtained from an SVD calculation if the initial norm is based on the analysis error covariance and the final norm is based on the Mahalanobis norm.
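The equivalence between predictable components and singular vectors of the whitened propagator can likewise be checked numerically. The sketch below uses synthetic matrices; note that `np.linalg.eigh` orders eigenvalues ascending while `np.linalg.svd` orders singular values descending, hence the index reversal.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = 0.5 * rng.standard_normal((n, n))   # propagator (illustrative)
A = rng.standard_normal((n, n))
Sig_e = A @ A.T + np.eye(n)             # analysis-error covariance
B = rng.standard_normal((n, n))
Sig_c = B @ B.T + 5 * np.eye(n)         # climatological covariance

Lc = np.linalg.cholesky(Sig_c)
Le = np.linalg.cholesky(Sig_e)
G_whitened = np.linalg.inv(Lc) @ G @ Le  # whitened propagator Sig_c^{-1/2} G Sig_e^{1/2}

# Predictable components: eigenvectors of the whitened forecast covariance
Sig_f_t = G_whitened @ G_whitened.T
lam, V = np.linalg.eigh(Sig_f_t)         # ascending

# ... which are the left singular vectors of the whitened propagator
U, s, Wt = np.linalg.svd(G_whitened)     # descending
for k in range(n):
    assert np.isclose(abs(U[:, k] @ V[:, n - 1 - k]), 1.0)
assert np.allclose(s**2, lam[::-1])
```

The trailing left singular vector (last column of `U`) is the most predictable component, and the corresponding right singular vector is the equally likely initial error that excites it.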

It is perhaps worth emphasizing that a predictable component characterizes an ensemble of forecasts, whereas a singular vector pertains to a single forecast. These two components coincide only in linear models, and only for proper choice of norms.

The average predictability depends on the average whitened forecast covariance matrix

$$\left\langle \tilde{\boldsymbol{\Sigma}}_f \right\rangle = \boldsymbol{\Sigma}_c^{-1/2} \left\langle \mathsf{G}\,\boldsymbol{\Sigma}_e\,\mathsf{G}^T \right\rangle \boldsymbol{\Sigma}_c^{-1/2\,T}. \tag{27}$$

This expression is difficult to evaluate for time-dependent propagators and analysis errors. The case of constant propagator and error statistics will be considered next in a more general context.

### b. Linear stochastic models

Another widely used class of models in predictability studies is the linear stochastic model (Hasselmann 1976; Farrell and Ioannou 1993; Penland and Sardeshmukh 1995; DelSole and Farrell 1995; Kleeman and Moore 1997; Thompson and Battisti 2000). This model has the form

$$\boldsymbol{\nu} = \mathsf{G}\,\mathbf{i} + \boldsymbol{\xi}, \tag{28}$$

where 𝗚 is a constant propagator, **ξ** is a Gaussian white noise variable with zero mean and positive-definite covariance matrix **Σ**_{ξ}, and **i** is the initial condition. Recall that the initial condition is **i** = **a** + **e**. Substituting this relation into the state space model (28) gives

$$\boldsymbol{\nu} = \mathsf{G}\,\mathbf{a} + \mathsf{G}\,\mathbf{e} + \boldsymbol{\xi}. \tag{29}$$

From this equation, the mean and covariance matrix of the forecast can be derived as

$$\boldsymbol{\mu}_f = \mathsf{G}\,\mathbf{a} \tag{30}$$

and

$$\boldsymbol{\Sigma}_f = \mathsf{G}\,\boldsymbol{\Sigma}_e\,\mathsf{G}^T + \boldsymbol{\Sigma}_\xi. \tag{31}$$

The forecast covariance has two distinct terms: the term 𝗚**Σ**_{e}𝗚^{T} measures the forecast spread due to initial condition error, and **Σ**_{ξ} measures the forecast spread due to model noise. The predictable components of this model are the eigenvectors of the whitened covariance matrix

$$\tilde{\boldsymbol{\Sigma}}_f = \tilde{\mathsf{G}}\tilde{\mathsf{G}}^T + \boldsymbol{\Sigma}_c^{-1/2}\, \boldsymbol{\Sigma}_\xi\, \boldsymbol{\Sigma}_c^{-1/2\,T}. \tag{32}$$

Note that the right-hand side has an extra term relative to the deterministic case (22). Hence, the predictable components are not the singular vectors of 𝗚̃ when stochastic forcing exists. This makes sense because two sources of forecast spread exist, namely initial condition error and stochastic forcing, but the singular vectors of 𝗚̃ maximize only the spread due to initial condition error.

Since the forecast covariance is constant, the instantaneous and average predictable components coincide. In the average case, however, the signal–noise decomposition (13) gives

$$\tilde{\boldsymbol{\Sigma}}_s = \mathbf{I} - \tilde{\boldsymbol{\Sigma}}_f = \hat{\mathsf{G}}\hat{\mathsf{G}}^T, \tag{33}$$

where the mean forecast (30) has been substituted and $\hat{\mathsf{G}} = \boldsymbol{\Sigma}_c^{-1/2}\,\mathsf{G}\,\boldsymbol{\Sigma}_a^{1/2}$. The above equation immediately implies that the left singular vectors of 𝗚̂ are the eigenvectors of the whitened forecast covariance matrix, that is, the predictable components.

The fact that two different whitened propagators, 𝗚̃ and 𝗚̂, arise in the two predictability problems deserves comment. The two whitened propagators arise because two physically different terms in the covariance balance are considered: 𝗚̃ describes the noise due to initial condition error, while 𝗚̂ describes the signal. Since one operator acts on error while the other acts on signal, the definition of “whitened” differs in the two cases. In general, there is only one signal but two possible sources of noise, namely initial condition error and stochastic forcing. Thus, the singular vectors of 𝗚̃ give the predictable components only in the absence of stochastic forcing, because these vectors account only for initial condition error, whereas the singular vectors of 𝗚̂ give the predictable components whether or not stochastic forcing exists.

In the absence of stochastic forcing, the predictable components are the trailing singular vectors of 𝗚̃ and the leading singular vectors of 𝗚̂. This opposite relation arises from the fact that the former propagator measures spread due to initial error while the latter measures forecast signal. Since signal and noise are inversely related, predictable components derived from the two propagators have opposite ordering. Note that the two sets of singular vectors differ only by the constraint on the initial condition. Surprisingly, then, the norm used to constrain the initial condition determines whether singular vectors maximize or minimize predictability.

We now show that the singular values of the whitened propagator in the stochastic model (28) with constant 𝗚 must lie between zero and one. Let the SVD of the whitened propagator be

Substituting this SVD into the whitened forecast covariance matrix (32) gives

Since **Σ̃**_{f} is positive definite and thus has positive eigenvalues, relation (35) proves that the singular values of the whitened propagator are less than one, and hence the eigenvalues of **Σ̃**_{f} are less than one. Thus, there is no “growth” due to the whitened propagator, in the sense that the ratio of the final norm to the initial norm never exceeds one. Although the vectors do not grow in the Mahalanobis norms, they may grow in other norms. In effect, the singular values of the whitened propagator measure the strength of the predictability on a scale from 0 to 1, with one corresponding to maximum predictability. Thus, the whitening transformation normalizes singular values into a direct measure of predictability. We note that if 𝗚 varies in time, the singular values of the whitened propagator fluctuate in time and can exceed one.
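
This bound can be verified numerically for a stationary linear stochastic model: solve the stationary covariance balance for the climatological covariance, whiten the propagator, and inspect its singular values. In this sketch `G` and `Q` are arbitrary illustrative matrices, not the empirical SST model:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, sqrtm

rng = np.random.default_rng(1)
n = 6

# Illustrative stable propagator G (spectral radius < 1) and noise covariance Q.
G = rng.standard_normal((n, n))
G *= 0.8 / max(abs(np.linalg.eigvals(G)))
B = rng.standard_normal((n, n))
Q = B @ B.T

# Climatological covariance from the stationary balance  S_c = G S_c G^T + Q.
S_c = solve_discrete_lyapunov(G, Q)

# Whitened propagator: similarity transform of G by the whitening S_c^{-1/2}.
S_half = sqrtm(S_c).real
G_tilde = np.linalg.inv(S_half) @ G @ S_half

# All singular values lie strictly between zero and one.
s = np.linalg.svd(G_tilde, compute_uv=False)
assert np.all((s > 0.0) & (s < 1.0))
```

Because 𝗜 − 𝗚̃𝗚̃^T equals the whitened noise covariance, which is positive definite whenever 𝗤 is, the bound holds for any stable choice of `G` and positive definite `Q`.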

### c. Maximum covariance analysis

Another approach to finding components of interest to predictability is to identify initial and final components with strong covariability. In contrast to the previous methods, no explicit forecast is involved—only statistical relations between initial and final states are required. It is sensible to identify the “initial state” with the analysis instead of with the full initial condition, since adding noise to the analysis only reduces the strength of the relation. This approach is called *maximum covariance analysis* (MCA), and von Storch and Zwiers (1999, p. 321) show that this procedure is equivalent to computing the SVD of the cross-covariance matrix:

Waliser et al. (1999) apply this method to identify “modes” relating past and present rainfall to future rainfall. Numerous other studies have applied SVD to maximize covariance between concurrent datasets to identify “predictable anomaly patterns” (Renwick and Wallace 1995) or “coupled” patterns in climate data (Bretherton et al. 1992; Syu et al. 1995; Kleeman et al. 2003).

Components that maximize covariance are not necessarily components that maximize predictability. For instance, two nearly unpredictable variables may have a large covariance simply because they have large individual variances. To normalize the variances, we may whiten the variables, which ensures that any projection of either variable has unit variance. This suggests that the singular vectors of the whitened time-lagged covariance

may be relevant to predictability. The left singular vectors of the whitened time-lagged covariance are the eigenvectors of **Σ̃**_{va}**Σ̃**^{T}_{va} and thus satisfy the eigenvalue problem

Multiplying both sides by **Σ**^{−1/2}_{c} and substituting the relation between **q** and **u** (3) gives

This generalized eigenvalue problem is precisely the same eigenvalue problem that arises in *canonical correlation analysis* [CCA; see von Storch and Zwiers 1999, Eq. (14.10)]. CCA is a procedure that finds components in two fields that are maximally correlated. Numerous studies use CCA to highlight the evolution of predictable patterns (Barnett and Preisendorfer 1987; Barnston and Ropelewski 1992; Barnston and Smith 1996). The above analysis demonstrates that CCA is equivalent to SVD of the whitened time-lagged covariance matrix, a fact that has been noted in previous studies (Bretherton et al. 1992; DelSole and Chang 2003).
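
The equivalence between SVD of the whitened time-lagged covariance and CCA can be illustrated with synthetic data; the variables and dimensions here are illustrative, not the paper's SST data:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)

# Synthetic "initial" (a) and "final" (v) states with a linear relation.
a = rng.standard_normal((2000, 4))
v = a @ rng.standard_normal((4, 4)) + 0.5 * rng.standard_normal((2000, 4))

C_aa = np.cov(a, rowvar=False)
C_vv = np.cov(v, rowvar=False)
C_va = (v - v.mean(0)).T @ (a - a.mean(0)) / (len(a) - 1)

# Singular values of the whitened time-lagged covariance ...
W = np.linalg.inv(sqrtm(C_vv)).real @ C_va @ np.linalg.inv(sqrtm(C_aa)).real
rho_svd = np.linalg.svd(W, compute_uv=False)

# ... equal the canonical correlations from the CCA eigenvalue problem
#     C_vv^{-1} C_va C_aa^{-1} C_va^T q = rho^2 q.
M = np.linalg.inv(C_vv) @ C_va @ np.linalg.inv(C_aa) @ C_va.T
rho_eig = np.sort(np.sqrt(np.linalg.eigvals(M).real))[::-1]

assert np.allclose(rho_svd, rho_eig)
```

Both routes recover the same canonical correlations because `W @ W.T` is similar to `M`, so they share eigenvalues.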

The canonical components derived above have not been linked to a forecast model, and thus have an unspecified connection to predictable components. The key to linking these components is to recognize that the time-lagged covariance is related to the propagator in (28) by

This equation can be derived by multiplying the dynamical Eq. (28) on the right by **a**^{T} and taking expectations, noting that 〈*ξ***a**^{T}〉 = **0** by causality. The above relation also is the least squares operator for predicting **ν** given **a**. The corresponding whitened propagator is thus

This result demonstrates that the canonical variates that maximize correlation between initial and final states are the predictable components of an associated least squares model.
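
The least squares link can be sketched numerically: the operator built from lagged covariances coincides with an explicit regression of the final state on the initial state. All names and dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic initial states a and final states v generated by a linear map.
a = rng.standard_normal((5000, 3))
L_true = rng.standard_normal((3, 3))
v = a @ L_true.T + 0.1 * rng.standard_normal((5000, 3))

# Least squares propagator from lagged covariances:  L = C_va C_aa^{-1}.
C_aa = a.T @ a / len(a)
C_va = v.T @ a / len(a)
L_cov = C_va @ np.linalg.inv(C_aa)

# The same operator from an explicit least squares fit of v on a.
L_ols, *_ = np.linalg.lstsq(a, v, rcond=None)
assert np.allclose(L_cov, L_ols.T)
```

The two operators agree identically because the sample factors of 1/N cancel in C_va C_aa^{-1}, leaving exactly the normal-equations solution.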

## 4. An example based on sea surface temperatures

In this section, we illustrate various components of interest to predictability. We adopt the linear inverse model of Penland and Sardeshmukh (1995) for tropical sea surface temperature (SST). The main difference between our model and others of this type is that we include estimates of analysis errors when computing the climatological covariance matrix, which affects the EOF basis set used to represent the state vector. We use the 2° × 2° extended reconstruction of sea surface temperature analysis by Smith and Reynolds (2003), denoted ERSSTv2. We utilize all months in the 56-yr period 1950–2005 in the tropical Indo-Pacific ocean basin bounded by 30°S–30°N, 30°E–60°W. There are 3424 grid boxes in this domain, excluding land points, and a total of 672 time points. Monthly anomalies were computed by subtracting the mean of each calendar month from the corresponding monthly mean value at each grid point.
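
The anomaly computation can be sketched as follows, assuming a hypothetical array `sst` of shape (months, grid boxes) whose record begins in January; the data here are random stand-ins for ERSSTv2:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical monthly SST array: 672 months (56 yr) by npoints grid boxes.
nyears, npoints = 56, 10
sst = rng.standard_normal((12 * nyears, npoints)) + 20.0

# Subtract the mean of each calendar month at each grid point.
monthly = sst.reshape(nyears, 12, npoints)
climatology = monthly.mean(axis=0)            # (12, npoints) calendar-month means
anomalies = (monthly - climatology).reshape(12 * nyears, npoints)

# Each calendar month now has zero mean at every grid point.
assert np.allclose(anomalies.reshape(nyears, 12, npoints).mean(axis=0), 0.0)
```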

Components of interest to predictability can be identified with the leading singular vector of the following matrices:

The covariance matrices were estimated in a space spanned by a truncated EOF basis set, the dimension of which was chosen to be eight based on cross-validation experiments. The time lag was chosen to be seven months, which is near the limit of predictability of SST for most models (Kirtman 2003). The leading singular vector of the propagator is shown in the left panels of Fig. 1. The results are similar to those of Penland and Sardeshmukh (1995), including a comparable amplification factor, the El Niño–Southern Oscillation (ENSO)-like structure of the left singular vector, and the northeast–southwest tilt of the SST structure in the northern part of the basin in the right singular vector. The peaks at the major ENSO years 1972, 1982, 1989, and 1998 also are evident in the time series. The leading singular vector of the whitened propagator—the leading predictable component—is shown in the right panels. The differences in spatial structure and time series are immediately evident. In particular, the predictable component is dominated by a trend, which is sensible because trends are highly predictable. This component also maximizes the correlation between forecast and analysis, because this SVD is equivalent to CCA. Importantly, the right singular vector of the unwhitened propagator explains much less variance than that of the whitened propagator. The leading singular vector of the time-lagged covariance and the leading signal EOF are shown in Fig. 2. The figure reveals that these components are dominated by the leading EOF, raising the question of whether they emerge simply because the structures have high variance.

The climatological covariance matrix can be decomposed into three terms that measure signal variance, model noise, and initial condition error. To gain an idea of the relative contribution of the three terms, we plot in Fig. 3 the trace of the whitened covariance matrices. We see immediately that the initial condition error is negligible. The result probably is a consequence of the fact that the analysis error covariance matrix is diagonal in physical space, and so has only weak projections on the eight EOFs. Presumably, a more realistic, nondiagonal error covariance would lead to a larger contribution by initial condition error.
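
The bookkeeping behind such a comparison can be sketched with arbitrary positive definite stand-ins for the three terms: after whitening by the climatological covariance, the three traces sum to the state dimension, so each trace measures a fractional contribution. Nothing here uses the actual SST covariances:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(5)
n = 8

def random_psd(n):
    B = rng.standard_normal((n, n))
    return B @ B.T

# Hypothetical signal, model-noise, and initial-error covariance terms whose
# sum is the climatological covariance.
S_signal, S_noise, S_init = random_psd(n), random_psd(n), random_psd(n)
S_c = S_signal + S_noise + S_init

# Whitened traces measure each term's relative contribution; they sum to n.
W = np.linalg.inv(sqrtm(S_c)).real
traces = [np.trace(W @ S @ W) for S in (S_signal, S_noise, S_init)]
assert np.isclose(sum(traces), n)
```

A small whitened trace for the initial-error term corresponds to the negligible initial condition contribution seen in Fig. 3.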

## 5. Summary and discussion

This paper showed that if a measure of predictability is invariant to affine transformation and monotonically related to forecast uncertainty, then the component that maximizes the measure for normal distributions is a universal function of the distributions, independent of the details of the measure. This result explains why different measures of predictability, such as signal-to-noise ratio, anomaly correlation, predictive information, and the Mahalanobis error all have the same maximally predictable component (DelSole and Tippett 2007). It also implies that the Ω index of Koster et al. (2000) is maximized by the same components. These components can be obtained by applying EOF analysis to whitened forecast variables, a procedure called *predictable component analysis*. The resulting vectors, called *predictable components*, define a complete set that can be ordered such that the first maximizes predictability, the second maximizes predictability subject to being uncorrelated with the first, and so on.

Predictable components also can be obtained by applying singular value decomposition to the whitened propagator of linear models. The whitening transformation is tantamount to changing the initial and final norms in the singular vector calculation. In the tangent linear case, the initial norm is based on the analysis error covariance, consistent with previous studies, while the final norm is based on the Mahalanobis norm. The Mahalanobis norm has several attractive properties that make its use compelling. Specifically, the Mahalanobis norm is invariant to linear transformation and has unit climatological variance, and thus constitutes a consistent measure of predictability. Also, the Mahalanobis norm renders the signal EOFs identical to noise EOFs, but with reversed ordering, where signal and noise identify forecast mean and spread, respectively. Furthermore, these components are identical to the signal-to-noise EOFs. This equivalence does not hold for other norms. Finally, maximum covariance analysis between two whitened variables is equivalent to CCA of the two variables, which in turn is equivalent to determining the predictable components of an associated least squares model.

In essence, the whitening transformation converts variance analysis to predictability analysis. The components identified with conventional variance analysis, such as structures with large signal variance, are of interest to predictability but do not necessarily play a distinguished role in predictability. For instance, the structure with maximum signal variance may not be the most predictable, since the corresponding climatological variance could be very large by comparison. It is remarkable that a large class of predictability measures has the same predictable components, and these components can be obtained from variance analysis merely by transforming variables, or equivalently by using the Mahalanobis norm to measure size.

Just as singular vectors of propagators optimally represent error variance, singular vectors of whitened propagators optimally represent predictability. Therefore, if only a few of the singular values indicate significant predictability, then an ensemble based on just the corresponding singular vectors should give a reasonable estimate of the total predictability. The singular values of whitened propagators measure the strength of predictability, in contrast with the usual interpretation of singular values as a measure of variance growth. It is also worth noting that this paper appears to give for the first time the generalization of singular vector methods to models that contain both stochastic forcing and initial condition error.

Some predictability measures are not additive, for example, signal-to-noise ratio and anomaly correlation. In contrast, information theory measures are additive for independent events. For the class of measures and distributions considered in this paper, any nonadditive measure can be converted into any additive measure because these measures are monotonically related to forecast uncertainty, and hence monotonically related to each other. This transformation may prove useful for computing total predictability or the fractional contribution of each component to predictability.
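
As an illustration of such a conversion, consider a univariate normal case in which the climatological variance decomposes as σ²_c = σ²_s + σ²_f (signal plus forecast noise). Under these assumptions—a sketch, not the paper's general derivation—the nonadditive signal-to-noise ratio maps monotonically onto the additive predictive information:

```latex
\mathrm{SNR} = \frac{\sigma_s^2}{\sigma_f^2},
\qquad
\mathrm{PI} = \tfrac{1}{2}\ln\frac{\sigma_c^2}{\sigma_f^2}
            = \tfrac{1}{2}\ln\!\left(1 + \mathrm{SNR}\right).
```

Totals accumulated in the additive measure can then be translated back to the nonadditive one by inverting this monotone relation.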

Distance-related measures of predictability, such as relative entropy (Kleeman 2002) and Bhattacharyya distance (Mardia et al. 1979), can increase even if the forecast uncertainty is constant, for example, by a change in mean, and thus do not satisfy all properties assumed in this paper. However, these measures tend to be convex functions of forecast uncertainty, so predictable components provide lower bounds on distance-related predictability measures.

Components of interest to predictability were illustrated with a linear inverse model for SST. The signal EOFs, the maximum covariance components, and the leading singular vector of the propagator were all dominated by the leading EOF of SST variance. In contrast, the leading predictable component exhibited a linear trend over 50 yr. A linear trend may be identified sensibly as highly predictable. The forecast spread in this model was dominated completely by the stochastic forcing; that is, the analysis errors were negligible. These conclusions pertain to our particular empirical model and may not carry over to the real system.

Attempts to generalize the above framework meet with significant difficulties. For instance, relaxing the assumption of normal distributions is difficult because the measure would then depend on higher-order moments and thus on higher-order nonlinearities of the projection vector. Relaxing the linear model assumption loses contact with singular vector methods. Relaxing the perfect model scenario requires accounting for model error in the forecast distribution, which is a largely unsolved problem. The above framework also involves significant practical difficulties. For instance, the framework assumed all covariances were known, whereas in practice they must be estimated from relatively small samples. Also, the Mahalanobis norm is very sensitive to estimation errors in the variance of the trailing EOFs. However, an interesting by-product of the framework discussed in this paper is the clarification that seemingly different statistical methods are fundamentally connected. For instance, predictable component analysis has been related to EOF analysis, SVD analysis, CCA, and linear regression. These connections imply that estimation techniques that have proven effective in one statistical method can be applied directly to predictable component analysis.

## Acknowledgments

Comments from Ben Kirtman and two anonymous reviewers led to significant clarifications in this paper. The first author’s research was supported by the National Science Foundation (ATM0332910), National Aeronautics and Space Administration (NNG04GG46G), and the National Oceanographic and Atmospheric Administration (NA04OAR4310034). The second author’s research was supported by a Grant/Cooperative Agreement from the National Oceanic and Atmospheric Administration, NA05OAR4311004. The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies.

## REFERENCES


## Footnotes

*Corresponding author address:* Timothy DelSole, 4041 Powder Mill Rd., Suite 302, Calverton, MD 20705. Email: delsole@cola.iges.org