## 1. Introduction

A large body of observational, theoretical, and numerical evidence supports the hypothesis that the climate state is predictable beyond 2 weeks (Shukla 1998; Shukla and Kinter 2006). This evidence is based almost entirely on analysis of time-averaged fields. For instance, climate predictability, such as that due to anthropogenic forcing, is routinely established on the basis of 50-yr means (Hegerl et al. 2007). Seasonal-to-interannual predictability is established almost exclusively on the basis of 3-month averages. Intraseasonal predictability, if any, is often claimed on the basis of 5- to 14-day averages. The reason for using time averages in these cases is that the predictability cannot be detected easily otherwise. That is, if one computes predictability based on instantaneous states, the predictability decays until essentially all of it is lost after 2 weeks (Simmons and Hollingsworth 2002). Yet, if there is predictability of time averages after 2 weeks, then not all predictability of instantaneous states can be lost after 2 weeks. A plausible explanation for the near vanishing of predictability after 2 weeks is that the predictable part beyond 2 weeks explains relatively little variance and hence is obscured by weather noise.

If the predictability beyond 2 weeks arises from slowly varying structures, then time averaging will reduce the weather noise without appreciably reducing the predictable signal, thereby increasing the signal-to-noise ratio. Nevertheless, there are three obvious problems with using time averaging to diagnose predictability: time averages cannot resolve variations on time scales shorter than the averaging window, time averaging fails to capture structures that are predictable but not persistent, and the ability of time averaging to separate signal from noise is limited by the number of samples in the averaging window. An example of a predictable structure that is not persistent is a propagating wave packet—if the time-averaging window exceeds the time during which the packet remains in a domain, then time averaging within the domain removes the packet.
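The signal-to-noise argument above can be illustrated numerically. The sketch below is our own toy example, not from the paper: a slowly varying "signal" is buried in large-amplitude white "weather noise," and block averaging raises the variance ratio roughly in proportion to the window length. All names and parameter values (`n_steps`, `window`, the sine period) are illustrative choices.

```python
# Illustrative sketch (not from the paper): time averaging raises the
# signal-to-noise ratio when the predictable signal varies slowly.
import numpy as np

rng = np.random.default_rng(0)
n_steps, window = 20000, 10

t = np.arange(n_steps)
signal = np.sin(2 * np.pi * t / 500.0)      # slowly varying, predictable part
noise = 3.0 * rng.standard_normal(n_steps)  # large-amplitude weather noise

def block_average(series, w):
    """Average consecutive non-overlapping blocks of length w."""
    n = len(series) // w * w
    return series[:n].reshape(-1, w).mean(axis=1)

# Signal-to-noise ratio (variance ratio) before and after averaging.
snr_raw = signal.var() / noise.var()
snr_avg = block_average(signal, window).var() / block_average(noise, window).var()

print(snr_raw, snr_avg)  # averaging boosts the ratio roughly window-fold here
```

Because the sine period (500 steps) greatly exceeds the window (10 steps), averaging leaves the signal variance nearly intact while cutting the noise variance by about a factor of 10.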

Time averaging is effective in predictability analysis when different predictable phenomena are characterized by different time scales. An equally plausible assumption is that different predictable phenomena are characterized by different spatial structures. The latter assumption implies that spatial filters could be constructed to optimally estimate the amplitude of the structure. Such filters would allow predictable structures to be diagnosed even if they are not persistent and even if they vary on time scales shorter than typical averaging windows. Filters based on optimal projection vectors are a common tool in signal processing theory and are used extensively in climate and seasonal predictability under the names optimal fingerprinting and signal-to-noise empirical orthogonal functions (EOFs). These techniques are now recognized to be generalizations of linear regression theory and emerge naturally in a framework based on information theory (DelSole and Tippett 2007).

There are at least two bases for constructing spatial filters for diagnosing predictability beyond 2 weeks. The first is to identify structures that are highly persistent. For instance, DelSole (2001) proposed a procedure called optimal persistence analysis (OPA) that identifies structures with the longest time scales, as measured by the integral time scale. This approach, however, fails to capture predictability that is not persistent. The second is to identify structures that are highly predictable, relative to a chosen forecast model. Schneider and Griffies (1999) proposed a procedure called predictable component analysis that identifies structures whose forecast spread differs as much as possible from its climatological spread. One drawback of this method is that it optimizes predictability at a single lead time and hence returns structures that depend on lead time.

This paper proposes a new approach to constructing projection vectors for diagnosing predictability. The main idea is to optimize predictability integrated over all time lags, thereby removing the lead time dependence of conventional decomposition approaches (e.g., predictable component analysis, signal-to-noise EOFs, canonical correlation analysis). The integral of predictability, called the average predictability time (APT), was proposed in DelSole and Tippett (2009, hereafter Part I) and shown to posses several attractive properties, as reviewed in the next section. The general procedure for decomposing APT is discussed in section 3, and the specific case of linear regression models is discussed in section 4. The estimation of APT from finite time series is addressed in section 5. The result of applying this method to 1000-hPa zonal velocity fields is discussed in section 6, where the method is shown to seamlessly diagnose predictability on time scales from 6 h to decades. We conclude with a summary and discussion of our results.

## 2. Review of average predictability time

Part I measures the predictability of a system at a fixed lead time by the Mahalanobis signal, defined as

$$S_\tau = 1 - \frac{1}{K}\,\mathrm{tr}\left[\boldsymbol{\Sigma}_\tau \boldsymbol{\Sigma}_\infty^{-1}\right], \qquad (1)$$

where $\boldsymbol{\Sigma}_\tau$ is the forecast covariance matrix at lead time $\tau$, $\boldsymbol{\Sigma}_\infty$ is the climatological covariance matrix, and $K$ is the state dimension; also, $\mathrm{tr}[\cdot]$ denotes the trace operator. In one dimension, the Mahalanobis signal reduces to 1 minus the normalized error variance,

$$S_\tau = 1 - \frac{\sigma_\tau^2}{\sigma_\infty^2}, \qquad (2)$$

which in turn is the explained variance of a linear regression model. Therefore, the Mahalanobis signal can be interpreted as the multivariate generalization of a familiar measure of predictability. The Mahalanobis signal equals unity for a perfect deterministic forecast (i.e., $\boldsymbol{\Sigma}_\tau = 0$) and vanishes for a forecast that is no better than a randomly drawn state from the climatological distribution (i.e., $\boldsymbol{\Sigma}_\tau = \boldsymbol{\Sigma}_\infty$). Because in this paper only discrete time is considered, the APT $S$ is defined to be twice the sum of the Mahalanobis signal over all positive lead times:

$$S = 2 \sum_{\tau=1}^{\infty} S_\tau = 2 \sum_{\tau=1}^{\infty} \left(1 - \frac{1}{K}\,\mathrm{tr}\left[\boldsymbol{\Sigma}_\tau \boldsymbol{\Sigma}_\infty^{-1}\right]\right). \qquad (3)$$

The factor of 2 makes APT agree with the usual $e$-folding time in the univariate case. Important properties of APT include the following: 1) it is invariant to nonsingular, linear transformations of the variables and hence is independent of the arbitrary basis set used to represent the state; 2) upper and lower bounds on the APT of linear stochastic models can be derived from the dynamical eigenvalues; and 3) it is related to the shape of the power spectra and hence clarifies the relation between predictability and power spectra.
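These definitions can be checked concretely. The sketch below is our own illustrative example: for a univariate AR(1) ("red noise") process with lag-1 autocorrelation $\phi$, the normalized forecast error variance at lead $\tau$ is $1 - \phi^{2\tau}$, so the univariate Mahalanobis signal is $\phi^{2\tau}$ and the APT (3) has the closed form $2\phi^2/(1-\phi^2)$, which is comparable to the usual $e$-folding time $-1/\ln\phi$ for $\phi$ near 1.

```python
# Sketch: Mahalanobis signal and APT for a univariate AR(1) process.
# The helper and the parameter phi are our own illustrative choices.
import numpy as np

def mahalanobis_signal(cov_fcst, cov_clim):
    """Multivariate Mahalanobis signal (1): 1 - tr(S_tau S_inf^{-1})/K."""
    K = cov_clim.shape[0]
    return 1.0 - np.trace(cov_fcst @ np.linalg.inv(cov_clim)) / K

def apt_ar1(phi, max_lag=10_000):
    """APT (3): twice the sum of the Mahalanobis signal over positive lags."""
    taus = np.arange(1, max_lag + 1)
    return 2.0 * np.sum(phi ** (2 * taus))

phi = 0.9
apt = apt_ar1(phi)
closed_form = 2 * phi**2 / (1 - phi**2)   # analytic infinite sum
efold = -1.0 / np.log(phi)                # usual e-folding time

# One-dimensional check: forecast variance 1 - phi^2 at lead 1, unit climatology.
sig_1d = mahalanobis_signal(np.array([[1 - phi**2]]), np.array([[1.0]]))

print(apt, closed_form, efold)  # APT ~ 8.53, comparable to e-folding time ~ 9.49
```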

## 3. Decomposition of average predictability time

Consider a scalar component of the state vector $\mathbf{x}$ obtained through the inner product $\mathbf{q}^{\mathrm{T}}\mathbf{x}$, where $\mathbf{q}$ is a projection vector and the superscript T denotes the transpose operation. We seek the projection vector that optimizes APT. By (2) and (3), the APT of the component is

$$S = 2 \sum_{\tau=1}^{\infty} \left(1 - \frac{\sigma_\tau^2}{\sigma_\infty^2}\right), \qquad (4)$$

where the component $\mathbf{q}^{\mathrm{T}}\mathbf{x}$ has forecast and climatological variances

$$\sigma_\tau^2 = \mathbf{q}^{\mathrm{T}}\boldsymbol{\Sigma}_\tau\mathbf{q} \quad\text{and}\quad \sigma_\infty^2 = \mathbf{q}^{\mathrm{T}}\boldsymbol{\Sigma}_\infty\mathbf{q}. \qquad (5)$$

Substituting (5) into (4) gives

$$S = \frac{\mathbf{q}^{\mathrm{T}}\mathsf{G}\,\mathbf{q}}{\mathbf{q}^{\mathrm{T}}\boldsymbol{\Sigma}_\infty\mathbf{q}}, \qquad (6)$$

where

$$\mathsf{G} = 2 \sum_{\tau=1}^{\infty} \left(\boldsymbol{\Sigma}_\infty - \boldsymbol{\Sigma}_\tau\right). \qquad (7)$$

Thus, the problem of finding the projection vector $\mathbf{q}$ that optimizes APT is equivalent to finding the projection vector that optimizes (6). This optimization problem can be solved by noting that (6) is a Rayleigh quotient. Specifically, a standard theorem in linear algebra (Noble and Daniel 1988) states that the vectors that optimize a Rayleigh quotient of the form (6) are given by the eigenvectors of the generalized eigenvalue problem

$$\mathsf{G}\,\mathbf{q} = \lambda\,\boldsymbol{\Sigma}_\infty\,\mathbf{q}. \qquad (8)$$

Because $\mathsf{G}$ and $\boldsymbol{\Sigma}_\infty$ are symmetric, the eigenvectors of (8) produce components that are uncorrelated. The eigenvalue $\lambda$ gives the value of APT associated with each eigenvector. It is conventional to order the eigenvectors by decreasing eigenvalue, in which case the first eigenvector maximizes APT, the second maximizes APT subject to being uncorrelated with the first, and so on.

The spatial pattern $\mathbf{p}$ associated with a component can be obtained simply by projecting the component time series $\mathbf{q}^{\mathrm{T}}\mathbf{x}$ on the original data, which gives

$$\mathbf{p} = \left\langle \mathbf{x}\,(\mathbf{q}^{\mathrm{T}}\mathbf{x}) \right\rangle = \boldsymbol{\Sigma}_\infty\,\mathbf{q}, \qquad (9)$$

where the angle brackets denote an average over the climatological distribution; also, we have assumed without loss of generality that the climatological mean vanishes and that the component has unit variance; that is, $\sigma_\infty^2 = \mathbf{q}^{\mathrm{T}}\boldsymbol{\Sigma}_\infty\mathbf{q} = 1$. With this notation, the state $\mathbf{x}$ can be decomposed into components that optimize APT as

$$\mathbf{x} = \sum_{k=1}^{K} \mathbf{p}_k\,\left(\mathbf{q}_k^{\mathrm{T}}\mathbf{x}\right), \qquad (10)$$

where the projection vectors $\mathbf{q}_k$ and spatial patterns $\mathbf{p}_k$ have been ordered in decreasing order of APT. The above decomposition is analogous to the manner in which principal component (PC) analysis decomposes data by variance, except that here we are decomposing APT.

Because the component time series are uncorrelated, the total variance is the sum of the variances explained by each component. Because the components have unit variance, if the state vector is represented on a grid, then the absolute value of each element of $\mathbf{p}$ is the standard deviation, at the corresponding grid point, of the component in question. Furthermore, the total variance due to the component is given by $\mathbf{p}^{\mathrm{T}}\mathbf{p}$. The fraction of variance explained by the component is therefore $\mathbf{p}^{\mathrm{T}}\mathbf{p}$ divided by the total variance $\langle\mathbf{x}^{\mathrm{T}}\mathbf{x}\rangle$.
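The decomposition can be sketched numerically for a system whose covariances are known exactly. In our illustrative example below (the autocorrelations 0.9 and 0.3 and the mixing matrix `T` are arbitrary choices), two hidden AR(1) modes are linearly mixed; solving the generalized eigenvalue problem (8) recovers the APT of each mode, illustrating both the invariance to the mixing and the uncorrelatedness of the components.

```python
# Sketch of the APT decomposition of section 3 for a toy two-variable system.
import numpy as np
from scipy.linalg import eigh

phis = np.array([0.9, 0.3])            # AR(1) autocorrelations of two hidden modes
T = np.array([[1.0, 0.5],
              [-0.3, 2.0]])            # arbitrary invertible mixing matrix
sigma_inf = T @ T.T                    # climatological covariance in mixed coordinates

# G = 2 * sum_tau (Sigma_inf - Sigma_tau); for the hidden modes the infinite
# sum has the closed form 2 phi^2 / (1 - phi^2) on the diagonal.
g_diag = np.diag(2 * phis**2 / (1 - phis**2))
G = T @ g_diag @ T.T

# Generalized eigenvalue problem (8): G q = lambda Sigma_inf q.
eigvals, eigvecs = eigh(G, sigma_inf)  # ascending order
eigvals = eigvals[::-1]                # reorder by decreasing APT
eigvecs = eigvecs[:, ::-1]

print(eigvals)                         # APT of each component; invariant to T

# Components are uncorrelated: q_i^T Sigma_inf q_j = 0 for i != j.
cross = eigvecs[:, 0] @ sigma_inf @ eigvecs[:, 1]
print(abs(cross) < 1e-10)
```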

## 4. Decomposition of APT for linear regression models

The linear regression model for predicting the future state $\mathbf{x}_{t+\tau}$ based on the present state $\mathbf{x}_t$ is of the form

$$\hat{\mathbf{x}}_{t+\tau} = \mathsf{L}_\tau\,\mathbf{x}_t, \qquad (11)$$

where $\mathsf{L}_\tau$ is a regression operator and the hat denotes a predicted quantity. Finding the regression operator that minimizes the mean square prediction error is a standard problem, with solution

$$\mathsf{L}_\tau = \mathsf{C}_\tau\,\boldsymbol{\Sigma}_\infty^{-1}, \qquad (12)$$

where $\mathsf{C}_\tau$ is the time-lagged covariance matrix, defined as

$$\mathsf{C}_\tau = \left\langle \mathbf{x}_{t+\tau}\,\mathbf{x}_t^{\mathrm{T}} \right\rangle; \qquad (13)$$

for zero-mean stationary processes, $\mathsf{C}_0 = \boldsymbol{\Sigma}_\infty$. The covariance matrix of the forecast error is therefore given by

$$\boldsymbol{\Sigma}_\tau = \boldsymbol{\Sigma}_\infty - \mathsf{C}_\tau\,\boldsymbol{\Sigma}_\infty^{-1}\,\mathsf{C}_\tau^{\mathrm{T}}, \qquad (14)$$

where we have used (12). Substituting this forecast covariance matrix into the generalized eigenvalue problem (8) gives

$$\left(2\sum_{\tau=1}^{\infty} \mathsf{C}_\tau\,\boldsymbol{\Sigma}_\infty^{-1}\,\mathsf{C}_\tau^{\mathrm{T}}\right)\mathbf{q} = \lambda\,\boldsymbol{\Sigma}_\infty\,\mathbf{q}. \qquad (15)$$

Thus, the components that maximize the APT of a linear regression model are obtained by solving the generalized eigenvalue problem (15). Note that this solution depends only on time-lagged covariances, which can be estimated from data.

Substituting (14) into (4) shows that the APT of a component of the regression model can be written as

$$S = 2\sum_{\tau=1}^{\infty} R_\tau^2, \qquad (16)$$

where

$$R_\tau^2 = 1 - \frac{\sigma_\tau^2}{\sigma_\infty^2} \qquad (17)$$

is, by (5) and (14),

$$R_\tau^2 = \frac{\mathbf{q}^{\mathrm{T}}\mathsf{C}_\tau\,\boldsymbol{\Sigma}_\infty^{-1}\,\mathsf{C}_\tau^{\mathrm{T}}\mathbf{q}}{\mathbf{q}^{\mathrm{T}}\boldsymbol{\Sigma}_\infty\mathbf{q}}. \qquad (18)$$

In linear regression theory, $R_\tau^2$ is called the coefficient of multiple determination; in predictability theory, this term is called the explained variance. This term measures the amount of variation in the predictand that is accounted for by the variation in the predictors. In the present case, the predictors are the state variables at the initial time and the predictand is a component of the state vector at the final time. The positive square root of the coefficient of multiple determination, $R_\tau$, is called the multiple correlation coefficient. The multiple correlation coefficient is the correlation between the predictand and the value of the predictand predicted by the regression equation. In this sense, the multiple correlation can be interpreted as a multivariate generalization of the familiar correlation coefficient. Because (18) is a Rayleigh quotient, it follows immediately that the $\mathbf{q}$ that maximizes the multiple correlation at fixed $\tau$ is given by the leading eigenvector of

$$\mathsf{C}_\tau\,\boldsymbol{\Sigma}_\infty^{-1}\,\mathsf{C}_\tau^{\mathrm{T}}\,\mathbf{q} = R_\tau^2\,\boldsymbol{\Sigma}_\infty\,\mathbf{q}. \qquad (19)$$

This eigenvalue problem is precisely the one that arises in canonical correlation analysis (CCA). CCA finds components in two datasets that are maximally correlated; in the present case, the two datasets are $\mathbf{x}_{t+\tau}$ and $\mathbf{x}_t$. Thus, the decomposition of APT is closely related to CCA, the main difference being that CCA maximizes the multiple correlation at one lag, whereas the decomposition of APT maximizes the sum of squared multiple correlations over all lags. This connection shows that the procedure we propose is an extension of a familiar procedure in predictability analysis.

Now consider the time scale

$$T_2 = 1 + 2\sum_{\tau=1}^{\infty} \rho_\tau^2, \qquad (20)$$

where $\rho_\tau$ is the autocorrelation function of a component and $\tau$ is the lag; $T_2$ often is used as a measure of the time scale of a process. For instance, DelSole (2001) used this form to define optimal persistence patterns and showed that this expression is preferable to the traditional integral time scale if the autocorrelation function oscillates between positive and negative values. This expression also arises in statistical theory, as will be explained in section 5. We now show that

$$S \ge T_2 - 1. \qquad (21)$$

To prove (21), we first recall the extended Cauchy–Schwarz inequality (Johnson and Wichern 1982),

$$\left(\mathbf{x}^{\mathrm{T}}\mathbf{y}\right)^2 \le \left(\mathbf{x}^{\mathrm{T}}\mathsf{B}\,\mathbf{x}\right)\left(\mathbf{y}^{\mathrm{T}}\mathsf{B}^{-1}\mathbf{y}\right), \qquad (22)$$

which holds for any vectors $\mathbf{x}$ and $\mathbf{y}$ and any positive definite matrix $\mathsf{B}$. Substituting $\mathbf{x} = \mathbf{q}$, $\mathbf{y} = \mathsf{C}_\tau^{\mathrm{T}}\mathbf{q}$, and $\mathsf{B} = \boldsymbol{\Sigma}_\infty$ gives

$$\left(\mathbf{q}^{\mathrm{T}}\mathsf{C}_\tau^{\mathrm{T}}\mathbf{q}\right)^2 \le \left(\mathbf{q}^{\mathrm{T}}\boldsymbol{\Sigma}_\infty\mathbf{q}\right)\left(\mathbf{q}^{\mathrm{T}}\mathsf{C}_\tau\,\boldsymbol{\Sigma}_\infty^{-1}\,\mathsf{C}_\tau^{\mathrm{T}}\mathbf{q}\right), \qquad (23)$$

which states the simple fact that $\rho_\tau^2 \le R_\tau^2$; that is, the autocorrelation is always less than or equal to the multiple correlation for a given component. This inequality is consistent with the principle that adding predictors to a regression equation can never decrease the explained variance (recall that $\rho_\tau^2$ is the explained variance of a prediction of a component based on its own time-lagged value alone, whereas $R_\tau^2$ is the explained variance of a prediction of a component based on time-lagged values of all the components). Because inequality (23) holds for each lead time $\tau$, it also holds for the sum over all lead times, which proves (21).
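The inequality can be seen explicitly for a propagating structure. The sketch below is our own illustrative example: a damped-rotation VAR(1) system, $\mathbf{x}_{t+1} = \mathsf{A}\mathbf{x}_t + \text{noise}$ with $\mathsf{A} = r\,\mathsf{R}(\theta)$, for which $\mathsf{C}_\tau = \mathsf{A}^\tau$ and $\boldsymbol{\Sigma}_\infty = \mathsf{I}$, so every quantity is exact; the parameters $r$ and $\theta$ are arbitrary.

```python
# Illustrative check of the inequality above for a propagating structure:
# a damped rotation, for which C_tau = A^tau and Sigma_inf = I exactly.
import numpy as np

r, theta = 0.95, 0.4
A = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
q = np.array([1.0, 0.0])        # unit-variance component (Sigma_inf = I)

max_lag = 400
R2 = np.empty(max_lag)          # squared multiple correlation, Eq. (18)
rho2 = np.empty(max_lag)        # squared autocorrelation of the component
C = np.eye(2)
for k in range(max_lag):
    C = A @ C                   # C_tau = A^tau
    R2[k] = q @ C @ C.T @ q     # q^T C_tau Sigma_inf^{-1} C_tau^T q
    rho2[k] = (q @ C @ q) ** 2  # (q^T C_tau q)^2

S = 2 * R2.sum()                # APT: twice the sum of squared multiple correlations
T2_minus_1 = 2 * rho2.sum()     # persistence measure T_2 - 1, Eq. (20)

print(S, T2_minus_1)            # S clearly exceeds T_2 - 1: the pattern propagates,
                                # so persistence understates the predictability
```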

Inequality (21) reveals a key distinction between optimal persistence analysis, which maximizes *T*_{2}, and maximization of APT, which maximizes *S*. In particular, the persistence time of a component bounds its APT from below. This follows naturally from the fact that univariate prediction is at most as accurate as multivariate prediction. The difference becomes especially important if a system is characterized by propagating structures. In particular, a propagating structure can be captured by a regression model if the model contains several predictors, which allows different components to interact with each other. In contrast, a single component cannot capture a propagating structure. Put another way, OPA requires the initial and final patterns to be the same, whereas predictable components allow these patterns to differ.

## 5. Estimation of APT

Two practical difficulties arise when attempting to decompose predictability by solving the eigenvalue problem (15) based on finite time series: 1) the time lag covariances cannot be estimated at arbitrarily large lags, and 2) the covariance matrices are not full rank if the dimension of the state space exceeds the number of samples. This second difficulty is a reflection of the more general problem that supervised learning methods such as regression are vulnerable to overfitting. We discuss methods for dealing with these issues in this section.

### a. Lag window

The first difficulty is addressed by estimating APT with a truncated, weighted sum of sample Mahalanobis signals,

$$S' = 2\sum_{\tau=1}^{M} y_\tau\,S_\tau', \qquad (24)$$

where $y_\tau$ are weights, called the lag window, that decrease with the absolute value of the time lag; $M$ is an upper bound called the truncation point; and primes denote sample estimates. The numerous desirable properties of estimators based on lag windows have been discussed by Jenkins and Watts (1968). A particularly attractive lag window for our purposes is the Parzen window, defined as

$$y_\tau = \begin{cases} 1 - 6\left(\dfrac{|\tau|}{M}\right)^2 + 6\left(\dfrac{|\tau|}{M}\right)^3 & 0 \le |\tau| \le M/2, \\[1ex] 2\left(1 - \dfrac{|\tau|}{M}\right)^3 & M/2 < |\tau| \le M, \\[1ex] 0 & |\tau| > M. \end{cases} \qquad (25)$$

An important property of the Parzen window is that it never produces negative spectra. This property implies that the Parzen window never produces negative APT.
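A minimal implementation of the Parzen window defined above can be sketched as follows (the truncation point `M = 60` is an illustrative value); the weight is 1 at lag 0, falls continuously to 0.25 at $M/2$, and vanishes at and beyond $M$.

```python
# Sketch of the Parzen lag window; tau and M are in time steps.
import numpy as np

def parzen_window(tau, M):
    """Parzen lag-window weight y_tau for truncation point M."""
    a = np.abs(np.asarray(tau, dtype=float)) / M
    y = np.where(a <= 0.5,
                 1.0 - 6.0 * a**2 + 6.0 * a**3,
                 2.0 * (1.0 - a) ** 3)
    return np.where(a <= 1.0, y, 0.0)

M = 60
taus = np.arange(0, 2 * M)
y = parzen_window(taus, M)

print(y[0], y[M // 2], y[M])  # 1.0 at lag 0, 0.25 at M/2, 0.0 at M and beyond
```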

### b. Principal component truncation

The other practical difficulty with solving the eigenvalue problem (8) is that the matrices are not full rank when the state dimension exceeds the number of samples. A similar problem occurs in classical linear regression, and a typical remedy is to reduce the dimension of the space by projecting the data onto a few leading principal components. Accordingly, we determine the predictable components only in the space spanned by the leading $K$ principal components. Because the results are invariant to invertible linear transformations, we may, after deciding the number of components $K$, normalize the $K$ principal components to unit variance. This transformation turns the climatological covariance matrix into the identity matrix (i.e., $\boldsymbol{\Sigma}_\infty = \mathsf{I}$) and turns the time-lagged covariance matrix $\mathsf{C}_\tau$ into a correlation matrix (i.e., a matrix in which each element is the time-lagged correlation between the corresponding principal components). After this transformation, the Mahalanobis signal equals $1/K$ times the squared Frobenius norm of the time-lagged correlation matrix.
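This last identity is easy to verify numerically. The sketch below is our own check, using an arbitrary small matrix in the role of the lagged correlation matrix: with $\boldsymbol{\Sigma}_\infty = \mathsf{I}$, the regression forecast error covariance of section 4 is $\mathsf{I} - \mathsf{C}_\tau\mathsf{C}_\tau^{\mathrm{T}}$, and the Mahalanobis signal (1) reduces to $\|\mathsf{C}_\tau\|_F^2/K$.

```python
# Numerical check: after whitening (Sigma_inf = I), the Mahalanobis signal
# of the regression forecast equals ||C_tau||_F^2 / K.
import numpy as np

rng = np.random.default_rng(1)
K = 5
C_tau = 0.15 * rng.standard_normal((K, K))   # stand-in lagged correlation matrix
sigma_inf = np.eye(K)                        # whitened climatological covariance

# Regression forecast error covariance in whitened coordinates.
sigma_tau = sigma_inf - C_tau @ C_tau.T

# Mahalanobis signal, Eq. (1), versus the Frobenius-norm expression.
S_tau = 1.0 - np.trace(sigma_tau @ np.linalg.inv(sigma_inf)) / K
frob = np.sum(C_tau**2) / K                  # ||C_tau||_F^2 / K

print(abs(S_tau - frob) < 1e-12)
```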

A critical parameter in this approach is the number of principal components—if the number of PCs is too small, then the basis set may fail to capture important structures, but if the number of PCs is too large, then overfitting becomes a problem and APT is overestimated. A standard method for selecting the number of PCs is to construct multivariate regression models for the principal components and then find the number of PCs that minimizes the cross-validated mean square error of the models. Unfortunately, this approach is too computationally demanding to be useful because our problem requires computing all time lag covariances within a specified range of lags.

Given the connection between CCA and the new decomposition method, it is not unreasonable to use a truncation value typical of CCA studies. For seasonal predictability studies based on 50 yr of data, a typical truncation value is about a dozen principal components (Barnston and Smith 1996). The present study differs from previous studies in that we include time lags from hours to months and we include all seasons together. Thus, the number of samples in this study is three orders of magnitude larger than that used in previous studies, although the samples are not independent. In light of these differences, we suggest that a truncation point of 50 principal components is acceptable. The appropriateness of choosing 50 PCs will be justified a posteriori by the validation procedure and significance test described below.

To validate the predictable components, we adopt the following procedure. First, we split the data into two parts: a training sample for estimating statistical parameters and an assessment sample for testing forecasts. The predictable components are derived by solving (8) using sample covariance matrices estimated from the training sample. Then, the linear regression models derived from the training sample [i.e., derived from (12)] are used to make forecasts in the assessment sample. Because, theoretically, a regression model predicts the mean of the forecast distribution, the difference between the regression forecast and the verification represents a random draw from the forecast error distribution; hence, its covariance matrix estimates the forecast covariance matrix $\boldsymbol{\Sigma}_\tau$. We estimate the forecast error covariance without removing the mean from the differences between forecast and verification. We estimate the climatological covariance matrix $\boldsymbol{\Sigma}_\infty$ using the sample covariance matrix of the assessment sample. The resulting forecast and climatological covariance matrices are then substituted in (3) to evaluate the total APT. The APT of an individual component in the assessment sample is computed by projecting the component onto the estimated forecast and climatological covariance matrices [i.e., forming the variances in (5)] and then evaluating (4).

Since the data have been divided into training and assessment samples, there are at least two sample estimates of the climatological covariance **Σ**_{∞} that could be used to define the Mahalanobis signal. We choose the assessment samples for estimating **Σ**_{∞} for the following reason. It is well known that the constant that minimizes the mean square difference with respect to a given sample is the sample mean. This fact implies that the constant forecast with the smallest error variance is the sample mean. Consequently, choosing the assessment sample for estimating **Σ**_{∞} ensures that the sample Mahalanobis signal is never positive for a constant forecast. A positive Mahalanobis signal can occur only if the forecast is nonconstant and consistently “closer” to the verification than the assessment sample mean. Since this measure gives no “reward” for predicting the climatological mean of the assessment sample, it is a conservative estimate of predictability.
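The validation procedure can be sketched for a single component. The toy example below is ours (an AR(1) series with illustrative parameters `phi` and `lead` standing in for a predictable component): the regression coefficient is fit on the training half only, forecasts are verified on the independent half, and the sample Mahalanobis signal is formed with the assessment-sample climatology, exactly as described above.

```python
# Sketch of train/assessment validation for one component: fit on the
# training sample, then compute 1 - SSE/SSA against the assessment climatology.
import numpy as np

rng = np.random.default_rng(2)
n, phi, lead = 4000, 0.9, 5

# Unit-variance AR(1) series standing in for a component time series.
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = phi * x[t - 1] + np.sqrt(1 - phi**2) * rng.standard_normal()

train, assess = x[: n // 2], x[n // 2 :]

# Train: lag-"lead" regression coefficient from training covariances only.
L = np.mean(train[lead:] * train[:-lead]) / np.var(train)

# Assess: forecasts and verifications from the independent sample.
fcst = L * assess[:-lead]
verif = assess[lead:]

sse = np.sum((verif - fcst) ** 2)          # forecast error
ssa = np.sum((verif - verif.mean()) ** 2)  # error of the assessment-sample mean
signal = 1.0 - sse / ssa                   # sample Mahalanobis signal

print(signal)  # near phi^(2*lead) ~ 0.35: the forecast beats climatology
```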

### c. Statistical significance

In contexts where predictability is well established, a test of the null hypothesis of no predictability (i.e., *S* = 0) is not very interesting. However, in other contexts in which the predictability is in doubt, such as perhaps decadal predictability of annual averages, this test may be appropriate.

To test the significance of the predictability of a single component, we first derive the regression model and predictable components with one dataset and assess predictability with another dataset. This separation of training and assessment samples eliminates the artificial skill expected when the same data are used to both train and validate the model. The fundamental question is whether the forecast error is systematically larger than the error of a prediction based on the climatological mean. As we shall see, the required significance test for the Mahalanobis signal does not appear to be standard. Therefore, we develop a new significance test appropriate to this question.

Let $v_i$ be the verification of the $i$th forecast, and assume that it is independent and normally distributed with mean $\mu$ and variance $\sigma^2$, a relation we denote as

$$v_i \sim N(\mu, \sigma^2). \qquad (27)$$

Let $\overline{v}$ denote the sample mean of $v_i$ in the assessment sample, assumed to be of size $N$. The sample mean under these assumptions is distributed as

$$\overline{v} \sim N(\mu, \sigma^2/N). \qquad (28)$$

Let $\hat{v}_i$ be the prediction of $v_i$. An appropriate null hypothesis is that the forecast $\hat{v}_i$ is drawn independently from the same distribution as $v_i$. Define the sum square error and the sum square anomaly as

$$\mathrm{SSE} = \sum_{i=1}^{N} \left(v_i - \hat{v}_i\right)^2, \qquad \mathrm{SSA} = \sum_{i=1}^{N} \left(v_i - \overline{v}\right)^2, \qquad (29)$$

so that the sample Mahalanobis signal of the component is

$$S' = 1 - \frac{\mathrm{SSE}}{\mathrm{SSA}}. \qquad (30)$$

Under the null hypothesis, $\mathrm{SSE}/(2\sigma^2) \sim \chi_N^2$ and $\mathrm{SSA}/\sigma^2 \sim \chi_{N-1}^2$, where $\chi_\nu^2$ denotes a chi-squared distribution with $\nu$ degrees of freedom. However, the variance ratio appearing in (30) does not have an $F$-type distribution because the SSE and SSA are not independent, as required for $F$-type distributions. On the other hand, the distribution of the ratio SSE/SSA is independent of the mean and variance of the normal distribution and hence depends only on the sample size $N$. Therefore, the sampling distribution of $S'$ can be estimated by Monte Carlo methods as follows: First, $N$ independent random numbers are drawn from the distribution (27), representing verifications, and an additional $N$ independent random numbers are drawn from the same distribution, representing forecasts consistent with the null hypothesis. The sum square difference between these two sets of random variables defines SSE, while the sum square difference between the first sample and its sample mean defines SSA. This procedure is repeated for 10 000 trials, after which the resulting variance ratios are sorted in ascending order and the 100th element is selected, corresponding to the 1% significance level.
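The Monte Carlo procedure just described can be sketched as follows. Because the ratio SSE/SSA is invariant to $\mu$ and $\sigma$, standard normal draws suffice; the sample size `N = 50` and the seed are illustrative choices of ours.

```python
# Monte Carlo estimate of the null distribution of S' = 1 - SSE/SSA.
import numpy as np

def mahalanobis_crit(N, n_trials=10_000, alpha=0.01, seed=None):
    """Critical value of S' at level alpha under the no-skill null hypothesis."""
    rng = np.random.default_rng(seed)
    verif = rng.standard_normal((n_trials, N))  # verifications, distribution (27)
    fcst = rng.standard_normal((n_trials, N))   # null forecasts: same distribution
    sse = np.sum((verif - fcst) ** 2, axis=1)
    ssa = np.sum((verif - verif.mean(axis=1, keepdims=True)) ** 2, axis=1)
    ratios = np.sort(sse / ssa)                 # ascending variance ratios
    idx = int(alpha * n_trials) - 1             # e.g., the 100th smallest of 10 000
    return 1.0 - ratios[idx]

crit = mahalanobis_crit(N=50, seed=3)
print(crit)  # components with S' above this value are significant at the 1% level
```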

The sample size $N$ used in the significance test procedure is not simply the number of time steps in the assessment sample, because the time series are correlated. We address this issue by using the *effective* number of independent samples for $N$. To estimate the effective number of independent samples, we invoke the fact that the variance of the sample autocorrelation of a moving average model of any order $q$ is, for lags beyond the order,

$$\mathrm{var}\left(\hat{\rho}_s\right) \approx \frac{1}{N}\left(1 + 2\sum_{\tau=1}^{q} \rho_\tau^2\right), \qquad (33)$$

where $\hat{\rho}_s$ is the sample autocorrelation at lag $s$, $\rho_\tau$ is the autocorrelation function of the moving average model, and $N$ is the number of samples used to estimate the sample autocorrelation (Brockwell and Davis 1991, p. 223). Note the similarity between (33) and the time scale $T_2$ in (20). This similarity suggests that a reasonable value for the effective number of independent samples is $N/T_2$. However, $T_2$ is not readily available from the results of the analysis, whereas the APT value $S$ is. Therefore, for convenience, we estimate the effective number of degrees of freedom as

$$N_{\mathrm{eff}} = \frac{N}{1 + S}. \qquad (34)$$

Inequality (21) implies that the effective number of degrees of freedom defined in (34) is always smaller than would be estimated from $N/T_2$. It follows that this choice leads to a more conservative significance test, in the sense that it tends to accept the null hypothesis of no predictability when it should reject it.
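The conservatism of rule (34) can be checked with exact formulas. The sketch below uses an illustrative damped-rotation component of our own (autocorrelation $r^\tau\cos\theta\tau$, squared multiple correlation $r^{2\tau}$), for which $S \ge T_2 - 1$ holds with strict inequality.

```python
# Sketch of the effective-sample-size rule (34): N / (1 + S) versus N / T_2.
import numpy as np

r, theta, N = 0.95, 0.4, 10_000
taus = np.arange(1, 2000)

R2 = r ** (2 * taus)                             # squared multiple correlation
rho2 = (r ** taus * np.cos(theta * taus)) ** 2   # squared autocorrelation

S = 2 * R2.sum()            # APT of the component
T2 = 1 + 2 * rho2.sum()     # persistence time, Eq. (20)

n_eff_apt = N / (1 + S)     # Eq. (34)
n_eff_t2 = N / T2           # the alternative based on T_2

print(n_eff_apt, n_eff_t2)  # the APT-based estimate is smaller, hence conservative
```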

## 6. Example: 1000-hPa zonal velocity

A natural observational dataset for diagnosing predictability is the National Centers for Environmental Prediction National Center for Atmospheric Research (NCEP–NCAR) reanalysis set (Kalnay et al. 1996), which provides more than 50 yr of continuous, global data at 6-hourly time intervals. Because APT is invariant to linear transformations, variables with different units or natural variability can be included in the state vector without normalization. Nevertheless, we found 1000-hPa zonal velocity alone to be sufficient for our purpose because this variable contains signals from ENSO, the Madden–Julian oscillation (MJO), and weather. We refer to this variable as U1000.

Before computing principal components, we subtract the time mean and the first two harmonics of the annual and diurnal periods from each grid point. In addition, the time series at each grid point is standardized by dividing by its standard deviation. This standardization is necessary because U1000 variance in the tropics is small relative to midlatitudes, so truncating principal components with weak variances would effectively remove tropical variability. Also, each grid point is multiplied by the square root of the cosine of latitude to conform to an area-weighted measure.

The U1000 field was sampled every 6 h during the 50-yr period 1 January 1956 to 31 December 2005, which corresponds to 73 052 time steps. Furthermore, the U1000 field is represented on a 2.5° × 2.5° grid, corresponding to a 10 368-dimensional state vector. These dimensions are too large to permit direct numerical computation of the principal components. Accordingly, the principal components were computed only for data sampled every 3 days and 6 h, corresponding to 5619 time steps; this sampling interval was chosen to produce a subset as large as numerically feasible while also avoiding sampling the same time of day. After computing principal components, the spatial part of each component—the empirical orthogonal function—was projected onto the original data to compute all 73 052 time steps for each component. We verified that the principal component values of the full set and subsampled set coincided at the appropriate times. Furthermore, the amount of variance computed from the full data was approximately the same as the amount of variance computed from the subset data, indicating that the subsampling method introduced little bias.
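The subsampling strategy can be sketched in miniature. The example below is ours, with sizes far smaller than the reanalysis case: EOFs are computed from a temporal subsample of the anomaly data matrix, and the full-resolution PC time series is then recovered by projecting the complete data onto those EOFs; the consistency check described above (PC values coinciding at the subsampled times) is reproduced at the end.

```python
# Sketch: EOFs from a temporal subsample, PCs at full resolution by projection.
import numpy as np

rng = np.random.default_rng(4)
n_time, n_grid, stride = 1200, 40, 13   # stride plays the role of "3 days and 6 h"

X = rng.standard_normal((n_grid, n_time))  # anomaly data matrix (space x time)
X -= X.mean(axis=1, keepdims=True)

subset = X[:, ::stride]                    # subsampled snapshots

# EOFs of the subset: left singular vectors of the subsampled data matrix.
eofs, _, _ = np.linalg.svd(subset, full_matrices=False)

pcs_full = eofs.T @ X                      # PC time series at all time steps
pcs_sub = eofs.T @ subset                  # PC time series of the subset

# The two PC sets coincide at the subsampled times, as required.
print(np.allclose(pcs_full[:, ::stride], pcs_sub))
```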

As discussed in the previous section, we use the first 50 PCs to represent the state space. Predictable components and regression models were derived using PCs from the earlier 25-yr period 1956–80; the PCs in the later 25-yr period 1981–2005 were reserved for assessment. In computing these quantities, time-lagged covariances were derived for each lead time starting at 6 h and increasing in increments of 6 h until 180 days was reached (results are not sensitive to the choice of upper limit as long as the limit exceeds 15 days).

The total predictability of the regression models, as measured by the Mahalanobis signal (1), is shown in Fig. 1. The Mahalanobis signal also measures forecast skill because it measures the ratio of prediction error to climatological variance. We emphasize that these are legitimate forecasts: the regression models were derived from the training period 1956–80 and then used without modification to generate forecasts in the independent period 1981–2005. As expected, the total predictability decreases monotonically with lead time. The traditional interpretation of the top panel in Fig. 1 is that predictability essentially disappears after about 2 weeks, consistent with the time scale often invoked for weather predictability. If this interpretation were true, then the predictability of individual components also should disappear after 2 weeks. The actual predictability of individual components is shown in the bottom panel of Fig. 1. In contrast to the top panel, several components have significant predictability well after 2 weeks. In plotting the values for individual components, we mask out results that are not significant at the 1% level (according to the procedure discussed in the previous section); thus, some curves stop abruptly near the bottom of the panel.

Recall that the squared multiple correlation coefficient gives the fraction of variance explained by the regression model for the component in question. Thus, Fig. 1 shows that at 2 weeks about 1% of the total variance can be explained by the model; however, about 30% of the variance of one component can be explained by the model.

If predictability and the regression models were computed from the same sample, then the average of the bottom curves should equal the top curve in Fig. 1. In the present case, however, this correspondence is not ensured because these items were computed from different samples. Nevertheless, the difference between the total predictability and the average of individual components is always less than 0.01, indicating acceptable consistency of results derived from separate samples.

The training sample APTs of the first 30 components are shown in Fig. 2. We see that the first two or three components are separated from the others. The leading component, however, fails to be sustained in the assessment sample, as will be discussed in more detail in section 6b. Only the first five components have APTs exceeding 2 weeks. The fraction of variance explained by the individual components is shown in the bottom panel of Fig. 2. We see that predictable components generally explain about 1% of the variance.

### a. Explained variance

The fact that regression models explain about 1% of the total variance at 2 weeks and that individual components themselves explain about 1% of the variance might lead one to conclude that the predictability beyond 2 weeks is too small to be physically significant. However, one should recognize that the predictable components are computed for 6-hourly data and represented on a global domain. Any predictability that exists after 2 weeks, such as that due to climate change, ENSO, or intraseasonal oscillations, is expected to explain relatively little variance on 6-h time scales. Thus, it is presumptuous to dismiss the predictable components derived beyond 2 weeks simply because they explain relatively little variance on 6-hourly time scales. The question is whether the predictability identified here after 2 weeks relates to the predictability believed to occur on long time scales. The usual method for identifying predictability on longer time scales is to filter out short time-scale weather fluctuations by time averaging and then attempt to identify predictability in the time average states. We will quantify this predictability as described next.

Let $\mathbf{v}$ be the time series of a predictable component obtained by the projection

$$\mathbf{v}^{\mathrm{T}} = \mathbf{q}^{\mathrm{T}}\mathsf{X}, \qquad (35)$$

where $\mathbf{q}$ is the projection vector for the predictable component and $\mathsf{X}$ is a data matrix organized such that the $n$th column vector specifies the state at the $n$th time step. The time mean of $\mathsf{X}$ has been removed, and $\mathbf{q}$ is normalized such that $\mathbf{v}^{\mathrm{T}}\mathbf{v} = 1$. To compute the variance explained by the time series $\mathbf{v}$, we consider a least squares problem in which we seek a vector $\mathbf{p}$ such that $\mathbf{p}\,\mathbf{v}^{\mathrm{T}}$ comes as close to $\mathsf{X}$ as possible, in the sense that it minimizes

$$\left\| \mathsf{X} - \mathbf{p}\,\mathbf{v}^{\mathrm{T}} \right\|_F^2. \qquad (36)$$

The above distance measure is the Frobenius norm. The vector $\mathbf{p}$ that minimizes (36) is

$$\mathbf{p} = \mathsf{X}\,\mathbf{v}, \qquad (37)$$

which can be interpreted as the projection of the time series $\mathbf{v}$ onto $\mathsf{X}$. The variance explained by the time series $\mathbf{v}$ is defined as

$$\frac{\mathbf{v}^{\mathrm{T}}\mathsf{X}^{\mathrm{T}}\mathsf{X}\,\mathbf{v}}{N}. \qquad (38)$$

A similar reasoning shows that the variance explained at point $i$ is the $i$th diagonal element of $\mathsf{X}\mathbf{v}\mathbf{v}^{\mathrm{T}}\mathsf{X}^{\mathrm{T}}/N$. To compute relative variances, these two variances should be normalized by different factors. Specifically, for a given time series $\mathbf{v}$, the fraction of global variance is $\mathbf{v}^{\mathrm{T}}\mathsf{X}^{\mathrm{T}}\mathsf{X}\mathbf{v}/\mathrm{tr}[\mathsf{X}\mathsf{X}^{\mathrm{T}}]$, while the fraction of variance explained at point $i$ is $(\mathsf{X}\mathbf{v}\mathbf{v}^{\mathrm{T}}\mathsf{X}^{\mathrm{T}})_{ii}/(\mathsf{X}\mathsf{X}^{\mathrm{T}})_{ii}$. Potentially, a time series may explain very little global variance but a large fraction of local variance.
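These formulas can be verified numerically. The sketch below uses random illustrative data of our own: it checks that $\mathbf{p} = \mathsf{X}\mathbf{v}$ minimizes (36) against nearby perturbations, and that the variance explained globally equals the sum of the variances explained at each point.

```python
# Numerical check of the least squares solution and explained-variance formulas.
import numpy as np

rng = np.random.default_rng(5)
n_grid, N = 8, 200
X = rng.standard_normal((n_grid, N))
X -= X.mean(axis=1, keepdims=True)   # remove the time mean

v = rng.standard_normal(N)
v /= np.linalg.norm(v)               # v^T v = 1

p = X @ v                            # minimizer of ||X - p v^T||_F^2

# p = X v beats nearby perturbed vectors in the Frobenius-norm sense.
cost = np.linalg.norm(X - np.outer(p, v)) ** 2
for _ in range(20):
    p_alt = p + 0.01 * rng.standard_normal(n_grid)
    assert np.linalg.norm(X - np.outer(p_alt, v)) ** 2 >= cost

explained = v @ X.T @ X @ v / N                 # global variance explained
local = np.diag(X @ np.outer(v, v) @ X.T) / N   # variance explained at each point

print(np.isclose(explained, local.sum()))
```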

_{ii}*w*. Let

**v**

*be a time series in which each consecutive interval of length*

_{w}*w*is replaced by the mean of

**v**in that interval. This time series can be written asand where

**1**is a column vector of ones of dimension

*w*. Similarly, let 𝗫

*be the data matrix in which each consecutive set of*

_{w}*w*columns are replaced by the mean of the column vectors in that section. This time-averaged data matrix can be written asIt follows that the variance explained by

**v**

*is*

_{w}**v**

_{w}^{T}𝗫

_{w}^{T}𝗫

_{w}**v**

*/*

_{w}*N*. The latter expression can be interpreted as simply projecting the time-filtered time series

**v**

*onto the time-filtered data matrix 𝗫*

_{w}*. Note, however, that 𝗙*

_{w}*𝗙*

_{w}*= 𝗙*

_{w}*; that is, 𝗙*

_{w}*is idempotent. It follows thatwhich implies that only the filtered time series*

_{w}**v**

*need be computed. This fact allows us to compute the explained variance without computing 𝗫*

_{w}*, which results in significant computational savings. To compute the fraction of variance explained by*

_{w}**v**

*, the diagonal elements of 𝗫*

_{w}*𝗫*

_{w}

_{w}^{T}are computed once for each

*w*and archived.
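The calculation above can be sketched numerically. The following is an illustrative example (not the authors' code) using synthetic data: it builds the block-averaging operator $\mathsf{F}_w$, verifies its idempotency, and confirms that the explained variance can be computed from the filtered time series alone, without forming $\mathsf{X}_w$. All array sizes are arbitrary choices for the demonstration.

```python
# Sketch of the explained-variance calculation for time averages, assuming a
# synthetic data matrix X (points x times) and a unit-variance component
# time series v with v^T v = 1, as in the text.
import numpy as np

rng = np.random.default_rng(0)
npts, N, w = 5, 120, 30                   # grid points, time steps, window

X = rng.standard_normal((npts, N))
X -= X.mean(axis=1, keepdims=True)        # remove the time mean of X

v = rng.standard_normal(N)
v /= np.linalg.norm(v)                    # normalize so that v^T v = 1

# Block-averaging operator F_w: each consecutive interval of length w is
# replaced by its mean. F_w is symmetric and idempotent.
F = np.kron(np.eye(N // w), np.ones((w, w)) / w)
assert np.allclose(F @ F, F)              # idempotency, as in the text

v_w = F @ v
X_w = X @ F

# Variance explained by v_w, computed two ways:
direct = v_w @ X_w.T @ X_w @ v_w / N
shortcut = np.sum((X @ v_w) ** 2) / N     # X_w v_w = X v_w, so X_w not needed
assert np.allclose(direct, shortcut)

# Fraction of global variance, and fraction of local variance at each point:
frac_global = (v @ X.T @ X @ v) / np.trace(X @ X.T)
frac_local = (X @ v) ** 2 / np.sum(X ** 2, axis=1)
```

Both fractions lie between 0 and 1, and a component can indeed score low on the global fraction while scoring high at individual points, as noted above.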

The fraction of global variance explained by individual components as a function of averaging window *w* is shown in Fig. 3. We see that although the leading predictable components explain only about 1% of the variance on 1-day time scales, they explain several times more variance at 90-day time scales. Note also that components 4–9 explain more variance at 2-week time scales than the leading components. The spatial distribution of the fraction of variance explained by the sum of variances due to components 2–7 is shown in Fig. 4. Summing these variances is appropriate because the components are quasi-independent—the absolute correlation between all pairwise 90-day mean time series for these components is at most 0.26, with the vast majority being less than 0.1. We see that the leading predictable components explain as much as 70% of the variance in the central equatorial Pacific, and 50% of the southern midlatitude variance in the Indian and Pacific Oceans. These results lead us to conclude that the leading predictable components explain a significant amount of variance on seasonal time scales and in certain geographic locations. These results also reveal the error of dismissing components simply because they explain small amounts of variance on short time scales or on global space scales. To be clear, we emphasize that time averaging was not used in any way in our technique; it was used only to compare the results of the analysis to observations.

The following sections examine individual predictable components in closer detail.

### b. Climate predictability

The leading predictable component is shown in Fig. 5. Although this component has large APT in the training period 1956–80, it has relatively little APT in the assessment period 1981–2005. A plausible explanation for this discrepancy is that the time series has significant low-frequency oscillations and trends in the first 20 years but not in the second half of the data. Low-frequency oscillations often produce significant sample autocorrelations at long lags, which in turn produce a large APT. This tendency is consistent with the goal of identifying predictable components because trends and other low-frequency fluctuations are highly predictable. The fact that the secular trend is significant prior to 1980 and much smaller after 1980 explains why predictability identified in the training period fails to verify in the assessment period. Indeed, because the trend is substantially reduced in the assessment period, the Mahalanobis signal evaluated in the assessment period is negative. The negative value arises because the linear regression model for this component attempts to extrapolate the trend on long time scales and hence overpredicts the amplitude of the component in the second half of the period when the trend relaxes. Therefore, the forecast error variance tends to exceed the climatological variance, leading to negative Mahalanobis signals. Because the predictability of this component is not consistent between the training and assessment samples, we omit this component in the computation of the APT shown in the top panel of Fig. 1. Whether this inconsistency arises from sampling errors or a physical difference is difficult to determine because the variability in question was temporary.

Because trends are commonly associated with climate predictability, we claim that the first component pertains to climate predictability. A remarkable fact about this conclusion is that the amplitude of the pattern is given every 6 h. In other words, climate predictability has been identified without time averaging.

### c. Seasonal predictability

The second leading component is shown in Fig. 6. For comparison purposes, we have plotted the value of the Niño-3 index as the thick red line in the bottom panel. We see that the two time series are strongly correlated with each other. The strong similarity between the two time series leads us to conclude that the second predictable component captures predictability due to ENSO. Consistent with this conclusion, the amplitude of the pattern is concentrated in the Pacific Ocean. It is noteworthy that the component tracks the ENSO signal after 1981 even though the component was computed using data only from the period prior to 1981.

### d. Mixed time-scale predictability

The third leading component is shown in Fig. 7. The time series shows variability primarily on decadal and shorter time scales in the training sample. In addition, a long-term trend is apparent in the assessment sample. The component has pronounced zonal structure in the Southern Hemisphere (SH). Annually averaged SH surface westerlies have a maximum extending from about 35° to 60°S, where U1000 reaches 10 m s^{−1}. Therefore, the structure of the third component in this region and the negative sign of the trend are consistent with a strengthening and poleward shift of the surface westerlies. Such trends in SH circulation have been interpreted as trends in the southern annular mode or Antarctic Oscillation (AAO) (Thompson and Solomon 2002); the structure of the third component in SH is consistent with the low phase of the AAO. The structure of the third component in the Northern Hemisphere (NH) is less zonally symmetric, with centers of action in the northern Atlantic and northern Pacific that are reminiscent of the Arctic Oscillation (AO). The sign of the pattern is consistent with the low phase of the AO, which is associated with a southern shift of westerlies in the north Atlantic.

To establish the connection between the annular modes and the third component, we compute monthly AAO and AO indices following the Climate Prediction Center (CPC) definition (http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/teleconnections.shtml) of the AAO (AO) index as the first area-weighted PC of the monthly 700- (1000) mb geopotential height anomalies over the region 20°–90°S (20°–90°N). The period 1956–2006 is used for both PCs although CPC uses the period from 1979 for the AAO because of the lack of observational data in the SH. The correlation between the AAO and AO PCs is 0.08, and adding them together gives a global “annular mode index.” This index, as well as the regression of this index with monthly 1000-mb zonal flow anomalies, is shown in Fig. 8. The regression pattern closely resembles the spatial pattern of the third component.
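The index construction described above can be sketched as follows. This is a hedged illustration of the general recipe (leading area-weighted PC of monthly height anomalies, then regression onto U1000 anomalies), not the authors' processing chain; the grid dimensions and the synthetic fields are placeholders for real reanalysis data.

```python
# Illustrative sketch: CPC-style annular-mode index as the leading
# area-weighted PC of monthly height anomalies, regressed onto U1000.
# All data here are synthetic stand-ins; sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
nlat, nlon, nmon = 18, 36, 600            # hypothetical 20-90N grid, 50 yr
lat = np.linspace(20.0, 90.0, nlat)

z_anom = rng.standard_normal((nmon, nlat, nlon))   # monthly height anomalies

# Area weighting: scale each latitude row by sqrt(cos(lat)) so that the
# covariance matrix implied by the SVD is area weighted.
wgt = np.sqrt(np.cos(np.deg2rad(lat)))[:, None]
zw = (z_anom * wgt).reshape(nmon, -1)

# Leading principal component via SVD of the weighted anomaly matrix.
u, s, vt = np.linalg.svd(zw, full_matrices=False)
pc1 = u[:, 0] * s[0]
pc1 /= pc1.std()                          # standardize the index

# Regressing the index onto a (synthetic) U1000 anomaly field gives the
# spatial pattern associated with one standard deviation of the index.
u1000 = rng.standard_normal((nmon, nlat, nlon))
pattern = np.tensordot(pc1, u1000, axes=(0, 0)) / nmon
```

With real data, an AO and an AAO index computed this way for the two hemispheres would simply be added to form the combined "annular mode index" used in Fig. 8.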

### e. Intraseasonal predictability

The fourth leading component is shown in Fig. 9. The variability on seasonal and decadal time scales does not stand out as strongly as in the previous components, consistent with the fact that its multiple correlation approaches zero at long lags. The structure of this component is concentrated in the Pacific Ocean. The localization in the Pacific and the statistically significant autocorrelations up to 15 days suggest that this component is associated with intraseasonal predictability. Perhaps the most well-known phenomenon on this time scale is the Madden–Julian oscillation (MJO). It is of interest to determine whether the next few components capture this variability. To test this possibility, we first examined the cross correlations between the first 10 components and found that modes 4–7 had statistically significant correlations at 10-day lags, suggesting that these components act in quadrature, as would be expected for a propagating disturbance such as the MJO. Therefore, we reconstructed the U1000 field using the leading seven predictable components. The average of this field between 15°S and 15°N over the period June 1996–June 1997 is plotted in Fig. 10. For comparison purposes, we also have plotted the 200-hPa zonal velocity as reconstructed using the real-time multivariate MJO (RMM) technique of Wheeler and Hendon (2004). In the RMM method, the MJO is described by the first two multivariate EOFs of daily near-equatorially averaged 850-hPa zonal wind, 200-hPa zonal wind, and satellite-observed outgoing longwave radiation (OLR) data with the annual cycle (three harmonics) and ENSO variability removed; unlike Wheeler and Hendon (2004), we do not remove the mean of the previous 120 days. We see that the structures of the two fields compare fairly well, although the amplitude of the predictable reconstruction is smaller and reversed in sign.
These differences are plausible consequences of using U1000: lower-level winds are weaker than upper-level winds, and the MJO is a baroclinic phenomenon in which low-level variability tends to be negatively correlated with upper-level variability (Madden and Julian 1994).
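A field reconstruction of the kind used above follows directly from the least squares result $\mathbf{p} = \mathsf{X}\mathbf{v}$: with $K$ components the rank-$K$ reconstruction is $\sum_k (\mathsf{X}\mathbf{v}_k)\mathbf{v}_k^{T}$. The sketch below illustrates this with synthetic data and an orthonormal stand-in for the component time series; it is not the authors' code.

```python
# Illustrative rank-K reconstruction from leading components, as used for
# the U1000/MJO comparison. X and the component time series are synthetic;
# K = 7 follows the text.
import numpy as np

rng = np.random.default_rng(2)
npts, N, K = 40, 365, 7

X = rng.standard_normal((npts, N))
X -= X.mean(axis=1, keepdims=True)

# Stand-in for the predictable-component time series: an orthonormal set
# of N-vectors (the true component time series are uncorrelated in time).
V, _ = np.linalg.qr(rng.standard_normal((N, K)))

P = X @ V                    # spatial patterns p_k = X v_k
X_rec = P @ V.T              # rank-K reconstruction of the field

# An equatorial Hovmoeller-style section would then be the average of
# X_rec over the rows corresponding to 15S-15N grid points.
```

Because `X_rec` is an orthogonal projection of `X` onto the span of the component time series, its norm never exceeds that of the original field.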

## 7. Summary and discussion

This paper proposes a new method for diagnosing predictability on multiple time scales. The method is to find projection vectors that maximize the average predictability time (APT), as measured by the integral of predictability. If predictability is measured by the Mahalanobis signal, then this optimization problem can be reduced to a standard eigenvalue problem. If the prediction model is based on linear regression, then the general decomposition is closely related to canonical correlation analysis, except that instead of maximizing the correlation at a single time lag, the method maximizes the multiple correlation over many time lags. The solution to the optimization problem leads to a complete set of components that are uncorrelated in time and can be ordered by their contribution to the APT, analogous to the way in which principal component analysis decomposes variance. Specifically, the first component maximizes APT, the second component maximizes APT subject to being uncorrelated with the first, and so on. The state at any time can be represented as a linear combination of predictable components. Furthermore, the variance at any point can be expressed in terms of the variance of the individual components because the corresponding time series are orthogonal. Importantly, the results are invariant to nonsingular linear transformations and hence allow variables with different units and natural variances to be mixed in the same state vector.
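A minimal numerical sketch of this decomposition is given below, under the assumption (from Part I) that for linear regression forecasts the signal covariance at lag $\tau$ is $\mathbf{C}_\tau \mathbf{C}_0^{-1}\mathbf{C}_\tau^{T}$, so that maximizing APT reduces to the generalized eigenvalue problem $2\sum_\tau \mathbf{C}_\tau \mathbf{C}_0^{-1}\mathbf{C}_\tau^{T}\mathbf{q} = \lambda\,\mathbf{C}_0\mathbf{q}$. The synthetic AR(1) data and the lag truncation are illustrative choices, not the paper's settings.

```python
# Hedged sketch of the APT decomposition for regression forecasts.
# Synthetic multivariate AR(1) data stand in for the PC time series.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
N, dim, maxlag = 4000, 6, 40

A = 0.9 * np.eye(dim)                     # AR(1) propagator (illustrative)
x = np.zeros((N, dim))
for t in range(1, N):
    x[t] = x[t - 1] @ A.T + rng.standard_normal(dim)
x -= x.mean(axis=0)

def lag_cov(x, tau):
    """Sample lag-tau covariance matrix C_tau."""
    return x[tau:].T @ x[:len(x) - tau] / (len(x) - tau)

C0 = lag_cov(x, 0)
C0inv = np.linalg.inv(C0)
G = 2.0 * sum(lag_cov(x, tau) @ C0inv @ lag_cov(x, tau).T
              for tau in range(1, maxlag + 1))

# Generalized symmetric eigenproblem: eigenvalues are the APT values,
# eigenvectors the projection vectors q, reordered largest first.
apt, Q = eigh(G, C0)
apt, Q = apt[::-1], Q[:, ::-1]
```

The resulting components are uncorrelated at lag zero and ordered by their contribution to APT, mirroring the decomposition described in the text.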

In practice, the decomposition method requires the time dimension to exceed the state dimension. This requirement can be satisfied by projecting the data onto the leading principal components. A simple truncated sum of predictability values leads to an inconsistent measure of predictability time, just as a Fourier transform of a covariance function leads to an inconsistent estimate of the power spectrum. Drawing on spectral estimation theory, we suggest that the lag windows used there can be applied to the estimation of APT. A critical question is the statistical significance of the results. Unfortunately, conventional cross validation techniques are computationally prohibitive if the number of lags is large. In this study, we took advantage of the long time series by computing the components using the first half of the data and assessing them in the second half. The significance test for the resulting APT is not standard but can be straightforwardly computed by Monte Carlo methods. This test accounts for the fact that the effective number of degrees of freedom depends on the APT of the component.
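A Monte Carlo test of the kind described can be sketched as follows. This is an illustrative scalar version, not the paper's test: the truncated-sum APT estimate of a single component (twice the sum of squared sample autocorrelations) is compared with a null distribution generated from white-noise series of the same length.

```python
# Hedged sketch of a Monte Carlo significance test for APT.
# Estimator and null model are illustrative simplifications.
import numpy as np

rng = np.random.default_rng(4)
N, maxlag, ntrials = 2000, 60, 500

def sample_apt(v, maxlag):
    """Truncated-sum APT estimate for a scalar series: 2 * sum rho_tau^2."""
    v = v - v.mean()
    c0 = v @ v / len(v)
    rho = np.array([v[tau:] @ v[:len(v) - tau] / (len(v) * c0)
                    for tau in range(1, maxlag + 1)])
    return 2.0 * np.sum(rho ** 2)

# Null distribution from white noise (no predictability beyond lag 0);
# the 95th percentile serves as the significance threshold.
null = np.array([sample_apt(rng.standard_normal(N), maxlag)
                 for _ in range(ntrials)])
threshold = np.quantile(null, 0.95)

# A persistent AR(1) series should exceed the white-noise threshold.
v = np.zeros(N)
for t in range(1, N):
    v[t] = 0.8 * v[t - 1] + rng.standard_normal()
print(sample_apt(v, maxlag), threshold)
```

A fuller test would generate surrogates whose serial correlation matches the component under test, which is how the dependence of the effective degrees of freedom on APT enters.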

We applied the decomposition method to 1000-hPa zonal velocity (U1000). The data were sampled every 6 h for a total of 50 yr, giving 73 052 time steps, and projected onto the leading 50 principal components. The forecast model for this study was a linear regression model, which was estimated for each lag from 6 h to 180 days using only the first 25 yr of data. The resulting models then were used to make forecasts in the second 25-yr dataset. The predictability of this model, as measured by the Mahalanobis signal, decreased monotonically with lead time, reaching about 1% after 2 weeks. However, at least four components, as derived from the first half of the data, still were predictable in the second half after 2 weeks in a statistically significant sense. On the other hand, none of the components explained more than 1.5% of the variance of the data. Although this level of explained variance may seem small, this fact is misleading because the variance is compared to the variability on 6-hourly time scales and on global space scales. For instance, the leading components were shown to explain a large fraction of variance (e.g., > 70%) on 90-day time scales in certain geographic locations. Thus, the leading components can be identified with seasonally predictable structures. The first and third predictable components have trends and thus were identified with structures associated with climate predictability. Other leading predictable components were predictable on 2-week time scales and identified with structures associated with intraseasonal predictability. Indeed, a reconstruction of equatorial U1000 revealed eastward-propagating structures reminiscent of the MJO.

Importantly, the predictability on climate, seasonal, intraseasonal, and weather time scales found here was detected in 6-hourly data without time averaging. Furthermore, this predictability was detected in a single framework that allowed the separate structures to be isolated.

## Acknowledgments

We thank two anonymous reviewers for comments that led to an improved manuscript. We thank Edmund Chang for useful comments on annular modes. This research was supported by the National Science Foundation (ATM0332910), the National Aeronautics and Space Administration (NNG04GG46G), and the National Oceanic and Atmospheric Administration (NA04OAR4310034 and NA05OAR4311004). The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies.

## REFERENCES

Barnston, A. G., and T. M. Smith, 1996: Specification and prediction of global surface temperature and precipitation from global SST using CCA. *J. Climate*, **9**, 2660–2697.

Brockwell, P. J., and R. A. Davis, 1991: *Time Series: Theory and Methods*. 2nd ed. Springer-Verlag, 577 pp.

DelSole, T., 2001: Optimally persistent patterns in time-varying fields. *J. Atmos. Sci.*, **58**, 1341–1356.

DelSole, T., and M. K. Tippett, 2007: Predictability: Recent insights from information theory. *Rev. Geophys.*, RG4002, doi:10.1029/2006RG000202.

DelSole, T., and M. K. Tippett, 2009: Average predictability time. Part I: Theory. *J. Atmos. Sci.*, **66**, 1172–1187.

Hegerl, G. C., and Coauthors, 2007: Understanding and attributing climate change. *Climate Change 2007: The Physical Science Basis*, S. Solomon et al., Eds., Cambridge University Press, 663–745.

Jenkins, G. M., and D. G. Watts, 1968: *Spectral Analysis and Its Applications*. Holden-Day, 525 pp.

Johnson, R. A., and D. W. Wichern, 1982: *Applied Multivariate Statistical Analysis*. Prentice-Hall, 594 pp.

Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471.

Lütkepohl, H., 2005: *New Introduction to Multiple Time Series Analysis*. Springer, 764 pp.

Madden, R. A., and P. R. Julian, 1994: Observations of the 40–50-day tropical oscillation—A review. *Mon. Wea. Rev.*, **122**, 814–837.

Noble, B., and J. W. Daniel, 1988: *Applied Linear Algebra*. 3rd ed. Prentice-Hall, 521 pp.

Schneider, T., and S. Griffies, 1999: A conceptual framework for predictability studies. *J. Climate*, **12**, 3133–3155.

Shukla, J., 1998: Predictability in the midst of chaos: A scientific basis for climate forecasting. *Science*, **282**, 728–731.

Shukla, J., and J. L. Kinter III, 2006: Predictability of seasonal climate variations: A pedagogical view. *Predictability of Weather and Climate*, T. N. Palmer and R. Hagedorn, Eds., Cambridge University Press, 306–341.

Simmons, A. J., and A. Hollingsworth, 2002: Some aspects of the improvement in skill of numerical weather prediction. *Quart. J. Roy. Meteor. Soc.*, **128**, 647–677.

Thompson, D. W. J., and S. Solomon, 2002: Interpretation of recent Southern Hemisphere climate change. *Science*, **296**, 895–899.

Wheeler, M. C., and H. H. Hendon, 2004: An all-season real-time multivariate MJO index: Development of an index for monitoring and prediction. *Mon. Wea. Rev.*, **132**, 1917–1932.