## 1. Introduction

This paper proposes a new measure of the overall predictability of a system, independent of lead time. Such a measure allows one system to be characterized as “more predictable” than another. One motivation for this new measure is that traditional measures of the overall predictability of a system often have no obvious generalization to multivariate systems. For instance, the limit of predictability, as defined by Lorenz (1969), is the time beyond which the mean square error exceeds a predefined threshold. However, mean square error depends on the coordinate system used to represent the state and is problematic if different variables with different units and natural variances are mixed. Alternatively, an integral time scale, defined as the integral of some moment of the autocorrelation function with respect to lag, is used often in turbulence studies. However, the autocorrelation function is even more restrictive than mean square error in that it is defined only for a single time series. The predictability time scale of a process is sometimes suggested to be related to the peak of its power spectrum. As an example, ENSO has a spectral peak around 4 yr (Julian and Chervin 1978), which under this assumption implies predictability for about 4 yr. However, real-time forecasts of ENSO demonstrate little skill beyond 1 yr, suggesting an inconsistency. Moreover, even if a reasonable relation between predictability and power spectra could be ascertained for the univariate case, the generalization to multivariate systems would not necessarily be straightforward.

Our proposal for overcoming these limitations has two key elements. The first is adoption of a measure of predictability that is applicable to multivariate systems and invariant to linear transformation of the variables—measures satisfying these properties have been proposed by Leung and North (1990), Schneider and Griffies (1999), and Kleeman (2002) (see DelSole and Tippett 2007 for a review)—and the second is the integration of this measure of predictability with respect to all lead times. The resulting measure has several attractive properties. First, the measure is consistent, in the sense that applying it to the same system represented using a different basis set gives the same result. Second, if the predictability decay follows the same form in different systems, then systems that are predictable at longer lead times will have a larger measure. We restrict our attention to predictability averaged over all initial conditions and call the resulting integral the average predictability time (APT). The APT may not be interpretable as a time scale if the decay of predictability is pathological, but nonetheless the integral still gives useful information about overall predictability.

When only predictability averaged over initial conditions is considered, two natural measures satisfying the above properties are mutual information and the Mahalanobis metric. Unfortunately, mutual information depends on the complete forecast and climatological distributions and is therefore difficult to estimate from finite time series. Even in the case of normally distributed variables, mutual information is difficult to integrate with respect to lead time. In contrast, the Mahalanobis metric depends only on second-order moments and is easier to integrate. Furthermore, it turns out that choosing the Mahalanobis metric as a basis for APT has several other attractive properties. Specifically, the resulting APT can be expressed in terms of power spectra, thus clarifying the connection between predictability and power spectra. This point is demonstrated in the present paper. In addition, the resulting APT can be decomposed into independent components that optimize it, analogous to the way that principal component analysis decomposes variance. This decomposition clarifies the usefulness of APT even if the system is characterized by a wide range of time scales—the decomposition separates different components according to their APT, allowing the full spectrum of APTs to be diagnosed. This decomposition will be derived and illustrated with meteorological data in DelSole and Tippett (2009, hereafter Part II).

The present paper may be summarized briefly as follows: In section 2, we review the measure on which the APT will be based, namely the Mahalanobis metric. We then show in section 3 that the resulting APT gives sensible results for one-dimensional stochastic models; that is, the APT is inversely proportional to the damping rate and comparable to traditional integral time scales in appropriate cases. On the other hand, we show that the APT gives clearly superior results compared to a widely used integral time scale. Next we discuss the relation between predictability and power spectra. Specifically, we show that the APT equals the integral of the squared modulus of the power spectrum and explain how APT depends on the shape of the power spectrum. In section 6 we evaluate the APT for autonomous, linear stochastic models excited by Gaussian white noise. In addition, we show how the multivariate expressions parallel the univariate results if the variables are transformed appropriately. In section 7, we show that the APT for a set of independent, uncoupled stochastic models equals the average APT of the individual systems. In section 8 we derive the relation between APT and multivariate power spectra and show that the minimum APT occurs when the system is a set of independent, uncoupled white noise processes, consistent with intuition. Bounds on the APT of linear stochastic models are derived in section 9 and a surprising interpretation of APT is discussed in section 10. The bounds are illustrated in section 11 with a simple stochastic model. We conclude with a summary and discussion of results.

## 2. Definition of average predictability time

Consider a system whose state vector is the *K*-dimensional vector **x**. Let the forecast distribution at a fixed lead time *τ* have covariance matrix **Σ*** _{τ}*. This covariance matrix quantifies the uncertainty in the forecast as a function of lead time. If the system is stationary and the forecast is independent of the initial condition at asymptotically long lead times, then the climatological distribution can be identified with the forecast distribution in the limit of large lead time (DelSole and Tippett 2007), namely

$$\mathbf{\Sigma}_{\infty} = \lim_{\tau \to \infty} \mathbf{\Sigma}_{\tau}.$$

A measure of predictability satisfying the properties discussed in the introduction is the Mahalanobis signal,

$$S_{\tau} = 1 - \frac{1}{K} \operatorname{tr}\left(\mathbf{\Sigma}_{\tau} \mathbf{\Sigma}_{\infty}^{-1}\right), \tag{1}$$

which vanishes when $\mathbf{\Sigma}_{\tau} = \mathbf{\Sigma}_{\infty}$; that is, it vanishes for a forecast that is no better than a randomly drawn state of the system. The Mahalanobis signal is appropriate only if forecast uncertainty is well characterized by second-moment statistics. As we shall see later, the factor $1/K$ is used to ensure that a collection of independent, identical systems has the same APT as any subsystem. Schneider and Griffies (1999) note that this factor also allows the predictability of random processes with different dimensions to be compared. Given the above metric, we define APT by

$$S = 2 \int_{0}^{\infty} S_{\tau} \, d\tau. \tag{2}$$

The factor of 2 ensures that the APT equals the *e*-folding time if the autocorrelation is exponential (see section 3). For discrete time, we define APT as

$$S = 2 \sum_{\tau=1}^{\infty} S_{\tau},$$

where the sum starts at $\tau = 1$.
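To make the definitions concrete, the following short sketch (ours, not from the paper; the matrices and damping rate are arbitrary choices) computes the Mahalanobis signal for a toy forecast covariance and checks the invariance to linear transformations of the state that motivates the measure:

```python
import numpy as np

def mahalanobis_signal(sigma_tau, sigma_inf):
    """Mahalanobis signal, eq. (1): 1 - tr(Sigma_tau Sigma_inf^{-1}) / K."""
    K = sigma_inf.shape[0]
    return 1.0 - np.trace(sigma_tau @ np.linalg.inv(sigma_inf)) / K

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
sigma_inf = B @ B.T + 3.0 * np.eye(3)   # positive definite climatological covariance
sigma_tau = 0.25 * sigma_inf            # a forecast that has recovered 75% of the signal

s1 = mahalanobis_signal(sigma_tau, sigma_inf)   # = 0.75

# Invariance: transforming x -> T x maps covariances to T Sigma T^T,
# and the signal is unchanged.
T = rng.standard_normal((3, 3))
s2 = mahalanobis_signal(T @ sigma_tau @ T.T, T @ sigma_inf @ T.T)

# Discrete APT for a signal decaying as exp(2 a tau) with a = -0.5.
apt = 2.0 * sum(np.exp(2 * -0.5 * t) for t in range(1, 200))
```

The invariance holds because the trace of $\mathbf{\Sigma}_{\tau}\mathbf{\Sigma}_{\infty}^{-1}$ is unchanged when both covariances transform congruently.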

## 3. One-dimensional case

Consider the one-dimensional stochastic model

$$\frac{dx}{dt} = a x + w, \tag{4}$$

where $a$ is a negative number and $w$ is a Gaussian white noise process with zero mean and time-lagged covariance given by

$$\langle w(t)\, w(t') \rangle = \sigma_w^2 \, \delta(t - t'),$$

where $\delta$ denotes the Dirac delta function. The solution to (4) at time $\tau$ for a particular realization of the forcing can be obtained by elementary methods as

$$x(\tau) = x(0)\, e^{a\tau} + \int_0^{\tau} e^{a(\tau - s)}\, w(s) \, ds.$$

Consider an ensemble of solutions whose initial conditions are drawn from a normal distribution with mean $\mu_0$ and variance $\sigma_0^2$. Averaging the above solution over the ensemble gives the forecast mean $\mu_{\tau} = \mu_0 e^{a\tau}$, which decays at the rate $a$. The total variance of the ensemble is

$$\sigma_{\tau}^2 = \sigma_0^2 \, e^{2a\tau} - \frac{\sigma_w^2}{2a}\left(1 - e^{2a\tau}\right). \tag{8}$$

Consider first a perfect initial condition, $\sigma_0^2 = 0$. In this case, the only source of uncertainty is the stochastic forcing, and the variance of the forecast ensemble given by (8) is

$$\sigma_{\tau}^2 = -\frac{\sigma_w^2}{2a}\left(1 - e^{2a\tau}\right).$$

The climatological variance is obtained in the limit of large lead time, giving $\sigma_{\infty}^2 = -\sigma_w^2/(2a)$ from (8). If the initial condition is not perfect (i.e., $\sigma_0 \neq 0$), then the predictability is reduced relative to the perfect initial condition case.

For a perfect initial condition, substituting the above variances into (1) gives the Mahalanobis signal

$$S_{\tau} = e^{2a\tau},$$

which decays with increasing lead time $\tau$. The APT is found by integrating this expression with respect to lead time and multiplying by 2. The result is

$$S = 2 \int_0^{\infty} e^{2a\tau} \, d\tau = -\frac{1}{a}. \tag{10}$$

Thus, the APT is inversely proportional to the damping rate: strongly damped systems lose memory quickly and have small APT.
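The closed-form result can be checked by quadrature (our sketch; the damping rate and grid are arbitrary choices):

```python
import numpy as np

def trapezoid(y, x):
    """Simple trapezoidal quadrature."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

a = -0.4                               # damping rate; must be negative for stability
tau = np.linspace(0.0, 60.0, 200001)   # many e-folding times
signal = np.exp(2 * a * tau)           # Mahalanobis signal for a perfect initial condition
apt = 2.0 * trapezoid(signal, tau)     # APT, eq. (2); approximately -1/a = 2.5
```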

## 4. Connection between APT and autocorrelation

The APT can be related to the autocorrelation function of a stationary process. The time-lagged covariance $c_{\tau}$ is defined as

$$c_{\tau} = \langle (x_{t+\tau} - \mu)(x_t - \mu) \rangle,$$

where $\mu$ is the stationary mean. The best linear forecast of $x_{t+\tau}$ given $x_t$ is the regression forecast. The forecast error variance of this model is thus

$$\sigma_{\tau}^2 = \sigma_{\infty}^2 \left(1 - \rho_{\tau}^2\right),$$

where $\rho_{\tau} = c_{\tau}/c_0$ is the autocorrelation function and $c_0 = \sigma_{\infty}^2$, in which case the autocorrelation can be written in terms of the forecast covariance as

$$\rho_{\tau}^2 = 1 - \frac{\sigma_{\tau}^2}{\sigma_{\infty}^2}.$$

The right-hand side is precisely the Mahalanobis signal (1) in one dimension, so the APT is

$$S = 2 \int_0^{\infty} \rho_{\tau}^2 \, d\tau = \int_{-\infty}^{\infty} \rho_{\tau}^2 \, d\tau, \tag{17}$$

the integral of the *square* of the autocorrelation function. This function was proposed by DelSole (2001) as a measure of the time scale of a process; it emerges here as a measure of overall predictability. As a check, note that the time-lagged covariance for the stochastic model (4) is

$$c_{\tau} = \sigma_{\infty}^2 \, e^{a|\tau|},$$

so that $\rho_{\tau} = \exp(a|\tau|)$. Substituting this into (17) gives $S = -1/a$, consistent with (10).

Consider next the damped oscillatory autocorrelation function

$$\rho_{\tau} = e^{a|\tau|} \cos(\omega_0 \tau), \tag{19}$$

which describes a damped oscillation with frequency $\omega_0$. Substituting (19) into (17) gives

$$S = -\frac{1}{2a} - \frac{a}{2\left(a^2 + \omega_0^2\right)}.$$

The limit $\omega_0 \to 0$ recovers the one-dimensional case $S = -1/a$. However, the limit $\omega_0 \to \infty$ gives $S = -1/2a$, or half the APT of the case with no oscillations. It is instructive to compare the APT with the familiar integral time scale

$$T_1 = \int_0^{\infty} \rho_{\tau} \, d\tau. \tag{21}$$

Substituting (19) into (21) gives

$$T_1 = -\frac{a}{a^2 + \omega_0^2}. \tag{22}$$

If $\omega_0 = 0$, then the integral time scale $T_1$ coincides with the APT given in (10). This shows that the APT is consistent with the integral time scale $T_1$ for random processes with nonoscillating autocorrelation functions. However, if $\omega_0 \neq 0$, then the integral time scale is less than the APT, and in fact $T_1 \to 0$ in the limit $\omega_0 \to \infty$ while $S \to -1/2a$. Thus, the APT and integral time scale $T_1$ have dramatically different dependence on the oscillation frequency. To illustrate this fact, consider the autocorrelations for select values of $a$ shown on the right-hand side of Fig. 1. As $\omega_0$ increases, the frequency of oscillation increases, but the bounding envelope remains the same. In the case $a = -1$ and $\omega_0 = 4$, the integral time scale $T_1$ evaluated from (22) is 1/17, but strong correlations persist well beyond this value. It is evident, then, that the integral time scale defined in (21) is not appropriate for oscillatory correlation functions because the oscillations lead to cancellations in the integral. In contrast, the APT gives a more appropriate estimate of the time scale of a process.
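These closed-form values are easy to confirm numerically (our sketch; the parameters follow the $a = -1$, $\omega_0 = 4$ case above):

```python
import numpy as np

def trapezoid(y, x):
    """Simple trapezoidal quadrature."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

a, w0 = -1.0, 4.0
tau = np.linspace(0.0, 40.0, 400001)
rho = np.exp(a * tau) * np.cos(w0 * tau)   # damped oscillatory autocorrelation (19), tau >= 0

T1 = trapezoid(rho, tau)                   # integral time scale (21): -a/(a^2+w0^2) = 1/17
S = 2.0 * trapezoid(rho**2, tau)           # APT (17): -1/(2a) - a/(2(a^2+w0^2)) ~ 0.53
```

The oscillations nearly cancel in $T_1$ but not in the squared integrand of the APT, which is the point of the comparison.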

## 5. Connection between APT and power spectrum

We now derive the relation between APT and power spectra. Certain basic points about the relation between predictability and the power spectrum can be inferred by considering two extreme cases. First, white noise can be considered to be the least predictable process because its value at one time is completely independent of its value at any other time. The power spectrum of white noise is constant, or “flat.” Conversely, consider a perfect sine wave. A sine wave is perfectly predictable because the value at a finite number of times can be used to specify the value at all other times. The power spectrum of a sine wave is a delta function. These considerations suggest that strongly peaked power spectra correspond to highly predictable time series, whereas flat or near-flat power spectra correspond to weakly predictable time series.

According to the Wiener–Khinchin theorem, the power spectrum $p_{\omega}$ of a stationary process equals the Fourier transform of the time-lagged covariance $c_{\tau}$; specifically,

$$p_{\omega} = \frac{1}{2\pi} \int_{-\infty}^{\infty} c_{\tau} \, e^{-i\omega\tau} \, d\tau, \qquad c_{\tau} = \int_{-\infty}^{\infty} p_{\omega} \, e^{i\omega\tau} \, d\omega.$$

Substituting this relation into (17) and invoking Parseval's theorem gives

$$S = \int_{-\infty}^{\infty} \rho_{\tau}^2 \, d\tau = 2\pi \int_{-\infty}^{\infty} \hat{p}_{\omega}^2 \, d\omega, \tag{24}$$

where we call $\hat{p}_{\omega} = p_{\omega}/\sigma_{\infty}^2$ the whitened power spectrum. By definition, the whitened power spectrum has a unit integral. The above result shows that the APT for a linear regression model is proportional to the integral of the squared modulus of the power spectrum. To see how this relation relates to the shape of the power spectrum, consider the identity

$$\int \hat{p}_{\omega}^2 \, d\omega = \int \left(\hat{p}_{\omega} - \overline{p}\right)^2 d\omega + \frac{1}{\Omega},$$

which holds for any spectrum confined to a band of total width $\Omega$, where $\overline{p} = 1/\Omega$ is the flat spectrum with unit integral over the band. The first term on the right measures the departure of the spectrum from flatness, or its "peakiness," and the second term is the value attained by a perfectly flat band-limited spectrum. Thus, among all band-limited spectra with unit integral, the measure (24) is smallest for the flat spectrum and grows as the spectrum becomes more peaked.

It is perhaps worth mentioning that the above definition of peakiness is not the only one possible. For instance, the Burg entropy is defined as the integral of the log of the power spectrum, and it is known (Priestley 1981, p. 604) that its maximum value, for all power spectra with the same total power, is achieved when the spectrum is a constant (i.e., flat). We propose (24) as a measure of flatness because it has certain practical and theoretical advantages compared to other definitions; that is, it can be estimated from only second-order moments and can be decomposed into components ordered by their APT (the latter point will be demonstrated in Part II).
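The claim that flat spectra minimize the measure can be illustrated numerically (our sketch; the two example spectra on the band $[-\pi, \pi]$ are arbitrary):

```python
import numpy as np

def trapezoid(y, x):
    """Simple trapezoidal quadrature."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

omega = np.linspace(-np.pi, np.pi, 200001)        # a band of total width 2*pi

flat = np.full_like(omega, 1.0 / (2.0 * np.pi))   # flat spectrum with unit integral
peaked = 1.0 + np.cos(omega)                      # smooth peak at omega = 0
peaked /= trapezoid(peaked, omega)                # normalize to unit integral

m_flat = trapezoid(flat**2, omega)      # = 1/(2 pi), the band-limited minimum
m_peaked = trapezoid(peaked**2, omega)  # = 3/(4 pi), larger because the spectrum is peaked
```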

The power spectrum corresponding to the autocorrelation function (19) is a pair of Lorentzian peaks centered at $\omega = \pm\omega_0$,

$$\hat{p}_{\omega} = -\frac{a}{2\pi} \left[\frac{1}{a^2 + (\omega - \omega_0)^2} + \frac{1}{a^2 + (\omega + \omega_0)^2}\right]$$

(using $c_{-\tau} = c_{\tau}$). The power spectrum and autocorrelation function for various choices of $a$ and $\omega_0$ are illustrated in Fig. 1. As expected, the autocorrelation function generally increases as $a$ approaches zero (for fixed $\tau$), and the peak of the power spectrum becomes sharper (i.e., less flat) as $a$ approaches zero. Results for different values of $\omega_0$ are shown in the right panel, but inferences of APT given the power spectrum are not as straightforward (e.g., they require visually integrating the square of the power).

These considerations clarify the error in relating predictability to the *location* of spectral peaks. Returning to the ENSO example mentioned in the introduction, we note that the Niño-3 index is tolerably fit by the autocorrelation function (19) with *a*^{−1} = 16 months and *ω*_{0}^{−1} = 48/(2*π*) months, as shown in Fig. 2. (We do not recommend this fitting procedure in general; it is used here primarily for illustration purposes.) These parameter values give *S* = 9.5 months, which differs considerably from the period of the spectral peak, 2*π*/*ω*_{0} = 48 months. Even in the limit of large *ω*_{0} there is only a modest effect on *S*, namely *S* → −1/(2*a*) = 8 months. Thus, the location of the peak is almost irrelevant to the predictability time scale. The reason is that predictability depends on the peakiness of the power spectrum, which is controlled by the damping time −1/*a* rather than by the location of the peak (*ω*_{0}).
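These ENSO numbers can be reproduced from the closed-form APT of the damped oscillatory autocorrelation (a sketch of ours; the formula is the one obtained by substituting (19) into (17)):

```python
import numpy as np

a = -1.0 / 16.0           # damping rate, per month (damping time 16 months)
w0 = 2.0 * np.pi / 48.0   # oscillation frequency, per month (48-month spectral period)

# APT of rho_tau = exp(a|tau|) cos(w0 tau):  S = -1/(2a) - a / (2 (a^2 + w0^2))
S = -1.0 / (2.0 * a) - a / (2.0 * (a**2 + w0**2))
print(S)                  # approximately 9.5 months, far from the 48-month period
```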

## 6. Multivariate linear stochastic models

Consider the multivariate linear stochastic model

$$\frac{d\mathbf{x}}{dt} = \mathsf{A}\mathbf{x} + \mathbf{w}, \tag{28}$$

where 𝗔 is a $K \times K$ matrix, called the dynamical operator, and **w** is a Gaussian white noise process with zero mean and covariance matrix 𝗤. The dynamical operator is assumed to be independent of time and stable—that is, it possesses $K$ distinct eigenvalues with negative real parts. Let $\lambda_k(\mathsf{A})$ denote the $k$th eigenvalue of 𝗔. Also, let the eigenvector decomposition of 𝗔 be

$$\mathsf{A} = \mathsf{Z} \boldsymbol{\Lambda} \mathsf{Z}^{-1},$$

where 𝗭 is the matrix whose columns are the eigenvectors of 𝗔 and **Λ** is a diagonal matrix whose diagonal elements are the eigenvalues of 𝗔. As in the one-dimensional case, assume that the initial ensemble has zero variance. Tippett and Chang (2003) show that the ensemble of solutions to the stochastic model (28) has covariance matrix

$$\boldsymbol{\Sigma}_{\tau} = \mathsf{Z} \left[\left(\mathsf{Z}^{-1} \mathsf{Q} \mathsf{Z}^{-H}\right) \circ \mathsf{E}_{\tau}\right] \mathsf{Z}^{H}, \qquad \left(\mathsf{E}_{\tau}\right)_{ij} = \frac{e^{\left[\lambda_i(\mathsf{A}) + \lambda_j^{*}(\mathsf{A})\right]\tau} - 1}{\lambda_i(\mathsf{A}) + \lambda_j^{*}(\mathsf{A})}, \tag{30}$$

where the superscript $H$ denotes the conjugate transpose, $-H$ denotes the inverse of the conjugate transpose, the asterisk denotes the complex conjugate, and ∘ denotes the Hadamard (element-wise) product [(𝗖 ∘ 𝗗)_{ij} = 𝗖_{ij}𝗗_{ij}]. [This result also is derived in DelSole and Tippett (2007); the stationary form of this solution can be found in Horn and Johnson (1999, p. 301).] The solution (30) is the multivariate generalization of (8). A further standard result is that the forecast covariance matrix (30) also satisfies

$$\boldsymbol{\Sigma}_{\tau} = \boldsymbol{\Sigma}_{\infty} - e^{\mathsf{A}\tau} \, \boldsymbol{\Sigma}_{\infty} \, e^{\mathsf{A}^{H}\tau}, \tag{32}$$

where the climatological covariance matrix $\boldsymbol{\Sigma}_{\infty}$ is obtained from (30) in the limit of large lead time. Substituting (32) into the predictability measure (1) gives

$$S_{\tau} = \frac{1}{K} \operatorname{tr}\left(e^{\mathsf{A}\tau} \, \boldsymbol{\Sigma}_{\infty} \, e^{\mathsf{A}^{H}\tau} \, \boldsymbol{\Sigma}_{\infty}^{-1}\right). \tag{33}$$

The APT follows by integrating (33) with respect to lead time $\tau$. Adopting the convention that repeated indices are summed, integrating in the eigenvector basis gives

$$S = -\frac{2}{K} \, \frac{\left(\mathsf{Z}^{-1} \boldsymbol{\Sigma}_{\infty} \mathsf{Z}^{-H}\right)_{ij} \left(\mathsf{Z}^{H} \boldsymbol{\Sigma}_{\infty}^{-1} \mathsf{Z}\right)_{ji}}{\lambda_i(\mathsf{A}) + \lambda_j^{*}(\mathsf{A})}. \tag{34}$$

Substituting the stationary limit of (30) for $\boldsymbol{\Sigma}_{\infty}$ into (34) gives a closed-form expression for the APT entirely in terms of the eigenvalues and eigenvectors of 𝗔 and the noise covariance matrix 𝗤.

These expressions simplify considerably if the variables are transformed appropriately. Define the square root $\boldsymbol{\Sigma}_{\infty}^{1/2}$ of the climatological covariance, which satisfies

$$\boldsymbol{\Sigma}_{\infty} = \boldsymbol{\Sigma}_{\infty}^{1/2} \, \boldsymbol{\Sigma}_{\infty}^{H/2}, \tag{36}$$

where $H/2$ denotes the Hermitian transpose of the square root matrix. A square root matrix always exists for positive definite $\boldsymbol{\Sigma}_{\infty}$, although it is unique only up to a unitary matrix multiplied on the right-hand side. Substituting (36) into the predictability measure (33) and invoking standard properties of the trace, determinant, and exponential operators yields

$$S_{\tau} = \frac{1}{K} \operatorname{tr}\left(e^{\tilde{\mathsf{A}}\tau} \, e^{\tilde{\mathsf{A}}^{H}\tau}\right), \qquad \text{where} \quad \tilde{\mathsf{A}} = \boldsymbol{\Sigma}_{\infty}^{-1/2} \, \mathsf{A} \, \boldsymbol{\Sigma}_{\infty}^{1/2} \tag{38}$$

is the whitened dynamical operator. This expression parallels the univariate result: the predictability of the whitened system depends only on $\tilde{\mathsf{A}}$, just as the predictability of the one-dimensional model depends only on the damping rate $a$.
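The equivalence of the direct measure (33) and its whitened form can be confirmed numerically. The sketch below (ours; the matrices are arbitrary stable examples) builds the matrix exponential from an eigendecomposition and the climatological covariance from a Kronecker-product Lyapunov solver:

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 for the stationary covariance X."""
    n = A.shape[0]
    M = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
    return np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)

def expm(A, t):
    """exp(A t) via eigendecomposition (assumes A is diagonalizable)."""
    lam, Z = np.linalg.eig(A)
    return (Z * np.exp(lam * t)) @ np.linalg.inv(Z)

A = np.array([[-1.0, 2.0], [0.0, -0.5]])   # stable, nonnormal dynamical operator
Q = np.eye(2)
K = 2

sigma_inf = lyap(A, Q)                     # climatological covariance
L = np.linalg.cholesky(sigma_inf)          # one valid square root, cf. (36)
A_w = np.linalg.solve(L, A @ L)            # whitened dynamical operator (38)

tau = 0.7
E = expm(A, tau)
S_direct = np.trace(E @ sigma_inf @ E.T @ np.linalg.inv(sigma_inf)).real / K  # eq. (33)
E_w = expm(A_w, tau)
S_white = np.trace(E_w @ E_w.conj().T).real / K                               # whitened form
```

The Cholesky factor is one of the many valid square roots; any other choice differs by a unitary factor that leaves the trace unchanged.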

## 7. Normal case

Consider the special case of a diagonal dynamical operator with diagonal elements $\lambda_k(\mathsf{A})$ and diagonal noise covariance matrix 𝗤 having diagonal elements $Q_{kk}$. This case corresponds to a set of uncoupled, damped oscillators that are excited independently. In this case, the eigenvector matrix 𝗭 is the identity matrix, and the forecast covariance matrix computed from (30) is diagonal, with the $k$th diagonal element being

$$\left(\boldsymbol{\Sigma}_{\tau}\right)_{kk} = -\frac{Q_{kk}}{2 \operatorname{Re} \lambda_k(\mathsf{A})} \left(1 - e^{2 \operatorname{Re} \lambda_k(\mathsf{A}) \tau}\right).$$

We assume that the $Q_{kk}$ are strictly positive so that all the eigenmodes are excited and that the climatological covariance matrix is invertible. Substituting this expression into the predictability measure (1) gives

$$S_{\tau} = \frac{1}{K} \sum_{k=1}^{K} e^{2 \operatorname{Re} \lambda_k(\mathsf{A}) \tau},$$

and integrating with respect to lead time gives the APT

$$S = -\frac{1}{K} \sum_{k=1}^{K} \frac{1}{\operatorname{Re} \lambda_k(\mathsf{A})}. \tag{42}$$

Thus, the APT of a set of uncoupled systems equals the average of the APTs $-1/\operatorname{Re}\lambda_k(\mathsf{A})$ of the individual systems.

More generally, suppose the noise covariance matrix has the form

$$\mathsf{Q} = \mathsf{Z} \, \mathsf{D} \, \mathsf{Z}^{H}, \tag{43}$$

where 𝗗 is a real, positive definite, diagonal matrix; that is, the noise is uncorrelated in the basis of eigenvectors of 𝗔. In this case the climatological covariance is $\boldsymbol{\Sigma}_{\infty} = \mathsf{Z}\,\boldsymbol{\Sigma}_d\,\mathsf{Z}^{H}$, where $\boldsymbol{\Sigma}_d$ has diagonal elements $-D_{kk}/[2\operatorname{Re}\lambda_k(\mathsf{A})]$ (and hence also is a real, positive definite, diagonal matrix), and a valid square root is $\boldsymbol{\Sigma}_{\infty}^{1/2} = \mathsf{Z}\boldsymbol{\Sigma}_d^{1/2}$. Substituting this expression into the definition of the whitened dynamical operator (38) gives

$$\tilde{\mathsf{A}} = \boldsymbol{\Sigma}_d^{-1/2} \, \mathsf{Z}^{-1} \mathsf{A} \mathsf{Z} \, \boldsymbol{\Sigma}_d^{1/2} = \boldsymbol{\Lambda},$$

where we have used 𝗭^{-1}𝗔𝗭 = **Λ** and the fact that diagonal matrices commute. This result shows that if the noise covariance matrix 𝗤 is of the form (43), then the whitened dynamical operator is normal and the APT is given by (42), regardless of the eigenvector structure of 𝗔.
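A numerical check of (42) for an uncoupled system (our sketch; the three damping rates are arbitrary):

```python
import numpy as np

def trapezoid(y, x):
    """Simple trapezoidal quadrature."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

lam = np.array([-0.5, -0.2, -1.0])      # damping rates of three uncoupled modes

tau = np.linspace(0.0, 80.0, 200001)
S_tau = np.exp(2.0 * np.outer(tau, lam)).mean(axis=1)   # Mahalanobis signal of the system
apt = 2.0 * trapezoid(S_tau, tau)

apt_formula = -np.mean(1.0 / lam)       # eq. (42): the average of the individual APTs
print(apt, apt_formula)                 # both approximately 8/3
```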

## 8. Relation between the power spectrum and predictability

The multivariate generalization of the Wiener–Khinchin theorem expresses the power spectrum matrix $\mathsf{p}_{\omega}$ of a stationary process as

$$\mathsf{p}_{\omega} = \frac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} \mathsf{C}_{\tau} \, e^{-i\omega\tau},$$

where $\mathsf{C}_{\tau}$ is the time-lagged covariance matrix. The power spectrum matrix $\mathsf{p}_{\omega}$ can be related to the forecast covariance matrix $\boldsymbol{\Sigma}_{\tau}$ only after a forecast model is invoked explicitly. Here we consider linear forecast models of the form

$$\hat{\mathbf{x}}_{t+\tau} = \mathsf{L}_{\tau} \, \mathbf{x}_{t},$$

where **x̂** is a prediction of **x** and 𝗟_{τ} is a linear operator. Determination of the linear operator that minimizes the sum of squared forecast errors is a standard regression problem with solution

$$\mathsf{L}_{\tau} = \mathsf{C}_{\tau} \, \mathsf{C}_{0}^{-1},$$

and the associated forecast error covariance matrix is

$$\boldsymbol{\Sigma}_{\tau} = \mathsf{C}_{0} - \mathsf{C}_{\tau} \, \mathsf{C}_{0}^{-1} \, \mathsf{C}_{\tau}^{H}.$$

For the linear stochastic model (28), the regression operator is 𝗟_{τ} = exp(𝗔*τ*), reflecting the close relation between linear stochastic models and linear regression models.

Substituting the above error covariance into the predictability measure (1), with $\boldsymbol{\Sigma}_{\infty} = \mathsf{C}_{0}$, gives

$$S_{\tau} = \frac{1}{K} \operatorname{tr}\left(\tilde{\mathsf{C}}_{\tau} \, \tilde{\mathsf{C}}_{\tau}^{H}\right),$$

an expression that appears to apply only for positive lags $\tau$. However, for any stationary process, $\mathsf{C}_{-\tau} = \mathsf{C}_{\tau}^{H}$. Substituting this identity into the above equation implies $S_{-\tau} = S_{\tau}$; that is, our expression for $S_{\tau}$ is an even function of lag. Therefore, we may sum this expression over positive and negative lags. Substituting the Wiener–Khinchin relation for the power spectrum, while noting that $S_{0} = 1$, gives

$$S = 2 \sum_{\tau=1}^{\infty} S_{\tau} = -1 + \sum_{\tau=-\infty}^{\infty} S_{\tau} = -1 + \frac{2\pi}{K} \int_{-\pi}^{\pi} \operatorname{tr}\left(\tilde{\mathsf{p}}_{\omega} \, \tilde{\mathsf{p}}_{\omega}^{H}\right) d\omega,$$

where $\tilde{\mathsf{C}}_{\tau}$ and $\tilde{\mathsf{p}}_{\omega}$ are the whitened time lag covariance and power spectrum matrices, respectively, defined as

$$\tilde{\mathsf{C}}_{\tau} = \boldsymbol{\Sigma}_{\infty}^{-1/2} \, \mathsf{C}_{\tau} \, \boldsymbol{\Sigma}_{\infty}^{-H/2} \qquad \text{and} \qquad \tilde{\mathsf{p}}_{\omega} = \boldsymbol{\Sigma}_{\infty}^{-1/2} \, \mathsf{p}_{\omega} \, \boldsymbol{\Sigma}_{\infty}^{-H/2}.$$

Because the whitened spectrum integrates to the identity matrix, the integral attains its minimum when $\tilde{\mathsf{p}}_{\omega}$ is flat and proportional to the identity, in which case $S = 0$; that is, the minimum APT occurs when the system is a set of independent, uncoupled white noise processes, consistent with intuition.
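A univariate sanity check of this lag- and frequency-domain relation (our sketch) uses an AR(1) process with coefficient φ, for which ρ_τ = φ^{|τ|} and the power spectrum is known in closed form:

```python
import numpy as np

def trapezoid(y, x):
    """Simple trapezoidal quadrature."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

phi = 0.8
taus = np.arange(1, 2000)
S_tau = phi ** (2 * taus)                    # squared autocorrelation = Mahalanobis signal
S = 2.0 * S_tau.sum()                        # discrete APT

# The sum over all lags (tau = 0 and negative lags included) equals S + 1:
closed_form = (1 + phi**2) / (1 - phi**2)    # analytic value of sum_tau phi^(2|tau|)

# Frequency-domain version: 2 pi times the integral of the squared whitened spectrum.
omega = np.linspace(-np.pi, np.pi, 200001)
p = (1 - phi**2) / (2 * np.pi * (1 - 2 * phi * np.cos(omega) + phi**2))  # AR(1) spectrum
spectral = 2.0 * np.pi * trapezoid(p**2, omega)
print(S + 1, closed_form, spectral)          # all approximately 4.556
```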

## 9. Bounds on APT

A lower bound on APT follows from the fact that the sum of squared singular values of a matrix is at least the sum of squared moduli of its eigenvalues. Applied to $S_{\tau} = \operatorname{tr}(e^{\tilde{\mathsf{A}}\tau} e^{\tilde{\mathsf{A}}^{H}\tau})/K$ and integrated over lead time, this gives

$$S \geq -\frac{1}{K} \sum_{k=1}^{K} \frac{1}{\operatorname{Re}\lambda_k(\mathsf{A})}, \tag{63}$$

with equality when the whitened dynamical operator is normal—that is, when the system can be transformed into the uncoupled form of section 7. Thus, the uncoupled system minimizes APT among all linear stochastic models with the same dynamical eigenvalues.

An upper bound follows from a majorization argument. The logarithms of the singular values of $e^{\tilde{\mathsf{A}}\tau}$ are majorized by the eigenvalues of $\tilde{\mathsf{A}}_s \tau$; that is,

$$\sum_{k=1}^{n} \ln \sigma_k\!\left(e^{\tilde{\mathsf{A}}\tau}\right) \leq \tau \sum_{k=1}^{n} \lambda_k(\tilde{\mathsf{A}}_s)$$

for $n = 1, 2, \ldots, K$, with the eigenvalues ordered from largest to smallest. As is well known, any increasing, convex function preserves majorization (Horn and Johnson 1999, p. 173). Because $\exp(2x)$ is an increasing, convex function, it follows that an upper bound on the Mahalanobis signal is

$$S_{\tau} \leq \frac{1}{K} \sum_{k=1}^{K} e^{2 \lambda_k(\tilde{\mathsf{A}}_s) \tau},$$

where $\tilde{\mathsf{A}}_s = (\tilde{\mathsf{A}} + \tilde{\mathsf{A}}^{H})/2$ is the symmetric part of the whitened dynamical operator (cf. Cohen 1988). Integrating this bound over all lead times gives

$$S \leq -\frac{1}{K} \sum_{k=1}^{K} \frac{1}{\lambda_k(\tilde{\mathsf{A}}_s)}. \tag{66}$$

The operator $\tilde{\mathsf{A}}_s$ appearing in the upper bound (66) arises in a variety of contexts related to the instantaneous rate of change of variables. For instance, the eigenvectors of $\tilde{\mathsf{A}}_s$ are called instantaneous optimals and define an orthogonal set of initial states that optimize the instantaneous rate of change of $\tilde{\mathbf{x}}^{H}\tilde{\mathbf{x}}$ (see DelSole 2004 for a review). In the present situation, however, the initial condition plays no role in the predictability because all unpredictability arises from stochastic forcing (i.e., the initial condition is assumed to be known perfectly). The relevance of the operator $\tilde{\mathsf{A}}_s$ can be seen instead as follows. The whitened forecast covariance matrix is $\tilde{\boldsymbol{\Sigma}}_{\tau} = \mathsf{I} - e^{\tilde{\mathsf{A}}\tau} e^{\tilde{\mathsf{A}}^{H}\tau}$, which implies

$$\frac{d\tilde{\boldsymbol{\Sigma}}_{\tau}}{d\tau} = -2\tilde{\mathsf{A}}_s \qquad \text{at} \quad \tau = 0.$$

Furthermore, the Lyapunov equation 𝗔**Σ**_{∞} + **Σ**_{∞}𝗔^{H} + 𝗤 = 0 implies

$$\tilde{\mathsf{A}} + \tilde{\mathsf{A}}^{H} = -\tilde{\mathsf{Q}} \qquad \left(\tilde{\mathsf{Q}} = \boldsymbol{\Sigma}_{\infty}^{-1/2} \mathsf{Q} \boldsymbol{\Sigma}_{\infty}^{-H/2}\right),$$

where we have used the fact that $\tilde{\boldsymbol{\Sigma}}_{\infty} = \mathsf{I}$, by definition. Combining these results implies that $-2\tilde{\mathsf{A}}_s = \tilde{\mathsf{Q}}$ is the rate of change of the whitened forecast covariance matrix at $\tau = 0$. Interestingly, we would obtain the same rate of change if the original dynamical operator had been $\tilde{\mathsf{A}}_s$. It follows that the upper bound (66) can be derived by replacing the original dynamical operator by the normal dynamical operator $\tilde{\mathsf{A}}_s$, which gives precisely the same rate of predictability loss at the initial time as the original system, then integrating the predictability of the system over all times.

An alternative form of the upper bound follows from the identity $\lambda_k(\mathsf{A}\mathsf{B}) = \lambda_k(\mathsf{B}\mathsf{A})$ for any two matrices 𝗔 and 𝗕, together with $-2\tilde{\mathsf{A}}_s = \tilde{\mathsf{Q}}$. Using this identity, the upper bound (66) becomes

$$S \leq \frac{2}{K} \operatorname{tr}\left(\mathsf{Q}^{-1} \boldsymbol{\Sigma}_{\infty}\right).$$

A different upper bound follows from the conjecture of Tippett and Chang (2003) that, out of all linear stochastic models with the same dynamical eigenvalues, predictability is maximized when the noise covariance matrix has rank one. Evaluating the APT for that case and dividing by $K$ gives the conjectural upper bound

$$S \leq -\frac{2}{K} \sum_{j=1}^{K} \sum_{k=1}^{K} \frac{1}{\lambda_j(\mathsf{A}) + \lambda_k^{*}(\mathsf{A})}, \tag{72}$$

which depends only on the eigenmode damping rates.
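Both proven bounds can be checked numerically for an arbitrary stable model (our sketch; the APT is computed via the Lyapunov-equation route described in section 10):

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 for the stationary covariance X."""
    n = A.shape[0]
    M = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
    return np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)

A = np.array([[-0.5, 1.5], [0.0, -2.0]])      # stable, nonnormal example
Q = np.eye(2)
K = 2

L = np.linalg.cholesky(lyap(A, Q))            # square root of the climatological covariance
A_w = np.linalg.solve(L, A @ L)               # whitened dynamical operator (38)
S = 2.0 * np.trace(lyap(A_w, np.eye(K))) / K  # APT

lower = -np.mean(1.0 / np.linalg.eigvals(A).real)   # lower bound (63)
A_s = 0.5 * (A_w + A_w.T)                           # symmetric part of whitened operator
upper = -np.mean(1.0 / np.linalg.eigvalsh(A_s))     # upper bound (66)
print(lower, S, upper)                              # lower <= S <= upper
```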

## 10. APT as a solution to an alternative stochastic model

A useful concept in the application of information theory predictability measures to normally distributed variables is that techniques for analysis of predictability are often equivalent to techniques for analysis of variance applied to whitened variables (Schneider and Griffies 1999; DelSole and Tippett 2007). Here we show how APT fits in this framework. Generalized stability analysis relates the variance of linear dynamics forced by homogeneous isotropic stochastic forcing to the stability properties of the linear dynamics (Farrell and Ioannou 1996; Tippett and Marchesin 1999). Here we show that the variance of *whitened* linear dynamics excited by homogeneous isotropic stochastic forcing is related to the predictability of the system as measured by APT.

The APT of the linear stochastic model (28) can be written as

$$S = \frac{2}{K} \int_0^{\infty} \operatorname{tr}\left(e^{\tilde{\mathsf{A}}\tau} e^{\tilde{\mathsf{A}}^{H}\tau}\right) d\tau = \frac{2}{K} \operatorname{tr}\left(\boldsymbol{\Sigma}_y\right),$$

that is, as twice a total variance divided by $K$. This equivalence implies that the APT can be interpreted as the "total variance" of a stochastic model with dynamical operator $\tilde{\mathsf{A}}$, where $\boldsymbol{\Sigma}_y$ is the stationary covariance matrix of **y** obtained from the stochastic model

$$\frac{d\mathbf{y}}{dt} = \tilde{\mathsf{A}}\mathbf{y} + \mathbf{v}, \tag{75}$$

where **v** is a Gaussian white noise process with zero mean and covariance matrix 𝗜. The fact that APT equals the total variance of a stochastic model implies that the intuition developed about total variance produced by stochastic models can be applied directly to APT. As an example, a system with a nonnormal dynamical operator generates more variance than a system with a normal dynamical operator with the same eigenvalues (Ioannou 1995). Applying this theorem to the stochastic model (75) immediately implies that nonnormality of the whitened dynamical operator increases the APT relative to that of a normal operator with the same eigenvalues, consistent with the lower bound (63). Incidentally, the covariance matrix $\boldsymbol{\Sigma}_y$ derived from the stochastic model (75) satisfies the Lyapunov equation

$$\tilde{\mathsf{A}}\boldsymbol{\Sigma}_y + \boldsymbol{\Sigma}_y\tilde{\mathsf{A}}^{H} + \mathsf{I} = \mathsf{0},$$

which provides a convenient way to evaluate the APT without integrating over lead time.

## 11. Example

Consider the two-dimensional stochastic model (28) with dynamical operator and noise covariance matrix

$$\mathsf{A} = \begin{bmatrix} -1/3 & c \\ 0 & -1 \end{bmatrix}, \qquad \mathsf{Q} = \mathsf{I},$$

where $c$ is a parameter measuring the coupling between the two components. Because 𝗔 is upper triangular, its eigenvalues are simply the diagonal elements −1/3 and −1. The negative eigenvalues imply that the dynamical operator is stable and that the stochastic model gives statistically stationary solutions. The predictability of the system is minimized and achieves the lower bound (63) when there is no coupling between the two components of the system (i.e., when $c = 0$). The time-dependent predictability for the cases $c = 1$ and $c = 4$ is shown in the top panels of Fig. 3, together with the upper and lower bounds. As expected, as the coupling parameter increases, so too does the predictability, at all lead times. For $c = 1$ the APT is $S = 2.54$; for $c = 4$ the APT is $S = 3.35$. For comparison, the conjectured upper bound (72) is 3.5, whereas the lower bound is 2. In general, predictability is an increasing function of the coupling parameter $c$. The two upper bounds provide tight constraints in different extremes—the bound based on instantaneous optimals performs well at small lead times, whereas the conjectured bound performs well at long lead times. The lower panels show the average predictability time as a function of the coupling parameter $c$. The upper bound based on instantaneous optimals greatly overestimates the APT at large values of the coupling parameter, so a good upper bound is the minimum of the two upper bounds.
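The values quoted above can be reproduced with a short computation (our sketch; it combines the Lyapunov-equation route to APT from section 10 with the bounds of section 9):

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A^T + Q = 0 for the stationary covariance X."""
    n = A.shape[0]
    M = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
    return np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)

def apt(A, Q):
    """APT = 2 tr(Sigma_y) / K, with Sigma_y from the whitened stochastic model (75)."""
    K = A.shape[0]
    L = np.linalg.cholesky(lyap(A, Q))   # square root of the climatological covariance
    A_w = np.linalg.solve(L, A @ L)      # whitened dynamical operator (38)
    return 2.0 * np.trace(lyap(A_w, np.eye(K))) / K

Q = np.eye(2)
apt1 = apt(np.array([[-1.0 / 3.0, 1.0], [0.0, -1.0]]), Q)   # c = 1: S = 2.54
apt4 = apt(np.array([[-1.0 / 3.0, 4.0], [0.0, -1.0]]), Q)   # c = 4: S = 3.35

lam = np.array([-1.0 / 3.0, -1.0])
lower = -np.mean(1.0 / lam)                                 # lower bound (63): 2.0
conjectured = -(2.0 / lam.size) * sum(1.0 / (li + lj) for li in lam for lj in lam)  # (72): 3.5
print(apt1, apt4, lower, conjectured)
```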

## 12. Summary and discussion

This paper introduced average predictability time (APT) for characterizing the overall predictability of a system. The APT is defined as the integral of the Mahalanobis signal with respect to lead time. As such, APT measures an inherent property of a system that is independent of the basis set used to represent the system. The appropriateness of APT was illustrated with a one-dimensional stochastic model in which the APT depends inversely on the damping rate, consistent with the intuition that systems with stronger damping have less memory and hence are less predictable. Furthermore, the APT is comparable to integral time scales in appropriate cases. However, if the autocorrelation is a damped oscillation, then the integral time scale defined in (21) becomes arbitrarily small for large frequencies, which is unrealistic, whereas APT is always within 50% of the *e*-folding time. Thus, the APT provides a more suitable measure of time scale than the integral time scale (21). For nonlinear or non-Gaussian processes, the Mahalanobis signal may decay very differently from that of linear stochastic systems (e.g., it may exhibit long tails). Nevertheless, just as the correlation coefficient has meaning even if the variables are not linearly related, the APT still has meaning even if it cannot be interpreted directly as a time scale.

The APT also clarifies the connection between predictability and power spectra. Specifically, if a process is stationary, then the APT of a linear regression forecast is proportional to the integral of the square of the normalized power spectrum. The appropriateness of this relation can be seen from the fact that the latter quantity can be decreased simply by replacing the power in any spectral band by its mean value within the band; that is, the quantity decreases as the spectrum becomes flatter. If the time series is band-limited, then the minimum value occurs when the system comprises a set of independent and identically distributed white noise processes, which is obviously the least predictable system. In essence, predictability is related to the width of spectral peaks, with strong narrow peaks associated with high predictability and nearly flat spectra associated with low predictability. As extreme examples, white noise has a constant power spectrum and is minimally predictable, whereas a sine wave has a delta-function power spectrum and is perfectly predictable. Expressing the APT in terms of the power spectra rigorously quantifies this intuitive relation.

Closed form expressions for the APT of linear, multivariate stochastic models were derived. If the dynamical operator and noise covariance matrix are both diagonal, then—remarkably—the APT depends only on the damping rates and equals the average APT of the individual eigenmodes. Because the APT is invariant to linear transformation, this result is true for any system for which there exists a basis set in which both the dynamical operator and noise covariance matrix are diagonal. We further show that this particular system minimizes the APT compared with all linear stochastic models with the same dynamical operator. Thus, the least predictable system can be transformed into a set of uncoupled, independent stochastic models. It follows that systems that are irreducibly coupled have more predictability than those that are fundamentally uncoupled. Simply put, coupling enhances predictability. APT rigorously justifies this intuitive notion.

As reviewed in DelSole and Tippett (2007), applying the whitening transformation to a system allows familiar concepts from analysis of variance and generalized stability analysis to be applied directly to predictability analysis. We show that this equivalence carries over to APT because, surprisingly, APT itself can be interpreted as the total variance of an alternative stochastic model. The alternative stochastic model is driven by homogeneous white noise and has a dynamical operator equal to the whitened dynamical operator of the original stochastic model. This connection allows one to anticipate that the lower bound for APT occurs when the whitened dynamical operator is normal, which in turn is equivalent to the condition that there exists a basis set in which both the dynamical operator and noise covariance matrix are diagonal—consistent with the lower bound of APT discussed above.

The remarkable equivalence noted above is worth further reflection. Loosely speaking, one way to understand the stability of a dynamical system is to examine its response to white noise forcing: the more variance produced, the less stable the system. This approach is the basis of generalized stability theory, which in turn utilizes concepts from dynamical stability theory. In defining APT, we proposed something apparently different: that the overall predictability of a system can be quantified by the integral of predictability with respect to lead time, reasoning that more predictable systems have larger integrals. Surprisingly, the integral of predictability is itself the total variance of the system consisting of the whitened dynamical operator forced by white noise. APT measures the stability of the whitened dynamics. Thus, the two methods for understanding predictability turn out to be fundamentally equivalent. An important benefit of this equivalence, however, is that the whitened dynamical operator is unique up to a variance-preserving unitary transformation. Thus, the framework proposed in this paper imposes a particular stochastic model for studying predictability, which is essential for applying generalized stability theory. In contrast, the latter theory does not impose the norm or coordinate system; rather, it provides a framework for understanding variance after the norm and coordinate system have been chosen.

An upper bound on the APT of linear stochastic models also was derived. The upper bound is proportional to the sum of inverse eigenvalues of the symmetric part of the whitened dynamical operator. The latter operator arises frequently in the investigation of instantaneous growth rates of variance. It is interesting that the upper bound on APT (or the predictability at any lead time) is related to the predictability at very short time scales in linear stochastic models. The upper bounds imply that as the short time predictability decreases, so too does the maximum possible APT. This intuitive notion is used widely in predictability studies as a justification for drawing conclusions about overall predictability from the characteristics of error growth at short times. Two key points about this relation are worth emphasizing. First, although intuitive, we are aware of no rigorous demonstration of a connection between predictability at short times and overall predictability. Thus, our result provides such a demonstration, at least for linear stochastic models. Second, the connection between short time error growth and APT is most direct when the error growth is measured in whitened space. In other words, our result clarifies the existence of a preferred norm for relating predictability at short and long lead times. Attempts to draw conclusions about long-term predictability from short-term error growth in other norms may actually be misleading (see DelSole and Tippett 2008).

A conjecture for an upper bound for APT follows from the conjecture of Tippett and Chang (2003), which states that out of all linear stochastic models with the same dynamical eigenvalues, the model with rank-1 noise covariance matrix (with nonzero diagonal elements) has maximum predictability. This conjecture can be motivated by the fact that if minimum predictability occurs when all eigenmodes are forced independently, then perhaps maximum predictability occurs when all eigenmodes are forced in perfect correlation. The resulting upper bound depends only on the eigenmode damping rates; in particular, the upper bound is independent of the detailed structure of the forcing, provided all modes are excited.

It should be recognized that APT can be applied to more general forecast models than linear stochastic models. For instance, APT can be evaluated for regression models with physically different variables for predictors and predictands. Interestingly, if only a single variable is predicted, but more than one predictor is used, then the APT relation (17) still is applicable, but the parameter *ρ*_{τ} denotes the *multiple correlation*. This result is discussed further in Part II of this paper.

We note that other measures of predictability can be used to define APT, and these alternative measures may be attractive in some cases. Mutual information in particular is appropriate for non-Gaussian or nonlinear systems. Furthermore, mutual information is integrable for linear stochastic systems, even though it is unbounded as lead time approaches zero for perfect initial conditions. For linear stochastic models with Gaussian white noise, the APT derived from mutual information turns out to be proportional to the APT (42) for normal whitened dynamical operators. However, the relation between this alternative form of APT and power spectra is obscure, and the decomposition of this form of APT is not straightforward.

In climate systems, the APT of different components can vary greatly owing to the widely different time scales associated with land, ice, and ocean processes. Thus, characterizing the predictability of the full climate system by a single number might seem to be a gross simplification. However, the APT can be decomposed into uncorrelated components that can be ordered by their fractional contribution to APT, such that the first component maximizes APT, the second maximizes APT subject to being uncorrelated with the first, and so on. This decomposition therefore allows the full spectrum of predictability times to be diagnosed. This decomposition can be used to study predictability on different time scales without time filtering, provided the predictabilities on different time scales are characterized by different spatial structures. This decomposition and its practical implementation are discussed in Part II.

## Acknowledgments

We thank three anonymous reviewers for detailed comments that led to an improved manuscript. This research was supported by the National Science Foundation (ATM0332910), the National Aeronautics and Space Administration (NNG04GG46G), and the National Oceanic and Atmospheric Administration (NA04OAR4310034). MKT is supported by a grant/cooperative agreement from the National Oceanic and Atmospheric Administration (NA05OAR4311004). The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies.

## REFERENCES

Cohen, J. E., 1988: Spectral inequalities for matrix exponentials. *Linear Algebra Appl.*, **111**, 25–28.

DelSole, T., 2001: Optimally persistent patterns in time-varying fields. *J. Atmos. Sci.*, **58**, 1341–1356.

DelSole, T., 2004: The necessity of instantaneous optimals in stationary turbulence. *J. Atmos. Sci.*, **61**, 1086–1091.

DelSole, T., and M. K. Tippett, 2007: Predictability: Recent insights from information theory. *Rev. Geophys.*, **45**, RG4002, doi:10.1029/2006RG000202.

DelSole, T., and M. K. Tippett, 2008: Predictable components and singular vectors. *J. Atmos. Sci.*, **65**, 1666–1678.

DelSole, T., and M. K. Tippett, 2009: Average predictability time. Part II: Seamless diagnoses of predictability on multiple time scales. *J. Atmos. Sci.*, **66**, 1188–1204.

Farrell, B. F., and P. J. Ioannou, 1996: Generalized stability theory. Part I: Autonomous operators. *J. Atmos. Sci.*, **53**, 2025–2040.

Fukunaga, K., 1990: *Introduction to Statistical Pattern Recognition*. 2nd ed. Academic Press, 591 pp.

Gelb, A., Ed., 1974: *Applied Optimal Estimation*. MIT Press, 382 pp.

Horn, R. A., and C. R. Johnson, 1999: *Topics in Matrix Analysis*. Cambridge University Press, 607 pp.

Ioannou, P. J., 1995: Nonnormality increases variance. *J. Atmos. Sci.*, **52**, 1155–1158.

Julian, P. R., and R. M. Chervin, 1978: A study of the Southern Oscillation and Walker Circulation phenomenon. *Mon. Wea. Rev.*, **106**, 1433–1451.

Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. *J. Atmos. Sci.*, **59**, 2057–2072.

Kwon, W. H., Y. S. Moon, and S. C. Ahn, 1996: Bounds in algebraic Riccati and Lyapunov equations: A survey and some new results. *Int. J. Control*, **64**, 377–389.

Lancaster, P., and M. Tismenetsky, 1985: *The Theory of Matrices: With Applications*. Academic Press, 570 pp.

Leung, L-Y., and G. R. North, 1990: Information theory and climate prediction. *J. Climate*, **3**, 5–14.

Lorenz, E. N., 1969: The predictability of a flow which possesses many scales of motion. *Tellus*, **21**, 289–307.

Majda, A., R. Kleeman, and D. Cai, 2002: A mathematical framework for quantifying predictability through relative entropy. *Methods Appl. Anal.*, **9**, 425–444.

Priestley, M. B., 1981: *Spectral Analysis and Time Series*. Academic Press, 890 pp.

Schneider, T., and S. Griffies, 1999: A conceptual framework for predictability studies. *J. Climate*, **12**, 3133–3155.

Tippett, M. K., and D. Marchesin, 1999: Upper bounds for the solution of the discrete algebraic Lyapunov equation. *Automatica*, **35**, 1485–1489.

Tippett, M. K., and P. Chang, 2003: Some theoretical considerations on predictability of linear stochastic dynamics. *Tellus*, **55**, 148–157.

Weiss, J. B., 2003: Coordinate invariance in stochastic dynamical systems. *Tellus*, **55A**, 208–218.

## APPENDIX

### A Cauchy–Schwarz Inequality

The Cauchy–Schwarz inequality states that for any two complex vectors **x** and **y**,

|**x**^{H}**y**|^{2} ≤ ||**x**||^{2} ||**y**||^{2},  (A1)

with equality if and only if **x** = *α***y** for some scalar *α* (we use the notation ||**x**||^{2} = **x**^{H}**x**). Writing out the inner products element by element, the Cauchy–Schwarz inequality can be written equivalently as

|Σ_{ijω} x_{ijω}* y_{ijω}|^{2} ≤ (Σ_{ijω} |x_{ijω}|^{2}) (Σ_{ijω} |y_{ijω}|^{2}),  (A2)

where each vector element is indexed by the triplet (*i*, *j*, *ω*). Now identify the elements of **x** with the whitened power spectrum matrix, x_{ijω} = [**f̃**(ω)]_{ij}, let y_{ijω} = δ_{ij}, and let the *ω* index represent equally spaced frequencies between −*π* and *π*, where the number of frequencies increases indefinitely. To the extent that the resulting sum can be interpreted as a Riemann integral, the Cauchy–Schwarz inequality becomes

[∫_{−π}^{π} tr **f̃**(ω) dω]^{2} ≤ 2*π*K ∫_{−π}^{π} ||**f̃**(ω)||^{2} dω,  (A3)

where K is the state dimension and ||·|| denotes the Frobenius norm. The lower bound on *P̃* obtained from (A3) corresponds to (61), and by the equality condition of (A1) it is attained only when **f̃**(ω) is proportional to the identity matrix at every frequency; that is, when the whitened power spectrum is white.
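The Cauchy–Schwarz inequality and its equality condition can be checked numerically with random complex vectors (illustrative only; the vectors here are arbitrary, not spectral quantities):

```python
import numpy as np

rng = np.random.default_rng(2)

# Check |x^H y|^2 <= ||x||^2 ||y||^2 for random complex vectors.
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
y = rng.standard_normal(8) + 1j * rng.standard_normal(8)

lhs = abs(np.vdot(x, y)) ** 2              # np.vdot conjugates its first argument
rhs = np.vdot(x, x).real * np.vdot(y, y).real
print(lhs <= rhs)  # True

# Equality holds when x = alpha * y for some scalar alpha.
alpha = 2.0 - 1.5j
x_eq = alpha * y
lhs_eq = abs(np.vdot(x_eq, y)) ** 2
rhs_eq = np.vdot(x_eq, x_eq).real * np.vdot(y, y).real
print(np.isclose(lhs_eq, rhs_eq))  # True
```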