1. Introduction
This paper proposes a new measure of the overall predictability of a system, independent of lead time. Such a measure allows one system to be characterized as “more predictable” than another. One motivation for this new measure is that traditional measures of the overall predictability of a system often have no obvious generalization to multivariate systems. For instance, the limit of predictability, as defined by Lorenz (1969), is the time beyond which the mean square error exceeds a predefined threshold. However, mean square error depends on the coordinate system used to represent the state and is problematic if different variables with different units and natural variances are mixed. Alternatively, an integral time scale, defined as the integral of some moment of the autocorrelation function with respect to lag, is often used in turbulence studies. However, the autocorrelation function is even more restrictive than mean square error in that it is defined only for a single time series. The predictability time scale of a process is sometimes suggested to be related to the peak of its power spectrum. As an example, ENSO has a spectral peak around 4 yr (Julian and Chervin 1978), which under this assumption implies predictability for about 4 yr. However, real-time forecasts of ENSO demonstrate little skill beyond 1 yr, suggesting an inconsistency. Moreover, even if a reasonable relation between predictability and power spectra could be ascertained for the univariate case, the generalization to multivariate systems would not necessarily be straightforward.
Our proposal for overcoming these limitations has two key elements. The first is adoption of a measure of predictability that is applicable to multivariate systems and invariant to linear transformation of the variables—measures satisfying these properties have been proposed by Leung and North (1990), Schneider and Griffies (1999), and Kleeman (2002) (see DelSole and Tippett 2007 for a review)—and the second is the integration of this measure of predictability with respect to all lead times. The resulting measure has several attractive properties. First, the measure is consistent, in the sense that applying it to the same system represented using a different basis set gives the same result. Second, if the predictability decay follows the same form in different systems, then systems that are predictable at longer lead times will have a larger measure. We restrict our attention to predictability averaged over all initial conditions and call the resulting integral the average predictability time (APT). The APT may not be interpretable as a time scale if the decay of predictability is pathological, but nonetheless the integral still gives useful information about overall predictability.
When only predictability averaged over initial conditions is considered, two natural measures satisfying the above properties are mutual information and the Mahalanobis metric. Unfortunately, mutual information depends on the complete forecast and climatological distributions and is therefore difficult to estimate from finite time series. Even in the case of normally distributed variables, mutual information is difficult to integrate with respect to lead time. In contrast, the Mahalanobis metric depends only on second-order moments and is easier to integrate. Furthermore, it turns out that choosing the Mahalanobis metric as a basis for APT has several other attractive properties. Specifically, the resulting APT can be expressed in terms of power spectra, thus clarifying the connection between predictability and power spectra. This point is demonstrated in the present paper. In addition, the resulting APT can be decomposed into independent components that optimize it, analogous to the way that principal component analysis decomposes variance. This decomposition clarifies the usefulness of APT even if the system is characterized by a wide range of time scales—the decomposition separates different components according to their APT, allowing the full spectrum of APTs to be diagnosed. This decomposition will be derived and illustrated with meteorological data in DelSole and Tippett (2009, hereafter Part II).
The present paper may be summarized briefly as follows: In section 2, we review the measure on which the APT will be based, namely the Mahalanobis metric. We then show in section 3 that the resulting APT gives sensible results for one-dimensional stochastic models; that is, the APT is inversely proportional to the damping rate and comparable to traditional integral time scales in appropriate cases. In cases in which a widely used integral time scale gives unrealistic results, however, the APT remains well behaved. In sections 4 and 5 we discuss the relation between predictability, the autocorrelation function, and power spectra. Specifically, we show that the APT is proportional to the integral of the square of the normalized power spectrum and explain how the APT depends on the shape of the power spectrum. In section 6 we evaluate the APT for autonomous, linear stochastic models excited by Gaussian white noise. In addition, we show how the multivariate expressions parallel the univariate results if the variables are transformed appropriately. In section 7, we show that the APT for a set of independent, uncoupled stochastic models equals the average APT of the individual systems. In section 8 we derive the relation between APT and multivariate power spectra and show that the minimum APT occurs when the system is a set of independent, uncoupled white noise processes, consistent with intuition. Bounds on the APT of linear stochastic models are derived in section 9, and a surprising interpretation of APT is discussed in section 10. The bounds are illustrated in section 11 with a simple stochastic model. We conclude with a summary and discussion of results.
2. Definition of average predictability time
Consider a system whose state vector is the K-dimensional vector x. Let the forecast distribution at a fixed lead time τ have covariance matrix Στ. This covariance matrix quantifies the uncertainty in the forecast as a function of lead time. If the system is stationary and the forecast is independent of the initial condition at asymptotically long lead times, then the climatological distribution can be identified with the forecast distribution in the limit of large lead time (DelSole and Tippett 2007), namely Σ∞.
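The growth and saturation of the forecast covariance described above can be sketched numerically. The following is an illustrative ensemble experiment using a univariate AR(1) process as a stand-in for the (generally multivariate) system; the coefficient `phi`, noise variance `q`, and ensemble size are our own arbitrary choices, not values from the paper:

```python
import numpy as np

# Forecast spread grows with lead time and saturates at the climatological
# variance.  A univariate AR(1) process, x_{t+1} = phi*x_t + noise, serves
# as a one-dimensional stand-in for the state vector x.
rng = np.random.default_rng(0)
phi = 0.9                        # autoregressive coefficient (damping)
q = 1.0                          # white-noise variance
sigma_clim = q / (1.0 - phi**2)  # stationary (climatological) variance

n_members, n_leads = 5000, 60
x = np.zeros(n_members)          # all members start from the same state
var_tau = np.empty(n_leads)      # sample forecast variance at each lead
for tau in range(n_leads):
    x = phi * x + rng.normal(0.0, np.sqrt(q), n_members)
    var_tau[tau] = x.var()

# Exact forecast variance at lead tau+1: sigma_clim * (1 - phi^(2*(tau+1)))
exact = sigma_clim * (1.0 - phi**(2 * np.arange(1, n_leads + 1)))
```

At long leads the ensemble variance approaches `sigma_clim`, illustrating the identification of the climatological distribution with the large-lead forecast distribution.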
3. One-dimensional case
4. Connection between APT and autocorrelation
5. Connection between APT and power spectrum
We now derive the relation between APT and power spectra. Certain basic points about the relation between predictability and the power spectrum can be inferred by considering two extreme cases. First, white noise can be considered to be the least predictable process because its value at one time is completely independent of its value at any other time. The power spectrum of white noise is constant, or “flat.” Conversely, consider a perfect sine wave. A sine wave is perfectly predictable because the value at a finite number of times can be used to specify the value at all other times. The power spectrum of a sine wave is a delta function. These considerations suggest that strongly peaked power spectra correspond to highly predictable time series, whereas flat or near-flat power spectra correspond to weakly predictable time series.
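The two extreme cases above can be made concrete with a toy "peakiness" score. The following sketch uses the sum of squares of a normalized discrete spectrum; this discretization and the particular spectra are our illustrative choices and are not the paper's measure (24):

```python
import numpy as np

# Sum of squares of a unit-sum discrete spectrum as a peakiness score:
# by the Cauchy-Schwarz inequality it is minimized by a flat spectrum
# (value 1/n) and maximized when all power sits in a single bin
# (value 1), a discrete stand-in for a delta function.
def peakiness(power):
    p = np.asarray(power, dtype=float)
    p = p / p.sum()              # normalize to unit total power
    return float((p**2).sum())

n = 64
flat = np.ones(n)                        # white-noise-like spectrum
delta = np.zeros(n); delta[10] = 1.0     # sine-wave-like spectrum
lorentz = 1.0 / (0.05**2 + np.linspace(-1, 1, n)**2)  # peaked spectrum
```

The flat spectrum attains the minimum score 1/n, the single-bin spectrum attains the maximum score 1, and the peaked-but-smooth spectrum falls strictly between.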
It is perhaps worth mentioning that the above definition of peakiness is not the only one possible. For instance, the Burg entropy is defined as the integral of the log of the power spectrum, and it is known (Priestley 1981, p. 604) that its maximum value, for all power spectra with the same total power, is achieved when the spectrum is a constant (i.e., flat). We propose (24) as a measure of flatness because it has certain practical and theoretical advantages over other definitions; namely, it can be estimated from second-order moments alone and can be decomposed into components ordered by their APT (the latter point will be demonstrated in Part II).
These considerations clarify the error in relating predictability to the location of spectral peaks. Returning to the ENSO example mentioned in the introduction, we note that the Niño-3 index is tolerably fit by the autocorrelation function (19) with −1/a = 16 months and 1/ω0 = 48/(2π) months, as shown in Fig. 2 (we do not recommend this fitting procedure in general; it is used here primarily for illustration). These parameter values give S = 9.5 months, which differs considerably from the period of the spectral peak, 2π/ω0 = 48 months. Moreover, in the limit of large ω0, S changes only modestly, approaching −1/(2a) = 8 months. Thus, the location of the peak is almost irrelevant to the predictability time scale. The reason is that predictability depends on the peakiness of the power spectrum, which is controlled more strongly by the damping time −1/a than by the location of the peak itself (ω0).
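The quoted values can be checked numerically. The sketch below assumes (our reading of the univariate relations in this paper) that the APT of a regression forecast is S = 2∫₀^∞ ρ²(τ) dτ with the damped-oscillation autocorrelation ρ(τ) = e^{aτ} cos(ω₀τ) of (19), with a < 0 denoting damping:

```python
import numpy as np

# Check the APT quoted for the Nino-3 fit: damping time -1/a = 16 months,
# spectral peak period 2*pi/w0 = 48 months.
a = -1.0 / 16.0                  # per month; damping time -1/a = 16 months
w0 = 2.0 * np.pi / 48.0          # per month; peak period 48 months

dt = 0.01
tau = np.arange(0.0, 400.0, dt)  # integrand decays like exp(-tau/8)
rho2 = (np.exp(a * tau) * np.cos(w0 * tau))**2
S = 2.0 * dt * (rho2.sum() - 0.5 * rho2[0] - 0.5 * rho2[-1])  # trapezoid rule

# Closed form of the same integral:
# S = -1/(2a) - a / (2*(a^2 + w0^2))
S_exact = -1.0 / (2.0 * a) - a / (2.0 * (a**2 + w0**2))
```

Both evaluations give S ≈ 9.5 months, while the large-ω₀ limit −1/(2a) = 8 months shows that the 48-month peak period itself contributes little to the time scale.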
6. Multivariate linear stochastic models

7. Normal case



8. Relation between the power spectrum and predictability


9. Bounds on APT









10. APT as a solution to an alternative stochastic model
A useful concept in the application of information-theoretic predictability measures to normally distributed variables is that techniques for analysis of predictability are often equivalent to techniques for analysis of variance applied to whitened variables (Schneider and Griffies 1999; DelSole and Tippett 2007). Here we show how APT fits in this framework. Generalized stability analysis relates the variance of linear dynamics forced by homogeneous isotropic stochastic forcing to the stability properties of the linear dynamics (Farrell and Ioannou 1996; Tippett and Marchesin 1999). We show that, analogously, the variance of the whitened linear dynamics excited by homogeneous isotropic stochastic forcing is related to the predictability of the system as measured by APT.
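This equivalence can be illustrated in a discrete-time analogue (our construction, not the paper's continuous-time model): for x_{t+1} = A x_t + noise with noise covariance Q, the accumulated Mahalanobis signal equals the total variance of the whitened dynamics driven by identity-covariance white noise. The matrices A and Q below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.8, 0.5],
              [0.0, 0.6]])       # stable, nonnormal dynamical operator
Q = np.array([[1.0, 0.2],
              [0.2, 1.5]])       # noise covariance
K = A.shape[0]

# Climatological covariance: Sigma = A Sigma A^T + Q
Sigma = solve_discrete_lyapunov(A, Q)
Sigma_inv = np.linalg.inv(Sigma)

# Accumulated Mahalanobis signal: sum over leads t >= 0 of
# tr(A^t Sigma (A^t)^T Sigma^{-1}) / K  (converges geometrically)
acc, At = 0.0, np.eye(K)
for _ in range(2000):
    acc += np.trace(At @ Sigma @ At.T @ Sigma_inv) / K
    At = A @ At

# Whitened operator A_w = L^{-1} A L, with Sigma = L L^T (Cholesky)
L = np.linalg.cholesky(Sigma)
A_w = np.linalg.solve(L, A @ L)

# Total variance of the whitened dynamics forced by identity white noise:
# P = A_w P A_w^T + I
P = solve_discrete_lyapunov(A_w, np.eye(K))
total_var = np.trace(P) / K
```

The two quantities agree to machine precision, mirroring the continuous-time statement that APT is the total variance of the whitened dynamics under homogeneous isotropic forcing.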



11. Example
12. Summary and discussion
This paper introduced average predictability time (APT) for characterizing the overall predictability of a system. The APT is defined as the integral of the Mahalanobis signal with respect to lead time. As such, APT measures an inherent property of a system that is independent of the basis set used to represent the system. The appropriateness of APT was illustrated with a one-dimensional stochastic model in which the APT depends inversely on the damping rate, consistent with the intuition that systems with stronger damping have less memory and hence are less predictable. Furthermore, the APT is comparable to integral time scales in appropriate cases. However, if the autocorrelation is a damped oscillation, then the integral time scale defined in (21) becomes arbitrarily small for large frequencies, which is unrealistic, whereas APT is always within 50% of the e-folding time. Thus, the APT provides a more suitable measure of time scale than the integral time scale (21). For nonlinear or non-Gaussian processes, the Mahalanobis signal may decay very differently from that of linear stochastic systems (e.g., it may exhibit long tails). Nevertheless, just as the correlation coefficient has meaning even if the variables are not linearly related, the APT still has meaning even if it cannot be interpreted directly as a time scale.
The APT also clarifies the connection between predictability and power spectra. Specifically, if a process is stationary, then the APT of a linear regression forecast is proportional to the integral of the square of the normalized power spectrum. The appropriateness of this relation can be seen from the fact that the latter quantity can be decreased simply by replacing the power in any spectral band by its mean value within the band; that is, the quantity decreases as the spectrum becomes flatter. If the time series is band-limited, then the minimum value occurs when the system comprises a set of independent and identically distributed white noise processes, which is obviously the least predictable system. In essence, predictability is related to the width of spectral peaks, with strong narrow peaks associated with high predictability and nearly flat spectra associated with low predictability. As extreme examples, white noise has a constant power spectrum and is minimally predictable, whereas a sine wave has a delta-function power spectrum and is perfectly predictable. Expressing the APT in terms of the power spectra rigorously quantifies this intuitive relation.
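The flattening argument above can be demonstrated directly. The sketch below replaces the power in a spectral band by its band mean and shows that the sum of the squared normalized spectrum decreases; the Lorentzian-like spectrum and the band location are our illustrative choices:

```python
import numpy as np

# Sum of squares of a unit-sum discrete spectrum; replacing any band by
# its mean preserves total power but strictly decreases this quantity
# whenever the band is not already flat (a discrete version of the
# flattening argument).
def squared_spectrum_sum(power):
    p = np.asarray(power, dtype=float)
    p = p / p.sum()              # normalize to unit total power
    return float((p**2).sum())

w = np.linspace(-np.pi, np.pi, 256)
spectrum = 1.0 / (0.1**2 + w**2)           # strongly peaked spectrum

flattened = spectrum.copy()
band = slice(100, 156)                     # a band straddling the peak
flattened[band] = flattened[band].mean()   # flatten within the band
```

Because the band mean preserves the total power, the normalization is unchanged, and the decrease comes entirely from the spectrum becoming flatter.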
Closed form expressions for the APT of linear, multivariate stochastic models were derived. If the dynamical operator and noise covariance matrix are both diagonal, then—remarkably—the APT depends only on the damping rates and equals the average APT of the individual eigenmodes. Because the APT is invariant to linear transformation, this result is true for any system for which there exists a basis set in which both the dynamical operator and noise covariance matrix are diagonal. We further show that this particular system minimizes the APT compared with all linear stochastic models with the same dynamical operator. Thus, the least predictable system can be transformed into a set of uncoupled, independent stochastic models. It follows that systems that are irreducibly coupled have more predictability than those that are fundamentally uncoupled. Simply put, coupling enhances predictability. APT rigorously justifies this intuitive notion.
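Both claims of this paragraph can be checked in a discrete-time analogue x_{t+1} = A x_t + noise (our construction; the matrices and the coupled-versus-uncoupled comparison below are our illustrations, not examples from the paper):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# APT analogue: accumulate the Mahalanobis signal over all lead times for
# the discrete model x_{t+1} = A x_t + noise with noise covariance Q.
def apt(A, Q, n_leads=4000):
    A, Q = np.atleast_2d(A), np.atleast_2d(Q)
    K = A.shape[0]
    Sigma = solve_discrete_lyapunov(A, Q)   # climatological covariance
    Sigma_inv = np.linalg.inv(Sigma)
    acc, At = 0.0, np.eye(K)
    for _ in range(n_leads):
        acc += np.trace(At @ Sigma @ At.T @ Sigma_inv) / K
        At = A @ At
    return acc

# Claim 1: with diagonal A and Q, the multivariate APT equals the average
# of the univariate APTs of the individual components.
apt_joint = apt(np.diag([0.9, 0.5]), np.diag([1.0, 2.0]))
apt_avg = 0.5 * (apt([[0.9]], [[1.0]]) + apt([[0.5]], [[2.0]]))

# Claim 2: coupling the components (same eigenvalues, isotropic noise)
# yields a larger APT than the uncoupled system.
A_coupled = np.array([[0.9, 1.0],
                      [0.0, 0.5]])
apt_uncoupled = apt(np.diag([0.9, 0.5]), np.eye(2))
apt_coupled = apt(A_coupled, np.eye(2))
```

In this example the diagonal system's APT matches the component average exactly, and the coupled operator, though it has the same eigenvalues, has strictly larger APT, consistent with the statement that coupling enhances predictability.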
As reviewed in DelSole and Tippett (2007), applying the whitening transformation to a system allows familiar concepts from analysis of variance and generalized stability analysis to be applied directly to predictability analysis. We show that this equivalence carries over to APT because, surprisingly, APT itself can be interpreted as the total variance of an alternative stochastic model. The alternative stochastic model is driven by homogeneous white noise and has a dynamical operator equal to the whitened dynamical operator of the original stochastic model. This connection allows one to anticipate that the lower bound for APT occurs when the whitened dynamical operator is normal, which in turn is equivalent to the condition that there exists a basis set in which both the dynamical operator and noise covariance matrix are diagonal—consistent with the lower bound of APT discussed above.
The remarkable equivalence noted above is worth further reflection. Loosely speaking, one way to understand the stability of a dynamical system is to examine its response to white noise forcing. The more variance produced, the less stable the system. This approach is the basis of generalized stability theory, which in turn utilizes concepts from dynamical stability theory. In defining APT, we proposed something apparently different: we proposed that the overall predictability of a system can be quantified by the integral of predictability with respect to lead time, reasoning that more predictable systems have larger integrals. Surprisingly, the integral of predictability is itself the total variance of the system consisting of the whitened dynamical operator forced by white noise. APT measures the stability of the whitened dynamics. Thus, the two methods for understanding predictability turn out to be fundamentally equivalent. An important benefit of this equivalence, however, is that the whitened dynamical operator is unique up to a variance-preserving unitary transformation. Thus, the framework proposed in this paper imposes a particular stochastic model for studying predictability, which is essential for applying generalized stability theory. In contrast, the latter theory does not itself impose a norm or coordinate system; rather, it provides a framework for understanding variance after the norm and coordinate system have been chosen.
An upper bound on the APT of linear stochastic models also was derived. The upper bound is proportional to the sum of inverse eigenvalues of the symmetric part of the whitened dynamical operator. The latter operator arises frequently in the investigation of instantaneous growth rates of variance. It is interesting that the upper bound on APT (or the predictability at any lead time) is related to the predictability at very short time scales in linear stochastic models. The upper bounds imply that as the short time predictability decreases, so too does the maximum possible APT. This intuitive notion is used widely in predictability studies as a justification for drawing conclusions about overall predictability from the characteristics of error growth at short times. Two key points about this relation are worth emphasizing. First, although intuitive, we are aware of no rigorous demonstration of a connection between predictability at short times and overall predictability. Thus, our result provides such a demonstration, at least for linear stochastic models. Second, the connection between short time error growth and APT is most direct when the error growth is measured in whitened space. In other words, our result clarifies the existence of a preferred norm for relating predictability at short and long lead times. Attempts to draw conclusions about long-term predictability from short-term error growth in other norms may actually be misleading (see DelSole and Tippett 2008).
A conjecture for an upper bound for APT follows from the conjecture of Tippett and Chang (2003), which states that out of all linear stochastic models with the same dynamical eigenvalues, the model with rank-1 noise covariance matrix (with nonzero diagonal elements) has maximum predictability. This conjecture can be motivated by the fact that if minimum predictability occurs when all eigenmodes are forced independently, then perhaps maximum predictability occurs when all eigenmodes are forced in perfect correlation. The resulting upper bound depends only on the eigenmode damping rates; in particular, the upper bound is independent of the detailed structure of the forcing, provided all modes are excited.
It should be recognized that APT can be applied to more general forecast models than linear stochastic models. For instance, APT can be evaluated for regression models with physically different variables for predictors and predictands. Interestingly, if only a single variable is predicted, but more than one predictor is used, then the APT relation (17) still is applicable, but the parameter ρτ denotes the multiple correlation. This result is discussed further in Part II of this paper.
We note that other measures of predictability can be used to define APT, and these alternative measures may be attractive in some cases. Mutual information in particular is appropriate for non-Gaussian or nonlinear systems. Furthermore, mutual information is integrable for linear stochastic systems, even though it is unbounded as lead time approaches zero for perfect initial conditions. For linear stochastic models with Gaussian white noise, the APT derived from mutual information turns out to be proportional to the APT (42) for normal whitened dynamical operators. However, the relation between this alternative form of APT and power spectra is obscure, and the decomposition of this form of APT is not straightforward.
In climate systems, the APT of different components can vary greatly owing to the widely different time scales associated with land, ice, and ocean processes. Thus, characterizing the predictability of the full climate system by a single number might seem to be a gross simplification. However, the APT can be decomposed into uncorrelated components that can be ordered by their fractional contribution to APT, such that the first component maximizes APT, the second maximizes APT subject to being uncorrelated with the first, and so on. This decomposition therefore allows the full spectrum of predictability times to be diagnosed. This decomposition can be used to study predictability on different time scales without time filtering, provided the predictabilities on different time scales are characterized by different spatial structures. This decomposition and its practical implementation are discussed in Part II.
Acknowledgments
We thank three anonymous reviewers for detailed comments that led to an improved manuscript. This research was supported by the National Science Foundation (ATM0332910), the National Aeronautics and Space Administration (NNG04GG46G), and the National Oceanic and Atmospheric Administration (NA04OAR4310034). MKT is supported by a grant/cooperative agreement from the National Oceanic and Atmospheric Administration (NA05OAR4311004). The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies.
REFERENCES
Cohen, J. E., 1988: Spectral inequalities for matrix exponentials. Linear Algebra Appl., 111 , 25–28.
DelSole, T., 2001: Optimally persistent patterns in time-varying fields. J. Atmos. Sci., 58 , 1341–1356.
DelSole, T., 2004: The necessity of instantaneous optimals in stationary turbulence. J. Atmos. Sci., 61 , 1086–1091.
DelSole, T., and M. K. Tippett, 2007: Predictability: Recent insights from information theory. Rev. Geophys., 45 , RG4002. doi:10.1029/2006RG000202.
DelSole, T., and M. K. Tippett, 2008: Predictable components and singular vectors. J. Atmos. Sci., 65 , 1666–1678.
DelSole, T., and M. K. Tippett, 2009: Average predictability time. Part II: Seamless diagnoses of predictability on multiple time scales. J. Atmos. Sci., 66 , 1188–1204.
Farrell, B. F., and P. J. Ioannou, 1996: Generalized stability theory. Part I: Autonomous operators. J. Atmos. Sci., 53 , 2025–2040.
Fukunaga, K., 1990: Introduction to Statistical Pattern Recognition. 2nd ed. Academic Press, 591 pp.
Gelb, A., Ed., 1974: Applied Optimal Estimation. MIT Press, 382 pp.
Horn, R. A., and C. R. Johnson, 1999: Topics in Matrix Analysis. Cambridge University Press, 607 pp.
Ioannou, P. J., 1995: Nonnormality increases variance. J. Atmos. Sci., 52 , 1155–1158.
Julian, P. R., and R. M. Chervin, 1978: A study of the Southern Oscillation and Walker Circulation phenomenon. Mon. Wea. Rev., 106 , 1433–1451.
Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. J. Atmos. Sci., 59 , 2057–2072.
Kwon, W. H., Y. S. Moon, and S. C. Ahn, 1996: Bounds in algebraic Riccati and Lyapunov equations: A survey and some new results. Int. J. Control, 64 , 377–389.
Lancaster, P., and M. Tismenetsky, 1985: The Theory of Matrices: With Applications. Academic Press, 570 pp.
Leung, L-Y., and G. R. North, 1990: Information theory and climate prediction. J. Climate, 3 , 5–14.
Lorenz, E. N., 1969: The predictability of a flow which possesses many scales of motion. Tellus, 21 , 289–307.
Majda, A., R. Kleeman, and D. Cai, 2002: A mathematical framework for quantifying predictability through relative entropy. Methods Appl. Anal., 9 , 425–444.
Priestley, M. B., 1981: Spectral Analysis and Time Series. Academic Press, 890 pp.
Schneider, T., and S. Griffies, 1999: A conceptual framework for predictability studies. J. Climate, 12 , 3133–3155.
Tippett, M. K., and D. Marchesin, 1999: Upper bounds for the solution of the discrete algebraic Lyapunov equation. Automatica, 35 , 1485–1489.
Tippett, M. K., and P. Chang, 2003: Some theoretical considerations on predictability of linear stochastic dynamics. Tellus, 55 , 148–157.
Weiss, J. B., 2003: Coordinate invariance in stochastic dynamical systems. Tellus, 55A , 208–218.
APPENDIX
A Cauchy–Schwarz Inequality
Fig. 1. (top) Autocorrelation function and (bottom) corresponding power spectrum of the damped oscillation function (19), for the values of the parameters indicated in the figures.
Citation: Journal of the Atmospheric Sciences 66, 5; 10.1175/2008JAS2868.1
Fig. 2. Autocorrelation function of the Niño-3 index (downloaded from http://www.cpc.noaa.gov/data/indices/) during the period 1950–2007 (histogram), and a fit of the autocorrelation to (19) (solid line). The horizontal dashes indicate the 5% significance thresholds of the correlation coefficient.
Fig. 3. (a) Mahalanobis signal (solid) and upper bound (66) (dashed) as a function of lead time τ for c = 1 (dark) and c = 4 (light) in the 2 × 2 example. The lower bound (63) (lower filled circles) and upper bound conjecture (72) (upper filled circles) are independent of c. (b) The integrated Mahalanobis signal (solid), lower bound (62) (lower filled circles), upper bound conjecture (72) (upper filled circles), and upper bound (65) (dashed) as a function of the coupling parameter c.