This paper reexamines a procedure proposed by Shukla and Gutzler for estimating potential seasonal predictability. Certain subtle and unverified assumptions required for the method to work are clarified, and Monte Carlo experiments are used to demonstrate that these assumptions are adequate even for autocorrelated processes in typical applications, provided the effective time scale T0 of the stochastic process is known. This paper highlights the fact that the time scale T0 is difficult to estimate reliably (as noted in other papers) and can be biased by an order of magnitude. This bias can seriously compromise the reliability of the Shukla–Gutzler method.
Variability arising from the chaotic, turbulent nature of the atmospheric system irrespective of changes in radiative forcing or boundary conditions is often called weather noise. It is generally believed that weather noise is predictable on time scales less than a month (Lorenz 1963; Shukla 1981). Beyond a month, slowly varying components of the climate system, such as sea surface temperature, soil moisture, or sea ice thickness, are predictable owing to their longer memory. If these slower components influence the atmosphere, then they can give rise to atmospheric predictability beyond a month. Such predictability arising from “boundary forcings” is called potential predictability. The degree to which variability of seasonal means exceeds variability due to weather noise determines the degree of potential predictability.
There are two fundamentally different ways of estimating potential predictability. The first is to perform idealized experiments with a general circulation model, as pioneered by Chervin (1986) and continuing to the present day (Zwiers 1987; Kumar and Hoerling 1995; Phelps et al. 2004, to name a few). Of course, this approach is limited by the ability of the dynamical model to capture all dynamical processes that are relevant to the potential predictability. The second is to estimate potential predictability directly from observational data, as pioneered by Madden (1976) and continuing to the present day (Zheng et al. 2000; Feng et al. 2011, 2012). Of course, the empirical approach is limited by the statistical assumptions invoked to separate weather noise from boundary forced variability. A comparison between various empirical estimates of potential predictability has been conducted recently by Feng et al. (2012, manuscript submitted to J. Geophys. Res.). In the course of this comparison, an error in the method proposed by Shukla and Gutzler (1983) was discovered. The purpose of this paper is to correct this error, and to clarify certain aspects of the method proposed by Shukla and Gutzler (1983).
In this section we summarize the mathematical basis for potential predictability and discuss a significance test for potential predictability that serves as the foundation of the Shukla–Gutzler method. Let X be a stationary stochastic process with mean μX and variance . We call this process “weather noise.” The observed variability is assumed to contain an additional component of variability that is independent of, and varies on much longer time scales than, the stochastic process X. This additional component of variability is associated with boundary forcing. Let different boundary forcings be identified with the indices b = 1, 2, … , B. In the context of seasonal predictability, boundary forcing is often identified with the state of the sea surface temperature, in which case the index b is effectively the “year.” The above assumptions imply that the observed variable O on the dth day can be modeled by
where Xd,b denotes the stochastic process X, but with an extra index b to indicate an independent realization of X for each value of b. More precisely, we assume that Xd,b is independent of when b ≠ b′. We further assume that Xd,b is independent of μb, and has a mean, variance, and autocorrelation that are independent of b. The parameter μb is the change in population mean due to boundary forcing and is assumed to be uncorrelated in b. Because the two terms on the right-hand side of (1) are independent, it follows that the variance of daily observations is
In other words, the variance of the observed daily variable can be decomposed into two terms: a boundary forced term var(μb) and a weather noise term .
Model (1) serves as the foundation in many studies for estimating potential predictability based on observations (Madden 1976; Zwiers 1987). Also, the use of additive stochastic forcing to understand and model dynamical systems can be formally justified in certain systems for which there exists a strong separation in time scales (Majda et al. 2001). In the context of seasonal predictability, the time scale of weather variability (e.g., 5 days) is well separated from ENSO variability (e.g., 6 months), suggesting that the model is plausible. Nevertheless, the assumption that weather noise is independent of boundary forcing is not likely to be strictly true in realistic systems; for instance, the variability of weather may vary with sea surface temperature. Studies based on large ensemble sizes find that the probability of extremes can depend on sea surface temperature (Sardeshmukh et al. 2000; Peng and Kumar 2005). Recently, Feng et al. (2012) tested the independence assumption within an analysis of covariance framework and found that the assumption holds well for seasonal predictability. For the purpose of this paper, we proceed under the assumption that (1) is an adequate model of observations and consider a particular estimate of potential predictability when the model is true.
In the context of (1), the observed variable Od,b is said to have no potential predictability when var(μb) = 0. It follows that the hypothesis of no potential predictability is equivalent to the following hypothesis:
The standard procedure for testing hypothesis (3) is analysis of variance (ANOVA), provided that Xd,b in model (1) is uncorrelated in time (i.e., is a white noise process). It is well known that temperature is autocorrelated on daily time scales and consequently does not satisfy the assumptions needed to apply ANOVA. Nevertheless, ANOVA provides the basis for much of potential predictability analysis and, hence, is worth reviewing briefly.
An estimate of the boundary forced term μb is the time mean of Od,b. Given daily values O1,b, O2,b, … , OD,b, where D is the number of days, the time mean is defined as
Assuming Od,b and are independent for b ≠ b′, an unbiased estimate of the variance of time means is
where the “grand mean” is defined as
To test the null hypothesis in (3), we compare the estimated variance of time means in (5) to the variance expected under the assumption that the null hypothesis is true. This alternative estimate is based on the intraseasonal variance defined by
where dofI is the “degrees of freedom” chosen to yield an unbiased estimate of intraseasonal variance. In ANOVA, which assumes Xd,b is white noise, the correct degrees of freedom is
If the null hypothesis in (3) is true and Xd,b is Gaussian white noise, then the statistic
3. Accounting for autocorrelations
To account for autocorrelations in Xd,b, we start with some well-established facts. The time mean of the weather noise is defined as
Let the autocorrelation function of Xd,b be ρτ, where τ is time lag between days. It is well known (Brockwell and Davis 1991, chapter 7) that the time mean is an unbiased estimate of the mean μX with variance:
The T0 often is called the effective time scale of the stochastic process. No b index appears on the right-hand side of (11) because the statistics of X are independent of b. The governing equation for the time mean can be derived from (1) as
Since the two terms on the right-hand side of (13) are independent, it follows that
Thus, given the above assumptions, the variance of time means can be decomposed into two terms: a boundary forced term var(μb) and a weather noise term .
If we were to consider a single realization of the weather noise process, denoted Xd, it is well known (see, e.g., Trenberth 1984) that an unbiased estimate of is
where we use the caret symbol ˆ to denote sample quantities. For the more general case of Xd,b, in which different realizations for different b are independent, we have the estimate
In general, we do not observe Xd,b directly. However, it is readily verified from (1) that
Substituting this expression into (16) gives
Comparison with (7) implies that the degrees of freedom required to account for autocorrelations in weather noise is
To derive a test statistic, we compute the ratio of two independent, chi-squared distributed quantities. If the null hypothesis in (3) is true, then according to (13), O and X differ by a constant, in which case . Moreover, different realizations of can be considered to be independent and identically distributed. It follows that the quantity
has a chi-squared distribution with B − 1 degrees of freedom. Unfortunately, the quantity
does not have an exact chi-squared distribution because Xd,b is not white noise (i.e., not independently and identically distributed), nor is it independent of , when Xd,b is autocorrelated (Thiébaux and Zwiers 1984, see their appendix). Nevertheless, for large sample sizes, we assume that this quantity has an approximate chi-squared distribution (this will be checked shortly). A rational approach for choosing the degrees of freedom is to ensure that the mean of θI is consistent with the mean of the chi-square distribution. Recall that the mean of a chi-squared distribution equals its degrees of freedom. Since defined in (18) is unbiased, it follows that θI has a mean of B(D − T0). We therefore assume that θI has an approximate chi-squared distribution with B(D − T0) degrees of freedom. Assuming that θM and θI are approximately independent, and assuming that the null hypothesis (3) is true, then the statistic
has an approximate F distribution with B − 1 and B(D − T0) degrees of freedom.
4. The Shukla–Gutzler method
This choice differs from the proper dofI defined in (19) by a factor of T0, implying that this choice does not produce an unbiased estimate. Shukla and Gutzler (1983) also propose estimating intraseasonal variance using standard formulas [e.g., see their equation for σ2 after their (3)], but the above considerations imply that this estimate will also be biased.
It turns out that the F statistic in (22) corresponds exactly to Eq. (1) of Shukla and Gutzler (1983). Therefore, the error in the degrees of freedom does not affect the value of the F statistic proposed by Shukla and Gutzler (1983), but it does affect the significance level with which the F value is compared. Moreover, since , the incorrect degrees of freedom leads to conservative significance levels. Hence, after correcting the degrees of freedom, we expect to identify more potential predictability.
To assess the various approximations, we perform the following Monte Carlo experiments. First, we generate time series from an autoregressive (AR) model:
where wd,b is Gaussian white noise with zero mean and unit variance. As is well known (Box et al. 2008), the resulting random variable has autocorrelation function ρτ = φτ and variance:
In practice, the initial condition X0,b was chosen to be 0 and the first 100 time steps were discarded to avoid the influence of the initial condition. We choose parameters consistent with Shukla and Gutzler (1983), namely, D = 31 days and B = 15 yr.
Perhaps the most questionable assumption in accounting for autocorrelations discussed in section 3 is that the estimated intraseasonal variability, as measured by θI in (21), has a chi-squared distribution. To assess this assumption, we generated a synthetic time series from the autoregressive model in (24) and used this time series to evaluate θI. This procedure was repeated 1000 times for three separate values of the autoregressive parameter φ. The results of these experiments are shown as histograms in Fig. 1. Superimposed on the histograms are two chi-squared distributions: one with degrees of freedom given by (19), which is dofI = 15 × (31 − T0) (solid curve); and another with degrees of freedom given by (23), which is (dotted curve). In each case, the population value of T0, derived from the population autocorrelation function ρτ = φτ, is used. The figure clearly shows that the degrees of freedom dofI gives a much better match to θI than . There is some evidence that the chi-squared assumption breaks down for large φ, but nevertheless the chi-squared distribution with corrected degrees of freedom gives a better description of the Monte Carlo results than with the chi-squared distribution with Shukla–Gutzler degrees of freedom.
Although the difference between the degrees of freedom in (19) and (23) is large, this fact may not lead to substantially different tests of potential predictability. To understand this, let us take a specific case of φ = 0.7, which corresponds to a time scale of approximately 5 days and degrees of freedom for dofI and of 390 and 78, respectively. Although these values differ by a factor of 5, the F distribution is relatively insensitive to this change in the degree of freedom, because both numbers are relatively large (e.g., above 50). This point is illustrated in Fig. 2, which shows the histogram of the F ratio in (22) superposed with two analytic F distributions, where the first degree of freedom is B − 1 for both, but the second degree of freedom is either dofI (solid curve) or (dotted curve). The figure clearly shows that the F distribution is nearly the same even though the degrees of freedom are very different. Moreover, the standard F distribution matches the Monte Carlo results even for large φ. Thus, despite the large change in degrees of freedom, there is only a slight change in the significance threshold of the F distribution, because of the relatively large sample size. However, the impact of the change of degrees of freedom will be more significant for smaller sample sizes. As a possible example, the error may be important for estimating potential predictability of a short run from a very high-resolution atmospheric general circulation model.
5. Practical issues
The comparisons illustrated in Figs. 1 and 2 show that the theoretical distributions derived in section 3 provide a reasonable description of the variability of a first-order autoregressive process, provided the population parameter T0 is known. Unfortunately, this parameter is unknown in practical applications. One natural idea is to substitute an estimate of the time scale T0 derived from the sample autocorrelation function. When this substitution is done, the histogram of F values derived from the Monte Carlo experiments no longer match the predicted distribution (not shown). This result implies that estimation of the time scale T0 is a critical step in the Shukla–Gutzler method. Thiébaux and Zwiers (1984) examine the problem of estimating T0 and conclude that it “is quite difficult to estimate reliably.” Trenberth (1984) has also shown that, in the absence of potential predictability, improved estimates of the autocorrelation function can be obtained by subtracting the mean of the entire dataset, rather than subtracting a different mean for each subsample (i.e., season). Unfortunately, this approach is problematic if potential predictability exists, for then the potentially predictable signal contaminates the autocorrelation function and leads to overestimates of T0.
We have investigated a variety of approaches to estimating T0, but none of them proved satisfactory. Nevertheless, some robust results were obtained that are worth reporting. These results also can be found in other papers (Thiébaux and Zwiers 1984; Trenberth 1984), but do not appear to be widely appreciated. The above considerations suggest at least four methods for estimating T0:
- ACF-SEAS: Estimate T0 by substituting in (12) the sample autocorrelation function , where the covariances are computed from residuals from the seasonal mean:
- ACF-GRAN: Estimate T0 by substituting in (12) the sample autocorrelation function , where covariances are computed from residuals relative to the grand mean:
The resulting estimates computed from 1000 realizations of the autoregressive process for three different values of the AR parameter are shown in Fig. 3. The figure reveals that estimating the autocorrelation function (ACF) from residuals with respect to the season mean systematically underestimates the true time scale, often by a factor of 3 or more. This bias can be reduced (but not eliminated) by fitting seasonal anomalies to a first-order autoregressive (AR1) model and then determining the time scale of the resulting AR1 model. Similar procedures applied to residuals with respect to the grand mean give even more accurate results, consistent with Trenberth (1984).
To explore the case of potential predictability, we add to Xd,b a signal term of the following form:
where μb is a random variable drawn independently for each b from a normal distribution with zero mean and variance SNR/(1 − φ2), where SNR is a specified signal-to-noise ratio. The corresponding estimated values of T0 are shown in Fig. 4. As anticipated, estimations based on residuals with respect to the grand mean consistently overestimate the true time scale, by as much as a factor of 2–10, depending on how the autocorrelation is estimated. In addition, estimates based on residuals with respect to the seasonal mean systematically underestimate the true time scale, as found in the case with no potential predictability.
The above biases in estimating T0 have important implications on the Shukla–Gutzler method. Specifically, when T0 is underestimated, then the F statistic in (22) is inflated, and the degrees of freedom are overestimated, both of which cause the Shukla–Gutzler method to overstate the degree of predictability. However, overestimation of the degrees of freedom has marginal impact on the significance test, as shown in the previous section. Much more important is the fact that the F statistic is (22) is inversely proportional to T0, so a factor of 2–10 error in T0 leads to a similar error in F, which could easily compromise the significance test. For a perspective on the seriousness of this bias in practical applications, see X. Feng et al. (2011, unpublished manuscript).
6. A spectral interpretation of the Shukla–Gutzler method
To understand why the time scale T0 tends to be underestimated, it is useful to consider a spectral interpretation. Furthermore, it is useful to consider the more general estimate
where M < D. Shukla and Gutzler (1983) proposed this estimate and suggested using D = 30 days and an upper limit of summation between M = 15 and M = 30 days. Given a single time series of length N, the sample autocorrelation function is related to the normalized periodogram at frequency ω by a discrete Fourier transform pair (Cryer and Chan 2010):
where λM(ω) is a set of spectral weights, called the spectral window:
The result in (34) shows that time scale T0 estimated from (31) can be interpreted as a smoothed periodogram estimate of the power spectrum. The spectral window for selected values of M is shown in Fig. 5. We see that while the window applies the largest weights to the periodogram at low frequencies, it also applies negative weights at moderately low frequencies. Negative weights are problematic because they lead to underestimates of the true time scale T0, and in some cases to negative values of T0. The negative weights are an artifact of the sharp cutoff in the summation in (31) and can be avoided by applying weights that gradually decrease to zero (Priestley 1981, p. 434). This approach, in turn, can be interpreted as modifying the spectral window λM(ω), which in turn can be viewed as a generalization of the method of Madden (1976), which puts all weight on the lowest possible resolved frequency.
7. Summary and conclusions
This paper reexamined a procedure proposed by Shukla and Gutzler (1983) for testing potential predictability of seasonal means using techniques based on analysis of variance. Certain subtle and unverified assumptions that are required for the Shukla–Gutzler method to work, such as the fact that certain quantities follow the chi-square distribution and that the time mean is independent of its residuals, even for autocorrelated data, were clarified. Moreover, Monte Carlo experiments demonstrate that these assumptions are adequate even for autocorrelated data, provided the effective time scale T0 is known. We derive the same F statistic in (22) for testing potential predictability as Shukla and Gutzler (1983), but find that the degrees of freedom differ from that proposed by Shukla and Gutzler (1983). On the other hand, in many practical applications, the degrees of freedom are already so large that the correction has relatively little effect on the overall significance test. Unfortunately, the effective time scale is generally not known and hence must be estimated from data. This estimation problem has received considerable attention in previous papers with no clear solution. Our Monte Carlo experiments highlight the fact that estimates of T0 can be biased by a factor of 3–10, depending on the method. The question of how the potential predictability estimated from the corrected Shukla–Gutzler method compares with other estimates will be considered in Feng et al. (2012, manuscript submitted to J. Geophys. Res.).
Support is gratefully acknowledged from grants from the NSF (Grant 0830068), the National Oceanic and Atmospheric Administration (Grant NA09OAR4310058), and the National Aeronautics and Space Administration (Grant NNX09AN50G).