## Abstract

This paper proposes a new method for assessing potential predictability of seasonal means using a single realization of daily time series. Potential predictability is defined as variability in seasonal means that exceeds the variability due to weather stochastic processes. The proposed method is based on analysis of covariance and accounts for autocorrelation in daily time series and uncertainties in statistical parameters. The method is applied to reanalyzed daily surface air temperature and detects significant potential predictability over the oceans and equatorial land areas. Potential predictability is weaker and varies significantly with season over extratropical land areas, with the fraction of potentially predictable variance rarely exceeding 60%. The proposed method also produces an estimate of the potentially predictable component of seasonal means, which can be used to investigate the relation between potential predictability and possible boundary forcings. The results are generally consistent with previous studies, although a more detailed study will be made in a future paper.

## 1. Introduction

It is generally believed that the chaotic nature of the atmosphere limits useful weather predictions to less than a month (Lorenz 1963; Shukla 1981). At time scales beyond a month, slowly varying components of the climate system, called boundary forcings, have greater impact on atmospheric variations than initial condition errors. Consequently, the atmospheric response to boundary forcings provides the basis of potential predictability on seasonal time scales, where the term “potential” is used to highlight the possibility that the boundary forcings themselves may not be predictable (Madden 1976; Madden et al. 1999; Chervin 1986; Rowell et al. 1995; Phelps et al. 2004). Potential predictability of seasonal means has been assessed by idealizing interannual variability of seasonal means as a linear combination of weather noise variability that is unpredictable beyond a month, plus potentially predictable variability arising from boundary forcing (Madden and Shea 1978; Shukla and Gutzler 1983; Trenberth 1985; Zwiers 1987; Palmer and Anderson 1994; Kumar and Hoerling 1995; Zwiers 1996; Quan et al. 2004). The extent to which interannual variability exceeds weather noise variability determines the potential predictability.

A key quantity in potential predictability is the magnitude of weather noise variability, defined as the variance of seasonal means due to atmospheric processes that occur independently of boundary forcing. Although this quantity cannot be measured directly from observations, it can be estimated in general circulation models (GCMs). For instance, weather noise variability can be estimated by comparing GCM runs with climatological sea surface temperature (SST) to runs with interannually varying SST (Lau 1985; Chervin 1986). Alternatively, a GCM can be run with an ensemble of atmospheric states coupled to the same SST (Dix and Hunt 1995; Kumar and Hoerling 1995; Stern and Miyakoda 1995; Zwiers 1996; Rowell 1998; Sardeshmukh et al. 2000; Shukla et al. 2000; Phelps et al. 2004; Quan et al. 2004). These strategies have been used to demonstrate that potential seasonal predictability can arise from variability in sea surface temperature (SST; Rowell 1998; Phelps et al. 2004; Wu and Kirtman 2006; von Storch 2008), soil moisture (Koster et al. 2000; Yang et al. 2004; Dirmeyer 2005), sea ice (Honda et al. 1999), and snow cover (Gutzler and Preston 1997).

A fundamental limitation of estimates of weather variability from GCMs is that models are imperfect and hence give inaccurate, incomplete, or model-dependent estimates of weather noise variability. For instance, it is well established that soil moisture anomalies can persist for months and influence weather through their impact on evaporation and other surface energy fluxes, but the degree to which soil moisture affects precipitation varies considerably with model (Koster et al. 2004). Snow anomalies also can persist for months and impact surface air temperatures by modifying the reflective and absorptive properties of the land surface, but different models represent snow cover in different ways that can affect the predictability of surface air temperatures (Shongwe et al. 2007). Similarly, Arctic sea ice extent can persist for several months at certain times of the year and influence the general circulation by altering the surface temperature gradient, which in turn influences the strength of the westerlies through thermal wind balance (Alexander et al. 2004; Budikova 2009; Kumar et al. 2010; Blanchard-Wrigglesworth et al. 2011). However, predictions of interannual and decadal variations of Arctic sea ice extent are notoriously difficult with dynamical models and have been attempted only recently with statistical models (Lindsay et al. 2008). Even in the most extensively studied case of potential predictability from tropical Pacific SSTs, models still have difficulty reproducing observed El Niño–Southern Oscillation (ENSO) teleconnection patterns and can give widely differing results, especially in monsoonal regions (Yang and DelSole 2012). Finally, it seems unlikely that every important mechanism for potential predictability has been identified in the literature and is captured correctly in current models.

In light of the above limitations of estimating potential predictability with GCMs, it is desirable to develop alternative methods for estimating potential predictability that do not require accurate physical models of all relevant components of the climate system. Madden (1976) proposed estimating weather noise variability of seasonal means by extrapolating the power spectra derived from observed seasonal (e.g., 96 day) time series. Shukla (1983) raised the question of whether such an estimate overestimated the magnitude of weather noise variability because of the fact that boundary forcing may influence seasonal spectra. To ameliorate this issue, Shukla and Gutzler (1983) proposed instead using monthly (e.g., 30-day) time series to estimate weather noise variability. Shukla and Gutzler also proposed an analysis of variance technique that accounted for autocorrelation in the time series, but the available methods for estimating the autocorrelation function were problematic (Trenberth 1984a,b). Additionally, previous studies either adopt ad hoc formulations of the degrees of freedom for weather noise variability (Madden and Shea 1978; Shukla and Gutzler 1983) or neglect the uncertainty of parameter estimates in the statistical significance test, which may lead to bias in potential predictability. Despite these shortcomings, such methods have been used to investigate the potential predictability of the atmospheric circulation (Madden 1976; Shukla and Gutzler 1983; Trenberth 1985), temperature (Madden and Shea 1978; Nicholls 1981; Madden and Kidson 1997; Zheng et al. 2000), and rainfall (Singh and Kripalani 1986; Madden et al. 1999).

In view of the aforementioned shortcomings of statistical methods, this study aims to develop an improved statistical test of whether the interannual variance exceeds the variance due to weather variability. Such a test would provide an independent estimate of the regions and seasons for which boundary forcings have the potential to provide useful predictive skill on seasonal time scales and can be compared with the results of GCMs to validate them. Such comparison will not necessarily be straightforward, as differences between GCM and observational estimates may arise for a variety of reasons. For instance, observational estimates may be fundamentally limited by sample size or univariate assumptions, or GCM estimates may be fundamentally limited by deficient physics. Nevertheless, development of accurate estimates of potential predictability from observations would be the first step in such a comparison. The proposed test is effectively analysis of covariance (ANOCOVA), which can be interpreted as a combination of an autoregressive (AR) model and a model for performing analysis of variance (ANOVA). The proposed test has the advantage of taking time dependence into account like an AR model, while also accounting for uncertainty in parameter estimates, which previous studies often neglect. Besides estimating potential predictability, ANOCOVA produces an estimate of the potentially predictable signal that can be used to investigate relations between the signal and boundary forcings such as SST and soil moisture.

A description of the ANOCOVA model, including parameter estimation, order selection, test of potential predictability, variance estimates of signal and noise, partition of potential predictable variance, and model assumptions, is given in section 2. The reanalysis surface air temperature data are briefly described in section 3. The discussion of model results and relations between signal and SST and soil moisture are shown in section 4. Finally, the conclusions are given in section 5.

## 2. Analysis of covariance

### a. The model

A variable *T* is defined to be unpredictable if its distribution given a set of conditions *θ* is identical to its distribution irrespective of these conditions (Lorenz 1973):

This definition encompasses various forms of predictability. For instance, if the conditioning variable *θ* denotes observations of the climate system at some time prior to that of *T*, then this definition recovers predictability of the first kind (Lorenz 1973). If the conditioning variable denotes a hypothetical change in the time history of a greenhouse gas, then this definition recovers climate predictability. If the conditioning variable denotes the simultaneous state of an isolated component of the climate system, for example, sea surface temperature, soil moisture, or sea ice thickness, then this definition recovers potential predictability. The term “potential” is used because, while *T* may be predictable given *θ*, the condition *θ* itself might not be predictable given antecedent observations. It should be recognized that the distinction between these different forms of predictability may be ambiguous. For instance, if the change in greenhouse gas is included as part of the “earth system,” then predictability due to changes in greenhouse gas becomes potential predictability, rather than climate predictability.

The above definition implies that a necessary condition for a variable to be predictable is for the conditional and unconditional distributions to differ. Although two distributions could differ in a variety of ways, differences in means are of great interest because of their predictive utility. Therefore, we begin by investigating the hypothesis

where *E*(*T*|*θ _{i}*) denotes the conditional expectation of

*T*given

*θ*. The standard procedure for testing equality of means is analysis of variance. To apply ANOVA, we suppose we have

_{i}*D*independent realizations of the random variable

*T*under each condition

*θ*. Let

_{b}*T*

_{d}_{,b}denote the sample for the

*d*th realization under condition

*θ*. Then, standard (one-way) ANOVA assumes that the samples are drawn from the model

_{b}where *μ _{b}* is a population parameter that depends on the

*b*th conditional

*θ*, and

_{b}*ε*

_{db}are independently, identically, and normally distributed random variables with zero mean and variance . Under these assumption,

*E*(

*T*

_{d}_{,b}|

*θ*) =

_{b}*μ*, and the condition for

_{b}*T*to be unpredictable, namely (2), becomes

There are at least two difficulties with applying ANOVA to data sampled at daily intervals. First, *ε*_{db} in model (3) is not likely to be independent of *θ _{b}*; for instance, the variability of weather may vary with the sea surface temperature. Despite some evidence that the noise variance depends on boundary condition (Renwick and Wallace 1996; Compo and Sardeshmukh 2004; Seager et al. 2010), other studies suggest this dependence is detectable only with relatively large ensemble sizes (Sardeshmukh et al. 2000), and Levene’s test for equality of variance implies that the residuals in our study have no significant dependence on boundary condition (see section 4a). More questionable is the assumption that

*ε*

_{db}in model (3) is independent of

*d*, which is tantamount to assuming

*T*

_{d}_{,b}is white noise for fixed

*θ*. It is well known that many weather variables tend to be correlated from day to day. The purpose of this paper is to develop a procedure for testing the hypothesis (4) while taking into account autocorrelation in the time series.

_{b}A standard approach to modeling autocorrelated time series is to suppose the random variable is governed by an autoregressive model of order *p*:

where *φ*_{1}, *φ*_{2}, … , *φ _{p}* are autoregression coefficients. In many applications, the parameter

*μ*in (5) is ignored because the time series is centered, but it is included here for reasons that will become apparent shortly. It is readily verified that if the parameters

*φ*

_{1},

*φ*

_{2}, … ,

*φ*,

_{p}*μ*were known, then one can define a new variable

which satisfies (5) but with *μ* = 0. The model (5) is not adequate because the solutions are independent of *θ _{b}*. To construct a model such that the mean of

*T*

_{d}_{,b}depends on

*θ*, we allow the

_{b}*μ*to depend on

*θ*, giving the following model:

_{b}This equation reduces to the ANOVA model if *φ*_{1} = *φ*_{2} = ⋯ = *φ _{B}* = 0, and reduces to a standard autoregressive model if

*μ*is constant, implying that the equation represents a combination of the autoregressive model (5) and the ANOVA model (3). A model of the form (7) is often the basis of a technique called ANOCOVA. For fixed

_{b}*μ*, solutions from (7) become asymptotically stationary for sufficiently large

_{b}*d*. In this limit, the conditional expectation of

*T*

_{d}_{,b}becomes independent of

*d*:

Taking the conditional expectation of both sides of (7) and using (8) and *E*(*ε*_{d,b} | *θ _{b}*) = 0 gives

This equation shows directly that the conditional mean of *T _{d}*

_{,b}is proportional to

*μ*.

_{b}Variations in *μ _{b}* are associated with potential predictability. The term

*μ*is assumed to be constant over a season, but it varies from year to year because of changes in, say, sea surface temperature and soil moisture. This assumption is common in most other estimates of potential predictability from observations. It should be recognized that changes due to atmospheric composition, such as greenhouse gases, will also lead to variations in

_{b}*μ*. Hence, our definition of “potential predictability” includes predictability of the second kind. In section 2g we will separate potential predictability into forced and unforced components.

_{b}### b. Estimates of model parameters

The regression parameters *μ*_{1}, *μ*_{2}, … , *μ _{B}*,

*φ*

_{1},

*φ*

_{2}, … ,

*φ*in model (7) can be estimated from a sample drawn from the model. Suppose a time series

_{p}*T*

_{d}_{,b}from model (7) is available, where

*d*specifies the “day” at steps

*d*= 1, 2, … ,

*D*+

*p*, and

*b*specifies the “boundary forcing” corresponding to cases

*b*= 1, 2, … ,

*B*. In practice, the boundary forcing is characteristic of the year, so

*b*often corresponds to the “year” of the time series. The climatological annual cycle of

*T*

_{d}_{,b}should be removed to avoid including the annual cycle in the autoregressive model. For this dataset, the model (7) can be written in matrix form as

where the matrices , , and *β*_{F} are defined as

where is a *D*-dimensional vector consisting of lagged-*p* values of *T _{d}*

_{,y}for the

*b*th condition, and

**j**and

**0**are

*D*-dimensional vectors consisting of all ones and all zeros, respectively. This implies that is a

*DB*-dimensional vector, is a

*DB*× (

*B*+

*p*) matrix, and

*β*_{F}is a

*B*+

*p*dimensional vector. The least squares estimate of

*β*_{F}is

where the caret denotes a sample quantity, superscript T denotes the transpose operation, and subscript *F* indicates results for the “full” model (7). The full model is distinguished from the “constrained” model (5) by the fact that the latter has no variation in the parameter *μ _{b}*, and hence has no potential predictability due to boundary forcing.

The parameters in the constrained model (5) also can be estimated by the method of least squares. As with the full model, the constrained model can be written in matrix form as

where the matrices , and *β*_{H} are defined as

This implies that is a *DB* × (1 + *p*) matrix, and *β*_{H} is a 1 + *p* dimensional vector. The least squares estimate of *β*_{H} is

### c. Order determination

One procedure for selecting the appropriate order *p* of the autoregressive process is the well known Akaike Information Criterion (AIC; Akaike 1974). This criterion is to select the value of *p* that minimizes

where *N* (=*DB*) is sample size, SSR is the residual sum of squares, and *K* is the number of parameters. The first term on the right-hand side of (16) is the log likelihood, which measures goodness of fit, while the second term is a penalty that increases with the number of fitted parameters. In general, models with small values of AIC are preferred. Hurvich and Tsai (1989) argued that AIC can be quite biased especially for small samples or when the number of fitted parameters is moderate to large fraction of the sample size. They added a correction for sample size based on AIC (AICC) with a greater penalty for extra parameters and showed AICC converges to AIC for sufficiently large sample sizes. It turns out that for our study, AIC and AICC are nearly the same because of the large sample size.

Another widely used criterion is the Bayesian Information Criterion (BIC; Schwarz 1978), which is derived from Bayesian arguments. This criterion is defined as

where *N*, SSR/*N*, and *K* are as in (16). As with AIC, we select the value of *p* that minimizes BIC. The BIC criterion differs with AIC only in the form of penalty term. However, BIC is more parsimonious than AIC because of the larger penalty term, generally selecting lower orders of the autoregressive process (Katz 1981; Lütkepohl 1985). Furthermore, use of BIC may be preferable for sufficiently long time series, which may range from 100 to over 1000 (Wilks 2006).

### d. Test for potential predictability

The hypothesis of no potential predictability is equivalent to the null hypothesis

in model (7). Rejection of the hypothesis *H*_{0} will be interpreted as evidence for potential predictability. A procedure for testing this hypothesis can be derived from the likelihood ratio test, as described in Graybill (1961), Mardia et al. (1979), and Seber and Lee (2003). This procedure leads to the statistic

where

and . If the null hypothesis (18) is true, then *F _{a}* has an

*F*distribution with

*B*− 1 and

*DB*−

*B*−

*p*degrees of freedom. If the estimated value of

*F*is less than the 5% significance threshold based on the

_{a}*F*distribution, then

*H*

_{0}cannot be rejected, and we conclude that whatever potential predictability exists, if it exists at all, is so weak that it cannot be distinguished from an autoregressive process with constant population mean.

The statistic (19) has an intuitive interpretation. The quantities SSR_{F} and SSR_{H} measure the degree to which the respective models fit the data. If the hypothesis *H*_{0} is true, then we do not expect the full model (7) to fit the data significantly better than the constrained model (5), suggesting that SSR_{H} will be “close” to SSR_{F}, or equivalently that SSR_{H} − SSR_{F} will be “small.” Conversely, if the hypothesis is *H*_{0} is false, then we expect the full model to fit the data much better than the constrained model, suggesting that SSR_{F} will be much “smaller” than SSR_{H}, or equivalently, SSR_{H} − SSR_{F} will be “large.” Thus, the difference SSR_{H} − SSR_{F} measures how much better the autoregressive model with changing population mean fits the data compared to the autoregressive model with constant population mean. The numerator SSR_{F} sets the scale for measuring “small” and “large,” and the remaining terms give the degrees of freedom for normalizing the measures of fit.

Note *F _{a}* depends only on sample quantities and does not require specifying unknown population quantities, such as autocorrelation time scale. Moreover, the distribution under the null hypothesis is independent of the autoregressive parameters. Therefore, the significance test for potential predictability takes into account the uncertainty of the estimated parameters.

### e. Signal and noise variances

If *H*_{0} is rejected, then it is of interest to quantify the potentially predictable variance. The conditional variance of *T _{d}*

_{,b}can be shown to be (Box et al. 2008)

where *ρ _{τ}* is the autocorrelation function of

*T*

_{d}_{,b}derived from (7) for fixed

*μ*. The population autocorrelation function corresponding to model (5) can be derived from

_{b}*φ*

_{1},

*φ*

_{2}, … ,

*φ*by standard methods [see Box et al. (2008); closed-form expressions are complicated beyond the first order, so they are not given here]. Note that var(

_{p}*T*

_{d}_{,b}|

*θ*) is independent of

_{b}*θ*. The conditional mean of

_{b}*T*

_{d}_{,b}from model (7) is given by (9). It follows by the law of total variance that

where *E*(·) and var(·) are the (unconditional) expectation and variance operator, respectively, defined as the average or variance over all *d* and *b*. The first term on the right of (22) can be identified with unpredictable variance because of weather variability, while the second term can be identified with potentially predictable variance.

It is customary to quantify potential predictability in terms of the amount of variance of seasonal means that is predictable. We define the seasonal mean to be

It is well known (Brockwell and Davis 1961, chapter 7) that for any stationary process the variance of the time mean of a process is related to the variance of the process as

where

Moreover, since *E*(*T _{d}*

_{,b}|

*θ*) defined in (9) is independent of

_{b}*d*, it follows that

and hence

It follows from these expressions and the law of total variance that the variance of seasonal means for processes governed by model (7) is

### f. Estimates of signal and noise variances

To derive estimates of the predictable variance, we substitute least squares estimates for *φ*_{1},*φ*_{2}, … , *φ _{p}* into (28). This substitutional approach assumes that the uncertainty in these parameters is sufficiently small as to be negligible, which is a reasonable assumption for large

*DB*. We emphasize that this approximation is invoked only for estimating variances, not for performing the significance test. An important step is estimating the term var(

*μ*). Unfortunately, one cannot estimate this term simply by substituting sample estimates of

_{b}*μ*because these estimates contain independent errors that will inflate the overall estimate of the variance. Thus, we seek an unbiased estimate of var(

_{b}*μ*) in which the variance of estimation errors is subtracted out. To do this, it is convenient to define the matrix

_{b}where **j**_{B} is a *B*-dimensional vector of ones, is a *B*-dimensional identity matrix, and is a *B* × *p* matrix of zeros. This matrix allows us to write

where is the average of *μ _{b}*. Equation (31) is an unbiased estimate of the variance of

*μ*. In general, we do not know the population value of

_{b}

*β*_{F}, we only have estimates of it. Substituting the least squares estimate of

*β*_{F}into (31) and taking expectations gives

where we have used the standard regression theory result

and the fact that if **w** is a random vector with mean ** μ** and covariance matrix

**Σ**, then

for any matrix . Finally, it is a standard fact that an unbiased estimate of is

It follows from these results that an unbiased estimate of the variance of *μ _{b}* is

An approximately unbiased estimate of the signal variance (29) is therefore

To estimate the noise variance, we need estimates of the autocorrelation of *T _{d}*

_{,b}. We derive these estimates by substituting the least squares parameters into the autoregressive model (5), or equivalently into the autoregressive model (7) but with fixed

*μ*. This leads to the approximately unbiased estimate of the noise given by

_{b}It is often useful to express the predictable variance as a fraction of the total variance. The total variance can be estimated in one of two ways. First, one could estimate total variance using the unbiased estimate

where is the grand mean of *T _{d}*

_{,b}. Second, one could estimate total variance by adding the signal and noise variance estimates (37) and (38), respectively. We find that the two estimates of total variance are within 10% of each other, except over regions dominated by ice or snow where the AR model is not expected to be a good fit of temperature variability. To ensure the fraction of predictable variance (FPV) is between 0 and 1, we calculate FPV with

### g. Partition of potentially predictable variance

As mentioned in the introduction, our definition of potential predictability includes changes in the mean that are due to changes in greenhouse gas concentrations. We would like to decompose the potentially predictable variance into a part because of changes in greenhouse gas concentrations, and the remainder, presumably because of boundary forcing. Our approach assumes that the response of *T _{d}*

_{,b}to greenhouse gas concentrations associated with the conditions

*θ*

_{1},

*θ*

_{2}, … ,

*θ*is known to within a multiplicative factor. Indeed, over the time period under investigation (1979–2008), the response to anthropogenic and natural forcing is dominated by a trend (Ting et al. 2009; DelSole et al. 2011). Furthermore, the response to multiple radiative forcings is approximately the linear sum of the responses to the individual forcings (Knutson et al. 2006; Meehl et al. 2006). Therefore, we identify the forced component in

_{B}*μ*using the model

_{b}where *t _{b}* is the “time” of condition

*θ*, and

_{b}*a*and

*c*are parameters chosen to best fit . This regression allows us to decompose the

*μ*into two uncorrelated parts:

_{b}where the first term in parentheses on the right-hand side is the trend component, presumably because of changing greenhouse gas concentration, and the second term in parentheses is the remainder. The decomposition (42) can be written in matrix terms (aside from a factor that cancels out) as

where

and **t** = (*t*_{1 }*t*_{2} … *t _{B}*)

^{T}. Substituting (43) into (32) and following steps similar to those used to derive (36) leads to the following unbiased estimate of the greenhouse gas–forced component of

*μ*:

_{b}The nontrend component, which we identify as “boundary forced,” is the residual:

### h. Validity of model assumptions

The ANOCOVA model (7) makes several assumptions that should be recognized. First, the model assumes that the residual error *ε*_{db} is an independently and normally distributed random variable with a mean zero and identical variance. Fortunately, ANOVA is robust to departures from normality particularly when groups are of equal sample size and for large sample size (as is this case here), even under considerable heterogeneity of variances (Scheffe 1959; Rice 1995). Whether *ε*_{db} is independent or not can be assessed using the Ljung and Box (1978) statistic, which tests if a group of autocorrelations differ from zero. The statistic is defined as

where *L* is the number of lags *τ*. The null hypothesis that the correlations are all zero is rejected if , where indicates the 1 − *α* quantile of the chi-squared distribution with *L* degrees of freedom. The model implicitly presumes that the signal *μ _{b}* operates on such a long time scale that it can be considered effectively constant within a season. This assumption is not likely to be true in general. It is possible that the boundary forcing within seasons still contain a varying component even after the annual cycle has been removed. Such variations will affect the statistical test for potential predictability as pointed out by Trenberth (1984b). Finally, the model assumes there is no interaction between signal and noise. Unfortunately, the changing boundary conditions can alter the fast fluctuations of daily weather, for example, some SST changes may alter the mean air temperature and the character of daily weather, including shifts in the path of storms. However, previous studies suggest that atmospheric internal variability on seasonal time scale due to SST is relatively small (Kumar et al. 2000; Peng and Kumar 2005). In any case, this assumption can be checked a posteriori (see section 4a).

## 3. Data

The surface air temperature dataset used in this study is the reanalysis dataset provided by National Centers for Environmental Prediction (NCEP)–National Center for Atmospheric Research (NCAR) (Kalnay et al. 1996). This reanalysis uses a “frozen” state-of-the-art data assimilation system to eliminate climate jumps due to the use of different data assimilation systems. We use daily averages of 6-hourly temperature on a 192 × 94 Gaussian grid from January 1979 to December 2008. The reason for choosing this time period is to avoid the significant change in data density that occurs in the late 1970s as a result of satellite data becoming available, and to avoid the dramatic climate “shift” that occurred in 1976, which would be detected as potential predictability despite its being unpredictable. Reanalysis estimates are a combination of an imperfect forecast model and heterogeneous observations, and therefore have uncertainty (Simmons et al. 2004). Moreover, monthly and weekly SSTs were used in specifying surface temperature over oceans, which filters out high frequency information and reduces daily variability.

The annual cycle of time series can vary substantially over a season, which could contaminate the daily autocorrelations. It is necessary to subtract the annual cycle prior to analyzing potential predictability. It would be difficult to estimate this cycle simultaneously with the model parameters because the model will be estimated using data only within a season, whereas the annual cycle requires data over all seasons. On the other hand, the estimated annual cycle has negligible uncertainty because of the large sample size since all seasons are used in the estimation and there are so few harmonic coefficients that are estimated. Therefore we remove the first three harmonics of the annual cycle from the surface air temperature time series following Narapusetty et al. (2009). The fraction of total variance explained by the first three harmonics is shown in Fig. 1. The explained variance is relatively small over equatorial and Southern midlatitude regions, implying less contribution of total variation from the annual cycle. In contrast, for most land areas, especially in mid- and high-latitude regions, the annual variation dominates more than 70% of the total temperature variance.

The daily anomaly with the annual cycle removed is divided into conventional seasonal series. We mainly focus on two seasonal series, December–Februrary (DJF) and June–August (JJA). Each season contains 90 days, but our results are not sensitive to the length of the season. For these choices, the parameter *D* defined in section 2 is 90, and the parameter *B* is 29 and 30 for DJF and JJA, respectively, implying a sample size of 2670 and 2700, respectively. The number of parameters *K* in (16) and (17) are 29 + *p* + 1 and 30 + *p* + 1 for DJF and JJA, respectively, where *p* is the AR order of the model.

The data used in the correlation analysis are from two sources: version 3 of the extended reconstructed sea surface temperature (ERSST.v3; Smith et al. 2008) and the global soil moisture dataset (Fan and Van den Dool 2004). We use monthly SST anomalies distributed over a 2° × 2° grid and monthly soil moisture with a spatial resolution of 0.5° × 0.5° starting from 1979 to 2008.

## 4. Results

### a. Model details

We determine the order *p* based on both AIC and BIC to evaluate the extent to which they differ. The maximum order examined is set to 5. Daily temperature anomalies during these two seasons are fitted to the ANOCOVA model at various orders. The order selected on the basis of the two criteria AIC and BIC for each season are shown in Fig. 2. As anticipated, the AIC tends to select higher order models than BIC, with order 5 prominent over the Southern Ocean in DJF and North Pacific in JJA. In contrast, the BIC criterion suggests that a third-order model provides a good fit for most locations, while fourth- or higher-order dependence is spotty over the equatorial and northern subtropical oceans.

Since a first-order AR model often is assumed to provide an adequate model of weather variability, it is of interest to compare the first-order model with the model selected by BIC. First, we fit daily time series of temperature to the ANOCOVA model and perform the Ljung and Box test to determine whether the residual error as described in section 2h is white noise. The test results for both models are shown in Fig. 3. Gray shading indicates areas where the hypothesis of white noise for the residuals cannot be rejected. The figure shows that the first-order ANOCOVA model is not adequate in the subtropics. However, the residuals are found to be white for most regions with BIC-order model, indicating that the higher-order model is necessary to ensure that the residuals are white. There are a few areas of the globe where the residuals differ significantly from white noise, this area does not exceed the 5% area expected to occur by chance.

Our results show that the first-order ANOCOVA model is incapable of satisfying the underlying assumption that the residual errors are independent over the subtropical region. Therefore, we only show results from the model selected by BIC hereafter. First, we calculate weather noise on the basis of (38) in DJF and JJA (Fig. 4) with the derived autoregressive parameters from (12). There is a substantial disparity in the weather noise variability for both seasons because of the land sea contrast, larger over land than in the ocean. Meanwhile, the magnitude of noise over the northern extratropical land areas in JJA is considerably lower than the corresponding value in DJF.

Moreover, we performed the Levene test for equality of the variance of residuals and found no statistically significant change in variance at any grid point in any season at the 5% level, suggesting that weather noise does not depend on boundary signal. Our results are consistent with the findings by Kumar et al. (2000) and Peng and Kumar (2005) that interannual variations of SST have little impact on the noise variability. However, other papers claim that weather noise does vary with SST (Renwick and Wallace 1996; Compo and Sardeshmukh 2004; Seager et al. 2010), and it is not entirely clear how our findings can be reconciled with these papers. Our results lead us conclude that the assumption of independence of signal and noise is valid, that the known departures from independence appear to be weak, at least for seasonal mean quantities, and seem unlikely to cause significant errors in the observational estimates. We note that this conclusion is only valid based on the surface temperature of NCEP–NCAR reanalysis used in the present study.

### b. Fraction of potential predictable variance

The FPV ratio derived from the estimated weather noise and interannual variance for both models over the globe in DJF and JJA is shown in Fig. 5. Note that FPV is displayed only over the regions where the null hypothesis of identical seasonal mean is rejected at the 5% level. As expected, potential predictability tends to be highest over the tropics, where interannual variability is closely connected with ENSO. For most tropical regions, FPV is significantly greater than 0.7, indicating that 70% of the interannual variability of seasonal means are potentially predictable. Note there are several grid cells over the eastern Pacific ironically showing insignificant FPV in winter, which is seemingly contrary to the well-known predictability over this region. The explanation for this result is that eastern Pacific temperatures are highly predictable, for instance, the typical decorrelation time scale is 15 days (Feng et al. 2011), and ANOCOVA is simply stating that there is no detectable predictability beyond the predictability due to autoregressive processes. Additional large potential predictability is also observed over mid- and high-latitude oceans, arising from the impact of SST that may or may not be tropical related. There is evidence of modest potential predictability over extratropical land areas, notably northern Europe and southern land areas in DJF, as well as East Asia, North America and South Africa in JJA. The values of FPV are less than 0.6 over these regions with only a few regions of FPV in excess of 0.75 in JJA.

### c. Forced versus unforced potential predictability

We have defined potential predictability as a significant change in seasonal means. This definition encompasses potential predictability because of anthropogenic and natural radiative forcing, as well as changes because of unforced variations in sea surface temperature, soil moisture, and other slow components of the climate system. Over the time period under investigation (1979–2008), the response to anthropogenic and natural forcing is dominated by a trend (Ting et al. 2009; DelSole et al. 2011). Consequently, we partition the potentially predictable signal into trend and nontrend components and computed unbiased estimates for the respective variances, as described in section 2g. The resulting predictability estimates, shown in Fig. 6, reveal that the unforced component of potential predictability dominates over most regions. This is not surprising as the unforced component is most likely dominated by ENSO variations, which is the largest source of interannual variations in the climate system. It should be noted that the relatively small area characterized by significant trend, as compared to previous estimates (such as Fig. 3.10 of Trenberth et al. (2007), is due primarily to the different choice of dataset (NCEP–NCAR reanalysis versus Smith and Reynolds 2004). Specifically, we have repeated our analysis on the alternative dataset and reproduced previous trend estimates, and derived maps of FPV similar to those shown in Fig. 6.

### d. Relations between the predictable signal and boundary conditions

In this section, we investigate the relation between the potential predictability identified in the previous sections and SST and soil moisture. We focus on northern South America (15°S–10°N, 35°–80°W) and East Asia (40°–60°N, 110°–135°E), which have high and moderate degrees of predictability, respectively. The potentially predictable signal derived from ANOCOVA is spatially averaged over each selected region and season. We find that the resulting time series is highly correlated with the observed seasonal means, so that the noise filtering by ANOCOVA does not lead to significantly different signals than would be obtained without ANOCOVA filtering. Correlation maps between the regionally averaged predictable component and the coinciding seasonal mean SST and soil moisture are shown in Figs. 7 and 8. For northern South America, we find significant positive correlations over equatorial central-eastern Pacific and tropical Indian and Atlantic, with negative correlations in the northern and southern subtropical Pacific in DJF and March–May (MAM), suggestive of an influence of ENSO. This ENSO–temperature relationship has been demonstrated by a number of previous studies (Halpert and Ropelewski 1992; Trenberth et al. 2002). In addition, there are significant negative correlations with soil moisture of South America, reminiscent of soil moisture feedback on temperature (De Haan and Kanamitsu 2008; Barreiro and Díaz 2011). For East Asia, the impact of SST is marginal: weak positive correlations are observed only over the northwestern and southwestern Pacific, and mid- and high-latitude Atlantic Oceans, especially during JJA and September–November (SON) (Figs. 7e–h). Such SST correlations are in line with the earlier studies that link the interannual variability of East Asia temperature with ENSO (Wu et al. 2010; Yang and DelSole 2012) and North Atlantic SST anomalies (Czaja and Frankignoul 2002; Wu et al. 2011). Significant negative correlations between soil moisture and temperature are found only over a small portion of East Asia, primarily in MAM and JJA (Figs. 8e–h), which is in agreement with the previous work that documents the positive impact of soil moisture on regional monsoon climate (Kim and Hong 2007; Zhang and Zuo 2011). On the basis of the correlation patterns shown above, there is evidence to suggest that both remote SST forcing and local soil moisture determine the seasonal temperature predictability over northern South America and would provide useful skill for seasonal forecast. The seasonal temperature over East Asia displays placid relationships with SST and soil moisture, consistent with the modest predictability detected in this area.

## 5. Conclusions

This paper proposed a new method for testing potential predictability of seasonal means. Potential predictability is defined as variability in seasonal means that exceeds the variability due to weather stochastic processes. The proposed method is based on analysis of covariance (ANOCOVA) and generalizes traditional analysis of variance to account for autocorrelated time series and uncertainties in the autoregressive parameters. The method also produces a time series for the potentially predictable signal that can be used to investigate its relation with other variables, such as boundary condition indices. An unbiased estimate of the potentially predictable variance, as well as its decomposition into forced and unforced components, also is derived.

We assessed the potential predictability of seasonal means by applying ANOCOVA to surface air temperature from the NCEP–NCAR reanalysis for the period 1979–2008. The order of the autoregressive part of the model is determined from the BIC criterion. It is shown that residuals from the resulting ANOCOVA model are consistent with the white noise assumption, whereas the residuals from a first-order model differ significantly from white noise. In addition, first-order ANOCOVA models tend to produce larger weather noise variability than BIC-selected models (not shown), leading to lower predictability estimates, especially over land areas.

Potential predictability of seasonal mean temperature is found to be significant over most ocean locations, with maximum potential predictability centered over the equator accounting for over 90% of the total variability in seasonal means. Over land, potential predictability of temperature is consistently weaker than over ocean. The strongest land potential predictability occurs in equatorial areas. Other notable potential predictable land regions include northern Europe, East Asia, and North America, consistent with previous studies (Barnston and Smith 1996; Zheng et al. 2000; Palmer et al. 2004; Shongwe et al. 2007; Wang et al. 2009; Barnston et al. 2010; Yang and DelSole 2012).

The difference between predictability and potential predictability must be borne in mind when interpreting our results. In particular, the lack of potential predictability does not imply lack of predictability. For instance, ANOCOVA indicates that some regions in the eastern Pacific have no significant potential predictability, despite the fact that this region is highly predictable. This result merely implies that there is no detectable predictability beyond the predictability due to autoregressive processes.

The potentially predictable signal derived from ANOCOVA was used to calculate correlation maps with sea surface temperature and soil moisture. These maps suggest that both ocean and land conditions are responsible for the significant potential predictability. Over East Asia, seasonal temperature shows only a weak relation with SST and soil moisture, consistent with the modest predictability detected in this region. It should be recognized that in these cases the potentially predictable component derived from ANOCOVA is strongly correlated with actual seasonal means, hence the correlation maps derived from ANOCOVA-based time series are nearly the same as those derived from seasonal means.

This study develops a rigorous framework for estimating potential predictability that overcomes limitations of many previous methods. The estimated predictability from ANOCOVA can be used to evaluate the performance of the dynamical model estimates of potential predictability, which may be inaccurate, incomplete, or model dependent. On the other hand, the ANOCOVA-derived predictability can be compared with estimates from the previous statistical methods, such as those by Madden (1976) and Shukla and Gutzler (1983) mentioned in the introduction. Such comparison can be instructive for evaluating the sensitivity of potential predictability estimates arising from different model formulations. In addition, the newly developed ANOCOVA model can also be applied to other climate variables, such as geopotential height and wind. These analyses will be presented in forthcoming papers.

## Acknowledgments

This work was sponsored by NASA’s Energy and Water Cycle Study (NEWS) program (Grant NNX11AE32G). Dr. DelSole was supported from grants from the NSF (0830068), the National Oceanic and Atmospheric Administration (NA09OAR4310058), and the National Aeronautics and Space Administration (NNX09AN50G). The authors thank Dr. Gilbert Compo acting as reviewer and two anonymous reviewers for their useful suggestions and constructive comments, which were very helpful in improving the manuscript.

## REFERENCES

_{2}concentration as diagnosed from an ensemble of AO GCM integrations