## 1. Introduction

Forecasting the weather one season ahead has recently become a popular—and economically important—exercise, commonly tackled either with sophisticated general circulation models (GCMs) (see Palmer and Anderson 1994 for a review) or with statistical models based on correlations between predictands (the weather variables to be predicted) and predictors (the weather variables used to produce the prediction). Such forecasts are necessarily of limited quality, since individual events, such as storms, are known to be unpredictable beyond a couple of weeks at best. The challenge of seasonal forecasting therefore consists of predicting, one or several seasons ahead, the seasonal means of weather variables.

Given the climate drift and other problems intrinsic to GCMs in predicting long-term weather behavior (Tracton et al. 1989; Brankovic et al. 1990; Déqué 1991), it is not surprising that the seasonal skill of GCMs is not necessarily the best (Van den Dool 1994). One of the main difficulties is actually to validate GCM forecasts, since a large number of independent prediction cases is required to fully assess their skill. Statistical models do not suffer from this problem but, on the other hand, provide little physical diagnostic information. A clear awareness of the origin of long-term predictive skill began with the seminal works of Walker and Bliss (1932), Bjerknes (1969), and Wallace and Gutzler (1981) and the discovery of global teleconnection patterns, in particular ENSO and the Pacific–North American (PNA) pattern. More recently, the use of systematic procedures, such as principal component analysis (PCA) or a rotated version of it, to extract coherent signals has enabled the compression of climatic variables into a few “guess patterns.” These leading patterns are of great interest for *parametric* statistical modeling, where they can be used at both the predictor and predictand stages, since they help reduce the number of model coefficients to be fitted. However, leading EOFs usually have the largest spatial scales, whereas fairly small anomalies may be predictable, for example, depending on whether ENSO is in a warm or a cold phase (Montroy 1997). *Nonparametric* statistical models may forecast such anomalies as well, provided they use some downscaling procedure (such as processing of an ensemble of best analogs) at the predictand stage.

The recent renewal of interest in statistical modeling is precisely motivated by the progress made by physical models. Statistical models also retain a strong appeal both for theoretical investigation and for practical forecasting. The studies carried out by Bergen and Harnack (1982), Barnett and Preisendorfer (1987), Barnston (1994), and Vautard et al. (1996), to mention but a few, show encouragingly that a carefully designed empirical model can possess significant skill on monthly to seasonal scales and can often produce better predictions on such timescales than existing GCMs (Van den Dool 1994; Sarda et al. 1996). The present paper is a further attempt along this line.

North American surface air temperatures (SATs) and precipitation have often been used as a benchmark for testing statistical forecasting methodologies. This can be attributed to the early development of homogeneous station databases containing long records of these meteorological variables in the United States. A second reason is that North America is an ideal geographical location for seasonal prediction: many observational studies as well as GCM experiments (Barnett and Preisendorfer 1987; Palmer 1988; Peng et al. 1995; Renwick and Wallace 1996; Kumar et al. 1996; Shabbar and Khandekar 1996; Hoerling et al. 1997) indicate that North American climate variability on monthly to seasonal timescales is partly related to interannual variability of PNA-like structures which, in winter at least, are significantly related to changes in SST in the tropical Pacific. North America thus lies essentially underneath an atmosphere responding to low-frequency phenomena such as El Niño.

The present article applies the prediction technique developed by Vautard et al. (1996) to North American SATs and compares it with other existing techniques. The purpose of this comparison is twofold: First, comparison of geographical and seasonal skill distributions can help in developing a scheme based on an optimal combination of different models (Sarda et al. 1996). Second, it can help in explaining some advantages of the model presented here. Beyond the classical approaches based on stochastic models (Box and Jenkins 1976), the most commonly used methods include the following:

- *Simple regressive models* (Barnett 1981; Bergen and Harnack 1982). These models rely on a direct linear or nonlinear relation (with time lags) between predictor and predictand fields. The set of predictors can include multiple fields (Barnett and Preisendorfer 1978; Livezey and Barnston 1988).
- *Canonical correlation analysis* (CCA) (Barnett and Preisendorfer 1987; Barnston 1994; Shabbar and Barnston 1996; He and Barnston 1996) is now widely applied. The attraction of CCA is its design of an optimal rotation of both predictors and predictands, which helps keep relevant predictive information in a compact form while unpredictive noise is filtered out. Despite these technical advantages, it is often difficult to interpret the resulting CCA patterns physically, since they are scalar-product dependent.
- The *optimal climate normal* (OCN) approach (Huang et al. 1996; Wilks 1996) exploits the persistence of anomalies of the climate regimes to predict the future of the predictand, by finding an optimal span of recent history defined as the OCN.
- The *empirical normal mode* (ENM; Brunet 1994; Brunet and Vautard 1996) approach uses physically meaningful empirical orthogonal functions (EOFs) as a filter of predictor variables. The goal is to retain the principal quasi-monochromatic predictor modes of the general circulation of the atmosphere.

Here we apply a slightly different methodology proposed by Vautard et al. (1996). The principle is to identify predictable components in the form of slowly varying, spectrally narrow-banded time series obtained from the predictor fields. These are called the space–time principal components (ST PCs). They are obtained by diagonalizing the lag-covariance matrix of the predictors, using a procedure called multichannel singular spectrum analysis (MSSA; Plaut and Vautard 1994). MSSA is mathematically equivalent to extended EOF (EEOF) analysis, the only difference lying in the use of many lags. More specifically, EEOF applications usually consist in finding the eigenelements of a state-vector covariance matrix, the state vector consisting of the succession of a few lags of the same field. MSSA applications consider, by contrast, a large number of lags of the same field, allowing increased time and spectral resolution, but the same diagonalization procedure is used. Plaut and Vautard (1994) derived properties linking the power spectrum of the original field to that of the ST PCs—the time series of the projection of the fields onto the eigenvectors of the lag-covariance matrix. Another consequence of the use of many lags is the appearance of pairs of degenerate eigenelements corresponding to intermittently periodic oscillations in the behavior of the analyzed field. Even in the absence of such oscillations, ST PCs are generally spectrally narrow-banded. These properties make the ST PCs natural candidates for prediction applications.

The particularity of the methodology of Vautard et al. (1996) is its two-step procedure. First, the ST PCs are extrapolated in time up to the target prediction lead time; their extrapolation is generally skillful precisely because of their narrow-banded character. Second, a synchronous downscaling procedure from the extrapolated ST PCs is applied to predict the predictand field. This method does not benefit, in principle, from the statistically optimal linkages between predictors and predictands sought by CCA. Nevertheless, as we shall demonstrate, it turns out to be more skillful for the present application, and we present tentative arguments for interpreting this result. Finally, another advantage of the MSSA approach is that it can easily be extended to a probabilistic prediction framework. These probabilistic forecasts of seasonal weather variables are also tested here, but the emphasis is on the comparison between CCA and our model. For technical details about the methodology, the reader is referred to Vautard et al. (1996, sections 2 and 3 and the schematic diagram in their Fig. 5).

Section 2 is devoted to a presentation of data and methodologies. In section 3, the skills of the proposed models are presented and compared to the skill of the CCA model. In section 4 we discuss a particular case, the El Niño winter 1982/83. Section 5 contains a summary and a brief discussion.

## 2. Data and methodologies

### a. Predictands

Throughout this article, our objective is to predict the seasonal averages of SATs over North America. In order to build statistical models and to validate predictions, we use station data provided to us by the National Centers for Environmental Prediction. These are daily averages of the 2-m temperatures taken at 164 reasonably equally spaced stations over North America, including 105 U.S. stations (Alaska is excluded) and 59 Canadian stations. Daily SATs are then averaged over periods of 3 months at each station. No data prior to January 1952 or posterior to December 1993 were used, leaving us with a 40-yr validation period for our forecasts (two years are taken at the beginning of the dataset but not used for verification). Stations for which more than 10% of the data are missing are removed from the database beforehand.

### b. Predictors

According to the results of Barnston (1994) and Shabbar and Barnston (1996), the SST field is the major predictor of North American SATs. Since our goal here is mostly methodological, we focus on this predictor only. The SST monthly mean fields we use are derived from the Comprehensive Ocean–Atmosphere Data Set (COADS) archive, by averaging the available observations in 10° × 10° boxes as was done in Barnston (1994). A quality control test leads us to retain 282 cells over the World Ocean. Most of the SST cells poleward of 40°S are missing, while much more information is available in the Northern Hemisphere. In the early 1950s, many equatorial Pacific cells were missing, but data from other areas are present. Here we simply omit missing values, which is not a major problem since SST fields are transformed into principal components (PCs) before making any prediction. The projection onto PCs when grid cells are missing is done simply by calculating the dot product of the existing cells with the EOFs, and dividing by the sum of squares of the EOFs’ corresponding cells.
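This projection step can be sketched as follows (a minimal illustration, not the authors’ code; the function name and array layout are assumptions):

```python
import numpy as np

def project_with_missing(field, eofs):
    """Project a field with missing cells (NaN) onto EOFs.

    Illustrative sketch of the procedure described in the text: the dot
    product is taken over available cells only, and normalized by the
    sum of squares of the EOF entries at those same cells.

    field : (n_cells,) array, NaN where the cell is missing
    eofs  : (n_modes, n_cells) array of EOFs
    """
    valid = ~np.isnan(field)
    pcs = np.empty(eofs.shape[0])
    for k, eof in enumerate(eofs):
        num = np.dot(field[valid], eof[valid])   # dot product over existing cells
        den = np.sum(eof[valid] ** 2)            # sum of squares of the EOF's cells
        pcs[k] = num / den
    return pcs
```

With no missing cells and orthonormal EOFs, this reduces to the ordinary PC projection.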

### c. Cross validation

The validation of empirical models is carried out for the period 1954–1993, that is, 40 full years. In order to remove spurious skill inflation due to the use of the same data for model training and verification, we use a cross-validation procedure that consists of removing successive parts of the dataset from model building while holding out the removed data for verification. The dataset is split into eight 5-yr-long verification periods; the first verification period is 1954–58 and the last is 1989–93. No data processing is performed using verification data. A slightly different cross-validation procedure is used for validating the CCA model (see below).
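The split can be sketched as follows (an illustrative outline of the 1954–1993 design described above; names are hypothetical):

```python
import numpy as np

def cv_splits(first_year=1954, last_year=1993, block_len=5):
    """Yield (training years, verification years) pairs: the 40-yr
    period is cut into eight 5-yr verification blocks, and for each
    block the model is trained on the remaining years only."""
    years = np.arange(first_year, last_year + 1)
    blocks = years.reshape(-1, block_len)          # eight 5-yr blocks
    for block in blocks:
        train = years[~np.isin(years, block)]      # withhold the block
        yield train, block
```

Each verification year thus never contributes to the training of the model that forecasts it.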

### d. The CCA model

The canonical correlation analysis method used in the present paper is a reproduction of the work of Barnston (1994), to which the reader is referred for more details. The same data preprocessing, for both predictors and predictands, is performed, including

- detrending—a linear, season-dependent trend is removed;
- standardization—the detrended anomalies are divided by their standard deviation; and
- orthogonalization—a PCA is performed.

The same validation procedure is used: a given season is successively removed from the training data, the withheld season serving only for verification purposes. The scores are correlation coefficients between predicted detrended anomalies and verifying detrended anomalies at each station. Whenever verification data are missing (which occurs about 5% of the time), the data are simply omitted in the calculation of the correlation coefficient. Global skill is estimated simply by spatially averaging the individual correlations. Barnston (1994) tried various sets and combinations of sets of predictors; here, we use only SSTs.

In order to ensure that the algorithm used here is the same as that used by Barnston (1994), we apply it to the prediction of seasonal mean SATs over the United States using only SST as predictor, with a lead time *τ* = 1 month, lead time being defined as in Barnston (1994), that is, the time interval between the latest known data and the first day of the season to be forecast. Instead of the 59 stations used by Barnston, we use the 105 available stations from our dataset. The validation period (35 yr, from 1956 to 1991) is the same. The result of this experiment is shown in Fig. 1, which also includes the result reproduced from the equivalent experiment of Barnston (1994, his Fig. 5a). Note that in Barnston (1994), stations with negative score were weighted by the ratio of the amplitude of the predictions to that of the observations, in order to compensate for a systematic negative bias of cross-validated samples (Barnston and Van den Dool 1993). We show in Fig. 1 the results obtained with and without applying this weighting procedure. We reproduce Barnston’s results, with essentially positive scores when weighted, while substantial negative correlations are found when unweighted (see the skill score for December in particular). In the rest of this article, the weighting procedure is not applied, since we do not know a priori whether negative skill is due to biased estimates or simply to nonsignificant values.

### e. One-step models

Here we compare several empirical models derived from Vautard et al. (1996) with the CCA model. The first class of models is based on a one-step procedure which, like CCA, takes filtered and compressed information from the predictor fields and directly predicts the seasonally averaged SATs on the basis of a relationship between predictors and predictands. These models are called the *one-step* models. We examine three of them here. In each case, the prefiltering of predictors is the same; the models differ only in the formulation of the predictor–predictand relationship.

#### 1) Data manipulation and filtering

Let us denote by *S*(*m, n, x*) the SST monthly mean value at grid point *x*, year *n*, and calendar month *m*. For each *m* and *x*, we fit a linear time trend; *S*′ is the trend residual. Further processing is performed with this residual. Note that, using this procedure, both trend and seasonal cycle removals are done at the same time. This process is included in the cross-validation design; that is, the verification years are withheld in the calculation of trends and climatology. The same procedure is applied to predictand fields.
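The detrending step can be sketched as follows (an illustrative implementation of the month-by-month linear fit described above; function and variable names are assumptions):

```python
import numpy as np

def detrend_monthly(S):
    """Remove a linear trend fitted separately for each calendar month
    and grid point. Because the month-dependent intercept and slope are
    both subtracted, the seasonal cycle and the trend are removed at once.

    S : (n_years, 12, n_points) array of monthly means
    returns S_prime, the trend residual (same shape)
    """
    n_years, n_months, n_points = S.shape
    t = np.arange(n_years)
    S_prime = np.empty_like(S)
    for m in range(n_months):
        for x in range(n_points):
            slope, intercept = np.polyfit(t, S[:, m, x], 1)
            S_prime[:, m, x] = S[:, m, x] - (slope * t + intercept)
    return S_prime
```

In cross-validation mode, the fit would of course exclude the verification years.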

Next, in order to compress information, detrended anomalies of monthly SST are submitted to a principal component analysis (PCA), and the first 10 principal components are retained. Unlike in Barnston (1994), PCA is applied to all SST anomalies put together. Thus, there is only one set of principal components and associated EOFs for all seasons. Barnston (1994) used instead a seasonally stratified extended EOF analysis of SSTs. The advantage of our procedure is the larger number of data used in the covariance matrix calculation, resulting in more statistically significant EOFs. The price to pay is that we do not distinguish the SST anomaly patterns that are specific to a given season.

The final predictor filtering stage consists in applying multichannel singular spectrum analysis (Plaut and Vautard 1994) to the principal components of monthly mean values in order to obtain the space–time principal components; a lag-covariance matrix is calculated from the first 10 principal components (describing about 50% of the total SST variance) obtained from PCA and is diagonalized. The MSSA window taken here is *W* = 12 months, for comparison with the extended-EOF analysis used in the CCA. We checked, however, that results weakly depend on the window length. The eigenvectors of the lag-covariance matrix have both temporal and geographical dimensions, hence their name *space–time EOFs.* Their associated time series are called the *space–time principal components.* Finally, the first five ST PCs are used for prediction. For technical details and properties of MSSA, the reader is referred to Plaut and Vautard (1994, section 2d), and for further details about its application to long-range prediction to Vautard et al. (1996).
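The MSSA stage can be sketched as follows (a minimal illustration of the lag-covariance diagonalization with a 12-month window; not the authors’ code, and the eigensolver details are simplified):

```python
import numpy as np

def mssa(pcs, window=12, n_modes=5):
    """Minimal MSSA sketch. pcs: (n_time, n_channels) PC time series.
    Returns the leading space-time EOFs and the associated space-time PCs."""
    n_time, n_ch = pcs.shape
    n_rows = n_time - window + 1
    # Lag-embedded (augmented) matrix: each row concatenates `window`
    # consecutive lags of every channel.
    aug = np.empty((n_rows, window * n_ch))
    for i in range(n_rows):
        aug[i] = pcs[i:i + window].ravel()
    cov = aug.T @ aug / n_rows                 # lag-covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigval)[::-1]           # leading eigenelements first
    st_eofs = eigvec[:, order[:n_modes]].T     # space-time EOFs
    st_pcs = aug @ st_eofs.T                   # space-time PCs
    return st_eofs, st_pcs
```

The mean square of each ST PC equals the corresponding eigenvalue, so the leading components carry the most lag-covariance.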

The last two stages (PCA + MSSA) are somewhat equivalent to the extended-EOF analysis of Barnston (1994). The window lengths are the same, as is the final number of predictors. The main differences are that (i) in MSSA, covariance and lag-covariance matrices are calculated in a nonseasonal way; (ii) we use SST monthly means instead of seasonal means in the first place; and (iii) the filtering is performed in two stages instead of one. All these operations are performed in cross-validation mode; that is, we obtain as many filters as the number of 5-yr verification periods (eight).

As in Barnston (1994), the leading ST PCs bear structures resembling the oceanic ENSO signal, which is known to be determinant for the prediction of the North American climate. More precisely, the leading three ST PCs capture this signal, but it is mixed with decadal to interdecadal variability.

#### 2) The regression model (1STEP REG)

This model is the simplest one: a linear regression between the detrended predictand anomaly field and the filtered predictors (the leading five ST PCs) is calculated. One therefore obtains a set of five regression coefficients for each station. This model is the closest to the CCA model. Let us denote by **X**(*m, n*) the vector containing as elements the five leading ST PCs at the time when the prediction is carried out; here *m* denotes the month and *n* the year. Due to the MSSA filter, **X**(*m, n*) actually contains information from all months of the previous year, as in extended EOF analysis. Let *m*′ and *n*′ (*n*′ ≥ *n*) denote the calendar month and year of the *end* of the season to be predicted. In order to train the regression coefficients, we use only predictands from particular seasons. Ideally, one should retain only training data in the same season as the season to be predicted, that is, use only months *m* (for predictors) and months *m*′ (for predictands) of the training years. This is what is done in the CCA application, and it leaves fewer than 40 training data points per season to predict.

An opposite choice is to train the regression coefficients using all seasons together, which filters out completely the seasonal character of the coefficients. The best choice lies between these two possibilities. We found by trial and error that the summer season requires specifically summer-trained regression coefficients, while the other seasons are rather insensitive to seasonality in the training data. For the prediction of seasons ending in July, August, or September (MJJ, JJA, and JAS, or equivalently *m*′ = 7, 8, 9), regression coefficients are trained using predictand seasons *m*′ = 7, 8, and 9. The number of training data is therefore three times larger than that used for CCA. For the other seasons (ending in October–June), training predictand seasons are taken as *m*′ = 10, 11, 12, 1, · · · , 6, providing nine times more training data than for CCA. A discussion of this choice can be found in section 3a.
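The training of the pooled regression can be sketched as follows (illustrative only; the boolean mask stands in for the seasonal selection rules above, and the names are hypothetical):

```python
import numpy as np

def fit_regression(st_pcs, predictand, pool_mask):
    """Least-squares fit of the predictand on the five leading ST PCs,
    pooling together the selected training seasons (e.g. m' = 7, 8, 9
    for summer targets), which multiplies the number of training samples.

    st_pcs     : (n_samples, 5) predictor matrix
    predictand : (n_samples, n_stations) detrended anomalies
    pool_mask  : boolean mask selecting the pooled training seasons
    returns coefs of shape (5, n_stations)
    """
    X = st_pcs[pool_mask]
    Y = predictand[pool_mask]
    coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coefs

def predict(st_pcs_now, coefs):
    """Apply the fitted coefficients to the current ST PC vector."""
    return st_pcs_now @ coefs
```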

#### 3) The composite model (1STEP COM)

Given the predictor vector **X**(*m, n*), we select its set of nearest analogs **X**″(*m*″, *n*″) within the training set, and the prediction consists in averaging the corresponding training predictands, hence the name “composite model.” The analogs are chosen in a seasonal manner similar to that of the regression model. Such analog techniques have been used, for instance, by Bergen and Harnack (1982), Livezey and Barnston (1988), and Vautard et al. (1996). As noticed by the latter authors, optimal prediction skill is obtained when a sufficient number of analogs is taken, but the amplitude of the forecast is then largely reduced. Here we follow their strategy by selecting half of the training predictors as “best predictors” and making a composite with the corresponding predictands. The similarity measure used to select the best analogs is the Euclidean distance between **X** and **X**″.
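The analog selection and averaging can be sketched as follows (a minimal illustration; the Euclidean similarity follows the text, the rest is an assumption):

```python
import numpy as np

def composite_forecast(x_now, train_X, train_Y, frac=0.5):
    """Composite (best-analog ensemble) sketch: select the fraction
    `frac` of training predictor states closest to x_now in Euclidean
    distance, and average the corresponding predictands.

    x_now   : (n_features,) current predictor vector
    train_X : (n_train, n_features) training predictor vectors
    train_Y : (n_train, n_stations) training predictands
    """
    dist = np.linalg.norm(train_X - x_now, axis=1)   # Euclidean similarity
    n_best = max(1, int(frac * len(train_X)))
    best = np.argsort(dist)[:n_best]                 # nearest analogs
    return train_Y[best].mean(axis=0)                # the composite
```

Averaging over many analogs damps the forecast amplitude, as noted in the text.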

#### 4) The categorical model (1STEP CAT)

This model is the probabilistic version of the previous one. In order to produce probabilistic forecasts, the detrended SAT anomalies are classified into three equally probable categories, called *terciles.* The separators of these categories are determined from training data only. The three categories are denoted *B* (below normal), *N* (near normal), and *A* (above normal). Tercile separators depend on the station and on the season. Figure 2 shows the geographical distribution of the tercile separator values for the JFM winter season. The *N*/*A* separator map is positive everywhere but undergoes large-amplitude fluctuations, indicating important spatial dependence in the variability of SATs. For instance, large variability of SATs is observed over central-western Canada, downstream of the Rocky Mountains; this variability is many times that observed over California. The *B*/*N* separator map is roughly symmetrical to the *N*/*A* map, and similar conclusions hold.

The goal of probabilistic prediction is to estimate the probability that the seasonal mean falls within a given tercile. The same subset of “best analogs” as for the composite model is selected, but instead of averaging the corresponding training predictands we count, for each station, the frequency of training predictands falling into each category. These frequencies are the probabilistic forecasts. In order to validate the probabilistic forecasts, we make a deterministic choice by selecting the category that has the highest probability to occur and compare it to the observed category (see section 2g below for the definition of the skill measure).
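The counting step can be sketched for a single station (illustrative; in practice the separators come from the training data):

```python
import numpy as np

def tercile_probabilities(best_analog_sats, lower, upper):
    """Probabilistic (categorical) forecast sketch: given the predictand
    values of the selected best analogs at one station, count the
    fraction falling below, between, and above the training-derived
    tercile separators. Returns (P_B, P_N, P_A)."""
    sats = np.asarray(best_analog_sats)
    p_b = np.mean(sats < lower)      # below-normal frequency
    p_a = np.mean(sats > upper)      # above-normal frequency
    p_n = 1.0 - p_b - p_a            # near-normal takes the remainder
    return p_b, p_n, p_a
```

The deterministic verification described above simply takes the category with the largest of the three probabilities.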

### f. Two-step models

These types of models have been shown by Vautard et al. (1996) to perform better than the one-step approaches at long lead times for the prediction of monthly geopotential heights over the North Atlantic. They all rely on a prior prediction of the predictor itself, followed by a “specification” stage, where the predictand is finally forecast. The choice of a predictable predictor is therefore crucial. The leading components of SST anomalies, mostly dealing with the ENSO phenomenon (see also Jiang et al. 1995; Moron et al. 1998), bear such a predictable character.

The first step of all three models presented below is identical: the ST PCs are extrapolated by a linear autoregression. Let us denote by *t* the time when the forecast is carried out. The leading five ST PCs at time *t* + *τ* are extrapolated using as regressors the leading 100 ST PCs at time *t.* Here *t* + *τ* denotes the time of the last month of the target season, that is, the lead time plus 3 months. For instance, if the JFM season is to be forecast on 31 December of the previous year, the extrapolation time of the ST PCs is *τ* = 3 months (a 0-lead-time forecast). As explained by Vautard et al. (1996), the forecasts are fairly insensitive to the number of regressors in this extrapolation stage; we checked that using 50 or 20 regressors instead does not affect the final skill.
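The extrapolation step can be sketched as follows (a minimal illustration of the linear autoregression described above; names and shapes are assumptions):

```python
import numpy as np

def fit_extrapolator(st_pcs_all, lead, n_targets=5):
    """Linearly regress the leading `n_targets` ST PCs at time t + lead
    on the ST PCs at time t, using all available (t, t + lead) pairs.

    st_pcs_all : (n_time, n_regressors) ST PC time series
    returns A of shape (n_regressors, n_targets)
    """
    X = st_pcs_all[:-lead]                 # regressors at time t
    Y = st_pcs_all[lead:, :n_targets]      # targets at time t + lead
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A

def extrapolate(st_pcs_now, A):
    """Extrapolate the leading ST PCs to time t + lead."""
    return st_pcs_now @ A
```

For a perfectly persistent series, the extrapolation simply returns the current values, consistent with the persistence of the leading SST ST PCs.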

Using these extrapolated ST PCs, three models are designed to complete the predictand forecast, through the *specification* stage, in a manner similar to the one-step models.

#### 1) Regression model (2STEP REG)

As for the 1STEP REG model, regression coefficients are calculated, taking as regressors the leading five ST PCs simultaneous to the last month of the season to be predicted. Once these coefficients are calculated, the regression is fed by the *extrapolated* leading five ST PCs instead of observed ST PCs as in the one-step models. We use the same rules of seasonal selection of training data as for 1STEP REG, in order to calculate the regression coefficients.

Since both the extrapolation of the ST PCs and the specification are linear operations, one expects the skills of the 1STEP-REG and 2STEP-REG models not to differ greatly. This will be verified in the next section.

#### 2) Composite model (2STEP COM)

The regression used in the 2STEP-REG model is simply replaced by a composite average over predictands corresponding to best training analog predictors of the extrapolated ST PCs. The same rules for seasonality and analog selection as in the 1STEP-COM model are used. The only difference between this model and 1STEP COM is that the analogs of extrapolated ST PCs are taken instead of the analogs of the observed ST PCs.

#### 3) Categorical model (2STEP CAT)

This is the probabilistic and categorical version of the 2STEP-COM model. Instead of averaging the “best analog” predictands, we count for each station the frequency of occurrence of SATs within each tercile, providing a probabilistic forecast of the occurrence of the seasonal mean SAT. Then, the most probable tercile is taken as the predicted tercile. The same parameters as for the 1STEP-CAT model are chosen.

### g. Measures of skill and significance

Here we use two measures of skill. The first one is the classical correlation coefficient: It is used when the prediction to be carried out deals with continuous quantities, that is, for validating the 1STEP-REG, 1STEP-COM, 2STEP-REG, and 2STEP-COM models. At each station, the correlation between predicted SATs and verified SATs is calculated. In order to obtain a global skill score, the average over the 164 stations is calculated.

The significance of these correlations can be roughly estimated by calculating the 95% confidence interval of the correlation between two uncorrelated random Gaussian processes with the same number of data. For individual stations there are, for a given season, 40 independent forecasts. By randomly generating 1000 realizations of two uncorrelated random processes with 40 cases in each realization, calculating the correlation for each realization, and sorting these 1000 values, one finds that a correlation is significantly different from zero, at the 95% level, when it is larger than about 0.26. The 95% significance level of global skill scores can be obtained in a similar manner, but one has to make an assumption about the number of “independent stations.” A conservative estimate of this number is 5, since five principal components of SATs describe about 75% of the variance. In this case, global skill scores are significant when they exceed about 0.11.
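The Monte Carlo estimate can be sketched as follows (a direct transcription of the procedure described above; names are illustrative):

```python
import numpy as np

def correlation_threshold(n=40, n_trials=1000, level=0.95, seed=0):
    """Monte Carlo estimate of the significance level for the
    correlation of two uncorrelated Gaussian series of length n:
    generate many pairs, correlate each, and take the given quantile."""
    rng = np.random.default_rng(seed)
    corrs = np.empty(n_trials)
    for i in range(n_trials):
        a = rng.standard_normal(n)
        b = rng.standard_normal(n)
        corrs[i] = np.corrcoef(a, b)[0, 1]
    return np.quantile(corrs, level)
```

For n = 40 this yields a threshold near 0.26, as quoted in the text.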

The second measure, used for the categorical forecasts, is the linear error in probability space (LEPS) skill score. For each forecast, the predicted tercile (*A*, *B*, or *N*) is compared with the observed one, and a contingency table *T_{ij}* is set up (*i, j* = 1, · · · , 3), containing the number of occurrences of predicted tercile *i* when tercile *j* was observed. The contingency table is multiplied by a weighting matrix *a_{ij}* (see, for instance, the coefficients in Vautard et al. 1996), which has several mathematical properties. The LEPS skill score is the resulting weighted sum, normalized by its value for perfect forecasts (*N_{i}* denoting the number of observed occurrences of category *i*); it quantifies the *value* of the forecasts in terms of information theory (Ward and Folland 1991). LEPS scores lie between −1 and +1, +1 being obtained for perfect forecasts and 0 occurring for random or constant forecasts, which makes the scoring system *equitable* (Gandin and Murphy 1992).
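The scoring can be sketched as follows (illustrative only: the weight matrix below is a hypothetical stand-in that rewards hits and penalizes two-category misses; the actual LEPS coefficients are tabulated in Vautard et al. 1996):

```python
import numpy as np

def categorical_skill(pred, obs, weights):
    """LEPS-type categorical score sketch: build the 3x3 contingency
    table T[i, j] (predicted tercile i, observed tercile j), weight it
    by a matrix a[i, j], and normalize by the perfect-forecast value so
    that +1 means perfect forecasts.

    pred, obs : integer arrays with categories 0 (B), 1 (N), 2 (A)
    """
    T = np.zeros((3, 3))
    for i, j in zip(pred, obs):
        T[i, j] += 1
    N = T.sum(axis=0)                    # observed occurrences per category
    return (weights * T).sum() / (np.diag(weights) * N).sum()

# Hypothetical weights (NOT the published LEPS coefficients).
a = np.array([[ 1.0, -0.5, -1.0],
              [-0.5,  1.0, -0.5],
              [-1.0, -0.5,  1.0]])
```

With these weights, perfect forecasts score +1 and systematically opposite forecasts score negatively.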

## 3. Prediction of seasonal mean surface air temperatures

### a. Global skill scores at zero lead time

In this section, we focus on the prediction, one season ahead, of seasonal mean SATs. This is the “zero lead time” situation: for instance, the JFM season is forecast on 31 December of the previous year. For the one-step models, the predictors are the leading ST PCs calculated from monthly SST values distributed over the whole previous year. For the two-step models, the ST PCs are first extrapolated to the target month, and the specification stage consists in predicting the JFM SATs using the extrapolated ST PCs.

Figure 3a shows the global skill scores of the continuous forecast models, that is, the 1STEP-REG, 1STEP-COM, 2STEP-REG, and 2STEP-COM models. The skill score of the CCA model is also shown. Significant, albeit low, global skill is found for the prediction of the winter and spring seasons for all models, while almost no skill is found in the prediction of the fall season. The skill of the summer prediction depends on the model used. CCA skill is significant only for the late winter and summer seasons, in accordance with the results of Barnston (1994) and Shabbar and Barnston (1996). All models have about the same skill for the JFM season. However, spring and early winter appear much better predicted by the one-step and two-step models than by the CCA model. Summer seasons (MJJ, JJA, and JAS) are better predicted by the linear regression models (one-step and two-step).

The first important point suggested by Fig. 3a is that, *for this lead time,* one-step and two-step models exhibit comparable skills. This was also noticed by Vautard et al. (1996), who showed that the difference becomes pronounced only at longer lead times. The second important point lies in the skill difference between composite and regression models in summer, indicating that the number of analogs available for summer prediction is too small (summer predictions are made from summer training analogs only; see section 2e). Figure 3b shows the skill of the 1STEP-REG model when regression coefficients are trained using data from all year long. In this case, fall, winter, and spring skills are close to those of the same model trained using only fall–winter–spring data, but the skill difference is large in summer: if winter data are included to train summer predictions, skill decreases. The 1STEP-COM model does not improve either when winter data are included in the analog selection. Therefore, the prediction of summer SATs requires specific seasonal constraints, indicating that summer climate dynamics differs markedly from that of the other seasons. This result motivated our choice of seasonal selection of training data.

The last important point is that the CCA model, which is linear in essence, is almost systematically beaten by the 1STEP linear regression model. CCA even exhibits strongly negative scores (the December score is −0.20), which, according to Barnston and Van den Dool (1993), is due to a systematic negative bias of linear-model scores when estimated in a cross-validation framework. If such were the case, the same conclusion should hold for our linear regression model, which instead shows positive scores all year long. We argue here that this difference is essentially due to the too small number of training data used in the seasonally stratified version of CCA used here: in our 1STEP-REG model, we use nine seasons together for training fall–winter–spring predictions, and three seasons for training summer predictions, while CCA uses only one season (39 data points for each verification year) for training. Figure 3b also shows the skill obtained by the 1STEP-REG model, at zero lead time, using only training data of the same season, for instance, using only December ST PCs to train the regression coefficients of a zero-lead-time prediction of the JFM SATs. Except for the late fall season, the seasonal march of skill behaves almost exactly as for the CCA model; negative skill is also found in late fall and early winter. Therefore, a simple improvement of the CCA method as used by Barnston (1994) could be to include, at least, adjacent seasons in the training for the prediction of a given season.

Finally, Fig. 3c shows the LEPS skill scores of the 1STEP-CAT and 2STEP-CAT models. These scores closely follow the seasonal variations of the correlation scores of the continuous composite models, with slightly lower values. The difference between LEPS scores and correlations is in agreement with the discussion of Potts et al. (1996), but does not mean that continuous forecasts are *better* than categorical forecasts. As for the continuous models, there is no significant skill difference between the one-step and the two-step approaches at zero lead time. The 95% significance level of global LEPS scores is about 0.10; only the winter and spring seasons exhibit significant skill. LEPS scores of the categorical forecasts will not be displayed further; as in Fig. 3c, they always behave quite similarly to the correlation scores.
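For reference, the revised LEPS score measures forecast error in probability space. The one-liner below is our reading of the Potts et al. (1996) formula, offered as a sketch rather than the operational scoring used here; `pf` and `po` are the cumulative climatological probabilities of the forecast and observed values (tercile centers sit at 1/6, 1/2, and 5/6).

```python
def leps(pf, po):
    """Revised LEPS score (after Potts et al. 1996) for one
    forecast/observation pair; pf and po are cumulative climatological
    probabilities (between 0 and 1) of the forecast and observed values."""
    return 3.0 * (1.0 - abs(pf - po) + pf**2 - pf + po**2 - po) - 1.0

# tercile-center positions in probability space (B = below, N = normal, A = above)
B, N, A = 1 / 6, 1 / 2, 5 / 6
```

Under this formula a correct extreme-tercile forecast (leps(B, B) ≈ 1.17) is rewarded more than a correct middle-tercile one (leps(N, N) = 0.5), and an opposite-extreme miss is penalized (leps(B, A) ≈ −0.83); averaging over many forecasts and normalizing by the maximum attainable sum yields the skill score.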

### b. Global skill scores at longer lead times

We now examine the dependence of skill on lead time. As explained in section 3a, the 1STEP-COM and 2STEP-COM models suffer from too small a number of analogs for the prediction of the summer season. Therefore, we consider, from now on, only three continuous models: the CCA model; a 1STEP model, consisting of a 1STEP-REG prediction for summer (the MJJ, JJA, and JAS seasons) and a 1STEP-COM prediction for the other seasons; and a 2STEP model, consisting of a 2STEP-REG prediction for summer and a 2STEP-COM prediction for the other seasons. By contrast, the categorical models cannot be replaced by a linear regression for summer prediction.

Figures 4a–c show the skill scores of the CCA, 1STEP, and 2STEP models at lead times of 3, 6, and 9 months. First, note that most models still exhibit significant skill at long lead times for winter, spring, and late summer prediction. This weak dependence on lead time was already observed by Barnston (1994) and was attributed to the persistence of long-lived SST anomalies, such as those of El Niño. Another explanation comes from the fact that the leading ST PCs gather not only phenomena acting on the interannual time scale (like ENSO) but also decadal-to-interdecadal variability; hence, the SST ST PCs used as predictors are highly persistent.

The second conclusion that can be drawn from these figures is that CCA is still beaten almost systematically by the 2STEP model, marginally in some seasons and significantly in others (as argued in section 2g, a difference can be considered significant if it is larger than about 0.11). The skill difference between the 2STEP and 1STEP models increases with lead time, which confirms the result of Vautard et al. (1996).

Finally, the LEPS scores (not shown) of the categorical models behave similarly to the correlation scores of the continuous models, which strengthens our conclusions, since they do not rely on a single measure of skill. Again, the 2STEP-CAT model systematically beats the 1STEP-CAT model. Table 1 recapitulates the skill of the models and their ranking as a function of season and lead time.

### c. Geographical distribution of skill

Figure 5 shows the geographical distribution of the correlation skill at lead time 0 of the CCA model for four seasons, distributed as in Barnston (1994) in order to make comparison easier. As already noted by Barnston (1994), winter prediction (JFM) has fairly significant skill, mostly spread over the Southeast, Texas, the Mexican border, and the northern states of the United States. Skill over Canada is significant almost everywhere, with peak values near the Great Lakes, in accordance with the results of Shabbar and Barnston (1996). A large area of near-zero or negative skill sits over the U.S. Rocky Mountains, where winter SATs seem unrelated to SSTs. The percentage of stations where skill is statistically significant (above 0.26) is 56%.

Spring and fall predictions exhibit large areas of negative scores, probably due to the combination of low skill with cross validation. Only a few scattered areas display significant skill values; the percentages of stations with significant skill are 15% and 5%, respectively, for these seasons. Summer prediction has significant skill over 31% of the stations, over the U.S. east coast and near the Mexican border. Only a few Canadian stations show significant skill, along the Pacific coast.

Figure 6 shows the geographical distribution of skill of the 2STEP-COM model (replaced by the 2STEP-REG model in summer). Winter prediction is as good as with the CCA model, with 48% of stations displaying significant skill. The significant-skill areas are about the same as for the CCA model, indicating that the two models provide essentially similar predictions for this season. Note, however, that although the 2STEP model also exhibits a band of near-zero skill across the United States, it shows none of the large negative values of the CCA model. The same is true for the other seasons: the 2STEP model skill never drops below about −0.2. Its skill is significant over 29% of the stations in spring, 38% in summer, and 23% in fall. For the spring and fall seasons, the areas of significant skill are British Columbia, the plains west of the Great Lakes, and, for spring, near the Mexican border. Prediction of the summer season is significant over eastern Canada and the eastern United States, southern California, and again near the Mexican border.

Figure 7 shows the variation of the skill distribution with lead time for the 2STEP-COM model and the JFM prediction. As lead time increases, the relatively high skill values found over Canada remain, whereas skill over the southeastern United States vanishes. The slight local increases of skill with lead time are probably a statistical artifact. Finally, the LEPS score distributions of the categorical models (not shown) behave similarly to the correlation score patterns.

## 4. A prediction case: Winter 1982/83

In order to illustrate the proposed prediction models, we now turn to an example, the forecast of the 1982/83 winter. Assume the present time is 31 December 1982 and that we want to predict the JFM 1983 season. Winter 1982/83 is a particular winter since it coincides with the major 1982–83 El Niño event; however, it can be verified (not shown) that similar conclusions could be drawn from consideration of most other warm events. According to most of the CCA diagnostics of Barnston (1994), prediction of this particular winter should be of good quality.

Using the cross-validation method described in section 2, the models are trained using data from the two periods 1952–77 and 1984–93; no data outside these time intervals are used to train the model. We are given the predictor SST field up to the end of 1982. This SST field is detrended, and the leading five ST PCs are calculated using the 1982 SST field and the ST EOFs computed from training data. The ST PCs are then extrapolated 3 months ahead using the linear autoregression coefficients, also calculated from training data. Finally, analogs of the extrapolated ST PCs are sought and the best 50% of them are selected. For the 2STEP-COM model, their simultaneous predictands are simply averaged, and the resulting composite prediction is calculated at each station.
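The two steps just described — linear autoregressive extrapolation of the ST PCs, then an analog composite of simultaneous predictands — can be sketched as follows. This is a toy illustration with our own array conventions and a simplified AR(1) form; the paper's actual AR order, analog metric, and seasonal stratification are those described in section 2.

```python
import numpy as np

def extrapolate(last_state, ar_matrix, steps):
    """Step 1: extrapolate the ST PC state `steps` seasons ahead with a
    linear autoregression x(t+1) = A @ x(t) (a simplified AR(1) form)."""
    state = np.asarray(last_state, dtype=float)
    for _ in range(steps):
        state = ar_matrix @ state
    return state

def composite_forecast(target_pcs, train_pcs, train_sat, frac=0.5):
    """Step 2: find analogs of the extrapolated ST PCs among training
    years, keep the best fraction `frac`, and average their
    simultaneous station SATs into a composite forecast."""
    dist = np.linalg.norm(train_pcs - target_pcs, axis=1)
    k = max(1, int(frac * len(dist)))
    best = np.argsort(dist)[:k]          # indices of the best analogs
    return train_sat[best].mean(axis=0)  # composite at each station

# toy example: 3 training years, 2 ST PCs, 4 stations
train_pcs = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 9.0]])
train_sat = np.array([[1.0, 1, 1, 1], [2, 2, 2, 2], [9, 9, 9, 9]])
pcs_now = extrapolate([0.2, 0.0], 0.5 * np.eye(2), steps=1)
forecast = composite_forecast(pcs_now, train_pcs, train_sat, frac=0.34)
```

In the toy case only the single closest analog survives the selection, so the composite reduces to that year's SAT field; with many analogs retained (as in the 50% rule above), averaging damps the forecast amplitude, which is visible in the case study below.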

Figure 8a shows the detrended SAT anomaly field of this particular winter. This field exhibits a large positive anomaly over the northern United States and southern Canada, while the southeastern United States and northern Canada witness a cold anomaly. This pattern is quite typical of El Niño winters (Hoerling et al. 1997). The 2STEP-COM forecast (Fig. 8b) predicts warming of Canada and the northern United States and cooling of the southeastern United States. Note, however, the very small amplitude of the forecast, due to the relatively large number of analogs used (about 150). Nevertheless, except over northern areas, the anomaly pattern is fairly well reproduced. The CCA prediction (Fig. 8c) shares most features of the 2STEP-COM forecast but with much larger amplitudes. The larger amplitudes of the CCA forecast in this particular case are probably due to the linear dependence of CCA forecasts on the quite large SST anomalies, whereas the large number of best analogs in the ST PC model makes the present version of this model more reliable for pattern than for amplitude forecasting. For both CCA and the ST PC model, the most striking discrepancy occurs in northwestern Canada, north of 60°N; elsewhere, the predicted patterns agree quite well with the observations.

The best-analog predictands are also distributed into terciles, and by counting tercile frequencies, the 2STEP-CAT model provides tercile probabilities at each station. The probability field of falling into tercile *A* (above) is displayed in Fig. 9a. The peak probabilities occur within British Columbia, with values near 0.5, while the lowest probabilities are found near the Gulf of Mexico. Symmetrically, Fig. 9b shows the distribution of probabilities of tercile *B* (below). Note that the probabilities are not far from their average value (0.33), meaning that in probability space, too, there is a bias toward low-amplitude forecasts. These biases can in any case be corrected a posteriori by artificially inflating the amplitude of the forecasts. Vautard et al. (1996) showed that, in fact, the amplitude of probability forecasts using two-step models is less biased than that of one-step models. Finally, Fig. 9c shows the geographical distribution of the observed terciles, which closely follows the anomaly contours (compare with Fig. 8a). Figure 9 shows that the probabilistic version of our two-step model performs reasonably well in forecasting this typical El Niño winter.
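The tercile counting itself is elementary. A minimal sketch (the function and argument names are ours), assuming the B/N and N/A tercile separators have already been estimated from the training climatology at each station:

```python
import numpy as np

def tercile_probs(analog_sats, bn_sep, na_sep):
    """Probabilities of terciles B, N, A at one station, as the fractions
    of analog predictands falling below, between, and above the
    climatological tercile separators."""
    a = np.asarray(analog_sats, dtype=float)
    p_below = np.mean(a < bn_sep)        # fraction below the B/N separator
    p_above = np.mean(a > na_sep)        # fraction above the N/A separator
    return p_below, 1.0 - p_below - p_above, p_above

# e.g., four analog SAT anomalies at one station, separators at -0.5/+0.5 deg C
probs = tercile_probs([-1.2, 0.1, 0.3, 1.8], -0.5, 0.5)
```

A large analog ensemble pulls all three probabilities toward the climatological 1/3, consistent with the low-amplitude bias noted above.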

## 5. Summary and conclusions

The overall aim of this article was to propose several statistical models, based on space–time principal components (ST PCs), for the seasonal prediction of surface air temperatures (SATs) over North America. The models are validated using a cross-validation technique and are compared to the CCA prediction technique, one of the most widely used existing schemes. The ST PC–based models appear to have higher skill than the CCA model as it has been used by several authors (Barnett and Preisendorfer 1987; Barnston 1994; Shabbar and Barnston 1996; He and Barnston 1996). Nonetheless, the information content in the predictor filtering is almost the same in all models, meaning that the skill discrepancies are mostly methodological.

For the prediction of some seasons, the CCA model produces spurious negative correlations over large areas, while a simple linear regression model does not have this deficiency. The main problem with the CCA approach is the well-known overfitting problem (Kendall 1975; Cherry 1996). Pairs of most-correlated predictor/predictand modes are sought over a training period and are used for prediction. When the training sample is not long enough, the correlations between these pairs of modes are largely overestimated (Cherry 1996). Since this correlation is precisely the linear prediction coefficient linking each predictand mode to its associated predictor mode, the variance of the forecast is overestimated, specifically along irrelevant directions. When this overfitting problem is combined with cross validation, it can lead to large spurious negative correlations.
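The overestimation is easy to reproduce. In the sketch below (our own illustration, not the paper's computation), two 5-mode fields with no real relationship are "trained" on 39 samples, mimicking the seasonally stratified sample size, and the leading canonical correlation nevertheless comes out far from zero.

```python
import numpy as np

def leading_canonical_corr(X, Y):
    """Leading canonical correlation between data matrices X and Y,
    computed as the top singular value of Qx' Qy, where Qx and Qy are
    orthonormal bases of the centered column spaces."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

rng = np.random.default_rng(1)
# 39 training samples, 5 predictor modes, 5 predictand modes, no real link
r = leading_canonical_corr(rng.standard_normal((39, 5)),
                           rng.standard_normal((39, 5)))
```

With independent Gaussian inputs of this size, `r` typically lands well above zero (often in the 0.5–0.7 range); treated as a prediction coefficient, such a spurious correlation inflates the forecast variance exactly as described above.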

One way to cure this deficiency could be to deflate the training CCA correlations, with a deflation that grows as the order of the mode increases. In that way, the forecast variance would be smaller and the truly correlated modes would be emphasized. We are not aware of any objective method to solve this problem. In the present CCA application, similar to that of Barnston (1994), only a few training data points are used since all filters depend on the season. Only 40 yr are available, leaving 39 training data points (one is withheld for verification). We believe that CCA skill could be improved simply by adding adjacent seasons to the training data; we presented some evidence of this in section 3.

Another difficulty with CCA is its lack of a probabilistic formulation; the model proposed here does include one. On the other hand, CCA comes with potentially interesting diagnostics, such as the “loading patterns,” which tell which structures are responsible for skill. Nevertheless, because of the orthogonality constraints governing the definition of these patterns, caution has to be taken in drawing any physical conclusions from their geographical distribution, except perhaps for the leading one.

The model we proposed is based on a two-step approach. First, predictable components of the predictor field, the space–time principal components, are extracted. The only fundamental difference between the ST PCs and the extended-EOF coefficients of Barnston (1994) lies in the nonseasonal character of the ST PCs. The latter are calculated using multichannel singular spectrum analysis (MSSA), with all-year-long data used to calculate the lag-covariance matrix. This has the advantage of increasing the number of robust components, but it removes specific characteristics of a given season’s SST variability. We found here that, for the prediction of North American SATs, the former aspect prevails. The ST PCs are then extrapolated to the target season to be predicted, and a downscaling procedure is applied to the extrapolated ST PCs in order to forecast SATs. The downscaling procedure can take the form of continuous forecasts or categorical forecasts. The best continuous forecast skill is obtained by a downscaling procedure based on regression for summer and on analogs for the other seasons. The simplest way to produce probabilistic/categorical forecasts is to use analogs at the downscaling stage.
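The first step can be sketched as a bare-bones MSSA in the covariance formulation (our own simplification; the paper's window length, channel count, and detrending are those described in section 2): each channel is lag-embedded, the lag-covariance matrix is diagonalized, and the data are projected onto the leading eigenvectors to obtain the ST PCs.

```python
import numpy as np

def space_time_pcs(channels, window, n_keep):
    """ST PCs via multichannel SSA: lag-embed each channel over `window`
    lags, diagonalize the lag-covariance matrix, and project.

    channels: (n_time, n_channels) array of (e.g.) spatial SST PCs.
    Returns an (n_time - window + 1, n_keep) array of space-time PCs."""
    n_time, n_ch = channels.shape
    rows = n_time - window + 1
    traj = np.empty((rows, n_ch * window))      # augmented trajectory matrix
    for lag in range(window):
        traj[:, lag * n_ch:(lag + 1) * n_ch] = channels[lag:lag + rows]
    traj -= traj.mean(axis=0)
    cov = traj.T @ traj / rows                  # lag-covariance matrix
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_keep]     # leading ST EOFs
    return traj @ vecs[:, order]
```

Because the lag-covariance matrix is estimated from all-year-long data, the resulting components carry no seasonal stratification, which is the nonseasonal character referred to above.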

Skill has been checked with both correlation scores (for continuous forecasts) and LEPS skill scores (for categorical forecasts); the conclusions were very similar, which makes them quite reliable. Global skill is found to be significant for winter, spring, and summer predictions, while it is close to zero for fall predictions. Winter predictions have useful skill over the whole of Canada and the northern border of the United States, and skills reaching 0.5–0.6 are found over the southeastern United States, specifically along the Gulf of Mexico. Spring and late fall (OND) predictions have significant skill along a band extending from British Columbia to the plains west of the Great Lakes and, for spring, near the Mexican border. Summer predictions have significant skill over Ontario and Quebec, Canada; California near the Mexican border; and the southeastern United States. Finally, we examined the dependence of skill on the lead time of the forecast. It is interesting to notice that, for a given target season, neither the global skill nor the skill patterns vary much as the lead time extends up to 9 months. Canada appears to be a particularly favored country for winter prediction, skill remaining almost as significant at long lead times as at short ones.

Another advantage of the two-step method proposed here is that the ST PC extrapolation stage, carried out here using a simple linear regression, can easily be replaced by a GCM integration. This is the simplest form of a hybrid statistical–dynamical model, recently proposed by Sarda et al. (1996).

## Acknowledgments

We are thankful to H. Van den Dool, A. Barnston, and J. Hoopingarner (CPC, NCEP) for having provided us with the North American surface air temperature data. This work was initiated in 1994 during the visit to LMD of Mr. Jianwen Liu, who is now at the Beijing Institute of Applied Meteorology. The preparation of this work was performed on daily temperatures and runoff over the United States, a dataset kindly provided to us by D. Lettenmaier.

## REFERENCES

Barnett, T. P., 1981: Statistical prediction of North American air temperatures from Pacific predictors. *Mon. Wea. Rev.,* **109,** 1021–1041.

——, and R. W. Preisendorfer, 1978: Multifield analog prediction of short-term climate fluctuations using a climate state vector. *J. Atmos. Sci.,* **35,** 1771–1787.

——, and ——, 1987: Origins and levels of monthly and seasonal forecast skill for United States surface air temperatures determined by canonical correlation analysis. *Mon. Wea. Rev.,* **115,** 1825–1850.

Barnston, A. G., 1992: Correspondence among the correlation, RMSE, and Heidke forecast verification measures; refinement of the Heidke score. *Wea. Forecasting,* **7,** 699–709.

——, 1994: Linear statistical short-term climate predictive skill in the Northern Hemisphere. *J. Climate,* **7,** 1513–1564.

——, and H. M. Van den Dool, 1993: A degeneracy in estimated skill in forecasts using regression-based cross-validation designs. *J. Climate,* **6,** 963–977.

Bergen, R. E., and R. P. Harnack, 1982: Long-range temperature prediction using a simple analog approach. *Mon. Wea. Rev.,* **110,** 1083–1099.

Bjerknes, J., 1969: Atmospheric teleconnections from the equatorial Pacific. *Mon. Wea. Rev.,* **97,** 163–172.

Box, G. E. P., and G. M. Jenkins, 1976: *Time Series Analysis: Forecasting and Control.* Holden-Day.

Brankovic, C., T. N. Palmer, F. Molteni, S. Tibaldi, and U. Cubasch, 1990: Extended-range predictions with ECMWF models: Time lagged ensemble forecasting. *Quart. J. Roy. Meteor. Soc.,* **116,** 867–912.

Brunet, G., 1994: Empirical normal mode analysis of atmospheric data. *J. Atmos. Sci.,* **51,** 932–952.

——, and R. Vautard, 1996: Empirical normal modes versus empirical orthogonal functions for statistical prediction. *J. Atmos. Sci.,* **53,** 3468–3489.

Cherry, S., 1996: Singular value decomposition analysis and canonical correlation analysis. *J. Climate,* **9,** 2003–2009.

Déqué, M., 1991: Removing the model systematic error in extended range forecasting. *Ann. Geophys.,* **6,** 217–224.

Gandin, L. S., and A. H. Murphy, 1992: Equitable skill scores for categorical forecasts. *Mon. Wea. Rev.,* **120,** 361–370.

He, Y., and A. G. Barnston, 1996: Long-lead forecasts of seasonal precipitation in the tropical Pacific islands using CCA. *J. Climate,* **9,** 2020–2035.

Hoerling, M. P., A. Kumar, and N. M. Zhong, 1997: El Niño, La Niña, and the nonlinearity of their teleconnections. *J. Climate,* **10,** 1769–1786.

Huang, J., H. M. Van den Dool, and A. G. Barnston, 1996: Long-lead seasonal temperature prediction using optimal climate normals. *J. Climate,* **9,** 809–817.

Jiang, N., J. D. Neelin, and M. Ghil, 1995: Quasi-quadrennial and quasi-biennial variability in the equatorial Pacific. *Climate Dyn.,* **12,** 101–112.

Kendall, M. G., 1975: *Multivariate Analysis.* Griffin, 210 pp.

Kumar, A., M. Hoerling, M. Ji, A. Leetmaa, and P. Sardeshmukh, 1996: Assessing a GCM’s suitability for making seasonal predictions. *J. Climate,* **9,** 115–129.

Livezey, R. E., and A. G. Barnston, 1988: An operational multifield analog/antianalog prediction system for United States seasonal temperatures. 1. System design and winter experiments. *J. Geophys. Res.,* **93** (D9), 10 953–10 974.

Montroy, D. L., 1997: Linear relation of central and eastern North American precipitation to tropical sea surface temperature anomalies. *J. Climate,* **10,** 541–558.

Moron, V., R. Vautard, and M. Ghil, 1998: Trends, interdecadal and interannual oscillations in global sea-surface temperatures. *Climate Dyn.,* **14,** 545–569.

Palmer, T. N., 1988: Medium and extended range predictability and stability of the Pacific/North American mode. *Quart. J. Roy. Meteor. Soc.,* **114,** 691–713.

——, and D. L. Anderson, 1994: The prospects for seasonal forecasting—A review paper. *Quart. J. Roy. Meteor. Soc.,* **120,** 755–793.

Peng, S., L. A. Mysak, H. Ritchie, J. Derome, and B. Dugas, 1995: The differences between early and midwinter atmospheric responses to sea surface temperature anomalies in the northwest Atlantic. *J. Climate,* **8,** 137–157.

Plaut, G., and R. Vautard, 1994: Spells of low-frequency oscillations and weather regimes in the Northern Hemisphere. *J. Atmos. Sci.,* **51,** 210–236.

Potts, J. M., C. K. Folland, I. T. Jolliffe, and D. Sexton, 1996: Revised “LEPS” scores for assessing climate model simulations and long-range forecasts. *J. Climate,* **9,** 34–53.

Renwick, J. A., and J. M. Wallace, 1996: Relationships between North Pacific wintertime blocking, El Niño, and the PNA pattern. *Mon. Wea. Rev.,* **124,** 2071–2076.

Sarda, J., G. Plaut, C. Pires, and R. Vautard, 1996: Statistical and dynamical long-range atmospheric forecasts: Experimental comparison and hybridization. *Tellus,* **48A,** 518–537.

Shabbar, A., and A. G. Barnston, 1996: Skill of seasonal climate forecast in Canada using canonical correlation analysis. *Mon. Wea. Rev.,* **124,** 2370–2385.

——, and M. Khandekar, 1996: The impact of El Niño–Southern Oscillation on the temperature field over Canada. *Atmos.–Ocean,* **34,** 401–416.

Tracton, M. S., K. Mo, W. Chen, E. Kalnay, R. Kistler, and G. White, 1989: Dynamical Extended Range Forecasts (DERF) at the National Meteorological Center. *Mon. Wea. Rev.,* **117,** 1604–1635.

Van den Dool, H. M., 1994: Long-range weather forecasts through numerical and empirical methods. *Dyn. Atmos. Oceans,* **20,** 247–270.

Vautard, R., P. Yiou, and M. Ghil, 1992: Singular spectrum analysis: A toolkit for short, noisy chaotic signals. *Physica D,* **35,** 395–424.

——, C. Pires, and G. Plaut, 1996: Long-range atmospheric predictability using space–time principal components. *Mon. Wea. Rev.,* **124,** 288–307.

Walker, G. T., and W. Bliss, 1932: World weather V. *Mem. Roy. Meteor. Soc.,* **4,** 53–84.

Wallace, J. M., and D. S. Gutzler, 1981: Teleconnections in the geopotential height field during the Northern Hemisphere winter. *Mon. Wea. Rev.,* **109,** 784–812.

Ward, M. N., and C. K. Folland, 1991: Prediction of seasonal rainfall in the north Nordeste of Brazil using eigenvectors of sea-surface temperature. *Int. J. Climatol.,* **11,** 711–743.

Wilks, D. S., 1996: Statistical significance of long-range “optimal climate normal” temperature and precipitation forecasts. *J. Climate,* **9,** 827–839.

Geographical distribution of SAT tercile separators for the JFM season, once the seasonal linear trend is removed. Values are in degrees Celsius. The stations are indicated by dots on the map. (a) B/N separators; (b) N/A separators.

Citation: Journal of Climate 12, 2; 10.1175/1520-0442(1999)012<0380:SPONAS>2.0.CO;2

Fig. 3. Seasonal march of the skill of the zero lead time prediction of SATs over North America, obtained from various models. (a) Heavy solid: 2STEP COM; heavy dashed: 1STEP COM; light solid: 2STEP REG; light dashed: 1STEP REG; dotted: CCA. (b) Heavy solid: 1STEP REG using all seasons together to train the regression coefficients; light dashed: 1STEP REG using the same seasonal rules for training data as in (a); light solid: 1STEP REG using only one season for training the regression coefficients, the same season as the predictand season; dotted: CCA. (c) Global LEPS scores of 1STEP CAT (light solid) and 2STEP CAT (heavy solid).

Fig. 4. Seasonal march of the skills of the 1STEP (light solid curves), 2STEP (heavy solid curves), and CCA (dotted curves) models at lead times of (a) 3 months, (b) 6 months, and (c) 9 months.

Fig. 5. Geographical distribution of the correlation skill of the CCA model, for zero lead time and various predictand seasons: (a) Jan–Mar, (b) Apr–Jun, (c) Jul–Sep, and (d) Oct–Dec. Contour interval is 0.2. Areas where skill is significant at the 95% level (≥0.26) are shaded.

Fig. 6. Same as Fig. 5, but for the skill of the 2STEP-COM model (replaced by the 2STEP-REG model for the summer season).

Fig. 7. Geographical distribution of the correlation skill of the 2STEP-COM model for the prediction of the JFM season at lead times of (a) 3 months, (b) 6 months, and (c) 9 months.

Fig. 8. (a) Observed detrended SAT anomaly field during the El Niño winter DJF 1983. Contour interval is 1°C. (b) 2STEP-COM prediction of the detrended SAT anomaly field, at lead time 0, for DJF 1983. Contour interval is 0.2°C. (c) CCA prediction of the same winter season. Contour interval is 0.2°C.

Fig. 9. (a) Geographical distribution of the probabilistic prediction, at lead time 0, of the A tercile of SATs for DJF 1983, using the 2STEP-CAT model. Contour interval is 0.05. (b) As in (a), but for the B tercile. (c) Observed terciles of the SAT seasonal mean for DJF 1983.

Table 1. Classification of the performance of the different statistical models as a function of season and lead time. Short lead times mean zero lead time, that is, the forecast for the forthcoming season; long lead times take as reference the forecast with a 6-month lead time. Summer means JJA; fall, SON; winter, DJF; and spring, MAM. The left column indicates the most skillful model, and performance decreases with column number. Boldface acronyms mean that the skill is significantly different from 0. The first four rows are for short lead times (SH) and the last four for long lead times (LO). Note that most of the differences between significant skills are not themselves significant.