## 1. Introduction

The predictability of El Niño–Southern Oscillation (ENSO) sea surface temperatures (SSTs) has been the subject of considerable research over the last two decades. During the 1997–98 strong El Niño and subsequent 1998 moderate La Niña, 15 dynamical and statistical ENSO seasonal forecast models were in real-time operation [see Barnston et al. (1999) and Landsea and Knaff (2000) for details and intercomparisons of model performance]. Most ENSO prediction models produce useful forecasts (i.e., a correlation skill of at least 0.5) at leads out to 6 months when skill is assessed over all seasons (Kirtman et al. 2002). However, the predictability of ENSO has a strong seasonal cycle: it is relatively easy to predict boreal winter and spring ENSO conditions from boreal summer, but it is difficult to predict boreal summer ENSO conditions from boreal winter and spring. The decrease in forecast skill through the months of March–May is known as the spring predictability barrier. This phenomenon was first reported by Walker and Bliss (1932), who observed that the Southern Oscillation had least persistence across the March–May season. Subsequent studies have documented the ENSO spring predictability barrier in detail (see Torrence and Webster 1998 for a recent review).

Improved seasonal predictions of boreal summer ENSO conditions would bring substantial socioeconomic benefits. August–September ENSO has a strong influence on Atlantic, U.S., and Caribbean hurricane activity (e.g., Gray 1984; Bove et al. 1998; Saunders et al. 2000), which peaks between August and October; Northwest Pacific typhoon activity (Chan 1985; Saunders et al. 2000), which peaks between July and October; and patterns of boreal summer tropical precipitation (e.g., Ropelewski and Halpert 1987; Dai and Wigley 2000). The ability to skillfully predict seasonal hurricane/typhoon activity and seasonal rainfall at longer range would benefit society, business, and government by reducing the risk and uncertainty associated with the year-to-year variability in the incidence of such climatic events and conditions.

The statistical ENSO–climatology and persistence (CLIPER) prediction model is arguably one of the more successful ENSO seasonal forecast models to date (Kerr 2000). ENSO–CLIPER was developed by Knaff and Landsea (1997) as a “no skill” control forecast for comparison with more sophisticated dynamical ENSO prediction models. It is a statistical model based entirely on the linear optimal combination of persistence, month-to-month trend of initial conditions, and climatology. Because of this formulation, the structure of the ENSO–CLIPER model can readily be modified. This study assesses the sensitivity of the model's summer ENSO skill to changes in the model specification. It then examines whether the skill of the standard ENSO–CLIPER model may be improved by combining (or “consolidating”) hindcasts made with different structural CLIPER variants. This procedure, called ensemble or consensus forecasting, has long been used in numerical weather prediction to improve forecast skill (Thompson 1977) but has only recently been applied to seasonal climate forecasting (Goddard et al. 2001). For a consensus forecast to achieve skill that is measurably higher than that of its individual ensemble members, these members need to show statistical independence (e.g., Goerss 2000). Reports indicate that ENSO predictive skill may be improved by combining forecasts made with different predictive models (Unger et al. 1996; Kirtman et al. 2002; Mason and Mimmack 2002).

The paper is structured as follows. Section 2 briefly reviews the standard ENSO–CLIPER model, describes how prediction skill and uncertainty are calculated, and details the datasets employed. The results section (section 3) presents the summer ENSO prediction skill of the standard ENSO–CLIPER model and its temporal stability, and shows the sensitivity of this skill to three factors used in the model's formulation: the predictor significance level test, the teleconnected predictor averaging period, and the variance factor used during the optimal combination of predictors. A consolidated CLIPER model, comprising an ensemble of 18 variants of the standard CLIPER model, is then introduced. Its August–September ENSO skill is evaluated for each ENSO index region (3.4, 3, 4, and 1+2) using deterministic and probabilistic skill measures applied to cross-validated hindcasts from 1952 to 2002 and to replicated real-time forecasts from 1900 to 1950 (deterministic skill only). Section 4 discusses these results, and conclusions are drawn in section 5.

## 2. Methodology

### a. Standard ENSO–CLIPER model

A detailed description of the standard ENSO–CLIPER model methodology is provided by Knaff and Landsea (1997) and need not be repeated here. In summary, there are 14 potential predictors available to the model. These predictors are listed by ENSO index region and number in Table 1 and may be categorized as follows:

- (a) persistence of predictand SST anomaly (1-, 3-, and 5-month means): predictor numbers 1–3;
- (b) trend of predictand SST anomaly (1-, 3-, and 5-month means): predictor numbers 4–6;
- (c) initial condition of teleconnected predictors (3-month mean): predictor numbers 7, 9, 11, and 13; and
- (d) trend of teleconnected predictors (3-month mean): predictor numbers 8, 10, 12, and 14.

Each predictor that correlates with the predictand at the 5% significance level, after correction for serial autocorrelation, enters a predictor pool from which a leaps-and-bounds (L&B) algorithm (Furnival and Wilson 1974) estimates the optimal combination of *N* = 1, 2, … , 14 predictors. The L&B selection routine steps forward through the possible combinations of the predictors until the best multiple regression equations having 1, 2, … , 14 predictors are found. The final selected model is the one with the largest *N* that explains at least 2.5% more variance than the *N* − 1 predictor model, subject to the caveat that only one of the 1-, 3-, and 5-month mean predictors in each of categories a and b may be selected. If a satisfactory predictor model can be found, multiple linear regression is applied to produce the forecast; otherwise, a zero anomaly (i.e., climatology) forecast is recorded.
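The selection rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Knaff and Landsea (1997) code. It replaces the leaps-and-bounds branch-and-bound search with a brute-force best-subset search, which finds the same best subset of each size but more slowly. It reads "2.5% more variance" as an absolute increase of 0.025 in R². It also assumes that `pool` already holds the indices of the predictors that passed significance screening. The names `r_squared`, `allowed`, and `select_predictors` are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def r_squared(X, y, cols):
    """Fraction of the variance of y explained by an OLS fit on columns `cols`."""
    A = [[1.0] + [row[c] for c in cols] for row in X]  # intercept + predictors
    p = len(A[0])
    # Normal equations (A^T A) b = A^T y, solved by Gaussian elimination.
    M = [[sum(a[r] * a[c] for a in A) for c in range(p)] for r in range(p)]
    v = [sum(a[r] * yi for a, yi in zip(A, y)) for r in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))  # partial pivoting
        M[k], M[piv] = M[piv], M[k]
        v[k], v[piv] = v[piv], v[k]
        for r in range(k + 1, p):
            f = M[r][k] / M[k][k]
            for c in range(k, p):
                M[r][c] -= f * M[k][c]
            v[r] -= f * v[k]
    b = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        b[k] = (v[k] - sum(M[k][c] * b[c] for c in range(k + 1, p))) / M[k][k]
    ybar = fmean(y)
    sse = sum((yi - sum(bj * aj for bj, aj in zip(b, a))) ** 2 for a, yi in zip(A, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


def allowed(cols, exclusive_groups):
    """At most one member of each exclusive group (e.g. 1-, 3-, 5-month means)."""
    return all(len(set(cols) & g) <= 1 for g in exclusive_groups)


def select_predictors(X, y, pool, exclusive_groups=(), pve_step=0.025):
    """Find the best subset of each size N, then keep the largest N whose best
    model explains at least `pve_step` more variance than the best
    (N - 1)-predictor model. An empty tuple means a climatology forecast."""
    best_r2, best_cols = [0.0], [()]
    for size in range(1, len(pool) + 1):
        cands = [c for c in combinations(pool, size) if allowed(c, exclusive_groups)]
        if not cands:
            break
        r2, cols = max((r_squared(X, y, c), c) for c in cands)
        best_r2.append(r2)
        best_cols.append(cols)
    chosen = ()
    for size in range(1, len(best_r2)):
        if best_r2[size] - best_r2[size - 1] >= pve_step:
            chosen = best_cols[size]
    return chosen
```

With 0-based indices, `exclusive_groups` would hold, for example, `{0, 1, 2}` and `{3, 4, 5}` for the persistence and trend predictors of categories a and b.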

### b. Cross-validated hindcasts 1952–2002

We assess seasonal predictability in two ways: from cross-validated hindcasts for the period 1952–2002 and from replicated real-time forecasts for the independent prior period 1900–50 (section 2c). The standard ENSO–CLIPER model was derived using a fixed 43-yr training period, 1952–94. Numerical simulations (B. Lloyd-Hughes 2003, unpublished manuscript) indicate that at least 50 forecast–observation pairs are required for a realistic skill estimate, which is more than a single 43-yr period provides. Previous studies of ENSO predictability (e.g., Mason and Mimmack 2002; Kirtman et al. 2002; Latif et al. 1998) have sought to ameliorate this sample-size problem by pooling predictions of different seasons at a given lead, but this is always at the expense of statistical independence. Here, a cross-validated approach (Wilks 1995) is adopted instead to extend the validation period to 51 yr (1952–2002). At each step a new model is formulated, trained on all data excluding a 5-yr block centered on the year of interest (i.e., year blocks of 1952–56, 1953–57, … , 1998–2002 are used). This block is tapered at the ends of the time series. Block elimination is employed to minimize potential skill inflation that might arise from the multiannual persistence of ENSO conditions. The choice of 5 yr follows from the frequency spectrum of the ENSO signal, which shows a dominant peak in periodicity at about 4 yr (Rasmusson and Carpenter 1982; Trenberth 1997).
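The block-elimination scheme can be written as a function that returns the training years for each target year. The helper `training_years` is hypothetical. Reading "tapered" as simply truncating the withheld block at the ends of the record is our assumption.

```python
def training_years(target, first=1952, last=2002, half_width=2):
    """Years available to train the model that hindcasts `target`.

    A block of 2 * half_width + 1 = 5 years centered on the target is withheld.
    Near the ends of the record the block is assumed to be truncated, so for
    target 1952 only 1952-54 are withheld.
    """
    return [yr for yr in range(first, last + 1) if abs(yr - target) > half_width]
```

For a mid-record target such as 1970, the years 1968–72 are withheld and 46 of the 51 years remain for training.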

Forecast lead time is defined according to the convention of the World Meteorological Organization (WMO 2002). Under this convention, a zero-lead forecast is one that employs data up to the end of the month immediately before the forecast period starts. For example, predictions issued at the end of July for conditions in August–September are said to be issued at zero lead.
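The WMO convention maps each lead onto the last month of usable data. The hypothetical helper below reproduces the correspondences used later in the text for a target period starting in August. At lead 4, data run through March, so the forecast is issued in early April. At lead 10, data run through the previous September, so the forecast is issued in the prior October.

```python
def last_data_month(target_start_month, lead):
    """Return (year_offset, month) of the last month of data usable at `lead`.

    Months are numbered 1-12; year_offset is relative to the target year
    (-1 means the previous year). At zero lead this is the month immediately
    before the target period starts.
    """
    index = target_start_month - 1 - lead  # may be <= 0 for long leads
    year_offset, month0 = divmod(index - 1, 12)
    return year_offset, month0 + 1
```

For an August–September target, `last_data_month(8, 0)` gives `(0, 7)` (end of July). `last_data_month(8, 10)` gives `(-1, 9)` (end of the previous September).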

### c. Independent (replicated real time) forecasts 1900–50

Our replicated real-time forecast scheme uses an initial model training period from 1870 to 1899. The training period increases 1 yr at a time as each forecast is made. For example, the August–September ENSO forecast for 1901 is made by training the prediction model on data between 1870 and 1900, and so on. This updating exactly replicates the operation of a real-time forecast. Independent ENSO forecasts for each year between 1900 and 1950 are obtained. The year 1950 marks the limit of independent forecasts since data after this were used in the structural development of the original CLIPER model.
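The expanding training window can be expressed as a short generator. The name `replicated_real_time` is hypothetical. The sketch only enumerates which years each forecast may use; it does not fit a model.

```python
def replicated_real_time(first_target=1900, last_target=1950, train_start=1870):
    """Yield (target_year, training_years) pairs: each forecast is trained on
    every year from `train_start` up to, but not including, the target year."""
    for target in range(first_target, last_target + 1):
        yield target, range(train_start, target)
```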

### d. Deterministic skill and uncertainty

August–September ENSO skill is assessed for the cross-validated hindcasts for 1952–2002 using deterministic and probabilistic skill measures, and for the replicated real-time forecasts from 1900 to 1950 using deterministic skill measures.

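Deterministic skill is measured by the mean square skill score (MSSS), which compares the mean square error (MSE) of the hindcasts with that of a climatology forecast:

$$\mathrm{MSSS} = 1 - \frac{\mathrm{MSE}_f}{\mathrm{MSE}_{cl}}, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - x_i\right)^2, \tag{1}$$

where $\hat{x}_i$ and $x_i$ are, respectively, the hindcast and observed anomaly values for each of the $n = 51$ yr, and the subscripts $f$ and $cl$ denote the hindcasts and the climatology forecast. The climatologies used here are the 51-yr (1952–2002) average for the cross-validation period, and the 51-yr (1900–50) average for the replicated real-time forecast period.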
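Eq. (1) translates directly into code. The minimal sketch below assumes a single constant climatological value, such as the 51-yr mean, and its helper names are hypothetical.

```python
from statistics import fmean


def mse(pred, obs):
    """Mean square error between paired predictions and observations."""
    return fmean((p - o) ** 2 for p, o in zip(pred, obs))


def msss(hindcasts, observed, climatology):
    """Eq. (1): 1 - MSE_f / MSE_cl, with a constant climatology forecast."""
    clim = [climatology] * len(observed)
    return 1.0 - mse(hindcasts, observed) / mse(clim, observed)
```

A perfect hindcast scores 1, a hindcast that always issues the climatological value scores 0, and a hindcast worse than climatology scores below 0.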

Model skill is compared against ordinary persistence skill both for the standard ENSO–CLIPER model (including its temporal stability) and for the ENSO–CLIPER model formulated using different values of three sensitivity factors. Persistence is calculated over an interval of the same length as the predictand period (WMO 2002). For example, the ordinary persistence at a lead of 1 month for the August–September target predictand is calculated as the mean anomaly over the prior 2-month period May–June.
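The persistence rule can be sketched as follows. The sketch assumes monthly anomalies stored in a dictionary keyed by `(year, month)`, and the function name is hypothetical. For an August–September target at a lead of 1 month, it averages May and June, matching the example above.

```python
from statistics import fmean


def persistence(monthly, target_year, target_month, lead, length=2):
    """Mean anomaly over the `length` months ending at the last month of data
    available at `lead` (zero lead = month just before the target starts).

    `monthly` maps (year, month) -> anomaly, months numbered 1-12.
    """
    start = target_year * 12 + (target_month - 1)  # absolute month index
    last = start - 1 - lead                          # last usable month
    months = [divmod(k, 12) for k in range(last - length + 1, last + 1)]
    return fmean(monthly[(year, m0 + 1)] for year, m0 in months)
```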

Confidence intervals are computed around the MSSS skill values using the bootstrap method (Efron and Gong 1983). This involves randomly selecting, with replacement, 51 yr (in this case) of actual data together with the associated predicted and climatological hindcasts. The MSSS is computed for each resample, and repeating this many times yields a distribution of skill values from which a 95% two-tailed confidence interval is readily obtained. This confidence interval means there is a 95% probability that the skill computed over the 51-yr period will lie within this uncertainty window. The root-mean-square skill score (RMSSS) is also considered. It is calculated as in Eq. (1) but with the root-mean-square error in place of the MSE. RMSSS places less weight on the correct prediction of extremes and so provides a useful comparison to the MSSS.
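The resampling procedure can be sketched as a percentile bootstrap. The number of resamples (2000), the random seed, and all function names are illustrative choices, not values from the study. Each resample keeps the hindcast, observation, and climatological value of a year together, as the text describes.

```python
import math
import random
from statistics import fmean


def _mse(pred, obs):
    return fmean((p - o) ** 2 for p, o in zip(pred, obs))


def msss(pred, obs, clim):
    """Eq. (1) with a per-year climatological hindcast `clim`."""
    return 1.0 - _mse(pred, obs) / _mse(clim, obs)


def rmsss(pred, obs, clim):
    """As Eq. (1) but with root-mean-square errors in place of the MSEs."""
    return 1.0 - math.sqrt(_mse(pred, obs) / _mse(clim, obs))


def bootstrap_ci(pred, obs, clim, score=msss, n_boot=2000, alpha=0.05, seed=0):
    """Two-tailed (1 - alpha) percentile interval of `score` over resampled years."""
    rng = random.Random(seed)
    n = len(obs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # years drawn with replacement
        stats.append(score([pred[i] for i in idx], [obs[i] for i in idx],
                           [clim[i] for i in idx]))
    stats.sort()
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]
```

Passing `score=rmsss` gives the corresponding interval for the RMSSS.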

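Following WMO (2002), the MSSS of cross-validated hindcasts may be decomposed as

$$\mathrm{MSSS} = \frac{2\dfrac{s_{\hat{x}}}{s_x}\,r_{\hat{x},x} - \left(\dfrac{s_{\hat{x}}}{s_x}\right)^2 - \left(\dfrac{E\langle\hat{x}\rangle - E\langle x\rangle}{s_x}\right)^2 + \dfrac{2n-1}{(n-1)^2}}{1 + \dfrac{2n-1}{(n-1)^2}}. \tag{2}$$

Here, $s_{\hat{x}}$ and $s_x$ are, respectively, the sample standard deviations of the hindcast and observed values; $r_{\hat{x},x}$ is the product moment correlation of the hindcasts and observations; and $E\langle\cdots\rangle$ represents the expectation or mean value. Although Eq. (2) is not exact when block elimination is employed, the basic decomposition result will hold. The first three terms in the expansion relate to phase errors (through the correlation), amplitude errors (through the ratio of the hindcast to the observed variances), and the overall bias error. The contribution from each of these terms to the skill improvement afforded by the consolidated ENSO–CLIPER model is considered in section 4.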
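The separation into phase, amplitude, and bias contributions can be checked numerically with the simpler, non-cross-validated form of the decomposition. This form holds exactly when the climatology is the sample mean of the observations and standard deviations use a 1/n normalization: MSSS = r² − (r − s_x̂/s_x)² − [(E⟨x̂⟩ − E⟨x⟩)/s_x]². We substitute it for the cross-validated version purely for illustration, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def decomposition(pred, obs):
    """Return (r^2, amplitude term (r - s_f/s_x)^2, bias term ((mean_f - mean_x)/s_x)^2)."""
    mf, mo = fmean(pred), fmean(obs)
    sf, so = pstdev(pred), pstdev(obs)  # 1/n standard deviations
    r = fmean((p - mf) * (o - mo) for p, o in zip(pred, obs)) / (sf * so)
    return r ** 2, (r - sf / so) ** 2, ((mf - mo) / so) ** 2


def msss_sample_climatology(pred, obs):
    """MSSS with the sample mean of the observations as the climatology."""
    mo = fmean(obs)
    mse_f = fmean((p - o) ** 2 for p, o in zip(pred, obs))
    mse_cl = fmean((mo - o) ** 2 for o in obs)
    return 1.0 - mse_f / mse_cl
```

For any hindcast series, `phase - amplitude - bias` from `decomposition` equals `msss_sample_climatology` to rounding error. Shrinking the hindcast amplitude without changing its timing alters only the amplitude term.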

### e. Probabilistic skill

Probabilistic skill is assessed using the rank probability score (RPS),

$$\mathrm{RPS} = \sum_{m=1}^{15}\left(\mathrm{CP}_{Fm} - \mathrm{CP}_{Om}\right)^2,$$

where $\mathrm{CP}_{Fm}$ represents the cumulative probability of the forecast up to bin $m$, $\mathrm{CP}_{Om}$ is the cumulative “observed” probability up to bin $m$, and there are 15 equisized bins of ENSO sea surface temperature anomaly. In addition, $\mathrm{CP}_{Om}$ is a step function from zero to one at the bin in which the actual observed value falls. For a perfect forecast, RPS = 0. The RPS is referenced to climatology to give the RPSS, which, for $n$ forecasts, is defined as

$$\mathrm{RPSS} = 1 - \frac{\sum_{i=1}^{n}\mathrm{RPS}_{f,i}}{\sum_{i=1}^{n}\mathrm{RPS}_{cl,i}},$$

where $\mathrm{RPS}_f$ is the RPS of the forecast and $\mathrm{RPS}_{cl}$ is the RPS of the climatology (i.e., zero anomaly) forecast. The maximum RPSS is 1; a negative RPSS indicates skill worse than climatology. RPSS is regarded as a “harsh” seasonal forecast skill measure, with values of 0.10 being considered respectable and values of 0.20 very good (A. G. Barnston 2003, personal communication).

The RPSS is computed and compared for three different hindcast formulations for 1952–2002, two of which are probabilistic. The first probabilistic hindcast formulation (termed normal) fits a normal distribution to the cross-validated hindcast errors for 1952–2002. This normal distribution gives a prediction interval around the deterministic hindcast value, thereby providing the cumulative probability distribution. The second probabilistic formulation (termed ensemble) bins the individual ensemble members according to their values to obtain the cumulative probability distribution directly. The third formulation (termed deterministic) employs the deterministic hindcast values and is included for reference. The climatology cumulative probability distribution is obtained in each case by populating bins with the observed values for 1952–2002.
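The sketch below shows one way to build the cumulative probabilities for each formulation. It assumes the "normal" distribution is centered on the deterministic hindcast, with spread equal to the standard deviation of the cross-validated errors; treating those errors as unbiased is our assumption. The function names are hypothetical.

```python
from statistics import NormalDist, stdev


def normal_cp(hindcast, cv_errors, upper_edges):
    """'Normal': N(hindcast, sd of cross-validated errors) evaluated at bin edges."""
    dist = NormalDist(mu=hindcast, sigma=stdev(cv_errors))
    return [dist.cdf(edge) for edge in upper_edges]


def ensemble_cp(members, upper_edges):
    """'Ensemble': fraction of members at or below each upper bin edge."""
    return [sum(m <= edge for m in members) / len(members) for edge in upper_edges]


def deterministic_cp(hindcast, upper_edges):
    """'Deterministic': all probability in the bin containing the hindcast."""
    return ensemble_cp([hindcast], upper_edges)


def climatology_cp(observed_record, upper_edges):
    """Climatology: bins populated with the observed values (e.g. 1952-2002)."""
    return ensemble_cp(observed_record, upper_edges)
```

The ensemble and climatology distributions are both empirical. They differ only in whether the bins are populated with member hindcasts or with past observations.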

### f. Data

The monthly ENSO indices and Southern Oscillation index (SOI) data employed in the cross-validated hindcasts for 1952–2002 are supplied by the U.S. Climate Prediction Center (CPC). The ENSO indices are obtained from a weekly, 1° spatial resolution optimum interpolation SST analysis (Reynolds et al. 2002). Although the CPC data begin in 1950, our first cross-validated hindcast is for August–September 1952. The data in 1950 and 1951 are reserved to compute the 5-month trends in predictor categories a and b at the longest leads. The independent (replicated real time) ENSO forecasts for 1900–50 employ the Kaplan et al. (1998) reconstructed sea surface temperatures from 1870 to 1950 and historical SOI values from the Climatic Research Unit at the University of East Anglia for 1870–1950 [compiled using the method given in Ropelewski and Jones (1987)].

## 3. Results

### a. Standard ENSO–CLIPER cross-validated hindcasts

The standard ENSO–CLIPER model cross-validated hindcast skills for predicting the August–September (henceforth AS) Niño-1+2, -3, -3.4, and -4 indices for 1952–2002 are shown in Fig. 1. These areas are: Niño-1+2 (0°–10°S, 80°–90°W), Niño-3 (5°N–5°S, 90°–150°W), Niño-3.4 (5°N–5°S, 120°–170°W), and Niño-4 (5°N–5°S, 150°W–160°E). Skills are shown as a function of monthly lead out to 10 months (prior October). MSSS decays gradually for all indices from ∼90% at zero lead to ∼20% at a lead of 4 months. Skill attributable to persistence, while initially similar to that of the standard ENSO–CLIPER model, decays more rapidly and (with the exception of Niño-1+2) is negative at a lead of 4 months. The standard CLIPER model provides the largest (∼20%) absolute improvement in MSSS over persistence at leads of 3 and 4 months. At leads of 5 months and greater the standard ENSO–CLIPER model skill is zero. This is a direct consequence of the model formulation: when no predictors are found (as tends to be the case at the longer leads), a zero-anomaly (climatology) hindcast is issued, giving an MSSS of zero. The same is not true for persistence, which is free to yield wildly inaccurate hindcasts. The slight improvement in persistence skill at the longest leads is noteworthy. It is an artifact of the MSSS decomposition, which, as shown in Eq. (2), contains a term penalizing bias. Hindcast bias will be coupled to the annual cycle and is expected to be minimized at 12 months lead.

Confidence in the skill estimates for the standard ENSO–CLIPER model varies with lead. The 95% confidence interval grows from ∼10% absolute width at zero lead to 30%–60% width at leads of 3–6 months before settling back to ∼20% width at longer leads. Thus we can be confident of high skill at short leads and of no skill at long leads. Overall, AS Niño-4 is the best-predicted index, with model hindcast MSSS skill positive to 97.5% confidence at leads out to 4 months (early April) and better than persistence at all leads. These findings concur with Barnston and Ropelewski (1992), who reported an increase in ENSO forecast skill from east to west across the Pacific Ocean.

### b. Temporal stability

Analyses were performed on the subperiods 1952–75 and 1976–2002 to assess the temporal stability of the standard ENSO–CLIPER model AS hindcast skill. These results are displayed by ENSO region in Fig. 2, with the early period in the left-hand column and the later period on the right. The results for the AS Niño-3.4, -3, and -4 indices appear stable for both CLIPER and persistence. The variation of skill with lead is similar for both time periods, and the skill traces for each period generally fit within the other period's 95% confidence intervals. That said, the hindcast skill for the AS Niño-3 index is higher in the first (1952–75) split, while the hindcast skill for the AS Niño-4 index is higher in the second (1976–2002) split. This recent shift toward higher (lower) AS ENSO skill in the west (east) is most pronounced in the Niño-1+2 index. Between 1952–75 and 1976–2002, that index shows absolute reductions of 60% in model skill and 40% in persistence skill at leads of 3–5 months.

Kirtman and Schopf (1998) found ENSO skill to be higher in periods when the predictand variance is greatest. Standard deviations of the AS Niño-1+2 index for the first and second splits are 1.0° and 1.2°C, respectively. The variance is therefore slightly larger in the later split, which has the lower skill, so a change in variance cannot explain the change in skill. Examination of the hindcast time series (not shown) reveals that the reduction in the Niño-1+2 skill may arise from the poor prediction of the 1997 El Niño event and from an errant prediction of positive conditions for the summer of 1992, when in reality neutral conditions prevailed. With these years eliminated, the skills in the second split more closely resemble those in the first. A further plausible explanation for the drop in Niño-1+2 skill during 1976–2002 relative to 1952–75 concerns where El Niño events began. In the earlier period they tended to start in Peruvian waters and spread westward, whereas in the more recent period they have tended to start in the central equatorial Pacific and spread eastward. The resulting delay in reaching the South American coast could mean that the Niño-1+2 SST anomalies were less well developed in August–September in the more recent period and thus harder to predict.

The temporal splits in Fig. 2 show that the 95% skill confidence intervals for the Niño-3 and Niño-1+2 indices are far wider in the second split than the first. Wang et al. (2002) found greater sensitivity in skill for splits of Niño-3 than Niño-4 and attributed it to the increase in SST variance as the equatorial Pacific is traversed from west to east. A similar explanation, combined with the poor prediction of the 1997 El Niño, may account for the wider confidence intervals in the later split. However, caution must be applied in interpreting skill estimates based on samples of only 24–27 yr.

### c. Sensitivity to significance level

The sensitivity of the standard CLIPER model to the 5% significance level used to screen potential predictors was assessed in terms of MSSS. Comparisons were made between models screened at significance levels of 1%, 5%, and 10% (all other restrictions being left unchanged). Results for each ENSO region are shown in Fig. 3. For completeness, each panel includes the standard persistence skill from Fig. 1 and the MSSS of a “consensus” model, defined as the average of the hindcasts made with the three individual significance levels. The predictor screening significance level clearly has little effect upon the 1952–2002 model performance, changing it by at most ∼10%. This result might be expected, since poor predictors will be rejected at the subsequent L&B predictor optimization stage. The main advantage of predictor screening is to increase computational efficiency. Each reduction in the number of potential predictors passed to the L&B algorithm yields a saving of at least six floating point operations (Furnival and Wilson 1974). Figure 3 also shows that, in general, the consensus model outperforms the individual significance level models.
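A significance screen with a serial-correlation correction might look like the sketch below. It uses one common correction, which reduces the sample size by the product of the two series' lag-1 autocorrelations, together with a normal approximation via Fisher's z. The correction actually used in the standard model may differ, and all names are hypothetical. Changing `alpha` to 0.01, 0.05, or 0.10 corresponds to the three screening levels compared in Fig. 3.

```python
import math
from statistics import NormalDist, fmean


def pearson(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def effective_n(a, b):
    """Sample size reduced for the lag-1 autocorrelation of both series (assumed form)."""
    rho = pearson(a[:-1], a[1:]) * pearson(b[:-1], b[1:])
    return len(a) * (1.0 - rho) / (1.0 + rho)


def passes_screen(predictor, predictand, alpha=0.05):
    """Two-tailed test of r != 0 at level `alpha` using Fisher's z and n_eff."""
    n_eff = effective_n(predictor, predictand)
    if n_eff <= 3:
        return False  # too few independent samples to test
    r = max(min(pearson(predictor, predictand), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n_eff - 3)
    return abs(z) > NormalDist().inv_cdf(1.0 - alpha / 2.0)
```

Strongly autocorrelated series have a much smaller effective sample size, so a given correlation is less likely to pass the screen.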

### d. Sensitivity to percentage of variance explained improvement factor

Changes in the MSSS for 1952–2002 resulting from variation of the percentage of variance explained (PVE) improvement factor passed to the L&B algorithm in the standard CLIPER model were investigated for PVE factors of 1%, 2.5%, and 5%. These are shown in Fig. 4. Once again, the remaining restrictions were left unchanged. With the exception of the Niño-3.4 index at leads of 2–4 months, where MSSS differences of 20% are seen, the model skill is found to be insensitive to the PVE improvement factor. Higher values of the improvement factor were also investigated. In general these resulted in a single-predictor model, since no further predictor could be found to provide the required increase in PVE.

### e. Sensitivity to averaging period

The final CLIPER sensitivity restriction investigated was the averaging period for the teleconnected ENSO initial condition and trend predictors (predictor categories c and d in section 2a). Figure 5 shows skill plots for each region from models built separately using 1-, 3-, and 6-month averages of the teleconnected predictors. Again, the other sensitivity factors were left unchanged. The results display a similar pattern to Fig. 4, with sensitivity limited to Niño-3.4 at leads of 2–3 months, where MSSS differences approaching 30% are seen. As in Fig. 3, the consensus model generally outperforms the models built with an individual averaging period.
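The following sketch shows how a teleconnected initial-condition and trend predictor pair might be computed for a given averaging period *k*. Defining the "trend" as the latest *k*-month mean minus the *k*-month mean immediately before it is our assumption for illustration; Knaff and Landsea (1997) give the model's actual definition. The function name is hypothetical.

```python
from statistics import fmean


def mean_and_trend(series, last, k):
    """k-month mean ending at index `last`, and an assumed trend: that mean
    minus the k-month mean immediately preceding it.

    Needs last - 2*k + 1 >= 0, i.e. 2k months of data.
    """
    current = fmean(series[last - k + 1:last + 1])
    previous = fmean(series[last - 2 * k + 1:last - k + 1])
    return current, current - previous
```

Under this definition, longer averaging periods reach further back in time, which is consistent with the text's remark that data from 1950 and 1951 are needed for the longest-lead trends.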

### f. A consolidated model

In the absence of any clear physical justification for the level of predictor screening, L&B improvement factor, or teleconnected predictor averaging period, it seems reasonable to consolidate the hindcasts from each model into a single aggregate hindcast. A “consolidated” ENSO–CLIPER model is defined as the mean of 18 ensemble model hindcasts formulated with PVE improvement factors of 1%, 2.5%, and 5% and averaging periods of 1–6 months (3 × 6 = 18 members), with no predictor screening.
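The consolidation step reduces to enumerating the 3 × 6 configurations and averaging. In this sketch, `fit_and_predict` is a hypothetical callable standing in for training one CLIPER variant and returning its hindcast for a given year. The individual members are returned as well, since the "ensemble" probabilistic formulation in section 2e bins them directly.

```python
from itertools import product
from statistics import fmean

PVE_FACTORS = (0.01, 0.025, 0.05)  # L&B improvement factors
AVERAGING_PERIODS = range(1, 7)    # teleconnected predictor averages, months


def consolidated_hindcast(fit_and_predict, year):
    """Return (mean of the 18 member hindcasts, list of member hindcasts).

    `fit_and_predict(year, pve_step, avg_months)` is hypothetical: it should
    train one CLIPER variant without significance screening and return its
    hindcast for `year`.
    """
    members = [fit_and_predict(year, pve, k)
               for pve, k in product(PVE_FACTORS, AVERAGING_PERIODS)]
    return fmean(members), members
```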

The consolidated CLIPER model 51-yr cross-validated skill for the prediction of AS ENSO for all ENSO regions is displayed in Fig. 6. Skills from the standard ENSO–CLIPER model are included for comparison (filled circles). For all regions and at all leads it is clear that the consolidated model outperforms (or at worst matches) the MSSS skill of the standard CLIPER model. The skill difference between the two models is quantified in Table 2 and discussed below. Confidence intervals for the estimation of MSSS are similar overall for both models but narrower for the consolidated model at leads of 0–4 months. The consolidated model MSSS skill is positive to 97.5% confidence at leads out to 4 months or early April for all ENSO indices (for Niño-4 and Niño-1+2 it is to leads of 5 months or early March); in comparison, the standard CLIPER MSSS skill is positive to 97.5% confidence at leads out to only 1 month for Niño-3.4 and 2 months for Niño-1+2. The consolidated model shows similar temporal stability (not shown) to that seen for the standard CLIPER model but with correspondingly higher skills.

Absolute percentage improvements in MSSS and RMSSS of the consolidated model over persistence and of the consolidated model over the standard model are presented in Table 2. The consolidated model outperforms persistence at all leads. Hindcasts from the consolidated and standard models are nearly identical at 0- and 1-month leads, since all formulations tend to favor simple persistence of the predictand. Similarly, at very long leads, when predictors become scarce, all formulations tend to a zero hindcast. It is at leads from 2 to 6 months that the consolidated CLIPER model offers the greatest improvement over the standard CLIPER model for predicting August–September ENSO. Assessed over the 51-yr period 1952–2002, the consolidated model provides a 10%–20% absolute improvement in MSSS at all leads from 2 to 6 months for the main ENSO index regions (3.4, 3, and 4); for the 1+2 index region the improvement is ∼5%. The largest 51-yr improvement in MSSS is 31%, for the AS Niño-3.4 region at 2 months lead. Table 2 also shows that the improvements in RMSSS are smaller than those in MSSS. Because RMSSS places less weight on extremes than MSSS does, this indicates that a proportion of the consolidated model skill comes from the successful prediction of ENSO extremes.

To aid the further comparison of the consolidated and standard CLIPER cross-validated model skill we include (Table 3) the reduction in root-mean-square error (rmse) and mean absolute error (MAE) afforded by the consolidated model over the standard model for each ENSO index and lead. Values for the standard deviation (std dev), rmse_{cl}, and MAE_{cl} of each August–September ENSO index are also included in Table 3 to help in the evaluation of these data. Table 3 shows for the Niño-3.4, -3, and -4 index regions at leads between 2 and 6 months that the consolidated model gives a mean improvement of 0.06°–0.08°C in rmse and of 0.05°–0.06°C in MAE over the standard model. These improvements may be slightly less than the natural uncertainty associated with the measurement of AS SST in the ENSO regions but the consistency in sign in Table 3 (which would not be expected if the improvements were due to chance) shows that the consolidated model provides a real benefit.
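The sign-consistency argument can be made concrete with a sign test. If each entry of Table 3 were independent and equally likely to favor either model, the chance that at least *k* of *m* entries favor the consolidated model is a binomial tail. Entries at adjacent leads and in neighboring regions are correlated, so the true chance is larger than this figure; the sketch is illustrative only, and the counts passed to it are hypothetical.

```python
from math import comb


def sign_test_p(n_same_sign, n_total):
    """One-sided probability of at least `n_same_sign` improvements out of
    `n_total` if every entry were an independent fair coin toss."""
    tail = sum(comb(n_total, k) for k in range(n_same_sign, n_total + 1))
    return tail / 2 ** n_total
```

For example, 10 improvements out of 10 independent entries would occur by chance with probability 1/1024.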

### g. Probabilistic skill 1952–2002

The consolidated and standard CLIPER models are compared in terms of their rank probability skill in Table 4. The table shows the RPSS for 1952–2002 from the three different hindcast formulations termed deterministic, normal, and ensemble, as described in section 2e. The deterministic formulation is applied to the two CLIPER models and to persistence, and the normal probabilistic formulation is applied to the two CLIPER models. The ensemble probabilistic formulation can be applied only to the consolidated CLIPER model, the only one with ensemble members. Table 4 shows, as expected, that the consolidated model RPSSs are generally higher than those from the standard model. The consolidated CLIPER normal model outperforms the consolidated ensemble model, which in turn outperforms the deterministic model. The consolidated normal model provides positive RPSS at all leads out to 6 months for all ENSO indices, in agreement with the MSSS deterministic results, which also showed skill to 6 months lead. Taking RPSS values of 0.10 as being respectable (A. G. Barnston 2003, personal communication), the consolidated CLIPER model is seen to provide respectable probabilistic predictive skill for all AS ENSO indices at leads out to 4 or 5 months.

The improvement in RPSS of the consolidated model over the standard CLIPER follows directly from the former's better deterministic skill and narrower error distribution. It is interesting to note that the consolidated ensemble scores are higher than the deterministic ones. This implies that additional information may be contained within the ensemble hindcasts and that simply averaging these together may not yield the best hindcast.

### h. Independent forecasts for 1900–50

The replicated real-time forecast skill for 1900–50 of the consolidated ENSO–CLIPER model for predicting the August–September Niño-3.4, -3, -4, and -1+2 indices at monthly leads out to 6 months is compared against persistence in Fig. 7 and against the standard ENSO–CLIPER model in Fig. 8. The skill measure used is MSSS. For all regions and at all leads, the consolidated model clearly outperforms persistence and outperforms (or at worst matches) the MSSS skill of the standard CLIPER model. These skill differences are quantified in Table 5 and discussed below. Confidence intervals for the estimation of MSSS are narrower for the consolidated model at all leads and for all ENSO indices, the only exception being Niño-1+2 in the comparison between the consolidated and standard CLIPER models. The consolidated model MSSS skill is positive to 97.5% confidence at leads out to 4 months (early April) for AS Niño-4 and out to 2 months (early June) for the other AS ENSO indices.

Absolute percentage improvements in MSSS and RMSSS of the consolidated model over persistence and of the consolidated model over the standard model are presented in Table 5 for independent (replicated real time) forecasts for 1900–50. As with the cross-validated hindcasts for 1952–2002 (Table 2), the consolidated model outperforms persistence at all leads and outperforms the standard CLIPER model at leads from 1 to 6 months for the main ENSO index regions (3.4, 3, and 4). Assessed over the 51-yr period 1900–50, the consolidated model provides a 5%–10% absolute improvement in MSSS at all leads from 1 to 5 months for the Niño-3.4, -3, and -4 regions; for the Niño-1+2 index region there is little improvement. The largest improvement in MSSS is 15%, for the AS Niño-4 region at leads of 2 and 4 months. The improvements in RMSSS are smaller than those in MSSS.

## 4. Discussion

Figures 4 and 5 show that the standard ENSO–CLIPER predictions of Niño-3.4 at leads of 2–3 months are sensitive both to the L&B improvement factor and to the intrinsic averaging procedure imposed upon predictor categories c and d. Figure 9 displays histograms of the number of times that each of the 14 predictors is used in predicting Niño-3.4 for 1952–2002 at a lead of 3 months for averaging periods of 1–6 months. There is considerable variation in the model formulation as the averaging period is changed. As the averaging period increases, there is a shift from models reliant upon predictors 6 and 7 to those using predictors 3, 4, and 5. Reference to Table 1 reveals that the dominant predictors under 1-month averaging are the 5-month trend in Niño-3.4 and the persisted 3-month value of Niño-1+2. When the averaging period of the teleconnected SSTs is extended to 6 months, these are rejected in favor of shorter-period trends and initial conditions of the predictand itself. It appears that teleconnected SSTs (predictors 7–14) only become useful when they are computed for a period similar to that of the predictand itself. Notably, predictors 11–14 are never selected in any model formulation. This is likely a result of the intercorrelation between the predictors and the order in which they are presented to the L&B algorithm. When the predictor pool is intercorrelated, each successive predictor is less likely to explain additional variance.

The consolidated model is seen to outperform the standard ENSO–CLIPER model for all the indices studied. The greatest improvements are found at leads of 2–6 months, which are precisely the leads at which model instability is identified. Averaging the separate models has the effect of reinforcing the consensus of the individual members. Thus, when the models are in agreement, a sharp hindcast is issued. Conversely, if there is no consensus, the individual predictions will tend to cancel each other out and the hindcast value will tend to zero.

Decomposition of the MSSS into temporal, amplitude, and bias errors allows an assessment of how each error term contributes to the skill improvement. Plots of correlation (not shown) follow the same pattern as was found for MSSS (see Fig. 6). The consolidated model yields higher and less volatile correlations, with the largest improvements seen for Niño-4. The effect of consolidation on the amplitude ratio is neutral. The amplitude ratios for both models are always less than one; that is, they underpredict the observed variance in SST. This is particularly apparent at long leads, where the hindcasts tend to the climatological value. Bias errors are negligible for both models and are always less than 0.1°C. Thus the skill improvement afforded by the consolidated model must arise from a reduction in the temporal error, that is, through improved prediction of the timing of events.

The hindcasts may also be recalibrated by linear regression of the observations $x$ on the hindcasts $\hat{x}$:

$$\hat{x}' = E(x \mid \hat{x}) = \beta_0 + \beta_1 \hat{x},$$

where $\hat{x}'$ is the recalibrated hindcast and $\beta_0$ and $\beta_1$ are, respectively, the bias in the mean and variance of the hindcasts. Following the cross-validation procedure, the consolidated hindcasts were recalibrated using parameters estimated from data excluding a 5-yr block about the target year. The revised MSSS values show little improvement over the raw hindcasts. Since the recalibration amounts to a linear transformation of the hindcast values, it cannot change the product-moment correlation between the hindcast–observation pairs, $r_{\hat{x},x}$. Further, as noted above, the hindcast bias is negligible. Thus, the only scope for improvement in MSSS arises from adjustment of the hindcast variance. Given the minimal improvement in MSSS after recalibration, it is concluded that there is no significant bias in the consolidated hindcast variance. The remaining unexplained variance must therefore be attributable to factors outside of the model and/or to nonlinear interactions.
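The recalibration step can be sketched in Python, assuming ordinary least squares for $\beta_0$ and $\beta_1$ and a 5-yr exclusion block centred on the target year (the centring is an assumption). The sketch also checks the property used in the argument above: a single linear map with positive slope leaves $r_{\hat{x},x}$ unchanged. With cross-validation the coefficients differ from year to year, so for the cross-validated hindcasts this invariance is not exact. The data and helper names are hypothetical.

```python
from math import sqrt


def _mean(v):
    return sum(v) / len(v)


def corr(a, b):
    """Pearson product-moment correlation."""
    abar, bbar = _mean(a), _mean(b)
    sab = sum((p - abar) * (q - bbar) for p, q in zip(a, b))
    saa = sum((p - abar) ** 2 for p in a)
    sbb = sum((q - bbar) ** 2 for q in b)
    return sab / sqrt(saa * sbb)


def fit_line(fcst, obs):
    """OLS estimates (beta0, beta1) of E(x | xhat) = beta0 + beta1 * xhat."""
    fbar, xbar = _mean(fcst), _mean(obs)
    beta1 = (sum((f - fbar) * (x - xbar) for f, x in zip(fcst, obs))
             / sum((f - fbar) ** 2 for f in fcst))
    return xbar - beta1 * fbar, beta1


def recalibrate_cv(fcst, obs, block=5):
    """Recalibrate each year's hindcast with coefficients fitted on all years
    outside a `block`-year window centred on that year (assumed centring)."""
    half = block // 2
    out = []
    for t in range(len(fcst)):
        train = [i for i in range(len(fcst)) if abs(i - t) > half]
        b0, b1 = fit_line([fcst[i] for i in train], [obs[i] for i in train])
        out.append(b0 + b1 * fcst[t])
    return out


# Invented annual anomalies (°C).
obs = [1.5, -0.8, 0.2, -1.2, 0.9, -0.6, 2.1, -1.5, 0.4, -0.3, 1.0, -1.7]
fcst = [0.9, -0.3, 0.4, -0.7, 0.3, -0.5, 1.2, -0.6, 0.1, 0.0, 0.5, -0.9]

b0, b1 = fit_line(fcst, obs)                 # single in-sample fit
in_sample = [b0 + b1 * f for f in fcst]
cv = recalibrate_cv(fcst, obs)

print(f"r raw                          = {corr(fcst, obs):.4f}")
print(f"r in-sample recalibrated       = {corr(in_sample, obs):.4f}")  # unchanged
print(f"r cross-validated recalibrated = {corr(cv, obs):.4f}")         # may differ
```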

Neither the standard nor the consolidated ENSO–CLIPER model is skillful prior to March (lead of 5 months), which corresponds to the onset of the “spring predictability barrier” (Torrence and Webster 1998). The models most likely fail here because of their heavy reliance (by design) on persistence, which often breaks down at this time of year. The inclusion of long-term trends is insufficient to predict phase changes from winter into summer.

Optimization of the consolidated CLIPER model may lead to further skill improvements. The model presented here was selected by visual inspection of Figs. 3–5 and for computational expediency. It is defined as the mean of an ensemble of 18 models built using six teleconnected predictor averaging periods and three PVE improvement factors. Improved hindcast skill may be obtained from an optimized multiensemble consolidated ENSO–CLIPER model that can select ensemble members built 1) using predictors in categories a and b computed over means other than 1, 3, and 5 months; 2) using different significance levels for predictor screening; and 3) using more than 18 ensemble members. Additional skill may also be obtainable by deploying phase-dependent models: previous studies (e.g., Mason and Mimmack 2002) have found that ENSO is more predictable when in its positive phase.
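The 18-member configuration can be written down as a parameter grid. In the Python sketch below, the six averaging periods (1–6 months) and the three L&B improvement factors (1%, 2.5%, and 5%) are inferred from the sensitivity experiments shown in the figures and should be treated as assumptions. `fit_and_predict` is a hypothetical stand-in for building one ENSO–CLIPER variant and returning its hindcast. Extending the grid (e.g., by adding screening significance levels, as suggested above) only changes the tuples being crossed.

```python
from itertools import product
from statistics import fmean

# Assumed member settings (inferred from the figures, not from model code).
AVERAGING_PERIODS = (1, 2, 3, 4, 5, 6)   # teleconnected-predictor averaging, months
PVE_FACTORS = (0.01, 0.025, 0.05)        # L&B improvement factors


def member_configs():
    """All (averaging period, PVE factor) pairs: 6 x 3 = 18 members."""
    return list(product(AVERAGING_PERIODS, PVE_FACTORS))


def consolidated_hindcast(fit_and_predict, configs):
    """Mean of the member hindcasts. `fit_and_predict` is hypothetical: it
    should build one CLIPER variant and return its hindcast for the target."""
    return fmean(fit_and_predict(period, factor) for period, factor in configs)


configs = member_configs()
print(len(configs), configs[:3])

# Toy stand-in for a fitted member, used only to exercise the plumbing.
toy = consolidated_hindcast(lambda period, factor: period * factor, configs)
print(f"toy consolidated value = {toy:.4f}")
```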

## 5. Conclusions

A “consolidated” ENSO–CLIPER seasonal prediction model has been presented to improve summer ENSO predictive skill, which is limited by the spring predictability barrier between March and May. Consolidated CLIPER comprises an ensemble of 18 variants of the statistical ENSO–CLIPER prediction model. August–September ENSO skill was assessed in two ways: deterministic and probabilistic skill measures applied to cross-validated hindcasts for 1952–2002, and deterministic skill measures applied to replicated real-time forecasts for 1900–50. Both assessments show that the consolidated CLIPER model consistently outperforms the standard CLIPER model at all leads from 2 to 6 months for all the main ENSO indices (Niño-3, -3.4, and -4). The new model provides up to a 30% (15%) reduction in mean-square error for 1952–2002 (1900–50). However, the formulation of the consolidation remains arbitrary: it represents a small subset of all the possible CLIPER formulations and thus may be far from optimal. Decomposition of the MSSS into correlation, variance ratio, and bias shows that the consolidated model also provides superior predictions of the timing and amplitude of ENSO events compared to the standard CLIPER model.

This investigation has focused on the predictability of summer ENSO conditions. Ongoing research will extend the consolidated ENSO–CLIPER results to other seasons. It will also compare hindcast skill and model versatility (i.e., range of predictand periods, range of forecast lead times, and speed of forecast–hindcast computation) with those achieved by leading dynamical ENSO prediction models.

## Acknowledgments

Benjamin Lloyd-Hughes is, and Paul Rockett was, supported by the Tropical Storm Risk (TSR) venture. We thank three referees for their helpful comments. We gratefully acknowledge the U.S. Climate Prediction Center for providing data.

## REFERENCES

Barnston, A. G., and C. F. Ropelewski, 1992: Prediction of ENSO episodes using canonical correlation analysis. *J. Climate*, **5**, 1316–1345.

Barnston, A. G., M. H. Glantz, and Y. He, 1999: Predictive skill of statistical and dynamic climate models in forecasts of SST during the 1997–98 El Niño episode and the 1998 La Niña onset. *Bull. Amer. Meteor. Soc.*, **80**, 217–243.

Bove, M. C., J. B. Elsner, C. W. Landsea, X. Niu, and J. J. O'Brien, 1998: Effect of El Niño on U.S. landfalling hurricanes revisited. *Bull. Amer. Meteor. Soc.*, **79**, 2477–2482.

Chan, J. C. L., 1985: Tropical cyclone activity in the northwest Pacific in relation to the El Niño/Southern Oscillation phenomenon. *Mon. Wea. Rev.*, **113**, 599–606.

Dai, A., and T. M. L. Wigley, 2000: Global patterns of ENSO-induced precipitation. *Geophys. Res. Lett.*, **27**, 1283–1286.

Déqué, M., 2003: Continuous variables. *Forecast Verification: A Practitioner's Guide in Atmospheric Science*, I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley, 97–119.

Efron, B., and G. Gong, 1983: A leisurely look at the bootstrap, the jackknife and cross validation. *Amer. Stat.*, **37**, 36–48.

Epstein, E. S., 1969: A scoring system for probability forecasts of ranked categories. *J. Appl. Meteor.*, **8**, 985–987.

Furnival, G. M., and R. W. Wilson, 1974: Regression by leaps and bounds. *Technometrics*, **16**, 499–511.

Goddard, L., S. J. Mason, S. E. Zebiak, C. F. Ropelewski, R. Basher, and M. A. Cane, 2001: Current approaches to seasonal-to-interannual climate predictions. *Int. J. Climatol.*, **21**, 1111–1152.

Goddard, L., A. G. Barnston, and S. J. Mason, 2003: Evaluation of the IRI's “net assessment” seasonal climate forecasts. *Bull. Amer. Meteor. Soc.*, **84**, 1761–1781.

Goerss, J. S., 2000: Tropical cyclone track forecasts using an ensemble of dynamical models. *Mon. Wea. Rev.*, **128**, 1187–1193.

Gray, W. M., 1984: Atlantic seasonal hurricane frequency. Part I: El Niño and 30 mb quasi-biennial oscillation influences. *Mon. Wea. Rev.*, **112**, 1649–1668.

Kaplan, A., M. A. Cane, Y. Kushnir, A. Clement, M. Blumenthal, and B. Rajagopalan, 1998: Analyses of global sea surface temperature 1856–1991. *J. Geophys. Res.*, **103**, 18567–18589.

Kerr, R. A., 2000: Second thoughts on skill of El Niño predictions. *Science*, **290**, 257–258.

Kirtman, B. P., and P. S. Schopf, 1998: Decadal variability in ENSO predictability and prediction. *J. Climate*, **11**, 2804–2822.

Kirtman, B. P., J. Shukla, M. Balmaseda, N. Graham, C. Penland, Y. Xue, and S. Zebiak, cited 2002: Current status of ENSO forecast skill. Report to the Climate Variability and Predictability (CLIVAR) Numerical Experimentation Group (NEG), CLIVAR Working Group on Seasonal to Interannual Prediction. [Available online at http://www.clivar.org/publications/wg_reports/wgsip/nino3/report.htm.]

Knaff, J. A., and C. W. Landsea, 1997: An El Niño–Southern Oscillation climatology and persistence (CLIPER) forecasting scheme. *Wea. Forecasting*, **12**, 633–652.

Landsea, C. W., and J. A. Knaff, 2000: How much skill was there in forecasting the very strong 1997–98 El Niño? *Bull. Amer. Meteor. Soc.*, **81**, 2107–2119.

Latif, D., D. Anderson, T. Barnett, M. Cane, R. Kleeman, A. Leetmaa, J. O'Brien, A. Rosati, and E. Schneider, 1998: A review of the predictability and prediction of ENSO. *J. Geophys. Res.*, **103**, 14375–14393.

Mason, S. J., and G. M. Mimmack, 2002: Comparison of some statistical methods of probabilistic forecasting of ENSO. *J. Climate*, **15**, 8–29.

Murphy, A. H., 1988: Skill scores based on the mean square error and their relationships to a correlation coefficient. *Mon. Wea. Rev.*, **116**, 2417–2424.

Rasmusson, E. M., and T. H. Carpenter, 1982: Variations in tropical sea surface temperature and surface wind fields associated with the Southern Oscillation/El Niño. *Mon. Wea. Rev.*, **110**, 354–384.

Reynolds, R. W., N. A. Rayner, T. H. Smith, D. C. Stokes, and W. Wang, 2002: An improved in situ and satellite SST analysis for climate. *J. Climate*, **15**, 1609–1625.

Ropelewski, C. F., and M. S. Halpert, 1987: Global and regional scale precipitation patterns associated with the El Niño/Southern Oscillation. *Mon. Wea. Rev.*, **115**, 1606–1626.

Ropelewski, C. F., and P. D. Jones, 1987: An extension of the Tahiti–Darwin Southern Oscillation Index. *Mon. Wea. Rev.*, **115**, 2161–2165.

Saunders, M. A., R. E. Chandler, C. J. Merchant, and F. P. Roberts, 2000: Atlantic hurricanes and NW Pacific typhoons: ENSO spatial impacts on occurrence and landfall. *Geophys. Res. Lett.*, **27**, 1147–1150.

Thompson, P. D., 1977: How to improve accuracy by combining independent forecasts. *Mon. Wea. Rev.*, **105**, 228–229.

Torrence, C., and P. J. Webster, 1998: The annual cycle of persistence in the El Niño–Southern Oscillation. *Quart. J. Roy. Meteor. Soc.*, **124**, 1985–2004.

Trenberth, K. E., 1997: The definition of El Niño. *Bull. Amer. Meteor. Soc.*, **78**, 2771–2778.

Unger, D., A. Barnston, H. Van den Dool, and V. Kousky, 1996: Consolidated forecasts of tropical Pacific SST in Niño 3.4 using two dynamical models and two statistical models. *Experimental Long-Lead Forecast Bulletin*, Vol. 5, No. 1, NOAA/NWS/Climate Prediction Center, 50–52.

Walker, G. T., and E. W. Bliss, 1932: World weather V. *Mem. Roy. Meteor. Soc.*, **4** (36), 53–84.

Wang, G., R. Kleeman, N. Smith, and F. Tseitkin, 2002: The BMRC coupled general circulation model ENSO forecast system. *Mon. Wea. Rev.*, **130**, 975–991.

Wilks, D., 1995: *Statistical Methods in the Atmospheric Sciences*. Academic Press, 467 pp.

WMO, 2002: Standardised Verification System (SVS) for long-range forecasts (LRF). Attachment II-9 to the Manual on the GDPS (WMO 485), Vol. I, WMO, 21 pp.

As in Fig. 1 but for the subperiods (left column) 1952–75 and (right column) 1976–2002

Citation: Weather and Forecasting 19, 6; 10.1175/813.1

The sensitivity of the standard ENSO–CLIPER model cross-validated hindcast skill to the significance level imposed during predictor screening for the prediction of AS Niño-3.4, -3, -4, and -1+2 indices for 1952–2002 at monthly leads out to 9 months. The “consensus” skill refers to the average of the three hindcasts obtained using significance levels of 1%, 5%, and 10%. The standard persistence skill from Fig. 1 is included for reference

The sensitivity of the standard ENSO–CLIPER model cross-validated hindcast skill to the PVE improvement factor passed to the L&B algorithm for the prediction of AS Niño-3.4, -3, -4, and -1+2 indices for 1952–2002 at monthly leads out to 9 months. The consensus skill refers to the average of the three hindcasts obtained using L&B improvement factors of 1%, 2.5%, and 5%

The sensitivity of the standard ENSO–CLIPER model cross-validated hindcast skill to the teleconnected predictor averaging period used in the model formulation for the prediction of AS Niño-3.4, -3, -4, and -1+2 indices for 1952–2002 at monthly leads out to 9 months. The consensus skill refers to the average of the three hindcasts obtained using averaging periods of 1, 3, and 6 months

Cross-validated hindcast skill for 1952–2002 of the consolidated ENSO–CLIPER model compared against the standard ENSO–CLIPER model for predicting the AS Niño-3.4, -3, -4, and -1+2 indices at monthly leads out to 9 months. The skill measure used is MSSS defined as the percentage improvement in mean-square error over a hindcast of zero anomaly, the climatology being 1952–2002. The gray band is a bootstrapped estimate of the 95% confidence interval for the skill measure. The skill and uncertainty from the standard ENSO–CLIPER model are shown by the filled circles and error bars

Replicated real-time forecast skill for 1900–50 of the consolidated ENSO–CLIPER model compared against persistence for predicting the AS Niño-3.4, -3, -4, and -1+2 indices at monthly leads out to 6 months. The skill measure used is MSSS defined as the percentage improvement in mean-square error over a forecast of zero anomaly, the climatology being 1900–50. The gray band is a bootstrapped estimate of the 95% confidence interval for the skill measure. The skill and uncertainty from persistence are shown, respectively, by the filled circles and error bars

Replicated real-time forecast skill for 1900–50 of the consolidated ENSO–CLIPER model compared against the standard ENSO–CLIPER model for predicting the AS Niño-3.4, -3, -4, and -1+2 indices at monthly leads out to 6 months. The skill measure and presentation format are the same as in Fig. 7

Histograms of the standard ENSO–CLIPER predictors selected for making hindcasts of the AS Niño-3.4 index for 1952–2002 at a lead of 3 months (early May) for models built with teleconnected predictor averaging periods from 1 to 6 months. The predictor numbers (1–14) correspond to the classification in Table 1

Predictor pools in the standard ENSO–CLIPER model for predicting the Niño-3.4, -3, -4, and -1+2 indices. Here, IC and TR represent, respectively, initial condition and trend predictors, with the numeral designating whether these are 1-, 3-, or 5-month means as defined by Knaff and Landsea (1997)

Improvement afforded by the consolidated ENSO–CLIPER model over (a) persistence and (b) the standard ENSO–CLIPER model for predicting AS Niño-3.4, -3, -4, and -1+2 as a function of monthly lead from cross-validated hindcasts for 1952–2002. Values are given as the absolute difference in MSSS and RMSSS (in parentheses)

Comparison of the consolidated ENSO–CLIPER and standard ENSO–CLIPER cross-validated hindcast skill for predicting AS Niño-3.4, -3, -4, and -1+2 for 1952–2002 as a function of monthly lead, in terms of each model's rmse and MAE and the improvement offered by the consolidated model over the standard model for each measure

RPSS for 1952–2002 of the consolidated ENSO–CLIPER model compared against the standard ENSO–CLIPER model and persistence for predicting AS Niño-3.4, -3, -4, and -1+2 as a function of monthly lead out to 6 months. The RPSS is compared for different CLIPER probabilistic hindcasts as described in the text

Improvement afforded by the consolidated ENSO–CLIPER model over (a) persistence and (b) the standard ENSO–CLIPER model for predicting AS Niño-3.4, -3, -4, and -1+2 as a function of monthly lead from replicated real-time forecasts for 1900–50. Values are given as the absolute difference in MSSS and RMSSS (in parentheses)