Seasonal forecasts made by coupled atmosphere–ocean general circulation models (CGCMs) undergo strong climate drift and initialization shock, driving the model state away from its long-term attractor. Here we explore initializing directly on a model’s own attractor, using an analog approach in which model states close to the observed initial state are drawn from a “library” obtained from prior uninitialized CGCM simulations. The subsequent evolution of those “model-analogs” yields a forecast ensemble, without additional model integration. This technique is applied to four of the eight CGCMs comprising the North American Multimodel Ensemble (NMME) by selecting from prior long control runs those model states whose monthly tropical Indo-Pacific SST and SSH anomalies best resemble the observations at initialization time. Hindcasts are then made for leads of 1–12 months during 1982–2015. Deterministic and probabilistic skill measures of these model-analog hindcast ensembles are comparable to those of the initialized NMME hindcast ensembles, for both the individual models and the multimodel ensemble. In the eastern equatorial Pacific, model-analog hindcast skill exceeds that of the NMME. Despite initializing with a relatively large ensemble spread, model-analogs also reproduce each CGCM’s perfect-model skill, consistent with a coarse-grained view of tropical Indo-Pacific predictability. This study suggests that with little additional effort, sufficiently realistic and long CGCM simulations provide the basis for skillful seasonal forecasts of tropical Indo-Pacific SST anomalies, even without sophisticated data assimilation or additional ensemble forecast integrations. The model-analog method could provide a baseline for forecast skill when developing future models and forecast systems.
Seasonal forecast skill has significantly improved over the past three decades (Barnston et al. 2012; Barnston and Tippett 2017). Operational prediction centers worldwide use coupled atmosphere–ocean general circulation models (CGCMs) to conduct routine initialized seasonal forecasts with actionable skill of 6–12 months in advance (Doblas-Reyes et al. 2013), much of which is related to predictions of tropical sea surface temperature (SST) and especially of El Niño–Southern Oscillation (ENSO) (Jin et al. 2008; Barnston et al. 2012; Kirtman et al. 2014). Both model and initialization improvements have driven advances in forecast skill, with considerable effort devoted to developing data assimilation schemes (Ji and Leetmaa 1997; Stockdale et al. 1998, 2011; Keenlyside et al. 2005; Saha et al. 2014).
While coupled models have improved, they remain imperfect. Models continue to suffer from both unconditional biases in their mean states, such as the double ITCZ and cold tongue bias (e.g., Li and Xie 2014), and conditional biases in their variability, such as a westward shift of ENSO (e.g., Joseph and Nigam 2006). CGCM seasonal forecast skill is significantly impacted by these model errors (e.g., Barnston et al. 2015; Newman and Sardeshmukh 2017). Initialization shock, which can arise when forecast error rapidly develops from an initial imbalance between the analyzed initial state and all possible model states, degrades forecast skill in the presence of nonlinear air–sea interactions (Mulholland et al. 2015, 2016). At longer forecast leads, this problem may be worsened by the difficulty of spinning up the deeper oceanic component of the initialization.
When model-state space trajectories do not coincide with trajectories in nature, we could try to find and initialize those model trajectories that are closest (in some sense) to nature. This “shadowing” approach has been applied to relatively simple models (e.g., Judd et al. 2004), although the real-world problem may be considerably more complex (Smith 2001). Anomaly initialization (e.g., Hazeleger et al. 2013; Volpi et al. 2017) and nudging (e.g., DelSole et al. 2008; Meehl et al. 2014; Carrassi et al. 2016) both aim to remove effects of the initial model drift, but are only partly successful and do not directly confront conditional model error. Alternatively, model errors may be addressed through postprocessing hindcast ensembles, although their length and size are limited by available computational resources.
Here, we take the view that for forecast lead times where the model has drifted toward its own climatology and forecast states follow trajectories that lie within the model attractor, model initialization should be directly in the model phase space and not in the phase space of nature. To this end, we investigate initializing climate forecasts with states taken directly from long control simulations of operational CGCMs. These states are chosen as analogs to the observed initial state; then, their evolution within the control simulation immediately provides the forecast ensemble, and no additional model integration is necessary. In this paper we further simplify this approach by choosing these “model-analogs” solely from the current position of the state in phase space, without considering the phase-space trajectory, and determining them not from the entire climate state, but rather from monthly anomalies of SST and sea surface height (SSH). Nevertheless, applying this technique to tropical Indo-Pacific Ocean forecasts yields skill, such as shown in Fig. 1, that is not only surprisingly competitive with the traditional approach of executing an initialized forecast ensemble from the corresponding CGCMs, but actually appears to exceed it in the key ENSO region within the equatorial eastern Pacific.
The use of analogs in weather and climate forecasting goes back many decades (Lorenz 1969; Barnett and Preisendorfer 1978; Gutzler and Shukla 1984; Livezey and Barnston 1988; Van den Dool 1989). The underlying assumption is that a pair of states that are initially sufficiently similar to each other will follow similar trajectories for some subsequent finite period. Lorenz (1969) first noted that the increasing difference between two such analog trajectories represents error growth and therefore a measure of potential predictability. He suggested this difference might be empirically determined, but found that for daily weather, no pair of analogs taken from the observational record was close enough to be useful. In fact, Van den Dool (1994) suggested that global daily weather anomalies may have too many degrees of freedom for useful analogs to be found.
However, analogs may be more useful for problems with fewer effective degrees of freedom (Van den Dool 1994). Observational analogs have been applied with some success to short-term climate forecasting (Barnett and Preisendorfer 1978; Nicholls 1980) including U.S. surface temperature seasonal anomalies (Livezey and Barnston 1988; Barnston and Livezey 1989), climate downscaling (Kruizinga and Murphy 1983; Zorita and von Storch 1999; Wetterhall et al. 2005), and calibration of ensemble weather forecasts (Van den Dool et al. 2003; Fraedrich et al. 2003). However, these applications are still limited by the relatively short observational record. The recent easy availability of long climate model runs, providing access to larger data libraries than the observational record, has made analog techniques particularly attractive for applications such as paleoclimate reconstruction (Overpeck et al. 1992; Graham et al. 2007; Goodwin et al. 2014), estimating perfect-model climate predictability (Branstator et al. 2012), and separation of internal and forced trend components within climate runs (Deser et al. 2016; Lehner et al. 2017).
This study will investigate the use of a model-based analog technique for seasonal forecasts of tropical Indo-Pacific SST variability, which many studies have suggested can be represented by relatively few degrees of freedom. For example, most central and eastern Pacific ENSO events can be described through a linear combination of two dominant spatial patterns that together explain more than half of the variance in the tropical Indo-Pacific region (e.g., Takahashi et al. 2011). Also, the evolution of ENSO is well approximated by a low-dimensional, Markovian dynamical system with relatively few spatial degrees of freedom (Penland and Sardeshmukh 1995). Additionally, Wittenberg (2009) found that for the GFDL CM2.1 control simulation, 500 years of data was sufficient to sample the range of ENSO events. We might thus expect that our model-analog approach to ENSO could be more tractable than other problems in terms of the amount of data needed.
The paper is organized as follows. Section 2 describes the details of the model-analog technique employed in this study, along with the CGCM simulations used. In section 3, model-analogs are first applied to forecast each model itself. The sensitivity of the technique to the size of historical data as well as ensemble size is explored, and it is shown that, in agreement with Lorenz (1969), model-analogs can be used to estimate a CGCM’s perfect model seasonal tropical predictability. In section 4, the model-analog technique is used to make hindcasts for the 1982–2009 period, their skill is shown comparable to hindcast skill from the operational CGCM ensemble, and their sources of skill are evaluated. A summary and discussion of some implications of the model-analog technique are presented in section 5.
2. Models and method
a. Model-analog technique
We choose analogs at each time t by minimizing a metric of the distance between the target state x(t) and each library state y(t′). Here, the target state is defined as the state at the initialization time, and the library consists of all preexisting states obtained from a CGCM control simulation. In this study, we define a simple distance metric d(t, t′) from the root-mean-square (RMS) difference between a subset of variables within the full state vectors, or
where i represents a variable, with I as the total variables, and j represents a spatial degree of freedom (e.g., gridpoint index or principal component) within the training region, with J as the total spatial degrees of freedom. In (1), each variable is normalized by its respective domain-averaged standard deviation, and , to have equal weight within this metric. Monthly mean data are used in (1), and to include seasonality, the library states are restricted to have the same calendar month as the target state. Consequently, only one analog is available from each year of the model simulation, so that the library length N is equivalent to the number of years in the training period.
Distances are ranked in ascending order, and the K states closest to the target state are chosen as the model-analog ensemble members, indicated by the set , with k as the analog index and as the time of this analog in the library. The subsequent model evolution of this ensemble is the model-analog forecast ensemble for x(t + τ) at lead time τ months. The ensemble mean represents the initial model-analog ensemble-mean reconstruction of the target x(t), and is the model-analog ensemble-mean forecast at lead time τ months. In this paper, we do not weight the model-analog ensemble members. Note that a linear weighting such as used in the constructed analog method (Van den Dool 1994) can be shown to reduce to multivariate linear regression (Tippett and DelSole 2013). We also tested a quadratic weighting [i.e., inversely weighting each member by d2(t, t′)] but found no notable improvement.
To define analogs, we set I = 2 and constructed target and library states from SSH anomalies (SSHAs; i = 1) and SST anomalies (SSTAs; i = 2). SST is one of the most important quantities for seasonal forecasts (e.g., Latif et al. 1998; Barnston et al. 2012), and fluctuations in central and eastern tropical Pacific SST can exert a significant impact on atmospheric circulation and remote ocean regions via the atmospheric bridge (Alexander et al. 2002). Ocean heat content variations are also important for ENSO dynamics (Zebiak and Cane 1987; Neelin et al. 1998) and have been shown to be important for ENSO prediction (Kleeman et al. 1995; Ji and Leetmaa 1997; Rosati et al. 1997; Newman et al. 2011). Here, SSH variations are used to represent ocean heat content variations, since in the tropical oceans, SSH is intimately linked to the depth of the thermocline (Cane 1984). Furthermore, SSH is also an indicator of equatorial wave dynamics in the oceans (Gill and Clarke 1974), which affects the development and termination of ENSO events (Suarez and Schopf 1988). Using SSTAs alone substantially degraded the forecast skill of the model-analogs (not shown).
b. Model and observational datasets
The training (library) dataset consists of monthly mean data taken from CGCM control simulations of varying lengths, conducted using four climate models: CM2.1, CM2.5 FLOR, CCSM4, and CESM1, listed in Table 1. In each control simulation, only the portion of integration near equilibrium (i.e., spunup) is retained. Atmospheric greenhouse gas concentrations are fixed throughout each simulation, with the specified year also indicated in Table 1. The longest run, the 4000-yr preindustrial simulation of the GFDL CM2.1, has previously been analyzed to investigate record length needed to adequately sample ENSO events (Wittenberg 2009; Atwood et al. 2017), the predictability of decadal modulations of ENSO events (Karamperidou et al. 2014; Wittenberg et al. 2014) and their impact on decadal variability (Ogata et al. 2013), and ENSO diversity (Kug et al. 2010; Capotondi et al. 2015; Chen et al. 2017).
Versions of all four models are also currently used by NCEP to make operational seasonal forecasts as part of the second phase of the North American Multimodel Ensemble (NMME) project, and were also used to generate ensembles of retrospective forecasts (“hindcasts”) initialized from the beginning of each month from 1982 to 2009 and integrated forward for 12 months (Kirtman et al. 2014; see Table 1). To calculate anomalies, all these hindcasts were bias corrected: the mean hindcast drift as a function of lead and calendar month was removed separately for each model, as is common practice with CGCM seasonal forecasts (Stockdale 1997; Saha et al. 2006; Kirtman and Min 2009). For convenience, these hindcasts are denoted as NMME hindcasts, which we will compare with our model-analog hindcasts.
To determine initial observed states, we used SSTs from the monthly mean NOAA Optimum Interpolation SST, version 2, dataset (OISST; Reynolds et al. 2002) and SSHs from the ECMWF Ocean Reanalysis System 4 dataset (Balmaseda et al. 2013), for the years 1982–2015. SSTA and SSHA were determined by removing the monthly mean 1982–2009 climatology.
All model and observed data were interpolated onto a common 2° longitude by 2° latitude grid prior to our analysis. Analogs were determined within the domain bounded by 30°S–30°N, 30°E–80°W, covering the entire tropical Indo-Pacific Ocean.
c. Skill metrics
To measure how well the initial analog ensemble mean reproduces the target state, we use dK(t), the distance between the target state x(t) and the model-analog ensemble mean z(t), which from (1) is
where indicates the ith component of z(t) at the jth grid point. Note that dK(t) will generally be different than the mean of the distances of the analog members themselves.
We use two deterministic skill measures: anomaly correlation (AC) and RMS error skill score (RMSSS; Barnston et al. 2015) defined as 1 − ε, where ε = ϵ/σ is the standardized RMS error, ϵ is the RMS error of the ensemble-mean forecast, and σ is the observed climatological standard deviation. The overall skill within the tropical Indo-Pacific domain was assessed using the standardized RMS error and the pattern correlation between the forecast and verification (OISST or perfect model) SSTA fields in the region 30°S–30°N, 30°E–80°W. Probabilistic skill is assessed using ranked probability skill score (RPSS; Weigel et al. 2007), reliability, and sharpness (Murphy 1973), applied to the observed and forecast tercile distributions.
3. Assessing perfect-model predictability with model-analogs
In this section, we evaluate the model-analog technique in a perfect model context. That is, for each CGCM, we construct model-analogs from a portion of its control run (“data library”) and use them to make forecasts within the withheld (“verification”) dataset. For each control simulation, data from the first 200 years are used for verification, with the remaining data used as the data library. Thus, the numbers of years N in the training periods for the CM2.1, CM2.5 FLOR, CCSM4, and CESM1 control simulations are 3800, 500, 900, and 500, respectively. For all four simulations, we set the analog ensemble size at K = 10. Parametric sensitivity is discussed in section 3b for choices of N and K, and in section 3c for the length of the verification period.
a. Evaluation of model-analogs from four control simulations
Figure 2 shows how well the model-analog ensemble means represent the target anomalies for each control simulation. In general, the AC between the target-state and the model-analog ensemble mean (shading) is above 0.7 throughout the training region (red box in Fig. 2a). For each simulation, the model-analog ensemble mean matches the target anomaly best along or near the equator, both for SSTA and for SSHA (not shown; values are generally similar with some differences in details). However, even in the eastern equatorial Pacific, the RMSSS (shown by white contours) does not exceed about 0.75; that is, on average there is at best about an initial ±0.25σ difference between the model-analog ensemble mean and the target anomalies.
The model-analogs better capture larger-scale, more slowly decorrelating variations in the target anomalies than smaller-scale, more rapidly decorrelating variations. This is demonstrated in Fig. 3, which shows the monthly initial model-analog reconstruction error projected on the first 40 SSTA EOFs determined from the verification datasets. The model-analog ensemble means have a better representation of the target SSTA within the space of the leading principal components (PCs), which correspond to larger spatial scales and have the longest decorrelation time scales (not shown). Most of the initial RMS error outside of the equatorial Pacific in Fig. 2 is due to the relatively larger error in the higher-order PCs, which dominate the variability there.
Given the nontrivial error in the analog representation of the target anomalies, it is notable that the 6-month-lead model-analog ensemble-mean forecasts have considerable skill over large parts of the tropical Indo-Pacific Ocean (Fig. 4); for some of the models, much of this skill is retained up to leads of 24 months (Fig. 5; see also Fig. S1 in the supplemental material). For all four models, AC skill at 6-month lead exceeds 0.8 in the equatorial Pacific and 0.7 in large parts of the tropical Indian Ocean, and in the CESM1 even reaches 0.8 in portions of the Indian Ocean. Interannual variability in the tropical Indian Ocean is strongly related to ENSO (e.g., Nicholson 1997; Klein et al. 1999; Venzke et al. 2000; Alexander et al. 2002), although other sources exist (e.g., Saji et al. 1999; Yamagata et al. 2004). The model-analog technique also captures some notable skill differences between the four models. For example, in CM2.1 (Fig. 4a), skill is greatest in the western–central equatorial Pacific, while in CESM1 it peaks in the central to eastern equatorial Pacific (Fig. 4d), consistent with the greater westward displacement of ENSO variability in the CM2.1 compared to the CESM1 (not shown).
Note that, for all four models, the patterns of RMSSS (contours) and AC (shading) are very similar. This does not appear to be coincidental. In fact, it can be shown that for the mean of an infinite ensemble of any perfect model (i.e., a model whose forecasts have no conditional or unconditional bias), AC skill ρ and standardized RMS forecast error ε, standardized by the standard deviation of the verification data, are directly related by
(von Storch and Zwiers 2002; Newman and Sardeshmukh 2017). This relationship is largely satisfied for the model-analog skill shown in Fig. 4, apart from some small (~1%–3%) differences in some regions (Fig. S2 in the supplemental material). Additionally, ρ2 + ε2 computed from standardized RMS error and pattern correlation values determined over the entire tropical Indo-Pacific training domain largely satisfies (3) for all forecast leads (Fig. S3). That is, despite the relatively large spread of its initial ensemble, the model-analog technique applied in a perfect model sense provides an excellent estimate of the potential predictability of each model within the tropical Indo-Pacific region, consistent with Lorenz (1969).
Potential predictability of equatorial SSTA AC skill (Fig. 5) is greater than persistence (not shown) for all forecast lead times ranging from 0 to 24 months. Following previous studies (e.g., Collins et al. 2002), we defined a maximum predictable lead time (MPLT) as the forecast lead time at which the AC reaches 0.6. Figure 5 shows that MPLTs for equatorial SSTAs have a strong dependence upon both model and longitude. For instance, in CM2.1, MPLT ranges from 20 months in the western equatorial Pacific to only about 8 months in the eastern part of the basin (Fig. 5a). The other three models have MPLT in the central equatorial Pacific varying between 16, 12, and 24 months for CM2.5 FLOR, CCSM4, and CESM1, respectively.
Past studies of potential (i.e., perfect model) ENSO predictability (e.g., Boer 2000; Collins et al. 2002; Power et al. 2006; Wittenberg et al. 2014) employed the more commonly used approach, known as reforecast experiments, in which a target trajectory in a model is forecasted by an ensemble of perturbed runs. The model-analog technique appears to yield similar results at less computational expense. For example, we obtain an MPLT of about 20 months in CM2.1 through the model-analog method, which agrees with the estimate based on CM2.1 reforecasts in Wittenberg et al. (2014).
b. Sensitivity to library length and ensemble size
One potential area of parametric dependence for analog forecasts is the library length necessary to achieve useful skill. For our model-analogs, this question has a practical aspect, since it might give guidance on how long to run the control simulation. The library needs to sample the attractor for the phenomenon of interest (e.g., ENSO) well enough to provide sufficient analogs near a given target state. The required library length will thus depend on the required number of analogs (i.e., the required resolution of the forecast PDF, and the required robustness of the forecast ensemble mean), the required precision of the analogs (i.e., the acceptable neighborhood width in phase space), and the attractor’s recurrence time for the target state(s) of interest (which is related to the attractor’s dimension and the ergodic time scale of the phenomenon). The library can be expanded by either lengthening trajectories (e.g., running a longer control simulation) or diversifying trajectories (e.g., increasing the number of independent ensemble members). Van den Dool (1994) estimates that the required library length depends on the number of spatial degrees of freedom. Here, we consider this question using the GFDL CM2.1 control simulation, the longest simulation available in this study, although results are qualitatively similar for all four models (not shown).
Using the first 200 years of the GFDL CM2.1 simulation as the verification period, we tested varying N from 50 to 3800 years. Extending the training period in this way corresponds to extending the control simulation. Also, since more analogs are potentially available with increasing library length, we tested ensemble size by letting K range from 1 to 70. Setting K = 1 corresponds to using the nearest analog with distance d1.
The impacts of library length and ensemble size are first evaluated in terms of the representation of the initial target states, using the distance dK between the target states x(t) and corresponding ensemble-mean model-analog state vectors z(t) (section 2). Figure 6a shows the time mean of dK(t) determined over the verification period 〈dK〉 scaled by the climatological root-mean-square distance of states from the verification period, the square of which is given by , where T = 12M, and M is the number of verification years. In all cases, the initial ensemble-mean distance is minimized by using an analog ensemble mean rather than the single best analog (Fig. 6a). Most of this improvement comes from simply increasing K from 1 to 5, and in fact results are optimal for K > 10–20 (with a weak dependence upon N), becoming worse for further increase of ensemble size. Clearly, the gain in additional ensemble members can be offset by adding members that are increasingly far from the target state. On the other hand, the minimum value of 〈dK〉 decreases monotonically as N is increased, consistent with the availability of more library states, increasing the chance of better matches to the target state. However, much of the improvement is realized by N ~ 500 with relatively less improvement thereafter, even as the length of the training period is further extended.
Potential predictability estimates similarly depend upon N and K. This is shown in Fig. 6 for Niño-3.4 SSTA forecast skill at a lead time of 6 months, for both AC (Fig. 6b) and RMSSS (Fig. 6c). Forecast skill again maximizes for K ~ 10–20, with most of the improvement in forecast skill apparent for K ~ 5, and again most of the benefits of library size are reached for N ~ 500, with more modest improvement thereafter. Note that forecast skill is less sensitive to the ensemble size K as the library size increases, especially for N 1000.
c. Sensitivity to verification period
A historical record of 200 years may not be long enough to completely sample ENSO events (Wittenberg 2009), so we also tested sensitivity to the choice of the verification period. First, we repeated the analog calculation using the GFDL CM2.1, but used the first 500 years instead of the first 200 years of the simulation as the verification period, with the remainder used for the library as before. However, we found no appreciable change (not shown); for example, Niño-3.4 SSTA 6-month-lead forecast skill increased from 0.81 to 0.83. Next, the CESM1 simulation was divided into two nonoverlapping portions of 350 years each, with each half of the simulation using the verification period with the other half serving as the library. The skill was the same in both cases and very close to the CESM1 results shown above. Our sensitivity experiments all suggest that potentially useful model-analog ENSO forecast skill can be obtained with surprisingly small ensemble size, library length, and verification periods.
4. Retrospective forecasts using model-analogs
In this section we use the model-analogs to make retrospective forecasts (hindcasts) of observed SSTAs in the tropical Indo-Pacific region, and compare their skill to initialized NMME seasonal forecasts.
a. Description of model-analog forecast technique
To make real-world forecasts, we find model-analogs to target state vectors constructed from observed anomalies. Note that now we use the entire dataset from each model simulation as its data library, in contrast to the analysis in section 3. We constructed model-analog hindcasts for forecast leads of 0–12 months, where month 0 is the initial (target) state and the ensuing forecast leads are chosen to match the forecast range of the NMME hindcasts. (Clearly, the model-analog forecasts could be extended beyond 12 months.) The NMME forecasts were initialized on or near (for staggered starts) the first day of each month. The month-0.5 forecast was then the mean of the first month of the forecast run, that is, centered in the middle of the calendar month. The equivalent model-analog forecast was initialized with the monthly mean observations centered on the previous month (i.e., month 0) so that the 1-month-lead model-analog forecast and the NMME month-0.5 forecast verified at the same time. We renamed both these forecasts the month-1 forecast, and so on for increasing forecast lead times. Thus, to match NMME hindcasts initialized between January 1982 and December 2009, the model-analog hindcasts should be initialized from December 1981 to November 2009. Figure 7 shows a schematic of how each forecast system is initialized. In the following, both model-analog and NMME hindcasts are verified against OISST data.
As in section 3b, we investigated the sensitivity of the model-analog reconstruction of observations to ensemble size K and library length N. Figure 8 shows 〈dK〉 for each of the four models. In general, the effects of library length and ensemble sizes resemble those in Fig. 6a, with more improvement from using even a small ensemble rather than from increasing library length. Note that for CM2.1, 〈dK〉 is much greater for observations than for the perfect model (cf. Figs. 6a and 8a), which is not surprising given that the models all suffer from biases in simulating ENSO variability (Delworth et al. 2006; Wittenberg et al. 2006). It also appears that 〈dK〉 has a similar dependence on N for all four models. Consistent with our findings in section 3b, it appears to be sufficient to construct model-analogs using a historic record of approximately 500 years with K ~ 10–15. In the following, we use K = 15 for each model case.
b. Comparing model-analog and NMME hindcast skill
How well each set of model-analog ensemble means represent observed target anomalies is assessed using AC and RMSSS (Fig. 9). Some observed states, notably the mature stages of the 1982/83 and 1997/98 El Niño events, are not as well reconstructed in model space as others (not shown, but see dK(t) in Fig. S4 of the supplemental material, which is essentially a measure of how well each model reproduces observed anomalies). As in the perfect-model analysis (Fig. 2), the model-analogs best reproduce SSTA variability in the equatorial Pacific, where the AC and RMSSS are as high as 0.9 and 0.6, respectively. Model-analogs also reproduce some observed SSTA variability in the subtropical Pacific, particularly as seen in relatively high AC (0.6) and RMSSS (0.3) values in the western part of the basin. In the northern subtropical Pacific, the model-analogs all represent aspects of the North Pacific meridional mode (Chiang and Vimont 2004), which has been linked to ENSO variability (Chang et al. 2007) via the seasonal footprinting mechanism (Vimont et al. 2003; Alexander et al. 2010).
However, outside of these regions the model-analogs provide a poorer representation of observed target anomalies than they did for the model states (cf. Figs. 9 and 2). This is most notable in the equatorial Indian Ocean, where the perfect-model-analogs mostly have initial correlation values that were nearly as high as in the Pacific, while the model-analogs of observations do not. So, compared to the perfect model, variability in the Indian Ocean plays a much smaller role in defining the model-analogs. This is also the case in the western equatorial Pacific, most dramatically for the CM2.1, where the model space for the GFDL CM2.1 control run with preindustrial forcings is apparently so different from nature in the late twentieth century that it is not possible for any model-analog to simultaneously match observed SSTAs in both the eastern and western parts of the basin. The bias in the model degrades analog forecast skill as well as corresponding NMME forecast skill in the western equatorial Pacific (see below).
Month-6 model-analog and NMME hindcast skill are shown in Figs. 10a,c,e,g,i and 10b,d,f,h,j, respectively. The bottom row shows the four-model grand ensemble mean of the model-analog and NMME hindcasts. The model-analogs clearly reproduce the essential details of skill from the NMME hindcasts. In particular, both sets of hindcasts are skillful in the central and eastern equatorial Pacific, where correlation is as high as 0.7. The model-analogs have higher skill along the equator and South American coast than the NMME hindcasts. Month-6 model-analog hindcasts also display skill in the northwestern tropical Pacific and in the South Pacific near 30°N, 150°W, with a correlation of 0.5. In these regions, SSTA variability is largely driven by ENSO via the atmospheric bridge (Alexander et al. 2002). In general, NMME hindcasts outperform model-analog hindcasts in the northwestern tropical Pacific except for CESM1. In the South Pacific, skill is comparable. However, the model-analogs appear to have poorer skill than the corresponding NMME hindcasts in much of the western tropical Pacific and Indian Ocean, except for CM2.1.
The evolution of equatorial skill as a function of forecast lead time is shown in Fig. 11 for the model-analogs and their corresponding hindcasts, with the bottom row again showing the grand ensemble means of each. Model-analog skill is comparable to the NMME hindcast skill throughout the Pacific for all forecast leads. The model-analogs are more skillful than the NMME hindcasts near the date line and along the South American coast. The poorer Indian Ocean skill for the model-analogs remains evident.
Figure 12 shows how well the model-analogs capture individual events by comparing the month-6 model-analog hindcasts of the Niño-3.4 time series to observations and the month-6 NMME hindcasts. The model-analog hindcasts reproduce most of the strong ENSO events (blue lines in Fig. 12), particularly the 1982/83 and 1997/98 El Niño events and the 1988/89 and 1998/99 La Niña events, and again appear to have skill comparable to that of the NMME. We carried out a more detailed skill comparison by applying the random walk test of DelSole and Tippett (2016) to RMSSS values of month-6 Niño-3.4 hindcasts (Fig. S5 in the supplemental material). We found that the model-analogs for the CM2.1 and CCSM4 were both significantly more skillful (at the 95% significance level) than the corresponding NMME hindcasts. The model-analogs for the other two models, as well as the multimodel ensembles, were equally as skillful as the corresponding NMME hindcasts.
The pattern correlation within the Indo-Pacific training region of observed and month-6 hindcast SSTA is shown for both the model-analog and NMME in Fig. 13. Both sets of hindcasts show similar variations of skill, although there is a tendency for the model-analogs to perform better in ENSO events than in neutral conditions. Again, the model-analog hindcasts perform as well as the NMME hindcasts although there are discrepancies between the two forecast methods. The time-mean values of the pattern correlations are 0.34 (0.29, 0.32, and 0.34) and 0.34 (0.39, 0.37, and 0.37) for the model-analog and NMME hindcasts using CM2.1 (CM2.5 FLOR, CCSM4, and CESM1, respectively).
Finally, Fig. 14 shows probabilistic skill based on the upper (El Niño), middle (ENSO neutral), and lower (La Niña) terciles of the SSTA distributions. RPSS calculated from the multimodel ensembles of model-analog and NMME month-6 hindcasts are shown in Figs. 14a and 14b, respectively. Again, the model-analog technique clearly reproduces the essential details of skill from the NMME hindcasts in the tropical Indo-Pacific region (see also Tippett et al. 2018). In particular, both sets of hindcasts are more skillful than climatology in the central and eastern equatorial Pacific, with RPSS above 0.3. Plots of the corresponding reliability (Figs. 14c–e, top panels) and frequency of occurrence (Figs. 14c–e, bottom panels) for the same three categories, averaged over the central and eastern equatorial Pacific (6°S–6°N, 160°E–80°W), likewise show that the model-analog and NMME hindcast ensemble distributions are comparable within the ENSO region. For the upper and lower terciles, the model-analogs appear somewhat less sharp but more reliable than the NMME hindcasts, which corresponds to the slightly higher overall values of RPSS; both hindcast ensembles are overconfident when forecasting neutral conditions.
c. Evaluating sources of model-analog skill
Recently, Newman and Sardeshmukh (2017) showed that a low-dimensional empirical dynamical model of monthly SSTA and SSHA variability, a linear inverse model (LIM) with approximately 30 degrees of freedom, had similar tropical Indo-Pacific SSTA skill to the grand ensemble mean of all eight NMME operational models, and that both LIM and CGCM spatial and temporal skill variations were largely predicted by the LIM itself. Therefore, they suggested, the seasonal predictability of tropical SSTAs is effectively linear, and both forecast systems may already be near the intrinsic limit of tropical Indo-Pacific SSTA predictability, with two exceptions: potential nonlinear predictability in the far eastern tropical Pacific, where the NMME-mean hindcast skill was greater than the LIM, and significant NMME forecast error in the western tropical Pacific, where the LIM alone had positive skill.
We might then ask whether the skill of the model-analogs is also a consequence of low-dimensional linear predictability. Certainly, the model-analogs appear effectively low-dimensional. While the operational models have detailed full-field initializations, the model-analogs are initialized only with tropical Indo-Pacific SSTA and SSHA, and have relatively large initial errors apart from the leading PCs (Fig. 3). In fact, training the model-analogs using only the leading 10–20 combined SSTA–SSHA PCs (i.e., with 100% initial error in all higher-order PCs) led to initial error and forecast skill nearly identical to that presented in this paper, a result that is consistent with previous reduced-space empirical dynamical models of ENSO (e.g., Penland and Sardeshmukh 1995; Berliner et al. 2000; Chen et al. 2017). Capturing both perfect-model and actual NMME hindcast skill with analogs determined from such relatively short data libraries also suggests that the effective number of degrees of freedom must be O(10) as opposed to O(100) or higher (Van den Dool 1994). Likewise, the ensemble members of the model-analogs are close enough [in the Lorenz (1969) sense] that their evolution can be used to estimate predictability (i.e., trajectory divergence) within the CGCMs.
To test whether the model-analog skill is effectively linear, the analysis of the previous section was repeated using “antianalogs” from the control simulations, determined by 1) reversing the target anomaly sign; 2) constructing an analog ensemble, its subsequent evolution, and ensemble mean as before; and 3) reversing the sign again to yield the forecast. Skill of these antianalogs for month 6, shown in Fig. 15, can then be compared to the corresponding model-analog skill (Figs. 10a,c,e,g). To first approximation, the patterns and values are largely similar, consistent with effectively linear skill, particularly in the central equatorial Pacific, northwestern tropical Pacific, and Indian Ocean. However, all four models show substantial differences between analog and antianalog skill in the far eastern Pacific, notably the Niño-1.2 region, supporting the importance of predictable nonlinearity there that is captured by the model-analogs. Other areas where the skill of the analogs exceeded the antianalogs are generally regions of low skill and where these differences may also not be significant.
The LIM is a Markovian model; that is, its forecasts depend only on the current state, not on any past states. This is a justification for the approach applied to determine analogs in this paper, but of course it is not a necessary component of the model-analog technique. For example, we can incorporate information about trajectory velocity by defining analogs using a different metric than in (1), such as
where and . However, using this metric to select model-analogs led to no forecast skill improvement (Fig. S6 in the supplemental material).
In the key ENSO regions, the model-analogs have higher skill than their NMME hindcast counterparts, and perfect-model skill is also comparable to actual skill. Elsewhere, model-analog skill is lower than NMME hindcast skill and much lower than perfect-model skill. The reasons for this remain to be determined. One possibility is suggested by the notably poorer representation of observed initial conditions provided by the model-analogs outside the equatorial Pacific compared to the perfect-model analysis (cf. Figs. 9 and 2). During the hindcast period, the tropical SST warming trend was also much larger outside of the equatorial eastern Pacific, especially in the Indian Ocean and west Pacific warm pool (e.g., Solomon and Newman 2012). However, all the control simulations used for the model-analogs were run under fixed radiative forcing conditions. Interestingly, we found that the CM2.5 model-analogs had higher Indian Ocean hindcast skill when trained from the entire CM2.5 library, which included a (spurious) trend resulting from a spinup period of several hundred years rather than from the equilibrated last 700 years that we ultimately employed here (not shown). Moreover, the LIM hindcasts, which used a fixed dynamical operator but whose initial conditions included the trend, had Indian Ocean skill that was similar to the NMME hindcasts. This suggests that our model-analog skill was negatively impacted by external radiative forcing. If so, training model-analogs with large ensembles of externally forced model simulations [such as in Kay et al. (2015)] could improve skill.
5. Summary and discussion
We have investigated the use of analogs based on long climate simulations, rather than observational datasets, to make monthly forecasts of observed tropical Indo-Pacific Ocean SST and SSH anomalies. Analogs are chosen from model states that are most similar to the observed target state in a given calendar month, where both are defined only from tropical Indo-Pacific Ocean SSTAs and SSHAs. An analog ensemble is defined from those model states that minimize the distance to the target state. This model-analog ensemble and its subsequent time evolution then yields the time-evolving forecast ensemble from the target state.
For ensemble sizes of about 10–15 members, 500 years of model output appears sufficient to generate estimates of ensemble-mean perfect-model skill, and to make hindcasts of the 1982–2009 period that have generally similar deterministic and probabilistic skill as the initialized hindcast ensembles from four climate models that are also used to make operational seasonal forecasts as part of the NMME project. This appears consistent with Wittenberg (2009), who showed that a record of 500 years is required in order to obtain a complete sampling of ENSO events in the CM2.1. Extending the model simulation record to 4000 years yielded only slight skill improvement. More important was the initial increase of ensemble members from 1 to 10–15, and in fact, skill was degraded for ensemble sizes greater than about 20–30. Our results thus raise the possibility that the tropical Indo-Pacific forecast skill of any CGCM, both in a historical and a perfect model sense, could be assessed using the model-analog technique as soon as a sufficiently long control run is produced, an assessment that could even be made part of the model development cycle itself.
The skill of the model-analog hindcasts is all the more striking given the relatively small amount of data that “initializes” them; we used only monthly anomalies of tropical Indo-Pacific SST and SSH, neither field drawn directly from a data assimilation system employing the CGCM in question. The relatively large initial ensemble spread also raises the possibility that for the tropical Indo-Pacific region, seasonal forecasts may only require accurate initialization within a low-dimensional subspace involving relatively large spatial scales. Moreover, this spread may be sufficient to estimate model potential predictability based on the divergence of an ensemble of analog trajectories (e.g., Lorenz 1969), consistent with earlier suggestions that on monthly time scales, the tropical Indo-Pacific region could be considered a coarse-grained, Markovian dynamical system (e.g., Newman and Sardeshmukh 2017).
Our model-analog technique is, in essence, an attempt to reconstruct the phase space of a CGCM from a library of its states. The theoretical basis for phase-space reconstruction is that the underlying attractor may be reconstructed using a succession of states, represented within so-called embedding vectors, to reproduce past trajectories (Takens 1981; Sugihara and May 1990; see also Ye et al. 2015). In practice, this means that analogs determined using multiple time lags could capture information about the phase-space trajectory near the target state (e.g., McDermott and Wikle 2016; Lguensat et al. 2017; McDermott et al. 2018). Weighting analog ensemble members by their distance from the target state, such as with kernel-weighting strategies (e.g., Zhao and Giannakis 2016), may also enhance this approach. Note, however, that for Markovian dynamics, previous states provide no predictive information, which might justify the simpler method of our study. Moreover, the larger data libraries available from climate model simulations, compared to the more limited observational datasets used in previous analog studies, likely aided our analog determination. Still, these issues will need to be reevaluated as the model-analog technique is applied to other forecast problems.
The model-analog method potentially has some notable advantages compared with the conventional seasonal forecasting approach in which an ensemble of initialized climate model forecasts is run (e.g., Latif et al. 1998; Keenlyside et al. 2005; Kirtman et al. 2014). First, initialization shock may be avoided in the model-analog forecasts, since both initial conditions and their subsequent trajectories are taken from a free-running model in which each component of the model is well balanced and the consistency of model physics is retained. Recall that model-analog skill is higher than the corresponding model hindcast skill throughout the ENSO region. The improvement is greatest in the Niño-1.2 region, with differences large enough to be statistically significant for each CGCM’s model-analog and for their ensemble mean, suggesting that the NMME hindcast error in this region is at least partly due to initialization shock. Moreover, since the model-analog anomalies are always in the model space, the mean bias in their forecasts is removed by construction. This may also improve model-analog skill relative to the NMME hindcasts, whose bias correction is determined retrospectively from a relatively short 28-yr training period.
Second, the use of the model-analog technique could mean the reduction of technological barriers for new models to participate in seasonal forecasting, which would further enhance the diversity of multimodel ensembles and potentially allow for improved ensemble forecast calibration. Certainly, it is encouraging that a skillful ensemble of model-analogs can be constructed from a control simulation of just a few hundred years in length, since such runs are often routinely executed, such as for the IPCC assessments (Meehl et al. 2007; Taylor et al. 2012). The model-analog technique could therefore allow for the construction of forecast ensembles based on a much larger number of CGCMs than are currently employed operationally, since it is far less computationally expensive than running a model in both initialization and forecast modes and does not require additional data assimilation or new suites of retrospective ensemble forecast integrations. For example, a 15-member NMME hindcast ensemble covering 1982–2009, with monthly initializations and forecast leads of up to one year, would be computationally equivalent to a control simulation of over 5000 years, or about 10 times what the model-analogs would require. Also, since only SSTAs and SSHAs are needed to construct the model-analogs, and monthly data are available since 1958, hindcast skill could be evaluated over a much longer period than 28 years at no additional computational cost (although the quality of SSH prior to the satellite era may be a concern).
In this paper, we have only displayed SSTA skill, although we also found similar results for SSHA. Of course, all other quantities associated with a target SSTA–SSHA state (e.g., rainfall and air temperature) can be forecasted as well, using corresponding quantities associated with the preidentified analogs. We could also test the inclusion of such variables in the analog selection metric itself. Given the results shown in this paper, further development of the model-analog technique along the lines outlined above seems promising.
We are grateful to Dr. XiaoWei Quan for providing us with the CCSM4 and CESM1 long control runs. Discussions with Alicia Karspeck and Prashant Sardeshmukh aided the development of this work. Comments from Ben Kirtman and two anonymous reviewers greatly improved the manuscript. This work has been funded by NOAA/CPO MAPP, NOAA/CPO Earth System Modeling, and NOAA/CPO Climate Variability and Predictability (CVP) No. GC14-250a programs. We acknowledge the agencies that support the NMME Phase II system, and we thank the participating climate modeling groups (Environment Canada, NASA, NCAR, NOAA/GFDL, NOAA/NCEP, and University of Miami) for producing and making their model output available. NOAA/NCEP, NOAA/CTB, and NOAA/CPO jointly provided coordinating support and led development of the NMME Phase II system. Data shown in this paper are available by email request from the corresponding author.
Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JCLI-D-17-0661.s1.