1. Introduction
The decadal prediction problem has been in an embryonic stage for decades. One way to make progress is to apply to decadal time scales the climate community's long experience in understanding seasonal-to-interannual variability and improving its prediction. For example, a wide array of both physical and empirical methods has been used to make ENSO forecasts (e.g., review by Latif et al. 1998). Statistical forecasts are relatively easy and economical to perform, and their skill can be sufficiently high that they are useful both in their own right and as benchmarks for more complex numerical models (e.g., Livezey 1999; van Oldenborgh et al. 2005; Laepple et al. 2008; Krueger and von Storch 2011).
It seems reasonable, then, that a similar two-pronged approach of physical and empirical methods could advance decadal prediction. This is not to say that improvement will or can occur as readily as for seasonal forecasting. One concern is that, while ENSO provides a well-defined interannual phenomenon understandable as the result of a specific mechanism (e.g., delayed oscillator theory, the recharge–discharge mechanism), there do not yet appear to be comparably well-defined decadal phenomena, at least in the Pacific. Large-scale patterns such as the Pacific decadal oscillation (PDO; Mantua et al. 1997) do not dominate decadal variability to the same degree that ENSO dominates interannual variability and, moreover, may represent the superposition and/or convolution of several mechanisms (e.g., Schneider and Cornuelle 2005; Newman 2007, hereafter N07), including the low-frequency or reddened tail of interannual phenomena (e.g., Newman et al. 2003a; Vimont 2005), rather than the result of one identifiable physical process. The effects of anthropogenic climate change further complicate comparison between models and observations, and how to distinguish natural from anthropogenically forced decadal variability remains a fundamental problem (Solomon et al. 2011).
Currently, a number of modeling centers have carried out decadal “hindcasts” as part of phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012). It is an important long-range goal of climate diagnosis to provide insights that will help improve decadal forecasts from these coupled GCMs (CGCMs). Here, we diagnose annual-to-decadal variability and predictability, both unforced and forced, with an empirically determined linear model of the observed system.
2. Multivariate red noise
Climate variability is often characterized by a notable separation between the dominant time scales of interacting processes. For example, compared to much longer ocean time scales, weather varies so rapidly that it has almost no memory. Weather forcing of the ocean can then be approximated as white noise forcing of a damped integrator, yielding univariate red noise for a scalar anomaly time series, the simplest null hypothesis for climate variability (Hasselmann 1976). When extended to the more general case of anomalies representing many evolving regional patterns of climate variables, this time-scale-separation approximation becomes "multivariate red noise" (more generally known as a multivariate Ornstein–Uhlenbeck process, or a continuous version of a multivariate AR1 process). As opposed to its univariate counterpart, multivariate red noise contains both stationary and propagating anomaly patterns (so that scalar indices derived from it can have spectral peaks) and allows for nonsymmetric dynamical relationships (so that, despite the absence of exponential modal instability, some anomalies experience significant but transient growth and evolution over finite time intervals). A nonlinear system usefully approximated by multivariate red noise can be said to be "predictably" linear.
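As a concrete illustration of these properties (not drawn from the original study), the following minimal sketch simulates a two-component multivariate red noise process with an invented, nonsymmetric but stable operator; even though every eigenmode decays, the propagator can amplify suitably structured anomalies over a finite interval, which a univariate AR1 process cannot do.

```python
# Minimal sketch: multivariate red noise (discretized Ornstein-Uhlenbeck / AR1 process).
# The operator below is invented for illustration only.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Nonsymmetric, stable dynamical operator L (eigenvalues -0.5 and -1.0 yr^-1).
L = np.array([[-0.5, 3.0],
              [ 0.0, -1.0]])
dt, nsteps = 0.01, 100_000

# Integrate dx = L x dt + dW (Euler-Maruyama) to generate a noise-driven trajectory.
x, series = np.zeros(2), np.empty((nsteps, 2))
for t in range(nsteps):
    x = x + (L @ x) * dt + rng.normal(scale=np.sqrt(dt), size=2)
    series[t] = x

# Transient growth: the propagator G(tau) = exp(L tau) has a singular value > 1
# at tau = 1, so some initial anomalies grow before ultimately decaying.
G = expm(L * 1.0)
print("max singular value of G(1):", np.linalg.svd(G, compute_uv=False)[0])
```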
Linear inverse models (LIMs; Penland and Sardeshmukh 1995), which empirically determine multivariate red noise from observations, provide excellent approximations of observed Pacific SST anomaly evolution on time scales ranging from weeks to years. N07 found that a LIM constructed from annually averaged tropical and North Pacific SSTs reproduced observed tropical–North Pacific relationships on decadal time scales better than most CMIP3 coupled GCMs. Subsequent studies have had similar success in the Atlantic (Hawkins and Sutton 2009; Zanna 2012) and both ocean basins (Vimont 2012).
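The estimation itself requires only the lag-0 and lag-τ0 covariance matrices of the data. A minimal sketch of the standard Penland and Sardeshmukh (1995) procedure is given below; `pcs` is an assumed array of principal-component time series, not the study's actual code.

```python
# Sketch of standard LIM estimation from a multivariate time series.
import numpy as np
from scipy.linalg import logm, expm

def estimate_lim(pcs, lag=1):
    """pcs: (ntime, ncomp) anomaly PCs. Returns dynamical operator L and propagator G(lag)."""
    x0, x1 = pcs[:-lag], pcs[lag:]
    c0 = x0.T @ x0 / len(x0)          # lag-0 covariance C(0)
    ctau = x1.T @ x0 / len(x0)        # lag-tau covariance C(tau)
    g = ctau @ np.linalg.inv(c0)      # propagator G(tau) = C(tau) C(0)^-1
    return logm(g) / lag, g           # L = ln[G(tau)] / tau

# A deterministic forecast at lead `nlead` (same time units as `lag`) is then
#   x_hat = expm(L * nlead) @ x_now
```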
In this paper, the N07 analysis is extended to a state vector constructed from global SSTs and surface land temperatures. The resulting LIM is shown to have skill comparable to three CMIP5 decadal hindcast models that used yearly start dates for the period 1960–2000. The sources of this skill are diagnosed and evaluated in the context of simpler climate indices.
3. Data and model details
Datasets used in this study were SSTs from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST; Rayner et al. 2003) and surface (2 m) land temperatures from the University of East Anglia Climatic Research Unit (CRU) TS 3.1 dataset (Mitchell and Jones 2005) for the period 1901–2009. Monthly data were interpolated onto 2° latitude × 5° longitude grid boxes. Anomalies were determined by removing the climatological monthly mean from data that had been temporally smoothed with a 12-month running mean. This smoothing suits our analysis, which does not consider seasonality, although seasonality is likely still relevant to decadal variability (e.g., Newman et al. 2003a; Vimont 2005). Data were prefiltered in EOF space, retaining about 81% of the SST variance and 62% of the surface land temperature variance. SST EOFs were determined within ice-free regions only; the remaining ocean regions were regressed onto the SST principal components (PCs) to provide complete spatial coverage.
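A rough sketch of this preprocessing, with placeholder variable names and without the gridding, ice masking, or regression steps, might look like the following.

```python
# Sketch of the anomaly and EOF-prefiltering steps (placeholder names; whole years assumed).
import numpy as np

def running_mean(x, window=12):
    """Centered running mean along the time axis of an (ntime, nspace) array."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="same"), 0, x)

def remove_monthly_climatology(monthly):
    """Subtract the climatological mean of each calendar month."""
    ntime, nspace = monthly.shape
    clim = monthly.reshape(-1, 12, nspace).mean(axis=0)      # (12, nspace) climatology
    return monthly - np.tile(clim, (ntime // 12, 1))

def eof_truncate(anom, nretain):
    """Truncate anomalies to their leading EOFs/PCs via SVD."""
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    return u[:, :nretain] * s[:nretain], vt[:nretain]        # PCs, EOF patterns

# e.g. pcs, eofs = eof_truncate(remove_monthly_climatology(running_mean(data)), 11)
```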
The leading 11 (6) EOFs of anomalous SSTs (surface land temperatures) were retained for the model, with the corresponding PCs defining a 17-component state vector x(t).
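For reference, the LIM formulation underlying this state vector follows Penland and Sardeshmukh (1995) and N07; written in conventional notation (assumed here rather than quoted from the original derivation),
\begin{align}
\frac{d\mathbf{x}}{dt} &= \mathbf{L}\,\mathbf{x} + \boldsymbol{\xi}, \\
\mathbf{G}(\tau_0) &= \mathbf{C}(\tau_0)\,\mathbf{C}(0)^{-1}, \qquad \mathbf{L} = \tau_0^{-1}\,\ln\mathbf{G}(\tau_0), \\
\hat{\mathbf{x}}(t+\tau) &= \exp(\mathbf{L}\tau)\,\mathbf{x}(t), \\
\mathbf{C}(n\tau_0) &= \mathbf{G}(\tau_0)^{n}\,\mathbf{C}(0),
\end{align}
where $\boldsymbol{\xi}$ is white noise, $\tau_0 = 1$ yr is the training lag, $\mathbf{C}(\tau) = \langle \mathbf{x}(t+\tau)\,\mathbf{x}(t)^{\mathrm{T}} \rangle$ is the lag-$\tau$ covariance matrix, and $\hat{\mathbf{x}}$ is the deterministic forecast. The last relation is the lag-covariance prediction tested against observations in section 4a (Fig. 1); it holds only to the extent that the linear, stationary approximation is adequate.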
Finally, the LIM must be tested on independent data, so estimates of hindcast skill are cross-validated: the LIM used for each hindcast is constructed from data that exclude the period being verified.
Hindcasts from the LIM are compared to hindcasts from three CMIP5 CGCMs: third climate configuration of the Met Office Unified Model [HadCM3 (DePreSys)]; Max Planck Institute Earth System Model, low resolution (MPI-ESM-LR); and Geophysical Fluid Dynamics Laboratory Climate Model, version 2.1 (GFDL CM2.1). These models were chosen since they were the only available models whose hindcasts were initialized yearly rather than every 5 yr. Skill was determined from the bias-corrected ensemble mean for each hindcast initialization.
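As a generic sketch of how such skill estimates are formed (not the exact CMIP5 post-processing protocol; array shapes and names are assumptions), a lead-dependent mean bias is removed from each model's ensemble mean before computing the anomaly correlation with observations.

```python
# Generic lead-dependent bias correction and anomaly correlation for ensemble hindcasts.
import numpy as np

def bias_corrected_skill(hindcasts, obs):
    """hindcasts: (nstart, nmember, nlead); obs: (nstart, nlead) verifying observations."""
    ens_mean = hindcasts.mean(axis=1)                        # ensemble mean for each start date
    corrected = ens_mean - (ens_mean - obs).mean(axis=0)     # remove mean bias at each lead
    ca = corrected - corrected.mean(axis=0)                  # anomalies across start dates
    oa = obs - obs.mean(axis=0)
    return (ca * oa).sum(axis=0) / np.sqrt((ca**2).sum(axis=0) * (oa**2).sum(axis=0))
```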
4. Results
a. Testing the empirical model
We first test the assumption of linear dynamics underlying the LIM. Figure 1 shows the observed lag-autocovariance for n = 2, 4, 6, and 8 yr compared to that predicted by (4). Generally, the match is quite good and confirms that the LIM reproduces the statistics of evolving surface temperature anomalies over the twentieth century, with deficiencies over Northern Europe and some parts of North America. Also, the LIM captures the 2-yr lag anticorrelation in the tropical eastern Pacific but underestimates its amplitude, likely because some subsurface ocean anomalies are needed in addition to SST to completely represent ocean state evolution within the LIM on time scales greater than 1 yr (Newman et al. 2011). Still, the LIM does implicitly include those subsurface effects that are linearly related to SST (Penland and Sardeshmukh 1995), in contrast to a physical dynamical model in which the evolution of the state vector is governed only by explicitly represented interactions among its components.
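The consistency check shown in Fig. 1 can be computed directly from the estimated operator; a minimal sketch (variable names assumed) compares the observed lag covariance with the LIM prediction exp(Lτ)C(0).

```python
# Sketch of the linearity ("tau") test: compare exp(L*lag) C(0) with the observed C(lag).
import numpy as np
from scipy.linalg import expm

def lag_cov(pcs, lag):
    """Observed lag covariance <x(t+lag) x(t)^T> from (ntime, ncomp) PC time series."""
    if lag == 0:
        return pcs.T @ pcs / len(pcs)
    return pcs[lag:].T @ pcs[:-lag] / (len(pcs) - lag)

def tau_test(pcs, L, lags=(2, 4, 6, 8)):
    c0 = lag_cov(pcs, 0)
    for lag in lags:
        predicted = expm(L * lag) @ c0
        observed = lag_cov(pcs, lag)
        err = np.linalg.norm(predicted - observed) / np.linalg.norm(observed)
        print(f"lag {lag} yr: relative lag-covariance error {err:.2f}")
```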
(left) Observed and (right) difference between observed and LIM-predicted surface temperature lag covariance for lags of (top) 2, (middle) 4, and (bottom) 6 yr. Contour interval is 0.05 K²; the zero contour is removed for clarity, and negative values have dashed contours.
b. Skill of LIM and CMIP5 decadal hindcasts
Figure 2 shows skill measured by local anomaly correlation for hindcasts averaged over leads of 2–5 yr (left panels) and 6–9 yr (right panels). The top two rows show that the LIM sets a much higher benchmark for skill than does damped persistence (i.e., a grid-space univariate AR1 model). Additionally, for yearly start dates from 1960 to 2000, the skill of LIM decadal hindcasts is comparable to, and sometimes better than, the skill of the CMIP5 decadal hindcasts. Maxima (e.g., tropical Indian and Atlantic Oceans, central North Atlantic, southwestern United States, and east-central Asia) and minima (e.g., eastern equatorial and northeast Pacific, western South America, and off the U.S. Atlantic coast) of skill often coincide between the LIM and CGCM hindcasts, both those shown here and those documented in other studies (e.g., van Oldenborgh et al. 2012; Kim et al. 2012). The LIM can thus serve as a benchmark for decadal forecast skill.
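The damped persistence benchmark is simply a univariate AR1 fit at each grid point; a minimal sketch (assuming an array of annually averaged anomalies) is given below, with the anomaly correlation computed across start dates.

```python
# Sketch of the grid-space damped-persistence (univariate AR1) benchmark.
import numpy as np

def damped_persistence_forecast(anom, lead):
    """anom: (ntime, npoints). Forecast = lag-1 autocorrelation**lead times the initial anomaly."""
    x0, x1 = anom[:-1], anom[1:]
    r1 = (x0 * x1).sum(axis=0) / np.sqrt((x0**2).sum(axis=0) * (x1**2).sum(axis=0))
    return anom[:-lead] * r1**lead                 # forecasts verifying against anom[lead:]

def anomaly_correlation(fcst, verif):
    f, v = fcst - fcst.mean(axis=0), verif - verif.mean(axis=0)
    return (f * v).sum(axis=0) / np.sqrt((f**2).sum(axis=0) * (v**2).sum(axis=0))

# e.g. skill_map = anomaly_correlation(damped_persistence_forecast(anom, 4), anom[4:])
```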
Local anomaly correlation of hindcasts averaged over leads of (left) 2–5 yr and (right) 6–9 yr, for damped persistence, the LIM, and the CMIP5 models, for hindcasts initialized yearly from 1960 to 2000. Contour interval is 0.1 with negative values indicated by blue shading. Shading of positive values starts at 0.1; warmer shading denotes larger values of correlation.
Hindcast skill for the Atlantic multidecadal oscillation (AMO) and PDO indices is shown in Fig. 3; for comparison, root-mean-square error (rmse) is also displayed. Here, the AMO index is the area-weighted North Atlantic mean SST between 0° and 60°N minus the global mean SST (Trenberth and Shea 2006; van Oldenborgh et al. 2012), and the PDO index is the projection of SST on the leading EOF of monthly detrended North Pacific SST anomalies between 20° and 60°N (Mantua et al. 1997). For the LIM, as previously found for CMIP5 hindcasts (Guemas et al. 2012; Kim et al. 2012; van Oldenborgh et al. 2012), AMO skill is generally higher than PDO skill, which drops off very rapidly for leads greater than 1 yr. The LIM again clearly provides a more stringent decadal forecast test than persistence (here determined from the index time series and not the gridded values), with skill that is generally higher (although not significantly so) than that of the CGCMs. Even global mean temperature hindcast skill is generally comparable.
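As a sketch of these index definitions (array names, masks, and the cosine-latitude weighting are assumptions rather than the study's code), the AMO and PDO indices might be computed as follows.

```python
# Sketch of the AMO and PDO index definitions given in the text.
# `sst` is an assumed (ntime, nlat, nlon) anomaly array; masks are 1 inside the region, 0 outside.
import numpy as np

def area_mean(field, lats, mask):
    """Cosine-latitude-weighted mean over the points selected by `mask`."""
    w = np.cos(np.deg2rad(lats))[None, :, None] * mask
    return (field * w).sum(axis=(1, 2)) / w.sum()

def amo_index(sst, lats, natl_mask, ocean_mask):
    """North Atlantic (0-60N) mean SST minus global mean SST (Trenberth and Shea 2006)."""
    return area_mean(sst, lats, natl_mask) - area_mean(sst, lats, ocean_mask)

def pdo_index(sst_detrended, lats, npac_mask):
    """Projection onto the leading EOF of detrended North Pacific (20-60N) SST anomalies."""
    w = np.sqrt(np.cos(np.deg2rad(lats)))[None, :, None] * npac_mask
    flat = (sst_detrended * w).reshape(sst_detrended.shape[0], -1)
    _, _, vt = np.linalg.svd(flat - flat.mean(axis=0), full_matrices=False)
    return flat @ vt[0]                            # PC time series of the leading EOF
```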
(left) Anomaly correlation skill comparison for the (a) PDO and (b) AMO indices and (c) global mean temperature, from hindcasts initialized in the years 1960–2000, calculated as described in the text. Using the lag-1 autocorrelation to roughly estimate degrees of freedom, correlations of about 0.4, 0.55, and 0.55 are significantly different from zero at the 95% level for (a)–(c), respectively. (right) As in (left), but using rmse as the skill measure for the (d) PDO, (e) AMO, and (f) global mean temperature. Damped persistence is determined from the lag-1 autocorrelation of the observed index time series, not from local gridded values (as in Fig. 2), which would yield lower index skill.
c. Diagnosing forecast skill
Forecast skill can be diagnosed by decomposing the LIM's dynamical operator into its eigenmodes, each of which decays with a characteristic e-folding time (EFT) and, for complex eigenmode pairs, propagates with a period T; the amplitude of each eigenmode in the evolving state is obtained by projecting the data onto the corresponding adjoint. The three leading eigenmodes, those with the longest EFTs, are shown in Figs. 4a–f, together with the interannual ENSO eigenmode pair (Figs. 4g,h).
(a)–(f) Leading empirical eigenmodes, with their associated projection coefficient time series (determined by projecting the data onto the corresponding adjoint). (g),(h) Most energetic phase of eigenmode pair 9/10, which corresponds to interannual ENSO, and its time series (the least energetic phase is not shown). The contour interval is the same in (a),(c),(e),(g). The sign is arbitrary but consistent with the coefficient time series. Red (blue) shading represents positive (negative) values. Also shown are the EFTs for the eigenmodes and the period T for the propagating eigenmodes. Note that in general these eigenmodes do not correspond to the EOFs, which are constrained to be orthogonal.
The impact of these eigenmodes upon hindcast skill over the entire 1901–2009 record was explored by creating several different sets of hindcast initializations, each with nonzero projections on different subsets of the eigenmodes. Figure 5 shows that almost all of the total hindcast skill (top panels of Fig. 5) can be recovered for hindcasts initialized with data projected on the leading three eigenmodes alone (middle panels of Fig. 5). Figure 6 shows the impact of the least damped eigenmodes on climate index skill. As in Fig. 5, almost all skill is retained for hindcasts initialized with the leading three eigenmodes only, except where the ENSO eigenmodes also contribute to PDO skill (Fig. 6a), primarily at shorter leads. Longer-range PDO skill is related to the least energetic phase of the decadal eigenmode pair, while AMO skill (Fig. 6b) is related to its most energetic phase. Global mean temperature skill (Fig. 6c) is mostly due to the leading eigenmode but also has a contribution from the most energetic phase, since that phase contributes to both the AMO and the global mean. Note also that hindcast skill of the detrended LIM and the full LIM is almost identical for both the AMO and PDO indices; in fact, hindcast skill of both indices is about the same whether or not the leading eigenmode is included in initializations of the full LIM. Apart from these indices, however, much of the LIM skill appears to be due to the trend (cf. top and bottom panels of Fig. 5) or, alternatively, the leading eigenmode, as has also been suggested for the CGCMs (van Oldenborgh et al. 2012).
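The eigenmode-filtered initializations described above can be constructed with the adjoint (left) eigenvectors of the LIM operator, which are biorthogonal to the eigenmodes; a minimal sketch (function and array names assumed) is shown below.

```python
# Sketch: retain only a chosen subset of eigenmodes in an initial condition,
# projecting with the biorthogonal adjoint (left) eigenvectors of L.
import numpy as np

def eigenmode_filter(L, x0, keep):
    """Return the part of the state x0 spanned by the eigenmodes listed in `keep`."""
    vals, modes = np.linalg.eig(L)                 # eigenvalues and right eigenvectors
    adjoints = np.linalg.inv(modes).conj().T       # left eigenvectors (biorthogonal set)
    efts = -1.0 / vals.real                        # e-folding time of each eigenmode
    order = np.argsort(efts)[::-1]                 # least damped (longest EFT) first
    modes, adjoints, efts = modes[:, order], adjoints[:, order], efts[order]
    coeffs = adjoints.conj().T @ x0                # projection coefficients on each mode
    filtered = sum(coeffs[k] * modes[:, k] for k in keep)
    return filtered.real, efts                     # .real: complex modes come in conjugate pairs

# e.g. initial conditions retaining only the three least damped eigenmodes:
# x0_filtered, efts = eigenmode_filter(L, x0, keep=[0, 1, 2])
```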
(top) LIM hindcast skill for the 1901–2009 period for hindcasts averaged over leads of (left) 2–5 yr and (right) 6–9 yr. (middle) Difference between the skill of the LIM hindcasts for the 1901–2009 period [i.e., in (top)] and that of a second set of LIM hindcasts in which only the projection of the initial conditions on the three leading eigenmodes (Figs. 4a–f) is retained. (bottom) Detrended LIM hindcast skill for the 1901–2009 period for hindcasts averaged over leads of (left) 2–5 yr and (right) 6–9 yr (verified against detrended data).
LIM skill of (a) PDO and (b) AMO indices, and (c) global mean temperature, for hindcasts where different initial conditions are used, for the 1901–2009 period. Also shown in all three panels is the detrended LIM skill. See text for description.
5. Concluding remarks
A LIM, empirically constructed from annually averaged surface temperature observations using a 1-yr lag, has been shown to be a more suitable benchmark for decadal forecasts than damped persistence. In fact, it appears that CMIP5 CGCM decadal hindcast skill does not notably exceed that expected from a predictably linear dynamical system. To the extent that LIM and CGCM skill are comparable, both in amplitude and in geographical variation, the much simpler LIM can also be used to diagnose sources of forecast skill for both forecast systems.
Estimates of decadal predictability from even 100 yr of data are necessarily limited (e.g., Wunsch 2013), so it is important to view the results of this paper with some caution. In the LIM we obtained, virtually all long-range skill comes from the leading three eigenmodes, those with the longest e-folding times. The leading eigenmode represents the global secular trend pattern, while the next eigenmode pair represents decadal variability. Note that this eigenmode pair does not propagate with a multidecadal period but instead has a sufficiently long e-folding time that it varies on a multidecadal time scale. The most notable deficiency in CGCM hindcast skill compared to the LIM appears to be related to this eigenmode over the Pacific. It is interesting that a similar eigenmode found in the Pacific-only LIM was poorly simulated in all the CMIP3 preindustrial control and historical model simulations (N07; Solomon et al. 2011). Whether the global version of this eigenmode continues to be poorly represented by the CMIP5 models, and if so why, are subjects for further investigation.
Acknowledgments
The author thanks Mike Alexander, Prashant Sardeshmukh, Amy Solomon, and three anonymous reviewers for helpful comments. This work was supported by grants from NOAA CVP and NSF 1035423.
REFERENCES
DelSole, T., M. K. Tippett, and J. Shukla, 2011: A significant component of unforced multidecadal variability in the recent acceleration of global warming. J. Climate, 24, 909–926.
Guemas, V., F. J. Doblas-Reyes, F. Lienert, Y. Soufflet, and H. Du, 2012: Identifying the causes of the poor decadal climate prediction skill over the North Pacific. J. Geophys. Res., 117, D20111, doi:10.1029/2012JD018004.
Hasselmann, K., 1976: Stochastic climate models. Part I. Theory. Tellus, 28, 474–485.
Hawkins, E., and R. Sutton, 2009: Decadal predictability of the Atlantic Ocean in a coupled GCM: Forecast skill and optimal perturbations using linear inverse modeling. J. Climate, 22, 3960–3978.
Kim, H.-M., P. J. Webster, and J. A. Curry, 2012: Evaluation of short-term climate change prediction in multi-model CMIP5 decadal hindcasts. Geophys. Res. Lett., 39, L10701, doi:10.1029/2012GL051644.
Krueger, O., and J.-S. von Storch, 2011: A simple empirical model for decadal climate prediction. J. Climate, 24, 1276–1283.
Kwon, Y.-O., M. A. Alexander, N. A. Bond, C. Frankignoul, H. Nakamura, B. Qiu, and L. Thompson, 2010: Role of Gulf Stream, Kuroshio-Oyashio, and their extensions in large-scale atmosphere–ocean interaction: A review. J. Climate, 23, 3249–3281.
Laepple, T., S. Jewson, and K. Coughlin, 2008: Interannual temperature predictions using the CMIP3 multi-model ensemble mean. Geophys. Res. Lett., 35, L10701, doi:10.1029/2008GL033576.
Latif, M., and Coauthors, 1998: A review of the predictability and prediction of ENSO. J. Geophys. Res., 103 (C7), 14 375–14 393.
Livezey, R., 1999: The evaluation of forecasts. Analysis of Climate Variability: Applications of Statistical Techniques, H. von Storch and A. Navarra, Eds., Springer Verlag, 177–196.
Mantua, N. J., S. R. Hare, Y. Zhang, J. M. Wallace, and R. Francis, 1997: A Pacific interdecadal climate oscillation with impacts on salmon production. Bull. Amer. Meteor. Soc., 78, 1069–1079.
Mitchell, T. D., and P. D. Jones, 2005: An improved method of constructing a database of monthly climate observations and associated high-resolution grids. Int. J. Climatol., 25, 693–712.
Newman, M., 2007: Interannual to decadal predictability of tropical and North Pacific sea surface temperatures. J. Climate, 20, 2333–2356.
Newman, M., G. P. Compo, and M. A. Alexander, 2003a: ENSO-forced variability of the Pacific decadal oscillation. J. Climate, 16, 3853–3857.
Newman, M., P. D. Sardeshmukh, C. R. Winkler, and J. S. Whitaker, 2003b: A study of subseasonal predictability. Mon. Wea. Rev., 131, 1715–1732.
Newman, M., M. A. Alexander, and J. D. Scott, 2011: An empirical model of tropical ocean dynamics. Climate Dyn., 37, 1823–1841.
Penland, C., and P. D. Sardeshmukh, 1995: The optimal growth of tropical sea surface temperature anomalies. J. Climate, 8, 1999–2024.
Rayner, N. A., D. E. Parker, E. B. Horton, C. K. Folland, L. V. Alexander, D. P. Rowell, E. C. Kent, and A. Kaplan, 2003: Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. J. Geophys. Res., 108, 4407, doi:10.1029/2002JD002670.
Sardeshmukh, P. D., G. P. Compo, and C. Penland, 2000: Changes in probability associated with El Niño. J. Climate, 13, 4268–4286.
Schneider, N., and B. D. Cornuelle, 2005: The forcing of the Pacific decadal oscillation. J. Climate, 18, 4355–4373.
Solomon, A., and M. Newman, 2011: Decadal predictability of tropical Indo-Pacific Ocean temperature trends due to anthropogenic forcing in a coupled climate model. Geophys. Res. Lett., 38, L02703, doi:10.1029/2010GL045978.
Solomon, A., and Coauthors, 2011: Distinguishing the roles of natural and anthropogenically forced decadal climate variability: Implications for prediction. Bull. Amer. Meteor. Soc., 92, 141–156.
Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498.
Ting, M., Y. Kushnir, R. Seager, and C. Li, 2009: Forced and internal twentieth-century SST trends in the North Atlantic. J. Climate, 22, 1469–1481.
Trenberth, K. E., and D. J. Shea, 2006: Atlantic hurricanes and natural variability in 2005. Geophys. Res. Lett., 33, L12704, doi:10.1029/2006GL026894.
van Oldenborgh, G. J., M. Balmaseda, L. Ferranti, T. Stockdale, and D. Anderson, 2005: Evaluation of atmospheric fields from the ECMWF seasonal forecasts over a 15-yr period. J. Climate, 18, 3250–3269.
van Oldenborgh, G. J., F. J. Doblas-Reyes, B. Wouters, and W. Hazeleger, 2012: Skill in the trend and internal variability in a multi-model ensemble. Climate Dyn., 38, 1263–1280, doi:10.1007/s00382-012-1313-4.
Vimont, D. J., 2005: The contribution of the interannual ENSO cycle to the spatial pattern of decadal ENSO-like variability. J. Climate, 18, 2080–2092.
Vimont, D. J., 2012: Analysis of the Atlantic meridional mode using linear inverse modeling: Seasonality and regional influences. J. Climate, 25, 1194–1212.
Wu, Z., N. E. Huang, J. M. Wallace, B. V. Smoliak, and X. Chen, 2011: On the time-varying trend in global-mean surface temperature. Climate Dyn., 37, 759–773, doi:10.1007/s00382-011-1128-8.
Wunsch, C., 2013: Covariances and linear predictability of the Atlantic Ocean. Deep-Sea Res. II, 85, 228–243, doi:10.1016/j.dsr2.2012.07.015.
Zanna, L., 2012: Forecast skill and predictability of observed Atlantic sea surface temperatures. J. Climate, 25, 5047–5056.