## Abstract

Predictability properties of the Atlantic meridional overturning circulation (AMOC) are measured and compared to those of the upper-500-m heat content in the North Atlantic based on control simulations from nine comprehensive coupled climate models. By estimating the rate at which perfect predictions from initially similar states diverge, the authors find the prediction range at which initialization loses its potential to have a positive impact on predictions. For the annual-mean AMOC, this range varies substantially from one model to another, but on average, it is about a decade. For eight of the models, this range is less than the corresponding range for heat content. For 5- and 10-yr averages, predictability is substantially greater than for annual means for both fields, but the enhancement is greater for AMOC; indeed, for the averaged fields, AMOC is more predictable than heat content. Also, there are spatial patterns of AMOC that have especially high predictability. For the most predictable of these patterns, AMOC retains predictability for more than two decades in a typical model. These patterns are associated with heat content fluctuations that also have above-average predictability, which suggests that AMOC may have a positive influence on the predictability of heat content for these special structures.

## 1. Introduction

The Atlantic meridional overturning circulation (AMOC) would seem to be a component of variability that decadal prediction efforts (Meehl et al. 2014; Yeager et al. 2012; Robson et al. 2012; Swingedouw et al. 2013; Hazeleger et al. 2013) should focus on. After all, it influences meridional heat transport in the North Atlantic (Ganachaud and Wunsch 2003; Latif et al. 2006; Srokosz et al. 2012; Msadek et al. 2013), thereby inducing basinwide sea surface temperature anomalies in some climate models. These features are reminiscent of the Atlantic multidecadal variability (AMV; Knight et al. 2006; Delworth and Mann 2000), which has been related to various climate phenomena in North America and Europe (Sutton and Hodson 2005; Zhang and Delworth 2006). Another reason that AMOC would appear to be an attractive feature for decadal prediction is that a number of studies have found that it is predictable for a decade or even longer (Griffies and Bryan 1997; Pohlmann et al. 2004; Collins et al. 2006; Msadek et al. 2010; Teng et al. 2011; Persechino et al. 2013). Here we are using “predictable” in the sense of a system’s sensitivity to uncertainties in initial conditions as one would measure for a model using a “perfect model” assumption. Sometimes this is referred to as “initial-value predictability.”

On the other hand, what matters for climate prediction is not whether AMOC is predictable by some absolute measure but rather whether it is more or less predictable than North Atlantic heat content, for it is heat content and its sea surface temperature (SST) manifestation that have direct societal relevance through their influences on surface climate. If AMOC is less predictable than heat content, then improvements in its prediction will not be effective in improving predictions of the heat content fluctuations that it drives. Given its huge mass and inertia, it is not surprising that earlier studies have found AMOC to be a highly predictable component of the ocean state, but very few perfect model predictability investigations have made a quantitative comparison of its predictability to that of heat content. Teng et al. (2011) did make such a comparison using two perturbation ensemble experiments and found that AMOC was actually less predictable than North Atlantic heat content. Another indication that AMOC may be less predictable than heat content is seen if we consider power spectra of intrinsic AMOC and North Atlantic heat content fluctuations in various climate models. As described in more detail in section 2, if one plots a nine-model average of such spectra (Fig. 1, top), it is heat content that is much redder. Results from simple stochastic models suggest this indicates AMOC’s predictability is likely to be lower than that of the heat content (Griffies and Bryan 1997; Boer 2000).

Given the possibility that AMOC may not be as predictable as heat content, and considering the implications that such a possibility may have for climate prediction model development, assimilation system design, and observing system strategies, we have carried out a comparison of the predictability of these two fields in a more systematic fashion than has been done to date. Ideally, we would investigate their predictability in nature, as has been done for sea surface temperature (Newman 2007; Zanna et al. 2012; Wunsch 2013). But there is some question as to whether the climate record is long enough for this purpose (Wunsch and Heimbach 2013), and observations of AMOC have only just begun (Cunningham et al. 2007; Johns et al. 2011).

Instead we have done our comparison for climate models. Predictability properties are known to vary substantially from one climate model to another (Collins et al. 2006; Branstator et al. 2012; Branstator and Teng 2012). This is not surprising given the large model-to-model differences in mean-state and variability characteristics that various investigations have documented and that are manifested in the large biases that many models have with respect to nature (e.g., Large and Danabasoglu 2006; Griffies et al. 2011; Danabasoglu 2008; and the references therein). So there is large uncertainty regarding to what extent model predictability would match that of nature. For this reason, we have measured the predictability properties of many models as a means of assessing the range of estimates that currently can be made of the predictability of nature. Furthermore, the predictability of each model represents an inherent limit on the accuracy of its predictions and should be taken into account when evaluating and applying those predictions. It is also known that predictability can depend strongly on the start date (Collins et al. 2006; Pohlmann et al. 2004; Hermanson and Sutton 2010; Msadek et al. 2010), so unlike most previous studies, here we have considered hundreds of start dates. And because predictability can greatly depend on the circulation and temperature structures involved and the temporal scales being considered, we have investigated how the relative predictability of AMOC and heat content is affected by these considerations.

As described in the remainder of this paper, overall we find that for the current generation of climate models, AMOC is more predictable than North Atlantic heat content only for predictions of multiyear averages or for specific AMOC structures that turn out to have unusually high predictability.

## 2. Models, measures, and methods

### a. Models

We have based our study of predictability on preindustrial control runs from nine models. The models (and model acronym expansions), together with the length of the control runs, are given in Table 1. All of these models except CCSM3 participated in phase 5 of the Coupled Model Intercomparison Project (CMIP5) (Taylor et al. 2012).

In our study, AMOC corresponds to the zonal-mean overturning streamfunction within 30°S–75°N, while heat content, which is referred to as T0–500 in figures, consists of the upper-500-m-averaged ocean temperature within 20°–75°N, 60°W–0°. As explained in the appendix, our results are insensitive to the domain used for heat content, provided the domain includes both the subpolar and subtropical gyres. Additionally, we have used the upper-500-m-averaged salinity in the same North Atlantic domain. All fields are represented by annual means, and their variability is represented by the leading empirical orthogonal functions (EOFs) and corresponding principal components (PCs) of the detrended fields, as calculated for each model individually from its control run.

There is a rich diversity of variability for these fields among the models. This can be seen in the bottom panels of Fig. 1, which show the power spectra of AMOC and heat content, respectively, for each individual model. For a given model, a spectrum has been derived for each of the leading 10 PCs, and the 10 spectra have then been averaged with variance weighting. For AMOC, these PCs represent at least 86% of the variance in each model, while for heat content, they represent at least 74%—except for MPI-ESM-LR, where they explain only 54%. The wide variation in placement and amplitude of individual spectral peaks, as well as differences in the degree of redness, is consistent with our expectation that the predictability properties of the various models may be very different from each other (DelSole and Tippett 2009).
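The variance-weighted averaging of the PC spectra might be sketched as follows; the use of a raw periodogram and the array layout are simplifying assumptions of ours, not details from the paper:

```python
import numpy as np

def variance_weighted_spectrum(pcs):
    """Average the power spectra of k principal components, weighting
    each PC's spectrum by that PC's variance (a sketch of the averaging
    described in the text; pcs has shape (k, nyears))."""
    pcs = np.asarray(pcs, dtype=float)
    k, n = pcs.shape
    weights = pcs.var(axis=1)                            # variance carried by each PC
    spectra = np.abs(np.fft.rfft(pcs, axis=1)) ** 2 / n  # raw periodogram per PC
    return (weights[:, None] * spectra).sum(axis=0) / weights.sum()

# toy example: 10 white-noise "PCs", 500 years each, giving a roughly flat spectrum
rng = np.random.default_rng(0)
spec = variance_weighted_spectrum(rng.standard_normal((10, 500)))
```

A production analysis would taper and smooth the periodograms, but the weighting logic is the same.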

### b. Measures

Quantifying the predictability of a model involves comparing an evolving distribution of initially very similar forecast states to the model’s climatological distribution of states. The more these two distributions are dissimilar, the more information is contained in the predicted distribution. Here we have used a mean-square error (MSE) measure that only involves variances to make this comparison. In particular, MSE = (1/*k*)∑_{i=1}^{k} *d*_{i}, where *d*_{i} is the ratio of the variance of the forecast distribution to the variance of the climatological distribution for the *i*th PC and *k* is the number of PCs used to measure predictability (*k =* 10 throughout our study). For clarity, we point out that when we use the word “error,” we are not referring to a comparison of a prediction with nature. Instead we are following common usage in predictability studies in which error refers to an estimate of the difference between the mean and individual realizations of perfect model ensemble integrations.^{1}
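A minimal sketch of this measure, under the assumption that the forecast and climatological distributions are available as anomaly time series of the *k* PCs (the function name and array layout are ours):

```python
import numpy as np

def mse_measure(forecast_anoms, clim_anoms):
    """MSE = (1/k) * sum_i d_i, where d_i is the ratio of forecast to
    climatological variance for the i-th PC; inputs have shape
    (k, nsamples). The measure approaches 1 as the forecast
    distribution becomes as broad as climatology."""
    d = forecast_anoms.var(axis=1) / clim_anoms.var(axis=1)  # one ratio per PC
    return float(d.mean())

# toy example: a forecast distribution with half the climatological spread
rng = np.random.default_rng(1)
clim = rng.standard_normal((10, 2000))
fcst = 0.5 * rng.standard_normal((10, 2000))
val = mse_measure(fcst, clim)   # variance ratio near 0.25
```

When the forecast distribution equals climatology, the measure is exactly 1, which is why the 0.90 threshold below serves as a saturation criterion.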

In most predictability studies, the forecast distribution pertains to an ensemble of perfect predictions initiated from perturbations to a particular start date, and MSE measures the spread about the ensemble mean. As in Branstator et al. (2012) and Branstator and Teng (2012), in the current study, we instead use an aggregate distribution that combines single predictions from hundreds of start dates, one from each year of a model’s control run. The prediction from each start date consists of the segment of the control integration that begins from that date and is considered to be a perfect prediction. We subtract from each of these predictions an estimate of the mean of a large ensemble of perfect predictions initialized by perturbations to the start date. Hence, MSE of the evolving aggregate distribution measures the average spread of ensemble predictions starting at each year in the model’s control trajectory, though it utilizes just one member of each ensemble. Relative entropy is an alternative measure of predictability that has desirable properties (Kleeman 2002; Branstator and Teng 2010). But for the aggregate distributions we employ, it is related to MSE in a straightforward way,^{2} so we use the conceptually simpler MSE rather than relative entropy.

To facilitate comparison of predictability among different models or variables, we have used two reference values. One is MSE = 0.90, which corresponds to a particular relative entropy value that Branstator et al. (2012) referred to as *R*^{nom}. It can be interpreted as the threshold of 0.90 statistical significance if one were estimating predictability using 18-member perturbation ensembles. We sometimes refer to it as the limit of predictability or the time of saturation. To facilitate comparing predictability at earlier stages of forecasts, we use an arbitrary MSE value of 0.60.

### c. Estimating predictability using control runs

Traditionally, predictability studies generate statistics concerning the rate at which similar states spread in phase space via explicit integrations of ensembles of realizations starting from similar states, but this is not practical for our study involving many models and many start dates. Instead we employ the method introduced by DelSole and Tippett (2009), as modified and applied by Branstator et al. (2012) and Branstator and Teng (2012), to quantify the spread of the aggregate distributions described above, one distribution for each model. As mentioned above, to produce these distributions, we must be able to estimate the mean prediction of an ensemble of perturbations about any given initial condition. As explained in detail in the above papers, we assume that such an ensemble mean at forecast range *τ* for an ensemble initiated by perturbations to the state **s** can be well approximated by *L*^{τ}**s**, where *L*^{τ} is the regression operator that relates states at time *t* to states at *t* + *τ* in the model’s control integration. We then estimate the members of the aggregate by evaluating **s**_{t+τ} − *L*^{τ}**s**_{t}, for each *t* where **s**_{t} is the model state at time *t* in the control, and then use them to calculate MSE. In practice, we only need to accurately estimate the ensemble-mean prediction of *k* PCs of AMOC or heat content. The appendix describes how we chose the variables for the state vector **s** that are best at doing this. They depend on the model and field (AMOC or heat content) whose predictability is investigated.
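To make the procedure concrete, here is a minimal sketch for a toy univariate control run; the function names and array layout are ours, and the regression step omits the overfitting safeguards that the appendix describes:

```python
import numpy as np

def lag_regression_operator(S, tau):
    """Least squares operator L^tau mapping the state at time t to the
    state at t + tau in a control run S (rows = years, columns = state
    variables). A bare-bones DelSole-Tippett-style estimate."""
    X, Y = S[:-tau], S[tau:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)  # Y ~ X @ coef
    return coef.T                                 # so prediction = L @ s

def aggregate_mse(S, tau):
    """MSE of the aggregate distribution: deviations s_{t+tau} - L^tau s_t,
    with variances normalized by the climatological variances of S."""
    L = lag_regression_operator(S, tau)
    resid = S[tau:] - S[:-tau] @ L.T
    d = resid.var(axis=0) / S.var(axis=0)   # one variance ratio per variable
    return float(d.mean())

# toy "control run": a univariate AR(1) process with coefficient 0.9,
# whose true 1-yr MSE is 1 - 0.9**2 = 0.19
rng = np.random.default_rng(2)
n = 5000
s = np.zeros(n)
for t in range(n - 1):
    s[t + 1] = 0.9 * s[t] + rng.standard_normal()
S = s[:, None]
m1 = aggregate_mse(S, 1)     # small: the initial state is still informative
m50 = aggregate_mse(S, 50)   # near 1: predictability has saturated
```

For an AR(1) process the true MSE at range *τ* is 1 − 0.9^{2τ}, so the estimate grows toward saturation as the regression operator loses predictive power, mirroring the behavior described in the text.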

As with all regression methods, we must guard against the effects of overfitting relationships present in a finite sample. The approach that we have used to avoid these effects is explained in the appendix. Provided these safeguards have been implemented, strictly speaking, the technique gives a lower bound on predictability because approximating a model’s dynamics by linear *L*^{τ} should overestimate its errors. However, Branstator et al. (2012) found that this method gave good approximations to aggregate predictability when applied to heat content, and the appendix presents evidence it does the same for AMOC. Other studies have also estimated predictability in the ocean by using linear approximations (Newman 2007; Hawkins and Sutton 2009; Zanna et al. 2012), though usually they employ the additional assumption that predictions at any range can be made by repeated application of a single operator (Penland and Matrosova 1998).

### d. Most predictable pattern

A key part of our investigation involves patterns of state variables that optimize predictability. Of the various methods previously employed to find such patterns (e.g., Branstator et al. 1993; Renwick and Wallace 1995; Newman 2007; DelSole and Chang 2003), we have chosen to use canonical correlation analysis (CCA; von Storch and Zwiers 1999). Via CCA, for a given climate model and forecast range *τ*, we find the pattern of the state vector and the pattern of either AMOC or heat content PCs at lag *τ* whose amplitudes are most highly correlated. As explained by DelSole and Chang (2003), this is equivalent to finding the prediction pattern (and the initial state that produces it) whose amplitude is optimally predicted in a least squares sense by *L*^{τ} from section 2c. In the context of our predictability study, where *L*^{τ} is assumed to perfectly estimate the evolution of model ensemble means, this lagged pattern is the pattern with the highest predictability (i.e., lowest MSE), which we will refer to as the most predictable pattern (MPP). DelSole and Chang (2003) have pointed out that for our application, patterns found in this way are equivalent to those one would find from predictable component analysis (Renwick and Wallace 1995; Schneider and Griffies 1999).

Again, note that while the initial pattern is expressed in terms of the same model-dependent predictors used to construct *L*^{τ}, the predicted pattern is expressed in terms of the *k* PCs of either AMOC or heat content, depending on whether we want to find highly predictable AMOC or heat content patterns. Also note that we calculate a different pair of patterns for each forecast range *τ* and each model. More generally, for each model and each *τ*, the method produces an ordered collection of pattern pairs with the property that the amplitudes of any two distinct initial patterns are uncorrelated, as are the amplitudes of any two distinct predicted patterns, and each predicted pattern is more predictable than the one that follows it in the ordering (DelSole and Chang 2003). The *k* predicted patterns span the *k* PCs, so predicted fields can be decomposed into a linear combination of them. A desirable property of the resulting patterns is that, unlike in some techniques for finding patterns with high predictability, the initial and predicted members of a pair can be different, and the evolution of state variables associated with these patterns can be found by applying *L*^{τ} to the initial pattern for successive ranges *τ*.
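As an illustration of the CCA step, here is a sketch that computes the leading canonical pair via whitening and an SVD of the whitened cross-covariance, one standard route to CCA; the names, array layouts, and toy data are our assumptions, and the overfitting safeguards of the appendix are again omitted:

```python
import numpy as np

def leading_cca_pair(X, Y):
    """Leading canonical pair between centered predictor states X (n x p)
    and centered lagged PCs Y (n x k): the weight vectors whose score
    time series are most highly correlated (a sketch of the MPP1
    calculation)."""
    Ux, Sx, Vxt = np.linalg.svd(X, full_matrices=False)   # whiten X
    Uy, Sy, Vyt = np.linalg.svd(Y, full_matrices=False)   # whiten Y
    U, rho, Vt = np.linalg.svd(Ux.T @ Uy)                 # whitened cross-covariance
    a = Vxt.T @ (U[:, 0] / Sx)   # weights giving the initial pattern's amplitude
    b = Vyt.T @ (Vt[0] / Sy)     # weights giving the predicted pattern's amplitude
    return a, b, float(rho[0])   # rho[0] is the leading canonical correlation

# toy data in which Y is a noisy linear function of X
rng = np.random.default_rng(3)
X = rng.standard_normal((500, 6))
Y = X @ rng.standard_normal((6, 3)) + 0.1 * rng.standard_normal((500, 3))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
a, b, r = leading_cca_pair(Xc, Yc)
corr = float(np.corrcoef(Xc @ a, Yc @ b)[0, 1])  # equals r up to roundoff
```

Because the toy predictands are nearly deterministic functions of the predictors, the leading canonical correlation is close to 1.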

## 3. Quantification of predictability

### a. Generic variability

When we use the method described in section 2c to calculate the MSE of annual-mean AMOC for each of our study’s nine models (Fig. 2, top left), we find MSE increases rapidly with the prediction range in most models, indicating a fast decrease in the influence of initial states. Most models reach the 0.60 threshold in a year or two and the 0.90 limit after about a decade, consistent with studies of the prediction skill of AMOC with climate models (e.g., Pohlmann et al. 2009). Also shown in the figure are large model-to-model variations in predictability. For example, CCSM4 and MPI-ESM-LR reach the 0.90 limit after 4 or 5 yr, while it takes GFDL CM3 and NorESM1-M approximately 2 decades to reach this value. These extremes in behavior are consistent with the spectra of Fig. 1 (bottom left) in which the first two of these models have much flatter AMOC spectra than the latter two models.

A plot of MSE for annual-mean heat content (Fig. 2, top right) indicates its predictability is distinctly greater than that of AMOC in the first decade. For many models, the 0.60 threshold is not reached until about year 3, and for eight of the nine models, MSE for heat content remains lower than for AMOC for at least 10 yr. The distinction between AMOC and heat content is clearly shown when we plot the model-average MSE in Fig. 2 (bottom). On the other hand, after 10 yr, as both variables approach the 0.90 limit, there is little difference between the two.

### b. Temporal averages

Inspection of the spectra in Fig. 1 suggests AMOC’s relatively low predictability for annual means may not carry over to predictions of multiyear time averages. The vertical dashed line in that figure separates variability with periods longer than 15 yr from higher-frequency variability. For these high frequencies, AMOC spectra tend to be much flatter than heat content spectra, while for periods longer than 15 yr, heat content spectra are, on average, flatter than AMOC spectra. In general, flatter spectra are associated with lower predictability, and time averaging will thus emphasize the frequencies for which AMOC is likely to be more predictable.

To quantify the effect of time averaging on our predictability comparison, we have repeated our calculations of MSE, but for predictions of 5-yr means of AMOC centered on *τ*. The results, plotted with a medium-thickness red line, as well as MSE for annual means (thin red line, repeated from Fig. 2, bottom), are shown in Fig. 3. The substantially higher predictability of 5-yr means in comparison to annual means is dramatic, in terms of both MSE values at all ranges and the range at which the 0.90 limit is reached. For these results, we have again used the method described in the appendix to calculate optimal predictors for 5-yr means, rather than using the predictors listed in Table 1. Had we used the same predictors employed for annual means, the results in Fig. 3 would not be substantially different.

Comparing MSE for annual means at range *τ* to 5-yr means centered on range *τ* is not necessarily a fair comparison because 5-yr means at range *τ* include contributions from ranges less than *τ*. A better comparison is between MSE for 5-yr means at range *τ* and the average of MSE for annual-mean predictions at ranges *τ* − 2, *τ* − 1*, τ, τ* + 1, and *τ* + 2. Curves for this latter quantity are plotted in Fig. 3 as dashed lines. Even when compared to these modified results for annual means, it is apparent that 5-yr averages of AMOC are much more predictable than are annual means.
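This adjusted baseline is straightforward to compute; a sketch assuming annual-mean MSE is stored in an array indexed by forecast range in years (a layout we have assumed for illustration):

```python
import numpy as np

def windowed_annual_mse(mse_annual, tau, width=5):
    """Average annual-mean MSE over the ranges tau-2 .. tau+2 (for
    width=5) so it can be compared fairly with the MSE of a width-yr
    mean centered on range tau."""
    half = width // 2
    return float(np.mean(mse_annual[tau - half: tau + half + 1]))

# with MSE growing linearly in range, the windowed value at tau equals
# the annual value at tau
mse_annual = np.arange(30, dtype=float)
val5 = windowed_annual_mse(mse_annual, 10)          # mean of ranges 8..12
val3 = windowed_annual_mse(mse_annual, 5, width=3)  # mean of ranges 4..6
```

Since annual-mean MSE grows with range, the windowed baseline is below the raw annual curve at range *τ*, which is why the dashed lines in Fig. 3 sit below the thin solid ones.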

We have repeated this analysis for heat content (blue lines in Fig. 3) and again find a systematic increase in predictability for 5-yr means. However, the effect is less dramatic in both the amount of reduction in MSE and the number of years by which the MSE saturation time is extended. The improvement is so much greater for AMOC that for 5-yr averages AMOC is more predictable than heat content at all ranges.

We have also considered 10-yr means of AMOC, again reoptimizing the predictors, and, as shown by the thick solid lines in Fig. 3, they are even more predictable than 5-yr means. Also, AMOC’s advantage over the heat content is even greater for 10-yr means than for 5-yr means.

### c. AMOC index and MPPs

To this point, we have considered whether AMOC has the potential to positively influence heat content predictability by determining whether its predictability is greater than heat content predictability for generic fluctuations. An alternative situation in which AMOC could also be a positive influence on heat content predictions is if some components of it were highly predictable; those components would have the potential to induce heat content fluctuations with high predictability. One component for which there is widespread interest is the aspect of AMOC variability associated with the AMOC index defined by the maximum zonal-mean overturning streamfunction between 0° and 60°N and between 500 and 5000 m. We have used the same regression operators for annual-mean AMOC that we used elsewhere to calculate MSE of the AMOC index (Fig. 2, bottom). Interestingly, its predictability is substantially greater than that of generic AMOC variability and is similar to that of generic heat content. Msadek et al. (2010) also found the AMOC index to have higher predictability than general AMOC fluctuations in their study of a comprehensive climate model.
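For concreteness, the index definition can be sketched as follows, under an assumed (depth, latitude) layout for the annual-mean streamfunction (all names are hypothetical):

```python
import numpy as np

def amoc_index(psi, lat, depth):
    """AMOC index as defined in the text: the maximum of the zonal-mean
    overturning streamfunction psi(depth, lat) within 0-60N and
    500-5000 m. The array layout is an assumption of this sketch."""
    ji = (lat >= 0.0) & (lat <= 60.0)          # latitude band 0-60N
    ki = (depth >= 500.0) & (depth <= 5000.0)  # depth band 500-5000 m
    return float(psi[np.ix_(ki, ji)].max())

# toy grid: only the value at (1000 m, 10N) lies inside the search box
lat = np.array([-10.0, 10.0, 40.0, 70.0])
depth = np.array([100.0, 1000.0, 3000.0, 5200.0])
psi = np.zeros((4, 4))
psi[1, 1] = 15.0   # inside the box
psi[0, 2] = 99.0   # too shallow, excluded
psi[3, 1] = 50.0   # too deep, excluded
idx = amoc_index(psi, lat, depth)
```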

To pursue the search for highly predictable components of AMOC more systematically, we have employed the MPP methodology explained in section 2d to represent predictions at range *τ* as linear combinations of the patterns found by optimizing the predictability of AMOC. When MSE is calculated for predicted AMOC fields that have been filtered to retain only the leading predicted pattern (which we call MPP1), we find MSE is substantially smaller than MSE for unfiltered AMOC (Fig. 4). Even for 20-yr predictions, it has only reached a value of 0.75. Also, predictions that have been filtered to retain the two most predictable patterns (MPP1–2) or the five most predictable patterns (MPP1–5) are much more predictable than unfiltered predictions (Fig. 4). For the purpose of our study, it is especially noteworthy that predictions of AMOC filtered in this way are more predictable than forecasts of generic heat content.

Also plotted in Fig. 4 is MSE of heat content filtered to retain only MPP1 of heat content for each forecast range. For the first 15 yr, these patterns are more predictable than MPP1 for AMOC, although the two curves start to merge thereafter. This suggests that there are factors other than the influence of AMOC that contribute to the very high predictability of heat content MPP1 at all but the longest ranges.

MPP analysis does not constrain the resulting patterns to have high amplitude. When we calculate the variance in the control integrations of projections onto AMOC MPP1 for each range, we find that, except for predictions longer than about 14 yr, in an average model, they explain more than 10% of the variance (Fig. 5). Note that since our system state space has 10 AMOC degrees of freedom, 10% is the variance that randomly generated patterns, on average, are expected to represent. Also, Fig. 5 indicates that if instead of retaining only AMOC MPP1, we retain the leading two or even five patterns, much more of the natural variability of AMOC is represented, while, as we saw, AMOC predictability is also high (Fig. 4).

## 4. Heat content associated with AMOC MPPs

Having found components of AMOC whose predictability is higher than that for generic heat content, we would like to determine whether their effect is to increase the predictability of heat content. A definitive determination would entail measuring the predictability of the heat content fluctuations caused by the leading AMOC MPPs. But because interactions between AMOC and heat content are likely to be two way (Buckley et al. 2012; Tulloch and Marshall 2012), this is difficult to do. Instead, we have examined the heat content variability that is associated with AMOC MPPs, reasoning that AMOC has the potential to be responsible for a portion of heat content variability and its predictability if the two covary.

The first indication that highly predictable AMOC components are in fact associated with highly predictable heat content comes from examining the structure of MPP1 for 10-yr predictions of AMOC and heat content. We present examples for two models, namely, CNRM-CM5 (Fig. 6) and GFDL CM3 (Fig. 7). For each example, the AMOC component of the initial condition (Figs. 6a, 7a) that produces the most predictable AMOC state 10 yr later (Figs. 6c, 7c) is shown. Using the same predictor variables (listed in Table 1), we have also constructed regression operators that estimate the concurrent and 10-yr-lagged heat content anomalies that accompany this initial AMOC state (Figs. 6b, 7b and Figs. 6d, 7d, respectively). For comparison to the AMOC MPP1 year 10 structures, Figs. 6f and 7f show the predicted pattern of MPP1 for 10-yr heat content predictions, and Figs. 6e and 7e display the accompanying AMOC pattern. The similarity between the middle and bottom rows in both Figs. 6 and 7 indicates that whether one is optimizing the predictability of AMOC or heat content, the same patterns are found. This similarity is also found for many other models and forecast ranges.

Though AMOC and heat content MPP1 are very similar to each other in the Figs. 6 and 7 examples, the patterns in one model are very different from those in the other model. From a temporal standpoint, they are also very different. MPP1 for CNRM-CM5 is almost steady, while if one follows the evolution of MPP1 for GFDL CM3 it turns out to have a period of about 15 yr. For some other models, features in MPP1 are even more distinctly propagating than for GFDL CM3. For example, the heat content pattern in both CCSM3’s AMOC and heat content MPP1 includes eastward-moving anomalies that are very similar to the predictable propagating pattern that Teng et al. (2011) discovered in ensemble perturbation experiments with that model. Examination of plots of the leading MPPs for other models also shows considerable model-to-model variability, but a more comprehensive analysis is required to determine whether there are any structural similarities among MPPs from different models that cannot be discerned from the simple inspection that we have carried out.

Returning to the similarity between highly predictable AMOC and heat content states suggested by Figs. 6 and 7, further analysis serves to quantify the connection between AMOC and heat content predictability. This analysis consists of producing for each model a regression operator that maps AMOC anomalies in its control integration to concurrent heat content anomalies. By applying these operators to predictions and verifications of AMOC that have been represented as linear combinations of the leading AMOC patterns, we find the predictability of those aspects of heat content variability that are associated with the leading AMOC MPPs. For example, the solid red and blue curves in Fig. 8 show MSE values for heat content associated with AMOC MPP1–2 and MPP1–5, respectively. These can be compared to MSE for the total heat content field, which is also plotted in Fig. 8. It is apparent that heat content fluctuations associated with the highly predictable components of AMOC variability are more predictable than generic heat content variations.^{3} Similarly, we have found the predictability of that part of the heat content that is not associated with MPP1–2 and MPP1–5 AMOC fluctuations. MSE values for these portions of heat content are also plotted in Fig. 8. For all forecast ranges, the components of heat content not associated with these AMOC MPPs are less predictable than generic fluctuations. Note that removing the heat content associated with leading AMOC MPPs does not reduce heat content predictability by a large amount, which is another indication that there must be components of heat content with high predictability that are not associated with AMOC.
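The mapping step can be sketched as a least squares regression between the two sets of PCs in the control run; the (years × PCs) layouts and names below are assumptions of this sketch, which again omits the appendix's overfitting safeguards:

```python
import numpy as np

def concurrent_regression(A, H):
    """Least squares operator mapping AMOC PC anomalies A (n x k) to the
    concurrent heat content PC anomalies H (n x k) in a control run;
    the heat content associated with an AMOC state a is then a @ R."""
    R, *_ = np.linalg.lstsq(A, H, rcond=None)
    return R

# if H really is a linear function of A, the operator is recovered exactly
rng = np.random.default_rng(4)
A = rng.standard_normal((400, 5))
R0 = rng.standard_normal((5, 5))
R_est = concurrent_regression(A, A @ R0)
```

Applying such an operator to MPP-filtered AMOC predictions and verifications yields the AMOC-associated portion of heat content whose MSE is plotted in Fig. 8.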

Another attribute of MPPs serves to provide a linkage between their high predictability and the relatively high predictability of 5- and 10-yr averages. If we recalculate the model-average spectra of AMOC and the North Atlantic heat content but for data that have been filtered to retain only MPP1 for each model and each variable, then the spectra in Fig. 9 result. In comparison to the spectra for generic variability (repeated from Fig. 1, top, as dashed lines in Fig. 9), the highly predictable patterns for both variables are redder, so MPP analysis implicitly acts as a low-pass filter. The spectrum for AMOC MPP1 is still not as red as the spectrum for heat content MPP1, consistent with AMOC MPP1 being less predictable than heat content MPP1 (Fig. 4). But this implicit temporal filtering is much stronger for AMOC than for heat content, leading to spectra that are almost the same. When we have repeated this calculation but retain MPP1–3 or MPP1–5, we find that the resulting spectra are intermediate between the MPP1 and the unfiltered spectra of Fig. 9.

## 5. Summary and implications

The focus of our study has been to compare the initial-value predictability properties of AMOC to those of heat content in the upper 500 m of the North Atlantic. We have reasoned that if AMOC predictability is greater than heat content predictability, then there is the potential for improving the initial-value predictions of near-surface conditions by improving predictions of AMOC. We have made this comparison for nine climate models, eight of which participated in CMIP5. To estimate the predictability properties of these models, we used a technique that enabled us to evaluate predictability averaged over hundreds of start dates. This method did not require traditional perturbation ensemble experiments but instead used only model control integrations and serves to further demonstrate that control simulations are assets that the modeling community can use to study many aspects of predictability.

The results of our predictability comparison were mixed. On one hand, when considering generic fluctuations of annual means, we found that AMOC is less predictable than heat content; that is, it is more sensitive to initial-state uncertainty. This fact is likely to be associated with the substantial amount of high-frequency variability in AMOC fields, perhaps a result of AMOC being driven by convective events that can be strongly influenced by the chaotic atmosphere (e.g., Dong and Sutton 2002; Danabasoglu 2008).

On the other hand, we found that there are components of AMOC variability that are much more predictable than are its generic variations. These components tend to be more predictable than generic components of North Atlantic heat content. We were able to isolate these components using two methods. One method employed canonical correlation analysis to find patterns that have above-average predictability. The resulting highly predictable components of AMOC covary with heat content variations, and these covarying heat content fluctuations are of large-enough amplitude to increase the predictability of heat content compared to what it would be if they were absent. Unfortunately, the structure of the highly predictable patterns was not the same in all models, so understanding the processes that produce them is likely to require extensive analysis. We also found evidence of highly predictable components of heat content that are not associated with AMOC. The other method that helped to isolate unusually predictable components of AMOC was low-pass filtering. Simple 5- and 10-yr-average filters were sufficient to produce fields that were much more predictable than annual means as well as being more predictable than similarly filtered heat content. Note that investigations of AMOC prediction skill have tended to focus on such multiyear averages (e.g., Keenlyside et al. 2008) and have reported much greater skill for such averages than for annual means (e.g., Pohlmann et al. 2009).

Our results carry implications for decadal predictions and for the design of the observational and assimilation systems that support them. Specifically, they indicate that improvements in predictions of AMOC have the potential to improve the skill and range of heat content and SST predictions, but this potential is much greater if it pertains to the most predictable structures or if multiyear averages are being predicted. Since our results also suggest that there are highly predictable components of upper-layer heat content that are not associated with AMOC, improving AMOC predictions is not the only way to potentially improve decadal predictions of fields that directly impact near-surface climate.

Though recognition of the existence of highly predictable components may eventually help to improve predictions, their application to observational system design may not be effective as long as there is large uncertainty concerning how well model-simulated predictability properties match nature. And since we found that AMOC predictability properties vary from model to model, they necessarily do not match nature in most (and possibly all) models. On the other hand, even if model predictability properties differ from nature's, knowledge of these properties can be employed to improve the utility of issued predictions: it can identify which components of predicted fields users should have low confidence in, as a consequence of their sensitivity to uncertainties in the initial state, and which components have the potential to be skillfully predicted.

In considering our results and conclusions, it is important to remember that many of our results pertain to averages over nine models. Though predictability properties varied substantially in quantitative terms from one model to another, the qualitative contrasts that we have found between AMOC and heat content predictability hold for eight of the nine models that we examined, and the contrasts in predictability between the least and most predictable state components hold for all of them. It is also important to remember that we only considered averages across many start dates; the predictability of specific start dates can be very different.

One limitation of our study is that, although we found indications that certain components of AMOC are associated with highly predictable aspects of heat content variability, we did not attempt to show that they cause that variability. The association that we found is suggestive, but further investigation into the mechanisms involved is needed to firmly establish a causal relationship. Many studies have shown how AMOC fluctuations can be responsible for heat content fluctuations. Especially noteworthy for our investigation is the examination by Tiedje et al. (2012) of links between AMOC and meridional heat transport predictability.

One final point that we wish to make is that on decadal time scales, the initial state is not the only factor that can provide predictability. The effects of changes in system forcing can also increase the information in forecast distributions. Branstator and Teng (2012) compared predictability resulting from the initial state to that from representative concentration pathway 4.5 (RCP4.5) forcing in 13 CMIP5 models for annual-mean North Atlantic heat content. We have repeated the same comparison for AMOC in the nine models of the current study and find that, on average, AMOC initial-value predictability remains greater than forced predictability under the RCP4.5 scenario for roughly the first dozen years of a prediction, and that the forced response of AMOC provides less information than the forced response of North Atlantic heat content at all stages of a prediction.

## Acknowledgments

We appreciate helpful discussions with G. Danabasoglu and comments from three reviewers. Portions of this study were supported by the Office of Science (BER), U.S. Department of Energy (DE-SC0005355 and Cooperative Agreement DE-FC02-97ER62402), and by the National Science Foundation (OCE-1243013). We acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table 1 of this paper) for producing and making available their model output. For CMIP, the U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.

### APPENDIX

#### Estimating MSE using Regression Operators

The key to producing accurate estimates of MSE using the method described in section 2c is to find, for a given model, a state vector that accurately estimates values of *k* PCs of field *x* (either AMOC or heat content) at a future time via regression. To determine such a state vector, we have used a cross-validation procedure to evaluate vectors consisting of *k*_{A} AMOC PCs, *k*_{H} heat content PCs, and *k*_{S} salinity PCs. We restrict *k*_{A}, *k*_{H}, and *k*_{S} to multiples of 5 for computational efficiency. The cross validation consists of constructing regression operators that relate the candidate state vector at time *t* to *x* at *t* + *τ* based on 80% of a model’s control run and then calculating MSE for *x* when this operator is applied to the withheld 20%. Repeating this procedure five times, each time holding out a different 20% segment, we find a combined MSE for the candidate state vector. Table 1 gives the values of *k*_{A}, *k*_{H}, and *k*_{S} that produce the smallest values of MSE for *τ* = 10 yr when considering annual means. Very similar values of *k*_{A}, *k*_{H}, and *k*_{S} are found for *τ* = 5 yr, so we have used these same predictors at all ranges. MSE turns out to be a rather slowly varying function of *k*_{A}, *k*_{H}, and *k*_{S} near its minimum, so considering only multiples of 5 has not affected our results materially.
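To make the selection procedure concrete, the following is a minimal Python sketch using synthetic PC time series. All function and variable names (`select_predictors`, `cross_validated_mse`, and so on) are ours for illustration only and are not part of the analysis code used in the study; the sketch assumes ordinary least squares for the regression operator and contiguous 20% withheld segments.

```python
import numpy as np

def lagged_regression_mse(S, X, tau, train, test):
    """MSE of a linear regression operator mapping the state vector S at time t
    to the predictand PCs X at time t + tau.  The operator is fit on the
    training indices and evaluated on the withheld test indices."""
    S_t, X_t = S[:-tau], X[tau:]                      # pairs (s_t, x_{t+tau})
    A, *_ = np.linalg.lstsq(S_t[train], X_t[train], rcond=None)
    err = X_t[test] - S_t[test] @ A
    return float(np.mean(err ** 2))

def cross_validated_mse(S, X, tau, n_folds=5):
    """Five-fold blocked cross validation: each pass withholds a contiguous
    20% segment of the control run, as in the procedure described above."""
    folds = np.array_split(np.arange(len(S) - tau), n_folds)
    return float(np.mean([
        lagged_regression_mse(
            S, X, tau,
            np.concatenate([f for j, f in enumerate(folds) if j != k]),
            folds[k])
        for k in range(n_folds)]))

def select_predictors(pcs_amoc, pcs_heat, pcs_salt, X, tau, max_k=20, step=5):
    """Grid search over (k_A, k_H, k_S), restricted to multiples of `step`,
    returning the combination with the smallest cross-validated MSE."""
    best_mse, best_k = np.inf, None
    for kA in range(0, max_k + 1, step):
        for kH in range(0, max_k + 1, step):
            for kS in range(0, max_k + 1, step):
                if kA + kH + kS == 0:
                    continue
                S = np.hstack([pcs_amoc[:, :kA], pcs_heat[:, :kH],
                               pcs_salt[:, :kS]])
                best_mse, best_k = min((best_mse, best_k),
                                       (cross_validated_mse(S, X, tau),
                                        (kA, kH, kS)))
    return best_k, best_mse
```

In this form, the grid over multiples of 5 keeps the number of candidate state vectors small, which is the computational-efficiency point made above.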

We have found that setting *k*_{A} = 20, *k*_{H} = 10, and *k*_{S} = 0 for predictions of AMOC and *k*_{A} = 10, *k*_{H} = 20, and *k*_{S} = 0 for predictions of heat content in all models produces results very similar to those reported here, but we have used the predictors in Table 1 because they do give the best predictions and thus provide the best estimates of predictability. Note that the fact that salinity can be ignored as a predictor without substantially changing our predictability estimates does not necessarily mean that it does not influence the evolution of the system. Indeed, numerous studies have noted lead–lag relationships between salinity and both AMOC and heat content and have suggested that these relationships can help explain decadal and longer modes. It may simply be that, for simultaneous relationships, salinity is not sufficiently independent of AMOC and heat content to improve the accuracy of predictions that already use AMOC and heat content as predictors.

We have repeated many of our calculations for a heat content and salinity domain within 0°–60°N, as this region is sometimes used for studies of North Atlantic heat content, and found essentially the same properties and patterns reported when our standard boundaries of 20°–75°N are used. It does matter, however, if we make the domain too small. For example, if we restrict the domain to 45°–70°N so that the domain is confined to the subpolar gyre—which is where other studies have found North Atlantic heat content predictability is largest—then the resulting regression operators are not nearly as accurate. We have also considered sensitivity to depth but find even if our heat content domain extends to 1000 m, the results are affected only marginally.

When reporting values of MSE in our paper, we have not used values from cross validation because they overestimate the true MSE value; that is, the value that we would find if we had control runs of infinite length. Instead, as described in Branstator et al. (2012), we follow the suggestion of Lorenz (1977) and combine an MSE found from cross validation with an MSE found from applying the regression operators to the same data used in their construction. This also applies to MSE values for MPPs. For example, the MSE values in Fig. 3 are arrived at by calculating MPPs from 80% of a control run and then combining MSE for forecasts of these patterns in the same 80% with MSE for forecasts of these patterns in the remaining 20%. Again, this is done five times. (The MPP patterns plotted in Figs. 6 and 7 are those that result when the full control run datasets are used.) The only MSE results that do not involve some form of cross validation are those concerning the predictability of heat content when AMOC influences are isolated or excluded (Fig. 8). To simplify the procedure for these calculations, the same data were used to construct the regression operators and to derive their error characteristics.
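As a small illustration of the error combination, one common reading of Lorenz's suggestion is to take the geometric mean of the dependent-sample estimate (an underestimate of the true MSE) and the cross-validated estimate (an overestimate). Since the text does not spell out the combination formula, the geometric mean here is our assumption, not a statement of the exact computation used.

```python
import numpy as np

def combined_mse(mse_dependent, mse_independent):
    """Combine MSE computed on the data used to fit the regression operators
    (an underestimate) with cross-validated MSE (an overestimate).  The
    geometric mean is our assumed form of the Lorenz (1977) combination."""
    return float(np.sqrt(np.asarray(mse_dependent) * np.asarray(mse_independent)))
```

Whatever the precise form, the combined value lies between the two one-sided estimates, which is the property that motivates the combination.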

Branstator et al. (2012) demonstrated that when MSE is calculated in the way suggested by Lorenz (1977), the regression method gives estimates of heat content MSE that are very similar to those derived from the traditional perturbation ensemble method. When we carried out a similar comparison for estimates of AMOC MSE, we also found that regression performs well. In particular, we compared MSE for CCSM4 in Fig. 2 with MSE from two perturbation ensemble experiments that also use CCSM4. Each ensemble consists of 25 members, and the two ensembles begin from two substantially different AMOC states. MSE for these two ensembles is plotted as dashed lines in Fig. A1. Its increase with forecast range is clearly similar to that of MSE from the regression method, which is reproduced on that plot. The regression values are slightly larger at most ranges. We do not know whether this is because only two start dates have been considered or because the regression method, with its linear assumption, can underestimate predictability.
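For reference, the perturbation-ensemble side of this comparison can be sketched as follows in Python with synthetic trajectories. Measuring divergence as the mean squared deviation of members from the ensemble mean at each lead time is our simplified stand-in; it is one standard way to quantify ensemble spread, not necessarily the exact definition used in the experiments above.

```python
import numpy as np

def ensemble_mse(members):
    """members: array (n_members, n_leads, n_vars) of perturbation-ensemble
    trajectories launched from nearly identical initial states.  Returns, for
    each lead time, the mean squared deviation of members from the ensemble
    mean -- a simple measure of how initially similar states diverge."""
    ens_mean = members.mean(axis=0, keepdims=True)
    return ((members - ens_mean) ** 2).mean(axis=(0, 2))
```

Applied to a 25-member ensemble, the resulting curve of MSE against lead time is the quantity that the dashed lines in Fig. A1 represent for the two CCSM4 experiments.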

## REFERENCES

*Geophys. Res. Lett.,* **39,** L12703.

**118,** 1087–1098, doi:10.1002/jgrc.20117.

**26,** 4335–4356, doi:10.1175/JCLI-D-12-00081.1.

**40,** 2359–2380, doi:10.1007/s00382-012-1466-1.

**40,** 2381–2399, doi:10.1007/s00382-012-1516-8.

*Statistical Analysis in Climate Research.* Cambridge University Press, 484 pp.

**85,** 228–243, doi:10.1016/j.dsr2.2012.07.015.

**26,** 7167–7186, doi:10.1175/JCLI-D-12-00478.1.

## Footnotes

The National Center for Atmospheric Research is sponsored by the National Science Foundation.

^{1}

In Branstator et al. (2012), we called MSE mean-square difference (MSD).

^{2}

As described shortly, the errors that contribute to *d*_{i} are of the form (**s**_{t+τ} − *L*^{τ}**s**_{t}) for a linear operator *L*^{τ}, initial condition **s**_{t} valid at time *t,* and verification **s**_{t+τ} at range *τ*. Based on Eq. (2) in Branstator et al. (2012), when using an EOF basis and provided the *i*th element of (**s**_{t+τ} − *L*^{τ}**s**_{t}) is independent of the *j*th element, relative entropy equals −ln(*d*_{1}) − ln(*d*_{2}) − … − ln(*d*_{k}) for distributions that aggregate the behavior of predictions starting from each state **s**_{t} on a long trajectory that covers a system’s attractor.

^{3}

This result could have been anticipated from Fig. 3. Since MSE is invariant to a linear transformation of variables, MSE for heat content associated with AMOC via a linear regression operator is the same as MSE for AMOC. Actually, curves for MPP1–2 and MPP1–5 in Figs. 3 and 8 are not exact matches. This is because, as explained in the appendix, we combined errors from dependent and independent datasets to arrive at MSE for Fig. 3, while to simplify the analysis we did not do this for Fig. 8.