## Abstract

Understanding the extent to which Atlantic sea surface temperatures (SSTs) are predictable is important due to the strong climate impacts of Atlantic SST on Atlantic hurricanes and temperature and precipitation over adjacent landmasses. However, models differ substantially on the degree of predictability of Atlantic SST and upper-ocean heat content (UOHC). In this work, a lower bound on predictability time scales for SST and UOHC in the North Atlantic is estimated purely from gridded ocean observations using a measure of the decorrelation time scale based on the local autocorrelation. Decorrelation time scales for both wintertime SST and UOHC are longest in the subpolar gyre, with maximum time scales of about 4–6 years. Wintertime SST and UOHC generally have similar decorrelation time scales, except in regions with very deep mixed layers, such as the Labrador Sea, where time scales for UOHC are much larger. Spatial variations in the wintertime climatological mixed layer depth explain 51%–73% (range for three datasets analyzed) of the regional variations in decorrelation time scales for UOHC and 26%–40% (range for three datasets analyzed) of the regional variations in decorrelation time scales for wintertime SST in the extratropical North Atlantic. These results suggest that to leading order decorrelation time scales for UOHC are determined by the thermal memory of the ocean.

## 1. Introduction

The climate of the next few decades, particularly on regional scales, will depend on natural climate variations in addition to anthropogenic forcing (e.g., Hawkins and Sutton 2009; Corti et al. 2015; DelSole 2017). Therefore, unlike centennial climate predictions, in which the climate state is largely a response to anthropogenic forcing, decadal climate predictions require an accurate specification of the initial climate state and an adequate representation of natural variability in addition to knowledge of external forcings. Given the inherently chaotic nature of the atmosphere, prospects for decadal climate prediction lie in the interaction of the atmosphere with slower parts of the climate system: the ocean, the cryosphere, and the geosphere. The impact of ocean initialization on decadal climate predictions, termed “initial-value predictability,” has been intensely studied over recent years; in particular phases 5 and 6 of the Coupled Model Intercomparison Project (CMIP5 and CMIP6) include decadal prediction experiments, which were performed using numerous participating models (Doblas-Reyes et al. 2013; Meehl et al. 2014; Boer et al. 2016; Eyring et al. 2016).

The two main approaches for studying initial-value predictability are 1) initialized predictions and 2) statistical measures of predictability. In initialized predictions, such as the CMIP5/CMIP6 decadal prediction experiments, a coupled model initialized with ocean observations is used to produce an ensemble of forecasts. Improvements in predictability are inferred by comparing the skill of the ocean initialized forecasts with the skill from uninitialized forecasts. In the initialized forecasts, predictive skill can arise from both ocean initial conditions and external forcing. In the absence of an initialized ocean, the skill comes entirely from external forcing. Note that the skill of a forecast relative to some reference (e.g., persistence or uninitialized simulations) is a quantity evaluated at a specific lead time. The predictability time scale is a metric of the time period over which a forecast has skill.

Initialized predictions suggest that the Atlantic is a region of enhanced predictability for sea surface temperature (SST; Smith et al. 2007; Keenlyside et al. 2008; Pohlmann et al. 2009; van Oldenborgh et al. 2012; Karspeck et al. 2015; Årthun et al. 2017) and upper-ocean heat content (UOHC; Yeager et al. 2012; Hermanson et al. 2014). Using decadal prediction experiments with the Community Climate System Model version 4 (CCSM4), Karspeck et al. (2015) show that the North Atlantic is the only region where ocean initialization leads to a statistically significant increase in skill of SST forecasts (at lead times of both 2–5 and 6–9 years) compared to an uninitialized model; a similar result was found by Doblas-Reyes et al. (2013) using the multimodel mean of CMIP5 models. A number of studies have suggested that predictability of North Atlantic SSTs and UOHC is related to variations in the Atlantic meridional overturning circulation (AMOC; Keenlyside et al. 2008; Pohlmann et al. 2009; Matei et al. 2012; Yeager et al. 2012; Hermanson et al. 2014). However, as detailed in Karspeck et al. (2015), successful decadal prediction in the North Atlantic may not rely on prediction of the AMOC, but rather on adequate initialization of the temperature and salinity fields (and hence geostrophic currents, including the AMOC).

Initialized predictions have also been used to show that the recent changes in UOHC in the subpolar gyre are predictable (Yeager et al. 2012; Hermanson et al. 2014). Some studies have suggested that these decadal changes in subpolar UOHC are related to variations in the AMOC (Robson et al. 2016) while others have focused on the importance of changes in the gyre circulation related to the overlying winds (e.g., Bersch 2002; Häkkinen and Rhines 2004; Sarafanov et al. 2008; Häkkinen et al. 2011; Piecuch et al. 2017), including shifts in the Gulf Stream path (Nigam et al. 2018; Ruiz-Barradas et al. 2018).

Statistical measures of initial value predictability can be estimated from long control runs of climate models with time-invariant external conditions (Branstator et al. 2012; DelSole et al. 2013; DelSole 2017).^{1} Since these methods do not require model experiments, these techniques facilitate the comparison of predictability properties among multiple models. Branstator et al. (2012) estimate initial value predictability for UOHC (defined as average temperature over the upper 300 m) in six atmosphere–ocean general circulation models; they find that there are large regional variations in predictability time scales in the North Atlantic, and the locations of highest predictability are different for each model. Predictability time scales in the subpolar North Atlantic range from less than 3 years to more than 15 years, while time scales in the subtropical North Atlantic range from less than 2 years to 10 years.

DelSole et al. (2013) use a similar multivariate regression model trained on climate model simulations, but they apply the model to observations (with external forcing removed) as well. They find that the North Atlantic basin is the region where the regression model has the highest skill for predicting SST, with significant skill for up to 6 years for both observations and the ensemble mean of preindustrial and twentieth-century simulations. However, the skill varies considerably with ensemble member: one ensemble member has no skill at any lead time and another has skill for up to 10 years. The spread in skill is primarily due to model differences, consistent with the results of Branstator et al. (2012).

While most predictability studies are based on models, a few studies have made statistical predictions based purely on observations (Alexander et al. 2008; Zanna 2012; Ho et al. 2013; Newman 2013; Huddart et al. 2017). One method for estimating observed predictability time scales for SST is to use a linear inverse model (LIM) derived from SST observations. Predictability for decadal SST variations from LIMs is generally 3–4 years over most of the North Atlantic domain, with predictability generally being larger in subpolar regions (Zanna et al. 2012; Newman 2013; Huddart et al. 2017). Furthermore, the skill of LIMs is in many cases found to be comparable to that of initialized models, suggesting that predictions from a LIM are a useful benchmark for interannual to decadal SST predictions (Newman 2013; Huddart et al. 2017). Similar results regarding predictability in the Atlantic basin have been found using other statistical methods, such as autoregressive models and constructed analog models (Ho et al. 2013). Statistical models can also be used to help understand dynamics behind predictability. For example, statistical models demonstrate that in the Atlantic basin coupling between the tropics and extratropics plays only a small role in setting predictability time scales (Huddart et al. 2017), in contrast to the Pacific basin where coupling between the tropics and extratropics plays a much larger role (Newman 2007; Alexander et al. 2008; Newman et al. 2016; Dias et al. 2019).

In this paper, we estimate predictability time scales of Atlantic SST and UOHC purely from ocean observations. A central question in this paper is whether spatial variations in predictability time scales can be explained by spatial variations in the ocean mixed layer depth (MLD). According to the null hypothesis of SST variability, the ocean mixed layer integrates stochastic atmospheric forcing, resulting in a red SST spectrum with a dominant time scale directly proportional to the MLD (Frankignoul and Hasselmann 1977). In this framework, the fact that the North Atlantic has longer predictability time scales than other basins may simply be explained by the deep MLDs in the North Atlantic. We consider a linear dependence of predictability time scales with MLD to be a simple null hypothesis to explain spatial variations in predictability time scales; we will evaluate this null hypothesis in our study.

In section 2 we introduce the diagnostics and datasets that we will use for SST and UOHC and define a measure of the decorrelation time scale, which we use to quantify predictability of SST and UOHC. We compute decorrelation time scales for UOHC and wintertime SST in section 3 and compare them quantitatively in section 4. In section 5, we determine to what extent spatial variations in decorrelation time scales can be explained by spatial variations in the wintertime climatological MLD. In section 6 we determine to what extent decaying versus oscillatory variations play a role in setting decorrelation time scales for UOHC and wintertime SST. Finally, in section 7 we conclude and discuss how our results relate to prior studies.

## 2. Methods

In this section we describe the diagnostics that we will use for SST and UOHC. Then, we introduce a measure of the decorrelation time scale, which we use to quantify predictability of SST and UOHC.

### a. Diagnostics for SST and upper-ocean heat content

Although SST is the ocean variable that most directly interacts with the atmosphere, predictability time scales for SST may underestimate ocean memory. SST memory is lost in the summertime due to the formation of the seasonal thermocline. In winter when the mixed layers deepen, anomalies isolated beneath the seasonal thermocline can become entrained in the mixed layer. This process, often referred to as the “reemergence mechanism” (Alexander and Deser 1995), can lead to enhanced predictability for wintertime SSTs (Namias and Born 1970; de Coëtlogon and Frankignoul 2003; Deser et al. 2003).

As our goal is to estimate predictability of SST and UOHC on interannual time scales, we define measures of SST and UOHC that minimize the effects of seasonal variations, specifically the loss of ocean memory in summer. For SST, we focus on wintertime SST (), where wintertime is defined as the January–March average SST. We define our measure of UOHC as the average temperature in the layer between the surface and a depth *D*:

where *θ* is potential temperature. We define *D* to be the wintertime climatological MLD, which is calculated by forming a monthly climatology of MLD and taking the January–March average of this quantity at each spatial location. An alternative choice is the maximum climatological MLD; our results are unchanged for this definition of *D*. The layer from the ocean surface to −*D* reflects the portion of the ocean that comes in contact with the atmosphere seasonally, and *H* covaries with SST on interannual time scales (Buckley et al. 2014, 2015). Averaging over this layer implicitly accounts for reemergence because anomalies that are isolated below the seasonal thermocline in summer are included in *H* for all months. We express the ocean heat content in units of temperature rather than Joules for convenience, as this choice allows us to more easily compare time series of SST and *H*. However, our results do not depend on this choice because 1) *D* is constant in time and 2) our predictability measure is based on the autocorrelation function, which does not depend on the magnitude of the anomalies.

Our analysis of predictability of SST and *H* will utilize six different gridded observational products. As our analysis requires long time series for SST and *H*, and data products have substantial uncertainties, which are generally larger for earlier periods, we choose several data products to help understand the uncertainty. [See Kennedy (2014) for a review of uncertainty in SST data products and Boyer et al. (2016) for a review of uncertainty in ocean temperature products.] We use monthly SST from Extended Reconstructed SST version 5 (ERSST v5; Smith et al. 2008), Hadley Center SST (HadISST; Rayner et al. 2003), and COBE SST2 (Hirahara et al. 2014). We calculate *H* from monthly fields of gridded temperature from Ishii (Ishii et al. 2006), EN4.2.1 (Good et al. 2013), and the ocean temperature data product from Cheng (Cheng et al. 2017). All analyses will be restricted from 1945 to present, and the specific periods used for each product are listed in Tables 1 and 2. Our results are unchanged if we instead restrict all products to the period of 1945–2012, which is common to all data products.

The wintertime climatological MLD *D* is an important parameter in our study, but the MLD is a diagnostic that cannot directly be measured. We use three different estimates of *D*, two based on the gridded data products (Ishii and EN4^{2}) and one from the Argo MLD climatology of Holte et al. (2017), henceforth Argo MLD. In all cases the MLD is based on a threshold method in which the MLD is given by the depth at which the density exceeds a near-surface reference density by an amount . For Ishii and EN4 (Figs. 1a,b), we use a fixed density threshold of applied to the monthly gridded observations (see Fig. 1 caption for details). The advantage of calculating MLDs from the gridded observational products is that *D* is consistent with the temperature field of the gridded product. However, because the gridded products are spatially and temporally smoothed fields, we must apply a large density threshold (Toyoda et al. 2015). For the Argo MLD product (Fig. 1c) we use the MLD based on the variable density threshold of 0.2°C density equivalent applied to individual Argo profiles (see figure caption for details). The advantage of the Argo MLD product is that it is based directly on Argo profiles (Argo 2019) and can be considered a best estimate of the MLD based on data. The disadvantages are that it is restricted to the Argo period (from 2000 to the present) and the MLD field has missing points because Argo floats cannot travel in shallow regions.

In all cases *D* is very deep in the Labrador Sea, Irminger Sea, and Iceland basin; relatively deep south of the Gulf Stream; and quite shallow elsewhere (Fig. 1). In the Labrador and Irminger Seas, *D* is significantly larger for Ishii and EN4 (maximum about 1500 m; Figs. 1a,b) than for the Argo MLD product (maximum around 1000 m; Fig. 1c), which is consistent with the fact that a variable density threshold results in a smaller density criteria and therefore shallower MLDs in cold regions. The value of *D* in the Argo MLD product (Fig. 1c) is also more patchy, which is the result of calculating MLD directly from in situ profiles rather than from a highly smoothed monthly temperature field (as in Ishii and EN4).

Gridded temperature fields and *D* are used to calculate *H* according to Eq. (1). For Ishii and EN4, *D* is based on the MLD calculated from the respective gridded product (e.g., Fig. 1a for Ishii and Fig. 1b for EN4); for Cheng we use *D* based on the Argo MLD climatology (Fig. 1c). Both annual average values of *H*, henceforth , and wintertime average values of *H*, henceforth , are calculated from the monthly fields.

Previous work using an ocean state estimate demonstrates that SST and *H* are highly correlated on interannual time scales (Buckley et al. 2014). We now show that a close correspondence between wintertime SST and *H* is also seen in the Ishii gridded product. The Ishii product is chosen for this analysis since Ishii includes COBE SST2 observations, leading to a consistent product that can be used to compare SST and *H*. Figure 2 shows the squared correlation (at each grid point) between and . The squared correlation between and is generally quite high; the mean value of the squared correlation over the North Atlantic domain is 0.83. Correlations are lowest in the Labrador Sea and east of Iceland. Other regions of relatively low correlations include the eastern subpolar gyre and isolated regions in the North Atlantic and South Atlantic subtropical gyres (e.g., the region south of the Grand Banks). Correlations are very high in the tropics. Low correlations between and generally occur where *D* is deep.

### b. Measure of predictability time scales

To estimate the predictability time scales for wintertime SST and *H*, we use a measure of the decorrelation time scales, as defined by DelSole (2001):

where is the autocorrelation (for SST or *H*) at discrete lag *k*; when considering yearly/wintertime average data, as is done here, *k* is the number of years. For an exponentially decaying autocorrelation function, *T*_{2} is equal to the familiar *e*-folding time scale. The time scale *T*_{2} is chosen in contrast to other possible measures of the decorrelation time scale (e.g., Schubert et al. 1992; DelSole 2001) because it accurately estimates predictability in the presence of oscillatory variability. As shown in DelSole (2001), *T*_{2} measures fundamental aspects of time series and arises naturally in statistical sampling theory.

In this paper, *T*_{2} will be interpreted as a lower bound on the predictability time scale. This interpretation is motivated by the fact that autocorrelations out to lag *T*_{2} tend to be significant, and therefore a linear regression model can make skillful predictions at least as long as *T*_{2}. It is a lower bound because it is based purely on the local autocorrelation function. Including additional predictors, such as nonlocal SST or variations in the AMOC, may improve skill (and thus correspond to longer predictability time scales). For instance, a LIM that captures covariability between multiple spatial patterns may have more skill at individual points than a regression model fitted to data at each point. Thus, our diagnostic *T*_{2} is best understood as a lower bound on predictability, one which may be a useful benchmark against which to evaluate dynamical models or more sophisticated statistical measures of predictability [such a comparison is made in the framework of atmospheric predictability in Schubert et al. (1992)]. For these reasons, we henceforth refer to *T*_{2} as the decorrelation time scale.

If we know the exact value of our autocorrelation function, we can integrate over all lags to find *T*_{2}, but for a finite length time series we can only calculate the sample autocorrelation function. For a sufficiently long time series, we can instead integrate from lag to , as long as we can chose a value of that is much longer than the time scale that we are trying to estimate but much shorter than the length of the time series. For the observational records that we are using, which are on the order of 70 years, it is not possible to find such a value of . To resolve this dilemma, we fit an autoregressive (AR) model to the time series of , , and . The AR parameters are then used to calculate the theoretical autocorrelation function, at discrete lags *k*. Our estimate of *T*_{2} is found by replacing with in Eq. (2). We try AR orders between 1 and 3 and find little sensitivity to AR order, particularly for AR orders greater than 1, so we choose an AR order of 2. Prior to computing the AR fits we remove a quadratic function (in time) from the time series of SST and *H* at each spatial location. As a result of both the limited data period available and the removal of the longest time scales (due to removal of a quadratic), the resulting decorrelation time scales may underestimate predictability of SST and *H*.

## 3. Decorrelation time scales for upper-ocean heat content and SST

In this section we calculate decorrelation time scales for *H* and wintertime SST based on gridded observational products. Decorrelation time scales for based on Ishii, EN4, and Cheng are shown in Fig. 3 (showing the extratropical North Atlantic) and Fig. S1 (showing the entire Atlantic domain; see the online supplemental material). The largest decorrelation time scales, around 4–6 years, are seen in the subpolar gyre, in particular in the Labrador Sea, Irminger Sea, and Iceland basin. Decorrelation time scales are around two years south of the Gulf Stream path, but elsewhere in the subtropical gyres they are about a year. The black contours in Fig. 3 and in Fig. S1 show contours of *D*, and it is apparent that the largest decorrelation time scales are found where *D* is deep.

The black dots in Fig. 3 and Fig. S1 show grid points where the decorrelation time scale is not significantly different from white noise on interannual time scales (see the appendix for details). In these regions there is no interannual predictability, although there may be predictability on shorter time scales (which could be examined by looking at monthly *H* anomalies). One can see that the interannual predictability of *H* is mostly restricted to the extratropics. Therefore, for the rest of our analysis we will consider only the extratropical North Atlantic (ETNA; north of 20°N).

Results shown in Fig. 3 and Fig. S1 are from yearly averages for *H* () but results using wintertime averages for are quite similar (Table 1). Yearly average values of *H* include temperature anomalies in the summertime seasonal thermocline, which are likely unrelated to the deeper (and presumably more predictable) temperature anomalies formed during the winter. Therefore, consideration of only wintertime *H* anomalies could potentially lead to increased decorrelation time scales, but in practice decorrelation time scales based on yearly average *H* tend to be somewhat larger (see Table 1), simply as a result of suppressing noise by averaging over more months. The results from the three different data products are quite consistent. If we regrid all estimates of *T*_{2} onto a common 1° grid and compute linear fits for all points in the ETNA between the various data products, we find *R*^{2} values between 0.65 and 0.80 and slopes between 0.7 and 1.3 (see Table 1).

Now we present results showing the decorrelation time scales for based on ERSST, HadISST, and COBE SST2 (Fig. 4). The largest decorrelation time scales, around 3–5 years, are seen in the subpolar gyre. Decorrelation time scales are around two years south of the Gulf Stream path, but elsewhere in the subtropical gyre they are about a year. The spatial pattern of decorrelation time scales is generally similar for and , except for in the Labrador Sea where decorrelation time scales for are much shorter than those for . The black dots in Fig. 4 show grid points where the decorrelation time scale is not significantly different from white noise on interannual time scales (see the appendix for details). In these regions there is no interannual SST predictability, although there may be predictability on shorter time scales (which could be examined by looking at monthly SST anomalies). One can see that the interannual predictability of is mostly restricted to the subpolar gyre.

The results from the three different data products are quite consistent. If we regrid all estimates for *T*_{2} onto a common 1° grid and compute linear fits for all points in the ETNA between the various data products, we find *R*^{2} values between 0.60 and 0.79 and slopes between 0.5 and 2.0 (see Table 2). The deviation of the slopes from one is due to COBE SST2 having significantly larger decorrelation time scales than the other two products.

## 4. Comparing decorrelation time scales for upper-ocean heat content and SST

We now explore the relationship between decorrelation time scales for SST and *H* using the Ishii data product. To present results that are most comparable, wintertime values are used for both SST and *H*, but recall that decorrelation time scales for are very similar (but slightly smaller) than those for (Table 1). Figure 5a shows a scatterplot comparing *T*_{2} for to *T*_{2} for for all points in the ETNA. A linear fit explains 61% of the variance, and the slope of the line is one, indicating that *T*_{2} for is comparable to *T*_{2} for for most points in the ETNA.

Now we explore the points where the linear relationship between *T*_{2} for and does not hold. The triangles (squares) in Fig. 5a are points that are more than two standard deviations above (below) the best-fit line. There are significantly more triangles than squares, indicating that there are many more outliers where *T*_{2} is larger for than for . The points in Fig. 5a are shaded (grayscale) to reflect their value of *D*, and it is apparent that points where *T*_{2} for is significantly larger than *T*_{2} for are generally located where *D* is deep. This is what one would expect, as averaging from the surface to a depth *D* is a more effective low-pass filter when *D* is deep.

Figure 5b shows the spatial distribution of the outliers from the linear fit. All of the outliers are in the subpolar gyre, and the outliers where *T*_{2} is larger for than for are concentrated in the Labrador Sea and in the eastern subpolar gyre off the coast of the British Isles. These points, particularly those in the Labrador Sea, align with the regions where the squared correlation between and is low (see Fig. 2). To understand why *T*_{2} is so much larger for than for in these regions, we choose two characteristic points, one in the Labrador Sea and one in the eastern subpolar gyre; these points are labeled on both the scatterplot (Fig. 5a) and the spatial map (Fig. 5b). The time series of and at these points are shown in Figs. 5c and 5d. It is apparent that depth averaging is an effective low-pass filter, which leads to the enhanced predictability for compared to in these regions.

Although differences between decorrelation time scales for and are expected in regions of deep *D*, the Labrador Sea remains a unique region in that it has such low predictability time scales for , generally less than two years, despite very persistent anomalies in *H*. There are several factors that may explain this.

Large spatial gradients in MLDs: Regions of deep convection are characterized by weak vertical gradients and deep MLDs, but there are strong lateral gradients in properties. These lateral gradients are baroclinically unstable, leading to a rapid growth of baroclinic eddies that act to restratify the near-surface layer (Send and Marshall 1995; Jones and Marshall 1997; Marshall and Schott 1999; Frajka-Williams et al. 2014). The input of buoyant waters from boundary currents adjacent to regions of deep convection is also important in this restratification process (Straneo 2006; Schmidt and Send 2007). Time scales for restratification in the Labrador Sea are thought to be rapid (from weeks to months). While air–sea heat fluxes are quite large over the Labrador Sea and may play a role in restratification, near surface restratification occurs more rapidly than can be accounted for by air–sea heat fluxes alone, indicating the importance of lateral mixing (Send and Marshall 1995; Jones and Marshall 1997). Wintertime SST anomalies are rapidly destroyed by restratification, but since the restratification is restricted to the near-surface layers (top few hundred meters at most), anomalies in remain mostly intact. Therefore, restratification processes may explain the much longer decorrelation time scales of compared to that of .

Large temporal variability in MLD: Figure 2 shows that low correlations between and generally align with regions where MLDs vary strongly from one winter to the next. Temporal variability of wintertime MLDs also may lead to decorrelation time scales for that are much larger than for . If in a given winter the MLD is shallower than

*D*,*H*will include thermal anomalies from the previous winter, stored below this year’s wintertime mixed layer. This effect will result in decorrelation time scales for that overestimate the true upper ocean predictability (which would be seen if we allowed*D*to vary from one year to the next) and are significantly larger than decorrelation time scales for . We do not expect any comparable impact on decorrelation time scales for during years when the wintertime MLD is deeper than*D*, as in this case anomalies will still reflect thermal anomalies in the mixed layer.

In summary, generally decorrelation time scales for wintertime SST and *H* are similar, but in the Labrador Sea decorrelation time scales for SST are much shorter. This is likely due to large spatial and temporal variations in wintertime MLDs and modification of SST by rapid near-surface processes. However, the gridded observational datasets utilized here lack the temporal and spatial resolution to fully understand the dynamics operating in the Labrador Sea.

## 5. Can mixed layer depth variations explain spatial variations in decorrelation time scales?

In this section we explore our null hypothesis that spatial variations in decorrelation time scales are related to variations in the wintertime climatological MLD *D*. We first consider the linear relationship between the decorrelation time scale for *H* and *D* and then consider the relationship between the decorrelation time scale for wintertime SST and *D*. Then, we use the slopes of the linear relationships to estimate the damping parameter.

### a. Relationship between decorrelation time scale for H and D

Figures 6a–c show scatterplots between *D* and *T*_{2} for all points in the ETNA based on from the Ishii, EN4, and Cheng data products. The red lines show a linear fit between *T*_{2} and *D*; linear fits demonstrate that between 51% and 74% of the spatial variations in *T*_{2} can be explained by spatial variations of *D*. The strong linear relationship between *D* and *T*_{2} matches the expectation from stochastic climate models in which the ocean simply thermodynamically integrates stochastic atmospheric forcing (Frankignoul and Hasselmann 1977). The slopes of the lines indicate that an increase in *D* by 1 km results in an increase in the decorrelation time scale of 3.1–4.2 years. The precise values of *R*^{2} and the slope of the linear regression are somewhat sensitive to the MLD product. If the calculations for Ishii and EN4 are repeated using *D* from the Argo MLD climatology (Holte et al. 2017), *R*^{2} decreases (Figs. 7a,b), likely because the Argo MLD product is more patchy. Furthermore, since *D* from the Argo MLD product is shallower than *D* calculated from Ishii and EN4, the slope of the line increases (Figs. 7a,b) and becomes more similar to that from Cheng (which uses *D* from the Argo MLD climatology).

In calculating the linear fits, all points in the ETNA are included, regardless of whether or not the values of *T*_{2} can be distinguished from white noise on interannual time scales (dash-dotted gray lines in Figs. 6a–c, 7a, and 7b show values of *T*_{2} above which *T*_{2} can be distinguished from white noise on interannual time scales). All points are included since points where *D* is shallow and there is no interannual predictability are part of the observed relationship between *T*_{2} and *D*; in fact, if there were places with large *D* where there was no interannual predictability, these points would not match the expectation from the null hypothesis and would require explanation.

Now we explore the points that do not follow the scaling of *T*_{2} with *D*. The green (black) points in Figs. 6a–c, 7a, and 7b indicate outliers that are more than two standard deviations above (below) the best-fit line. There are significantly more outliers above the line than below the line, indicating that there are more points with higher than expected predictability than lower than expected predictability. Figures 6d–f show the spatial distribution of the outliers; the spatial distribution of the outliers is quite similar when the calculations for Ishii and EN4 are repeated using *D* from the Argo MLD climatology (Figs. 7c,d). The outliers are mostly confined to the subpolar gyre, except for a few points with higher than expected predictability in the slope waters west of the Gulf Stream. Points with higher than expected predictability are almost all confined to shallow bathymetry: the North American continental shelf, the Grand Banks, and the Reykjanes Ridge, a concurrence that needs further investigation, particularly in the context of previous work suggesting that interactions of ocean currents with bathymetry are an important process in generating upper ocean temperature variability (e.g., Nigam et al. 2018). There are very few points with lower than expected predictability, and their locations are not robust between the different data products, so it is somewhat difficult to determine their origin. There does seem to be some tendency for points with lower than expected predictability to occur in regions with large spatial and temporal gradients in *D*. This may be due to rapid restratification as a result of lateral mixing by eddies, which is expected to lead to decorrelation time scales shorter than those predicted by local values of *D* (Send and Marshall 1995; Jones and Marshall 1997; Marshall and Schott 1999; Frajka-Williams et al. 2014).

### b. Relationship between decorrelation time scale for wintertime SST and D

We now consider the relationship between variations in *D* and *T*_{2} for wintertime SST; in considering this relationship we use *D* based on the Argo MLD climatology for all three SST data products. Figures 8a–c show a scatterplot between *D* and *T*_{2} for for all points in the ETNA. (As before fits are for all points in the ETNA; gray dash-dotted lines show the value of *T*_{2} above which *T*_{2} can be distinguished from white noise on interannual time scales.) The linear fit between *D* and *T*_{2} indicates that an increase in *D* by 1 km results in an increase in the decorrelation time scale of 1.3–2.5 years. As expected, the slope of the line is significantly shallower than that for because predictability time scales for are generally less than those for . We find that 26%–40% of the spatial variations of *T*_{2} for can be explained by spatial variations in *D*. There are several reasons for the smaller percent variance of *T*_{2} explained by *D* for compared to :

Predictability time scales for

*H*are expected to be proportional to*D*according to stochastic climate models (e.g., Frankignoul and Hasselmann 1977; Deser et al. 2003; de Coëtlogon and Frankignoul 2003), but there is not the same expectation for SST, which is also influenced by surface processes, as discussed in section 4.SST is a noisier field than

*H*, as depth averaging acts as a low-pass filter (see section 4). Additionally, for*H*we are able to further suppress noise by averaging over a year rather than just averaging over the wintertime months.For the fit between

*T*_{2}and*D*for the SST analysis, we use*D*from the Argo MLD climatology, which is relatively patchy. Our analysis with Ishii and EN4 shows that the portion of the variance of*T*_{2}explained by*D*decreases somewhat when*D*from the Argo MLD product is used compared to the smoother values of*D*based on the gridded data products (cf. Figs. 6a,b and 7a,b).In the Labrador Sea, predictability for is quite small, despite very large values of

*D*; in contrast predictability for is quite large, as discussed in section 4. If the Labrador Sea is excluded from our fit between*T*_{2}and*D*, the percent variance explained increases by 6%–10% and the slopes increase by about 0.5 yr km^{−1}(due to the removal of low values of*T*_{2}with large values of*D*).

The points above suggest that decorrelation time scales for exhibit a weaker scaling with *D* than the time scales for *H* due to both complex surface dynamics (particularly in the Labrador Sea) and noise (in both the SST fields and the MLD field).

Now we explore the points that do not follow the scaling of decorrelation time scale with *D*. The green (black) points in Figs. 8a–c indicate outliers that are more than two standard deviations above (below) the best fit line. There are significantly more outliers above the line than below the line, indicating that there are more points with higher than expected predictability than lower than expected predictability. Figures 8d–f show the spatial distribution of the outliers. The outliers almost all have higher than expected predictability and are located in the region with large decorrelation time scales south of Greenland. There is a small region of points with lower than expected predictability in the Labrador Sea, consistent with the fact that while *D* is deep and decorrelation time scales for and are long, decorrelation time scales for are short in this region (see also Fig. 5).

### c. The damping parameter

The slopes of the fitted lines between *T*_{2} (for and ) and *D* can be written in the units of a damping parameter *α*. From Frankignoul et al. (1998), a characteristic time scale can be related to *α* as

where is a reference density and *C*_{p} is the heat capacity. Inverting this relationship and setting , where *m* is the slope of our fit between *T*_{2} and *D*, we can make a gross estimate of *α*. The values of *α* calculated by this method can be understood as the best-fit damping parameter, under the assumption that the damping parameter is constant over the entire domain. Using this relation results in values of *α* ranging from 33 to 44 W m^{−2} K^{−1} for (Figs. 6a–c). If the calculations for Ishii and EN4 are repeated using *D* from the Argo MLD climatology, *α* for Ishii and EN4 decreases to 36–37 W m^{−2} K^{−1} (Figs. 7a,b), values more similar to that of Cheng (*α* = 33 W m^{−2} K^{−1}), which uses *D* from the Argo MLD climatology. Values of *α* for are significantly larger, ranging from 54 to 109 W m^{−2} K^{−1} (Figs. 8a–c).

Expressing the slope in the units of *α* is not intended to suggest that calculating *α* based on the fits between *T*_{2} and *D* is an accurate method for calculating the damping parameter; instead the goal is merely to express *m* in units that are familiar to the reader. Namely, the canonical value of the damping by air–sea heat fluxes derived by Frankignoul et al. (1998) is *α*_{a} = 20 W m^{−2} K^{−1}, although values are somewhat higher in the subpolar gyre and vary seasonally (values are larger in the wintertime). Differences between our calculations and that of Frankignoul et al. (1998) include:

Frankignoul et al. (1998) estimate only atmospheric damping

*α*_{a}, whereas we calculate the total damping rate by the atmosphere and ocean. Some studies suggest that oceanic damping is as large or larger than atmospheric damping (Hall and Manabe 1997), which explains why our calculated values of*α*are generally larger than 20 W m^{−2}K^{−1}.Our value of

*α*depends linearly on the chosen value of*D.*We chose*D*as the wintertime climatological MLD, a quantity that is quite large in the subpolar gyre, which also contributes to our relatively large values of*α.*Frankignoul et al. (1998) consider damping of monthly SST anomalies, whereas we consider wintertime SST and

*H.*As a result of the formation of the seasonal thermocline, and anomalies are damped to the atmosphere more slowly since the atmospheric damping is essentially zero during the summertime when anomalies are isolated below the seasonal thermocline. This process is expected to lead to a value of*α*_{a}smaller than that of Frankignoul et al. (1998), but this effect is relatively modest since damping is smaller in the summer (Frankignoul and Kestenare 2002).

The above differences suggest that our value of *α* should not match that of Frankignoul et al. (1998) and that it should be larger than 20 W m^{−2} K^{−1}, as indeed it is, particularly for . However, the sensitivity of our calculated value of *α* to the choice of *D* as well as the particular data products used (*α* for varies by almost a factor of 2 between the three different data products) suggests that *α* cannot be tightly constrained from calculations of decorrelation time scales.

## 6. Are there oscillatory variations in SST or upper-ocean heat content?

The value of *T*_{2} measures the decorrelation time scale of SST/*H*, but it does not distinguish between different types of variability that may be leading to the measured value of *T*_{2}. For example, does *T*_{2} simply indicate damping with an *e*-folding time scale given by *T*_{2} or does oscillatory behavior play a role? The goal of this section is to determine the extent to which decaying versus oscillatory behavior of the autocorrelation function is important in setting decorrelation time scales for and . For an AR2 model, this question can be answered analytically based on the AR coefficients (von Storch and Zwiers 2002). Figure 9 is a scatterplot showing the AR coefficients, and , from the AR2 fit to time series of (Fig. 9a) and (Fig. 9c) for all the points in the ETNA. Points above the black parabola have exponentially decaying autocorrelation functions; the further from the parabola, the more strongly the autocorrelation function decays. The points below the parabola have (decaying) oscillatory behavior; the farther from the parabola, the stronger the oscillations are. One can see that there are numerous points in the ETNA for which time series of and are in the regime with oscillatory behavior. Two points with the most oscillatory behavior for are chosen: P3 from the Ishii product is located south of the Gulf Stream (labeled on Fig. 3a), and P4 from the EN4 product is located in the central subpolar gyre (labeled on Fig. 3b). The autocorrelation function of these most extreme points shows only modest oscillatory behavior, with an autocorrelation of about −0.25 after about 3 years (Fig. 9b). For there are very few points that are a significant distance below the parabola, and all of these points lie in regions of seasonal sea ice: P5 from ERSST is located in Denmark Strait (labeled on Fig. 4a) and P6 from HadISST is located off the coast of Labrador (labeled on Fig. 4b). The autocorrelation function for these points exhibits very weak oscillatory behavior, with an autocorrelation of about −0.2 after about 2–3 years (Fig. 9d). Therefore, we conclude that on the time scales resolved by our observational products, oscillatory behavior does not play a significant role for variability in either or . Instead, *T*_{2} is mainly controlled by the decay of SST/*H* anomalies, and is expected to be quite similar to the familiar *e*-folding time scale.

## 7. Conclusions

In this work, a lower bound for predictability time scales for SST and UOHC is estimated purely from ocean observations, and the degree to which spatial variations in the MLD can explain predictability time scales is quantified. Two diagnostics are used for SST/UOHC: wintertime SST and the temperature averaged over the layer from the surface to the wintertime climatological MLD (integral denoted as *H*); both diagnostics quantify the ocean memory on interannual time scales and are designed to minimize the effects of seasonal variations, specifically the loss of SST memory in the summertime due to the formation of the seasonal thermocline. Gridded observational products are used to estimate predictability for wintertime SST and *H* in the North Atlantic using a measure of the decorrelation time scale based on the local autocorrelation function (DelSole 2001). Decorrelation time scales for both wintertime SST and *H* are longest in the subpolar gyre, with maximum decorrelation time scales of 4–6 years. Decorrelation time scales for wintertime SST and *H* are generally similar, except in regions with very deep mixed layers, such as the Labrador Sea, where time scales for *H* are much larger. Oscillatory variations are found to play little role in setting predictability time scales for wintertime SST and *H*; instead decorrelation time scales are controlled mainly by the decay rate of anomalies.

Our decorrelation time scales are generally similar to predictability time scales calculated from observations using other statistical techniques, such as linear inverse models (LIMs). Predictability for decadal SST variations from LIMs is generally 3–4 years over most of the North Atlantic domain, with predictability generally being larger in subpolar regions (Zanna et al. 2012; Newman 2013; Huddart et al. 2017). The similarity of our results to those from LIMs suggests that our consideration of a predictability measure based on the local autocorrelation (e.g., the decorrelation time scale) is likely not a significant limitation in the Atlantic basin. This is further supported by results from LIMs that suggest nonlocal SST variability has little influence on SST predictions in a given region of the Atlantic (Huddart et al. 2017). Our conclusion that oscillatory variations play little role in predictability of wintertime SST and *H* is supported by results from statistical models that suggest predictability in the Atlantic, while higher than most other basins, only slightly exceeds that of a persistence forecast (DelSole et al. 2013).

While our results generally agree with other studies on the magnitude of predictability time scales in the North Atlantic, there are some differences in the detailed spatial patterns. For example, Zanna et al. (2012) find that forecast skill compared to climatology is quite low in the subpolar region south of Greenland, but we find that this region has relatively high values of *T*_{2} (greater than 2 years). This result may be due to the analysis of wintertime SST (in our study) versus annual-average SST [in Zanna et al. (2012)], the different time periods studied [Zanna et al. (2012) start analysis in 1870 whereas we start in 1945], or the truncation of the SST variability using empirical orthogonal functions in the LIM. The spatial patterns of predictability time scales for SST in Huddart et al. (2017) are more similar to ours, but this study uses decadal filtering to construct the LIM, so results are not directly comparable to our study. It would be interesting to create a LIM to predict wintertime SST or wintertime/annual average *H* (with no additional filtering) and see how these results compare to our decorrelation time scales.

Spatial variations in the wintertime climatological MLD explain 51%–74% of the regional variations in decorrelation time scales for UOHC and 26%–40% of the regional variations in decorrelation time scales for SST in the extratropical North Atlantic. These results suggest that to leading order decorrelation time scales for *H* are determined by the thermal memory of the ocean. Interestingly, decorrelation time scales for UOHC that are significantly larger than expected based on the local mixed layer depth tend to be located in regions with shallow bathymetry. Decorrelation time scales for wintertime SST have a weaker relationship with the MLD, as a result of both noise and surface processes. In particular, in the Labrador Sea, decorrelation time scales for wintertime SST are much shorter than expected from the local MLD.

The suggestion that a large portion of the spatial variations in predictability time scales can be explained by variations in the wintertime climatological MLD raises the question of whether SST and UOHC variability in the ocean can simply be explained by stochastic atmospheric forcing. Addressing this question is the subject of ongoing work. The fact that the damping parameter calculated from our decorrelation time scales is significantly larger than the canonical atmospheric damping parameter (e.g., Frankignoul et al. 1998) suggests that oceanic damping is important, as found by Hall and Manabe (1997) and argued by Zhang (2017). Where dynamical oceanic process (e.g., ocean currents) are important in creating SST and UOHC anomalies remains to be explored in detail, but both our results and other studies suggest oceanic processes are more important in the subpolar gyre (e.g., Buckley et al. 2014, 2015; Karspeck et al. 2015; Robson et al. 2014; Menary et al. 2015; O’Reilly et al. 2016; Delworth et al. 2017; Piecuch et al. 2017; Zhang 2017).

Our observationally based results do not indicate the same degree of variation in the magnitude and spatial patterns of predictability time scales that are typically seen in CMIP5 models. Whether this is due to model deficiencies or may be related to the limited sampling period of ocean observations is currently not clear. Our results suggest that the predictability diagnostics that have been used for CMIP5 models, particularly temperature averaged over depth layers that are constant in space [e.g., top 300 m used in Branstator et al. (2012)] may not be the best measures of ocean memory. Calculating predictability time scales for wintertime SST and *H* for CMIP5/CMIP6 control integrations and historical runs is a subject of ongoing study.

One of the most significant caveats of our study is that we only have about 70 years of observations and remove a quadratic function (in time) from our data prior to analysis. Therefore, we can only resolve variations up to decadal time scales. If longer variations play a substantial role in ocean predictability, then our results are surely an underestimate of ocean predictability.

## Acknowledgments

We thank Laure Zanna, Sumat Nigam, and one anonymous reviewer for insightful comments. MB was funded by the NOAA ESS Program (NA16OAR4310167) and NASA Physical Oceanography Program (NNX17AH49G). TD was supported by the National Oceanic and Atmospheric Administration (NA16OAR4310175, NA14OAR4310160), National Science Foundation (AGS-1338427), and National Aeronautics and Space Administration (NNX14AM19G). MSL gratefully acknowledges NOAA support from Grant NA16OAR4310168, and LL acknowledges NSF support from Grant NSF-OCE-12-59102.

### APPENDIX

#### Determining the Level of *T*_{2} That Can Be Distinguished from White Noise on Interannual Time Scales

In our analysis, we use yearly time resolution observations (wintertime average SST and yearly and wintertime average *H*), and thus we are only able to measure interannual predictability. The point of this appendix is to evaluate what value of *T*_{2} is required in order to demonstrate statistically significant interannual predictability. From a theoretical perspective, the minimum value of *T*_{2} is one, and this value will occur when the autocorrelation decreases to zero at a lag of one year. However, due to finite sampling it is possible to get larger values of *T*_{2} even for white noise time series, which have no interannual predictability. Thus, we generate probability distribution functions for *T*_{2} based on white noise input (for time series of SST and *H*) and compute thresholds for *T*_{2}. Where *T*_{2} calculated from the data exceeds the threshold from white noise, we conclude that there is interannual predictability. Where *T*_{2} does not exceed the threshold, the variability cannot be distinguished from white noise on interannual time scales.

We first generate a time series of white noise with the same length as the number of years of the observational dataset of SST or *H*. The specific lengths of the time series of SST and *H* for each product are given in Tables 1 and 2, but in all cases the time series are about 70 years. Then, to be consistent with the approach taken for the analysis of the data, a quadratic function (in time) is removed from the time series and AR fits are computed (using AR order of 2). We repeat this experiment 10 000 times in order to get distributions for the AR parameters resulting from white noise. Figure A1a shows a histogram of the AR2 coefficients, and , based on the AR2 fit to the white noise time series. The gray line shows the 90% confidence ellipsoid for the AR parameters. Then, *T*_{2} is computed from the AR parameters; the distribution of *T*_{2} for an AR2 model is shown in Fig. A1b. We chose our threshold for *T*_{2} to be the 90% quantile of the distribution; when *T*_{2} exceeds this threshold we state that it can be distinguished from white noise.

## REFERENCES

## Footnotes

Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JCLI-D-18-0509.s1.

© 2019 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

^{1}

DelSole and Tippett (2018) propose a generalized framework for predictability valid when there are variations in external forcing.

^{2}

As the Cheng product does not include salinity, we cannot calculate potential density from this product and thus cannot calculate MLDs.