Assessing External and Internal Sources of Atlantic Multidecadal Variability Using Models, Proxy Data, and Early Instrumental Indices

Atlantic multidecadal variability (AMV) of sea surface temperature exhibits an important inﬂuence on the climate of surrounding continents. It remains unclear, however, the extent to which AMV is due to internal climate variability (e.g., ocean circulation variability) or changes in external forcing (e.g., volcanic/anthro- pogenic aerosols or greenhouse gases). Here, the sources of AMV are examined over a 340-yr period using proxy indices, instrumental data, and output from the Last Millennium Ensemble (LME) simulation. The proxy AMV closely follows the accumulated atmospheric forcing from the instrumental North Atlantic Oscillation (NAO) reconstruction ( r 5 0.65)—an ‘‘internal’’ source of AMV. This result provides strong observational evidence that much of the AMV is generated through the oceanic response to atmospheric circulationforcing,as previouslydemonstratedin targetedmodelingstudies.In theLMEthereis a substantial externally forced AMV component, which exhibits a modest but signiﬁcant correlation with the proxy AMV (i.e., r 5 0.37), implying that at least 13% of the AMV is externally forced. In the LME simulations, however, the AMV response to accumulated NAO forcing is weaker than in the proxy/observational datasets. This weak response is possibly related to the decadal NAO variability, which is substantially weaker in the LME than in observations. The externally forced component in the proxy AMV is also related to the accumulated NAO forcing, unlike in the LME. This indicates that the external forcing is likely inﬂuencing the AMV through different mechanistic pathways: via changes in radiative forcing in the LME and via changes in atmospheric circulation in the observational/proxy record.


Introduction
In situ observations of North Atlantic sea surface temperatures (SSTs) have been made from the midnineteenth century to the present day. In this ;150-yr observational period, North Atlantic SSTs exhibit significant variability on time scales of decades and longer; this is often referred to as the Atlantic multidecadal variability (AMV) or Atlantic multidecadal oscillation (e.g., Kushnir 1994;Enfield et al. 2001;Deser et al. 2010). The AMV has been shown to influence decadal climate in the surrounding continental regions, including North America (McCabe et al. 2004;Sutton and Hodson 2005;Knight et al. 2006;Ting et al. 2009;Hu and Feng 2012;Ruprich-Robert et al. 2018), Europe (Sutton and Hodson 2005;Knight et al. 2006;Sutton and Dong 2012;O'Reilly et al. 2017;Ruprich-Robert et al. 2017;Qasmi et al. 2017;Ghosh et al. 2017;Årthun et al. 2018), and the Sahel (Palmer 1986;Folland et al. 1986;Zhang and Delworth 2006;Martin et al. 2014;Martin and Thorncroft 2014). The AMV has also been linked to decadal variability in hurricane frequency (e.g., Zhang and Delworth 2006;Yan et al. 2017) and Arctic sea ice (e.g., Zhang 2015). The influence the AMV exhibits on the surrounding climate means it is an important source of potential climate predictability and future decadal variability.
Denotes content that is immediately available upon publication as open access.
Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JCLI-D-19-0177.s1. The AMV has been linked to variability in ocean circulation and variability in external forcing, as well as other generation mechanisms. Many studies have shown that the AMV in general circulation models is associated with changes in ocean circulation-and more specifically, changes in ocean heat flux convergence (e.g., Delworth and Mann 2000;Knight et al. 2005). This has been associated with changes in the Atlantic meridional overturning circulation (AMOC; e.g., Danabasoglu 2008;Zhang and Wang 2013;Wills et al. 2019), but the AMV has also been attributed to changes in horizontal gyre circulations, particularly in the extratropical part of the AMV (Williams et al. 2014(Williams et al. , 2015Piecuch et al. 2017). The study by Clement et al. (2015) showed that similar North Atlantic SST variability is exhibited in climate models with a slab ocean model that does not feature ocean circulation variability. However, other studies have shown that ocean circulation is an important factor in reproducing the observed characteristics in the extratropics; the decadal SSTs in the extratropical North Atlantic are correlated with upward turbulent heat fluxes in observations and fully coupled models but not in slab ocean models (Gulev et al. 2013;O'Reilly et al. 2016;Zhang et al. 2016;Drews and Greatbatch 2016). Moreover, the observed influence of the AMV on surrounding continental regions, and particularly over Europe, is not reproduced in models without ocean circulation variability (O'Reilly et al. 2016;Sun et al. 2019).
Contrary to studies citing the importance of ocean circulation variability, several studies have demonstrated that changes in external forcing could be responsible for the observed AMV. Booth et al. (2012) demonstrated that, in a particular climate model, the indirect radiative effect of anthropogenic aerosols and their effect on downward shortwave radiation was responsible for much of the AMV over the twentieth century. However,  demonstrated that this particular simulation failed to capture some key features associated with the observed AMV, such as the upper-ocean heat content anomalies and the covariation of surface salinity anomalies (also see . Nonetheless, models in phase 5 of the Coupled Model Intercomparison Project (CMIP5) historical ensemble, the same model generation as the single model study of Booth et al., do have a substantial externally forced component of the AMV (Murphy et al. 2017). In the CESM ''Last Millennium Ensemble'' simulations, Bellomo et al. (2018) showed that much of the externally forced AMV over the CMIP5 historical period is from anthropogenic aerosols, but with substantial contributions from decadal variations in greenhouse gas concentrations and volcanic aerosol emissions. However, the influence of external forcing does not necessarily eliminate the role of ocean circulation in the AMV and indeed signatures of both external forcing and AMOC variability were found in the study of Tandon and Kushner (2015).
In contrast to studies demonstrating that external forcing has driven the observed AMV, there is substantial evidence that the AMV is an ''internal'' mode of the climate system. For example, in studies by Li et al. (2013) and McCarthy et al. (2015) it has been demonstrated that the observed AMV over the twentieth century closely follows the accumulated signal of the North Atlantic Oscillation (NAO). The ocean circulation changes as a result of the NAO forcing on decadal time scales, influencing heat content anomalies in the North Atlantic. In a recent modeling study, Delworth et al. (2017) showed that prescribing extratropical turbulent heat flux anomalies associated with decadal NAO variability was sufficient to generate the recent observed AMV changes, through changes in the overturning circulation. Kim et al. (2018b) also show evidence that ocean dynamics have played a key role in recent AMV behavior. Similarly, Robson et al. (2012) found that surface turbulent heat fluxes were responsible for the recent period of rapid North Atlantic warming in the mid-1990s. In a preindustrial control simulation, Sun et al. (2015) showed that the accumulated NAO leads the AMV and then there is a subsequent feedback that acts to generate substantial multidecadal variability. The NAO forcing of the AMV is present in most coupled models in the CMIP5 ensemble, in both historical and longer preindustrial control simulations, though the relationship is generally weaker than that seen in observations (O'Reilly and Zanna 2018;Peings et al. 2016). Since this mechanism is seen in preindustrial simulations (with constant external radiative forcing) the AMV driven by atmospheric forcing can be considered internal to the climate system. It is likely that intrinsic oceanic variability, which is not driven by accumulated NAO forcing, makes a substantial contribution to the oceanic circulation and thereby the AMV; however, it seems that this is lower in the extratropical North Atlantic where the atmospheric-forced variability dominates (Leroux et al. 2018). We refer to the AMV forced by anomalous atmospheric circulation as the ''internal'' AMV throughout this study.
A separation of the external forced AMV from the internal AMV is difficult to calculate using observations, primarily due to the relatively short observational period. To better understand the AMV and its behavior in the past, several studies have examined the presence of AMV using proxies. Gray et al. (2004) used a sparse tree-ring dataset to produce an AMV index from the late 1500s to the present that exhibited distinct multidecadal variability in the reconstructed period. More recently, Wang et al. (2017a) produced an annually resolved AMV proxy, which revealed that some of the multidecadal variability in the North Atlantic can be attributed to the response to volcanic eruptions, similar to the results of the modeling study of Otterå et al. (2010). However, a large amount of the AMV in the Wang et al. reconstruction cannot be attributed to external forcing.
In this study, we use proxy data to investigate the internally generated AMV and the externally forced AMV components. To do so, we analyze several proxy AMV reconstructions in combination with a reconstruction of sea level pressure (SLP) over the past ;350 years. The sea level pressure reconstruction is produced using early instrumental observations over Europe (Luterbacher et al. 2001). We use the proxy AMV datasets and reconstructed SLP to examine the internal AMV generated by anomalous atmospheric forcing. In addition, we analyze a long ensemble simulation over the same period to estimate the external forcing of the AMV in the proxy datasets. The datasets and methods are described in section 2. In section 3 we demonstrate that there is a large fraction of the proxy AMV that appears to be internally driven by atmospheric circulation anomalies; we also find evidence of an externally forced fraction of the proxy AMV. Conclusions follow in section 4.

a. Proxy AMV indices
To analyze the AMV over an extended period we utilize three available proxy-based indices. The AMV in all these indices is defined as the SST averaged over the entire North Atlantic Ocean (08-708N). The first, by Wang et al. (2017a), was produced using 46 annualresolution terrestrial proxies including tree rings and ice cores that exhibit positive correlations with local temperatures and/or hydroclimate, from the Past Global Changes 2000 yr (PAGES 2k) Consortium database (Ahmed et al. 2013). Wang et al. used these proxy records to reconstruct the (nondetrended) AMV for the extended boreal summer season [May-September (MJJAS)], as many proxies are particularly sensitive to summer variability and the AMV exhibits a particularly large signal in summer temperatures (e.g., O'Reilly et al. 2017), though this is very similar to the annual AMV over the observational period. The AMV was reconstructed using a nested principal component regression method and was demonstrated to be well validated using a sliding window approach. It was also shown to be insensitive to the various changes to the proxy composition and nesting method. The proxy reconstruction method was also validated using pseudoproxy climate experiments, where artificial proxies in a climate model are produced at the locations of the real proxies and are used to reconstruct the model AMV (e.g., Smerdon 2012). The Wang et al. proxy AMV index will be referred to as AMV-Wang herein.
The second proxy AMV index we analyze is from the study of Mann et al. (2009). In their study, Mann et al. used climate field reconstruction to produce a dataset of annual mean gridded temperatures. This temperature reconstruction is actually only resolved on decadal time scales, from which areal averages are taken to produce an AMV index. This dataset was produced using a set of proxies with global coverage, some of which were also used to produce AMV-Wang. However, the decadal resolution of this dataset reduces the degrees of freedom available over the instrumental period, which therefore reduces the confidence levels of the calibration and verification. In spite of these possible limitations, we will analyze the Mann et al. AMV index (AMV-Mann) since it provides an additional estimate over the proxy period.
The final proxy AMV index we will analyze is from the study of Gray et al. (2004). This index was produced using a sparse dataset of only 12 tree rings, with annual resolution, which are completely independent of the proxy datasets used to produce AMV-Wang and AMV-Mann. Gray et al. used a principal component reconstruction method to produce their AMV index, targeting detrended annual mean temperatures over the North Atlantic. However, the proxies were selected from a large tree-ring database based on the magnitude of their correlation (i.e., jrj . 0.25) with the observed AMV index over the instrumental period, rather than local temperatures as in the AMV-Wang and AMV-Mann reconstructions. Selecting the tree-ring proxies to include in the principle component reconstruction based on the observed AMV in this way leads to a selection bias that is likely to substantially overestimate the accuracy of the reconstruction (Miller 1984). For example, in the Gray et al. reconstruction, tree rings that are uncorrelated with local temperatures or precipitation but by chance have a reasonable correlation with the AMV would be included in the reconstruction. Despite the flaws in the proxy selection criteria, we nonetheless include the Gray et al. AMV index in our analysis (AMV-Gray). AMV-Gray is provided in terms of standard deviation, so is scaled by the instrumental AMV standard deviation (s 5 0.19 K) to be comparable with the other AMV indices.
We analyze the AMV-Wang and AMV-Mann indices over the period 1659-1999, the common period with the NAO index (described in the following subsection), and the AMV-Gray index over the period 1659-1990 since AMV-Gray is not available after 1990. We will present analysis for all three indices but have the most confidence in AMV-Wang, for the reasons outlined above. In addition, we analyze the AMV index from observations, which is the nondetrended North Atlantic SST (08-708N) from the Kaplan SST dataset (downloaded from https://www.esrl.noaa.gov/psd/data/timeseries/AMO/), this is used over the period 1856-1999. The three proxy AMV reconstructions disagree substantially in some parts or the analysis period. While effort is made as to the accuracy of the reconstructions in the source literature (Gray et al. 2004;Mann et al. 2009;Wang et al. 2017a), it is not clear that these uncertainties will be constant going back in time. The AMV reconstructions are considered to be more uncertain than the NAO index calculated from a sea level pressure reconstruction (see next section), since this is produced using instrumental data from barometers among other measurements.

b. NAO index from early instrumental data
To investigate the internally generated AMV in the proxy datasets we analyze the NAO, which is the dominant mode of large-scale atmospheric variability in the North Atlantic sector. To calculate the NAO index, we use the mean SLP reconstruction produced by Luterbacher et al. (2002). This dataset is a monthly 58 3 58 gridded SLP dataset over Europe and the eastern North Atlantic (308W-408E; 308-708N) and is available from 1659 onward. The SLP reconstruction was produced primarily using early instrumental station time series of pressure, temperature, and precipitation. The station pressure time series are the most important input data for the SLP reconstructions owing to the large spatial scale of monthly SLP anomalies. The number of predictors in the reconstruction increases substantially from 1755 onward, such that the quality of the reconstruction is not constant throughout the dataset. Numerous other proxy NAO reconstructions are available that extend the NAO back for up to 1000 years, though there is a large uncertainty and spread across these proxy indices (Pinto and Raible 2012), so for this reason we only analyze the NAO derived from the instrumental-based SLP reconstruction.
The NAO time series is calculated as the SLP difference between two boxes, one over Iceland and one over the Azores (shown in Figs. 2a,b), with each box consisting of four grid points, following Luterbacher et al. (2001). We calculate the NAO index for all months, normalizing over the entire monthly time series, such that there is more NAO variability in the winter months than in the summer months. This means that the winter months contribute more to the annual mean indices but the contribution from summer months is not neglected. The annual mean NAO index is computed from the monthly NAO index.
The NAO was calculated using this traditional gridpoint method rather than an empirical orthogonal function (EOF) method because the reconstructed SLP dataset is only provided over a limited spatial domain (308W-408E; 308-708N). We tested computing the NAO using the EOF method over this limited domain and compared this with the (EOF based) NAO index from Hurrell et al. (2003) from 1899 to 1999. 1 The gridpoint NAO exhibited a substantially higher correlation with the Hurrell NAO index than the NAO calculated from the EOF method (r 5 0.88 and r 5 0.77, respectively); therefore, the gridpoint-based NAO seems to give a better representation of the observed large-scale NAO.

c. LME simulations
To complement the proxy and instrumental data we also analyze data from the CESM Last Millennium Ensemble (LME; Otto-Bliesner et al. 2016). The LME was produced using the CESM climate model, running with 13 ensemble members from 850 to 2005. Each ensemble member is forced with the same transient evolution of solar intensity, volcanic emissions, greenhouse gases (GHG), ozone/aerosols, land-use conditions, and orbital parameters in an attempt to simulate the evolution of the climate system over the last millennium. The LME was run with a resolution of about 28 3 28 resolution in the atmosphere and about 18 3 18 resolution in the ocean. The 13-member ensemble means that the ensemble mean evolution of variables in the LME is expected to provide a good estimate of the externally forced mode response. In addition to the 13-member ensemble performed with all of the forcings, smaller ensembles (from 3 to 5 members) were performed with individual forcing components separately. Here we will analyze simulations with solar variability, volcanic emissions, GHGs, and ozone/aerosol forcings in addition to the full ensemble.
The AMV indices from the LME (AMV-LME) were calculated as the areal average of the annual mean SST anomalies over the North Atlantic (i.e., 08-708N), as for the observed indices. The NAO indices from the LME are calculated using the SLP difference over the same boxes as those used for the NAO index from the early instrumental data. The NAO indices are calculated for each ensemble member separately. The NAO calculated from the LME has the typical NAO pattern of anomalies with low pressure anomalies in the northern part of the North Atlantic basin and high pressure anomalies to the south, which is similar to that seen in the early instrumental NAO index (shown in Fig. S1 in the online supplemental material). The LME data is analyzed over 1659-1999 to compare with the proxy AMV indices and the instrumental NAO reconstruction.

d. CMIP5 historical simulations
In addition to the LME, we also analyze data from the historical simulations in the CMIP5 archive (Taylor et al. 2012). The CMIP5 historical ensemble is useful for putting the results from the LME into the context of other climate models. The first ensemble member for the 43 different models in which the historical simulations cover the period 1860-1999 is used. Similar to the LME, the CMIP5 historical simulations include the transient evolution of external forcing factors including solar intensity, volcanic emissions, GHGs, and ozone/ aerosols. AMV indices and NAO indices were calculated in the same way as for the LME.

e. Significance tests
To test the significance of the correlation values we used a Monte Carlo phase randomization techniquefollowing, for example, Ebisuzaki (1997). In this technique, 10 000 dummy time series are produced from one of the input time series that have the same spectral properties but are uncorrelated. These dummy time series are produced by taking the Fourier transform, then randomizing the phase of each component but retaining the amplitude-the inverse Fourier transform is then taken to give the dummy time series. These 10 000 dummy time series have a zero-mean correlation with the target time series. The significance is calculated as the percentage of the dummy time series that have a correlation with a lower amplitude than the original correlation between the input time series.

Internal AMV in proxy and early instrumental data
We begin our analysis by examining the presence of internal AMV (i.e., NAO-driven AMV) in the proxy indices through comparison with the evolution of the NAO. The annual mean NAO index, calculated from the instrumental data, is shown in Fig. 1a. The NAO reconstruction clearly has more interannual variance after about 1855-the period when a much larger number of station indices are included. However, the NAO index exhibits substantial variability on longer time scales throughout the reconstruction (i.e., thick black line, Fig. 1a). Li et al. (2013) showed that over the observational period, low-frequency AMV changes are proportional to the low-pass-filtered NAO (i.e., NAO): This can equivalently be written in the integral form and several studies have shown that the AMV is proportional to the accumulated NAO (e.g., McCarthy et al. 2015;Delworth et al. 2017;Sun et al. 2019): where NAO int is the accumulated NAO forcing and t 0 is the time at the start of the NAO record (i.e., 1659). Here we investigate the presence of internally generated AMV going further back in time using the proxy datasets. The integrated NAO index (NAO int ) and the low-pass proxy AMV indices (calculated using a 7-yr moving average) are shown in Fig. 1b. From visual inspection it is clear that the AMV indices generally follow the integrated NAO, and this is particularly clear for the AMV-Wang index. It should be noted that the NAO accumulation precedes the AMV in this analysis, so covariability between the integrated NAO and the AMV does not occur due to the coincident influence of the NAO on North Atlantic temperatures and the associated proxies. The correlation between the integrated NAO and the AMV indices is shown explicitly in Fig. 1c, over 100-yr moving windows as well as for the full period , early period (1659-1855), and late period . This late period represents the instrumental period over which the proxy indices were calibrated. Over the full period, the integrated NAO is most highly correlated with AMV-Wang (r 5 0.65) but also exhibits significantly positive correlations with AMV-Mann and AMV-Gray (both r 5 0.51). Much of this positive correlation is from the late period, during which there is a significantly positive relationship between the observed AMV index and the integrated NAO (r 5 0.67), which is spanned by the three AMV indices. However, there is also a clear relation between the integrated NAO and AMV-Wang in the early period (r 5 0.56), which is entirely based on the proxy data. The other two proxy AMV indices have nonsignificant but positive correlations with the integrated NAO forcing.
The AMV-Wang index shows a particularly clear relationship with the integrated NAO forcing from around 1750 or so to the start of the instrumental period in 1856.
During this period the AMV-Wang index is entirely outside the instrumental calibration period and is only determined by the proxy data; remarkably, the relationship with the integrated NAO is even stronger than during the instrumental period (Fig. 1c). From 1755 onward, the significantly more predictors go into the instrumental SLP reconstruction Luterbacher et al. (2002) and this extended period (i.e., 1755-1999) exhibits the stronger correlation between the integrated NAO and the AMV that in the late period. By comparing the integrated NAO with the AMV, we are implicitly testing the forcing of the AMV by the largescale atmospheric circulation changes. To test this more simply we can compare the AMV with the preceding NAO anomaly over a given period-this is shown for varying periods in Fig. 2a. The preceding NAO anomaly is positively correlated with the AMV in all the proxy datasets, peaking at around 30-35 years, and is largest in AMV-Wang. Correlation of the AMV with the gridpoint SLP anomaly over the preceding 30 years is shown in Fig. 2b. The SLP pattern clearly corresponds to the positive phase of the NAO, with a particularly strong dependence on the strength of the Icelandic low, similar to that seen over the modern instrumental period and in CMIP5 models (O'Reilly and Zanna 2018;Peings et al. 2016).
The relationship between the early instrumental NAO index and the proxy AMV is consistent with there being a substantial component of the AMV that reflects the response of the North Atlantic to large-scale atmospheric forcing. This is particularly the case in AMV-Wang, the most modern and well verified of the proxy AMV indices, for which the accumulated NAO forcing can explain over 40% of the variance (i.e., r 2 5 0.42). We now turn our attention to the externally forced AMV component of the proxy AMV.

Externally forced AMV in the Last Millennium Ensemble and proxy indices
To assess the externally forced component of the AMV, we analyze the AMV from the LME simulations.
The AMV indices from each member of the LME, as well as the ensemble mean, are shown in Fig. 3a. The ensemble mean AMV can be interpreted as the externally forced component of the AMV in the LME. We compare the externally forced AMV in the LME with the proxy AMV indices to estimate the externally forced component. This estimate of the externally forced component in the proxy AMV indices can perhaps be considered a lower bound on the externally forced AMV due to uncertainties associated with the prescribed external forcings and errors in the model response to the external forcing. However, the component of the proxy AMV indices that covaries with the LME can be interpreted as an externally forced component of the proxy AMV with a reasonable degree of confidence.
The proxy AMV indices are shown alongside the AMV-LME indices in Fig. 3a, all plotted using a 7-yr moving average filter. While there is relatively little similarity between the proxy AMV indices and AMV-LME over the whole period, there are certain periods during which there is clear covariability, such as the early 1800s and periods of the twentieth century. The correlation between the AMV-LME ensemble mean and the proxy AMV indices are shown in Fig. 3b for all filter lengths from 1 to 20 years. The correlation is significant at the 10% level for all proxy AMV indices and is largest for filter lengths of around 7 years, peaking at r 5 0.37 for AMV-Wang. This suggests that a significant portion of the AMV proxies, over 10% of the explained variance (i.e., r 2 5 0.13), is externally forced. Correlations between the AMV-LME and the proxy AMV  2. (a) Correlation between preceding NAO anomaly (i.e., averaged from t 2 N to t 2 1 years) and the following 7-yr AMV anomaly (i.e., averaged from t to t 1 6 years). The red, orange, and green lines in (a) show the correlation for the proxy AMV indices and the instrumental NAO index; the solid blue line shows the correlation averaged over all 13 ensemble members of the LME and the shading shows the interquartile range of the ensemble member correlations; the dotted blue line shows the correlation calculated from 1155 years of a single control simulation of LME model without external forcing. The filled circles indicate where the proxy AMV and instrumental NAO correlation is larger than at least 12 of the 13 LME ensemble members. indices in the early period (i.e., outside of the instrumental calibration period) are shown in Fig. 3c. Similar and significant correlation values are seen in this early period, peaking at r 5 0.38 with a 7-yr moving average filter for AMV-Wang. This confirms that the externally forced component in the proxy AMV occurs in the preindustrial period and corroborates the similar conclusions in Wang et al. (2017a), which were made using a simple statistical model of the externally forced AMV component.
In the late period, during which instrumental data are available, the correlation between AMV-LME and the observed AMV is slightly higher but is not significant at the 10% level and the proxy AMV indices exhibit similar behavior (Fig. 3d). As the length of the low-pass filter increases, the correlation between the observed AMV and AMV-LME increases, with r . 0.5 for a 20-yr moving average, although it is not significant at the 10% level. At first glance, the low level of significance of the late period may seem odd but this period is shorter and exhibits fewer degrees of freedom than in the early period, which is reflected in the low level of significance. Also shown in Fig. 3c is the correlation between the observed AMV and the ensemble mean AMV from the CMIP5 historical simulations (pink line in Fig. 3a). The correlation between the CMIP5 historical AMV and the observed AMV over the late period is very similar to the AMV-LME and increases with filter length. These correlation values are slightly lower values than those reported in the recent studies by Bellomo et al. (2018) for the LME and Murphy et al. (2017), which reported correlation values of around r ' 0.7 for the corresponding ensemble means. This could be due to the slightly different periods used in the analysis, the different filtering methods used or the linear detrending that is performed in both of these previous studies, which we do not apply to any of the indices in our analysis. It therefore seems likely that some fraction of the observed AMV over the later period is externally forced, but the exact amount is very uncertain. Moreover, as the correlation increases with the filter length, it could be that there is some aliasing between external forced temperature signals in the model and internal AMV variability in the observations over the late period.

Internal AMV in the LME
We now examine the internal AMV present in the LME to compare with the external component. Figure 4 shows the integrated NAO in the LME and the AMV-LME, analogous to the proxy AMV analysis in Fig. 1. The correlation between the integrated NAO and the AMV-LME is shown for each ensemble member over 100-yr windows in Fig. 4c, along with the mean correlation over all 13 ensemble members. It is immediately obvious that there is considerable spread across the ensemble members, though the LME typically demonstrates a small positive correlation throughout the analysis period. The light blue dots on the right-hand side of Fig. 4c show the correlation between the integrated NAO and AMV-LME, which is positive for all ensemble members with an ensemble mean of r 5 0.21, substantially less than that found for the proxy AMV indices (shown as colored crosses in Fig. 4c). Over the late period, there is a marginally stronger relationship between the integrated NAO and the AMV-LME (i.e., brown dots in Fig. 4c), though in all ensemble members the relationship is weaker than seen in observations (i.e., black cross in Fig. 4c). Across the CMIP5 ensemble there is clearly a much larger spread in the relationship between the integrated NAO and AMV, shown by the pink dots in Fig. 4c; while the CMIP5 ensemble mean is positive it is very small, and many models even exhibit a negative correlation. Only 2 of the 38 models in the CMIP5 ensemble exhibit a correlation between the integrated NAO and AMV as strong as that seen in observations. The relationships between the integrated NAO and AMV in the LME ensemble members over the late period are mostly within the upper half of the CMIP5 ensemble, so is certainly comparable with most of the CMIP5 models.
To further investigate how the AMV is forced by the NAO in the LME we now compare the AMV with the preceding NAO anomaly over a given period, which is shown in Fig. 2a. At lead times of between 10 and 50 years the correlation between the preceding NAO anomaly and the AMV averaged over the 13 ensemble members is small but positive; however, there is substantial spread across the 13 ensemble members. At lead times of 25 years and longer, the positive correlation between the early instrumental NAO anomaly and AMV-Wang becomes substantially larger than in the LME. This indicates that at these long lead times, the LME struggles to capture the stronger relationship between the NAO and AMV, which is also clear in the gridpoint correlation map in Fig. 2c. The difference in the fraction of NAO-forced variability in the model and proxy data may be due to the difference in the magnitude of the NAO variability on decadal time scales. The standard deviation of the decadal early instrumental NAO reconstruction is s 5 0.36 (using a 10-yr moving average), whereas the standard deviation of the decadal NAO in the 13 members of the LME are all in the range s LME 5 0.13-0.20. Therefore, the lack of NAO-driven AMV in the LME may be due to a lack of decadal NAO variability. Across the 13 ensemble members, those that have the highest decadal NAO variability tend to be those that have a stronger relationship between the NAO and AMV (Fig. S2), though the correlation is not clearly significant. The lack of decadal NAO variability in the LME is a typical feature of the current generation of coupled climate models, as demonstrated in the recent study of Bracegirdle et al. (2018).
To examine whether or not the presence of the external forcing influencing the link between the NAO and the AMV in the model, we performed the same analysis in the single-member 1155-yr LME control simulation, which has no external forcing (dashed line in Fig. 4a). The LME control simulation exhibits a very similar relationship between the NAO and the AMV as in the full LME, demonstrating that in the model this relationship is too weak on long time scales, even in the absence of external forcing. A similarly weak relationship is also found in the CMIP5 historical simulations over the recent observational period, as demonstrated in O'Reilly and Zanna (2018), so the LME is not unusual in this regard (see also Peings et al. 2016).

Assessing sources of externally forced AMV
Having established that there is a clear externally forced AMV component in the LME, which is to a lesser degree apparent in the proxy AMV indices, we now investigate the source of the externally forced AMV. To do so, we make use of the individual forcing experiments that were performed alongside the full LME: volcanic forcing (5 members), solar forcing (4 members), GHG forcing (3 members), and aerosol and ozone forcing (3 members). These experiments were all performed over the full analysis period except the aerosol and ozone forcing experiment, which was only run for the industrial period, from 1850 onward. Early and late periods are defined, as in the earlier analysis, as 1669-1855 and 1856-1999, respectively. The ensemble mean AMV from each of the individual forcing experiments is shown in Fig. 5a, along with the ensemble mean AMV-LME (i.e., the full forcing experiment). Much of the variability in the AMV-LME is captured by the volcanic forcing experiment, particularly in the early part of the analysis period, yet FIG. 4. (a) Annual NAO indices from the 13 LME members in gray; the ensemble mean is shown in the bold black line. (b) Integrated NAO indices from each LME member (gray) along with the AMV indices from each LME member (blue)-the interquartile range of the individual ensemble members is shaded and the mean shown by the line. (c) Correlation between the integrated NAO and AMV index for each LME member over running 100-yr windows (plotted at the center of the 100-yr window) with the ensemble mean shown in dark blue. Also shown in (c) are the correlation between the integrated NAO and AMV from the LME over the whole period (light blue dots; ensemble mean in dark blue), over the late period (light brown dots; ensemble mean in dark brown), and the correlation between the integrated NAO and AMV in each CMIP5 historical simulation (light pink dots; ensemble mean in dark pink). The crosses in (c) show the correlation of the integrated instrumental NAO (i.e., Fig. 1) with the AMV indices over the full period of observations (black), Wang et al. proxy (red), Mann et al. proxy (yellow), and Gray et al. proxy (green). similarity with the other forcing experiments is less clear. To quantify the similarity between the individual forcing experiments and the AMV-LME, we calculated the correlations between them over 100-yr moving windows, which are shown in Fig. 5b. Over the early period, the volcanic forcing AMV is highly correlated with the AMV-LME, whereas the GHG forcing AMV and the solar forcing AMV exhibit approximately no correlation and negative correlation, respectively. In the late period, when all four individual forcing experiments are available, there are positive correlations between all the experiments and the AMV-LME, indicating that all four of these forcing factors could be influencing the external AMV over the late period. The volcanic and solar forcing experiments exhibit the strongest correlations, the aerosol forcing also exhibits significantly positive correlation with the AMV-LMV but the correlation with the GHG forcing experiment is not significant.
Similarly, we can compare the AMV from the individual forcing experiments with the proxy AMV indices and correlations over moving 100-yr windows with AMV-Wang are shown in Fig. 5c. The volcanic forcing AMV exhibits the largest correlation with AMV-Wang in the early period and exhibits similar correlations with AMV-Wang as the AMV-LME (shown in blue). The GHG forcing AMV is uncorrelated with AMV-Wang in the early period and the solar forcing AMV is significantly negatively correlated with AMV-Wang. This suggests that either the solar variability was not important for past changes in the AMV or the model is incorrectly capturing these mechanisms through which solar variability influences the AMV, though it could be that the solar forcing acts to dampen the influence of the volcanic forcing on the AMV. Over the late period there are positive correlations with AMV-Wang in all of the individual forcing experiments; however, there are only significant correlations with the volcanic, solar, and aerosol forcing experiments. Moreover, it is noticeable that none of the individual forcing AMVs are as closely correlated with AMV-Wang as the ensemble mean AMV-LME, which suggests that several of the different forcings may be contributing to the externally forced component of the observed AMV.
To estimate the relative importance of each of the external forcing components, we performed a multiple linear regression (MLR) analysis by fitting the individual forcing experiment AMV indices to the AMV-LME and AMV-Wang. All ensemble mean indices were normalized prior to the analysis and the normalized regression coefficients are plotted for the early and late periods in Figs. 5d and 5e, respectively. Uncertainty estimates for the regression coefficients were calculated using block bootstrapping, using a block length of 20 years (e.g., Wilks 1997). The predictors are mostly uncorrelated except for the GHG and aerosol forcing AMV indices in the late period (r 5 20.66), where both exhibit substantial trends, so this may be expected to increase the uncertainty in the regression coefficients. The AMV-LME during the early period is dominated by the volcanic forcing component, whereas the solar forcing and GHG forcing make no significant contribution. The volcanic forcing also dominates the MLR fit to AMV-Wang, although the regression coefficient is much smaller than in AMV-LME. In the late period, the MLR suggests that all of the forcing components contribute somewhat to the AMV-LME, though there is large uncertainty in the relative contributions from each forcing. The MLR fit to AMV-Wang in the late period also has contributions from all the individual forcing components, with the exception of the volcanic forcing, which makes a very small contribution to the MLR fit.
7. Is the AMV-LME responding appropriately to external forcing?
Our analysis to this point has shown that there is a substantial externally forced AMV component in the LME (i.e., Fig. 3, which can explain a significant but modest fraction of the proxy AMV variance. Subsequent analysis of the internal AMV, specifically that driven by changes in the large-scale NAO forcing, showed that the LME tends to underrepresent this component of the AMV (i.e., Fig. 4). This suggests that the AMV in the LME may be responding too strongly to the external forcing. In this section we will attempt to assess how appropriately the AMV-LME responds to external forcing and use this analysis to interpret the externally forced AMV in the proxy indices.
To investigate whether the externally forced AMV is too strong in the LME, we assess the externally forced or predictable signal in comparison to internal variability in each ensemble member. To do this, we first calculate the perfect model correlation of AMV-LME, which is simply the correlation of each ensemble member of the LME with the ensemble mean. The perfect model correlation of the AMV-LME is shown for a varying number of ensemble members during the full, early, and late periods in Fig. 6 (solid blue lines). The perfect model correlation can be compared with the actual correlation between the AMV-LME with the proxy and observed AMV indices, shown in Fig. 6 for AMV-Wang. In the full, early, and late periods, the perfect model correlation is almost 0.7 with all 13 ensemble members and is significantly larger than the actual correlation with AMV-Wang and the observed AMV (during the late period). This suggests that the externally forced signal in the AMV-LME may be too large. This can be quantified by calculating the ratio of predictable components (RPC), which is the ratio of the predictable component in the real world to the predictable component (or signal to noise) in the model and has previously been used in studies of seasonal/decadal predictability (e.g., Eade et al. 2014;Scaife and Smith 2018;O'Reilly et al. 2019). Over the whole period, the RPC is significantly less than 1 (see Fig. 6), which indicates that the predictable signal in the LME is larger than the signal that is captured in the proxy/observed AMV.
If we were able to assume that the external forcings prescribed in the LME were a perfect replica of the real world forcing over the analysis period and that the proxy AMV indices are trustworthy, then we would be able to state with confidence that the AMV in the model is responding too strongly to external forcing. Of course, this assumption is not well justified, as many of the prescribed external forcings are estimates that are very uncertain. For example, while there is a general consensus on timings of large volcanic eruptions over the past 400 years, there is considerable uncertainty as to the amount and type of volcanic aerosols emitted (e.g., Crosweller et al. 2012). Nonetheless, over the recent analysis period-when the external forcing factors are expected to best constrained and we can compare with the observed AMV-the LME still appears to respond too strongly to the external forcing. This is certainly consistent with the AMV-LME not having a large enough internal AMV component, that is, responding too weakly to preceding NAO forcing (e.g., Figs. 2 and 4).
To this point, we have treated the externally forced AMV as distinct from the internal AMV that is associated with preceding large-scale circulation anomalies. However, previous studies have suggested that external forcing, particularly from volcanic aerosol emissions, can influence the phase of the NAO (e.g., Ortega et al. 2015). Therefore, it is possible that at least some of the NAO signal that leads the AMV in the observations and proxy records is externally forced. To examine this, we compare the externally forced (ensemble mean) AMV-LME with the integrated NAO from the early instrumental NAO index, which we previously found was able to explain over 40% of the variance in the AMV-Wang proxy index. These indices are plotted again in Fig. 7a, along with the ensemble mean integrated NAO index from the LME and the correlation coefficients calculated between the integrated NAO indices and the AMV-LME. As discussed above, the correlation between the integrated NAO in the LME and the AMV-LME is positive but very low (r 5 0.18). However, the AMV-LME is much more closely related to the integrated NAO from the instrumental SLP reconstruction, with a correlation of r 5 0.47 and a 90% confidence interval of r 5 0.34-0.56 (calculated by randomly sampling across the 13 ensemble members). Since the ensemble mean AMV-LME is the externally forced component of the AMV in the model, the reasonably large correlation with the integrated NAO from the early instrumental data indicates that a significant fraction of this may be externally forced.
We now test whether the externally forced component identified in the proxy AMV occurs through the influence of the NAO. To test this, in Fig. 7b we again compare the AMV-LME with AMV-Wang, but here we have (linearly) regressed out the integrated NAO influence. The correlation between AMV-LME and the AMV-Wang was r 5 0.37 prior to the regression (e.g., Fig. 3); however, after regressing out the integrated NAO from AMV-Wang there is practically zero correlation between the indices (r 5 0.07). Therefore, the component of AMV-Wang that was identified as being externally forced in section 3b (i.e., Fig. 4) seems to be wholly associated with the integrated NAO. It is therefore interesting that in the LME the externally forced AMV is not as strongly associated with the integrated NAO, suggesting that the external forcing is exerting a greater influence on the large-scale circulation in observations than in the LME. In the LME, the externally forced AMV is seemingly generated more directly through changes in radiative forcing, as argued in Bellomo et al. (2018). It appears, therefore, that while some of the externally forced AMV is correct, the pathway through which the external forcing influences the AMV is incorrect.
Detailed examination of the pathways through which external forcing is influencing the AMV is a difficult task given the relative paucity of observations. Nonetheless, we can try to examine the response of the model and observations/proxies to volcanic forcing, which was found to be the dominant individual forcing factor in section 3d. We perform a so-called superposed epoch analysis on the five explosive eruptions that occurred during our analysis period and are common to three alternative volcanic reconstructions, based on Swingedouw  Fig. 1b; ensemble mean of the integrated NAO indices from the LME (dark gray); ensemble mean AMV from the LME (in blue). The magenta squares in (a) indicate the five large volcanic eruptions between 1659 and 1999, following Swingedouw et al. (2017). The correlation between the integrated NAO indices and the AMV from the LME are also shown in (a), along with the 90% uncertainty range based on ensemble subsampling of the LME. (b) Proxy AMV (from Wang et al.) and the AMV index with the integrated NAO index from observations regressed out; these indices are essentially uncorrelated. et al. (2017), and these are indicated in Fig. 7a. We focus on the most explosive eruptions because the response to more minor eruptions can be difficult to detect as they do not necessarily inject substantial amounts of aerosol into the stratosphere. The LME uses the Gao et al. (2008) reconstruction, which was also used to identify the large eruptions, so it is reasonable to also perform this analysis for these same events in the model. Figure 8a shows the average response of the NAO in the winters following the large volcanic eruptions.
In the early instrumental NAO, there is a positive NAO response in all 4 years following the eruption. Of course, the response is fairly noisy because it is only averaged over five eruptions. Nonetheless, this is consistent to previous observational and modeling studies that have found that the NAO is typically positive in the years following an explosive volcanic eruption (e.g., Graft et al. 1993;Shindell et al. 2004;Ortega et al. 2015;Swingedouw et al. 2017). In the LME, however, there is no consistent NAO response to the major volcanic eruptions, which indicates that the LME is missing the dynamical response of the large-scale circulation to the external volcanic forcing. In terms of integrated NAO response, the absence of a consistent response in the LME means there is no signal in the integrated NAO following the major volcanic eruptions (Fig. 8b); whereas in the early instrumental NAO index, the integrated response typically increases and becomes significant between 5 and 10 years following the eruption. Despite the differences in the NAO response to the eruptions, however, the impact of the radiative forcing in the LME is enough to give a similar AMV response to the proxy AMV, which consists of a cooling for the years following the eruption (Fig. 8c). This simple analysis demonstrates that, for these five major eruptions, the LME is apparently missing the dynamical response in the large-scale circulation, but the radiative effects are able to produce a similar AMV response, in both sign and magnitude. It is therefore possible that the dynamical response to other external forcings, such as solar variability (e.g., Gray et al. 2010;Ineson et al. 2011), is not captured by the model but that some of the externally forced AMV variability is being realized in the model through radiative effects alone.

Further discussion and conclusions
In this study we have analyzed sources of AMV over a period of more than 300 years, using proxy indices and early instrumental data, along with a complementary ensemble climate model simulation over the same period. The proxy AMV is found to closely follow the accumulated NAO forcing over almost the entire analysis period, referred to as an ''internal'' source of AMV as it can be expected to occur to some extent in the absence of external forcing. The dependence of the AMV on the accumulated NAO forcing has previously been demonstrated over the observational period, the last 150 years or so, but here we show that this is also evident over a significantly longer period (i.e., Fig. 1). These results provide additional observational evidence that much of the variance in the AMV is driven by the oceanic response to the buildup of surface atmospheric forcing, as has been demonstrated in targeted modeling studies (e.g., Delworth et al. 2017).
The AMV response to the accumulated NAO forcing is found to be present in the LME simulations, though it is much weaker than that seen in the proxy/observational datasets. Similar behavior is seen in the majority of the CMIP5 historical simulations so, at least in this respect, the behavior of the LME is fairly representative of other models. The weak AMV response to the accumulated NAO forcing is particularly clear on time scales greater than a decade in the LME, which was also found FIG. 8. Superposed epoch analysis of climate impact following the five largest volcanic eruptions for (a) NAO index, (b) integrated NAO, and (c) AMV index. Anomalies are shown relative to the 5-yr period prior to the eruption for the reconstructed NAO (in gray), the proxy AMV from Wang et al. (in red), and from the LME (in blue). In (a)-(c) the shading indicates the standard error of the mean; the dots indicate where this is significantly different from zero at the 90% confidence level.
to be the case in the preindustrial control and historical CMIP5 simulations in O' Reilly and Zanna (2018). This could be due to a weaker-than-expected response of the ocean circulation to the accumulated NAO forcing (Xu et al. 2019). Alternatively, the weak response to the NAO could be due to the relatively weak NAO variability on multidecadal time scales. In the LME, the decadal NAO variability is significantly lower than in the NAO reconstruction, with s LME 5 0.13-0.20 in the LME compared with s 5 0.36 for the early instrumental NAO reconstruction. Kim et al. (2018a) showed that the weak multidecadal NAO variability is likely responsible for the weak AMV amplitude in the CESM large ensemble, which employs a very similar model setup to the (CESM) LME analyzed here. Weak multidecadal NAO variability has been shown to be a robust feature of the North Atlantic jet stream across almost all the CMIP5 models in the recent studies by Simpson et al. (2018) and Bracegirdle et al. (2018); see also Woollings et al. (2015). Therefore, the lower magnitude of the decadal NAO forcing may explain the weak response of the AMV to the accumulated NAO in the CMIP5 ensemble (e.g., Fig. 4c). Moreover, the weak multidecadal NAO forcing may therefore explain why the amplitude of the AMV is too low on multidecadal time scales in the CMIP5 ensemble (Peings et al. 2016), as highlighted by the study of Wang et al. (2017b). The misrepresentation of the multidecadal NAO forcing on the AMV clearly has implications for modeling past changes in the North Atlantic but it also has important implications for modeling the contributions of internal multidecadal variability in transient climate projections. Models that are missing multidecadal NAO variability are likely to underestimate the amplitude of the AMV and the internal multidecadal variability in the continental regions surrounding the North Atlantic, including Europe and the Sahel.
It is possible that the weak decadal variability in the LME is related to a weak response of the NAO to the AMV. In observations, a negative NAO on decadal time scales is found to follow a positive AMV anomaly (e.g., Peings and Magnusdottir 2014). Lag correlations between the 7-yr low-pass NAO and AMV indices are shown in Fig. 9 for the proxy and LME indices. Here it is clear that the negative correlation between the AMV and the NAO leading by 10-15 years is not captured by the LME, which has weaker correlations when the AMV is leading. Sun et al. (2015) showed that in a preindustrial control simulation (using the CCSM4 model) the coupled feedback of the AMV onto the NAO at later times likely contributed to the pronounced multidecadal NAO variability in the simulation. It is likely, therefore, that the weak feedback of the AMV onto the NAO is contributing to the weak NAO variability on decadal time scales in the LME. The weak feedback from the AMV to the NAO is not unique to the CESM model analyzed here and is found to be a general feature of coupled CMIP5-era models, as shown by Peings et al. (2016) (see also Simpson et al. 2018).
The analysis of the ensemble mean AMV in the LME reveals a substantial externally forced component. The externally forced AMV exhibits a modest but significant correlation with the proxy AMV (i.e., r 2 5 0.13), which implies that at least this fraction of the proxy AMV is estimated to be externally forced. The AMV in the individual forcing simulations show that the externally forced AMV is largely driven by volcanic forcing in both the AMV-LME and proxy AMV before the midnineteenth century. In the later period, however, GHG forcing, aerosol/ozone forcing, and to a lesser extent solar forcing can explain more of the AMV in both the LME and the observational/proxy records. While the LME is able to confirm an external influence on the observed AMV, the model appears to respond too strongly to this external forcing. Or, alternatively, the AMV in the LME does not exhibit enough internal variability. This may be related to the very weak multidecadal NAO influence on the AMV in the model, FIG. 9. The red, orange, and green lines show the lag correlation between the 7-yr mean proxy AMV indices and instrumental NAO index; the solid blue line shows the median correlation of the 13 ensemble members of the LME and the shading shows the interquartile range of the ensemble member correlations; the dotted blue line shows the correlation calculated from 1155 years of a single control simulation of LME model without external forcing. The filled circles indicate where the proxy AMV and instrumental NAO correlation is larger/ smaller than at least 12 of the 13 LME ensemble members. compared with observations, which would in turn be expected to result in a weak internal AMV.
Finally, we showed that externally forced AMV identified in the proxy AMV is related to the accumulated NAO forcing. In the LME, however, very little of the externally forced AMV that arises in the model is through changes in the accumulated NAO forcing. This evidence demonstrates that the external forcing could be influencing the AMV through different mechanistic pathways: via changes in radiative forcing in the LME and via changes in atmospheric circulation in the observational/proxy record. We demonstrated this by analyzing the response to large volcanic eruptions in the model and early instrumental NAO indices. The NAO response seen in the early instrumental data is completely absent in the model, which is nonetheless able to capture the AMV response. If the model was able to better respond to external forcing then it is possible that more of the AMV could be explained by the external forcing, though this is largely speculative. The response of the large-scale atmospheric circulation to changes in external forcing is already an area of active research. However, our findings motivate further study of the impact of these circulation responses on climate variability on multidecadal time scales via interaction with the ocean; this could also be important in governing forced responses over the Pacific Ocean (i.e., the Pacific decadal oscillation) as well as over the North Atlantic as outlined in this study.