1. Introduction
Multidecadal variability over the North Atlantic has significant societal impacts because of its influence on hurricane activity, shifts in the intertropical convergence zone, summer monsoon rainfall over the Sahel and India, and summer climate over Europe and North America (Goldenberg et al. 2001; Sutton and Hodson 2005; Knight et al. 2006; Zhang and Delworth 2006). Future projections of global and regional temperatures and their associated uncertainties depend critically on whether observed variability is forced or generated internally. Forced variability refers to the response to changes in radiative forcing, which may include changes in land cover or land use, changes in the atmospheric concentration of greenhouse gases and anthropogenic aerosols, volcanic and natural aerosols, and changes in solar irradiance. Internal variability refers to variability that appears in the absence of changing external forcing, arising from dynamical instabilities and interactions within the coupled ocean–land–atmosphere system. There is no consensus about the relative importance of forced and internal variability in North Atlantic sea surface temperature (NASST). Some studies have suggested that anthropogenic aerosols are a predominant forcing (Booth et al. 2012). Others have attributed NASST variability to the effects of climate change, resulting in atmospheric changes that impact the ocean (He et al. 2022). Still others have argued that internal variability within the coupled ocean–atmosphere system, particularly the North Atlantic Oscillation (NAO) and the Atlantic meridional overturning circulation (AMOC), can contribute to decadal-scale NASST variability (Zhang et al. 2013). The relative importance of these drivers of NASST variability over time, and the interactions between them, remain an active area of research.
Following Frankignoul et al. (2017), methods for separating forced and internal variability may be classified into two categories. One category relies solely on observations to estimate external variability. For example, the externally forced response could be assumed to be a smooth function of time that can be represented by a low-order polynomial, with parameters estimated from observations. However, if internal variability generates multidecadal variability, then regressing a polynomial out of observations may introduce error by overfitting to internal variability.
Another example in this category is to use the globally averaged SST to represent the temporal evolution of externally forced variability. Trenberth and Shea (2006) argue that the externally forced component can be removed by subtracting globally averaged SST from NASST, which assumes that the forced response of the North Atlantic is the same as the global average. Deser and Phillips (2021) demonstrate that regressing the globally averaged SST out of gridpoint data outperforms the subtraction method of Trenberth and Shea (2006), correcting for spatial differences in the forced response, although potentially overfitting to internal variability. Deser and Phillips (2021) point out that this estimate of external variability is unable to capture the effect of external forcings that are unique to the North Atlantic region, such as the regional aerosol forcing implicated by Booth et al. (2012) or regional feedbacks of the climate system to external forcing that operate on a different time scale than global forcing (Armour et al. 2016). More sophisticated approaches in this category rely on assumptions about differences in time scales between the externally forced and internal variability, including locally weighted scatterplot smoothing (LOWESS) and ensemble empirical mode decomposition (EEMD) (Cleveland 1979; Wu and Huang 2009). Other time-scale-based methods include the eigenmode decomposition of linear inverse models, both with and without optimal initial conditions (Frankignoul et al. 2017), and low-frequency component analysis, which maximizes the ratio of low-frequency to total variability (Wills et al. 2019).
A second category of methods uses physics-based models, called general circulation models (GCMs), that represent Earth’s coupled ocean–atmosphere system. Because these GCMs generate their own internal variability, ensemble techniques are used to estimate the externally forced response. As larger ensemble sizes produce more accurate estimates of a single GCM’s response to external forcings, there has been a recent emphasis on producing externally forced ensembles with tens to hundreds of members that can facilitate detailed analysis on regional and decadal scales (Kay et al. 2015). In addition, comparing large ensembles from several GCMs can provide a robust basis for examining the relative influence of internal variability and model differences in simulating the observed climate (Deser et al. 2020; Jain et al. 2023).
Within the context of separating components of variability, after an ensemble average is calculated, the estimate of the externally forced response can be removed via subtraction or linear regression (Frankcombe et al. 2015; Steinman et al. 2015). More sophisticated approaches that utilize ensembles of GCM simulations, such as optimal fingerprinting (Allen and Tett 1999) and signal-to-noise maximizing EOF analysis (Ting et al. 2009; Wills et al. 2020), have also been applied. However, there is considerable debate about whether GCMs accurately represent the response to external forcings. In particular, weak observational constraints on aerosol forcings and parameterizations allow for a range of GCM behavior in response to aerosol forcing (Rotstayn et al. 2015; Hourdin et al. 2017). If the ability of GCMs to produce realistic responses to historical forcings is in doubt, conclusions from methods that rely on a simulated response to external forcing will remain unconvincing.
There is a third category of methods that avoids specifying characteristics of forced variability. Specifically, these methods focus on identifying the internal contribution to observed variability. These methods are called dynamical adjustment, an example of which is the study of Wallace et al. (1995). Wallace et al. (1995) performed a dynamical adjustment on observed Northern Hemisphere mean temperature by regressing out a single spatial pattern orthogonal to the hemispheric mean. This pattern, termed cold ocean–warm land (COWL), represented dynamically induced changes in the hemispheric mean associated with the configuration of warm and cold air masses. Removing COWL-associated variability produced a residual estimate of external variability exhibiting reduced interannual variability and enhanced autocorrelations, expected characteristics of the externally forced signal. This method avoids specifying the temporal shape or amplitude of externally forced variability but assumes that external variability is nearly spatially uniform and mostly absent in patterns that are orthogonal to the spatially uniform pattern. Wallace et al. (1995) noted that the COWL pattern may contain fingerprints of external forcing, particularly high-latitude continental warming. Other fingerprints, due to aerosols, trace gases, and changes in cloud cover, may possess distinct spatial structures that also project onto COWL. Other dynamical adjustment studies have employed sea level pressure (SLP) to estimate and remove the effect of atmospheric circulations on surface temperature (Hurrell 1996). Using SLP in dynamical adjustment has been found to account for some of the effects of internal variability on surface temperature, reducing the spread of trends between different members of a GCM ensemble (Deser et al. 2014, 2016; Sippel et al. 2019) and trend discrepancies between GCMs and observations (Saffioti et al. 2016; Wallace et al. 2012).
In the context of quantifying variability in NASST, dynamical adjustment is particularly attractive because it avoids making strong assumptions about forced variability for which there is no consensus. In this paper, we develop such an approach. Specifically, we propose a dynamical adjustment method that uses a machine learning model to estimate the internal variability of a spatially uniform pattern. The amplitude of this pattern equals the basin-mean NASST. The physics-based machine learning model estimates the characteristics of internal variability, particularly covariability between the basin mean and spatial patterns that are orthogonal to the spatially uniform pattern. This covariability is leveraged to estimate the internal contribution to basin-mean fluctuations on the basis of the temporal variance of the orthogonal patterns. SST is selected as the predictor over SLP because, while some SST variability might be related to SLP, particularly that driven by the atmosphere, other variations arising from internal ocean dynamics may be unrelated to atmospheric circulation. The proposed method extends the approach of Wallace et al. (1995), where a single orthogonal pattern was derived from observations. Specifically, our method uses a set of orthogonal patterns to estimate the internal variability, rather than relying on a single pattern. To emphasize large-scale patterns of spatial variability, we select Laplacian eigenfunctions for this study, calculated using the method of DelSole and Tippett (2015). Previous studies have demonstrated that gridpoint patterns of SST variability can be used to estimate internal variability in the North Atlantic, in particular, internal AMOC variability by DelSole and Nedza (2021) and basin-mean NASST in CESM1 by Liu et al. (2023). However, as discussed by DelSole and Nedza (2021), gridpoint patterns associated with variability can be highly model-specific.
Our use of Laplacian eigenfunctions follows DelSole and Nedza (2021), who enhanced the interpretability of their predictive model of AMOC variability, without loss of skill, by using them to represent spatial patterns of SST.
For this introductory study on the proposed dynamical adjustment methodology, our goal is to establish that internal variability in the basin-mean NASST can be estimated with a linear regression model that uses Laplacian eigenfunctions as predictors. In contrast to the observational approach of Wallace et al. (1995), the predictive coefficients for our dynamical adjustment are learned in a multimodel set of preindustrial GCM simulations, which contain only internal variability. Training the dynamical adjustment in a multimodel set of GCMs emphasizes behavior that is shared among GCMs, providing a robust basis for estimating internal variability in independent datasets. Training and validation of the dynamical adjustment model are conducted using a leave-one-model-out experimental design, whereby the dynamical adjustment is tested using data that were excluded from the training step. After training and evaluating in preindustrial simulations, the dynamical adjustment is applied to forced historical simulations.
The remainder of this paper is organized as follows: section 2 discusses the data used and its representation using Laplacian eigenfunctions. Section 3 introduces the dynamical adjustment method and how it will be evaluated, as well as the methods that are applied for comparison. Section 4 includes the results of applying the dynamical adjustment to independent preindustrial and historical GCM simulations. Results from the comparative methods and the relative performance of the dynamical adjustment are discussed. The coefficients selected for the dynamical adjustment are examined, and a reduced parameter dynamical adjustment approach is explored. Section 5 contains concluding statements and thoughts on future research directions.
2. Data
The data used in this study are NASST within the domain of 0°–60°N and 10°–70°W. Data are analyzed from 12 CMIP5 GCMs (Taylor et al. 2012). These 12 GCMs are selected based on the availability of a preindustrial simulation at least 500 years in length and a minimum of three historical ensemble members (see Table 1). The last 500 years of each preindustrial simulation are used. All historical ensemble members are included. This provides a combined multimodel preindustrial dataset of 6000 years and a total of 64 historical ensemble members. Models that do not meet the above criteria were excluded from the analysis. MIROC5 was omitted due to a known data issue.
Table 1. CMIP5 global climate models used in this study, with the length of the preindustrial simulation and the number of historical simulations for each.
Data are interpolated onto a common 2.5° × 2.5° uniform grid. Annual means are calculated from July to June to include one winter season each year. Results are not sensitive to the annual-mean window. Time series for the results shown are smoothed with an 11-yr Lanczos-weighted running mean (Duchon 1979). An 11-yr running mean is selected to emphasize the decadal-to-multidecadal time scales that are of particular interest in NASST but may smooth through the response to volcanic impulses. Other widths of the smoothing window, including 21, 5, and 1 year (annual means with no additional smoothing), were also studied. Results for other smoothing windows show quantitative differences but produce conclusions similar to those shown. Conclusions were found to be insensitive to the specific smoothing method employed. Boxcar, triangular, and Gaussian weighting schemes were also applied with only minor changes in results and no change in final conclusions.
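For illustration, the smoothing step can be sketched in a few lines. The weights below follow the standard Lanczos low-pass construction described by Duchon (1979), with the 11-yr window used above; the function names and the choice of a "valid" convolution (which drops the unsmoothable endpoints) are our own.

```python
import numpy as np

def lanczos_lowpass_weights(window, cutoff):
    """Weights for a Lanczos low-pass filter (after Duchon 1979).

    window : total number of weights (odd), e.g., 11
    cutoff : cutoff frequency in cycles per time step, e.g., 1/11
    """
    n = (window - 1) // 2
    k = np.arange(-n, n + 1)
    w = np.zeros(window)
    w[n] = 2.0 * cutoff  # central weight handled separately (k = 0)
    kk = k[k != 0]
    sigma = np.sin(np.pi * kk / n) / (np.pi * kk / n)  # Lanczos sigma factor
    w[k != 0] = np.sin(2.0 * np.pi * cutoff * kk) / (np.pi * kk) * sigma
    return w / w.sum()  # normalize so a constant series is preserved

def smooth(x, window=11, cutoff=1.0 / 11.0):
    """Apply the filter; 'valid' mode drops the endpoints that cannot be filtered."""
    return np.convolve(x, lanczos_lowpass_weights(window, cutoff), mode="valid")
```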
Preindustrial control runs are used to train the dynamical adjustment model. A third-order polynomial over the 500-yr period is removed to mitigate the potential effects of model spinup or drift. The resulting time series has zero time mean. To remove differences in variability across GCMs, preventing the dynamical adjustment model from being weighted toward high-variance GCMs, each preindustrial simulation is normalized such that the basin-mean NASST has unit variance. The normalization is performed in such a way as to maintain the variance spectrum of NASST.
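A minimal sketch of this preprocessing follows, under the assumption (ours) that maintaining the variance spectrum means rescaling all eigenfunction time series of a simulation by the single factor that gives the basin mean unit variance, so that the relative variance across spatial scales is unchanged.

```python
import numpy as np

def detrend_poly(ts, order=3):
    """Remove a least-squares polynomial of the given order (time mean included)."""
    t = np.arange(len(ts))
    coeffs = np.polyfit(t, ts, order)
    return ts - np.polyval(coeffs, t)

def normalize_simulation(psi):
    """psi : array (time, n_eigenfunctions); column 0 is the basin mean.

    Detrend each column, then scale ALL columns by one factor so the basin
    mean has unit variance, preserving the shape of the variance spectrum."""
    psi = np.apply_along_axis(detrend_poly, 0, np.asarray(psi, float))
    scale = psi[:, 0].std(ddof=1)
    return psi / scale
```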
Methods for separating forced and internal variability will be tested using historical simulations from the CMIP5 archive. Historical simulations are initialized from random states of a long GCM integration and then integrated under realistic estimates of external forcing. As such, historical simulations contain both forced and internal variability. A GCM's forced variability is estimated by the ensemble average of historical simulations from that GCM. For this study, the resulting ensemble mean defines the "true" external variability. The accuracy of an ensemble mean at representing a GCM's true response to external variability is a function of ensemble size and is related to the magnitude of internal variability (Frankcombe et al. 2018; Milinski et al. 2020). This dependency on ensemble size motivates the exclusion of GCMs with fewer than three historical ensemble members. The internal variability is calculated as the residual about the ensemble mean. The time period of July 1860–June 2005 is selected from each historical simulation. The time series are centered on each historical ensemble member individually. The variance of historical simulations is not normalized. In this study, historical simulations are only used for validation. The dynamical adjustment model is linear; therefore, the scale of the predictions is determined by the scale of the predictors.
The dynamical adjustment is applied to Extended Reconstructed SST, version 5 (ERSST.v5), a monthly global analysis compiled from observational data with missing data filled in (Huang et al. 2017a,b). In a similar fashion to the GCM data, ERSST.v5 is interpolated to the 2.5° × 2.5° uniform grid and projected onto Laplacian eigenfunctions representing the North Atlantic domain, 0°–60°N, 10°–70°W. For this study, we utilize time steps from July 1860 to June 2018. Annual-mean time series are calculated using a July–June window and then centered and smoothed with an 11-yr running mean.
Laplacian eigenfunctions
The time series for the jth Laplacian eigenfunction, denoted by ψj(t), is obtained by projecting the area-weighted NASST data onto the jth Laplacian eigenfunction [the precise procedure is discussed in DelSole and Tippett (2015)]. The time series for the first eigenfunction, a uniform pattern over the North Atlantic domain, corresponds to the spatial mean of North Atlantic SST and is a common index of North Atlantic variability. This time series will be referred to as the NASST basin mean, without any insinuation about whether its variability is dominated by internal or external forcing. The original data can be recovered from a sufficiently large set of Laplacian eigenfunctions and their associated time series. Reconstructing data from a truncated set of eigenfunctions will only recover the data down to the minimum length scale of the truncated set. We include the first J = 100 eigenfunctions, which corresponds to a minimum characteristic length of around 500 km.
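The projection and reconstruction steps can be sketched as follows. The precise procedure, including the computation of the eigenfunctions themselves, is given in DelSole and Tippett (2015); this sketch simply assumes an eigenfunction matrix whose columns are orthonormal under area weighting, so that projection is an area-weighted inner product and truncation recovers the field down to the retained length scales.

```python
import numpy as np

def project_onto_eigenfunctions(sst, eigvecs, area):
    """sst : (time, space); eigvecs : (space, J) Laplacian eigenfunctions;
    area : (space,) grid-cell area weights summing to 1.

    Returns psi : (time, J). When eigvecs[:, 0] is the uniform pattern,
    psi[:, 0] is the area-weighted basin mean."""
    return (sst * area) @ eigvecs  # area-weighted inner product per pattern

def reconstruct(psi, eigvecs):
    """Recover the field from a (possibly truncated) set of eigenfunctions,
    assuming orthonormality under the area weighting used in projection."""
    return psi @ eigvecs.T
```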
The temporal variance of each Laplacian eigenfunction is shown in Fig. 2 for the preindustrial and historical simulations of each GCM and ERSST.v5. The distribution of variance is very similar between preindustrial and historical simulations for most spatial scales. However, for the spatially uniform pattern, the first Laplacian eigenfunction, there tends to be much more variance in the historical simulations compared to the preindustrial. This difference is attributed to externally forced variability. The temporal variance approximately follows a k^−3 power law across most spatial scales (excepting the largest), as reported in DelSole and Nedza (2021). Temporal variance for Laplacian eigenfunctions derived from ERSST.v5 is plotted in Fig. 2. The distribution of variance in ERSST.v5 falls within the range of historical simulations and generally agrees with the average across GCMs. However, we note that the variance of smaller spatial scales in ERSST.v5 does appear slightly lower than for most GCMs. This may be due to the reconstruction method, which employs a finite number of spatially smoothed empirical orthogonal teleconnection patterns (Huang et al. 2017a).
3. Methods
a. Dynamical adjustment
The coefficients are learned in a multimodel set of preindustrial simulations. Because preindustrial simulations have no interannual variations in external forcing and contain no externally forced variability, the coefficients characterize only relationships that occur in internal variability. By training on a multimodel ensemble, the dynamical adjustment is intended to learn relationships that are shared across GCMs. The skill of the dynamical adjustment is evaluated on preindustrial and historical simulations from an independent GCM that is excluded from the training data. This process is repeated for all 12 GCMs in this study, with a new set of coefficients learned each time. Details on the coefficients are discussed in section 4f.
The dynamical adjustment can easily be modified to use leading or lagging predictors, or a combination of simultaneous, leading, and/or lagging predictors. For these experiments, a separate set of dynamical adjustment coefficients are learned under the same experimental outline described above. Results are discussed briefly in section 4a.
Regularized regression
The penalty term depends on the chosen norm of the coefficients. Each norm has a different impact on the coefficients. The norm associated with lasso has the effect of setting some coefficients exactly to zero. Ridge, in contrast, tends to shrink the entire set of coefficients without setting any to zero, while the elastic net norm has the effect of combining the two. We refer the reader to Tibshirani (1996) for more detail on regularized regression. The regularized regression is implemented using the cv.glmnet function of the R package glmnet (Friedman et al. 2010). All three norms above were applied, with none of them performing systematically better than the others. Therefore, the results discussed in this study will be limited to lasso. Lasso is chosen because of the norm's characteristic behavior of setting some coefficients to zero. By reducing the number of nonzero coefficients, we hope to identify spatial patterns that are most important for prediction. However, due to the highly covarying nature of the NASST Laplacian eigenfunctions, interpreting these spatial patterns may be difficult, and similar predictions may be possible even when excluding some predictors.
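For readers unfamiliar with the lasso, glmnet solves it by cyclic coordinate descent with a soft-thresholding update (Friedman et al. 2010). The following bare-bones numpy version is for illustration only and assumes standardized predictor columns; it is not the production implementation used in this study.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrinks z toward zero by gamma."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Lasso by cyclic coordinate descent.

    Minimizes (1/2n)||y - X beta||^2 + lam * ||beta||_1, assuming the
    columns of X are standardized. lam = 0 recovers ordinary least squares;
    large lam sets all coefficients to zero."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding predictor j
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            beta[j] = soft_threshold(z, lam * n) / col_sq[j]
    return beta
```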
The regularization parameter λ must be selected for each lasso model. The λ is selected through a cross-validation procedure performed on the training data. For each training set of 11 models, cross validation is performed by training on 10 models and validating on the excluded model. This procedure is repeated 11 times for each training set, excluding data from a different model each time. The distributions of error for each λ value are estimated under this cross-validation procedure, an example of which is shown in Fig. 3. This example corresponds to the training set that excludes CanESM2, including the 11 other models. The cross-validation distributions are similar for all training sets.
There are two common criteria for selecting λ. One criterion is to select the λ that minimizes the cross-validated mean square error. The other is the “one-standard-error” rule, selecting the largest λ whose mean square error lies within the standard error range of the minimum. Selecting the “one-standard-error” λ value has the effect of setting more predictor coefficients to zero compared to the minimum λ selection. Results are not sensitive to “one-standard-error” or minimum λ value selection. The results discussed in this study correspond to the “one-standard-error” λ selection. A brief discussion of the coefficients selected by the lasso model is included in section 4f.
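The "one-standard-error" rule can be stated compactly. Given cross-validated mean square errors and their standard errors across a grid of λ values, it returns the largest λ whose error lies within one standard error of the minimum; the function name below is our own.

```python
import numpy as np

def one_se_lambda(lambdas, cv_mse, cv_se):
    """Select lambda by the one-standard-error rule.

    lambdas : candidate regularization parameters
    cv_mse  : cross-validated mean square error for each lambda
    cv_se   : standard error of each cross-validated MSE

    Returns the largest lambda whose MSE lies within one standard
    error of the minimum MSE (favoring sparser models)."""
    lambdas = np.asarray(lambdas, float)
    cv_mse = np.asarray(cv_mse, float)
    cv_se = np.asarray(cv_se, float)
    i_min = np.argmin(cv_mse)
    threshold = cv_mse[i_min] + cv_se[i_min]
    return lambdas[cv_mse <= threshold].max()
```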
b. Comparative methods
In this section, we discuss methods that will be used for comparative purposes.
1) Polynomial method
The polynomial method estimates the basin-mean NASST external variability by fitting a polynomial to each historical ensemble member individually. In the simplest case, the polynomial is a linear trend. However, the linear assumption has little physical support and tends to perform poorly compared to other methods, motivating the use of a quadratic trend by Frankcombe et al. (2015) and Frankignoul et al. (2017). Hawkins and Sutton (2009) applied a fourth-order polynomial. We explore the application of polynomials up to tenth order, systematically evaluating the performance of each order. In this study, the polynomial method is applied to time series of the same length, and therefore, a given polynomial order has the same capacity for capturing variability in each GCM simulation.
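The method amounts to a least-squares polynomial fit per ensemble member, with the fitted curve taken as the forced estimate and the residual as the internal estimate; a minimal sketch:

```python
import numpy as np

def polynomial_forced_estimate(ts, order):
    """Fit a polynomial of the given order to one time series.

    Returns (forced, internal): the fitted curve, taken as the forced
    estimate, and the residual, taken as the internal estimate."""
    t = np.arange(len(ts))
    forced = np.polyval(np.polyfit(t, ts, order), t)
    return forced, ts - forced
```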
2) Locally weighted scatterplot smoothing
LOWESS has been suggested as a filter for removing time scales of variability associated with most internally generated modes of oceanic variability (Cheng et al. 2022). LOWESS generates a smoothed time series by taking the value of a polynomial fit at each time step. The value of this polynomial is determined using a weighted regression including surrounding points within a prescribed span (Cleveland 1979). Cheng et al. (2022) suggest that this method is much more successful at isolating low-frequency variability, particularly due to anthropogenic forcing, in observed global mean oceanic heat content compared to a polynomial fit.
Cheng et al. (2022) suggest the use of a 25-yr LOWESS span for filtering out time scales associated with most internally generated modes of oceanic variability. We investigated a variety of spans and found a 25-yr LOWESS span to perform well across GCMs. Increasing the span width does not improve performance in all GCMs. Local linear fitting demonstrates an improvement over local constant fitting, particularly in the vicinity of endpoints. Applying a local quadratic fit does not produce a more accurate separation of variability compared to the local linear fit. Results shown correspond to a local linear fit with a 25-yr LOWESS span unless otherwise noted.
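A schematic of the local linear fit follows. Classic LOWESS (Cleveland 1979) defines the span as a fraction of the data and includes robustness iterations; this sketch uses a fixed-width window with tricube weights, which is sufficient to illustrate the local linear fitting described above.

```python
import numpy as np

def lowess_linear(t, y, span):
    """Local linear fit with tricube weights over a fixed-width span.

    t, y : time axis and series; span : window width in units of t
    (e.g., 25 years). Returns the smoothed series evaluated at each t."""
    t = np.asarray(t, float)
    y = np.asarray(y, float)
    smoothed = np.empty_like(y)
    half = span / 2.0
    for i, ti in enumerate(t):
        d = np.abs(t - ti)
        # tricube weights: (1 - (d/half)^3)^3 inside the span, 0 outside
        w = np.clip(1.0 - (d / half) ** 3, 0.0, None) ** 3
        sw = np.sqrt(w)
        A = np.vstack([np.ones_like(t), t - ti]).T
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        smoothed[i] = coef[0]  # local intercept = fitted value at ti
    return smoothed
```

A local linear fit of this form reproduces a linear trend exactly, including near the endpoints, which is consistent with the improvement over local constant fitting noted above.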
3) Ensemble empirical mode decomposition
EEMD extends EMD, producing results that are more robust and less sensitive to the presence of noise. EEMD modifies the EMD process by adding white noise to the original time series (Wu and Huang 2009). Many independent realizations of white noise are added to the original data, and each noisy copy is decomposed using EMD, producing an ensemble for each of the intrinsic mode functions (IMFs) as well as the residual. The ensemble for each component is then averaged, removing the influence of the independent realizations of white noise and producing a more robust estimate of the IMFs than simply applying EMD to the original time series.
EEMD has previously been applied to the study of low-frequency components in temperature time series, including the identification of a nonlinear secular trend and multidecadal component that contributes nontrivially to the overall increase in observed global mean temperatures (Wu et al. 2011) and as part of a methodological comparison when estimating the separation of temperature variability into externally forced and internal components (Frankignoul et al. 2017).
In this study, EEMD is implemented using the R package hht (Bowman and Lees 2013) with 400 ensemble members and white noise with one-fifth the standard deviation of the data being decomposed, parameter choices consistent with Ji et al. (2014) and Frankignoul et al. (2017). Results shown represent the performance using the residual rn−2(t), equivalent to the summation of the final residual term rn(t) and the two lowest-frequency IMFs cn(t) and cn−1(t), to represent the externally forced variability. This summation performs well on average across GCMs when compared to estimating the external component of variability using the residual alone or summed with any number of successive IMFs.
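The noise-ensemble averaging at the heart of EEMD can be sketched schematically. The decomposition below is NOT true EMD sifting (which the hht package performs); it is a deliberately crude stand-in that splits a series into a fast component and a smooth residual via a running mean, used only to show how independent noise realizations are added, decomposed, and averaged away.

```python
import numpy as np

def crude_decompose(x, window=11):
    """Stand-in for EMD: split x into a 'fast' component and a smooth
    residual via an edge-padded running mean. A real application would
    use EMD sifting to obtain the full set of IMFs."""
    kernel = np.ones(window) / window
    padded = np.pad(x, window // 2, mode="edge")
    smooth = np.convolve(padded, kernel, mode="valid")
    return x - smooth, smooth  # (fast component, residual)

def eemd_like(x, n_ensemble=400, noise_frac=0.2, seed=0):
    """Noise-ensemble averaging as in EEMD (Wu and Huang 2009): add white
    noise scaled to noise_frac * std(x), decompose each realization, and
    average each component across the ensemble."""
    rng = np.random.default_rng(seed)
    sigma = noise_frac * x.std()
    fast_sum = np.zeros_like(x)
    resid_sum = np.zeros_like(x)
    for _ in range(n_ensemble):
        fast, resid = crude_decompose(x + rng.normal(0.0, sigma, x.size))
        fast_sum += fast
        resid_sum += resid
    return fast_sum / n_ensemble, resid_sum / n_ensemble
```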
4) Multimodel ensemble mean
For the multimodel ensemble mean (MMEM) method, an MMEM is calculated from all GCMs except the GCM in which the method is being evaluated. More precisely, one GCM is withheld, an ensemble mean is computed for each of the 11 remaining GCMs, and then the mean of the 11 ensemble means is used for the MMEM. We emphasize that instead of computing a mean over all historical runs, the MMEM is a mean of ensemble means, which equally weights each GCM. The validation GCM is excluded from the MMEM such that no information from that GCM is included. The MMEM is either subtracted or regressed from each ensemble member of the validation GCM. The regression step rescales the MMEM to correct for differences in the magnitude of the externally forced response. These approaches are referred to as “differenced” or “scaled” MMEM methods as described by Frankcombe et al. (2015).
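The two MMEM variants can be written compactly. The sketch below follows the description above: the MMEM is a mean of single-GCM ensemble means, and the forced estimate is either subtracted ("differenced") or regressed out ("scaled") of each validation member; function names are our own.

```python
import numpy as np

def mmem(ensemble_means):
    """ensemble_means : (n_gcm, time) array of single-GCM ensemble means.

    The MMEM is the mean of ensemble means, weighting each GCM equally
    regardless of its number of historical members."""
    return np.asarray(ensemble_means, float).mean(axis=0)

def remove_forced(member, forced, method="scaled"):
    """Estimate internal variability in one historical member.

    'differenced' subtracts the MMEM directly; 'scaled' regresses the MMEM
    out, rescaling it to correct for differences in the magnitude of the
    externally forced response (Frankcombe et al. 2015)."""
    member = np.asarray(member, float)
    forced = np.asarray(forced, float)
    if method == "differenced":
        return member - forced
    fc = forced - forced.mean()
    mc = member - member.mean()
    slope = np.dot(mc, fc) / np.dot(fc, fc)  # regression coefficient
    return mc - slope * fc
```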
GCMs within this study exhibit a range of responses to external forcing, likely related to individual GCMs’ aerosol effective radiative forcing (Rotstayn et al. 2015). The MMEM represents the average behavior across GCMs and will not represent responses unique to a single or subset of GCMs. Although an MMEM could be computed from a subset of GCMs, the question arises as to which subset should be selected for application to independent data. There is no consensus on the best criterion for selecting GCMs in this situation. We proceed under the premise that each GCM is equally likely to be representative of the real system and therefore do not group or weight the GCMs when calculating the MMEM.
c. Validation measure
Historical simulations
To compare dynamical adjustment to other methods in historical simulations, it should be recognized that the methods for separating forced and internal variability predict different things. Specifically, dynamical adjustment predicts the internal component of variability, whereas the other methods, the polynomial and MMEM, predict the forced component. A prediction of internal variability can be transformed into a prediction of the forced component by computing the residual, that is, by subtracting the predicted internal component from the total variability.
Depending on the choice of normalizations, a given prediction may be deemed skillful under one normalization but not the other (i.e., the prediction may have NMSE < 1 under one normalization and NMSE > 1 under the other). We propose that the simplest way to proceed is to evaluate both skill measures and define skill as the prediction producing an NMSE less than unity under both normalizations. For this study, normalizing by internal variability produces a stricter threshold of skill. This is a consequence of the fact that forced variance in the historical runs, with an 11-yr running mean applied, generally exceeds or is equal to the internal variance in the GCMs in this study. Accordingly, we opt to normalize by internal variability, in which case a prediction that is skillful under this measure is skillful under both measures. This choice has the further advantage that it enables skill comparisons between historical and control simulations; normalizing by forced variance is not an option for control simulations because the variance of forced variability vanishes for these runs.
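The validation measure can be summarized as follows. This sketch takes NMSE as the mean square error of a prediction divided by the variance of the chosen normalizer (internal or forced variability), with skill requiring NMSE below unity under both normalizations; the exact variance conventions used in the study may differ in detail.

```python
import numpy as np

def nmse(truth, prediction, normalizer):
    """Normalized mean square error: MSE of the prediction divided by the
    variance of the chosen normalizer (internal or forced variability)."""
    mse = np.mean((np.asarray(truth) - np.asarray(prediction)) ** 2)
    return mse / np.var(normalizer)

def skillful(truth, prediction, internal, forced):
    """Skill criterion: NMSE below unity under both normalizations."""
    return (nmse(truth, prediction, internal) < 1.0
            and nmse(truth, prediction, forced) < 1.0)
```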
4. Results
a. Preindustrial validation
We now show the result of applying the dynamical adjustment to estimate variability in independent preindustrial simulations. Figure 4 shows the preindustrial NASST basin-mean time series for each GCM along with the dynamical adjustment estimate. Explained variability calculated using NMSE ranges from a low of 27% (NorESM1-M) to a high of 67% (CSIRO Mk3.6.0 and IPSL-CM5A-LR). We interpret this range of explained variability as reflecting the dynamical adjustment's inability to fully represent the diversity of internal variability across GCMs. We carried out experiments that involved training and validating both within individual GCMs and across multiple GCMs. However, the outcomes of single-model trained dynamical adjustment were highly model-specific and difficult to summarize cohesively without a clear, overarching hypothesis. Consequently, we opted to narrow our focus to the multimodel trained dynamical adjustment. This focus is supported by better performance of the multimodel trained dynamical adjustment compared to single-model trained dynamical adjustment in almost every independent validation (not shown).
Explained variability calculated using correlation and NMSE are similar for all GCMs, with the explained variability estimated by R2 being equal or higher because R2 is insensitive to amplitude errors. The influence of this amplitude correction is clear in HadGEM2-ES, where the explained variability between metrics differs by 14% due to a consistent underestimation of amplitude.
In GCMs where the dynamical adjustment estimate performs well, for example, CSIRO Mk3.6.0 and IPSL-CM5A-LR, low-frequency variability is well estimated, although there is a tendency to underestimate peaks and amplitude. In GCMs where the dynamical adjustment estimate performs poorly, for example, CCSM4 and NorESM1-M, some low-frequency variability is estimated correctly, with other fluctuations being misrepresented in direction. The accuracy of higher-frequency variations depends on the GCM. The NMSE ± SENMSE of applying dynamical adjustment to independent preindustrial simulations is summarized in Fig. 6. The dynamical adjustment is skillful, producing an NMSE < 1, in every GCM.
In a separate analysis, the dynamical adjustment approach was applied using predictors that lead the predictand by 1, 2, and 11 years (not shown). For 1- and 2-yr leading predictors, the dynamical adjustment performs similarly to the simultaneous predictors, with a small degradation of skill based on lead time. For the 11-yr leading predictors, where the predictors and predictand have nonoverlapping running mean windows, the NMSE ± SENMSE bars include the value of one for the majority of GCMs, implying that the dynamical adjustment has no skill. Due to their similar display of skill, or lack of skill for 11-yr leading predictors, dynamical adjustment predictions using only leading predictors will not be discussed for application on historical simulations in this study.
In a separate analysis, the dynamical adjustment was applied using a combination of simultaneous and 1-, 2-, or 11-yr leading predictors (not shown). Using a combination of simultaneous and leading predictors performs about as well as simultaneous predictors alone. This is not surprising, as the 1- and 2-yr leading predictors are unlikely to contribute much information beyond the simultaneous predictors because these leads fall well within the much wider smoothing window (11 years). Due to the similar performance, combinations of predictors will not be discussed further.
b. Historical validation
The dynamical adjustment, trained in preindustrial simulations, is applied to the historical simulations of the GCM withheld from the multimodel training set. No modification to the predictor time series from the historical simulations is performed, and the dynamical adjustment estimate is not influenced by the validation GCM’s ensemble size. No retuning of the dynamical adjustment model is performed. The dynamical adjustment estimate of external variability in basin-mean NASST, obtained by subtracting the predicted internal component from the total, for the first two ensemble members of each GCM is shown in Fig. 5. The estimate of external variability from dynamical adjustment generally fits the shape of the true external variability, appearing to effectively account for the presence of decadal-to-multidecadal internal variability in most cases. However, dynamical adjustment sometimes fails to correctly or completely remove decadal internal variability. This is expected based on the preindustrial results shown in Fig. 4, where the phase or magnitude of variability is not always perfectly represented. Examples include CSIRO Mk3.6.0, NorESM1-M, and GFDL CM3; in each of these cases, for at least one of the plotted estimates, the dynamical adjustment has clear errors in the magnitude of decadal variability in the external estimate, representing a failure to properly account for the internal contribution during that time. The NMSE ± SE_NMSE of the dynamical adjustment in historical simulations is summarized in Fig. 6. For nine models, the NMSE is below one, indicating a skillful prediction. For the remaining three GCMs, the NMSE is above one, and the dynamical adjustment is not considered skillful. The estimate in each historical ensemble has a higher NMSE, and therefore less skill, than the preindustrial validation for the same GCM.
The application of dynamical adjustment to historical simulations assumes that over the North Atlantic, the externally forced response is confined to the spatially uniform pattern and does not project onto the other Laplacian eigenfunctions. This assumption is unlikely to be accurate, particularly considering the presence of spatially nonuniform aerosol forcing in the North Atlantic region and the observed warming hole. Consequently, the time series associated with Laplacian eigenfunctions other than the spatially uniform pattern likely contain a combination of internal and external variability. The dynamical adjustment model is linear; therefore, its estimate of basin-mean NASST in historical simulations could be separated into two components, one based on the internal variability in the predictor time series and one based on the external variability. However, there is no basis for the external-variability-based component of the dynamical adjustment estimate to represent the internal variability in basin-mean NASST. This suggests that the degradation of skill when applying the dynamical adjustment to historical simulations is due to the presence of external variability in the predictor time series, contrary to the assumption that the predictors contain only internal variability. To investigate this hypothesis further, we examine the components of variance in each Laplacian eigenfunction, which can be done in an ensemble framework. Figure 2 plots the total variance of historical simulations, averaged across the 12 GCMs, for each Laplacian eigenfunction. This total variance is decomposed into contributions from the externally forced component, represented by the historical ensemble means, and the internal variability, calculated using the residuals about the ensemble means and corrected to account for the finite ensemble sizes. For the first Laplacian eigenfunction, the basin mean, the contribution of external variability dominates that of internal variability.
For all other Laplacian eigenfunctions, the internal variability dominates the total. This demonstrates that on average, the external variability is nearly an order of magnitude smaller than the internal variability in the predictor time series, supporting an essential assumption of dynamical adjustment. However, considering the average across all GCMs disguises important GCM differences. Separate analysis reveals that external variability is present in spatial patterns other than the basin mean, but the particular spatial patterns in which this occurs and the magnitude of the external contribution are GCM dependent (not shown). GCM-to-GCM differences in the presence of externally forced variability could be related to varying representations of aerosol forcing response (Rotstayn et al. 2015) or a change of the physical system in response to one or several external forcings. A complete review of GCM-to-GCM differences for each of the 100 Laplacian eigenfunctions and attribution to individual mechanisms is beyond the scope of this study and therefore is not discussed further here. Nevertheless, these results confirm the presence of external variability in patterns orthogonal to the uniform pattern, and it is plausible that this presence degrades the dynamical adjustment’s estimate of internal variability.
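The variance decomposition described above can be sketched as follows. The finite-ensemble correction assumed here removes the internal-noise contribution, sigma^2/M, that remains in an M-member ensemble mean; this illustrates the idea rather than reproducing the paper's exact procedure:

```python
import numpy as np

def variance_components(ensemble):
    """Decompose the variance of a (members x time) ensemble into forced and
    internal parts, correcting the forced estimate for the internal noise
    (sigma^2 / M) that remains in a finite-M ensemble mean."""
    m, _ = ensemble.shape
    forced = ensemble.mean(axis=0)                       # ensemble-mean series
    residuals = ensemble - forced                        # deviations per member
    var_internal = residuals.var(ddof=0) * m / (m - 1)   # unbiased internal var
    var_forced = forced.var(ddof=0) - var_internal / m   # finite-M correction
    return var_forced, var_internal

rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 4 * np.pi, 400))      # shared forced signal
ens = signal + rng.standard_normal((20, 400))        # 20 members, unit noise
vf, vi = variance_components(ens)
assert abs(vi - 1.0) < 0.1          # recovers internal variance of ~1
assert abs(vf - signal.var()) < 0.1  # recovers the forced variance
```

Without the correction, the forced variance of every eigenfunction would be overestimated by roughly the internal variance divided by the ensemble size.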
To test whether the degradation in skill in historical simulations is due to the presence of forced variability, forced variability is removed from each predictor time series by subtracting that GCM’s ensemble mean, calculated for each predictor, prior to applying the dynamical adjustment. These results can be compared to the previous application of dynamical adjustment; the dynamical adjustment model is not retrained or tuned for this application. The dynamical adjustment estimate of basin-mean NASST external variability when removing external variability from the predictor time series is plotted in Fig. 5. In general, this approach yields a small improvement over the base dynamical adjustment. The NMSE of this approach can be compared in Fig. 6, where the results are summarized. For every GCM, the NMSE for this application of dynamical adjustment lies below the previous results and below one. We conclude that removing the ensemble mean from the predictor time series when applying dynamical adjustment improves the skill in every GCM, even in the three models for which dynamical adjustment had previously not demonstrated skill. For eight GCMs, these results overlap with the preindustrial results; for these GCMs, the ability of the dynamical adjustment to estimate internal variability in the basin-mean NASST is similar to its performance in preindustrial simulations. The near recovery of preindustrial levels of skill suggests some consistency in the internal variability between preindustrial and historical simulations, as well as the additive nature of variability. Further, this suggests that external variability in the predictors was responsible for the degradation between preindustrial and historical validations. For the other four GCMs, although removing external variability from the predictors does improve the performance in historical simulations, the performance does not reach the preindustrial validation.
This suggests that there may be another source of error in these historical simulations. We have been unable to identify the source of these discrepancies in this study.
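The modification tested above, subtracting each GCM's ensemble mean from its predictor time series before applying the fixed regression, reduces to a simple array operation. A sketch with hypothetical inputs:

```python
import numpy as np

def remove_forced(predictors, ensemble_mean):
    """Subtract the (time-varying) ensemble-mean forced signal from each
    predictor time series, leaving an estimate of the internal part."""
    return predictors - ensemble_mean[np.newaxis, :]

# Two hypothetical predictor time series sharing a common forced trend.
t = np.arange(100.0)
forced = 0.01 * t
preds = np.vstack([forced + np.sin(t / 5.0), forced - np.cos(t / 7.0)])
adj = remove_forced(preds, forced)
assert np.allclose(adj[0], np.sin(t / 5.0))
assert np.allclose(adj[1], -np.cos(t / 7.0))
```

The catch, as noted in the text, is that this step requires the ensemble mean, which has no observational counterpart.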
c. Results of the comparative methods
The polynomial method is applied to individual ensemble members to estimate the externally forced variability in basin-mean NASST. Polynomial orders from one (a linear trend) through ten are applied. The associated NMSE is shown for orders one through eight in Fig. 7. In each column, the skill of each polynomial order is plotted with increasing order from left to right; that is, the leftmost, black bar represents the NMSE of a first-order polynomial; the second, red bar represents the second-order polynomial; and so on. In general, the performance of the polynomial method improves with increasing order, plateauing between orders four and six. Few of the polynomials produce an NMSE less than one, indicating that the estimate of internal variability based on the residual from a polynomial of any order tends to have no skill. The best polynomial order depends on the GCM and may be related to the time scales of internal variability and the relative magnitude of forced and internal variability.
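A sketch of the polynomial method as described: a polynomial in time is fit to a single time series, the fit is taken as the forced estimate, and the residual as the internal estimate.

```python
import numpy as np

def polynomial_split(series, order):
    """Fit an order-`order` polynomial in time to a time series; the fit is
    the forced estimate and the residual is the internal estimate."""
    t = np.arange(len(series), dtype=float)
    coeffs = np.polyfit(t, series, order)
    forced = np.polyval(coeffs, t)
    return forced, series - forced

t = np.arange(150.0)
series = 0.002 * t**2 + np.sin(t / 4.0)   # quadratic "forcing" + oscillation
forced, internal = polynomial_split(series, order=2)
# The quadratic fit absorbs the trend; the residual retains the oscillation.
assert np.var(internal) < np.var(series)
```

The overfitting risk noted in the introduction appears here directly: a high-order fit will absorb slow internal fluctuations into the forced estimate.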
Results for the application of LOWESS and EEMD are also shown in Fig. 7. LOWESS outperforms the standard polynomial and EEMD in this application, demonstrating skill in all but two GCMs, and performs as well as or better than both methods in all GCMs. EEMD performs similarly to the polynomials across GCMs. In general, the polynomial, LOWESS, and EEMD methods suffer from the same underlying issue: each, as applied here, separates different time scales of variability. Separating time scales can be very effective at identifying external and internal components of variability if these components have distinct time scales. In NASST, however, there is considerable debate over the relative contributions of external and internal variability on decadal-to-multidecadal time scales. Methods that rely only on time-scale separation may therefore find it difficult to correctly identify these relative contributions.
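For concreteness, a minimal LOWESS smoother in the spirit of Cleveland (1979) is sketched below, using tricube weights over a fixed number of nearest neighbors and omitting the robustness iterations of the full procedure; the study's actual LOWESS implementation may differ:

```python
import numpy as np

def lowess(y, span):
    """Minimal LOWESS (after Cleveland 1979): at each point, a linear fit
    over the `span` nearest neighbors, weighted by the tricube function of
    distance.  Robustness iterations are omitted for brevity."""
    n = len(y)
    t = np.arange(n, dtype=float)
    smooth = np.empty(n)
    for i in range(n):
        d = np.abs(t - t[i])
        idx = np.argsort(d)[:span]                     # span nearest neighbors
        w = (1.0 - (d[idx] / d[idx].max()) ** 3) ** 3  # tricube weights
        A = np.column_stack([np.ones(span), t[idx]])
        Aw = A * w[:, None]                            # weighted design matrix
        beta = np.linalg.solve(Aw.T @ A, Aw.T @ y[idx])
        smooth[i] = beta[0] + beta[1] * t[i]
    return smooth

t = np.arange(200.0)
y = 0.01 * t + np.sin(t / 3.0)          # linear trend plus an oscillation
trend = lowess(y, span=75)
assert np.var(y - trend) < np.var(y)    # the smooth captures the trend
```

The span plays the same role as the polynomial order: it sets the time scale assigned to the forced component, which is exactly the assumption the text questions for decadal-to-multidecadal NASST variability.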
d. Results of ensemble mean method
The scaled MMEM estimate of external variability is plotted in Fig. 5 for the first two ensemble members of each GCM. The application of a single scaling step corrects for amplitude across the entire time period and produces a good approximation of the externally forced variability. However, externally forced variations within the study period, on decadal-to-multidecadal time scales, are not always well captured by the scaled MMEM. For example, in the MPI models, the MMEM overestimates the decadal fluctuations of the externally forced variability, whereas it underestimates such variability in CSIRO Mk3.6.0 and GFDL CM3. The MMEM may be fundamentally incapable of accurately capturing external variability on decadal time scales for every GCM: because the MMEM is constructed from many GCMs, it averages the magnitude and timing of externally forced decadal variability and thus cannot capture the diversity of GCM responses, particularly given the large range of aerosol radiative forcing (Rotstayn et al. 2015). The NMSE of using a scaled MMEM to separate variability in basin-mean NASST is shown in Fig. 6. The scaled MMEM is skillful in 7 out of 12 models; in the other five models, the scaled MMEM lacks skill, with an NMSE greater than or equal to one. We do not show the differenced MMEM approach because it is skillful in only three models and is generally outperformed by the scaled approach. This improvement of the scaling approach over the differenced approach agrees with previous studies (Frankcombe et al. 2015).
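The scaled-MMEM approach described above amounts to a single linear regression of each ensemble member onto the MMEM, yielding one amplitude factor for the whole period. A sketch with a synthetic member whose forced response is stronger than the MMEM:

```python
import numpy as np

def scaled_mmem(member, mmem):
    """Scale the multimodel ensemble mean (MMEM) to a single ensemble member
    via linear regression, i.e., one amplitude factor for the whole period;
    the scaled series is the estimate of forced variability."""
    m_anom = mmem - mmem.mean()
    beta = np.dot(member - member.mean(), m_anom) / np.dot(m_anom, m_anom)
    return member.mean() + beta * m_anom

rng = np.random.default_rng(2)
mmem = np.linspace(0.0, 1.0, 156)                      # stand-in forced signal
member = 1.5 * mmem + 0.3 * rng.standard_normal(156)   # stronger forced response
forced_est = scaled_mmem(member, mmem)
beta_hat = (forced_est[-1] - forced_est[0]) / (mmem[-1] - mmem[0])
assert abs(beta_hat - 1.5) < 0.3                       # recovers the amplitude
```

A single scaling factor can correct the overall amplitude of the forced response but cannot reshape decadal fluctuations whose timing or relative magnitude differs from the MMEM, which is the limitation noted above.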
e. Dynamical adjustment’s relative performance in historical simulations
Having discussed each method’s results, we now compare the dynamical adjustment, applied to historical simulations without modification, to the other methods. The NMSE of each method is shown in Fig. 6, where skill between two methods is considered indistinguishable if the standard error bars overlap. A method is considered the best if it has the lowest NMSE and does not overlap with any other method. Comparing NMSE demonstrates that no method is consistently best across the GCMs used in this study. The dynamical adjustment is the best-performing method in 4 out of 12 GCMs and ties a comparative method in two additional GCMs. We interpret this to demonstrate that dynamical adjustment, as proposed in this paper, has potential similar to that of other methods when applied to separating basin-mean NASST variability. This conclusion differs only slightly when different filtering schemes are used. For example, using boxcar filter weights, the dynamical adjustment is the best-performing method in five GCMs and ties a comparative method in two GCMs.
Applying dynamical adjustment to predictors after removing the ensemble mean gives the best performance in 8 out of 12 GCMs. In three of the remaining models, this application of dynamical adjustment is tied with a comparative method for best performance; in the remaining model, the scaled MMEM is the best method. This relative comparison may differ slightly using different filtering schemes. Using boxcar filter weights, this modified application of dynamical adjustment gives the best performance in 10 out of 12 GCMs, with the scaled MMEM being more skillful in the two remaining GCMs. This modified application of dynamical adjustment appears to be the best overall method; however, it cannot be applied to observations because the true external variability in observed predictors is not known.
f. Regression coefficients in dynamical adjustment
To gain insight into how the dynamical adjustment works, we examine the coefficients selected by lasso. These coefficients are shown as the bottom set of lines in Fig. 8. Each line corresponds to a different set of preindustrial training data, labeled by which GCM was excluded from the training. The coefficients learned from the different training samples are very consistent, with predictor selection and coefficient values being particularly consistent for lower-order Laplacian eigenfunctions, which have the largest characteristic length scales. This consistency is expected because the training sets are nearly the same; any two training sets share 10 GCMs and differ only in the 11th. Higher-order eigenfunctions appear in some iterations of the dynamical adjustment lasso model, but their predictor selection does not show the same robustness as for lower-order eigenfunctions.
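The lasso coefficients examined here are produced by L1-penalized regression. As an illustration, cyclic coordinate descent with soft thresholding, the algorithm underlying glmnet (Friedman et al. 2010), recovers a sparse coefficient vector; this sketch assumes centered predictors and is not the study's exact fitting procedure:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent with soft thresholding (the
    algorithm underlying glmnet; Friedman et al. 2010).  Minimizes
    (1/2n)||y - X beta||^2 + lam * ||beta||_1, assuming centered data."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]       # partial residual
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam * n, 0.0) / col_ss[j]
    return beta

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3]           # only two informative predictors
beta = lasso_cd(X, y, lam=0.1)
assert abs(beta[0] - 2.0) < 0.3 and abs(beta[3] + 1.0) < 0.3
assert np.sum(np.abs(beta) > 1e-6) <= 4     # sparse: few nonzero coefficients
```

The soft-thresholding step is what zeroes out uninformative predictors, producing the sparse, reproducible coefficient patterns shown in Fig. 8.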
The scale-selective penalization has only a small effect on the NMSE of the dynamical adjustment (not shown). Overall, the NMSE is very similar, with some validation GCMs showing a small improvement and others a small degradation, and no change in conclusions. The scale-selective penalization does influence the coefficient selection, shown as the upper set of lines in Fig. 8, markedly reducing the number of higher-order Laplacian eigenfunctions assigned nonzero coefficients.
g. Application to observations
We now examine the result of applying the above methods to ERSST.v5. We emphasize that the true external and internal variability is not known in observational data, and therefore, these results should be interpreted cautiously.
The dynamical adjustment is applied to ERSST.v5 data without modification. As discussed in section 4f, the dynamical adjustment models trained on the different sets of 11 GCMs are very consistent. The estimated internal variability plotted in Fig. 9b represents the averaged coefficients among these dynamical adjustment models. The reduced regression model, Eq. (11), is also applied to ERSST.v5. The dynamical adjustment estimates of external variability, calculated by subtracting the estimate of internal variability from the total, are plotted in Fig. 9a. Polynomials, LOWESS, EEMD, and MMEM methods as described previously are also applied to ERSST.v5 data, directly estimating the external component of variability. A selection of these external estimates is plotted in Fig. 9a, and their residual estimates of internal variability are plotted in Fig. 9b. The MMEM is applied only to time steps that overlap between CMIP5 and ERSST.v5 and does not cover the entire time series. Correlations are calculated between the different internal variability estimates and listed in Table 2. These correlations illustrate that estimates of internal variability vary by method and parameter choices. The dynamical adjustment estimate of internal variability tends to be uncorrelated with other estimates. Standard deviations of each internal variability estimate are calculated and shown following the method name in Table 2. The magnitude of standard deviations varies between methods. Methods that estimate external variability, such as polynomials and MMEM, tend to have larger standard deviations. The dynamical adjustment estimates have smaller standard deviations, similar in magnitude to LOWESS (25-yr span) and EEMD (calculated using two intrinsic modes added to the residual). 
We refrain from interpreting differences in the estimates of observed variability as neither dynamical adjustment nor the other methods could separate forced and internal variability in historical simulations with consistent skill.
Table 2. Correlation values calculated between estimates of internal variability in ERSST.v5, including the correlations between the curves shown in Fig. 9b. The method applied is listed along the left and top sides. The method names on the left are followed by the standard deviation of the estimated internal variability in parentheses. Polynomial estimates (Poly) are named by the order of polynomial applied. LOWESS estimates (LOW) are named by the span of the smoothing. EEMD estimates are named by the number of summed components; that is, EEMD1 is the residual only and EEMD2 is the residual summed with the lowest-frequency intrinsic mode. “Dynam” refers to the full dynamical adjustment model, and “Reduce” refers to Eq. (11). Correlations are calculated for methods applied to 1865–2000 to align with the length of the MMEM. The upper-right corner is suppressed due to the symmetry of the correlation values.
5. Conclusions
This study introduces a new approach to dynamical adjustment. Specifically, the internal component of the basin-mean NASST is estimated based on a set of spatial patterns of NASST orthogonal to a uniform pattern. In this implementation, the spatial patterns are Laplacian eigenfunctions, calculated for the North Atlantic basin using the method of DelSole and Tippett (2015). After projecting NASST onto Laplacian eigenfunctions, time series are smoothed using an 11-yr running mean. While the conclusions of this study are not sensitive to the choice of averaging window, the 11-yr running mean was chosen to demonstrate that this method is capable of representing variability on time scales longer than interannual variations in atmospheric circulation patterns.
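The 11-yr running-mean smoothing applied to the projected time series can be sketched as a boxcar convolution (retaining only fully overlapping windows; edge handling in the study may differ):

```python
import numpy as np

def running_mean(x, window=11):
    """Centered running mean (boxcar) of width `window`, keeping only fully
    overlapping windows, so the output is shorter by window - 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

x = np.arange(100.0)               # a linear series is unchanged in its interior
sm = running_mean(x, window=11)
assert len(sm) == 90
assert np.allclose(sm, np.arange(5.0, 95.0))
```

With annual data, an 11-yr window suppresses interannual circulation noise while retaining the decadal-to-multidecadal variability targeted by the method.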
This implementation of dynamical adjustment takes the form of linear regression, with regularization applied to control overfitting. The linear coefficients relating the Laplacian eigenfunctions to the basin-mean NASST are learned in a multimodel set of preindustrial simulations which contain only internal variability. The predictive skill of the dynamical adjustment is evaluated in independent preindustrial simulations, where the dynamical adjustment model was skillful for every GCM although the degree of skill depended on the validation GCM. Some amount of this spread in validation skill is due to differences in GCM internal variability that are not well captured when training in a multimodel dataset. Future research could utilize this method of representing internal variability to study differences between GCMs, including identifying what features are responsible for predictive skill and analyzing whether differences in identified features could be related to model biases or deficiencies.
The dynamical adjustment was applied to historical simulations without retraining. In contrast to preindustrial simulations, where all variability can be attributed to the coupled atmosphere–ocean system, historical simulations contain two components of variability, internal and external. Either the external or internal variability can be viewed as the target in this study; therefore, either could be used as the normalizing variance when calculating NMSE. Normalization by the internal variability is a stricter threshold of skill for the GCMs evaluated in this study and is chosen as the skill metric for historical simulations. Similar to the preindustrial validation, the dynamical adjustment is applied to historical simulations from a GCM excluded from the training set. Dynamical adjustment applied to historical simulations is found to have skill in 9 out of 12 GCMs. However, dynamical adjustment performs worse in all historical simulations compared to preindustrial simulations and demonstrates no skill in 3 out of 12 GCMs. This degradation in skill is attributed to the presence of externally forced variability in the predictor time series, as demonstrated by the return of skill when dynamical adjustment is applied to predictor time series from which the external variability was removed (e.g., by subtracting the ensemble mean). Thus, while the dynamical adjustment approach avoids specifying the temporal shape or amplitude of externally forced variability in the basin-mean NASST, it does make the slightly incorrect assumption that external variability does not project onto the spatial patterns of NASST orthogonal to the basin mean. Further study could employ large ensembles and single-forcing runs to investigate whether the projection of external variability onto the patterns orthogonal to the basin mean, and the consequent degradation of dynamical adjustment skill, can be attributed to specific external forcings (Deser et al. 2020).
The dynamical adjustment is compared to several methods that have been previously used to separate forced and internal variability. These methods are evaluated on the same GCMs, allowing a comparison of their performance. It is found that these methods are not skillful in all GCMs and perform similarly, on average, to the proposed dynamical adjustment when applied to historical simulations. However, the dynamical adjustment applied with the removal of external variability from the predictor time series, in general, outperforms other methods across GCMs, being the most skillful method in 8 out of 12 GCMs and tying other methods in 3 additional GCMs. This outperformance emphasizes the potential of this dynamical adjustment method for estimating internal variability in basin-mean NASST.
The dynamical adjustment is applied to the observational analysis product ERSST.v5 (Huang et al. 2017b). The dynamical adjustment estimate of internal variability exhibits a relatively small standard deviation and is not highly correlated with estimates produced by other methods. Since neither the dynamical adjustment nor any other method could separate forced and internal variability in historical simulations with consistent skill, we refrain from interpreting differences in the estimated internal variability: without knowledge of the true internal variability in observations, there is no basis for judging which estimate is accurate. Nevertheless, it is clear that methods based on different principles yield considerably different estimates of external and internal variability.
Future studies will explore observationally constrained modifications to the dynamical adjustment in search of a consistently skillful estimate in historical simulations, thereby facilitating a more meaningful application to observations. The ultimate goal of this research is to produce an accurate estimate of internal variability over the observed period. We see several opportunities for improving the dynamical adjustment. Recall that removing known external variability from the predictor time series (by subtracting the ensemble mean) greatly enhanced the performance of the dynamical adjustment. Thus, one strategy for improving the dynamical adjustment would be to estimate and remove external variability from the predictor time series. Another strategy is to include additional variables in the model that may better characterize internal variability in the North Atlantic, or that may contain less external variability, or external variability that is easier to estimate. Variables of particular interest include integrated oceanic heat content, sea level height, and sea surface salinity. Finally, another strategy would be to train the dynamical adjustment in historical simulations, where external variability would be present during the training step. In doing so, any perceived issues in translating relationships from preindustrial simulations to historically forced simulations could be avoided.
Acknowledgments.
This research was supported primarily by the National Oceanic and Atmospheric Administration (NA20OAR4310401). The views expressed herein are those of the authors and do not necessarily reflect the views of these agencies.
Data availability statement.
The data used in this paper are publicly available from the CMIP5 archive at https://esgf-node.llnl.gov/search/cmip5/.
REFERENCES
Allen, M. R., and S. F. B. Tett, 1999: Checking for model consistency in optimal fingerprinting. Climate Dyn., 15, 419–434, https://doi.org/10.1007/s003820050291.
Armour, K. C., J. Marshall, J. R. Scott, A. Donohoe, and E. R. Newsom, 2016: Southern ocean warming delayed by circumpolar upwelling and equatorward transport. Nat. Geosci., 9, 549–554, https://doi.org/10.1038/ngeo2731.
Booth, B. B. B., N. J. Dunstone, P. R. Halloran, T. Andrews, and N. Bellouin, 2012: Aerosols implicated as a prime driver of twentieth-century North Atlantic climate variability. Nature, 484, 228–232, https://doi.org/10.1038/nature10946.
Bowman, D. C., and J. M. Lees, 2013: The Hilbert–Huang transform: A high resolution spectral method for nonlinear and nonstationary time series. Seismol. Res. Lett., 84, 1074–1080, https://doi.org/10.1785/0220130025.
Cheng, L., G. Foster, Z. Hausfather, K. E. Trenberth, and J. Abraham, 2022: Improved quantification of the rate of ocean warming. J. Climate, 35, 4827–4840, https://doi.org/10.1175/JCLI-D-21-0895.1.
Cleveland, W. S., 1979: Robust locally weighted regression and smoothing scatterplots. J. Amer. Stat. Assoc., 74, 829–836, https://doi.org/10.1080/01621459.1979.10481038.
DelSole, T., and M. K. Tippett, 2015: Laplacian eigenfunctions for climate analysis. J. Climate, 28, 7420–7436, https://doi.org/10.1175/JCLI-D-15-0049.1.
DelSole, T., and D. Nedza, 2021: Reconstructing the Atlantic overturning circulation using linear machine learning techniques. Atmos.–Ocean, 60, 541–553, https://doi.org/10.1080/07055900.2021.1947181.
Deser, C., and A. S. Phillips, 2021: Defining the internal component of Atlantic multidecadal variability in a changing climate. Geophys. Res. Lett., 48, e2021GL095023, https://doi.org/10.1029/2021GL095023.
Deser, C., A. S. Phillips, M. A. Alexander, and B. V. Smoliak, 2014: Projecting North American climate over the next 50 years: Uncertainty due to internal variability. J. Climate, 27, 2271–2296, https://doi.org/10.1175/JCLI-D-13-00451.1.
Deser, C., L. Terray, and A. S. Phillips, 2016: Forced and internal components of winter air temperature trends over North America during the past 50 years: Mechanisms and implications. J. Climate, 29, 2237–2258, https://doi.org/10.1175/JCLI-D-15-0304.1.
Deser, C., and Coauthors, 2020: Insights from Earth system model initial-condition large ensembles and future prospects. Nat. Climate Change, 10, 277–286, https://doi.org/10.1038/s41558-020-0731-2.
Duchon, C. E., 1979: Lanczos filtering in one and two dimensions. J. Appl. Meteor., 18, 1016–1022, https://doi.org/10.1175/1520-0450(1979)018%3C1016:LFIOAT%3E2.0.CO;2.
Frankcombe, L. M., M. H. England, M. E. Mann, and B. A. Steinman, 2015: Separating internal variability from the externally forced climate response. J. Climate, 28, 8184–8202, https://doi.org/10.1175/JCLI-D-15-0069.1.
Frankcombe, L. M., M. H. England, J. B. Kajtar, M. E. Mann, and B. A. Steinman, 2018: On the choice of ensemble mean for estimating the forced signal in the presence of internal variability. J. Climate, 31, 5681–5693, https://doi.org/10.1175/JCLI-D-17-0662.1.
Frankignoul, C., G. Gastineau, and Y.-O. Kwon, 2017: Estimation of the SST response to anthropogenic and external forcing and its impact on the Atlantic multidecadal oscillation and the Pacific decadal oscillation. J. Climate, 30, 9871–9895, https://doi.org/10.1175/JCLI-D-17-0009.1.
Friedman, J. H., T. Hastie, and R. Tibshirani, 2010: Regularization paths for generalized linear models via coordinate descent. J. Stat. Software, 33, 1–22, https://doi.org/10.18637/jss.v033.i01.
Goldenberg, S. B., C. W. Landsea, A. M. Mestas-Nuñez, and W. M. Gray, 2001: The recent increase in Atlantic hurricane activity: Causes and implications. Science, 293, 474–479, https://doi.org/10.1126/science.1060040.
Hawkins, E., and R. Sutton, 2009: The potential to narrow uncertainty in regional climate predictions. Bull. Amer. Meteor. Soc., 90, 1095–1108, https://doi.org/10.1175/2009BAMS2607.1.
He, C., A. C. Clement, M. A. Cane, L. N. Murphy, J. M. Klavans, and T. M. Fenske, 2022: A North Atlantic warming hole without ocean circulation. Geophys. Res. Lett., 49, e2022GL100420, https://doi.org/10.1029/2022GL100420.
Hourdin, F., and Coauthors, 2017: The art and science of climate model tuning. Bull. Amer. Meteor. Soc., 98, 589–602, https://doi.org/10.1175/BAMS-D-15-00135.1.
Huang, B., and Coauthors, 2017a: Extended Reconstructed Sea Surface Temperature, version 5 (ERSSTv5): Upgrades, validations, and intercomparisons. J. Climate, 30, 8179–8205, https://doi.org/10.1175/JCLI-D-16-0836.1.
Huang, B., and Coauthors, 2017b: NOAA Extended Reconstructed Sea Surface Temperature (ERSST), version 5. NOAA National Centers for Environmental Information, accessed 20 January 2019, https://doi.org/10.7289/V5T72FNM.
Huang, N. E., and Z. Wu, 2008: A review on Hilbert-Huang transform: Method and its applications to geophysical studies. Rev. Geophys., 46, RG2006, https://doi.org/10.1029/2007RG000228.
Huang, N. E., and Coauthors, 1998: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. Roy. Soc. London, 454, 903–995, https://doi.org/10.1098/rspa.1998.0193.
Hurrell, J. W., 1996: Influence of variations in extratropical wintertime teleconnections on Northern Hemisphere temperature. Geophys. Res. Lett., 23, 665–668, https://doi.org/10.1029/96GL00459.
Jain, S., A. A. Scaife, T. G. Shepherd, C. Deser, N. Dunstone, G. A. Schmidt, K. E. Trenberth, and T. Turkington, 2023: Importance of internal variability for climate model assessment. npj Climate Atmos. Sci., 6, 68, https://doi.org/10.1038/s41612-023-00389-0.
Ji, F., Z. Wu, J. Huang, and E. P. Chassignet, 2014: Evolution of land surface air temperature trend. Nat. Climate Change, 4, 462–466, https://doi.org/10.1038/nclimate2223.
Kay, J. E., and Coauthors, 2015: The Community Earth System Model (CESM) large ensemble project: A community resource for studying climate change in the presence of internal climate variability. Bull. Amer. Meteor. Soc., 96, 1333–1349, https://doi.org/10.1175/BAMS-D-13-00255.1.
Knight, J. R., C. K. Folland, and A. A. Scaife, 2006: Climate impacts of the Atlantic multidecadal oscillation. Geophys. Res. Lett., 33, L17706, https://doi.org/10.1029/2006GL026242.
Laprise, R., 1992: The resolution of global spectral models. Bull. Amer. Meteor. Soc., 73, 1453–1455, https://doi.org/10.1175/1520-0477-73.9.1453.
Liu, G., P. Wang, and Y.-O. Kwon, 2023: Physical insights from the multidecadal prediction of North Atlantic sea surface temperature variability using explainable neural networks. Geophys. Res. Lett., 50, e2023GL106278, https://doi.org/10.1029/2023GL106278.
Milinski, S., N. Maher, and D. Olonscheck, 2020: How large does a large ensemble need to be? Earth Syst. Dyn., 11, 885–901, https://doi.org/10.5194/esd-11-885-2020.
Rotstayn, L. D., M. A. Collier, D. T. Shindell, and O. Boucher, 2015: Why does aerosol forcing control historical global-mean surface temperature change in CMIP5 models? J. Climate, 28, 6608–6625, https://doi.org/10.1175/JCLI-D-14-00712.1.
Saffioti, C., E. M. Fischer, S. C. Scherrer, and R. Knutti, 2016: Reconciling observed and modeled temperature and precipitation trends over Europe by adjusting for circulation variability. Geophys. Res. Lett., 43, 8189–8198, https://doi.org/10.1002/2016GL069802.
Sippel, S., N. Meinshausen, A. Merrifield, F. Lehner, A. G. Pendergrass, E. Fischer, and R. Knutti, 2019: Uncovering the forced climate response from a single ensemble member using statistical learning. J. Climate, 32, 5677–5699, https://doi.org/10.1175/JCLI-D-18-0882.1.
Steinman, B. A., M. E. Mann, and S. K. Miller, 2015: Atlantic and Pacific multidecadal oscillations and Northern Hemisphere temperatures. Science, 347, 988–991, https://doi.org/10.1126/science.1257856.
Sutton, R. T., and D. L. R. Hodson, 2005: Atlantic Ocean forcing of North American and European summer climate. Science, 309, 115–118, https://doi.org/10.1126/science.1109496.
Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498, https://doi.org/10.1175/BAMS-D-11-00094.1.
Tibshirani, R., 1996: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc., 58B, 267–288, https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
Ting, M., Y. Kushnir, R. Seager, and C. Li, 2009: Forced and internal twentieth-century SST trends in the North Atlantic. J. Climate, 22, 1469–1481, https://doi.org/10.1175/2008JCLI2561.1.
Trenberth, K. E., and D. J. Shea, 2006: Atlantic hurricanes and natural variability in 2005. Geophys. Res. Lett., 33, L12704, https://doi.org/10.1029/2006GL026894.
Wallace, J. M., Y. Zhang, and J. A. Renwick, 1995: Dynamic contribution to hemispheric mean temperature trends. Science, 270, 780–783, https://doi.org/10.1126/science.270.5237.780.
Wallace, J. M., Q. Fu, B. V. Smoliak, P. Lin, and C. M. Johanson, 2012: Simulated versus observed patterns of warming over the extratropical Northern Hemisphere continents during the cold season. Proc. Natl. Acad. Sci. USA, 109, 14 337–14 342, https://doi.org/10.1073/pnas.1204875109.
Wills, R. C. J., K. C. Armour, D. S. Battisti, and D. L. Hartmann, 2019: Ocean–atmosphere dynamical coupling fundamental to the Atlantic multidecadal oscillation. J. Climate, 32, 251–272, https://doi.org/10.1175/JCLI-D-18-0269.1.
Wills, R. C. J., D. S. Battisti, K. C. Armour, T. Schneider, and C. Deser, 2020: Pattern recognition methods to separate forced responses from internal variability in climate model ensembles and observations. J. Climate, 33, 8693–8719, https://doi.org/10.1175/JCLI-D-19-0855.1.
Wu, Z., and N. E. Huang, 2009: Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal., 1 (1), 1–41, https://doi.org/10.1142/S1793536909000047.
Wu, Z., N. E. Huang, J. M. Wallace, B. V. Smoliak, and X. Chen, 2011: On the time-varying trend in global-mean surface temperature. Climate Dyn., 37, 759–773, https://doi.org/10.1007/s00382-011-1128-8.
Zhang, R., and T. L. Delworth, 2006: Impact of Atlantic multidecadal oscillations on India/Sahel rainfall and Atlantic hurricanes. Geophys. Res. Lett., 33, L17712, https://doi.org/10.1029/2006GL026267.
Zhang, R., and Coauthors, 2013: Have aerosols caused the observed Atlantic multidecadal variability? J. Atmos. Sci., 70, 1135–1144, https://doi.org/10.1175/JAS-D-12-0331.1.