A Dynamical Adjustment Approach to Estimating Forced and Internal Variability in the North Atlantic

Douglas Nedza, Department of Atmospheric, Oceanic, and Earth Sciences, George Mason University, Fairfax, Virginia

https://orcid.org/0000-0003-1492-6675
and
Timothy DelSole, Department of Atmospheric, Oceanic, and Earth Sciences, George Mason University, Fairfax, Virginia, and Center for Ocean–Land–Atmosphere Studies, Fairfax, Virginia

Open access

Abstract

Quantifying the relative contributions of external forcing and internal variability to North Atlantic sea surface temperature (NASST) has important implications for attributing and predicting climate changes around the North Atlantic basin. Many previous methods have approached this problem by estimating the externally forced signal directly, making assumptions about forced variability for which there is no consensus. In this work, the separation of variability is approached in a fundamentally different way that does not specify the forced response’s temporal evolution. We propose a dynamical adjustment method in which the internal, spatially uniform component of NASST is predicted based on patterns of NASST that are orthogonal to the spatially uniform pattern. When applied to preindustrial simulations, the dynamical adjustment demonstrates skill in reconstructing the NASST basin-mean variability. Applying the dynamical adjustment to historical simulations demonstrates skill in a majority of climate models although the skill is reduced relative to preindustrial simulations because external variability partly contaminates the predictors. The dynamical adjustment is compared to several other methods which directly estimate the externally forced signal. We find that dynamical adjustment performs similarly to these comparative methods despite the fundamentally different prediction method. However, methods based on different principles yield considerably different estimates of external and internal variability.

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Douglas Nedza, dnedza@gmu.com


1. Introduction

Multidecadal variability over the North Atlantic has significant societal impacts because of its influence on hurricane activity, shifts in the intertropical convergence zone, summer monsoon rainfall over the Sahel and India, and summer climate over Europe and North America (Goldenberg et al. 2001; Sutton and Hodson 2005; Knight et al. 2006; Zhang and Delworth 2006). Future projections of global and regional temperatures and associated uncertainties depend critically on whether observed variability is forced or generated internally. Forced variability refers to the response to changes in radiative forcing, which may include the following: changes in land cover or land use, changes in the atmospheric concentration of greenhouse gases and anthropogenic aerosols, volcanic and natural aerosols, and changes in solar irradiance. Internal variability refers to variability that appears in the absence of changing external forcing due to dynamical instabilities and interactions within the coupled ocean–land–atmosphere system. There is no consensus about the relative importance of forced and internal variability in North Atlantic sea surface temperature (NASST). Some studies have suggested that anthropogenic aerosols are a predominant forcing (Booth et al. 2012). Others have attributed NASST variability to the effects of climate change, resulting in atmospheric changes that impact the ocean (He et al. 2022). Others have argued that internal variability within the coupled ocean–atmosphere system, particularly the North Atlantic Oscillation (NAO) and the Atlantic meridional overturning circulation (AMOC), can contribute to decadal-scale NASST variability (Zhang et al. 2013). The relative importance over time of these different drivers of NASST variability, and the interactions between them, remain an active area of research.

Following Frankignoul et al. (2017), methods for separating forced and internal variability may be classified into two categories. One category relies solely on observations to estimate external variability. For example, the externally forced response could be assumed a smooth function of time that can be represented by a low-order polynomial, with parameters estimated from observations. However, if internal variability generates multidecadal variability, then regressing a polynomial out of observations may introduce error by overfitting to internal variability.

Another example in this category is to use the globally averaged SST to represent the temporal evolution of externally forced variability. Trenberth and Shea (2006) argue that the externally forced component can be removed by subtracting globally averaged SST from NASST, which assumes that the forced response of the North Atlantic is the same as the global average. Deser and Phillips (2021) demonstrate that regressing the globally averaged SST out of gridpoint data outperforms the subtraction method of Trenberth and Shea (2006), correcting for spatial differences in the forced response, although potentially overfitting to internal variability. Deser and Phillips (2021) point out that this estimate of external variability is unable to capture the effect of external forcings that are unique to the North Atlantic region, such as the regional aerosol forcing implicated by Booth et al. (2012) or regional feedbacks of the climate system to external forcing that operate on a different time scale than global forcing (Armour et al. 2016). More sophisticated approaches in this category rely on assumptions about differences in time scales between the externally forced and internal variability, including locally weighted scatterplot smoothing (LOWESS) and ensemble empirical mode decomposition (EEMD) (Cleveland 1979; Wu and Huang 2009). Other time-scale-based methods include the eigenmode decomposition of linear inverse models, both with and without optimal initial conditions (Frankignoul et al. 2017), and low-frequency component analysis, maximizing the ratio of low-frequency to total variability (Wills et al. 2019).
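The regression variant of this approach can be sketched in a few lines. The following Python example, in the spirit of Deser and Phillips (2021), regresses a global-mean SST index out of each grid point and takes the residual as the internal estimate. All data here are synthetic, and the index, response amplitudes, and noise level are illustrative values, not taken from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)
nt = 145                                     # annual means, e.g. 1860-2004

# Stand-in global-mean SST index (a smooth random walk) and a three-point
# "grid" whose response amplitude to the global mean differs by location
gm = np.cumsum(rng.normal(scale=0.1, size=nt))
amp = np.array([0.5, 1.0, 1.5])
field = gm[:, None] * amp[None, :] + rng.normal(scale=0.3, size=(nt, 3))

# Regress the global-mean index out of each grid point; the residual is
# the estimate of internal variability
gm_c = gm - gm.mean()
x_c = field - field.mean(axis=0)
slope = (gm_c @ x_c) / (gm_c @ gm_c)         # one regression slope per point
internal_est = x_c - np.outer(gm_c, slope)
```

Unlike simple subtraction, the per-point slope absorbs spatial differences in the forced response, which is the advantage Deser and Phillips (2021) document.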

A second category of methods uses physics-based models, called general circulation models (GCMs), that represent Earth’s coupled ocean–atmosphere system. Because these GCMs generate their own internal variability, ensemble techniques are used to estimate the externally forced response. As larger ensemble sizes produce more accurate estimates of a single GCM’s response to external forcings, there has been a recent emphasis on producing externally forced ensembles with tens to hundreds of members that can facilitate detailed analysis on regional and decadal scales (Kay et al. 2015). In addition, comparing large ensembles from several GCMs can provide a robust basis for examining the relative influence of internal variability and model differences in simulating the observed climate (Deser et al. 2020; Jain et al. 2023).

Within the context of separating components of variability, after an ensemble average is calculated, the estimate of the externally forced response can be removed via subtraction or linear regression (Frankcombe et al. 2015; Steinman et al. 2015). More sophisticated approaches that utilize ensembles of GCM simulations, such as optimal fingerprinting (Allen and Tett 1999) and signal-to-noise maximizing EOF analysis (Ting et al. 2009; Wills et al. 2020), have also been applied. However, there is considerable debate about whether GCMs accurately represent the response to external forcings. In particular, weak observational constraints on aerosol forcings and parameterizations allow for a range of GCM behavior in response to aerosol forcing (Rotstayn et al. 2015; Hourdin et al. 2017). If the ability of GCMs to produce realistic responses to historical forcings is in doubt, conclusions from methods that rely on a simulated response to external forcing will remain unconvincing.

There is a third category of methods that avoids specifying characteristics of forced variability. Specifically, these methods focus on identifying the internal contribution to observed variability. These methods are called dynamical adjustment, an example of which is the study of Wallace et al. (1995). Wallace et al. (1995) performed a dynamical adjustment on observed Northern Hemisphere mean temperature by regressing out a single spatial pattern orthogonal to the hemispheric mean. This pattern, termed cold ocean–warm land (COWL), represented dynamically induced changes in the hemispheric mean associated with the configuration of warm and cold air masses. Removing COWL-associated variability produced a residual estimate of external variability exhibiting reduced interannual variability and enhanced autocorrelations, expected characteristics of the externally forced signal. This method avoids specifying the temporal shape or amplitude of externally forced variability but makes the assumption that external variability is nearly spatially uniform and mostly absent in patterns that are orthogonal to the spatially uniform pattern. Wallace et al. (1995) noted that the COWL pattern may contain fingerprints of external forcing, particularly high-latitude continental warming. Other fingerprints, due to aerosols, trace gases, and changes in cloud cover, may possess distinct spatial structures that also project onto COWL. Other dynamical adjustment studies have employed sea level pressure (SLP) to estimate and remove the effect of atmospheric circulations on surface temperature (Hurrell 1996). Using SLP in dynamical adjustment has been found to account for some of the effects of internal variability on surface temperature, reducing the spread of trends between different members of a GCM ensemble (Deser et al. 2014, 2016; Sippel et al. 2019) and trend discrepancies between GCMs and observations (Saffioti et al. 2016; Wallace et al. 2012).

In the context of quantifying variability in NASST, dynamical adjustment is particularly attractive because it avoids making strong assumptions about forced variability for which there is no consensus. In this paper, we develop such an approach. Specifically, we propose a dynamical adjustment method that uses a machine learning model to estimate the internal variability of a spatially uniform pattern. This pattern has a value equal to the basin-mean NASST. The physics-based machine learning model estimates the characteristics of internal variability, particularly covariability between the basin-mean and spatial patterns that are orthogonal to the spatially uniform pattern. This covariability is leveraged to estimate the internal contribution to basin-mean fluctuations on the basis of the temporal variance of the orthogonal patterns. SST is selected as the predictor over SLP because, while some SST variability might be related to SLP, particularly that driven by the atmosphere, other variations arising from internal ocean dynamics may be unrelated to atmospheric circulation. The proposed method extends the approach of Wallace et al. (1995), where a single orthogonal pattern was derived from observations. Specifically, our method uses a set of orthogonal patterns to estimate the internal variability, rather than relying on a single pattern. To emphasize large-scale patterns of spatial variability, we select Laplacian eigenfunctions for this study, calculated using the method of DelSole and Tippett (2015). Previous studies have demonstrated that gridpoint patterns of SST variability can be used to estimate internal variability in the North Atlantic, in particular, internal AMOC variability by DelSole and Nedza (2021) and basin-mean NASST in CESM1 by Liu et al. (2023). However, as discussed by DelSole and Nedza (2021), gridpoint patterns associated with variability can be highly model-specific. 
Our use of Laplacian eigenfunctions follows DelSole and Nedza (2021), who enhanced the interpretability of their predictive model of AMOC variability, without loss of skill, by using them to represent spatial patterns of SST.

For this introductory study on the proposed dynamical adjustment methodology, our goal is to establish that internal variability in the basin-mean NASST can be estimated using a linear regression model with Laplacian eigenfunctions as predictors. In contrast to the observational approach of Wallace et al. (1995), the predictive coefficients for our dynamical adjustment are learned in a multimodel set of preindustrial GCM simulations, which contain only internal variability. Training the dynamical adjustment in a multimodel set of GCMs emphasizes behavior that is shared among GCMs, providing a robust basis for estimating internal variability in independent datasets. Training and validation of the dynamical adjustment model are conducted using a leave-one-model-out experimental design, whereby the dynamical adjustment is tested using data that were excluded from the training step. After training and evaluating in preindustrial simulations, the dynamical adjustment is applied to forced historical simulations.

The remainder of this paper is organized as follows: section 2 discusses the data used and its representation using Laplacian eigenfunctions. Section 3 introduces the dynamical adjustment method and how it will be evaluated, as well as the methods that are applied for comparison. Section 4 includes the results of applying the dynamical adjustment to independent preindustrial and historical GCM simulations. Results from the comparative methods and the relative performance of the dynamical adjustment are discussed. The coefficients selected for the dynamical adjustment are examined, and a reduced parameter dynamical adjustment approach is explored. Section 5 contains concluding statements and thoughts on future research directions.

2. Data

The data used in this study are NASST within the domain of 0°–60°N and 10°–70°W. Data are analyzed from 12 CMIP5 GCMs (Taylor et al. 2012). These 12 GCMs are selected based on the availability of a preindustrial simulation at least 500 years in length and a minimum of three historical ensemble members (see Table 1). The last 500 years of each preindustrial simulation are selected. All historical ensemble members are included. This provides a combined multimodel preindustrial dataset 6000 years in length, along with 64 historical ensemble members. Models that do not meet the above criteria were excluded from the analysis. MIROC5 was omitted due to a known data issue.

Table 1. CMIP5 archived global climate models utilized in this study, with the length of each preindustrial simulation and the number of historical simulations.

Data are interpolated onto a common 2.5° × 2.5° uniform grid. Annual means are calculated from July to June to include one winter season each year. Results are not sensitive to the annual-mean window. Time series for the results shown are smoothed with an 11-yr Lanczos-weighted running mean (Duchon 1979). An 11-yr running mean is selected to emphasize the decadal-to-multidecadal time scales that are of particular interest in NASST but may smooth through the response to volcanic impulses. Other widths of the smoothing window, including 21, 5, and 1 year (annual means with no additional smoothing), were also studied. Results for other smoothing windows show quantitative differences but produce conclusions similar to those shown. Conclusions were found to be insensitive to the specific smoothing method employed. Boxcar, triangular, and Gaussian weighting schemes were also applied with only minor changes in results and no change in final conclusions.
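The 11-yr Lanczos-weighted running mean can be sketched as below. This is one plausible construction of the low-pass weights following the general form in Duchon (1979); the window length and cutoff choice (one cycle per 11 years) are our assumptions, since the paper does not spell out the exact weight formula.

```python
import numpy as np

def lanczos_weights(window: int, cutoff: float) -> np.ndarray:
    """Low-pass Lanczos filter weights, normalized to sum to one.

    window: total (odd) number of weights; cutoff: frequency in cycles
    per time step.
    """
    n = window // 2
    k = np.arange(-n, n + 1)
    w = np.zeros(window)
    w[n] = 2.0 * cutoff                                  # central weight
    km = k[k != 0].astype(float)
    sigma = np.sin(np.pi * km / n) / (np.pi * km / n)    # Lanczos sigma factor
    w[k != 0] = np.sin(2.0 * np.pi * cutoff * km) / (np.pi * km) * sigma
    return w / w.sum()

weights = lanczos_weights(11, 1.0 / 11.0)                # 11-yr running mean analogue
series = np.sin(2.0 * np.pi * np.arange(200) / 60.0)     # slow 60-yr cycle
smoothed = np.convolve(series, weights, mode="valid")    # interior points only
```

A multidecadal signal such as the 60-yr cycle above passes through the filter nearly unchanged, while interannual noise is strongly damped; boxcar or triangular weights can be substituted by swapping the weight vector.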

Preindustrial control runs are used to train the dynamical adjustment model. A third-order polynomial over the 500-yr period is removed to mitigate the potential effects of model spin up or drift. The resulting time series has zero time mean. To remove differences in variability across GCMs, preventing weighting of the dynamical adjustment model toward high-variance GCMs, each preindustrial simulation is normalized such that the NASST basin mean has unit variance. The normalization is performed in such a way as to maintain the variance spectrum of NASST.
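The detrending and normalization steps can be sketched as follows. The eigenfunction time series here are synthetic stand-ins with an arbitrary decaying variance; the key point is that dividing the whole simulation by the basin-mean standard deviation gives the basin mean unit variance while leaving the relative variance spectrum across eigenfunctions unchanged.

```python
import numpy as np

rng = np.random.default_rng(6)
nt, nmodes = 500, 20

# Synthetic eigenfunction time series with decaying variance across modes
psi = rng.normal(size=(nt, nmodes)) * np.linspace(2.0, 0.1, nmodes)

# Remove a cubic polynomial in time from each series to mitigate drift
tc = (np.arange(nt) - (nt - 1) / 2.0) / nt         # centered, scaled time axis
for j in range(nmodes):
    fit = np.polyval(np.polyfit(tc, psi[:, j], deg=3), tc)
    psi[:, j] -= fit

# Normalize the whole simulation by the basin-mean (first-eigenfunction)
# standard deviation: the basin mean then has unit variance while the
# relative variance spectrum across eigenfunctions is preserved
scale = psi[:, 0].std()
psi_norm = psi / scale
```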

Methods for separating forced and internal variability will be tested using historical simulations from the CMIP5 archive. Historical simulations are initialized from random states of a long GCM integration and then integrated under realistic estimates of external forcing. As such, historical simulations contain both forced and internal variability. A GCM’s forced variability is estimated by the ensemble average of historical simulations from that GCM. For this study, the resulting ensemble mean defines the “true” external variability. The accuracy of an ensemble mean at representing a GCM’s true response to external forcing is a function of ensemble size and is related to the magnitude of internal variability (Frankcombe et al. 2018; Milinski et al. 2020). This dependency on ensemble size motivates the exclusion of GCMs with fewer than three historical ensemble members. The internal variability is calculated as the residual about the ensemble mean. The time period of July 1860–June 2005 is selected from each historical simulation. The time series are centered on each historical ensemble member individually. The variance of historical simulations is not normalized. In this study, historical simulations are only used for validation. The dynamical adjustment model is linear; therefore, the scale of the predictions is determined by the scale of the predictors.
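The ensemble-mean decomposition described above amounts to a two-line computation. In this toy sketch the forced signal, ensemble size, and noise level are illustrative, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(1)
n_members, nt = 5, 146                 # e.g. July 1860-June 2005, annual means

# Toy ensemble: members share one forced signal and differ by internal noise
forced = 0.01 * np.arange(nt)
ensemble = forced + rng.normal(scale=0.2, size=(n_members, nt))

forced_est = ensemble.mean(axis=0)     # ensemble mean -> "true" forced signal
internal_est = ensemble - forced_est   # residuals -> internal variability
# Center each member individually, as done for the historical simulations
ensemble_centered = ensemble - ensemble.mean(axis=1, keepdims=True)
```

The accuracy of `forced_est` improves as the number of members grows, which is the ensemble-size dependence noted by Frankcombe et al. (2018) and Milinski et al. (2020).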

The dynamical adjustment is applied to Extended Reconstructed SST, version 5 (ERSST.v5), a monthly global analysis compiled from observational data with missing data filled in (Huang et al. 2017b,a). In a similar fashion to the GCM data, ERSST.v5 is interpolated to the 2.5° × 2.5° uniform grid and projected onto Laplacian eigenfunctions representing the North Atlantic domain, 0°–60°N, 10°–70°W. For this study, we utilize time steps from July 1860 to June 2018. Annual-mean time series are calculated using a July–June window and then centered and smoothed with an 11-yr running mean.

Laplacian eigenfunctions

Laplacian eigenfunctions are used as the basis set to represent NASST. Laplacian eigenfunctions are solutions of Laplace’s equation over the North Atlantic domain obtained using the algorithm of DelSole and Tippett (2015). These eigenfunctions depend only on the geometry of the domain and hence are independent of data, thereby providing a common, convenient basis set for all GCMs. The eigenfunctions are spatially orthogonal and ordered from largest to smallest length scale. The characteristic length scale of an eigenfunction, θ_j, is calculated using
\theta_j = a\pi / n_j,
where n_j is the total wavenumber of eigenfunction j and a is the radius of Earth (Laprise 1992). The first eigenfunction is spatially uniform with n = 0, and therefore, θ_1 = ∞. In plots, the length scale is assigned a value of 5000 km, the approximate width of the North Atlantic basin. The length scale θ_j is utilized primarily for plotting purposes and is not a parameter that appears in the dynamical adjustment method. An alternative version of the dynamical adjustment that incorporates these length scales is discussed in section 4f. The first four Laplacian eigenfunctions over the North Atlantic domain are shown in Fig. 1.
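As a quick worked example of the length-scale formula, a total wavenumber of about 40 (an illustrative value, not one quoted in the paper) corresponds to roughly the ~500 km minimum scale retained by the truncation discussed below:

```python
import math

a = 6.371e6   # radius of Earth in meters

def length_scale_km(n: float) -> float:
    """Characteristic length scale theta_j = a*pi/n_j, returned in km."""
    return a * math.pi / n / 1.0e3

scale = length_scale_km(40.0)   # roughly 500 km
```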
Fig. 1.

Laplacian eigenfunctions 1–4 for the North Atlantic domain, 0°–60°N and 10°–70°W. The first Laplacian eigenfunction is spatially uniform and represents the basin mean. Subsequent eigenfunctions have zero spatial mean, are spatially orthogonal to each other, and have decreasing characteristic length scale. The contour breaks are equally spaced, but the global maximum absolute value is irrelevant in this study and therefore suppressed in the figure.

Citation: Journal of Climate 37, 22; 10.1175/JCLI-D-23-0651.1

The time series for the jth Laplacian eigenfunction, denoted by ψ_j(t), is obtained by projecting the area-weighted NASST data onto the jth Laplacian eigenfunction [the precise procedure is discussed in DelSole and Tippett (2015)]. The time series for the first eigenfunction, a uniform pattern over the North Atlantic domain, corresponds to the spatial mean of North Atlantic SST and is a common index of North Atlantic variability. This time series will be referred to as the NASST basin mean, without any implication about whether its variability is dominated by internal variability or external forcing. The original data can be recovered from a sufficiently large set of Laplacian eigenfunctions and their associated time series. Reconstructing data from a truncated set of eigenfunctions will only recover the data down to the minimum length scale of the truncated set. We include the first J = 100 eigenfunctions, which corresponds to a minimum characteristic length of around 500 km.
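The projection and truncated reconstruction can be sketched with a generic orthonormal basis. Here a QR-orthonormalized random matrix with a spatially uniform first column stands in for the actual Laplacian eigenfunctions, and equal-area weights are assumed for simplicity (the real method uses latitude-dependent area weighting and the DelSole and Tippett 2015 procedure):

```python
import numpy as np

rng = np.random.default_rng(2)
npts, nmodes, nt = 50, 10, 30

# Orthonormal basis whose first column is spatially uniform
A = rng.normal(size=(npts, nmodes))
A[:, 0] = 1.0
basis, _ = np.linalg.qr(A)
basis[:, 0] *= np.sign(basis[0, 0])   # make the uniform pattern positive

data = rng.normal(size=(nt, npts))    # time x space field
coeffs = data @ basis                 # psi_j(t): one time series per pattern

# With a uniform first pattern, psi_1(t) is proportional to the spatial mean
spatial_mean = data.mean(axis=1)

# Reconstruction from the truncated set recovers only the retained scales
recon = coeffs @ basis.T
```

Projecting `recon` back onto the basis returns the same coefficients, illustrating that truncation discards only the scales outside the retained set.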

The temporal variance of each Laplacian eigenfunction is shown in Fig. 2 for the preindustrial and historical simulations of each GCM and ERSST.v5. The distribution of variance is very similar between preindustrial and historical simulations for most spatial scales. However, for the spatially uniform pattern, the first Laplacian eigenfunction, there tends to be much more variance in the historical simulations compared to the preindustrial. This difference is attributed to externally forced variability. The temporal variance approximately follows a k⁻³ distribution across most spatial scales (excepting the largest), as reported in DelSole and Nedza (2021). Temporal variance for Laplacian eigenfunctions derived from ERSST.v5 is plotted in Fig. 2. The distribution of variance in ERSST.v5 falls within the range of historical simulations and generally agrees with the average across GCMs. However, we note that the variance of smaller spatial scales in ERSST.v5 does appear slightly lower than for most GCMs. This may be due to the reconstruction method, which employs a finite number of spatially smoothed empirical orthogonal teleconnection patterns (Huang et al. 2017a).

Fig. 2.

Temporal variance associated with each NASST Laplacian eigenfunction in normalized preindustrial (solid gray) and unnormalized historical (dashed gray) simulations. The offset between preindustrial and historical variances is due to the preindustrial variance distribution being normalized such that the spatially uniform pattern has unit variance. No normalization is applied to the historical data. The arithmetic average across the 12 GCMs’ historical simulations is plotted as the lowest black line. The blue line represents the average ensemble mean variance across GCMs. The red line is the average variance of the residuals about the ensemble mean. The green line plots the temporal variance of Laplacian eigenfunctions from ERSST.v5. A k⁻³ line is plotted, offset from the variance distribution, as a visual guide, and g(θ), defined in Eq. (10), an empirical function approximating the shape of the variance distribution, is plotted for reference (following DelSole and Nedza 2021). Vertical lines correspond to Laplacian eigenfunctions 2, 3, and 4 for visual identification.


3. Methods

a. Dynamical adjustment

The dynamical adjustment takes the form of the linear regression model:
Y_1^{\text{Internal}} = X_{2:100}\,\beta + \beta_0 + \epsilon,
where the predictand Y_1^{Internal} is the internal component of the basin-mean NASST, the first Laplacian eigenfunction; the predictors X_{2:100} are the time series associated with Laplacian eigenfunctions 2–100 (the first is omitted from the predictor set since it is the quantity being predicted); the coefficient vector β relates anomalies in the predictor time series to anomalies in the basin mean; β_0 is the intercept term; and ϵ is the error of the regression model, representing internal variability not captured by the estimated coefficients.

The coefficients are learned in a multimodel set of preindustrial simulations. Preindustrial simulations have no interannual variations in external forcings and therefore contain no externally forced variability; the coefficients thus characterize only relationships that occur in internal variability. By training on a multimodel ensemble, the dynamical adjustment is intended to learn relationships that are shared across GCMs. The skill of the dynamical adjustment is evaluated on preindustrial and historical simulations from an independent GCM that is excluded from the training data. This process is repeated for all 12 GCMs in this study, with a new set of coefficients learned each time. Details on the coefficients are discussed in section 4f.

The dynamical adjustment can easily be modified to use leading or lagging predictors, or a combination of simultaneous, leading, and/or lagging predictors. For these experiments, a separate set of dynamical adjustment coefficients is learned under the same experimental design described above. Results are discussed briefly in section 4a.

Regularized regression

The predictor time series are highly covarying, which can contribute to overfitting in an ordinary least squares (OLS) framework. To mitigate potential issues due to covarying predictors and overfitting, regularized regression is applied when learning the dynamical adjustment coefficients. Regularized regression includes a penalty term when computing estimates of the coefficients β̂ as
\hat{\beta} = \operatorname*{argmin}_{\beta}\left\{\left\langle\left[\psi_1(t) - \beta_0 - \sum_{j=2}^{J}\psi_j(t)\,\beta_j\right]^2\right\rangle + \lambda R(\beta)\right\},
where β̂ is a vector containing the estimated regression coefficients, ψ_j(t) is the time series associated with Laplacian eigenfunction j, and the angle brackets denote a mean across all time steps and training GCMs. The penalty term consists of λ, the regularization parameter, and R(β), a norm of β. Standard choices of this norm include
R(\beta) = \sum_{j=2}^{J} |\beta_j| \qquad \text{(lasso)},
R(\beta) = \sum_{j=2}^{J} \beta_j^{2} \qquad \text{(ridge)},
R(\beta) = \sum_{j=2}^{J} \left[\alpha|\beta_j| + (1-\alpha)\beta_j^{2}\right] \qquad \text{(elastic net)},
where α is a hyperparameter that controls the balance between the lasso and ridge penalties in the elastic net. The elastic net results investigated (not shown) utilized α = 0.5.

The coefficients are penalized according to the chosen norm, and each norm has a different effect on the coefficients. The norm associated with lasso has the effect of setting some coefficients exactly to zero. Ridge, in contrast, tends to shrink the entire set of coefficients without setting any to zero, while the elastic net norm combines the two behaviors. We refer the reader to Tibshirani (1996) for more detail on regularized regression. The regularized regression is implemented using the cv.glmnet function of the R package glmnet (Friedman et al. 2010). All three norms above were applied, with none of them performing systematically better than the others. Therefore, the results discussed in this study will be limited to lasso. Lasso is chosen because of the norm’s characteristic behavior of setting some coefficients to zero. By reducing the number of nonzero coefficients, we hope to identify spatial patterns that are most important for prediction. However, due to the highly covarying nature of the NASST Laplacian eigenfunctions, interpreting these spatial patterns may be difficult, and similar predictions may be possible even when excluding some predictors.
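The sparsity behavior of lasso can be illustrated as follows. Since cv.glmnet is an R routine, this sketch uses scikit-learn's Lasso as a Python stand-in; the predictor dimensions, true coefficients, noise level, and alpha value (sklearn's name for λ) are all illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
nt, n_pred = 500, 99                    # time steps; eigenfunctions 2-100

# Synthetic predictor time series whose variance decays with mode number,
# loosely mimicking the decaying variance spectrum of the Laplacian series
X = rng.normal(size=(nt, n_pred)) / np.arange(1, n_pred + 1)
beta_true = np.zeros(n_pred)
beta_true[:5] = [0.8, -0.5, 0.3, 0.2, -0.2]   # only a few active patterns
y = X @ beta_true + rng.normal(scale=0.1, size=nt)

# Fit; lasso zeroes the coefficients of weakly informative predictors
model = Lasso(alpha=0.01, fit_intercept=True)
model.fit(X, y)
n_nonzero = int(np.count_nonzero(model.coef_))
```

Because the penalty sets most of the 99 coefficients exactly to zero, the surviving predictors indicate which patterns matter most, mirroring the motivation for choosing lasso in the text.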

The regularization parameter λ must be selected for each lasso model. It is selected through a cross-validation procedure performed on the training data. For each training set of 11 models, cross validation is performed by training on 10 models and validating on the excluded model. This procedure is repeated 11 times for each training set, excluding data from a different model each time. The distribution of error for each λ value is estimated under this cross-validation procedure, an example of which is shown in Fig. 3. This example corresponds to the training set that excludes CanESM2 and includes the 11 other models. The cross-validation distributions are similar for all training sets.

Fig. 3.

Mean-square error (red dotted curve) and standard errors (error bars) of the estimated basin-mean NASST as produced from cv.glmnet output. These values are estimated across a training set of 11 preindustrial simulations using a cross-validation procedure that excludes one of the training GCMs at a time. This plot corresponds to the preindustrial training sample that excludes CanESM2. The λ value that produces the minimum MSE is indicated by the left vertical dotted line. The largest λ value that produces an MSE within one standard error of the minimum is indicated by the right vertical dotted line. The numbers along the top margin indicate the number of Laplacian eigenfunctions assigned nonzero coefficients in the lasso model.


There are two common criteria for selecting λ. One criterion is to select the λ that minimizes the cross-validated mean square error. The other is the “one-standard-error” rule, which selects the largest λ whose mean square error lies within one standard error of the minimum. Selecting the “one-standard-error” λ value has the effect of setting more predictor coefficients to zero than the minimum-MSE selection. Results are not sensitive to the choice between the two criteria. The results discussed in this study correspond to the “one-standard-error” λ selection. A brief discussion of the coefficients selected by the lasso model is included in section 4f.
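The two selection criteria can be made concrete with a synthetic cross-validation curve. The MSE curve, λ grid, and standard errors below are invented for illustration; the logic is the same one glmnet applies:

```python
import numpy as np

# Synthetic cross-validated MSE curve over a descending lambda grid
lambdas = np.logspace(-1, -4, 31)                 # largest to smallest
log_l = np.log10(lambdas)
cv_mse = 1.0 + 0.05 * (log_l + 2.5) ** 2          # convex in log10(lambda)
cv_se = np.full_like(cv_mse, 0.02)                # one standard error per lambda

i_min = int(np.argmin(cv_mse))
lambda_min = lambdas[i_min]                       # minimum-MSE criterion

# One-standard-error rule: largest lambda whose MSE lies within one
# standard error of the minimum (larger lambda -> sparser lasso model)
within = cv_mse <= cv_mse[i_min] + cv_se[i_min]
lambda_1se = lambdas[within].max()
```

Because `lambda_1se` is at least as large as `lambda_min`, the one-standard-error model is at least as sparse, which is why it zeroes more predictor coefficients.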

b. Comparative methods

In this section, we discuss methods that will be used for comparative purposes.

1) Polynomial method

The polynomial method estimates the basin-mean NASST external variability by fitting a polynomial to each historical ensemble member individually. In the simplest case, the polynomial is a linear trend. However, the linear assumption has little physical support and tends to perform poorly compared to other methods, motivating the use of a quadratic trend by Frankcombe et al. (2015) and Frankignoul et al. (2017). Hawkins and Sutton (2009) applied a fourth-order polynomial. We explore the application of polynomials up to the tenth order, systematically evaluating each order and assessing performance. In this study, the polynomial method is applied to time series of the same length, and therefore, a given polynomial order has the same capacity for capturing variability in each GCM simulation.
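A quadratic version of the polynomial method, in the spirit of Frankcombe et al. (2015), can be sketched as below. The forced signal, noise level, and record length are illustrative; the fitted polynomial is the forced estimate and the residual is the internal estimate:

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(146)                         # annual steps, e.g. 1860-2005
t_std = (t - t.mean()) / t.std()           # standardized time axis

# Toy historical series: a quadratic "forced" signal plus internal noise
forced = 0.3 * t_std + 0.2 * t_std ** 2
series = forced + rng.normal(scale=0.15, size=t.size)

# Quadratic fit as the forced estimate; the residual is the internal estimate
coefs = np.polyfit(t_std, series, deg=2)
forced_est = np.polyval(coefs, t_std)
internal_est = series - forced_est
```

Raising `deg` increases the polynomial's capacity, at the cost of the overfitting to internal variability discussed in the introduction.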

2) Locally weighted scatterplot smoothing

LOWESS has been suggested as a filter for removing time scales of variability associated with most internally generated modes of oceanic variability (Cheng et al. 2022). LOWESS generates a smoothed time series by taking the value of a polynomial fit at each time step. The value of this polynomial is determined using a weighted regression including surrounding points within a prescribed span (Cleveland 1979). Cheng et al. (2022) suggest that this method is much more successful at isolating low-frequency variability, particularly due to anthropogenic forcing, in observed global mean oceanic heat content compared to a polynomial fit.

Cheng et al. (2022) suggest the use of a 25-yr LOWESS span for filtering out time scales associated with most internally generated modes of oceanic variability. We investigated a variety of spans and found a 25-yr LOWESS span to perform well across GCMs. Increasing the span width does not improve performance in all GCMs. Local linear fitting demonstrates an improvement over local constant fitting, particularly in the vicinity of endpoints. Applying a local quadratic fit does not produce a more accurate separation of variability compared to the local linear fit. Results shown correspond to a local linear fit with a 25-yr LOWESS span unless otherwise noted.
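For illustration, a local linear smoother with tricube weights can be sketched as follows; the study's results use the standard LOWESS of Cleveland (1979), and the fixed-window form below is a simplified approximation:

```python
import numpy as np

def lowess_linear(y, span=25):
    """Local linear smoother with tricube weights (LOWESS-style).

    A simplified sketch of Cleveland (1979): at each time step, a weighted
    linear fit is made to the points within +/- span/2 steps, and the
    fitted value at that step is taken as the smoothed estimate."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    half = span / 2.0
    smoothed = np.empty_like(y)
    for i in range(len(y)):
        d = np.abs(t - t[i])
        w = np.clip(1.0 - (d / half) ** 3, 0.0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        X = np.column_stack([np.ones_like(t), t - t[i]])    # local linear basis
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        smoothed[i] = beta[0]  # intercept = fitted value at t[i]
    return smoothed
```

A local linear fit reproduces a linear trend exactly, which is one reason it behaves better than local constant fitting near the endpoints.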

3) Ensemble empirical mode decomposition

Empirical mode decomposition (EMD) decomposes a given time series $x(t)$ into intrinsic mode functions (IMFs) of increasing time scale $c_j(t)$ and a residual trend $r_n(t)$ (Huang et al. 1998):
$$x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t).$$
EMD is an iterative process that fits upper and lower envelopes of the data using cubic splines connecting the local maxima and minima of $x(t)$. The local mean of the upper and lower envelopes is calculated and subtracted from the data; repeating this sifting yields the first IMF component $c_1(t)$. This component is then removed from the data, and the process is repeated on the residual $r_1(t)$, iterating until the final residual $r_n(t)$ is monotonic or contains at most one internal extremum. For the complete process and considerations of this method, we refer the reader to Huang et al. (1998) and Huang and Wu (2008).

EEMD extends EMD to produce more robust results that are less sensitive to the presence of noise. EEMD modifies the above process by adding white noise to the original time series (Wu and Huang 2009). Many independent realizations of white noise are added to the original data, and each perturbed series is decomposed using EMD, producing an ensemble for each of the IMFs as well as the residual. Each component's ensemble is then averaged, removing the influence of the independent white-noise realizations and producing a more robust estimate of the IMFs than applying EMD to the original time series alone.

EEMD has previously been applied to the study of low-frequency components in temperature time series, including the identification of a nonlinear secular trend and multidecadal component that contributes nontrivially to the overall increase in observed global mean temperatures (Wu et al. 2011) and as part of a methodological comparison when estimating the separation of temperature variability into externally forced and internal components (Frankignoul et al. 2017).

In this study, EEMD is implemented using the R package hht (Bowman and Lees 2013) with 400 ensemble members and white noise with one-fifth the standard deviation of the data being decomposed, parameter choices consistent with Ji et al. (2014) and Frankignoul et al. (2017). Results shown represent the performance using the residual $r_{n-2}(t)$, equivalent to the summation of the final residual term $r_n(t)$ and the two lowest-frequency IMFs $c_n(t)$ and $c_{n-1}(t)$, to represent the externally forced variability. This summation performs well on average across GCMs when compared to estimating the external component of variability using the residual alone or summed with any other number of successive IMFs.
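The ensemble-averaging step that distinguishes EEMD from EMD can be sketched as follows. The sifting algorithm itself is delegated in this study to the hht package; here a toy additive decomposition stands in for true EMD purely to illustrate the noise-ensemble averaging (all names and parameters below are illustrative):

```python
import numpy as np

def toy_emd(x, n_imfs=3):
    """Toy stand-in for EMD using moving-average high-pass splits.

    Real EMD extracts IMFs by iterative cubic-spline envelope sifting
    (Huang et al. 1998); this placeholder only preserves the additive
    structure x = sum(IMFs) + residual needed to illustrate EEMD."""
    comps, resid = [], np.asarray(x, dtype=float)
    for k in range(n_imfs):
        win = 2 ** (k + 2) + 1                      # widening smoothing window
        smooth = np.convolve(resid, np.ones(win) / win, mode="same")
        comps.append(resid - smooth)                # "IMF": the removed band
        resid = smooth
    return comps, resid

def eemd(x, n_ensemble=400, noise_frac=0.2, n_imfs=3, seed=0):
    """EEMD (Wu and Huang 2009): decompose many white-noise-perturbed
    copies of x and average each component across the ensemble."""
    rng = np.random.default_rng(seed)
    sigma = noise_frac * np.std(x)                  # noise std = 1/5 of data std
    comps_sum = [np.zeros(len(x)) for _ in range(n_imfs)]
    resid_sum = np.zeros(len(x))
    for _ in range(n_ensemble):
        comps, resid = toy_emd(x + rng.normal(0.0, sigma, len(x)), n_imfs)
        for j in range(n_imfs):
            comps_sum[j] += comps[j]
        resid_sum += resid
    return [c / n_ensemble for c in comps_sum], resid_sum / n_ensemble
```

Because each perturbed decomposition is exactly additive, the ensemble-averaged components reconstruct the original series up to the residual mean of the added noise, which shrinks as the ensemble grows.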

4) Multimodel ensemble mean

For the multimodel ensemble mean (MMEM) method, an MMEM is calculated from all GCMs except the GCM in which the method is being evaluated. More precisely, one GCM is withheld, an ensemble mean is computed for each of the 11 remaining GCMs, and then the mean of the 11 ensemble means is used as the MMEM. We emphasize that instead of computing a mean over all historical runs, the MMEM is a mean of ensemble means, which weights each GCM equally. The validation GCM is excluded from the MMEM so that no information from that GCM is included. The MMEM is then either subtracted from each ensemble member of the validation GCM or regressed onto it; the regression step rescales the MMEM to correct for differences in the magnitude of the externally forced response. These approaches are referred to as the "differenced" and "scaled" MMEM methods, as described by Frankcombe et al. (2015).
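The differenced and scaled variants can be sketched as follows (a minimal numpy illustration; the no-intercept regression assumes zero-mean anomaly series and is one of several possible implementations):

```python
import numpy as np

def mmem_estimates(member, mmem):
    """Differenced and scaled MMEM separations for one validation member.

    member : basin-mean NASST anomaly series of a validation ensemble member
    mmem   : mean of the 11 remaining GCMs' ensemble means (anomalies)
    Returns (internal_differenced, forced_scaled, internal_scaled)."""
    member = np.asarray(member, dtype=float)
    mmem = np.asarray(mmem, dtype=float)
    # differenced: the MMEM itself is taken as the forced signal
    internal_diff = member - mmem
    # scaled: regress the member onto the MMEM so the amplitude of the
    # forced response matches the validation GCM (Frankcombe et al. 2015)
    beta = np.dot(mmem, member) / np.dot(mmem, mmem)
    forced_scaled = beta * mmem
    return internal_diff, forced_scaled, member - forced_scaled
```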

GCMs within this study exhibit a range of responses to external forcing, likely related to individual GCMs’ aerosol effective radiative forcing (Rotstayn et al. 2015). The MMEM represents the average behavior across GCMs and will not represent responses unique to a single or subset of GCMs. Although an MMEM could be computed from a subset of GCMs, the question arises as to which subset should be selected for application to independent data. There is no consensus on the best criterion for selecting GCMs in this situation. We proceed under the premise that each GCM is equally likely to be representative of the real system and therefore do not group or weight the GCMs when calculating the MMEM.

c. Validation measure

The skill of methods in this study will be evaluated using normalized mean-square error (NMSE). Let $\psi_1^{PI}$ denote the basin-mean NASST time series for a preindustrial simulation and $I(\psi_1^{PI})$ denote a prediction function of internal variability. The error of the prediction model at each time step is defined as follows:
$$e = \psi_1^{PI} - I(\psi_1^{PI}).$$
The mean-square error of $I(\psi_1^{PI})$ is defined to be
$$\mathrm{MSE} = \langle e^2 \rangle,$$
where the angle brackets denote the mean across all time steps in the validation GCM. The normalized mean-square error is then defined as follows:
$$\mathrm{NMSE} = \frac{\mathrm{MSE}}{\langle (\psi_1^{PI})^2 \rangle}.$$
For a perfect prediction, $I(\psi_1^{PI}) = \psi_1^{PI}$, MSE = 0, and NMSE = 0. For a prediction based on the climatological mean, $I(\psi_1^{PI}) = 0$, $\mathrm{MSE} = \langle (\psi_1^{PI})^2 \rangle$, and NMSE = 1. NMSE = 1 is considered the threshold for skill, with NMSE ≥ 1 representing no skill. Conveniently, NMSE can be converted to the fraction of explained variability by 1 − NMSE. The standard error of MSE is
$$\mathrm{SE}_{\mathrm{MSE}} = \sqrt{\frac{\langle (e^2 - \mathrm{MSE})^2 \rangle}{N - 1}},$$
where $N$ is the length of the validation data. The $\mathrm{SE}_{\mathrm{MSE}}$ can then be normalized to produce the normalized standard error $\mathrm{SE}_{\mathrm{NMSE}}$ that applies to NMSE:
$$\mathrm{SE}_{\mathrm{NMSE}} = \frac{\mathrm{SE}_{\mathrm{MSE}}}{\langle (\psi_1^{PI})^2 \rangle}.$$
The estimate of internal variability can also be evaluated by calculating the correlation coefficient, defined as
$$R_{\psi_1^{PI},\, I(\psi_1^{PI})} = \frac{\mathrm{cov}[\psi_1^{PI}, I(\psi_1^{PI})]}{\sqrt{\mathrm{var}(\psi_1^{PI})\,\mathrm{var}[I(\psi_1^{PI})]}}.$$
The fraction of variability in $\psi_1^{PI}$ that $I(\psi_1^{PI})$ represents, after rescaling as $\psi_1^{PI} = \beta I(\psi_1^{PI}) + c$, equals the squared correlation coefficient $R^2$. Importantly, correlation forgives amplitude errors, up to a single scaling factor, in the estimate of internal variability.
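The validation measures above can be collected into a single routine; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def validation_metrics(psi, pred):
    """NMSE, its standard error, and squared correlation for a prediction
    `pred` of the (zero-mean) internal series `psi`; the angle-bracket
    means in the text correspond to time means here."""
    psi = np.asarray(psi, dtype=float)
    pred = np.asarray(pred, dtype=float)
    e = psi - pred                                   # prediction error
    mse = np.mean(e ** 2)
    norm = np.mean(psi ** 2)                         # climatological-mean MSE
    nmse = mse / norm
    se_mse = np.sqrt(np.mean((e ** 2 - mse) ** 2) / (len(psi) - 1))
    se_nmse = se_mse / norm
    r = np.corrcoef(psi, pred)[0, 1]
    return nmse, se_nmse, r ** 2
```

Note that a prediction with the right shape but half the amplitude still attains $R^2 = 1$ while its NMSE is penalized, which is the amplitude forgiveness described above.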

Historical simulations

For historical simulations, validating a method is not as straightforward because the simulations contain both forced and internal variability. If $\psi_1^{H}$ represents the time series of the basin-mean NASST in a historical simulation, it can be decomposed into a forced component $f_1$ and an internal component $i_1$:
$$\psi_1^{H} = f_1 + i_1.$$
In this study, $f_1$ is estimated by each GCM's ensemble mean and $i_1$ is estimated by each ensemble member's residual about the ensemble mean. To calculate the NMSE and $\mathrm{SE}_{\mathrm{NMSE}}$ of the dynamical adjustment, Eqs. (3)–(7) remain appropriate with $\psi_1^{PI}$ replaced by $i_1$ and $I(\psi_1^{PI})$ replaced by $I(i_1)$.

To compare dynamical adjustment to other methods in historical simulations, it should be recognized that the methods for separating forced and internal variability predict different things. Specifically, dynamical adjustment predicts the internal component of variability, whereas the other methods, such as the polynomial and MMEM, predict the forced component. A prediction of internal variability $I(i_1)$ can be transformed into a prediction of the forced component by computing the residual $\psi_1^{H} - I(i_1)$. We emphasize that any error in the estimate of internal variability $I(i_1)$ will be included in the residual estimate of forced variability. Similarly, any prediction of forced variability can be transformed into a prediction of internal variability, and any error in the forced prediction will be included in the residual prediction of internal variability. Employing such transformations, all predictions can be expressed uniformly either as predictions of internal variability or as predictions of forced variability, regardless of their original form. Importantly, the mean-square error of the prediction is the same for both representations: the error of a prediction of internal variability, $(\psi_1^{H} - f_1) - I(i_1)$, is the negative of the error of a prediction of forced variability, $f_1 - [\psi_1^{H} - I(i_1)]$. However, NMSE can differ between the two representations due to differences in the target, which determines the appropriate normalization constant. Specifically, if the target is internal variability, the appropriate normalization constant is the variance of the internal variability $i_1$, whereas if the target is forced variability, it is the variance of the forced part $f_1$. The resulting NMSEs differ by a scaling factor equal to the ratio of the variances of forced and internal variability.

Depending on the choice of normalization, a given prediction may be deemed skillful under one normalization but not the other (i.e., the prediction may have NMSE < 1 under one normalization and NMSE > 1 under the other). We propose that the simplest way to proceed is to evaluate both skill measures and define skill as producing an NMSE less than unity under both normalizations. For this study, normalizing by internal variability produces the stricter threshold of skill, a consequence of the fact that the forced variance in the historical runs, with an 11-yr running mean applied, generally equals or exceeds the internal variance in the GCMs in this study. Accordingly, we opt to normalize by internal variability, in which case a prediction that is skillful under this measure is skillful under both measures. This choice has the further advantage of enabling skill comparisons between historical and control simulations; normalizing by forced variance is not an option for control simulations because the variance of forced variability vanishes in these runs.
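A small synthetic example illustrates both points: the transformed predictions share the same mean-square error, while the two NMSEs differ by the ratio of variances (all series and amplitudes here are toy values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
f1 = np.linspace(-1.0, 1.0, n)                 # toy forced component (trend)
i1 = rng.normal(0.0, 0.5, n)
i1 -= i1.mean()                                # toy internal component
psi_h = f1 + i1                                # historical basin-mean series

I_hat = 0.7 * i1                               # imperfect internal prediction
F_hat = psi_h - I_hat                          # transformed forced prediction

# the two errors are negatives of each other, so MSE is the same
err_internal = i1 - I_hat
err_forced = f1 - F_hat
assert np.allclose(err_internal, -err_forced)

mse = np.mean(err_internal ** 2)
nmse_internal = mse / np.mean(i1 ** 2)         # normalize by internal variance
nmse_forced = mse / np.mean(f1 ** 2)           # normalize by forced variance
# the two NMSEs differ by the ratio of forced to internal variance
```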

4. Results

a. Preindustrial validation

We now show the result of applying the dynamical adjustment to estimate variability in independent preindustrial simulations. Figure 4 shows the preindustrial NASST basin-mean time series for each GCM along with the dynamical adjustment estimate. Explained variability calculated using NMSE ranges from a low of 27% (NorESM1-M) to a high of 67% (CSIRO Mk3.6.0 and IPSL-CM5A-LR). We interpret this range as indicating that the dynamical adjustment cannot fully represent the diversity of the GCMs' internal variability. We carried out experiments that involved training and validating both within individual GCMs and across multiple GCMs. However, the outcomes of single-model training were highly model specific and difficult to summarize cohesively without a clear, overarching hypothesis. Consequently, we narrowed our focus to the multimodel-trained dynamical adjustment. This focus is supported by the better performance of the multimodel-trained dynamical adjustment compared to single-model training in almost every independent validation (not shown).

Fig. 4.

NASST basin-mean anomaly time series of the last 500 years of each GCM's preindustrial simulation (black). Each time series has a third-order polynomial removed and has been normalized to unit variance. The dynamical adjustment estimate is plotted in blue. Time series are labeled by their respective model along the right vertical axis. Each model name is followed by ($R^2$, 1 − NMSE), which quantifies the fraction of variance explained by dynamical adjustment.


Explained variability calculated using correlation is similar to that calculated using NMSE for all GCMs, with the $R^2$ estimate being equal or higher due to that metric's forgiveness of amplitude errors. The influence of this amplitude correction is clear in HadGEM3-ES, where the explained variability differs between the two metrics by 14% due to consistent underestimation of amplitude.

In GCMs where the dynamical adjustment estimate performs well, for example, CSIRO Mk3.6.0 and IPSL-CM5A-LR, low-frequency variability is well estimated, although there is a tendency to underestimate peaks and amplitude. In GCMs where the estimate performs poorly, for example, CCSM4 and NorESM1-M, some low-frequency variability is estimated correctly while other fluctuations are misrepresented in direction. The accuracy of higher-frequency variations depends on the GCM. The NMSE ± $\mathrm{SE}_{\mathrm{NMSE}}$ of applying dynamical adjustment to independent preindustrial simulations is summarized in Fig. 6. The dynamical adjustment is skillful, producing an NMSE < 1, in every GCM.

In a separate analysis, the dynamical adjustment approach was applied using predictors that lead the predictand by 1, 2, and 11 years (not shown). For 1- and 2-yr leading predictors, the dynamical adjustment performs similarly to the simultaneous predictors, with a small degradation of skill that grows with lead time. For the 11-yr leading predictors, where the predictors and predictand have nonoverlapping running-mean windows, the NMSE ± $\mathrm{SE}_{\mathrm{NMSE}}$ bars include the value of one for the majority of GCMs, implying that the dynamical adjustment has no skill. Because of their similar skill, or lack thereof for 11-yr leading predictors, predictions using only leading predictors will not be discussed further for application to historical simulations.

In a separate analysis, the dynamical adjustment was applied using a combination of simultaneous and 1-, 2-, or 11-yr leading predictors (not shown). Using a combination of simultaneous and leading predictors performs about as well as simultaneous predictors alone. This is not surprising, as the 1- and 2-yr leading predictors are unlikely to contribute much information beyond the simultaneous predictors because the lead is well within the much wider 11-yr smoothing window. Due to the similar performance, combinations of predictors will not be discussed further.

b. Historical validation

The dynamical adjustment, trained in preindustrial simulations, is applied to the historical simulations of the GCM withheld from the multimodel training set. No modification to the predictor time series from the historical simulations is performed, the dynamical adjustment estimate is not influenced by the validation GCM's ensemble size, and no retuning of the dynamical adjustment model is performed. The dynamical adjustment estimate of external variability in basin-mean NASST, obtained by subtracting the predicted internal component from the total, is shown for the first two ensemble members of each GCM in Fig. 5. The estimate of external variability from dynamical adjustment generally fits the shape of the true external variability, appearing to effectively account for the presence of decadal-to-multidecadal internal variability in most cases. However, dynamical adjustment sometimes fails to correctly or completely remove decadal internal variability. This is expected from the preindustrial results in Fig. 4, where the phase or magnitude of variability is not always perfectly represented. Examples include CSIRO Mk3.6.0, NorESM1-M, and GFDL CM3: in each of these cases, for at least one of the plotted estimates, the dynamical adjustment has clear errors in the magnitude of decadal variability in the external estimate, representing a failure to properly account for the internal contribution during that time. The NMSE ± $\mathrm{SE}_{\mathrm{NMSE}}$ of the dynamical adjustment in historical simulations is summarized in Fig. 6. For nine models, the NMSE is below one, indicating a skillful prediction; for the other three GCMs, the NMSE is above one and the dynamical adjustment is not considered skillful. The estimate in each historical ensemble has a higher NMSE, and therefore less skill, than the preindustrial validation for the same GCM.

Fig. 5.

Ensemble mean of NASST basin mean corresponding to 1860–2005 for each GCM (black). Estimates of external variability for the first two ensemble members from each GCM, including the dynamically adjusted time series (blue), scaled MMEM (green), and dynamically adjusted time series when removing the ensemble mean from the predictors (red), are plotted. The time series have not been normalized and the y axes have different scales.


Fig. 6.

NMSE ± the associated standard error ($\mathrm{SE}_{\mathrm{NMSE}}$) of the dynamical adjustment applied to independent preindustrial simulations (black), historical simulations (blue), and historical simulations when removing the ensemble mean from predictor time series (red). Dynamical adjustment is plotted as pairs of bars of the same color, with the left bar representing the performance of the full lasso model provided with 99 Laplacian eigenfunctions and the right bar, denoted with an "x" at the center, representing a reduced regression model with Laplacian eigenfunctions 2, 3, and 4 as predictors. NMSE ± $\mathrm{SE}_{\mathrm{NMSE}}$ is shown for LOWESS (brown) and scaled MMEM (green) applied to historical simulations. Results are organized in columns by validation model, named along the bottom axis, with the historical ensemble size in parentheses.


The application of dynamical adjustment to historical simulations assumes that over the North Atlantic, the externally forced response is confined to the spatially uniform pattern and does not project onto the other Laplacian eigenfunctions. This assumption is unlikely to be accurate, particularly considering the presence of spatially nonuniform aerosol forcing in the North Atlantic region and the observed warming hole. Consequently, the time series associated with Laplacian eigenfunctions other than the spatially uniform pattern likely contain a combination of internal and external variability. The dynamical adjustment model is linear; therefore, its estimate of basin-mean NASST in historical simulations could be separated into two components, one based on the internal variability in the predictor time series and one based on the external variability. However, there is no basis for the external-variability-based component of the dynamical adjustment estimate to represent the internal variability in basin-mean NASST. This suggests that the degradation of skill when applying the dynamical adjustment to historical simulations is due to the presence of external variability in the predictor time series, contrary to the assumption that the predictors represent pure internal variability. To investigate this hypothesis further, we examine the components of variance in each Laplacian eigenfunction, which can be done in an ensemble framework. Figure 2 plots the total variance of historical simulations, averaged across the 12 GCMs, for each Laplacian eigenfunction. This total variance is decomposed into contributions from the externally forced component, represented by the historical ensemble means, and the internal variability, calculated using the residuals about the ensemble means and corrected to account for the finite ensemble sizes. For the first Laplacian eigenfunction, the basin mean, the contribution of external variability dominates that of internal variability.
For all other Laplacian eigenfunctions, the internal variability dominates the total. This demonstrates that on average, the external variability is nearly an order of magnitude smaller than the internal variability in the predictor time series, supporting an essential assumption of dynamical adjustment. However, the average across all GCMs disguises important GCM differences. Separate analysis reveals that external variability is present in spatial patterns other than the basin mean, but the particular spatial patterns in which this occurs and the magnitude of the external contribution are GCM dependent (not shown). GCM-to-GCM differences in the presence of externally forced variability could be related to varying representations of the aerosol forcing response (Rotstayn et al. 2015) or to a change of the physical system in response to one or several external forcings. A complete review of GCM-to-GCM differences for each of the 100 Laplacian eigenfunctions and attribution to individual mechanisms is beyond the scope of this study and therefore is not discussed further here. Nevertheless, these results confirm the presence of external variability in patterns orthogonal to the uniform pattern, and it is plausible that this presence degrades the dynamical adjustment's estimate of internal variability.

To test whether the degradation in skill in historical simulations is due to the presence of forced variability, the forced variability is removed from each predictor time series by subtracting the GCM's ensemble mean, calculated for each predictor, prior to applying the dynamical adjustment. These results can be compared to the previous application of dynamical adjustment; the dynamical adjustment model is not retrained or tuned for this application. The dynamical adjustment estimate of basin-mean NASST external variability when removing external variability from the predictor time series is plotted in Fig. 5. In general, this approach appears to be a small improvement over the base dynamical adjustment. The NMSE of this approach can be compared in Fig. 6, where the results are summarized. For every GCM, the NMSE for this application of dynamical adjustment lies below the previous results and below one. We conclude that removing the ensemble mean from the predictor time series improves the skill of dynamical adjustment in every GCM, even in the three models for which dynamical adjustment had previously not demonstrated skill. For 8 GCMs, these results overlap with the preindustrial results; for these GCMs, the ability of the dynamical adjustment to estimate internal variability in the basin-mean NASST is similar to its performance in preindustrial simulations. The near recovery of preindustrial levels of skill is suggestive of some consistency in the internal variability between preindustrial and historical simulations, as well as of the additive nature of variability. Further, this suggests that external variability in the predictors was responsible for the degradation between the preindustrial and historical validations. For the other 4 GCMs, although removing external variability from the predictors does improve the performance in historical simulations, the performance does not reach that of the preindustrial validation. This suggests that there may be another source of error in these historical simulations, which we have been unable to identify in this study.

c. Results of the comparative methods

The polynomial method is applied to individual ensemble members to estimate the externally forced variability in basin-mean NASST. Polynomial orders from one (a linear trend) up to ten are applied. The associated NMSE is shown for orders one through eight in Fig. 7. In each column, the skill of each polynomial order is plotted with increasing order from left to right: the leftmost, black bar represents the NMSE of a first-order polynomial; the second, red bar represents the second-order polynomial; and so on. In general, the performance of the polynomial method improves with increasing order, plateauing between orders 4 and 6. Few of the polynomials produce an NMSE less than one, indicating that an estimate of internal variability based on the residual from a polynomial of any order tends to have no skill. The best polynomial order depends on the GCM and may be related to the time scales of internal variability and the relative magnitude of forced and internal variability.

Fig. 7.

NMSE ± $\mathrm{SE}_{\mathrm{NMSE}}$ of polynomials applied to basin-mean NASST for each historical ensemble member. Results are organized by column for each GCM, named along the bottom axis with the historical ensemble size in parentheses. The first-order polynomial, a linear trend, is shown as the leftmost, black bar in each column. The second-order polynomial is shown as the second-from-left, red bar in each column. Subsequent polynomial orders, up to order 8, are plotted in each column from left to right. NMSE for LOWESS (brown) and EEMD (pink) are also shown.


Results for LOWESS and EEMD are also shown in Fig. 7. LOWESS demonstrates skill in all but two GCMs and performs as well as or better than the polynomial and EEMD methods in every GCM. EEMD performs similarly to the polynomials across GCMs. In general, the polynomial, LOWESS, and EEMD methods suffer from the same underlying issue: each separates variability by time scale. Separating time scales can be very effective at identifying external and internal components of variability if these components have distinct time scales. In NASST, however, there is considerable debate over the relative contributions of external and internal variability on decadal-to-multidecadal time scales. Methods that rely only on time-scale separation may therefore find it difficult to correctly identify the relative contributions.

d. Results of ensemble mean method

The scaled MMEM estimate of external variability is plotted in Fig. 5 for the first two ensemble members of each GCM. The application of a single scaling step corrects for amplitude across the entire time period and produces a good approximation of the externally forced variability. However, externally forced variations within the study period, on decadal-to-multidecadal time scales, are not always well captured by the scaled MMEM. For example, in the MPI models, the MMEM overestimates the decadal fluctuations of the externally forced variability; in contrast, it underestimates such variability in CSIRO Mk3.6.0 and GFDL CM3. The MMEM may be fundamentally incapable of accurately capturing external variability on decadal time scales for every GCM: because it is constructed from many GCMs, the MMEM averages over the magnitude and timing of externally forced decadal variability, and a single MMEM cannot capture the diversity of GCM responses, particularly given the large range of aerosol radiative forcing (Rotstayn et al. 2015). The NMSE of using a scaled MMEM to separate variability in basin-mean NASST is shown in Fig. 6. The scaled MMEM is skillful in 7 out of 12 models; in the other five models, it lacks skill, with an NMSE greater than or equal to 1. We do not show the differenced MMEM approach because it is skillful in only three models and is generally outperformed by the scaled approach. This improvement of the scaled approach over the differenced approach agrees with previous studies (Frankcombe et al. 2015).

e. Dynamical adjustment’s relative performance in historical simulations

Having discussed each method's results, we now compare the dynamical adjustment, applied to historical simulations without modification, to the other methods. The NMSE of each method is shown in Fig. 6, where the skill of two methods is considered indistinguishable if their standard error bars overlap. A method is considered the best if it has the lowest NMSE and does not overlap with any other method. Comparing NMSE demonstrates that no method is consistently best across the GCMs used in this study. The dynamical adjustment is the best-performing method in 4 out of 12 GCMs and ties a comparative method in two additional GCMs. We interpret this to demonstrate that dynamical adjustment, as proposed in this paper, has potential similar to that of other methods when applied to separating basin-mean NASST variability. This conclusion differs only slightly when different filtering schemes are used; for example, using boxcar filter weights, the dynamical adjustment is the best-performing method in five GCMs and ties a comparative method in two GCMs.

Applying dynamical adjustment to predictors after removing the ensemble mean gives the best performance in 8 out of 12 GCMs. In 3 of the remaining models, this application of dynamical adjustment ties a comparative method for best performance; in the remaining model, the scaled MMEM is the best method. This relative comparison may differ slightly under different filtering schemes: using boxcar filter weights, the modified application of dynamical adjustment gives the best performance in 10 out of 12 GCMs, with the scaled MMEM being more skillful in the two remaining GCMs. This modified application of dynamical adjustment appears to be the best overall method; however, it cannot be applied to observations because the true external variability in observed predictors is not known.

f. Regression coefficients in dynamical adjustment

To gain insight into how the dynamical adjustment works, we examine the coefficients selected by the lasso. These coefficients are shown as the bottom set of lines in Fig. 8; each line corresponds to a different set of preindustrial training data, labeled by the GCM excluded from the training. The coefficients learned from the different training samples are very consistent, with predictor selection and coefficient values being particularly consistent for lower-order Laplacian eigenfunctions, which have the largest characteristic length scales. This consistency is expected because the training sets are nearly the same; any two training sets share 10 GCMs and differ only in the 11th. Higher-order eigenfunctions appear in some iterations of the dynamical adjustment lasso model, but their predictor selection does not show the same robustness as for the lower-order eigenfunctions.

Fig. 8.

The $\hat{\beta}$ for the dynamical adjustment trained in multimodel sets of preindustrial simulations corresponding to the "one-standard-error" λ selection. Each $\hat{\beta}$ is labeled by the GCM that was excluded from the training set. The lower set of coefficients are those used to produce the results shown in this study and are trained using lasso regularization. The upper set of coefficients, offset by a value of 1, are trained on the same data using lasso regularization that incorporates a length-scale-based weighting. The short black lines represent the coefficients of the reduced regression model, offset by a value of 1. The green dots are the average coefficients across reduced regression models. The coefficients are organized using a logarithmic x axis that emphasizes the Laplacian eigenfunctions with the largest length scales.


In general, the largest spatial scales have one hundred times more temporal variance than the smallest spatial scales (see Fig. 2). This leads us to question whether the small-scale predictors are really important to dynamical adjustment. The cv.glmnet function standardizes the predictor time series when training the dynamical adjustment lasso model. This means that all predictors are penalized equally and no information about the length scales or variance distribution is used when selecting coefficient values. To emphasize large-scale Laplacian predictors, we modify the lasso to penalize small-scale predictors more strongly than large-scale predictors. To do this, we follow DelSole and Nedza (2021) and apply a penalization of the form $w_j = 1/g(\theta_j)$, where
g(θ_j) = 8 / [(4000/θ_j)^3 + 7],
and θ_j is the length scale of the jth Laplacian eigenfunction. The function g(θ) approximates the shape of the variance distribution of NASST (plotted in Fig. 2).
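A weighted lasso of this form can be implemented with ordinary cyclic coordinate descent. The sketch below is illustrative and is not the cv.glmnet workflow used in the study; the function names are ours, and the form of g(θ) follows our reading of the equation above.

```python
import numpy as np

def lasso_cd(X, y, lam, w, n_iter=200):
    """Weighted lasso via cyclic coordinate descent, minimizing
    (1/(2n))*||y - X b||^2 + lam * sum_j w_j*|b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]            # partial residual
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam * w[j], 0.0) / col_sq[j]
    return b

def scale_weights(theta, ref_scale=4000.0):
    """Penalty weights w_j = 1/g(theta_j), with
    g(theta) = 8 / ((ref_scale/theta)**3 + 7): small length scales theta
    yield small g, hence large weights and a heavier penalty."""
    g = 8.0 / ((ref_scale / np.asarray(theta)) ** 3 + 7.0)
    return 1.0 / g

# Tiny synthetic check: only the large-scale predictor carries signal,
# and the heavy penalty on the 500-km predictor keeps it out of the model.
rng = np.random.default_rng(0)
theta = np.array([4000.0, 500.0])
X = rng.standard_normal((200, 2))
y = X[:, 0] + 0.1 * rng.standard_normal(200)
beta = lasso_cd(X, y, lam=0.1, w=scale_weights(theta))
```

Note that penalizing w_j|β_j| is equivalent to fitting an ordinary lasso on predictors rescaled by 1/w_j, which is how per-predictor weights are typically handled in practice.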

The scale-selective penalization has only a small effect on the NMSE of the dynamical adjustment (not shown). Overall, the NMSE is very similar, with some validation GCMs showing a small improvement and others a small degradation, and with no change in conclusions. The scale-selective penalization does influence coefficient selection, shown as the upper set of lines in Fig. 8: markedly fewer higher-order Laplacian eigenfunctions are assigned nonzero coefficients.

The skillful performance of the scale-weighted lasso model at estimating internal variability suggests that predictive skill can be mostly attributed to large-scale patterns; skill does not depend on smaller-scale features that may be more GCM dependent. Although the length-scale weighting removes small-length-scale predictors without a loss of predictive skill, the number of nonzero coefficients still makes the model difficult to interpret. To test whether a skillful prediction is possible with a much smaller set of Laplacian eigenfunctions, we construct a simple regression model with only Laplacian eigenfunctions 2, 3, and 4 as predictors (those shown in Fig. 1), representing the largest-scale patterns orthogonal to the uniform pattern. The coefficients of this regression model, trained on the same preindustrial data, are shown in Fig. 8 by the set of black lines overlaid on the scale-selective dynamical adjustment coefficients. As with the lasso models, the coefficients are consistent across the different training sets, although their magnitudes differ slightly from those of the lasso models. Using coefficients averaged across the reduced regression models, a single reduced regression model can be written as follows:
Ŷ_1^Internal = 0.62ψ_2 − 0.63ψ_3 − 0.27ψ_4. (11)
The NMSE of these simple regression models is included in Fig. 6. For some GCMs, in preindustrial simulations, historical simulations, and historical simulations with the ensemble mean removed from the predictor time series, this reduced regression model performs about as well as the original lasso model; for other GCMs, it does not perform as well as the full lasso model. The fact that much of the predictive skill can be produced using a small predictor set of the largest spatial patterns of NASST variability may be useful in constraining future work.
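For concreteness, the reduced model amounts to a fixed three-term linear combination of Laplacian time series, and the external estimate follows as a residual. The snippet below is a minimal sketch; the function names are ours, and the coefficient signs follow our reading of the extracted equation.

```python
import numpy as np

# Coefficients of the reduced regression model (as read from the text).
COEF = {"psi2": 0.62, "psi3": -0.63, "psi4": -0.27}

def reduced_internal(psi2, psi3, psi4):
    """Estimated internal component of basin-mean NASST from the three
    leading Laplacian time series orthogonal to the uniform pattern."""
    return (COEF["psi2"] * np.asarray(psi2)
            + COEF["psi3"] * np.asarray(psi3)
            + COEF["psi4"] * np.asarray(psi4))

def external_residual(total, psi2, psi3, psi4):
    """External estimate as the residual: total minus estimated internal."""
    return np.asarray(total) - reduced_internal(psi2, psi3, psi4)
```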

g. Application to observations

We now examine the result of applying the above methods to ERSST.v5. We emphasize that the true external and internal variability is not known in observational data, and therefore, these results should be interpreted cautiously.

The dynamical adjustment is applied to ERSST.v5 data without modification. As discussed in section 4f, the dynamical adjustment models trained on the different sets of 11 GCMs are very consistent; the estimated internal variability plotted in Fig. 9b is computed using coefficients averaged across these models. The reduced regression model, Eq. (11), is also applied to ERSST.v5. The dynamical adjustment estimates of external variability, calculated by subtracting the estimate of internal variability from the total, are plotted in Fig. 9a. The polynomial, LOWESS, EEMD, and MMEM methods described previously are also applied to the ERSST.v5 data, directly estimating the external component of variability. A selection of these external estimates is plotted in Fig. 9a, and their residual estimates of internal variability are plotted in Fig. 9b. The MMEM is applied only to time steps that overlap between CMIP5 and ERSST.v5 and does not cover the entire time series. Correlations between the different internal variability estimates are listed in Table 2 and illustrate that estimates of internal variability vary with method and parameter choices. The dynamical adjustment estimate of internal variability tends to be uncorrelated with the other estimates. The standard deviation of each internal variability estimate is shown following the method name in Table 2; these magnitudes vary between methods. Methods that directly estimate external variability, such as polynomials and MMEM, tend to yield residual internal estimates with larger standard deviations. The dynamical adjustment estimates have smaller standard deviations, similar in magnitude to LOWESS (25-yr span) and EEMD (residual plus two intrinsic modes). We refrain from interpreting differences in the estimates of observed variability because neither dynamical adjustment nor the other methods could separate forced and internal variability in historical simulations with consistent skill.
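Comparisons like those in Table 2 reduce to pairwise correlations and standard deviations of equal-length time series. A generic sketch (the function name is ours):

```python
import numpy as np

def compare_internal_estimates(estimates):
    """Summarize competing internal-variability estimates: `estimates`
    maps method name -> time series of a common length (e.g., all
    restricted to 1865-2000). Returns the method names, each estimate's
    sample standard deviation, and the correlation matrix, with rows and
    columns ordered as in `names`."""
    names = list(estimates)
    data = np.vstack([np.asarray(estimates[n], float) for n in names])
    stds = {n: row.std(ddof=1) for n, row in zip(names, data)}
    corr = np.corrcoef(data)
    return names, stds, corr

# Toy usage: two perfectly anticorrelated series.
names, stds, corr = compare_internal_estimates(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [4.0, 3.0, 2.0, 1.0]}
)
```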

Fig. 9.

Estimates of external and internal components of ERSST.v5 basin-mean NASST. The top panel shows the results of estimating external variability using LOWESS with a 25-yr (brown) and 50-yr (purple) span, EEMD with the residual summed with one (gold) and two (pink) intrinsic modes, and MMEM (green). For reference, the ERSST.v5 basin-mean NASST, which includes both external and internal components, is indicated by the black line. The bottom panel shows the results of estimating internal variability using dynamical adjustment (blue). Each of the above estimates is subtracted from the ERSST.v5 basin-mean NASST to determine its external or internal counterpart, which is then plotted in the corresponding alternate panel.


Table 2.

Correlation values calculated between estimates of internal variability in ERSST.v5, including the correlations between the curves shown in Fig. 9b. The method applied is listed along the left and top sides; the method names on the left are followed by the standard deviation of the estimated internal variability in parentheses. Polynomial estimates (Poly) are named by the order of the polynomial applied, LOWESS estimates (LOW) by the span of the smoothing, and EEMD estimates by the number of summed components; that is, EEMD1 is the residual only and EEMD2 is the residual summed with the lowest-frequency intrinsic mode. "Dynam" refers to the full dynamical adjustment model, and "Reduce" refers to Eq. (11). Correlations are calculated for methods applied to 1865–2000 to align with the length of the MMEM. The upper-right triangle is omitted because the correlation matrix is symmetric.


5. Conclusions

This study introduces a new approach to dynamical adjustment. Specifically, the internal component of the basin-mean NASST is estimated based on a set of spatial patterns of NASST orthogonal to a uniform pattern. In this implementation, the spatial patterns are Laplacian eigenfunctions, calculated for the North Atlantic basin using the method of DelSole and Tippett (2015). After projecting NASST onto Laplacian eigenfunctions, time series are smoothed using an 11-yr running mean. While the conclusions of this study are not sensitive to the choice of averaging window, the 11-yr running mean was chosen to demonstrate that this method is capable of representing variability on time scales longer than interannual variations in atmospheric circulation patterns.
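The smoothing step described above can be sketched as a centered moving average applied to each Laplacian-projected time series (the function below is illustrative, not the authors' code):

```python
import numpy as np

def running_mean(x, window=11):
    """Centered running mean used to smooth each projected time series
    (an 11-yr window in this study). Uses a 'valid' convolution, so the
    output is shorter than the input by window - 1 points."""
    kernel = np.full(window, 1.0 / window)
    return np.convolve(np.asarray(x, float), kernel, mode="valid")

# 21 annual values -> 11 window-centered means
smoothed = running_mean(np.arange(21.0))
```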

This implementation of dynamical adjustment takes the form of linear regression, with regularization applied to control overfitting. The linear coefficients relating the Laplacian eigenfunctions to the basin-mean NASST are learned from a multimodel set of preindustrial simulations, which contain only internal variability. The predictive skill of the dynamical adjustment is evaluated in independent preindustrial simulations, where the model was skillful for every GCM, although the degree of skill depended on the validation GCM. Some of this spread in validation skill is due to differences in GCM internal variability that are not well captured when training on a multimodel dataset. Future research could use this method of representing internal variability to study differences between GCMs, including identifying which features are responsible for predictive skill and whether differences in identified features are related to model biases or deficiencies.

The dynamical adjustment was applied to historical simulations without retraining. In contrast to preindustrial simulations, where all variability can be attributed to the coupled atmosphere–ocean system, historical simulations contain two components of variability, internal and external. Either component can be viewed as the target in this study, and therefore either could serve as the normalizing variance when calculating NMSE; normalizing by the internal variance is the stricter skill threshold for the GCMs evaluated here and is chosen as the skill metric in historical simulations. As in the preindustrial validation, the dynamical adjustment is applied to historical simulations from a GCM excluded from the training set. Dynamical adjustment applied to historical simulations is found to have skill in 9 out of 12 GCMs. However, dynamical adjustment performs worse in all historical simulations than in preindustrial simulations and demonstrates no skill in 3 out of 12 GCMs. This degradation in skill is attributed to the presence of externally forced variability in the predictor time series, as demonstrated by the return of skill when dynamical adjustment is applied to predictor time series from which the external variability has been removed (e.g., by subtracting the ensemble mean). Thus, while the dynamical adjustment approach avoids specifying the temporal shape or amplitude of externally forced variability in the basin-mean NASST, it does make the slightly incorrect assumption that external variability does not project onto the spatial patterns of NASST orthogonal to the basin mean. Further study could employ large ensembles and single-forcing runs to investigate whether the projection of external variability onto the patterns orthogonal to the basin mean, and the consequent degradation of dynamical adjustment skill, can be attributed to specific external forcings (Deser et al. 2020).
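As a concrete reading of the skill metric, NMSE can be computed as the mean squared error of an estimate divided by the variance of the target; this is one common convention, consistent with the normalization choice described above, and the sketch below is ours rather than the paper's exact formula.

```python
import numpy as np

def nmse(target, estimate):
    """Normalized mean squared error: MSE of the estimate divided by the
    variance of the target (here, the internal variability). Under this
    convention a constant (climatological) prediction scores 1, and
    NMSE < 1 indicates skill."""
    target = np.asarray(target, float)
    err = target - np.asarray(estimate, float)
    return float(np.mean(err ** 2) / np.var(target))

# a perfect estimate scores 0; predicting the (zero) mean scores 1
truth = np.array([1.0, -1.0, 1.0, -1.0])
```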

The dynamical adjustment is compared to several methods that have been previously used to separate forced and internal variability. These methods are evaluated on the same GCMs, allowing a comparison of their performance. It is found that these methods are not skillful in all GCMs and perform similarly, on average, to the proposed dynamical adjustment when applied to historical simulations. However, the dynamical adjustment applied with the removal of external variability from the predictor time series, in general, outperforms other methods across GCMs, being the most skillful method in 8 out of 12 GCMs and tying other methods in 3 additional GCMs. This outperformance emphasizes the potential of this dynamical adjustment method for estimating internal variability in basin-mean NASST.

The dynamical adjustment is applied to the observational analysis product ERSST.v5 (Huang et al. 2017b). The dynamical adjustment estimate of internal variability exhibits a relatively small standard deviation and is not highly correlated with estimates produced by other methods. Since neither the dynamical adjustment nor any other method could separate forced and internal variability in historical simulations with consistent skill, we refrain from interpreting differences in the estimated internal variability: where the true internal variability is unknown, there is no basis for judging which estimate is more accurate. Nevertheless, it is clear that methods based on different principles yield considerably different estimates of external and internal variability.

Future studies will explore observationally constrained modifications to the dynamical adjustment in search of consistently skillful estimates in historical simulations, thereby facilitating a more meaningful application to observations. The ultimate goal of this research is an accurate estimate of internal variability over the observed period. We see several opportunities for improving the dynamical adjustment. Recall that removing known external variability from the predictor time series (by subtracting the ensemble mean) greatly enhanced its performance; one strategy for improvement would therefore be to estimate and remove external variability from the predictor time series. Another strategy is to include additional variables that may better characterize internal variability in the North Atlantic or that may have weaker, or more easily estimated, external variability; variables of particular interest include integrated oceanic heat content, sea level height, and sea surface salinity. Finally, the dynamical adjustment could be trained in historical simulations, where external variability is present during the training step, avoiding any perceived issues in translating relationships from preindustrial to historically forced simulations.

Acknowledgments.

This research was supported primarily by the National Oceanic and Atmospheric Administration (NA20OAR4310401). The views expressed herein are those of the authors and do not necessarily reflect the views of these agencies.

Data availability statement.

The data used in this paper are publicly available from the CMIP5 archive at https://esgf-node.llnl.gov/search/cmip5/.

REFERENCES

• Allen, M. R., and S. F. B. Tett, 1999: Checking for model consistency in optimal fingerprinting. Climate Dyn., 15, 419–434, https://doi.org/10.1007/s003820050291.

• Armour, K. C., J. Marshall, J. R. Scott, A. Donohoe, and E. R. Newsom, 2016: Southern Ocean warming delayed by circumpolar upwelling and equatorward transport. Nat. Geosci., 9, 549–554, https://doi.org/10.1038/ngeo2731.

• Booth, B. B. B., N. J. Dunstone, P. R. Halloran, T. Andrews, and N. Bellouin, 2012: Aerosols implicated as a prime driver of twentieth-century North Atlantic climate variability. Nature, 484, 228–232, https://doi.org/10.1038/nature10946.

• Bowman, D. C., and J. M. Lees, 2013: The Hilbert–Huang transform: A high resolution spectral method for nonlinear and nonstationary time series. Seismol. Res. Lett., 84, 1074–1080, https://doi.org/10.1785/0220130025.

• Cheng, L., G. Foster, Z. Hausfather, K. E. Trenberth, and J. Abraham, 2022: Improved quantification of the rate of ocean warming. J. Climate, 35, 4827–4840, https://doi.org/10.1175/JCLI-D-21-0895.1.

• Cleveland, W. S., 1979: Robust locally weighted regression and smoothing scatterplots. J. Amer. Stat. Assoc., 74, 829–836, https://doi.org/10.1080/01621459.1979.10481038.

• DelSole, T., and M. K. Tippett, 2015: Laplacian eigenfunctions for climate analysis. J. Climate, 28, 7420–7436, https://doi.org/10.1175/JCLI-D-15-0049.1.

• DelSole, T., and D. Nedza, 2021: Reconstructing the Atlantic overturning circulation using linear machine learning techniques. Atmos.–Ocean, 60, 541–553, https://doi.org/10.1080/07055900.2021.1947181.

• Deser, C., and A. S. Phillips, 2021: Defining the internal component of Atlantic multidecadal variability in a changing climate. Geophys. Res. Lett., 48, e2021GL095023, https://doi.org/10.1029/2021GL095023.

• Deser, C., A. S. Phillips, M. A. Alexander, and B. V. Smoliak, 2014: Projecting North American climate over the next 50 years: Uncertainty due to internal variability. J. Climate, 27, 2271–2296, https://doi.org/10.1175/JCLI-D-13-00451.1.

• Deser, C., L. Terray, and A. S. Phillips, 2016: Forced and internal components of winter air temperature trends over North America during the past 50 years: Mechanisms and implications. J. Climate, 29, 2237–2258, https://doi.org/10.1175/JCLI-D-15-0304.1.

• Deser, C., and Coauthors, 2020: Insights from Earth system model initial-condition large ensembles and future prospects. Nat. Climate Change, 10, 277–286, https://doi.org/10.1038/s41558-020-0731-2.

• Duchon, C. E., 1979: Lanczos filtering in one and two dimensions. J. Appl. Meteor., 18, 1016–1022, https://doi.org/10.1175/1520-0450(1979)018%3C1016:LFIOAT%3E2.0.CO;2.

• Frankcombe, L. M., M. H. England, M. E. Mann, and B. A. Steinman, 2015: Separating internal variability from the externally forced climate response. J. Climate, 28, 8184–8202, https://doi.org/10.1175/JCLI-D-15-0069.1.

• Frankcombe, L. M., M. H. England, J. B. Kajtar, M. E. Mann, and B. A. Steinman, 2018: On the choice of ensemble mean for estimating the forced signal in the presence of internal variability. J. Climate, 31, 5681–5693, https://doi.org/10.1175/JCLI-D-17-0662.1.

• Frankignoul, C., G. Gastineau, and Y.-O. Kwon, 2017: Estimation of the SST response to anthropogenic and external forcing and its impact on the Atlantic multidecadal oscillation and the Pacific decadal oscillation. J. Climate, 30, 9871–9895, https://doi.org/10.1175/JCLI-D-17-0009.1.

• Friedman, J. H., T. Hastie, and R. Tibshirani, 2010: Regularization paths for generalized linear models via coordinate descent. J. Stat. Software, 33, 1–22, https://doi.org/10.18637/jss.v033.i01.

• Goldenberg, S. B., C. W. Landsea, A. M. Mestas-Nuñez, and W. M. Gray, 2001: The recent increase in Atlantic hurricane activity: Causes and implications. Science, 293, 474–479, https://doi.org/10.1126/science.1060040.

• Hawkins, E., and R. Sutton, 2009: The potential to narrow uncertainty in regional climate predictions. Bull. Amer. Meteor. Soc., 90, 1095–1108, https://doi.org/10.1175/2009BAMS2607.1.

• He, C., A. C. Clement, M. A. Cane, L. N. Murphy, J. M. Klavans, and T. M. Fenske, 2022: A North Atlantic warming hole without ocean circulation. Geophys. Res. Lett., 49, e2022GL100420, https://doi.org/10.1029/2022GL100420.

• Hourdin, F., and Coauthors, 2017: The art and science of climate model tuning. Bull. Amer. Meteor. Soc., 98, 589–602, https://doi.org/10.1175/BAMS-D-15-00135.1.

• Huang, B., and Coauthors, 2017a: Extended Reconstructed Sea Surface Temperature, version 5 (ERSSTv5): Upgrades, validations, and intercomparisons. J. Climate, 30, 8179–8205, https://doi.org/10.1175/JCLI-D-16-0836.1.

• Huang, B., and Coauthors, 2017b: NOAA Extended Reconstructed Sea Surface Temperature (ERSST), version 5. NOAA National Centers for Environmental Information, accessed 20 January 2019, https://doi.org/10.7289/V5T72FNM.

• Huang, N. E., and Z. Wu, 2008: A review on Hilbert–Huang transform: Method and its applications to geophysical studies. Rev. Geophys., 46, RG2006, https://doi.org/10.1029/2007RG000228.

• Huang, N. E., and Coauthors, 1998: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. Roy. Soc. London, 454, 903–995, https://doi.org/10.1098/rspa.1998.0193.

• Hurrell, J. W., 1996: Influence of variations in extratropical wintertime teleconnections on Northern Hemisphere temperature. Geophys. Res. Lett., 23, 665–668, https://doi.org/10.1029/96GL00459.

• Jain, S., A. A. Scaife, T. G. Shepherd, C. Deser, N. Dunstone, G. A. Schmidt, K. E. Trenberth, and T. Turkington, 2023: Importance of internal variability for climate model assessment. npj Climate Atmos. Sci., 6, 68, https://doi.org/10.1038/s41612-023-00389-0.

• Ji, F., Z. Wu, J. Huang, and E. P. Chassignet, 2014: Evolution of land surface air temperature trend. Nat. Climate Change, 4, 462–466, https://doi.org/10.1038/nclimate2223.

• Kay, J. E., and Coauthors, 2015: The Community Earth System Model (CESM) large ensemble project: A community resource for studying climate change in the presence of internal climate variability. Bull. Amer. Meteor. Soc., 96, 1333–1349, https://doi.org/10.1175/BAMS-D-13-00255.1.

• Knight, J. R., C. K. Folland, and A. A. Scaife, 2006: Climate impacts of the Atlantic multidecadal oscillation. Geophys. Res. Lett., 33, L17706, https://doi.org/10.1029/2006GL026242.

• Laprise, R., 1992: The resolution of global spectral models. Bull. Amer. Meteor. Soc., 73, 1453–1455, https://doi.org/10.1175/1520-0477-73.9.1453.

• Liu, G., P. Wang, and Y.-O. Kwon, 2023: Physical insights from the multidecadal prediction of North Atlantic sea surface temperature variability using explainable neural networks. Geophys. Res. Lett., 50, e2023GL106278, https://doi.org/10.1029/2023GL106278.

• Milinski, S., N. Maher, and D. Olonscheck, 2020: How large does a large ensemble need to be? Earth Syst. Dyn., 11, 885–901, https://doi.org/10.5194/esd-11-885-2020.

• Rotstayn, L. D., M. A. Collier, D. T. Shindell, and O. Boucher, 2015: Why does aerosol forcing control historical global-mean surface temperature change in CMIP5 models? J. Climate, 28, 6608–6625, https://doi.org/10.1175/JCLI-D-14-00712.1.

• Saffioti, C., E. M. Fischer, S. C. Scherrer, and R. Knutti, 2016: Reconciling observed and modeled temperature and precipitation trends over Europe by adjusting for circulation variability. Geophys. Res. Lett., 43, 8189–8198, https://doi.org/10.1002/2016GL069802.

• Sippel, S., N. Meinshausen, A. Merrifield, F. Lehner, A. G. Pendergrass, E. Fischer, and R. Knutti, 2019: Uncovering the forced climate response from a single ensemble member using statistical learning. J. Climate, 32, 5677–5699, https://doi.org/10.1175/JCLI-D-18-0882.1.

• Steinman, B. A., M. E. Mann, and S. K. Miller, 2015: Atlantic and Pacific multidecadal oscillations and Northern Hemisphere temperatures. Science, 347, 988–991, https://doi.org/10.1126/science.1257856.

• Sutton, R. T., and D. L. Hodson, 2005: Atlantic Ocean forcing of multidecadal variations in North American and European summer climate. 2005 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstract PP21E-08.

• Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498, https://doi.org/10.1175/BAMS-D-11-00094.1.

• Tibshirani, R., 1996: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc., 58B, 267–288, https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

• Ting, M., Y. Kushnir, R. Seager, and C. Li, 2009: Forced and internal twentieth-century SST trends in the North Atlantic. J. Climate, 22, 1469–1481, https://doi.org/10.1175/2008JCLI2561.1.

• Trenberth, K. E., and D. J. Shea, 2006: Atlantic hurricanes and natural variability in 2005. Geophys. Res. Lett., 33, L12704, https://doi.org/10.1029/2006GL026894.

• Wallace, J. M., Y. Zhang, and J. A. Renwick, 1995: Dynamic contribution to hemispheric mean temperature trends. Science, 270, 780–783, https://doi.org/10.1126/science.270.5237.780.

• Wallace, J. M., Q. Fu, B. V. Smoliak, P. Lin, and C. M. Johanson, 2012: Simulated versus observed patterns of warming over the extratropical Northern Hemisphere continents during the cold season. Proc. Natl. Acad. Sci. USA, 109, 14 337–14 342, https://doi.org/10.1073/pnas.1204875109.

• Wills, R. C. J., K. C. Armour, D. S. Battisti, and D. L. Hartmann, 2019: Ocean–atmosphere dynamical coupling fundamental to the Atlantic multidecadal oscillation. J. Climate, 32, 251–272, https://doi.org/10.1175/JCLI-D-18-0269.1.

• Wills, R. C. J., D. S. Battisti, K. C. Armour, T. Schneider, and C. Deser, 2020: Pattern recognition methods to separate forced responses from internal variability in climate model ensembles and observations. J. Climate, 33, 8693–8719, https://doi.org/10.1175/JCLI-D-19-0855.1.

• Wu, Z., and N. E. Huang, 2009: Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal., 1 (1), 1–41, https://doi.org/10.1142/S1793536909000047.

• Wu, Z., N. E. Huang, J. M. Wallace, B. V. Smoliak, and X. Chen, 2011: On the time-varying trend in global-mean surface temperature. Climate Dyn., 37, 759–773, https://doi.org/10.1007/s00382-011-1128-8.

• Zhang, R., and T. L. Delworth, 2006: Impact of Atlantic multidecadal oscillations on India/Sahel rainfall and Atlantic hurricanes. Geophys. Res. Lett., 33, L17712, https://doi.org/10.1029/2006GL026267.

• Zhang, R., and Coauthors, 2013: Have aerosols caused the observed Atlantic multidecadal variability? J. Atmos. Sci., 70, 1135–1144, https://doi.org/10.1175/JAS-D-12-0331.1.
Save
  • Allen, M. R., and S. F. B. Tett, 1999: Checking for model consistency in optimal fingerprinting. Climate Dyn., 15, 419434, https://doi.org/10.1007/s003820050291.

    • Search Google Scholar
    • Export Citation
  • Armour, K. C., J. Marshall, J. R. Scott, A. Donohoe, and E. R. Newsom, 2016: Southern ocean warming delayed by circumpolar upwelling and equatorward transport. Nat. Geosci., 9, 549554, https://doi.org/10.1038/ngeo2731.

    • Search Google Scholar
    • Export Citation
  • Booth, B. B. B., N. J. Dunstone, P. R. Halloran, T. Andrews, and N. Bellouin, 2012: Aerosols implicated as a prime driver of twentieth-century North Atlantic climate variability. Nature, 484, 228232, https://doi.org/10.1038/nature10946.

    • Search Google Scholar
    • Export Citation
  • Bowman, D. C., and J. M. Lees, 2013: The Hilbert–Huang transform: A high resolution spectral method for nonlinear and nonstationary time series. Seismol. Res. Lett., 84, 10741080, https://doi.org/10.1785/0220130025.

    • Search Google Scholar
    • Export Citation
  • Cheng, L., G. Foster, Z. Hausfather, K. E. Trenberth, and J. Abraham, 2022: Improved quantification of the rate of ocean warming. J. Climate, 35, 48274840, https://doi.org/10.1175/JCLI-D-21-0895.1.

    • Search Google Scholar
    • Export Citation
  • Cleveland, W. S., 1979: Robust locally weighted regression and smoothing scatterplots. J. Amer. Stat. Assoc., 74, 829836, https://doi.org/10.1080/01621459.1979.10481038.

    • Search Google Scholar
    • Export Citation
  • DelSole, T., and M. K. Tippett, 2015: Laplacian eigenfunctions for climate analysis. J. Climate, 28, 74207436, https://doi.org/10.1175/JCLI-D-15-0049.1.

    • Search Google Scholar
    • Export Citation
  • DelSole, T., and D. Nedza, 2021: Reconstructing the Atlantic overturning circulation using linear machine learning techniques. Atmos.–Ocean, 60, 541553, https://doi.org/10.1080/07055900.2021.1947181.

    • Search Google Scholar
    • Export Citation
  • Deser, C., and A. S. Phillips, 2021: Defining the internal component of Atlantic multidecadal variability in a changing climate. Geophys. Res. Lett., 48, e2021GL095023, https://doi.org/10.1029/2021GL095023.

    • Search Google Scholar
    • Export Citation
  • Deser, C., A. S. Phillips, M. A. Alexander, and B. V. Smoliak, 2014: Projecting North American climate over the next 50 years: Uncertainty due to internal variability. J. Climate, 27, 22712296, https://doi.org/10.1175/JCLI-D-13-00451.1.

    • Search Google Scholar
    • Export Citation
  • Deser, C., L. Terray, and A. S. Phillips, 2016: Forced and internal components of winter air temperature trends over North America during the past 50 years: Mechanisms and implications. J. Climate, 29, 22372258, https://doi.org/10.1175/JCLI-D-15-0304.1.

    • Search Google Scholar
    • Export Citation
  • Deser, C., and Coauthors, 2020: Insights from Earth system model initial-condition large ensembles and future prospects. Nat. Climate Change, 10, 277286, https://doi.org/10.1038/s41558-020-0731-2.

    • Search Google Scholar
    • Export Citation
  • Duchon, C. E., 1979: Lanczos filtering in one and two dimensions. J. Appl. Meteor., 18, 10161022, https://doi.org/10.1175/1520-0450(1979)018%3C1016:LFIOAT%3E2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Frankcombe, L. M., M. H. England, M. E. Mann, and B. A. Steinman, 2015: Separating internal variability from the externally forced climate response. J. Climate, 28, 81848202, https://doi.org/10.1175/JCLI-D-15-0069.1.

    • Search Google Scholar
    • Export Citation
  • Frankcombe, L. M., M. H. England, J. B. Kajtar, M. E. Mann, and B. A. Steinman, 2018: On the choice of ensemble mean for estimating the forced signal in the presence of internal variability. J. Climate, 31, 56815693, https://doi.org/10.1175/JCLI-D-17-0662.1.

    • Search Google Scholar
    • Export Citation
  • Frankignoul, C., G. Gastineau, and Y.-O. Kwon, 2017: Estimation of the SST response to anthropogenic and external forcing and its impact on the Atlantic multidecadal oscillation and the Pacific decadal oscillation. J. Climate, 30, 98719895, https://doi.org/10.1175/JCLI-D-17-0009.1.

    • Search Google Scholar
    • Export Citation
  • Friedman, J. H., T. Hastie, and R. Tibshirani, 2010: Regularization paths for generalized linear models via coordinate descent. J. Stat. Software, 33, 122, https://doi.org/10.18637/jss.v033.i01.

    • Search Google Scholar
    • Export Citation
  • Goldenberg, S. B., C. W. Landsea, A. M. Mestas-Nuñez, and W. M. Gray, 2001: The recent increase in Atlantic hurricane activity: Causes and implications. Science, 293, 474479, https://doi.org/10.1126/science.1060040.

    • Search Google Scholar
    • Export Citation
  • Hawkins, E., and R. Sutton, 2009: The potential to narrow uncertainty in regional climate predictions. Bull. Amer. Meteor. Soc., 90, 10951108, https://doi.org/10.1175/2009BAMS2607.1.

    • Search Google Scholar
    • Export Citation
  • He, C., A. C. Clement, M. A. Cane, L. N. Murphy, J. M. Klavans, and T. M. Fenske, 2022: A North Atlantic warming hole without ocean circulation. Geophys. Res. Lett., 49, e2022GL100420, https://doi.org/10.1029/2022GL100420.

  • Hourdin, F., and Coauthors, 2017: The art and science of climate model tuning. Bull. Amer. Meteor. Soc., 98, 589–602, https://doi.org/10.1175/BAMS-D-15-00135.1.

  • Huang, B., and Coauthors, 2017a: Extended Reconstructed Sea Surface Temperature, version 5 (ERSSTv5): Upgrades, validations, and intercomparisons. J. Climate, 30, 8179–8205, https://doi.org/10.1175/JCLI-D-16-0836.1.

  • Huang, B., and Coauthors, 2017b: NOAA Extended Reconstructed Sea Surface Temperature (ERSST), version 5. NOAA National Centers for Environmental Information, accessed 20 January 2019, https://doi.org/10.7289/V5T72FNM.

  • Huang, N. E., and Z. Wu, 2008: A review on Hilbert-Huang transform: Method and its applications to geophysical studies. Rev. Geophys., 46, RG2006, https://doi.org/10.1029/2007RG000228.

  • Huang, N. E., and Coauthors, 1998: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. Roy. Soc. London, 454, 903–995, https://doi.org/10.1098/rspa.1998.0193.

  • Hurrell, J. W., 1996: Influence of variations in extratropical wintertime teleconnections on Northern Hemisphere temperature. Geophys. Res. Lett., 23, 665–668, https://doi.org/10.1029/96GL00459.

  • Jain, S., A. A. Scaife, T. G. Shepherd, C. Deser, N. Dunstone, G. A. Schmidt, K. E. Trenberth, and T. Turkington, 2023: Importance of internal variability for climate model assessment. npj Climate Atmos. Sci., 6, 68, https://doi.org/10.1038/s41612-023-00389-0.

  • Ji, F., Z. Wu, J. Huang, and E. P. Chassignet, 2014: Evolution of land surface air temperature trend. Nat. Climate Change, 4, 462–466, https://doi.org/10.1038/nclimate2223.

  • Kay, J. E., and Coauthors, 2015: The Community Earth System Model (CESM) large ensemble project: A community resource for studying climate change in the presence of internal climate variability. Bull. Amer. Meteor. Soc., 96, 1333–1349, https://doi.org/10.1175/BAMS-D-13-00255.1.

  • Knight, J. R., C. K. Folland, and A. A. Scaife, 2006: Climate impacts of the Atlantic multidecadal oscillation. Geophys. Res. Lett., 33, L17706, https://doi.org/10.1029/2006GL026242.

  • Laprise, R., 1992: The resolution of global spectral models. Bull. Amer. Meteor. Soc., 73, 1453–1455, https://doi.org/10.1175/1520-0477-73.9.1453.

  • Liu, G., P. Wang, and Y.-O. Kwon, 2023: Physical insights from the multidecadal prediction of North Atlantic sea surface temperature variability using explainable neural networks. Geophys. Res. Lett., 50, e2023GL106278, https://doi.org/10.1029/2023GL106278.

  • Milinski, S., N. Maher, and D. Olonscheck, 2020: How large does a large ensemble need to be? Earth Syst. Dyn., 11, 885–901, https://doi.org/10.5194/esd-11-885-2020.

  • Rotstayn, L. D., M. A. Collier, D. T. Shindell, and O. Boucher, 2015: Why does aerosol forcing control historical global-mean surface temperature change in CMIP5 models? J. Climate, 28, 6608–6625, https://doi.org/10.1175/JCLI-D-14-00712.1.

  • Saffioti, C., E. M. Fischer, S. C. Scherrer, and R. Knutti, 2016: Reconciling observed and modeled temperature and precipitation trends over Europe by adjusting for circulation variability. Geophys. Res. Lett., 43, 8189–8198, https://doi.org/10.1002/2016GL069802.

  • Sippel, S., N. Meinshausen, A. Merrifield, F. Lehner, A. G. Pendergrass, E. Fischer, and R. Knutti, 2019: Uncovering the forced climate response from a single ensemble member using statistical learning. J. Climate, 32, 5677–5699, https://doi.org/10.1175/JCLI-D-18-0882.1.

  • Steinman, B. A., M. E. Mann, and S. K. Miller, 2015: Atlantic and Pacific multidecadal oscillations and Northern Hemisphere temperatures. Science, 347, 988–991, https://doi.org/10.1126/science.1257856.

  • Sutton, R. T., and D. L. Hodson, 2005: Atlantic Ocean forcing of multidecadal variations in North American and European summer climate. 2005 Fall Meeting, San Francisco, CA, Amer. Geophys. Union, Abstract PP21E-08.

  • Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498, https://doi.org/10.1175/BAMS-D-11-00094.1.

  • Tibshirani, R., 1996: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc., 58B, 267–288, https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

  • Ting, M., Y. Kushnir, R. Seager, and C. Li, 2009: Forced and internal twentieth-century SST trends in the North Atlantic. J. Climate, 22, 1469–1481, https://doi.org/10.1175/2008JCLI2561.1.

  • Trenberth, K. E., and D. J. Shea, 2006: Atlantic hurricanes and natural variability in 2005. Geophys. Res. Lett., 33, L12704, https://doi.org/10.1029/2006GL026894.

  • Wallace, J. M., Y. Zhang, and J. A. Renwick, 1995: Dynamic contribution to hemispheric mean temperature trends. Science, 270, 780–783, https://doi.org/10.1126/science.270.5237.780.

  • Wallace, J. M., Q. Fu, B. V. Smoliak, P. Lin, and C. M. Johanson, 2012: Simulated versus observed patterns of warming over the extratropical Northern Hemisphere continents during the cold season. Proc. Natl. Acad. Sci. USA, 109, 14 337–14 342, https://doi.org/10.1073/pnas.1204875109.

  • Wills, R. C. J., K. C. Armour, D. S. Battisti, and D. L. Hartmann, 2019: Ocean–atmosphere dynamical coupling fundamental to the Atlantic multidecadal oscillation. J. Climate, 32, 251–272, https://doi.org/10.1175/JCLI-D-18-0269.1.

  • Wills, R. C. J., D. S. Battisti, K. C. Armour, T. Schneider, and C. Deser, 2020: Pattern recognition methods to separate forced responses from internal variability in climate model ensembles and observations. J. Climate, 33, 8693–8719, https://doi.org/10.1175/JCLI-D-19-0855.1.

  • Wu, Z., and N. E. Huang, 2009: Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal., 1 (1), 1–41, https://doi.org/10.1142/S1793536909000047.

  • Wu, Z., N. E. Huang, J. M. Wallace, B. V. Smoliak, and X. Chen, 2011: On the time-varying trend in global-mean surface temperature. Climate Dyn., 37, 759–773, https://doi.org/10.1007/s00382-011-1128-8.

  • Zhang, R., and T. L. Delworth, 2006: Impact of Atlantic multidecadal oscillations on India/Sahel rainfall and Atlantic Hurricanes. Geophys. Res. Lett., 33, L17712, https://doi.org/10.1029/2006GL026267.

  • Zhang, R., and Coauthors, 2013: Have aerosols caused the observed Atlantic multidecadal variability? J. Atmos. Sci., 70, 1135–1144, https://doi.org/10.1175/JAS-D-12-0331.1.

  • Fig. 1.

    Laplacian eigenfunctions 1–4 for the North Atlantic domain, 0°–60°N and 10°–70°W. The first Laplacian eigenfunction is spatially uniform and represents the basin mean. Subsequent eigenfunctions have zero spatial mean, are spatially orthogonal to each other, and have decreasing characteristic length scale. The contour breaks are equally spaced, but the global maximum absolute value is irrelevant in this study and therefore suppressed in the figure.

  • Fig. 2.

    Temporal variance associated with each NASST Laplacian eigenfunction in normalized preindustrial (solid gray) and unnormalized historical (dashed gray) simulations. The offset between preindustrial and historical variances arises because the preindustrial variance distribution is normalized such that the spatially uniform pattern has unit variance, whereas no normalization is applied to the historical data. The arithmetic average across the 12 GCMs’ historical simulations is plotted as the lowest black line. The blue line represents the average ensemble mean variance across GCMs. The red line is the average variance of the residuals about the ensemble mean. The green line plots the temporal variance of Laplacian eigenfunctions from ERSST.v5. A k⁻³ line, offset from the variance distribution, is plotted as a visual guide, and g(θ), defined in Eq. (10), an empirical function approximating the shape of the variance distribution, is plotted for reference (following DelSole and Nedza 2021). Vertical lines mark Laplacian eigenfunctions 2, 3, and 4 for visual identification.

  • Fig. 3.

    Mean-square error (red dotted curve) and standard errors (error bars) of the estimated basin-mean NASST, as produced by cv.glmnet. These values are estimated across a training set of 11 preindustrial simulations using a cross-validation procedure that excludes one of the training GCMs at a time. This plot corresponds to the preindustrial training sample that excludes CanESM2. The λ value that produces the minimum MSE is indicated by the left vertical dotted line. The largest λ value that produces an MSE within one standard error of the minimum is indicated by the right vertical dotted line. The numbers along the top margin indicate the number of Laplacian eigenfunctions assigned nonzero coefficients in the lasso model.

  • Fig. 4.

    NASST basin-mean anomaly time series of the last 500 years of each GCM’s preindustrial simulation (black). Each time series has a third-order polynomial removed and has been normalized to unit variance. The dynamical adjustment estimate is plotted in blue. Time series are labeled by their respective model along the right vertical axis. Each model name is followed by (R², 1 − NMSE), which quantifies the fraction of variance explained by dynamical adjustment.

  • Fig. 5.

    Ensemble mean of NASST basin mean corresponding to 1860–2005 for each GCM (black). Estimates of external variability for the first two ensemble members from each GCM, including the dynamically adjusted time series (blue), scaled MMEM (green), and dynamically adjusted time series when removing the ensemble mean from the predictors (red), are plotted. The time series have not been normalized and the y axes have different scales.

  • Fig. 6.

    NMSE ± the associated standard error (SE_NMSE) of the dynamical adjustment applied to independent preindustrial simulations (black), historical simulations (blue), and historical simulations when removing the ensemble mean from predictor time series (red). Dynamical adjustment is plotted as pairs of bars of the same color, with the left bar representing the performance of the full lasso model provided with 99 Laplacian eigenfunctions and the right bar, marked with an “x” at its center, representing a reduced regression model with Laplacian eigenfunctions 2, 3, and 4 as predictors. NMSE ± SE_NMSE is also shown for LOWESS (brown) and scaled MMEM (green) applied to historical simulations. Results are organized in columns by validation model, named along the bottom axis, with the historical ensemble size in parentheses.

  • Fig. 7.

    NMSE ± SE_NMSE of polynomial fits to basin-mean NASST for each historical ensemble member. Results are organized by column for each GCM, named along the bottom axis with the historical ensemble size in parentheses. The first-order polynomial, a linear trend, is shown as the leftmost (black) bar in each column. The second-order polynomial is the red bar, second from the left. Subsequent polynomial orders, up to order 8, follow from left to right in each column. NMSE for LOWESS (brown) and EEMD (pink) is also shown.

  • Fig. 8.

    The β̂ for the dynamical adjustment trained on multimodel sets of preindustrial simulations corresponding to the “one-standard-error” λ selection. Each β̂ is labeled by the GCM that was excluded from the training set. The lower set of coefficients is the one used to produce the results shown in this study and is trained using lasso regularization. The upper set of coefficients, offset by a value of 1, is trained on the same data using lasso regularization that incorporates a length-scale-based weighting. The short black lines represent the coefficients for the reduced regression model, offset by a value of 1. The green dots are the average coefficients across reduced regression models. The coefficients are organized on a logarithmic x axis that emphasizes the Laplacian eigenfunctions with the largest length scales.

  • Fig. 9.

    Estimates of external and internal components of ERSST.v5 basin-mean NASST. The top panel shows the results of estimating external variability using LOWESS with a 25-yr (brown) and 50-yr (purple) span, EEMD with the residual summed with one (gold) and two (pink) intrinsic modes, and MMEM (green). For reference, the ERSST.v5 basin-mean NASST, which includes both external and internal components, is indicated by the black line. The bottom panel shows the results of estimating internal variability using dynamical adjustment (blue). Each of the above estimates is subtracted from the ERSST.v5 basin-mean NASST to determine its external or internal counterpart, which is then plotted in the corresponding alternate panel.
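The λ selection described in the Fig. 3 caption can be reproduced in outline. The paper uses R’s cv.glmnet; the sketch below is a stand-in using scikit-learn’s LassoCV on synthetic data (the 99 predictors echo the Laplacian truncation, but the data, fold count, and library are illustrative assumptions, not the paper’s setup). It implements the same two-step rule: find the λ minimizing cross-validated MSE, then take the largest λ whose MSE lies within one standard error of that minimum.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(0)
n, p = 500, 99                      # 500 "years", 99 candidate predictors
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([0.8, -0.5, 0.3]) + rng.standard_normal(n)

K = 10
fit = LassoCV(cv=K).fit(X, y)

# Mean cross-validated MSE and its standard error at each candidate lambda;
# mse_path_ has shape (n_alphas, n_folds) and alphas_ is sorted decreasing.
mse = fit.mse_path_.mean(axis=1)
se = fit.mse_path_.std(axis=1, ddof=1) / np.sqrt(K)

i_min = int(mse.argmin())
# One-standard-error rule: the largest lambda whose CV MSE stays within one
# standard error of the minimum (first qualifying index, alphas decreasing).
i_1se = int(np.where(mse <= mse[i_min] + se[i_min])[0][0])
lam_min, lam_1se = fit.alphas_[i_min], fit.alphas_[i_1se]

# Refit at the one-SE lambda to count surviving predictors (the analog of
# the "numbers along the top margin" in Fig. 3).
n_active = int(np.count_nonzero(Lasso(alpha=lam_1se).fit(X, y).coef_))
```

Because the one-SE λ is at least as large as the MSE-minimizing λ, it yields a model with at most as many, and typically fewer, nonzero coefficients.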

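Several captions report NMSE and its standard error SE_NMSE; the exact formulas are defined in the paper’s methods, not reproduced here. A common convention, assumed in this sketch, normalizes the MSE by the variance of the verification series, so 1 − NMSE plays the role of explained variance (consistent with the (R², 1 − NMSE) labels of Fig. 4) and NMSE = 1 corresponds to predicting the climatological mean.

```python
import numpy as np

def nmse(truth, estimate):
    """Normalized mean-square error: MSE of the estimate divided by the
    variance of the truth, so 0 is a perfect prediction and 1 matches a
    constant climatological-mean prediction."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return np.mean((truth - estimate) ** 2) / np.var(truth)

# Worked check on a toy series: a perfect estimate scores 0 and the
# climatological mean scores (numerically) 1.
t = np.array([0.2, -0.5, 1.1, 0.4, -0.8])
assert nmse(t, t) == 0.0
assert abs(nmse(t, np.full_like(t, t.mean())) - 1.0) < 1e-12
```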