1. Introduction
The Earth system is characterized by complex interactions between its oceanic, atmospheric, land surface, and cryospheric components. These interactions can imprint themselves on the variability of key meteorological quantities such as near-surface air temperature (T2M) and precipitation—quantities of great relevance to society. A long-lived (multimonth) tropical sea surface temperature (SST) anomaly, for example, may induce a similarly long-lived rainfall or temperature anomaly above it and perhaps in downwind land regions (Kumar et al. 2013). A large-scale soil moisture anomaly may survive for a month or more, encouraging a similar time scale in the overlying T2M and, to a lesser extent, precipitation (Koster et al. 2011). However, while the slower-changing components of the Earth system have the potential to impart variability and memory to atmospheric quantities, thereby imparting some predictability to these quantities (e.g., Smith et al. 2012; Kushnir et al. 2019), atmospheric variables are also subject to the whims of dynamical chaos—the atmosphere’s unpredictable variability can indeed swamp out the externally forced signals, rendering them irrelevant. Chaos imposes fundamental limits on our ability to predict important meteorological variations, with useful predictability usually decreasing with increasing lead time (Lorenz 1963).
Establishing the contribution of unpredictable chaotic atmospheric variability (hereafter often referred to as “noise”) to the variability of, say, T2M over a specific continental region is a critical first step toward quantifying the ability to predict that region’s T2M at subseasonal-to-seasonal time scales. One way to analyze the climate system’s unpredictable variability is through ensembles of simulations with forecast systems consisting of coupled ocean, sea ice, atmosphere, and land models. An early analysis was performed by Murphy (1990), who demonstrated that an ensemble mean produced a modestly more accurate forecast than did the individual simulations comprising the ensemble. Siegert et al. (2016) provided a Bayesian framework for characterizing, in the context of an ensemble forecasting system, the strength of a forced signal in the presence of noise, using it specifically to analyze the predictability of the North Atlantic Oscillation (NAO). Eade et al. (2014) introduced the “ratio of predictable components” (RPC) metric as a means of characterizing, through analysis of an ensemble of coupled model simulations, the strength of the forced response in the presence of unpredictable variability. Using this framework, they found in their model a weak amplitude of the predictable signal relative to the unpredictable noise—paradoxically, the model could then better predict an observed time series than it could predict its own ensemble members. This paradox was further illuminated by Scaife and Smith (2018) and examined using a Markov model by Zhang and Kirtman (2019).
So-called “AMIP style” (Gates 1992) atmospheric model simulations—simulations in which the atmospheric model runs uncoupled from the ocean model, instead responding to prescribed, time-varying fields of SST and sea ice—are also subject to chaotic atmospheric dynamics (e.g., Scaife et al. 2009). For examining the specific question of how a given time series of SST fields imparts a signature on the evolution of Earth’s atmospheric fields, the AMIP-style simulation framework has a unique advantage. An ensemble of parallel AMIP-style simulations, each simulation using the same set of time-varying SST boundary conditions, will provide (after a suitable spinup period) multiple possible realizations of multidecadal weather consistent with those SSTs. Averaging across the ensemble members filters out much of the model’s atmospheric noise, allowing the isolation of the underlying SST-forced signal.
There are, however, dangers with reading too much into AMIP-style analyses. For example, using such an approach, Rodwell et al. (1999) and Mehta et al. (2000) demonstrated a more accurate reproduction of the behavior of the NAO when an ensemble of SST-forced simulations was averaged together, ostensibly suggesting—given the predictability of SSTs at longer time scales—that long-term predictions of the NAO might be possible. In remarking on these two studies, however, Bretherton and Battisti (2000) point out that in the real world, SSTs themselves respond to ocean–atmosphere interactions (a fully coupled seasonal forecast simulation, given imperfections in the modeled atmospheric and oceanic components and given chaotic dynamics, would likely not reproduce the SST time series prescribed in the AMIP-style simulations), reducing the relevance of the two studies to multimonth predictability. True predictability involves coupled ocean–atmosphere behavior, such as that examined in later studies of surface NAO predictability in coupled forecast systems (Scaife et al. 2014; Athanasiadis et al. 2017).
This said, SSTs do persist from month to month. Because the ocean has substantially more thermal inertia than the atmosphere, most persistent signals in the atmosphere at, say, the monthly time scale are likely to come from the ocean, giving AMIP-style analyses some relevance for predictability studies. For the specific question of how a given time series of SST fields guides the atmosphere in the presence of atmospheric noise, without added interpretations regarding long-term prediction, an ensemble of AMIP-style simulations can provide useful insight. Note, in addition, that there generally appears to be little difference in the SST-forced responses in coupled and uncoupled climate model simulations (Chen and Schneider 2014) although some regions, including the Indian Ocean, do show differences (Copsey et al. 2006).
Accordingly, in this paper, we examine a large ensemble (45 members) of multidecadal (1981–2014) AMIP-style simulations to quantify a model’s chaotic background variability. While this part of the study is reminiscent of past predictability studies, we then supplement the analysis with data from other AMIP-style ensembles (smaller ensembles, produced by different AGCMs) to quantify similarities in their chaotic variability as well as the agreement in their underlying ocean–land teleconnections. This latter comparison, used in conjunction with observational T2M data, allows a heretofore unexplored means of estimating predictability levels in the real world. Through this novel approach, we not only characterize and compare the predictability characteristics of the models examined but also, in effect, provide new estimates of this otherwise unmeasurable real-world property.
Our modeling and data sources are provided in section 2, and our methodologies for teasing out unpredictable variability and teleconnection metrics from the model results and observations are provided in section 3. Results are presented in section 4, followed by discussion and conclusions in sections 5 and 6.
2. Data and modeling systems
a. GEOS modeling system
The analyses in this paper focus largely on a 45-member ensemble of AMIP-style simulations with the National Aeronautics and Space Administration (NASA) Goddard Earth Observing System (GEOS) atmospheric general circulation model (AGCM; Molod et al. 2015). The version of the AGCM used here is essentially the same as that underlying the NASA Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), reanalysis (Gelaro et al. 2017); it is run on a cubed-sphere grid and includes state-of-the-art representations of a host of atmospheric and land surface processes. Two important differences, however, should be noted between the AGCM used here and that underlying MERRA-2. First, we run the AGCM here at a coarser resolution (1° × 1°, rather than the ∼1/2° × 1/2° resolution underlying MERRA-2). Second, we use a tendency bias correction (TBC) approach to mitigate some of the model’s long-term biases (Chang et al. 2019). We apply diurnally varying and seasonally varying TBC corrections derived from the increments (forecast minus analysis) computed during the long-term MERRA-2 reanalysis itself; because the corrections applied are climatological, containing no prescribed interannual variability, the TBC-corrected AGCM remains a fully free-running global atmospheric model.
Each simulation in the 45-member ensemble covers the period 1981–2014, a period chosen for consistency with the other datasets used in our analysis. The simulations follow AMIP protocols, being forced by prescribed, observations-based fields of sea surface temperature and sea ice that vary daily and annually. For the present analysis, we process the simulated near-surface air temperature (T2M) values into monthly averages, our aim being to understand the sources of monthly scale T2M variability over continental areas.
Detrending the data was found to be necessary to isolate the signals born of interannual SST variability from those associated with long-term SST trends. Trends were removed on a monthly basis: at each grid cell for a given month, the ensemble-mean monthly T2M was computed for each year in 1981–2014, and the resulting time series was regressed against the year index. Then, for a given year, the T2M value computed with the regressed relationship was subtracted from each ensemble member’s monthly T2M.
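The per-month linear detrending can be sketched as follows. This is a minimal illustration for a single grid cell and calendar month; in the actual procedure the trend is fit to the ensemble-mean series and then subtracted from every ensemble member, and the example series here is invented.

```python
import numpy as np

def detrend_monthly(series, years):
    """Remove a linear trend from a one-value-per-year series for a
    fixed calendar month and grid cell.  The trend line is obtained by
    regressing the series against the year index."""
    slope, intercept = np.polyfit(years, series, 1)  # regress on year index
    return series - (slope * years + intercept)

years = np.arange(1981, 2015)                 # 34 years, as in the study
warming = 280.0 + 0.02 * (years - 1981)       # purely linear example series
anomalies = detrend_monthly(warming, years)   # trend removed -> ~0 everywhere
```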
b. Observed monthly temperature data
Gridded daily temperature (T2M) data at 0.5° × 0.5° resolution are available from the Climate Prediction Center (CPC; https://www.esrl.noaa.gov/psd/data/gridded/data.cpc.globaltemp.html). The raw source of the data is station-based measurements of minimum daily temperature Tmin and maximum daily temperature Tmax at the 2-m height; these point measurements were spatially interpolated by CPC onto the 0.5° × 0.5° grid. We converted the gridded Tmin and Tmax values into daily T2M using T2M = (Tmin + Tmax)/2, and we then spatially and temporally averaged the daily temperatures into monthly averages on the atmospheric model’s 1° × 1° grid. Finally, the T2M values for the period 1981–2014 were detrended on a monthly basis using the approach outlined above for the GEOS data, though computing the trend, of course, from the single time series of T2M values for a given month rather than from an ensemble mean.
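The conversion and averaging steps can be sketched as follows. The factor-of-2 block averaging from the 0.5° grid to the 1° grid is an assumption about the spatial-averaging step, not the exact CPC or GEOS processing, and the array shapes are invented for illustration.

```python
import numpy as np

def monthly_t2m_coarse(tmin, tmax, factor=2):
    """Daily T2M approximated as the midpoint of the daily extremes,
    averaged over the days of a month and block-averaged onto a
    coarser grid.  tmin, tmax: (n_days, n_lat, n_lon) arrays."""
    t2m_daily = 0.5 * (tmin + tmax)           # T2M = (Tmin + Tmax)/2
    monthly = t2m_daily.mean(axis=0)          # time average -> (n_lat, n_lon)
    nlat, nlon = monthly.shape
    return monthly.reshape(nlat // factor, factor,
                           nlon // factor, factor).mean(axis=(1, 3))

tmin = np.full((30, 4, 4), 10.0)              # 30 days on a tiny 0.5-deg grid
tmax = np.full((30, 4, 4), 20.0)
coarse = monthly_t2m_coarse(tmin, tmax)       # (2, 2) field, all 15.0
```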
We should note that using the mean of the minimum and maximum observed temperatures to approximate the daily T2M, while a common practice, can result in T2M values that are biased high (Bernhardt et al. 2018). This is an unavoidable limitation of the available observational data. Nevertheless, for our analysis framework, the relevant information in the observed T2M values lies not in their absolute magnitudes but in their time variability (specifically, their time correlation with other fields at the monthly scale), which arguably is minimally affected by this bias.
c. Supplemental AMIP experiments using independent models
For our analysis, we require estimates of T2M variations from a number of additional atmospheric general circulation models. For this, we utilize AMIP-style simulations produced by five other AGCMs as part of their contribution to the CMIP6 project (Eyring et al. 2016). See Table 1 for a list of these models and suitable references. While some models produced data over longer time periods, each model produced monthly T2M data for the 1981–2014 period with at least 10 ensemble members. To simplify the interpretation of our results, we use only 10 ensemble members from each AGCM.
Table 1. Modeling systems providing monthly T2M data. Each model performed AMIP-style simulations as part of its contribution to the CMIP6 project; for each model, data from 10 ensemble members covering the period 1981–2014 were extracted for this study.
For the present work, for each of the five models, we regridded the T2M data for a given month to the GEOS model’s 1° × 1° grid using a nearest-neighbor approach. The 1° × 1° data for each model were then detrended independently on a monthly basis using the approach outlined above for the GEOS model’s data. One difference between these CMIP6 simulations and the GEOS simulations is the use in the former of time-varying constituent forcing—a forcing that, like SSTs, could impart a signal to the generated monthly T2M fields. We presume this impact to be small, even negligible, given that the applied detrending would largely remove the impacts of trends in the imposed constituents. Still, the lack of such time variations in the GEOS simulations should be interpreted as an additional source of error for the GEOS time series.
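The nearest-neighbor regridding step can be sketched as follows. This is an illustrative implementation for regular latitude–longitude grids; details such as longitude wraparound handling are omitted, and the grid definitions are assumptions.

```python
import numpy as np

def nearest_neighbor_regrid(field, src_lats, src_lons, dst_lats, dst_lons):
    """For each destination grid cell, take the value at the nearest
    source grid point (nearest in latitude and longitude separately,
    which suffices for regular lat-lon grids)."""
    ilat = np.abs(src_lats[:, None] - dst_lats[None, :]).argmin(axis=0)
    ilon = np.abs(src_lons[:, None] - dst_lons[None, :]).argmin(axis=0)
    return field[np.ix_(ilat, ilon)]

src_lats = np.arange(-89.75, 90, 0.5)         # illustrative 0.5-deg centers
src_lons = np.arange(0.25, 360, 0.5)
dst_lats = np.arange(-89.5, 90, 1.0)          # 1-deg target centers
dst_lons = np.arange(0.5, 360, 1.0)
field = np.add.outer(src_lats, np.zeros_like(src_lons))  # value = latitude
regridded = nearest_neighbor_regrid(field, src_lats, src_lons,
                                    dst_lats, dst_lons)
```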
3. Analysis framework
a. Relative contributions of boundary forcing and unpredictable variability
The quantity
The new estimates of
We must emphasize here that in this analysis, we will not be evaluating the absolute magnitudes of the boundary-forced variance or of the variance induced by internal variability. An excessive ρ2 for a model, for example, could result from either an underestimate of the internal variability or an overestimate of the impact of the signal. Here, we are interested only in whether the model captures the correct fraction of the total variance induced by boundary forcing as opposed to noise, a critical quantity in itself.
This overall framework—particularly the part associated with estimating model values of
b. Details regarding ensemble sampling and correlation calculations
In some of our analyses, we will be examining correlations as a function of ensemble size. For a given ensemble size K in the determination of Corr2(Yo, Ym), we randomly select K members (without replacement) from the available members (45 available members for GEOS, 10 for the other models) and call this our reduced ensemble; after computing the desired correlation with this reduced set, we repeat the process 99 times. The correlation value for ensemble size K is the average over the 100 computed correlations.
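The subsampling procedure can be sketched as follows, with synthetic data standing in for the monthly T2M series; the member count, noise level, and series length are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_sq_corr(obs, members, K, n_draws=100):
    """Average squared correlation between a reference series and a
    K-member ensemble mean, over n_draws random subsamples drawn
    without replacement from the available members."""
    total = 0.0
    for _ in range(n_draws):
        pick = rng.choice(members.shape[0], size=K, replace=False)
        r = np.corrcoef(obs, members[pick].mean(axis=0))[0, 1]
        total += r ** 2
    return total / n_draws

# Synthetic example: a common signal plus independent member noise
signal = rng.standard_normal(102)                        # 102 monthly samples
members = signal + 0.7 * rng.standard_normal((45, 102))  # 45-member ensemble
small = mean_sq_corr(signal, members, K=2)
large = mean_sq_corr(signal, members, K=20)   # larger K averages out noise
```

A larger subsample size leaves less noise in the ensemble mean, so the averaged squared correlation increases with K, which is the behavior the dot curves in Fig. 1 trace out.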
For the model-based Corr2(Ymn, Ym) calculations, we consider five individual ensemble members in turn as Ymn in (3). Note that Ym in the equation accordingly differs each time, since each choice of Ymn defines a different set of remaining ensemble members. The five resulting correlations for a given ensemble size, each already representing an average over 100 computed values, are then averaged into the final result for that size. The use of five ensemble members (rather than 45 or 10) as Ymn here was considered a reasonable and tractable compromise given the need to perform at least some averaging in the face of the tremendous computational burden associated with each calculation.
Each individual correlation calculation is based on 102 sample pairs: 3 monthly averages per year for 34 years (1981–2014). As a consequence of the detrending described earlier, climatological monthly averages are removed from each value before computing the correlations so as not to capture trivial correlations associated with seasonality.
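The removal of climatological monthly means before correlating can be sketched as follows (illustrative shapes for a 3-month season over 34 years; the random data simply stand in for two monthly series to be correlated).

```python
import numpy as np

def seasonal_anomaly_pairs(x, y, n_years=34, n_months=3):
    """Reshape two series of seasonal monthly means to (year, month),
    remove each calendar month's climatological mean, and return the
    flattened anomaly pairs used in one correlation calculation."""
    xa = x.reshape(n_years, n_months)
    ya = y.reshape(n_years, n_months)
    xa = xa - xa.mean(axis=0)          # subtract each month's climatology
    ya = ya - ya.mean(axis=0)
    return xa.ravel(), ya.ravel()      # 102 sample pairs for 34 seasons

rng = np.random.default_rng(2)
x, y = rng.standard_normal((2, 102))
xa, ya = seasonal_anomaly_pairs(x, y)
```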
4. Results
a. Quantification of unpredictable variability within the GEOS model
Our analysis focuses first on the GEOS model rather than on the CMIP models. Given its much larger ensemble size, we expect the characterization of noise for GEOS to be the most robust.
Figure 1 shows, with blue dots, the degree to which the GEOS model’s ensemble mean captures the monthly T2M time series produced by a single ensemble member [i.e., Corr2(Ymn, Ym)] as a function of ensemble size for a grid cell in North America. We use (3) to produce the blue curve through the dots, with
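The asymptote-fitting idea can be illustrated with a simple signal-plus-noise model. If a fraction ρ² of each member's monthly variance is boundary-forced, the expected squared correlation between an n-member ensemble mean and an independent member is ρ⁴n/(nρ² + 1 − ρ²), which rises toward the asymptote ρ² as n grows. This functional form is the classic potential-predictability result and may differ in detail from the paper's (3), which is not reproduced in this excerpt; the sketch below simply shows how fitting such a curve to the dots recovers the asymptote.

```python
import numpy as np

def corr2_vs_n(n, rho2):
    """Expected squared correlation between an n-member ensemble mean
    and an independent member when a fraction rho2 of each member's
    variance is boundary-forced (asymptote as n -> infinity is rho2)."""
    return rho2 ** 2 * n / (n * rho2 + 1.0 - rho2)

# Synthetic, noise-free "blue dots" generated from a known asymptote
n = np.arange(1, 45, dtype=float)
dots = corr2_vs_n(n, 0.4)

# A coarse grid search stands in for a proper least-squares fit
candidates = np.linspace(0.01, 0.99, 981)
sse = [np.sum((corr2_vs_n(n, r) - dots) ** 2) for r in candidates]
rho2_hat = candidates[int(np.argmin(sse))]   # recovers the 0.4 asymptote
```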
Fig. 1. Representative examples of how key model relationships vary with ensemble size, focusing on the simulation of monthly mean air temperature, T2M. Blue dots: Variation of Corr2(Ymn, Ym) with ensemble size, where Corr2(Ymn, Ym) characterizes the ability of the ensemble mean to capture the variability produced by a single ensemble member. Red dots: Variation of Corr2(Yo, Ym) with ensemble size, where Corr2(Yo, Ym) characterizes the ability of the ensemble mean to capture the variability seen in the observations. The lines through the dots are determined through a fitting procedure, which provides as a matter of course the indicated asymptotes, shown as dashed lines (see text). Results shown are for JJA.
Citation: Journal of Climate 38, 6; 10.1175/JCLI-D-23-0740.1
It is important to emphasize that our procedure does not simply equate
Note that the results in Fig. 1 are representative; all of the grid cells we examined show similarly strong (often even better) fits through their corresponding blue dots, though we caution that this success may be specific to the particular variable, monthly T2M, that we are examining here. Such success in the fitting provides confidence that the analysis framework represented by (1)–(4) properly characterizes the relative contributions of signal and noise in the modeling system. In Fig. 2, we show global maps of
Fig. 2. Derived global distribution of
Note that in these maps, any colored shading indicates at least 99% confidence that the null hypothesis of no underlying signal can be rejected. The value of 0.025 delimiting the lowest colored shading was determined from a Monte Carlo analysis in which appropriately sized ensembles of randomly generated time series of normally distributed numbers were processed in precisely the same manner as the AGCM data to quantify the statistics associated with that null hypothesis. The same Monte Carlo analysis indicates 99.9% confidence that the null hypothesis can be rejected for any value above 0.037.
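The Monte Carlo test can be sketched as follows: pure-noise ensembles are generated and a no-signal statistic is tabulated to obtain a rejection threshold. The statistic below (the squared correlation between one member and the mean of the others) is a simplified stand-in, since the paper processes the random ensembles through its full fitting pipeline, which is not reproduced here; trial counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def null_corr2_dist(n_members=45, n_time=102, n_trials=1000):
    """Distribution of the squared correlation between one random
    member and the mean of the remaining members when no underlying
    signal exists at all (every member is independent noise)."""
    out = np.empty(n_trials)
    for i in range(n_trials):
        ens = rng.standard_normal((n_members, n_time))
        out[i] = np.corrcoef(ens[0], ens[1:].mean(axis=0))[0, 1] ** 2
    return out

dist = null_corr2_dist()
threshold_99 = np.quantile(dist, 0.99)   # shading above this rejects the null
```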
b. Comparison of the GEOS model’s noise and teleconnection characteristics with those in other models
Now that the equations and fitting approach have been demonstrated for the GEOS model, we can take a look at the CMIP AGCMs listed in Table 1 to shed some light on how different models compare in regard to boundary forcing and noise levels. Because only 10 ensemble members are used for each of the CMIP models, the estimates obtained for them are presumably more uncertain than those for GEOS; still, enough data are available for some useful joint analyses. Indeed, as a check, we estimated
For each of the five CMIP models, curve fitting along the lines of that shown in Fig. 1 (blue curve) was performed to determine spatial distributions of
Fig. 3. Spatial distribution of
Corr2(Bm, Bq), the degree to which two different models m and q agree with each other on the boundary-forced signal at a given location once the impact of noise is removed, is computed with (10). Figures 4a–e show Corr2(Bm, Bq) between the GEOS model and each of the five CMIP models. We focus here on the GEOS model’s correlations with the CMIP models rather than on correlations among the different CMIP models because the size of the GEOS ensemble presumably lends some extra robustness to the estimates. Also note that, theoretically, while strong agreement in the boundary-forced signal is possible in the presence of substantial model noise, the presence of noise can make quantifying that agreement difficult. Indeed, with
Fig. 4. Spatial distribution of Corr2(Bm, Bq), the degree to which the different CMIP models agree with the GEOS model on the temporal variations in the boundary-forced signal. (a) Corr2(BGEOS, BMIROC6). (b) Corr2(BGEOS, BCESM2). (c) Corr2(BGEOS, BNorCPM1). (d) Corr2(BGEOS, BACCESS). (e) Corr2(BGEOS, BIPSL). (f) Arithmetic mean of the Corr2(Bm, Bq) values. Values are considered undefined (and shown as white) if the internal variability of either model involved in a calculation is overwhelmingly high (i.e., if
Figures 4a–e indicate that the teleconnections inherent in the GEOS system generally agree with those of the CMIP models in the Americas and along the northern coast of Australia, with squared correlations often exceeding 0.7. Agreement is weaker in Africa (though still in the neighborhood of 0.5) and is particularly weak in southern Asia. Excessive internal variability prevents a determination of the agreement in Europe and most of the rest of Asia. Figure 4f encapsulates these results by providing the arithmetic mean of the five Corr2(Bm, Bq) fields, with undefined values excluded from the mean calculations. To our knowledge, Fig. 4 represents a unique, first-of-its-kind comparison of teleconnection behavior between models. The indicated degree of disagreement underlines some uncertainty in this quantity; it indeed suggests that models could be improved in this regard, which could lead to substantial gains in long-range prediction.
Again, results for other seasons are provided in the supplemental material (section S4). For these other seasons, we find that, as with MAM, the
c. Quantification of maximum realizable skill with the GEOS model
The red dots in Fig. 1 show the square of the correlation, as a function of ensemble size, between the ensemble mean T2M from GEOS and the observed monthly value (section 2b) at the sample grid cell. Here, having already obtained the model’s
Figure 5 shows, as a function of season, the global distribution of the
Fig. 5. Derived global distribution of
Of course, an infinite number of ensemble members is hardly realizable, limiting the practical usefulness of Fig. 5. Note, however, that having quantified both
Fig. 6. (a) Square of the correlation [as determined with (12)] between the observed and ensemble mean monthly T2M for JJA, assuming a five-member ensemble. (b) Increase in this skill metric when the ensemble size is increased from 5 to 20 members. (c) Increase in the skill metric when the ensemble size is increased from 20 to 50 members. (d) Increase in the skill metric when the ensemble size is increased from 50 to 100 members.
As expected, increasing the ensemble size has little impact in areas like northeastern Asia, which has little maximum realizable skill in the first place (Fig. 5c). Increasing the ensemble size also has little impact in the tropics, since most of the skill is already achieved with the small ensemble size of 5 (cf. Figs. 5c and 6a, noting the differences in the color bars). In contrast, going to 20 ensemble members has a significant impact on skill in many other areas, including the Sahel and much of Canada. Based on the more limited skill increases in going from 20 to 50 members (Fig. 6c), one might conclude that 20 members is adequate for capturing the SST-forced signal in monthly T2M. A section of northern Eurasia (just east of Scandinavia), however, continues to show a benefit in skill as the ensemble size is increased, even in going from 50 to 100 members (Fig. 6d). This appears to be consistent with the large ensemble size needed to provide skillful predictions of the winter NAO (Scaife et al. 2014, their Fig. 3). Of course, the skill obtained there with even 1000 members would still be suboptimal due to the imperfect reproduction of teleconnections in the model.
d. Implications for quantifying the signal-to-total ratio present in nature
The quantity
Fig. 7. Spatial distribution of
Of course, one of the reasons using
We reemphasize here an important point: the quantities Corr2(Bo, Bm) and
We present in Fig. 8 estimates of
Fig. 8. Estimates of the degree to which SST boundary forcing controls the time variability of T2M in the real world (as measured with
Where the
Naturally all of these
Fig. 9. Qualitative indication of the uncertainty associated with estimating
5. Discussion
The AMIP-style ensembles examined here focus only on SST boundary forcing and do not address, for example, the potential impact of land boundary forcing on meteorological signals. Presumably, if realistic yearly varying soil moisture and/or vegetation conditions were prescribed along with the realistic SSTs in modified versions of the AMIP-style ensembles, the computed values of
It is worth mentioning again that the
In addition to all of the caveats already discussed, another involves the reliability of the observed T2M data. While the estimation of
Fig. 10. For the 0.5° × 0.5° grid cells containing at least one T2M measurement station (considering here the dimensions of the raw CPC data arrays), the fraction of the 408 months during 1981–2014 for which a monthly T2M value can be computed based on at least some submonthly measurements.
Having noted this particular caveat about the temperature record, we present in Fig. 11 the differences between Figs. 7 and 8—the differences between the
Fig. 11. Differences between the
As for the extratropics, several studies (e.g., Eade et al. 2014; Scaife and Smith 2018; Zhang and Kirtman 2019; Cottrell et al. 2024) have identified behavior corresponding to a “signal to noise paradox,” the fact that a model’s ensemble mean can often capture the observed variations of a quantity better than it can capture the variations simulated by a single ensemble member. In the context of the mathematical framework utilized here, the signal-to-noise paradox is consistent with
Finally, note that we have attempted in this work to minimize the use of the terms “prediction,” “predictability,” and “potential predictability.” The terms
Even in light of all these caveats, the framework provided herein does have unique value. In addition to allowing us to quantify an AGCM’s internal SST-forced signal and its diminishment by chaotic atmospheric noise, it has provided a means of extending this information, through a novel approach, into new observational estimates of
6. Conclusions
A mathematical framework is provided for the processing of an ensemble of AMIP-style atmospheric model simulations into estimates of
Overall, the results presented herein serve to demonstrate how the framework could be applied to evaluate unpredictable T2M variability within any modeling system. Indeed, in principle, the framework can be used to examine the SST impacts on any meteorological variable (e.g., precipitation) if measurements of the variable spanning decades are available for analysis.
Acknowledgments.
This work was supported by the NASA Modeling, Analysis, and Prediction (MAP) Program (NNH20ZDA001N) and by the National Climate Assessment Enabling Tools project (WBS281945.02.03.05.13) at NASA’s Global Modeling and Assimilation Office (GMAO). Computational resources supporting this work were provided by the NASA High-End Computing (HEC) Program through the NASA Center for Climate Simulation (NCCS) at GSFC. We thank Wei Shi for help with the observed T2M data, and we thank the modeling centers that made the CMIP6 data used in our analysis available.
Data availability statement.
CMIP6 model simulation data are available from https://esgf-node.llnl.gov/search/cmip6/. Output from the GEOS AGCM simulations can be made available upon request. The gridded daily temperature (T2M) data are available from the Climate Prediction Center (CPC; https://www.esrl.noaa.gov/psd/data/gridded/data.cpc.globaltemp.html).
REFERENCES
Athanasiadis, P. J., and Coauthors, 2017: A multisystem view of wintertime NAO seasonal predictions. J. Climate, 30, 1461–1475, https://doi.org/10.1175/JCLI-D-16-0153.1.
Bernhardt, J., A. M. Carleton, and C. LaMagna, 2018: A comparison of daily temperature-averaging methods: Spatial variability and recent change for the CONUS. J. Climate, 31, 979–996, https://doi.org/10.1175/JCLI-D-17-0089.1.
Bethke, I., and Coauthors, 2021: NorCPM1 and its contribution to CMIP6 DCPP. Geosci. Model Dev., 14, 7073–7116, https://doi.org/10.5194/gmd-14-7073-2021.
Boucher, O., and Coauthors, 2020: Presentation and evaluation of the IPSL‐CM6A‐LR climate model. J. Adv. Model. Earth Syst., 12, e2019MS002010, https://doi.org/10.1029/2019MS002010.
Bretherton, C. S., and D. S. Battisti, 2000: An interpretation of the results from atmospheric general circulation models forced by the time history of the observed sea surface temperature distribution. Geophys. Res. Lett., 27, 767–770, https://doi.org/10.1029/1999GL010910.
Chang, Y., S. D. Schubert, R. D. Koster, A. M. Molod, and H. Wang, 2019: Tendency bias correction in coupled and uncoupled global climate models with a focus on impacts over North America. J. Climate, 32, 639–661, https://doi.org/10.1175/JCLI-D-18-0598.1.
Chen, H., and E. K. Schneider, 2014: Comparison of the SST-forced responses between coupled and uncoupled climate simulations. J. Climate, 27, 740–756, https://doi.org/10.1175/JCLI-D-13-00092.1.
Copsey, D., R. Sutton, and J. R. Knight, 2006: Recent trends in sea level pressure in the Indian Ocean region. Geophys. Res. Lett., 33, L19712, https://doi.org/10.1029/2006GL027175.
Cottrell, F. M., J. A. Screen, and A. A. Scaife, 2024: Signal-to-noise errors in free-running atmospheric simulations and their dependence on model resolution. Atmos. Sci. Lett., 25, e1212, https://doi.org/10.1002/asl.1212.
Danabasoglu, G., and Coauthors, 2020: The Community Earth System Model Version 2 (CESM2). J. Adv. Model. Earth Syst., 12, e2019MS001916, https://doi.org/10.1029/2019MS001916.
Eade, R., D. Smith, A. Scaife, E. Wallace, N. Dunstone, L. Hermanson, and N. Robinson, 2014: Do seasonal-to-decadal climate predictions underestimate the predictability of the real world? Geophys. Res. Lett., 41, 5620–5628, https://doi.org/10.1002/2014GL061146.
Eyring, V., S. Bony, G. A. Meehl, C. A. Senior, B. Stevens, R. J. Stouffer, and K. E. Taylor, 2016: Overview of the Coupled Model Intercomparison Project phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev., 9, 1937–1958, https://doi.org/10.5194/gmd-9-1937-2016.
Gates, W. L., 1992: An AMS continuing series: Global change—AMIP: The Atmospheric Model Intercomparison Project. Bull. Amer. Meteor. Soc., 73, 1962–1970, https://doi.org/10.1175/1520-0477(1992)073<1962:ATAMIP>2.0.CO;2.
Gelaro, R., and Coauthors, 2017: The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2). J. Climate, 30, 5419–5454, https://doi.org/10.1175/JCLI-D-16-0758.1.
Koenigk, T., and U. Mikolajewicz, 2009: Seasonal to interannual climate predictability in mid and high northern latitudes in a global coupled model. Climate Dyn., 32, 783–798, https://doi.org/10.1007/s00382-008-0419-1.
Koster, R. D., M. J. Suarez, and M. Heiser, 2000: Variance and predictability of precipitation at seasonal-to-interannual timescales. J. Hydrometeor., 1, 26–46, https://doi.org/10.1175/1525-7541(2000)001<0026:VAPOPA>2.0.CO;2.
Koster, R. D., and Coauthors, 2004: Regions of strong coupling between soil moisture and precipitation. Science, 305, 1138–1140, https://doi.org/10.1126/science.1100217.
Koster, R. D., and Coauthors, 2011: The second phase of the global land–atmosphere coupling experiment: Soil moisture contributions to subseasonal forecast skill. J. Hydrometeor., 12, 805–822, https://doi.org/10.1175/2011JHM1365.1.
Kumar, A., M. Chen, and W. Wang, 2013: Understanding prediction skill of seasonal mean precipitation over the tropics. J. Climate, 26, 5674–5681, https://doi.org/10.1175/JCLI-D-12-00731.1.
Kushnir, Y., and Coauthors, 2019: Towards operational predictions of the near-term climate. Nat. Climate Change, 9, 94–101, https://doi.org/10.1038/s41558-018-0359-7.
Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
Mehta, V. M., M. J. Suarez, J. V. Manganello, and T. L. Delworth, 2000: Oceanic influence on the North Atlantic Oscillation and associated Northern Hemisphere climate variations: 1959–1993. Geophys. Res. Lett., 27, 121–124, https://doi.org/10.1029/1999GL002381.
Molod, A., L. Takacs, M. Suarez, and J. Bacmeister, 2015: Development of the GEOS-5 atmospheric general circulation model: Evolution from MERRA to MERRA2. Geosci. Model Dev., 8, 1339–1356, https://doi.org/10.5194/gmd-8-1339-2015.
Murphy, J. M., 1990: Assessment of the practical utility of extended range ensemble forecasts. Quart. J. Roy. Meteor. Soc., 116, 89–125, https://doi.org/10.1002/qj.49711649105.
Rodwell, M. J., D. P. Rowell, and C. K. Folland, 1999: Oceanic forcing of the wintertime North Atlantic Oscillation and European climate. Nature, 398, 320–323, https://doi.org/10.1038/18648.
Scaife, A. A., and D. Smith, 2018: A signal-to-noise paradox in climate science. npj Climate Atmos. Sci., 1, 28, https://doi.org/10.1038/s41612-018-0038-4.
Scaife, A. A., and Coauthors, 2009: The CLIVAR C20C project: Selected twentieth century climate events. Climate Dyn., 33, 603–614, https://doi.org/10.1007/s00382-008-0451-1.
Scaife, A. A., and Coauthors, 2014: Skillful long-range prediction of European and North American winters. Geophys. Res. Lett., 41, 2514–2519, https://doi.org/10.1002/2014GL059637.
Scaife, A. A., and Coauthors, 2017: Tropical rainfall, Rossby waves and regional winter climate predictions. Quart. J. Roy. Meteor. Soc., 143 (702), 1–11, https://doi.org/10.1002/qj.2910.
Siegert, S., D. B. Stephenson, P. G. Sansom, A. A. Scaife, R. Eade, and A. Arribas, 2016: A Bayesian framework for verification and recalibration of ensemble forecasts: How uncertain is NAO predictability? J. Climate, 29, 995–1012, https://doi.org/10.1175/JCLI-D-15-0196.1.
Smith, D. M., A. A. Scaife, and B. P. Kirtman, 2012: What is the current state of scientific knowledge with regard to seasonal and decadal forecasting? Environ. Res. Lett., 7, 015602, https://doi.org/10.1088/1748-9326/7/1/015602.
Tatebe, H., and Coauthors, 2019: Description and basic evaluation of simulated mean state, internal variability, and climate sensitivity in MIROC6. Geosci. Model Dev., 12, 2727–2765, https://doi.org/10.5194/gmd-12-2727-2019.
Zhang, W., and B. Kirtman, 2019: Understanding the signal-to-noise paradox with a simple Markov model. Geophys. Res. Lett., 46, 13 308–13 317, https://doi.org/10.1029/2019GL085159.
Ziehn, T., and Coauthors, 2020: The Australian Earth System Model: ACCESS-ESM1.5. J. South. Hemisphere Earth Syst. Sci., 70, 193–214, https://doi.org/10.1071/ES19035.