Robust Anthropogenic Signal Identified in the Seasonal Cycle of Tropospheric Temperature

Benjamin D. Santer,a,b Stephen Po-Chedley,a Nicole Feldl,c John C. Fyfe,d Qiang Fu,e Susan Solomon,f Mark England,c Keith B. Rodgers,g,h Malte F. Stuecker,i Carl Mears,j Cheng-Zhi Zou,k Céline J. W. Bonfils,a Giuliana Pallotta,a Mark D. Zelinka,a Nan Rosenbloom,l and Jim Edwardsl

a Program for Climate Model Diagnosis and Intercomparison, Lawrence Livermore National Laboratory, Livermore, California
b Joint Institute for Regional Earth System Science and Engineering, University of California at Los Angeles, Los Angeles, California
c Department of Earth and Planetary Sciences, University of California at Santa Cruz, Santa Cruz, California
d Canadian Centre for Climate Modelling and Analysis, Environment and Climate Change Canada, Victoria, British Columbia, Canada
e Department of Atmospheric Sciences, University of Washington, Seattle, Washington
f Massachusetts Institute of Technology, Earth, Atmospheric, and Planetary Sciences, Cambridge, Massachusetts
g Center for Climate Physics, Institute for Basic Science, Busan, South Korea
h Pusan National University, Busan, South Korea
i Department of Oceanography and International Pacific Research Center, School of Ocean and Earth Science and Technology, University of Hawai‘i at Mānoa, Honolulu, Hawaii
j Remote Sensing Systems, Santa Rosa, California
k Center for Satellite Applications and Research, NOAA/NESDIS, Camp Springs, Maryland
l National Center for Atmospheric Research, Boulder, Colorado


Abstract

Previous work identified an anthropogenic fingerprint pattern in TAC(x, t), the amplitude of the seasonal cycle of mid- to upper-tropospheric temperature (TMT), but did not explicitly consider whether fingerprint identification in satellite TAC(x, t) data could have been influenced by real-world multidecadal internal variability (MIV). We address this question here using large ensembles (LEs) performed with five climate models. LEs provide many different sequences of internal variability noise superimposed on an underlying forced signal. Despite differences in historical external forcings, climate sensitivity, and MIV properties of the five models, their TAC(x, t) fingerprints are similar and statistically identifiable in 239 of the 240 LE realizations of historical climate change. Comparing simulated and observed variability spectra reveals that consistent fingerprint identification is unlikely to be biased by model underestimates of observed MIV. Even in the presence of large (factor of 3–4) intermodel and inter-realization differences in the amplitude of MIV, the anthropogenic fingerprints of seasonal cycle changes are robustly identifiable in models and satellite data. This is primarily because the distinctive, global-scale fingerprint patterns are spatially dissimilar to the smaller-scale patterns of internal TAC(x, t) variability associated with the Atlantic multidecadal oscillation and El Niño–Southern Oscillation. The robustness of the seasonal cycle detection and attribution results shown here, taken together with the evidence from idealized aquaplanet simulations, suggests that basic physical processes are dictating a common pattern of forced TAC(x, t) changes in observations and in the five LEs. The key processes involved include GHG-induced expansion of the tropics, lapse-rate changes, land surface drying, and sea ice decrease.

© 2022 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Benjamin D. Santer, bensanter1289@gmail.com


1. Introduction

Detection and attribution (D&A) studies seek to disentangle human and natural influences on Earth’s climate. This research made a significant contribution to the recent finding that human influence on climate is unequivocal (IPCC 2021). Pattern-based “fingerprint” methods are a key element of D&A research (Hasselmann 1979; North et al. 1995; Hegerl et al. 1996; Santer et al. 1996; Tett et al. 1996; Stott et al. 2000; Barnett et al. 2005).

The initial focus of fingerprint research was on changes in annual- or decadal-mean properties of surface temperature (Hegerl et al. 1996; Stott et al. 2000), atmospheric temperature (Santer et al. 1996; Tett et al. 1996; Thorne et al. 2002; Santer et al. 2003), and ocean heat content (Barnett et al. 2005). Examination of the hydrological cycle, cryosphere, and atmospheric circulation followed, targeting surface specific humidity and water vapor (Willett et al. 2007; Santer et al. 2009), rainfall (Zhang et al. 2007; Marvel and Bonfils 2013), salinity (Pierce et al. 2012), sea level pressure (Gillett et al. 2003), and Arctic sea ice (Min et al. 2008). Model-predicted patterns of mean changes in these and many other variables were detectable in observations and attributable to human influences (Santer et al. 1995; Mitchell and Karoly 2001; Hegerl et al. 2007).

After comprehensive interrogation of the causes of historical changes in average climate, the attention of D&A analysts shifted to aspects of climate change that are more directly relevant to societal impacts (Bindoff et al. 2013). Research began to examine extreme rainfall and heat (Min et al. 2009; Stott et al. 2016), the likelihood and severity of individual extreme events (Stott et al. 2004; Risser and Wehner 2017), and the seasonality of precipitation (Marvel et al. 2017) and temperature (Santer et al. 2018; Duan et al. 2019).

It is changes in the amplitude of the seasonal cycle that are of interest here. They have the potential to impact water availability, hydropower production, energy demand, agriculture, fire weather, vector-borne diseases, and many other aspects of society, the economy, and human health. Seasonality also influences animal and plant distributions and abundances (Parmesan and Yohe 2003; Root et al. 2005; Cohen et al. 2018). It is critically important to understand how this seasonal pacemaker may have been modulated by historical changes in anthropogenic forcing—and how seasonality may change over the twenty-first century (Dwyer et al. 2012; Stine and Huybers 2012; Donohoe and Battisti 2013; Qian and Zhang 2015; Yettella and England 2018).

A previous study by Santer et al. (2018) reported that satellite temperature records contained a fingerprint of human-caused changes in TAC(x, t), the amplitude of the annual cycle of mid- to upper-tropospheric temperature (TMT). Related work showed that internal climate variability affected observed annual-mean TMT changes over the satellite era (Kamae et al. 2015; Suárez-Gutiérrez et al. 2017; Po-Chedley et al. 2021). The relationship between changes in annual-mean TMT and changes in TAC(x, t) is unclear. It is conceivable, however, that multidecadal internal variability (MIV) may have influenced the identification of a human fingerprint in satellite TAC(x, t) data.

We explore this possibility here using output from large initial condition ensembles (LEs) performed with five different Earth system models (ESMs; Deser et al. 2012; Fyfe et al. 2017, 2021; Tatebe et al. 2019; Rodgers et al. 2021). In total, these five LEs provide 240 different plausible realizations of historical climate change, each with a unique sequence of internal variability (“noise”) superimposed on the response to anthropogenic and natural external forcing (“signal”). With such information, we can assess how frequently fingerprint detection occurs in model realizations of TAC(x, t). If fingerprint detection is a robust result in the 240 realizations, despite differences in the forcings, climate sensitivity, and MIV properties of the five LEs, it suggests that positive fingerprint detection in real-world TAC(x, t) data is unlikely to be due to the fortuitous phasing of MIV.

Most fingerprint methods rely on model MIV estimates to assess whether the random action of internal variability could explain a “match” between observed climate change patterns and a model-predicted anthropogenic fingerprint. Concerns have been raised about the adequacy of model noise estimates, thus calling into question the reliability of fingerprint results (Curry and Webster 2011; O’Reilly et al. 2021). We address such concerns here by comparing simulated and observed spectra for three key modes of MIV: the Atlantic multidecadal oscillation (AMO), El Niño–Southern Oscillation (ENSO), and interdecadal Pacific oscillation (IPO).

We use information from these spectra as the basis for a number of sensitivity studies. These studies explore whether the positive identification of annual cycle fingerprints in observations and model simulations is robust to large model differences in the amplitude of specific modes of internal variability. A further sensitivity study considers whether fingerprint identification is hampered by removing all information regarding global-mean TAC(x, t) changes.

In addition to assessing the robustness of our fingerprint detection results for annual cycle changes, we also seek to improve understanding of the physical mechanisms driving these changes. Some insights are provided by novel aquaplanet simulations with realistic, seasonally varying insolation (Feldl et al. 2017). These experiments were performed under preindustrial and quadrupled CO2 conditions with two climate models, each with a different representation of the effects of sea ice on high-latitude climate processes. We compare the two sets of aquaplanet experiments with conventional (land + ocean + ice) ESM simulations to investigate how the annual temperature cycle is affected by the presence or absence of land.

The structure of our paper is as follows. Section 2 introduces the observational and model datasets used here, with additional information available in the online supplemental material (SM) and in a previous paper (Santer et al. 2021). Section 3 introduces the spatial patterns of satellite-era TAC(x, t) trends in four observational datasets and in the average of the five LEs. As a prelude to the signal-to-noise (S/N) analysis of global patterns of annual cycle changes, section 4 performs a local S/N analysis of TAC(x, t) trends at individual grid points in each LE. The fingerprint method applied to discriminate between forced and unforced annual cycle changes is introduced in section 5 and documented in detail in the SM. Section 6 discusses the S/N ratios and “baseline” fingerprint detection times obtained for the full global pattern of TAC(x, t) changes. After using the five LEs to estimate and subtract signals of forced SST changes from individual LE realizations and observations, section 7 compares the simulated and observed variability spectra for the AMO, Niño-3.4 SSTs, and the IPO. Section 8 uses information from the model spectra to repeat the baseline fingerprint analysis of section 6 with subsets of the 240 realizations of internal TAC(x, t) fluctuations. These subsets comprise realizations with low- and high-amplitude variability of the AMO and ENSO. Annual cycle changes in the aquaplanet simulations performed with two different climate models are analyzed in section 9. We provide brief conclusions in section 10.

2. Observational data and model simulations

a. Satellite and reanalysis data

Our focus here is on TAC(x, t) changes over the satellite era (January 1979–December 2020). We rely on satellite TMT data from three research groups: Remote Sensing Systems (RSS; Mears and Wentz 2017), the Center for Satellite Applications and Research (STAR; Zou et al. 2018), and the University of Alabama at Huntsville (UAH; Spencer et al. 2017). All three groups analyze microwave emissions from oxygen molecules. Emissions are measured with Microwave Sounding Units (MSU) and Advanced Microwave Sounding Units (AMSU) and depend on the temperature of different broad atmospheric layers. Measurements at different microwave frequencies provide information on temperatures at different heights. In addition to TMT, we use measurements of the temperature of the lower stratosphere (TLS) to adjust TMT for the contribution it receives from stratospheric cooling (Fu et al. 2004; Fu and Johanson 2004; also see our online SM).
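For readers wishing to reproduce the stratospheric adjustment, the correction is a linear combination of TMT and TLS at each grid point. The sketch below is illustrative only: the coefficients of 1.1 and −0.1 are values commonly used for near-global averages following Fu et al. (2004), and the exact weights applied in this study are documented in the SM; the toy grid and data are assumptions.

```python
import numpy as np

def correct_tmt(tmt, tls, a=1.1, b=-0.1):
    """Remove the stratospheric cooling contribution from TMT using TLS.

    The coefficients are illustrative (roughly 1.1 and -0.1 are commonly used
    for near-global means following Fu et al. 2004); the weights actually
    applied in this study are documented in the SM.
    """
    return a * tmt + b * tls

# Toy example: 42 years of monthly TMT and TLS fields on a 10-degree grid.
rng = np.random.default_rng(0)
tmt = rng.normal(size=(504, 18, 36))
tls = rng.normal(size=(504, 18, 36))
tmt_corrected = correct_tmt(tmt, tls)
```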

Our comparisons of simulated and observed TAC(x, t) changes also make use of synthetic TMT data from version 5.1 of the state-of-the-art ERA reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF; Hersbach et al. 2020; Simmons et al. 2020; also see the SM). A reanalysis is a retrospective analysis of many different types of observational data, performed with a data assimilation system and numerical weather forecast model that do not change over time (Kalnay et al. 1996).

b. SST data

Section 7 considers three commonly used indices of modes of SST variability. We use version 4 of the dataset developed jointly by the Hadley Centre and the Climatic Research Unit (HadCRUT4; Morice et al. 2012) to compute observational time series of the AMO, Niño-3.4 SSTs, and the IPO. Information regarding calculation of these indices is provided in the SM. Our focus in section 7 is on the 852 months from January 1950 to December 2020, a period unaffected by potential problems associated with SST measurements during World War II (Thompson et al. 2008).
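As a rough illustration of how such indices are formed, each index is an area-weighted SST average over a fixed region (the AMO and Niño-3.4 regions quoted in the captions of Figs. 7 and 10). The minimal sketch below omits land/ice masking, anomaly computation, and the processing described in the SM; the 5° grid and toy data are assumptions.

```python
import numpy as np

def box_mean(field, lat, lon, lat_bounds, lon_bounds):
    """Area-weighted mean of a (time, lat, lon) field over a lat/lon box."""
    la = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    lo = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = field[:, la, :][:, :, lo]                   # (time, nlat_box, nlon_box)
    w = np.cos(np.deg2rad(lat[la]))[None, :, None]    # latitude weights
    return (sub * w).sum(axis=(1, 2)) / (w.sum() * lo.sum())

# Monthly SSTs for Jan 1950-Dec 2020 (852 months) on a 5-degree grid (toy data).
lat = np.arange(-87.5, 90.0, 5.0)
lon = np.arange(-177.5, 180.0, 5.0)
sst = np.random.default_rng(1).normal(size=(852, lat.size, lon.size))

amo = box_mean(sst, lat, lon, (0, 60), (-80, 0))          # AMO: 0-60N, 80W-0
nino34 = box_mean(sst, lat, lon, (-5, 5), (-170, -120))   # Nino-3.4: 5S-5N, 170-120W
```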

c. Model simulations

We analyze TAC(x, t) changes in five different large initial condition ensembles (LEs). Deser et al. (2020) provide a comprehensive introduction to LEs and their many scientific applications. An LE typically consists of between 30 and 100 individual members. The ensemble is generated by repeatedly running the same physical climate model with the same spatiotemporal changes in external forcings. Each ensemble member commences from different initial states of the atmosphere and/or ocean. These are selected in various ways (see the SM). Slight differences in initial states result in different sequences of natural variability superimposed on the underlying forced response. The result is an envelope of plausible trajectories of historical and/or future climate change.

Here, we use LEs to explore both the local (section 4) and global (sections 5, 6, and 8) S/N characteristics of simulated changes in annual cycle amplitude. Of particular interest is the information LEs provide regarding the robustness of fingerprint detection; the stochastic uncertainty in fingerprint detection time; estimates of externally forced signals in the AMO, Niño-3.4 SSTs, and the IPO; and uncertainties in the internal variability spectra of these three modes.

The LEs considered here rely on both older and newer model versions and estimates of external forcings. Two LEs were generated with models participating in the older phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012). The CMIP5 LEs were performed with version 1 of the Community Earth System Model (CESM1; Kay et al. 2015) and with version 2 of the Canadian Earth System Model (CanESM2; Kirchmeier-Young et al. 2017; Fyfe et al. 2017; Swart et al. 2018). The CESM1 and CanESM2 LEs have 40 and 50 members, respectively. The three LEs produced with models taking part in the newer phase 6 of CMIP (CMIP6; Eyring et al. 2016) relied on version 5 of CanESM (CanESM5; Swart et al. 2019; Fyfe et al. 2021), version 2 of CESM (CESM2; Rodgers et al. 2021), and version 6 of the Model for Interdisciplinary Research on Climate (MIROC6; Tatebe et al. 2019). Each CMIP6 LE has 50 ensemble members.

The CMIP5 and CMIP6 historical simulations ended in 2005 and 2014, respectively. To facilitate comparison with observational TAC(x, t) changes over the full 42-yr satellite era (1979–2020), historical simulations were spliced with scenario integrations initiated from the end of each historical run. The scenario integrations are representative concentration pathway (RCP) 8.5 for CanESM2 and CESM1 (Meinshausen et al. 2011), Shared Socioeconomic Pathway 5–8.5 (SSP5) for CanESM5 and MIROC6, and SSP 3–7.0 (SSP3) for CESM2 (Riahi et al. 2017). Further details of these scenarios are given in the SM.

Our pattern-based fingerprinting method requires model estimates of natural internal variability. We obtain these estimates from two sources: 1) multimodel ensembles of preindustrial control simulations with no year-to-year changes in external forcings and 2) the between-realization variability of each of the five LEs. In the former case, we use output from preindustrial control runs performed with 36 CMIP5 models and 30 CMIP6 models. In the latter case, we estimate the between-realization variability in a single model’s LE by subtracting the ensemble-mean changes in TAC(x, t) from each realization in the LE (see section 5 and SM). Tables S1 and S2 in the SM identify the CMIP5 and CMIP6 models we relied on for our multimodel noise estimates.

Section 9 examines changes in the amplitude of the annual cycle of TMT in aquaplanet simulations performed with two climate models. The first is version 2.1 of the Geophysical Fluid Dynamics Laboratory Atmospheric Model (GFDL-AM2.1). The model was run in a configuration with a 30-m fixed-depth slab ocean with no meridional ocean heat transport and a realistic seasonal cycle of insolation (Feldl et al. 2017). The simulations explore the impact of large differences in sea ice albedo under preindustrial and quadrupled CO2 conditions.

The second model relies on version 6 of the Community Atmospheric Model (CAM6; Rodgers et al. 2021). This is the atmospheric component of CESM2. Like GFDL-AM2.1, CESM2-CAM6 was run with a 30-m fixed-depth slab ocean, but with a symmetrical annual-mean ocean heat transport (an average of NH and SH conditions) diagnosed from the CESM2 preindustrial control run. A significant difference in the two models is that GFDL-AM2.1 has no ice thermodynamics, while CESM2-CAM6 includes ice thermodynamics and uses a simple version of the Los Alamos sea ice model (CICE5; Smith et al. 1992). As we show subsequently, model differences in sea ice treatment yield different high-latitude changes in TAC(x, t) in response to CO2 forcing.

Both sets of aquaplanet simulations allow us to investigate whether large-scale features of the annual cycle fingerprints in full ESMs can be captured without representation of land surface processes and without hemispheric asymmetry in land distribution or land–ocean differences in heat capacity. Further details of the aquaplanet simulations are given in the SM.

3. Changes in annual cycle amplitude in observations and the LE average

Santer et al. (2018) analyzed observed spatial patterns of TAC(x, t) trends over 1979–2016. It is useful to re-examine these patterns given four additional years of corrected TMT data, improved versions of satellite TMT datasets, and results from the state-of-the-art ERA5.1 reanalysis.

Updates and improvements to satellite TMT data have not altered the basic features of the TAC(x, t) trends. These features include increases in annual cycle amplitude at midlatitudes in both hemispheres (with larger increases in the NH than the SH), decreases in amplitude over the Arctic, and small changes of either sign in the tropics (Figs. 1a–c). ERA5.1 shows similar behavior (Fig. 1d). UAH differs from the other observational datasets at high latitudes in the SH: TAC(x, t) trends are positive in UAH and negative in RSS, STAR, and ERA5.1. The anomalous UAH results appear to be related to the decisions made by the UAH group in merging information from MSU and AMSU during the period of overlap between these different instruments (Santer et al. 2018).
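A minimal sketch of how a TAC(x, t) field and its grid-point trends can be formed from monthly TMT is given below. The amplitude definition used here (warmest minus coolest monthly mean in each calendar year) is an illustrative assumption; the definition actually used for TAC(x, t) is documented in Santer et al. (2018) and the SM.

```python
import numpy as np

def annual_cycle_amplitude(monthly):
    """Annual-cycle amplitude per calendar year.

    Illustrative definition only: warmest minus coolest monthly mean within
    each calendar year. The definition actually used for TAC(x, t) is given
    in Santer et al. (2018) and the SM.
    """
    nyears = monthly.shape[0] // 12
    yearly = monthly[: nyears * 12].reshape(nyears, 12, *monthly.shape[1:])
    return yearly.max(axis=1) - yearly.min(axis=1)

def grid_trends(annual):
    """Least squares linear trend at each grid point (units per year)."""
    nyr = annual.shape[0]
    slopes = np.polyfit(np.arange(nyr), annual.reshape(nyr, -1), deg=1)[0]
    return slopes.reshape(annual.shape[1:])

# 1979-2020: 42 years of monthly, stratosphere-adjusted TMT (toy data).
tmt = np.random.default_rng(2).normal(size=(504, 18, 36))
tac = annual_cycle_amplitude(tmt)      # (42, nlat, nlon)
tac_trend = grid_trends(tac)           # map of TAC(x, t) trends, as in Fig. 1
```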

Fig. 1. Least squares linear trends over 1979–2020 in TAC(x, t), the amplitude of the annual cycle of mid- to upper-tropospheric temperature (TMT). (a)–(c) Satellite data from Remote Sensing Systems (RSS), the Center for Satellite Applications and Research (STAR), and the University of Alabama at Huntsville (UAH). (d) Version 5.1 of the reanalysis produced by the European Centre for Medium-Range Weather Forecasts. (e) The average of the ensemble-mean trends in TAC(x, t) in the five LEs analyzed here (see Figs. 2a–e). TMT is adjusted for stratospheric cooling in all satellite, reanalysis, and climate model datasets (see the SM).

Figure 1e shows the average of the ensemble-mean TAC(x, t) trends in the five LEs. As expected, simulated changes are smoother than in the observations (Santer et al. 2018; Po-Chedley et al. 2021). This is because the model results have been averaged over individual realizations with different sequences of internal variability, and then averaged over models. Averaging over realizations and models damps internal variability and reduces uncorrelated model biases, more clearly revealing the underlying forced response. Despite the larger spatial noise in observations, there is correspondence between the large-scale features of the simulated and observed TAC(x, t) changes in Fig. 1. Whether this correspondence is statistically significant is considered in section 6.

4. Local signal-to-noise ratios

Pattern-based fingerprinting utilizes the signal and noise properties of entire spatial fields (Hasselmann 1979; Santer et al. 1994; Hegerl et al. 1996). It provides an efficient means of discriminating between externally forced climate changes and the complex noise of internal variability. An alternate form of S/N analysis considers forced and unforced climate changes at individual model grid points (Hawkins and Sutton 2012; Mahlstein et al. 2012; Deser et al. 2014; Rodgers et al. 2015). Local S/N information can help to inform and interpret results from pattern-based fingerprinting (Santer et al. 2019). In this section, we briefly discuss a local S/N analysis before detailed consideration of our fingerprint results in section 5.

Figures 2a–e show the ensemble-mean TAC(x, t) trends in the five LEs. Trends are calculated over the same 1979–2020 analysis period used for the observations in Fig. 1. Although there are pronounced differences between the LEs in the amplitude of the changes, there are also key common features in the trend patterns. These include the previously noted increases in annual cycle amplitude at midlatitudes in both hemispheres (with larger increases in the NH than the SH), decreases in TAC(x, t) at high latitudes in the SH, and small changes with differing signs in the tropics (see section 3). At high latitudes in the NH, the observations and CanESM5 show pronounced decreases in TAC(x, t). This feature is absent in the other LEs.

Fig. 2. Local signal-to-noise (S/N) analysis of least squares linear trends over 1979–2020 in TAC(x, t). Results are from five different LEs (columns 1–5). (a)–(e) Ensemble-mean TAC(x, t) trends. (f)–(j) Local 1σ standard deviation of the 42-yr trends in TAC(x, t) across all members in the LE. (k)–(o) The S/N ratio: the absolute value of the ensemble-mean trend in an LE (the signal) divided by the local standard deviation of trends in the same LE (the noise). Stippling in the top row identifies grid points where the local S/N ratio for ensemble-mean trends exceeds 2.

The denominator of the local S/N ratio is the between-realization standard deviation of the 42-yr trend in TAC(x, t), calculated across all members of an ensemble. Patterns of this local noise are similar in the five LEs, with the smallest values in the tropics and the largest values at high latitudes in both hemispheres (Figs. 2f–j). There is some agreement across LEs in small-scale features of the noise patterns, such as the maxima over Greenland, the Himalayas, and East Antarctica. In all LEs, the local S/N ratio displays highest values at midlatitudes in the NH, where increases in TAC(x, t) are largest and noise is relatively low (Figs. 2k–o).
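A minimal sketch of the local S/N calculation in Figs. 2k–o is shown below: the magnitude of the ensemble-mean trend at each grid point divided by the between-realization standard deviation of trends. The LE dimensions, imposed trend, and noise level in the toy data are arbitrary assumptions.

```python
import numpy as np

def grid_trends(annual):
    """Per-grid-point least squares trends of a (years, nlat, nlon) field."""
    nyr = annual.shape[0]
    slopes = np.polyfit(np.arange(nyr), annual.reshape(nyr, -1), deg=1)[0]
    return slopes.reshape(annual.shape[1:])

# Toy LE of annual-cycle amplitude: 50 realizations x 42 years on a 10-deg grid,
# with a small imposed forced trend plus realization-dependent noise.
rng = np.random.default_rng(3)
le_tac = (0.01 * np.arange(42))[None, :, None, None] + rng.normal(
    scale=0.1, size=(50, 42, 18, 36))

trends = np.stack([grid_trends(member) for member in le_tac])   # (50, nlat, nlon)
signal = np.abs(trends.mean(axis=0))          # |ensemble-mean trend| (Figs. 2a-e)
noise = trends.std(axis=0, ddof=1)            # between-realization spread (Figs. 2f-j)
local_sn = signal / noise                     # stippled where > 2 (Figs. 2k-o)
```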

It is of interest to compare the annual cycle changes for TMT with those obtained for surface temperature (TS). In the Arctic and Antarctic, there are large reductions in the amplitude of the annual cycle of TS (Figs. 3a–e). These reductions in annual cycle amplitude have been linked to sea ice loss and associated seasonal feedbacks, ocean–atmosphere energy transfer, and changes in surface heat capacity (Serreze and Barry 2011; Donohoe and Battisti 2013; Bintanja and van der Linden 2013; Taylor et al. 2013; Santer et al. 2018; Feldl et al. 2020; Feldl and Merlis 2021). As for TMT, the amplitude of the annual cycle of TS increases at midlatitudes in the NH, but TS increases there are smaller, without the well-defined zonal structure of the TMT amplitude increases. Even for TS, however, there are midlatitude areas of the North Atlantic and North Pacific Oceans displaying significant increases in annual cycle amplitude, suggesting that the TS changes are not driven by land surface processes alone. Information on some of the factors driving annual cycle changes in TS and atmospheric temperature is given in Donohoe and Battisti (2013). In addition to the sea ice changes mentioned above, these factors include the shortwave absorption associated with GHG-forced increases in upper tropospheric water vapor.

Fig. 3. As in Fig. 2, but for the annual cycle of surface skin temperature. To facilitate comparison with the TMT results, the color bar ranges are identical to those in Fig. 2.

As expected, the between-realization variability of trends in annual cycle amplitude has a strong land–sea contrast component for TS but not for TMT (cf. Figs. 2f–j and 3f–j). Because of the higher noise over land for TS, few land areas have S/N ratios > 2 for changes in the annual cycle of TS (Figs. 3k–o). A notable exception is the Mediterranean region (Yettella and England 2018). Some of the most extensive areas of high S/N are in the regions of Arctic and Antarctic sea ice decrease where TS signals are largest.

5. Fingerprint method and results

Next, we seek to determine whether the patterns of forced changes in TAC(x, t) can be identified in observations and individual realizations of the LEs. The latter provide 240 different trajectories of climate change over the satellite era, each with a different estimate of MIV superimposed on the underlying response to forcing. The LEs allow us to estimate the stochastic uncertainty in td, the time required to identify the searched-for fingerprints of forced change (Santer et al. 2019).

We use a standard pattern-based fingerprint method to calculate td (Hasselmann 1979). The method has been successfully employed to identify anthropogenic fingerprints in many different independently monitored aspects of climate change (Hegerl et al. 1996; Santer et al. 1996, 2009, 2018; Marvel and Bonfils 2013; Bonfils et al. 2020; Sippel et al. 2020, 2021). The statistical methodology follows Santer et al. (2018); full details are provided in the SM. A brief description of the method is given below.

In the present application, the fingerprint pattern FAC(x) is an estimate of the response of the amplitude of the annual cycle of TMT to combined anthropogenic and natural forcing. Five different fingerprints are used here. Each is the leading empirical orthogonal function (EOF) of ensemble-mean TAC(x, t) in an LE, calculated over 1979–2020 (Figs. 4a–e). We assume that the spatial pattern of FAC(x) does not change markedly over time. For changes in the annual cycle of TMT, this assumption has been tested elsewhere and found to be reasonable (see the SM).
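For illustration, the leading EOF of an ensemble-mean TAC(x, t) field can be obtained from a singular value decomposition of the area-weighted anomalies, as sketched below. The sqrt(cos latitude) weighting and toy grid are assumptions; the regridding and domain masking actually applied are described in the SM.

```python
import numpy as np

def leading_eof(field, lat):
    """Leading EOF of a (time, nlat, nlon) field with sqrt(cos(lat)) weighting."""
    nt, nlat, nlon = field.shape
    anom = field - field.mean(axis=0)
    w = np.sqrt(np.cos(np.deg2rad(lat)))[None, :, None]
    u, s, vt = np.linalg.svd((anom * w).reshape(nt, -1), full_matrices=False)
    eof1 = vt[0].reshape(nlat, nlon) / w[0]     # unweight for display
    pc1 = u[:, 0] * s[0]                        # associated principal component
    explained = s[0] ** 2 / (s ** 2).sum()      # fraction of variance explained
    return eof1, pc1, explained

# Ensemble-mean TAC(x, t) for one LE over 1979-2020 (toy data).
lat = np.arange(-85.0, 90.0, 10.0)
ens_mean_tac = np.random.default_rng(4).normal(size=(42, lat.size, 36))
fingerprint, pc, frac = leading_eof(ens_mean_tac, lat)   # cf. Figs. 4a-e
```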

Fig. 4. Leading modes of response to external forcing and natural internal climate variability for changes in the amplitude of the annual cycle of TMT. (a)–(e) Fingerprints of changes in TAC(x, t) in five LEs. The fingerprints are the leading EOF of changes in ensemble-mean TAC(x, t) over the 42-yr period from 1979 to 2020. (f)–(j) First EOF of natural internal climate variability of TAC(x, t), estimated from the between-realization variability of each LE. (k)–(o) Second EOF of natural internal variability. The total variance explained by each EOF is listed. The gray shaded regions poleward of 80° arise because of regridding to a 10° × 10° grid and masking model simulation output with observational TMT coverage (see the SM).

The five LE estimates of FAC(x) shown in Figs. 4a–e are searched for in sequences of time-varying TAC(x, t) patterns derived from satellite data, the ERA5.1 reanalysis, and individual realizations of an LE. In the latter case, a searched-for model fingerprint is always compared with individual realizations of TAC(x, t) changes generated with the same model; for example, the CESM1 fingerprint in Fig. 4a is compared with the 40 individual realizations of TAC(x, t) changes in the CESM1 LE (see Fig. 5a and the left box-and-whisker bar in Figs. 6a,b). In searching for FAC(x) in observations, each of the five model fingerprints is compared with each observational dataset (Fig. 5f).

Fig. 5. Signal-to-noise ratio SNL as a function of the trend length L. (a)–(e) SNL for the strength of the model FAC(x) fingerprints in individual realizations of TAC(x, t) (thin gray lines) and in ensemble-mean TAC(x, t) changes (dark gray lines). Results are from five different LEs. Model fingerprints used in (a)–(e) are shown in the top row of Fig. 4. For CanESM2 and CESM1 (which are both CMIP5 models), the denominator of SNL was estimated with the unforced variability from 36 different CMIP5 preindustrial control runs. For the CMIP6 LEs (CanESM5, CESM2, and MIROC6), the denominator of SNL was computed with the internally generated variability from 30 different CMIP6 control integrations. (f) SNL ratios for the strength of model fingerprints in satellite and reanalysis TAC(x, t) data. There are five lines for each observational dataset. Each line corresponds to use of a different LE for estimating the fingerprint and noise (see Fig. 4 and SM). The SNL is always plotted on the final year of the L-yr analysis period, which is given in red on the upper x axis. The trend length L is given in blue on the lower x axis. The first analysis period is 1979–88; the final analysis period is 1979–2020. The dashed horizontal magenta line is the stipulated 5% significance level used for calculating the td values shown in Fig. 6a.

Fig. 6. Stochastic uncertainty in fingerprint detection time in model LEs (box-and-whisker plots) and actual fingerprint detection time in satellite data (colored symbols). Detection time td is defined as the time at which the ratio SNL first exceeds a stipulated significance threshold (in this case, p = 0.05) and then remains continuously above this threshold as the analysis period L increases. (a) Values of td estimated with fingerprints from five different LEs (see first row in Fig. 4) and using the multimodel noise from concatenated preindustrial control runs performed with 36 CMIP5 models and 30 CMIP6 models. For details of the multimodel noise, refer to Fig. 5 and the SM. (b) Fingerprints calculated as in (a), but with noise estimated using the between-realization variability of each LE. In the box-and-whisker plots in both panels, the red horizontal line is the median td value in the individual realizations of TAC(x, t). The box size represents the interquartile td range; the whiskers span the full range of detection times in the ensemble.

These comparisons involve computing a measure of pattern similarity (an uncentered spatial covariance), which yields the signal time series Z(t). If the observations or individual LE realizations express the FAC(x) pattern more strongly over time, Z(t) will exhibit a trend. To determine whether this trend in Z(t) is significant, we require null distributions of pattern similarity trends in which we know a priori that any changes in pattern similarity with time are due to the effects of natural variability only (see the SM).
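A minimal sketch of the pattern-similarity calculation is shown below: an uncentered, area-weighted spatial covariance between the fingerprint and each time slice of a TAC(x, t) sequence. The weighting and normalization conventions shown are assumptions; the exact formulation is given in the SM.

```python
import numpy as np

def project_onto_fingerprint(fields, fingerprint, lat):
    """Uncentered, area-weighted spatial covariance of each (nlat, nlon) time
    slice in `fields` with the fingerprint pattern."""
    w = np.cos(np.deg2rad(lat))[None, :, None]
    return (fields * fingerprint[None, :, :] * w).sum(axis=(1, 2)) / (
        w.sum() * fields.shape[2])

# Toy data: a fingerprint pattern and 42 years of observed TAC(x, t) maps.
rng = np.random.default_rng(5)
lat = np.arange(-85.0, 90.0, 10.0)
fingerprint = rng.normal(size=(lat.size, 36))
obs_tac = rng.normal(size=(42, lat.size, 36))

z = project_onto_fingerprint(obs_tac, fingerprint, lat)   # signal series Z(t)
# N(t) is obtained the same way, using control-run or between-realization
# TAC(x, t) variability in place of the observations.
```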

We generate these null distributions by fitting trends to the noise time series N(t), which is calculated by measuring the pattern similarity between FAC(x) and time-varying patterns of natural internal variability in TAC(x, t). The latter are obtained from two sources: 1) multiple preindustrial control runs performed with either CMIP5 or CMIP6 models and 2) the between-realization variability of TAC(x, t) changes in each LE. We refer to these subsequently as multimodel and single-model noise estimates, respectively.

In the multimodel noise case there are nm model control runs, each of length 150 years. These are concatenated into one dataset (see the SM). The single-model noise is computed by subtracting the ensemble-mean TAC(x, t) changes in an LE from each realization of the LE. Calculation of the ensemble mean and residuals is over the 42-yr satellite era (1979–2020). The residuals are then concatenated and have the time dimension 42 × nr, the number of years in the satellite era times the number of realizations in the LE. Differences between single-model and multimodel noise estimates are discussed in section 6.
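The single-model noise construction reduces to subtracting the LE ensemble mean and concatenating the residuals in time, as in the short sketch below (toy dimensions assumed).

```python
import numpy as np

# Toy LE of annual-cycle amplitude: (realization, year, lat, lon).
rng = np.random.default_rng(6)
le_tac = rng.normal(size=(50, 42, 18, 36))

# Subtract the ensemble mean over 1979-2020 from each realization, then
# concatenate the residuals along the time axis: 42 x n_r years of noise.
residuals = le_tac - le_tac.mean(axis=0, keepdims=True)
single_model_noise = residuals.reshape(-1, 18, 36)        # (42 * 50, nlat, nlon)
```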

Our detection time estimates are based on SNL, the S/N ratio between bL, an L-yr trend in Z(t), and σL, the standard deviation of the sampling distribution of L-yr trends in N(t). Here, L varies from 10, 11, …, 42 years. A key aspect of our analysis is that trends in Z(t) and N(t) are always compared on the same time scale. Explicit consideration of the time scale dependence of S/N ratios is important because noise patterns and amplitude vary as a function of time scale (Tett et al. 1997; Stouffer et al. 2000).

For L = 10 years, for example, bL is calculated over 1979–88 and σL is computed from the sampling distribution of overlapping 10-yr trends in N(t). For L = 11 years, bL is the trend in Z(t) over the first 11 years (1979–89) and σL is calculated from the sampling distribution of overlapping 11-yr trends in N(t). The full satellite era (1979–2020) is the L = 42 case. The detection time td is defined as the final year of the L-yr period at which SNL first exceeds some stipulated significance level (generally 5% here) and then remains continuously above this level for all larger values of L. The null hypothesis we are testing is that trends in Z(t) are consistent with internal variability alone and SNL values are not statistically unusual relative to an assumed Gaussian distribution (see the SM for further details).
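The sketch below illustrates the S/N and detection-time calculation from a signal series Z(t) and a concatenated noise series N(t). The one-sided 5% Gaussian critical value of approximately 1.645 and the treatment of overlapping trend windows across concatenation seams are simplifying assumptions; the exact significance test is described in the SM.

```python
import numpy as np

def slope(y):
    """Least squares slope of y against 0, 1, ..., len(y)-1."""
    t = np.arange(len(y), dtype=float) - (len(y) - 1) / 2.0
    return np.dot(t, y - y.mean()) / np.dot(t, t)

def detection_time(z, n, start_year=1979, min_len=10, crit=1.645):
    """Detection time t_d from signal series z and concatenated noise series n.

    SN_L = (L-yr trend in z) / (std of overlapping L-yr trends in n). t_d is
    the final year of the shortest L for which SN_L exceeds `crit` and stays
    above it for all longer L. The threshold of 1.645 (one-sided 5% under a
    Gaussian assumption) is illustrative; see the SM for the exact test.
    """
    nz = len(z)
    sn = np.full(nz + 1, -np.inf)
    for L in range(min_len, nz + 1):
        b_L = slope(z[:L])
        # For brevity, overlapping windows here run across the seams of the
        # concatenated noise series.
        sigma_L = np.std([slope(n[i:i + L]) for i in range(len(n) - L + 1)], ddof=1)
        sn[L] = b_L / sigma_L
    for L in range(min_len, nz + 1):
        if np.all(sn[L:] > crit):
            return start_year + L - 1
    return None

rng = np.random.default_rng(7)
z = 0.02 * np.arange(42) + rng.normal(scale=0.1, size=42)   # toy Z(t)
n = rng.normal(scale=0.1, size=42 * 50)                     # toy concatenated N(t)
t_d = detection_time(z, n)
```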

Before considering td results, it is useful to first examine the FAC(x) patterns and dominant modes of between-realization variability in the five LEs. The fingerprints are spatially similar across the LEs (Figs. 4a–e) and capture the zonally coherent mean changes in annual cycle amplitude described in the local S/N analysis (section 4). In contrast, the dominant noise modes are characterized by variability at smaller spatial scales. The leading noise EOF displays ENSO-like features (Po-Chedley et al. 2021), which are similar across the five LEs (Figs. 4f–j). The second noise EOF is also similar in the LEs, capturing anticorrelated variability in TAC(x, t) between North America, northern Eurasia, and the Indian subcontinent (Figs. 4k–o). The spatial dissimilarity between the large-scale zonally distinctive fingerprints and the smaller-scale noise patterns is important in explaining the fingerprint detection results described in the next section.

6. Fingerprint detection times in LEs and observationally based data

Values of SNL used for calculating td are given in Fig. 5. The 1991 Pinatubo eruption has a clear effect on simulated and observed annual cycle amplitude (Santer et al. 2018), resulting in an initial dip in SNL for analysis periods ending between 1991 and 1994. Thereafter, SNL increases linearly with increasing L, except in CESM2 and in observational data, where SNL exhibits relatively little change or decreases for L-yr trends ending after approximately 2012 (Figs. 5d,f).

The individual LE realizations cross the stipulated 5% significance threshold at a wide range of L values. When multimodel noise estimates are used to compute the denominator of SNL, the median detection time in the five LEs, td(med), ranges from 1994 for CanESM5 to 2005 for CESM1 (Fig. 6a). A similar range of td(med) results is obtained by calculating the denominator of SNL with the between-realization variability of an individual LE (Fig. 6b).

For each LE, we tested whether the between-realization variability is significantly larger than the multimodel variability. Tests were performed on time scales of 10, 20, 30, and 40 years (see the SM for significance test details). There were only two cases in which the between-realization variability was significantly larger at the 5% level: CanESM5 and MIROC6 (for 20- and 40-yr time scales, respectively). In these two LEs, the larger single-model noise in Fig. 6b yields slightly later values of td(med) relative to the corresponding results in Fig. 6a. Single-model noise also exceeds multimodel noise in CESM1, but is not significantly larger at the 5% level on the four time scales we examined. The single-model variability in the CanESM2 and CESM2 LEs is similar in amplitude to the CMIP5 and CMIP6 multimodel variability (respectively). Averaged across the five LEs, the median detection time is 1998.3 for the multimodel noise in Fig. 6a and 1999 for the between-realization variability in Fig. 6b.

There are two key findings from Fig. 6. First, despite model differences in external forcings, equilibrium climate sensitivity (ECS), and the amplitude of MIV (Andrews et al. 2012; Zelinka et al. 2014, 2020; Pallotta and Santer 2020; Fyfe et al. 2021; Po-Chedley et al. 2021), the FAC(x) patterns in the five LEs are robustly identifiable at the 5% significance level in individual model realizations of satellite-era annual cycle changes. Positive detection occurs in 239 out of 240 cases if multimodel noise is used to calculate the denominator of SNL and in the same number of cases if single-model noise is employed.

The second key finding is that the model-predicted FAC(x) fingerprints are identifiable at the 5% level in 16 out of 20 different combinations of the five fingerprints (derived from the five LEs) and the four observational datasets. This holds for both the multimodel noise in Fig. 6a and the single-model noise in Fig. 6b. The null results in Figs. 6a and 6b are for the UAH dataset. All five fingerprints yield S/N ratios in UAH TAC(x, t) data that initially exceed the stipulated 5% significance threshold on time scales of ∼35 years, but then fall below this threshold for UAH S/N ratios calculated over the full satellite era (except in the case of the CESM2 fingerprint; see Fig. 5f).

Finally, we note that removal of all global-mean information from our S/N analysis, as described in Santer et al. (2018), has minimal impact on the detection time results in Fig. 6. This illustrates that the identification of model-predicted FAC(x) patterns in observational data and in individual LE realizations is not solely driven by global-mean changes in annual cycle amplitude; it primarily reflects similarity of large-scale pattern information (see Fig. S1 and section 5b of the SM).

In the following, we refer to the td results in Fig. 6b as the “baseline” case. In section 8, we report on tests that explore the sensitivity of the baseline detection times to use of low- and high-variability subsets of the single-model noise used in Fig. 6b. These subsets of the 240 realizations of internal TAC(x, t) variability are selected based on the power spectral density (PSD) of the model AMO and Niño-3.4 SST time series.

7. Comparison of simulated and observed internal variability spectra

The robust detection of model-predicted FAC(x) fingerprints in observations and in individual LE realizations has multiple interpretations. Under one interpretation, large-scale forcing by greenhouse gases drives large-scale physical processes that are common to observations and climate models. These processes include summertime drying of midlatitude continental interiors (Manabe et al. 1981; Wetherald and Manabe 1995; Douville and Plazzotta 2017), expansion of the tropics (Seidel and Randel 2007; Hu and Fu 2007; Quan et al. 2014), and lapse-rate changes (Frierson 2006; Donohoe and Battisti 2013). In contrast, modes of MIV are characterized by smaller-scale patterns of anticorrelated variability that do not project well onto the coherent FAC(x) patterns (see Fig. 4). This basic difference in the spatial scales of the forced response and MIV favors signal detection (Santer et al. 1994).

A second possible interpretation is that robust detection of model FAC(x) fingerprints is biased by errors in model representation of MIV (Curry and Webster 2011; O’Reilly et al. 2021). Under this interpretation, models systematically underestimate “observed” MIV, thereby spuriously inflating SNL and leading to incorrect fingerprint detection claims. This “biased variability” argument is challenging to address because there are large uncertainties in separating externally forced signals from MIV in the single occurrence of signal and noise available in observations (Frankcombe et al. 2015; Kravtsov 2017; Cheung et al. 2017; Kajtar et al. 2019; Pallotta and Santer 2020). This introduces uncertainty in determining the size and significance of model MIV errors.

These two interpretations are not mutually exclusive. We have already shown credible evidence that the first interpretation—dissimilarity of signal and noise patterns—contributes to our high success rate in identifying model FAC(x) fingerprints in individual LE realizations (see Figs. 4 and 6). In the current section, we consider the plausibility of the second interpretation of our results. In doing so, we make use of the fact that the climate change signals in LEs can be reliably estimated by averaging over many realizations.

We assume that these well-estimated signals, obtained from LEs generated using models with different ECS, MIV, and historical external forcings, encapsulate a significant portion of the true uncertainty in the amplitude and time evolution of forced changes in real-world climate. We apply a regression-based approach (see below) to remove these LE-derived signals from observed time series of three major modes of MIV: the AMO, ENSO, and IPO. Regression-based signal removal is not required in model LEs. The ensemble-mean signal of a given LE is a reasonable estimate of forced changes in that LE, and is simply subtracted from each realization of the LE.

Signal removal in the LEs and observations allows us to isolate the internally generated component of variability in the AMO, ENSO, and IPO time series. We calculate PSD from the “signal removed” residual time series, thus facilitating the direct comparison of simulated and observed MIV. We seek to determine whether there is evidence that the five LEs analyzed here significantly underestimate the observed MIV of the AMO, ENSO, and IPO (Kajtar et al. 2019). Such an error could provide support for the second interpretation of our fingerprint detection—particularly if the detection time for FAC(x) fingerprints is sensitive to large intermodel and inter-realization differences in the amplitude of AMO and ENSO variability. Whether such sensitivity exists is explored in section 8.

Consider results for the AMO first. The amplitude and time evolution of ensemble-mean SST changes in the AMO region vary markedly across the five LEs (Figs. 7a–e). This is unsurprising given model differences in ECS and in direct and indirect anthropogenic aerosol forcings (Zelinka et al. 2014, 2020; Santer et al. 2019). All five ensemble-mean signals show overall SST increases in the AMO region, punctuated by recovery from surface cooling caused by major volcanic eruptions. The SST increases are temporally complex and poorly captured by a linear trend.

Fig. 7. Simulated and observed time series of the Atlantic multidecadal oscillation (AMO). Results are for SST changes spatially averaged over 0°–60°N and 80°W–0° [see Enfield et al. (2001), as well as the SM]. (a)–(e) AMO time series calculated from individual realizations (light gray) and multimodel averages (dark gray) of five LEs. (f) Raw (red) and filtered (dark red) AMO time series calculated from HadCRUT4 SST data. A Savitzky–Golay filter was applied to smooth the observations. The filter used a window width of 141 months and a third-order polynomial. The vertical magenta lines denote the eruption dates of El Chichón in March 1982 and Pinatubo in June 1991.

Intermodel differences in the median detection time for FAC(x) fingerprints (Fig. 6) show some correspondence with intermodel differences in the ensemble-mean AMO signal time series in Fig. 7. CanESM5, for example, which has the earliest td(med) values in Fig. 6, also has the largest and most rapid SST increase in the AMO region (Fig. 7b). Similarly, the smaller and more gradual SST increase in the CESM1 AMO signal appears to be related to the later td(med) values in CESM1 (cf. Figs. 7c and 6).

Removing the ensemble-mean forced SST signals from individual realizations of an LE yields residual AMO variability that is smallest in amplitude in CESM1 and largest in CanESM5 (Figs. 8a–e). Subtracting the unscaled ensemble-mean model signals from observed HadCRUT4 data can produce residuals with large low-frequency variability, primarily because of mismatches between model ECS and the true (but uncertain) real-world ECS (Frankcombe et al. 2015). Model forcing errors also contribute to this large residual variability, thus inflating estimates of “observed” MIV associated with the AMO.

Fig. 8. Simulated and observed time series of the Atlantic multidecadal oscillation (AMO) after removing externally forced SST signals. (a)–(e) “Signal removed” AMO time series (thin gray lines) after subtracting ensemble-mean AMO SST changes in a given LE from each realization of the LE. The blue line is the “signal removed” time series for the last realization in the LE. (f) Observed “signal removed” time series. The five ensemble-mean AMO signal time series in Figs. 7a–e were each subtracted from the HadCRUT4 AMO time series using regression-based scaling.

We therefore subtract scaled model AMO signals from observations (Frankcombe et al. 2015; Steinman et al. 2015). Scaling involves the regression Y(t) = a + b X̄(t) + ε(t), where Y(t) is the observed AMO time series and X̄(t) is the ensemble-mean AMO time series for an individual LE. The residual ε(t) is the “signal removed” AMO time series. Subtraction of b X̄(t) from the HadCRUT4 AMO time series markedly damps the residual low-frequency variability (Fig. 8f). For example, at 284 months (23.7 years), regression-based removal of scaled AMO signals decreases the observed PSD range by 92% relative to the range obtained with unscaled signal subtraction (not shown).
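A minimal sketch of the regression-based scaling and subtraction applied to the observations is given below (for model realizations, the unscaled ensemble mean is simply subtracted). The toy signal and noise amplitudes are assumptions.

```python
import numpy as np

def remove_scaled_signal(y_obs, x_ens_mean):
    """Regression-based signal removal: fit Y(t) = a + b*Xbar(t) + eps(t) and
    return the residual eps(t), the 'signal removed' series."""
    b, a = np.polyfit(x_ens_mean, y_obs, deg=1)
    return y_obs - (a + b * x_ens_mean)

# Toy example: an observed AMO series and one LE's ensemble-mean AMO signal,
# Jan 1950-Dec 2020 (852 months).
rng = np.random.default_rng(8)
xbar = np.linspace(-0.2, 0.4, 852)                        # stand-in forced signal
y_obs = 1.3 * xbar + rng.normal(scale=0.15, size=852)     # stand-in observations
eps = remove_scaled_signal(y_obs, xbar)                   # cf. Fig. 8f

# For a model realization, signal removal is simply:
# eps_model = realization - ensemble_mean
```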

Simulated and observed “signal removed” spectra for AMO SSTs are shown in Fig. 9. While the observed spectrum and the spectra for both CanESM models are well described by simple power law fits, the CESM models and MIROC6 exhibit more complex spectral shape, with noticeable flattening of PSD at periods greater than 100 months. Of greatest interest here is the comparison of PLOW, the PSD at 284 months. This is the longest period that can be usefully resolved from the 852 months (71 years) of the observed AMO and Niño-3.4 SST time series. Systematic model underestimation of observed PLOW has the potential to spuriously inflate the signal-to-noise ratio SNL, thereby biasing fingerprint detection times toward earlier and more ubiquitous detection.

Fig. 9. Power spectral density (PSD) in simulated and observed AMO time series. (a)–(e) PSD in individual realizations (gray lines) of “signal removed” AMO time series shown in Figs. 8a–e. (f) PSD in five “signal removed” observed AMO time series. The (scaled) forced component of AMO SST changes for each LE was subtracted from the HadCRUT4 AMO time series. Individual observed “signal removed” AMO time series in (f) are also plotted in (a)–(e) for their corresponding LE (i.e., for the LE used to estimate and subtract an AMO signal from observations). The red horizontal band delimits the lowest and highest values of PSD at a period of 284 months in the five “signal removed” observational spectra. The vertical dotted purple line at the left of each panel corresponds to this 284-month period (see SM for further technical details).

We compare simulated and observed PLOW in two ways. First, we determine the total number of model realizations in the five LEs with PLOW values exceeding the smallest of the observed PLOW values in Fig. 9f (see bottom edge of red bands). Second, for each LE, we determine the number of realizations in that LE with PLOW values exceeding the corresponding observed PLOW value. We refer to these two comparisons subsequently as method 1 and method 2, respectively. They are simple measures of the consistency between simulated and observed low-frequency PSD.
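The sketch below illustrates how PLOW can be read off a monthly series and how the two exceedance counts are formed. A simple periodogram stands in for the spectral estimator described in the SM, and the PLOW values used in the counting step are synthetic.

```python
import numpy as np
from scipy.signal import periodogram

def psd_at_period(x, period=284.0):
    """PSD of a monthly series at the frequency closest to 1/period.
    A simple periodogram stands in for the estimator described in the SM."""
    f, pxx = periodogram(x, fs=1.0, detrend='linear')
    return pxx[np.argmin(np.abs(f - 1.0 / period))]

rng = np.random.default_rng(9)
p_low_example = psd_at_period(rng.normal(size=852))   # one "signal removed" series

# Synthetic P_LOW values: one per realization for each LE, and one observed
# value per "signal removed" observational series (one per LE).
ens_size = {'CESM1': 40, 'CanESM2': 50, 'CanESM5': 50, 'CESM2': 50, 'MIROC6': 50}
p_low_model = {m: rng.gamma(2.0, 0.05, size=n) for m, n in ens_size.items()}
p_low_obs = {m: rng.gamma(2.0, 0.05) for m in ens_size}

# Method 1: pooled realizations exceeding the smallest observed P_LOW.
method1 = sum(int((v > min(p_low_obs.values())).sum()) for v in p_low_model.values())
# Method 2: per LE, realizations exceeding the corresponding observed P_LOW.
method2 = sum(int((p_low_model[m] > p_low_obs[m]).sum()) for m in ens_size)
```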

For the AMO, methods 1 and 2 yield 56 and 50 realizations exceeding observed PLOW (23% and 21% of the total number of realizations). We conclude from this that the five model LEs analyzed here show evidence of underestimating the amplitude of observed low-frequency AMO variability (Kajtar et al. 2019), but that this underestimate is not statistically significant at the 5% level. If it were, we would expect a smaller fraction of model exceedances of observed PLOW (5% or less).

Qualitatively and quantitatively different results are obtained for SST variability in the Niño-3.4 region of the tropical Pacific (Fig. 10). SST changes in this region are a common proxy for ENSO variability. Fluctuations in ENSO have substantial impact on global surface temperature (Kosaka and Xie 2013), tropospheric temperature (Po-Chedley et al. 2021), and many other climatic variables (Bonfils et al. 2015).

Fig. 10. As in Fig. 7, but for simulated and observed time series of SST spatially averaged over the Niño-3.4 region (5°N–5°S, 120°–170°W).

SST variability in the Niño-3.4 region is markedly larger than in the AMO region (cf. Figs. 7 and 10), so that even with ensemble sizes of 40–50 realizations, there is still substantial residual noise in the ensemble-mean Niño-3.4 SST time series (Figs. 10a–e). This noise displays power at a period of 12 months, most clearly in MIROC6 (Figs. 11a–e). This residual power is consistent with a change over the satellite era in the seasonal cycle of Niño-3.4 SSTs.

Fig. 11. As in Fig. 9, but for spectra of simulated and observed “signal removed” Niño-3.4 SST time series.

All five LEs have small positive warming trends in their ensemble-mean Niño-3.4 time series. Observed warming in this region is more muted (Fig. 10f), partly due to the phasing of ENSO and IPO variability over 1950–2020 (Kosaka and Xie 2013; Trenberth 2015; Meehl et al. 2011, 2016; England et al. 2014; Fyfe et al. 2016; Po-Chedley et al. 2021).

Because of the relatively small externally forced component in simulated Niño-3.4 SST changes and the large residual noise in this component, model ensemble-mean Niño-3.4 SST time series are only weakly correlated with the raw observed Niño-3.4 SST time series, with r ranging from 0.02 in MIROC6 to 0.17 in CESM1. Scaling and subtraction of these Niño-3.4 SST signals from observations has only a small impact on the original observed Niño-3.4 SST time series, yielding the spectra shown in Fig. 11f.

All simulated and observed Niño-3.4 SST spectra in Fig. 11 have a discrete peak within the canonical 3–7-yr range of ENSO variability (AchutaRao and Sperber 2002). This peak is more narrowly defined in MIROC6 than in the other LEs or observations. Simulated Niño-3.4 spectra show a noticeable decrease in PSD for periods longer than approximately 7 years. This PSD decrease is less pronounced in observations. In contrast to the AMO results, methods 1 and 2 yield 185 and 178 exceedances of observed PLOW (i.e., 77% and 74% of the LE realizations have power at 284 months that is higher than in observations). There is no evidence from our analysis, therefore, that the LEs examined here systematically underestimate the observed low-frequency variability of ENSO. This is consistent with other findings (Lienert et al. 2011).

An analysis of the IPO (not shown) leads to a similar conclusion. Unlike Niño-3.4 SSTs, the IPO is influenced by both the tropical and extratropical variability of Pacific SSTs (Meehl et al. 2016; Trenberth 2015; Henley et al. 2015, 2017). For the IPO, we find 116 and 101 exceedances of observed PLOW for methods 1 and 2, corresponding to 48% and 42% of LE realizations with low-frequency PSD that is larger than in the “signal removed” observations (Kajtar et al. 2019). Possible implications of such simulated and observed PLOW differences for fingerprint detection time are explored in the next section.

8. Detection time sensitivity tests

Previously published studies have considered the links between fingerprint detection and model performance in simulating observed global-scale variability (Hegerl et al. 1996; Allen and Tett 1999) or have investigated the sensitivity of D&A results to large intermodel differences in variability (Santer et al. 2009; Sippel et al. 2021). Few studies, however, have examined links between detection time results and the behavior of individual modes of MIV.

We explore these links here using sensitivity tests (Fig. 12). We repeat the “baseline” S/N analysis shown in Fig. 6b with two 50-member subsets of the 240 individual samples of between-realization TAC(x, t) variability. These two 50-member subsets10 correspond to low- and high-amplitude variability of a specific mode of MIV at a specific time scale. The mode amplitude is estimated from the spectra of “signal removed” time series (see Figs. 9a–e and 11a–e). There are four separate sensitivity tests, one for each mode (the AMO and ENSO) and each time scale of interest (284 months and 70 months). The procedure for conducting these sensitivity tests is described in detail in the SM.
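The partitioning into low- and high-variability subsets can be sketched as a simple ranking by PSD at the target period; the ranking rule below is an assumption for illustration, since the selection procedure actually used is described in the SM.

```python
# Illustrative subsetting: rank the 240 realizations by the PSD of their
# "signal removed" mode time series at the chosen period (284 or 70 months)
# and take the 50 lowest- and 50 highest-ranked realizations.
import numpy as np

def split_by_mode_psd(psd_at_period, n_subset=50):
    order = np.argsort(psd_at_period)      # indices sorted by ascending PSD
    low_idx = order[:n_subset]             # low-variability subset
    high_idx = order[-n_subset:]           # high-variability subset
    return low_idx, high_idx
```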

Fig. 12. Stochastic uncertainty in fingerprint detection time td in model LEs (box-and-whisker plots) and actual fingerprint detection time in satellite data (colored symbols). Results are for sensitivity tests involving the selection of 50-member subsets from the 240 realizations of unforced TAC(x, t) variability. (a),(b) Partitioning of internal TAC(x, t) variability into low- and high-variability subsets is based on the PSD values at 284 and 70 months [(a) and (b), respectively] in spectra calculated from “signal removed” AMO time series. (c),(d) As in (a) and (b), but for the use of spectra from simulated “signal removed” Niño-3.4 SST time series. See section 8 and the SM for further information on the sensitivity tests. The caption of Fig. 6 provides details of the box-and-whisker plots. Shaded bars in each panel display td results for high-variability subsets of TAC(x, t); unshaded bars show td for low-variability TAC(x, t) subsets.

Recall that the internal variability of TAC(x, t) is used to calculate the denominator of our S/N ratios, which in turn are used to estimate fingerprint detection times (section 6). Comparing detection times obtained for TAC(x, t) subsets—with subsetting based on the low and high PSD values of key modes of MIV—allows us to explore possible links between the simulated mode amplitude and our D&A results.
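In schematic form, this link between noise amplitude and detection time can be written as below; the fingerprint projection itself, the minimum trend length, and the numerical threshold are stand-ins for the procedure of section 6, not a reproduction of it.

```python
# Stylized sketch of a detection-time calculation: for each analysis length L,
# the signal is the L-year trend in the observed fingerprint-projection series
# and the noise is the standard deviation of L-year trends in the unforced
# TAC(x, t) projection series; t_d is the first L at which S/N exceeds the
# threshold. All numerical choices here are illustrative assumptions.
import numpy as np

def linear_trend(y):
    return np.polyfit(np.arange(y.size), y, 1)[0]

def detection_time(obs_proj, noise_projs, start_year=1979, threshold=1.645):
    # obs_proj: 1D array of annual observed projections onto the fingerprint.
    # noise_projs: 2D array (n_realizations, n_years) of unforced projections.
    for L in range(10, obs_proj.size + 1):                 # minimum 10-yr trends
        s = linear_trend(obs_proj[:L])
        n = np.std([linear_trend(nz[:L]) for nz in noise_projs], ddof=1)
        if s / n > threshold:
            return start_year + L - 1                      # detection year
    return None
```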

Our analysis time scales of 284 months and 70 months (23.7 and 5.8 years, respectively) were selected for the following reasons. Detection of a slowly evolving externally forced fingerprint requires information on the background noise of MIV. Given the 852-month (71-yr) record lengths for the AMO and Niño-3.4 SST time series, the longest noise time scale we can usefully resolve is 284 months (one-third of the record length). The choice of the shorter 70-month time scale was driven by the presence of a spectral peak close to this period in the “signal removed” MIROC6 AMO and Niño-3.4 SST time series (see Figs. 9e and 11e).

On both of the time scales considered here, and for both the AMO and Niño-3.4 SSTs, the average PSD is typically a factor of 3–4 larger in the high-variability subset of spectra than in the low-variability subset. This indicates that, for each mode and time scale, the amplitude differences between the high- and low-variability subsets are sufficiently large to justify investigating their implications for unforced TAC(x, t) variability and fingerprint detection time.

Our sensitivity tests yield three main results (Fig. 12). First, in each sensitivity test and for each LE, the “low PSD” and “high PSD” subsets of unforced TAC(x, t) variability yield similar values of the median detection time td(med), with td(med) differences < 1 year. Second, the “baseline” td(med) results in Fig. 6b are relatively unaffected by repeating the D&A analysis with “low PSD” and “high PSD” subsets of the original 240 realizations of unforced TAC(x, t) variability. All sensitivity tests preserve the relative differences in td(med) found in the “baseline” case (e.g., the earliest fingerprint detection is still in CanESM5 and the latest detection is still in CESM1). Third, the model-predicted FAC(x) fingerprints are statistically identifiable in 75% of the 160 sensitivity tests in Fig. 12 that involve satellite and reanalysis data.11

Figure S2 in the SM shows SNL for one of the four sensitivity tests: selecting subsets of unforced TAC(x, t) variability based on PSD at 284 months in the simulated AMO spectra (Figs. 9a–e). In all five LEs, the “low PSD” subset yields larger S/N ratios (relative to the “high PSD” subset) for analysis periods longer than ∼25–30 years (Figs. S2a–e). This means that low-amplitude AMO variability at 284 months tends to correspond to lower-amplitude multidecadal TAC(x, t) variability, which damps the denominator of S/N and increases S/N ratios. Conversely, high-amplitude AMO variability at 284 months tends to correspond to higher-amplitude multidecadal TAC(x, t) variability, thereby decreasing S/N ratios. Qualitatively similar “low PSD-versus-high PSD” differences in SNL are also found for the other three sensitivity tests (not shown).

The results in Fig. 12 and in Fig. S2 raise several questions. The first question is why the “low PSD-versus-high PSD” S/N differences in Figs. S2a–e have relatively small impact on td(med). The answer is that these S/N differences are small for L < ∼25–30 years. This explains why the median detection times in Fig. 12a are so similar in the “low PSD” and “high PSD” cases, particularly for CanESM2, CanESM5, and CESM2. In these three models, the S/N ratios for almost all individual realizations exceed the 5% significance threshold in less than 30 years, well before the “low PSD-versus-high PSD” S/N differences become pronounced.

The second question is why our “baseline” fingerprint detection times are robust to partitioning the original 240 realizations of unforced TAC(x, t) variability into “low PSD” and “high PSD” subsets. Recall that the annual cycle fingerprints in the five LEs are spatially uncorrelated with the dominant TAC(x, t) noise modes (Fig. 4). This was true both for the multimodel CMIP5 and CMIP6 noise and for the single-model between-realization variability in each LE. Quasi-orthogonality of the fingerprint and noise patterns also applies to the noise subsets in all of our “low PSD” and “high PSD” sensitivity tests. Because the fingerprint and leading noise patterns are so dissimilar, differences in the amplitude of unforced TAC(x, t) variability associated with low- and high-amplitude behavior of the AMO and ENSO have relatively small impact on td(med).
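The quasi-orthogonality invoked here is a statement about spatial pattern correlations being close to zero. A generic area-weighted pattern correlation of the kind assumed in this discussion (the exact similarity metric may differ from the one used in the paper) is sketched below.

```python
# Generic area-weighted, centered spatial pattern correlation between a
# fingerprint pattern and a noise-mode pattern; values near zero correspond
# to the quasi-orthogonality discussed above. Inputs are illustrative.
import numpy as np

def pattern_correlation(field_a, field_b, lat):
    # field_a, field_b: 2D (lat, lon) patterns; lat: 1D latitudes in degrees.
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field_a)
    a = field_a - np.average(field_a, weights=w)
    b = field_b - np.average(field_b, weights=w)
    cov = np.average(a * b, weights=w)
    var_a = np.average(a * a, weights=w)
    var_b = np.average(b * b, weights=w)
    return cov / np.sqrt(var_a * var_b)
```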

Put differently, our fingerprint analysis reveals coherent, global-scale externally forced responses common to all five LEs. Examples include decreases in TAC(x, t) over the Arctic and midlatitude TAC(x, t) increases in NH continental interiors (Figs. 4a–e). These distinctive features are absent in patterns of unforced TAC(x, t) fluctuations associated with the AMO, ENSO, and other modes, which are characterized by variability at smaller spatial scales (Figs. 4f–o). This mismatch between the spatial scales of fingerprint and noise helps to explain why intermodel and inter-realization differences in the amplitude of key modes of MIV have limited impact on td(med).

9. Annual cycle changes in aquaplanet simulations

Santer et al. (2018) discussed some of the possible physical mechanisms involved in producing the distinctive patterns of observed and simulated TAC(x, t) changes shown in Fig. 1. They noted that there are pronounced hemispheric asymmetries in both the climatological mean state of TAC(x, t) and in its satellite-era trends. Climatological asymmetries in TAC(x, t) are related to NH-versus-SH differences in land fraction, heat capacity (through the differences in land fraction), and sea ice coverage. Hemispheric asymmetries in the TAC(x, t) trends over 1979–2020 are influenced not only by these factors, but also by hemispherically asymmetric external forcings. Examples of the latter include anthropogenic aerosol forcing (Bonfils et al. 2020; Kang et al. 2021) and the forcing and circulation response associated with stratospheric ozone depletion (see Fig. S3; Gillett et al. 2004; Thompson et al. 2011; Bandoro et al. 2014; Randel et al. 2017; Solomon et al. 2017). Low-frequency changes in modes of internal variability may also contribute to variations in the Hadley circulation (Mantsis and Clement 2009) and are another possible influence on TAC(x, t).

One of the most prominent aspects of the patterns in Fig. 1 is the increase in annual cycle amplitude at midlatitudes in both hemispheres, with larger increases in the NH than the SH. These “ridges” in TAC(x, t) trends arise from larger tropospheric warming in the summer hemisphere (Santer et al. 2018). Possible causes of these features include changes in the meridional temperature gradient or in atmospheric shortwave absorption that result in seasonally dependent changes in stability (Frierson 2006; Donohoe and Battisti 2013; Santer et al. 2018), poleward expansion of the Hadley circulation and the tropics (Held et al. 2000; Fu et al. 2006; Seidel and Randel 2007; Frierson et al. 2007; Kang and Liu 2012; Quan et al. 2014), lapse-rate changes unrelated to tropical expansion (Brogli et al. 2019), and summertime drying of the land surface (Manabe et al. 1981; Wetherald and Manabe 1995; Douville and Plazzotta 2017). Other factors may also be relevant, such as the response to land–sea warming contrast, the direct radiative effects of CO2, and SST trend patterns (He and Soden 2017). These explanations are not mutually exclusive.

To explore the influence of land and ice albedo on TAC(x, t) changes, we analyzed existing aquaplanet simulations performed with GFDL-AM2.1 (Feldl et al. 2017) and new simulations with CESM2-CAM6. These numerical experiments involve running an atmospheric model in aquaplanet configuration with a realistic seasonal cycle of insolation, a 30-m fixed-depth slab ocean, and quadrupled CO2. A key difference is that CESM2-CAM6 includes sea ice thermodynamics; GFDL-AM2.1 does not. In both sets of simulations, parameters influencing ice albedo were systematically varied in order to evaluate the effect of sea ice changes on atmospheric heat transport and feedback strength. We show results for one selected value of these parameters. Results for other values are qualitatively similar (see the SM, including Figs. S4 and S5).

In GFDL-AM2.1, annual-mean TMT changes between the 4 × CO2 and control simulations are largest in the tropics (Fig. 13a), where the net feedback in the simulations is positive and large (Feldl et al. 2017). The largest annual-mean TMT changes in CESM2-CAM6 occur in high-latitude regions of pronounced sea ice extent decrease (Fig. 13c). In terms of annual cycle changes, the most salient feature of Figs. 13b and 13d is that even without land and land–ocean warming contrasts, the aquaplanet simulations capture the midlatitude increases in TAC(x, t) evident in satellite data and in ESMs with realistic geography (Fig. 1). Unlike the observations and ESMs, however, these midlatitude “ridges” are more hemispherically symmetric in the aquaplanet runs. Relative to GFDL-AM2.1, midlatitude TAC(x, t) increases are larger, farther poleward, and more zonally symmetric in CESM2-CAM6. The more pronounced symmetry is likely due to the fact that the CESM2-CAM6 perturbation and control simulations are longer than in GFDL-AM2.1, yielding less noisy estimates of TAC(x, t) changes.
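One illustrative way to compute the quantities mapped in Fig. 13 is sketched below; measuring the annual cycle amplitude by the annual harmonic of the monthly climatology is an assumption of this sketch and may differ from the definition of TAC used in the paper.

```python
# Hedged sketch: annual-mean and annual-cycle-amplitude changes between a
# 4xCO2 climatology and a control climatology. The annual cycle amplitude is
# taken here as the amplitude of the fitted annual harmonic of the monthly
# climatology (an assumption of this sketch).
import numpy as np

def annual_harmonic_amplitude(monthly_clim):
    # monthly_clim: array with a trailing dimension of 12 climatological months.
    phase = 2.0 * np.pi * np.arange(12) / 12.0
    c = np.mean(monthly_clim * np.cos(phase), axis=-1)
    s = np.mean(monthly_clim * np.sin(phase), axis=-1)
    return 2.0 * np.hypot(c, s)            # amplitude of the annual harmonic

def tmt_changes(clim_4xco2, clim_control):
    # Both inputs: (lat, lon, 12) climatologies of TMT.
    d_mean = clim_4xco2.mean(axis=-1) - clim_control.mean(axis=-1)
    d_cycle = annual_harmonic_amplitude(clim_4xco2) - annual_harmonic_amplitude(clim_control)
    return d_mean, d_cycle
```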

Fig. 13. Changes in uncorrected TMT (°C) in aquaplanet simulations with (a),(b) GFDL-AM2.1 (Feldl et al. 2017) and (c),(d) CESM2-CAM6, for (left) annual-mean TMT changes and (right) changes in the amplitude of the annual cycle of TMT. In GFDL-AM2.1, ocean albedo was set to 0.45 at grid points where the surface temperature was less than 270 K. In CESM2-CAM6, the parameter used for tuning snow albedo, r_snw, was set to 0.7. Changes in the annual mean and annual cycle of TMT were calculated by differencing climatologies computed from a 4 × CO2 experiment and a control run with preindustrial atmospheric CO2. The climatologies are of length 30 years for GFDL-AM2.1 and 100 years for CESM2-CAM6 (see the SM).

We draw three inferences from these results. First