This study investigates potential biases between equilibrium climate sensitivity inferred from warming over the historical period (ECShist) and the climate system’s true ECS (ECStrue). This paper focuses on two factors that could contribute to differences between these quantities. First is the impact of internal variability over the historical period: our historical climate record is just one of an infinity of possible trajectories, and these different trajectories can generate ECShist values 0.3 K below to 0.5 K above (5%–95% confidence interval) the average ECShist. Because this spread is due to unforced variability, I refer to this as the unforced pattern effect. This unforced pattern effect in the model analyzed here is traced to unforced variability in loss of sea ice, which affects the albedo feedback, and to unforced variability in warming of the troposphere, which affects the shortwave cloud feedback. There is also a forced pattern effect that causes ECShist to depart from ECStrue due to differences between today’s transient pattern of warming and the pattern of warming at 2×CO2 equilibrium. Changes in the pattern of warming lead to a strengthening low-cloud feedback as equilibrium is approached in regions where surface warming is delayed: the Southern Ocean, eastern Pacific, and North Atlantic near Greenland. This forced pattern effect causes ECShist to be on average 0.2 K lower than ECStrue (~8%). The net effect of these two pattern effects together can produce an estimate of ECShist as much as 0.5 K below ECStrue.
1. Introduction
Equilibrium climate sensitivity (ECS; i.e., the equilibrium warming in response to a doubling of CO2) is one of the quantities that controls how much future warming we will experience in response to greenhouse gas emissions from anthropogenic activities. As such, it is frequently viewed as one of the most important numbers in climate science and much effort has been expended over decades attempting to constrain its value.
ECS can be calculated from observations or models as
$$\mathrm{ECS} = -\frac{F_{2\times\mathrm{CO_2}}}{\lambda}, \tag{1}$$
where F2×CO2 is the radiative forcing from doubled CO2 and λ represents the top-of-atmosphere (TOA) flux change per degree of surface temperature change:
$$\lambda = \frac{\Delta(R - F)}{\Delta T_S}, \tag{2}$$
where TS is the global average surface temperature, R is the TOA flux, and F is the radiative forcing.
Some of the most influential estimates of ECS come from the observed warming during the historical period, between the mid-nineteenth century and today (referred to as ECShist). To estimate λ over this period (referred to as λhist), ∆ in Eq. (2) represents the change between the mid-nineteenth century and the early twenty-first century. ECShist is then calculated using Eq. (1) and λhist.
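The calculation described above can be sketched in a few lines. This is a minimal illustration, not the code used in this paper. It assumes that ∆ is taken as the difference between the means of the last and first 10 years of a series, that λ is negative for a stable climate so that ECS = −F2×CO2/λ, and that the function names and the synthetic input series are purely hypothetical.

```python
from statistics import fmean

F_2XCO2 = 3.7  # W m-2, doubled-CO2 forcing quoted for this model


def decadal_change(series, n=10):
    """Mean of the last n values minus the mean of the first n values."""
    if len(series) < 2 * n:
        raise ValueError("series too short for non-overlapping windows")
    return fmean(series[-n:]) - fmean(series[:n])


def feedback_parameter(toa_flux, forcing, surface_temp, n=10):
    """lambda = delta(R - F) / delta(T_S), in W m-2 K-1."""
    d_r = decadal_change(toa_flux, n)
    d_f = decadal_change(forcing, n)
    d_t = decadal_change(surface_temp, n)
    return (d_r - d_f) / d_t


def ecs_from_lambda(lam, f_2x=F_2XCO2):
    """ECS = -F_2xCO2 / lambda, in K (lambda is negative)."""
    return -f_2x / lam


# Synthetic 156-yr series with a built-in lambda of -1.33 (illustrative only).
n_years = 156
forcing = [2.2 * i / (n_years - 1) for i in range(n_years)]
temp = [1.0 * i / (n_years - 1) for i in range(n_years)]
toa = [f - 1.33 * t for f, t in zip(forcing, temp)]

lam = feedback_parameter(toa, forcing, temp)
ecs = ecs_from_lambda(lam)
print(f"lambda = {lam:.2f} W m-2 K-1, ECS = {ecs:.2f} K")
```

Because the synthetic flux is constructed as F + λTS, the function recovers the λ that was built in; with model output or observations, the three inputs would be the global-mean annual series of R, F, and TS.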
There have been many estimates of ECShist from observations [summarized in Forster (2016); see also Knutti et al. (2017)]. These tend to be lower than ECS estimated from other sources and they anchor the lower end of the IPCC’s canonical ECS range of 1.5–4.5 K. Recently, it has been argued that ECShist may not provide a good estimate of our climate system’s true ECS (hereafter ECStrue). This is based on demonstrations in models that ∆R depends not just on how much warming occurs, but also on how that warming is distributed across the globe (Armour et al. 2013; Andrews et al. 2015; Zhou et al. 2016, 2017). In other words, two climate states with the same ∆Ts, but distributed differently, can have different values of ∆R, leading to different estimates of ECShist (Olson et al. 2013; Huber and Knutti 2014). Following standard practice, I will refer to this as the “pattern effect” (Stevens et al. 2016).
The pattern effect causes ECShist to depart from ECStrue if the aspects of the warming we experienced over the historical period differ from aspects of warming at 2×CO2 equilibrium. I intentionally leave vague what is meant by “aspects” as these will be investigated in detail in the paper. Basically, though, there are two different reasons why warming over the historical period may be different from the long-term warming. The first reason is that the historical observational record is just one member of an infinity of possible climate trajectories that Earth could have experienced over the last 150 years. Dessler et al. (2018, hereafter D18) used an ensemble of climate model runs to show that different trajectories could yield widely varying estimates of ECShist. These differences in ECShist are due to internal variability, so I will refer to this variability in ECShist as the “unforced pattern effect.”
There is also a “forced pattern effect.” This is primarily related to the fact that the transient warming pattern over the twentieth century is expected to be different from the equilibrium pattern of warming; this will tend to make ECStrue larger than ECShist (Andrews et al. 2015; Armour 2017; Proistosescu and Huybers 2017; Ceppi and Gregory 2017). Previous analyses (e.g., Marvel et al. 2018; Andrews et al. 2018) have evaluated the combined forced and unforced pattern effects. In this paper, I analyze a large model ensemble to separately evaluate their magnitudes.
2. Model ensemble
I analyze output from several sets of runs of the Max Planck Institute Earth System Model version 1.1 (MPI-ESM1.1), collectively referred to as the Grand Ensemble. MPI-ESM1.1 is a fully coupled climate model from the Max Planck Institute for Meteorology and consists of the ECHAM6.3 atmosphere and land model coupled to the MPI-OM ocean model. The Grand Ensemble is described in detail in Maher et al. (2019).
The MPI-ESM1.1 has a transient climate response of 1.78 K (Adams and Dessler 2019) and an effective climate sensitivity (calculated from a regression of the first 150 years of an abrupt 4×CO2 run) of 2.72 K. These values are near the middle of the CMIP5 ensemble range. I analyze the following runs from this ensemble:
A 100-member ensemble of runs with historical forcing (hereafter, the “historical ensemble”). Each of the 100 members simulates the years 1850–2005 and uses identical historical natural and anthropogenic forcing. The ensemble members differ only in their initial conditions—each starts from a different state sampled from the preindustrial control simulation. This ensemble was used by D18 to characterize the impact of internal variability on ECShist and by Adams and Dessler (2019) to investigate the impact of internal variability on transient climate response. The ensemble produces a good simulation of the historical record, as seen in Fig. 2 of Maher et al. (2019).
A 68-member ensemble of runs with CO2 increasing at 1% per year (hereafter, the “1% ensemble”). Each of the members is 150 years long and uses identical forcing. Like the historical ensemble, this ensemble’s members differ only in their initial conditions.
An abrupt 4×CO2 run. In this run, CO2 is abruptly quadrupled from preindustrial values and then run for 2614 years. At that point, the model is nearly in equilibrium.
A preindustrial control run. In this run, atmospheric conditions are held at preindustrial values for 2000 years. This is the run from which all other runs branch.
Effective radiative forcing used in this paper is calculated from fixed SST runs of the model. Historical effective radiative forcing (1850–2005) is 2.2 W m−2 (D18), while 2×CO2 and 4×CO2 forcing are 3.7 and 7.8 W m−2, respectively (Adams and Dessler 2019). In all calculations, surface temperature refers to 2-m air temperature.
3. Feedbacks in the historical ensemble
D18 showed that λhist from the historical ensemble ranged from −1.63 to −1.17 W m−2 K−1 (5%–95%)—this spread is what I have designated as the unforced pattern effect. To gain physical insight into this, I decompose λhist into constituent feedbacks using the approach and radiative kernels of Soden et al. (2008), but using the feedback decomposition of Held and Shell (2012), in which the Planck and lapse-rate feedbacks assume constant relative humidity (RH). I will refer to these as the “conventional” feedbacks. For consistency with D18, I calculate those feedbacks by differencing relevant fields between the first and last decade of the runs. Picking different periods does not change the conclusions of this section.
One disadvantage of the conventional approach is that the sum of the feedbacks may not equal λhist, leaving a residual that may be comparable in magnitude to the pattern effect I am trying to diagnose. To address this, I also calculate feedbacks a second way, based on decomposing R into clear-sky and cloud radiative forcing (CRF) components:
$$R = R_{\text{clear sky LW}} + R_{\text{clear sky SW}} + \mathrm{CRF}_{\mathrm{LW}} + \mathrm{CRF}_{\mathrm{SW}}, \tag{3}$$
where LW and SW refer to longwave and shortwave fluxes, clear-sky fluxes refer to what the fluxes would be in the absence of clouds (leaving everything else the same), and CRF is the all-sky flux minus the clear-sky flux. The change in these fluxes (with the corresponding forcing subtracted off) divided by ∆TS yields the individual feedbacks. For example, the clear-sky longwave feedback is
$$\lambda_{\text{clear sky LW}} = \frac{\Delta(R_{\text{clear sky LW}} - F_{\text{clear sky LW}})}{\Delta T_S}. \tag{4}$$
The terms λclear sky SW, λCRF LW, and λCRF SW are all calculated analogously. By construction, the sum of these feedbacks must equal λhist. I will refer to these as the CRF feedbacks.
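A short sketch may help show why the CRF feedbacks close the budget exactly. It assumes each feedback is the change in one flux component, minus the matching forcing component, divided by ∆TS. The component names, the split of the forcing among components, and all numbers are made up for illustration.

```python
COMPONENTS = ("clear_sky_lw", "clear_sky_sw", "crf_lw", "crf_sw")


def cloud_radiative_forcing(all_sky_flux, clear_sky_flux):
    """CRF is the all-sky flux minus the clear-sky flux."""
    return all_sky_flux - clear_sky_flux


def crf_feedbacks(d_flux, d_forcing, d_ts):
    """Return the four CRF-breakdown feedbacks and their sum.

    d_flux and d_forcing map component name -> change in W m-2;
    d_ts is the change in global-mean surface temperature in K.
    """
    feedbacks = {c: (d_flux[c] - d_forcing[c]) / d_ts for c in COMPONENTS}
    return feedbacks, sum(feedbacks.values())


# Made-up changes between two periods (W m-2), for illustration only.
d_flux = {"clear_sky_lw": 0.7, "clear_sky_sw": 0.5, "crf_lw": -0.3, "crf_sw": -0.03}
d_forcing = {"clear_sky_lw": 2.6, "clear_sky_sw": -0.1, "crf_lw": -0.4, "crf_sw": 0.1}
d_ts = 1.0

feedbacks, total = crf_feedbacks(d_flux, d_forcing, d_ts)
lam_direct = (sum(d_flux.values()) - sum(d_forcing.values())) / d_ts
print(feedbacks, total, lam_direct)
```

The sum of the four terms equals the λ computed directly from the total flux and total forcing because the four components add up to R, so no residual can appear.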
Figure 1 shows a comparison between the ensemble-average feedbacks in the two breakdowns. I have grouped similar feedbacks together: λPlanck + λlapse rate + λΔRH with λclear sky LW, λalbedo with λclear sky SW, λLW cloud with λCRF LW, and λSW cloud with λCRF SW. The feedback pairs do not agree exactly because of differences in the underlying physical processes. For example, λclear sky SW disagrees with λalbedo because λclear sky SW contains a small fraction of the water vapor feedback caused by SW absorption by water vapor. Differences also arise from cloud-masking effects that mix the cloud and noncloud feedbacks in the CRF breakdown (Soden et al. 2004). A final difference arises because the conventional feedbacks do not necessarily sum to λhist, but leave a small positive residual (λhist minus the sum of the feedbacks). The ensemble-average residual is +0.40 W m−2 K−1, with 90% of the residuals falling between 0.31 and 0.50 W m−2 K−1. Overall, though, the two feedback breakdowns give a similar picture of the composition of λhist. Comparisons to the CMIP5 ensemble average also show reasonable agreement (Fig. 1).
Figure 2 shows the latitude distribution of the average and standard deviation of the ensemble. In agreement with observations, the historical ensemble simulates the largest warming in the Northern Hemisphere (NH), although it overestimates warming there [Fig. 2a; see also Fig. 2 of Adams and Dessler (2019)], and simulates the least warming in the Southern Hemisphere (SH).
The term λclear sky LW (Fig. 2c) is basically a mirror image of the surface warming pattern, showing that regions with more warming radiate more energy back to space. The term λclear sky SW is primarily driven by loss of sea ice, so it maximizes in the polar regions (Fig. 2d). Note that λcloud LW is larger than λCRF LW at almost all latitudes (Fig. 2e) due to cloud masking effects (Soden et al. 2004) and λcloud SW and λCRF SW are similar except in the Arctic (Fig. 2f), where cloud masking effects are also important.
4. Quantifying the unforced pattern effect
D18 calculated λhist in the historical ensemble and found that the 5%–95% spread in λhist is −1.63 to −1.17 W m−2 K−1. Given that λhist is equal to the sum of individual feedbacks, variability in λhist must be driven by variability in the underlying feedbacks. Figure 3 summarizes this by plotting the average feedbacks in the 10 ensemble members with the highest ECShist minus the average in the 10 ensemble members with the lowest ECShist. This shows that 55% of the unforced pattern effect is due to differences in λclear sky SW, with most of the remainder, 39%, due to differences in λCRF SW. Differences in the LW feedbacks contribute ~5%.
High ECShist ensemble members have a larger fraction of warming in the extratropics (and less in the tropics) than the low ECShist ensemble members (Fig. 4a). The difference in the LW clear-sky feedback (Fig. 4c), Δλclear sky LW, basically mirrors the temperature difference, with positive values (meaning the feedback is less negative) in regions with lower warming fractions (the tropics) and negative values in regions with larger warming fractions (the extratropics). Integrating over latitude, these differences cancel and so the global-average Δλclear sky LW is basically zero. This seems likely to be generally true, so it might be expected that this feedback should generally contribute little to the pattern effect.
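The high-minus-low composite can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the ensemble values are synthetic, and the shares at the end use made-up differences rather than the values in Fig. 3.

```python
from statistics import fmean


def composite_difference(ecs_hist, feedback, k=10):
    """Mean feedback in the k highest-ECShist members minus the k lowest."""
    if len(ecs_hist) != len(feedback) or len(ecs_hist) < 2 * k:
        raise ValueError("need matching inputs with at least 2*k members")
    order = sorted(range(len(ecs_hist)), key=lambda i: ecs_hist[i])
    low, high = order[:k], order[-k:]
    return fmean(feedback[i] for i in high) - fmean(feedback[i] for i in low)


def fractional_shares(differences):
    """Share of the summed difference carried by each feedback."""
    total = sum(differences.values())
    return {name: d / total for name, d in differences.items()}


# Synthetic 100-member ensemble (illustrative values only).
ecs_hist = [2.5 + 0.01 * i for i in range(100)]
lam_sw = [0.002 * i for i in range(100)]
delta = composite_difference(ecs_hist, lam_sw)

# Hypothetical composite differences for two feedbacks.
shares = fractional_shares({"feedback_a": 3.0, "feedback_b": 1.0})
print(delta, shares)
```

Applying `composite_difference` to each feedback in turn and passing the results to `fractional_shares` gives percentages of the kind quoted in the text.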
The term Δλclear sky SW reflects changes in the surface albedo feedback and Fig. 3 shows that it is contributing the majority to the unforced pattern effect. Figure 4d shows that this is arising almost entirely from the Antarctic region. Thus, while there is a strong ensemble-average λclear sky SW in the Arctic (Fig. 2d), there is little variability within the ensemble in this feedback, so Δλclear sky SW there is close to zero.
The all-sky LW feedback difference (Δλall sky LW = ΔλLW CRF + Δλclear sky LW) in Fig. 4g shows that the near-zero difference in the LW feedback comes from cancellation between positive differences in the tropics and negative differences at high latitudes. The all-sky SW feedback difference (Δλall sky SW = ΔλSW CRF + Δλclear sky SW) in Fig. 4h, which is responsible for most of the unforced pattern effect, reveals that on average 45% of the difference comes from the Southern Hemisphere extratropics (the 5%–95% range of individual ensemble members is 36%–56%), 20% comes from the tropics (14%–44%), and 35% comes from the Northern Hemisphere extratropics (12%–39%).
5. Causes of the unforced pattern effect
a. Sea ice
The pattern of Δλclear sky SW, with the maximum located in the Antarctic region (Fig. 4d), strongly suggests that variability in sea ice loss among ensemble members is responsible for the spread in this feedback. Indeed, I find a strong correlation between the decrease of sea ice over the historical period in each ensemble member and the surface albedo feedback in that member (Figs. 5a,b). More quantitatively, variability in the loss of sea ice explains about half of the variance in λhist (Fig. 5c).
I find some connection between variability in the ocean circulation and variability in sea ice. In particular, changes in the Atlantic multidecadal oscillation (AMO) index and the South Atlantic multidecadal oscillation (SAMO) index correlate with λalbedo from the same hemisphere (Fig. 6) (plots using λclear sky SW look nearly identical). The AMO and SAMO indices are defined as the average of detrended SST over the North Atlantic (0°–60°N, 0°–80°W) and South Atlantic (60°S–0°, 60°W–40°E), respectively (B. Yao et al. 2019, unpublished manuscript). There is little correlation between these indices and the albedo feedback in the other hemisphere (|r| < 0.13). I have looked at other indices [the Pacific decadal oscillation (PDO), South Pacific decadal oscillation (SPDO), interdecadal Pacific oscillation (IPO), tripole index (TPI), and Indian Ocean dipole (IOD)] and find only weak correlations between any of them and albedo feedback variability.
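As an illustration of how a detrended SST index of this kind might be built, the sketch below removes a least-squares linear trend from a regional-mean SST series. The choice of a linear trend is an assumption, since the detrending method is not specified here, and the input series is synthetic. For a linear detrend with fixed area weights, detrending each grid point and then averaging gives the same result as averaging first and then detrending.

```python
from statistics import fmean


def detrend(series):
    """Remove the least-squares linear trend (and the mean) from a series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = fmean(series)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - y_mean - slope * (t - t_mean) for t, y in enumerate(series)]


def index_change(index, n=10):
    """Change in the index between the first and last n years."""
    return fmean(index[-n:]) - fmean(index[:n])


# Synthetic regional-mean SST: a warming trend plus a slow oscillation.
sst = [15.0 + 0.01 * t + (0.1 if (t // 20) % 2 else -0.1) for t in range(120)]
amo_like = detrend(sst)
d_index = index_change(amo_like)
print(d_index)
```

The quantity `d_index` is the kind of value that would be compared, member by member, with the albedo feedback in the same hemisphere.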
Of particular note, the slopes of the fits in Fig. 6a and 6b are 0.07 and 0.03 W m−2 K−1, respectively. This means that the albedo feedback response to a unit change in the SAMO is more than twice the response to the AMO. This again emphasizes the key role the Southern Hemisphere has in varying λhist. It is worth noting that CMIP5-era models do not always do a great job of simulating the details of Antarctic sea ice (Turner et al. 2013), so verifying this result with other approaches, preferably tied to observations, should be a priority.
b. Shortwave clouds
The latitudinal pattern of Δλcloud SW (Fig. 4f) does not point to a clear physical mechanism. However, previous work (Zhou et al. 2016, 2017; Andrews and Webb 2018; Ceppi and Gregory 2017; Fueglistaler 2019) has pointed toward atmospheric stability as key for regulating the cloud feedback and D18 identified ∆T500, 500-hPa tropical (30°N–30°S) temperature, as providing a fundamental control on planetary energy balance. I find that variability in ∆T500/∆TS, warming of the tropical atmosphere per unit global-average surface warming, explains much of the variability in λcloud SW (Fig. 7a). This leads to ∆T500/∆TS having a strong correlation with λhist (Fig. 7b).
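The across-ensemble relationship described here amounts to a Pearson correlation between one value of ∆T500/∆TS and one value of λcloud SW per ensemble member. A minimal sketch, with made-up member values chosen only to show the calculation:

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-member values (not taken from the ensemble).
t500_ratio = [1.0, 1.1, 1.2, 1.3, 1.4]          # dT500 / dTS, K per K
lam_cloud_sw = [0.30, 0.24, 0.22, 0.15, 0.10]   # W m-2 K-1

r = pearson_r(t500_ratio, lam_cloud_sw)
variance_explained = r ** 2
print(r, variance_explained)
```

The squared correlation gives the fraction of variance in one quantity that a linear fit to the other explains, which is the sense in which a predictor "explains" variability in a feedback.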
The slope of the line in Fig. 7a indicates that greater warming of the troposphere makes the SW cloud feedback more negative. If the slope were due mainly to the low-cloud feedback, then I would expect to also see a similar or stronger correlation with the net cloud feedback (λcloud = λcloud SW + λcloud LW) because net cloud feedback is a better indicator of low cloud changes (changes in high clouds tend to have LW and SW feedbacks that cancel; e.g., Zelinka et al. 2012). However, λcloud (and also λCRF) correlates only weakly with ∆T500/∆TS (r = −0.08). This suggests that mid- and high-level clouds are also playing a role in the variation of the SW cloud feedback in Fig. 7a.
I have also correlated λcloud SW with other indicators of stability, such as estimated inversion strength (Wood and Bretherton 2006) and find that ensemble members whose atmosphere becomes more stable also have a more negative λcloud SW (not shown). However, the correlation (r = −0.54) is not as good as with ∆T500/∆TS. This is again consistent with the signal in Fig. 7a having a nontrivial contribution from mid- and high-level clouds. Investigating the altitude distribution of clouds driving the unforced pattern effect in models and observations should be a high priority for future work.
Given the role played by ∆T500 in regulating λcloud SW, a natural question is whether ENSO is playing a role. Figure 7c shows that ∆ENSO, the change in ENSO3.4 index between the first and last decade, does indeed correlate to some extent with λcloud SW. I also find that ∆AMO has about the same magnitude correlation (Fig. 7d). ∆PDO (not shown) also correlates with λcloud SW, but ∆ENSO and ∆PDO are strongly correlated (r = 0.73), so I do not consider them independent regressors.
c. Putting it all together
The magnitude of the unforced pattern effect is affected by the exact periods selected, as I discuss in some detail later in the paper. Most investigators use 1859–82 as a base period due to the lack of volcanic activity during those years. Using that base period, the 5%–95% spread in λhist in the ensemble is −1.46 to −1.14 W m−2 K−1, with an ensemble average of −1.33 W m−2 K−1 (changing the base and end periods does not change any of the previous results). This corresponds to an ensemble spread of ECShist of 2.53 to 3.24 K, with an ensemble average of 2.79 K. Thus, the unforced pattern effect can lead to ECShist having a bias of −0.26 to +0.46 K (−9% to +16%) relative to the ensemble average ECShist.
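The conversion from the spread in λhist to the spread in ECShist is simple arithmetic, sketched below under the assumption that ECS = −F2×CO2/λ with F2×CO2 = 3.7 W m−2. Because the inputs are the rounded values quoted above, the results agree with the quoted ECShist values only to within about 0.01 K. Applying the formula to the ensemble-average λhist is also only an approximation to the ensemble average of ECShist.

```python
F_2XCO2 = 3.7  # W m-2, doubled-CO2 forcing quoted for this model


def ecs_from_lambda(lam, f_2x=F_2XCO2):
    """ECS = -F_2xCO2 / lambda, in K (lambda is negative)."""
    return -f_2x / lam


# Rounded lambda_hist values quoted in the text (W m-2 K-1).
lam_p05, lam_mean, lam_p95 = -1.46, -1.33, -1.14

ecs_low = ecs_from_lambda(lam_p05)
ecs_mean = ecs_from_lambda(lam_mean)
ecs_high = ecs_from_lambda(lam_p95)

bias_low = ecs_low - ecs_mean
bias_high = ecs_high - ecs_mean
pct_low = 100.0 * bias_low / ecs_mean
pct_high = 100.0 * bias_high / ecs_mean

print(f"ECShist: {ecs_low:.2f} to {ecs_high:.2f} K, mean {ecs_mean:.2f} K")
print(f"bias: {bias_low:+.2f} to {bias_high:+.2f} K")
print(f"bias: {pct_low:+.0f}% to {pct_high:+.0f}%")
```

Note that the most negative λhist gives the lowest ECShist, so the asymmetry of the ECShist range follows from the reciprocal in the formula.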
I also show that unforced variability in two key parameters, sea ice loss and ∆T500/∆TS, largely control the unforced pattern effect in this ensemble. These two parameters are correlated (r = −0.56) because sea ice loss exposes relatively warm ocean water, so members with more sea ice loss also have higher ∆TS. This means that there is also a positive correlation between λalbedo and λcloud SW (r = 0.40), so that variability in these two feedbacks works in the same direction to generate large variability in λhist.
A key caveat to my conclusions is the question of whether the model accurately simulates unforced variability. While I have not analyzed the fidelity of this particular model, previous work has pointed out some potential deficiencies in CMIP5-era models’ simulation of unforced variability (e.g., Zhou et al. 2016; Hedemann et al. 2017; Kajtar et al. 2019). Determining whether other models, and nature, show this same unforced pattern effect should be a high priority.
6. The forced pattern effect
To determine the magnitude of the forced pattern effect, I begin by averaging λhist over all members of the historical ensemble, yielding λhist-average. The main challenge with this calculation is that λhist-average is affected by the choice of averaging periods used in Eq. (2)—using different base and end periods can lead λhist-average to vary by a factor of 2, from −0.8 to −1.6 W m−2 K−1 (Fig. 8).
Previous investigators have attempted to get around this problem by picking periods unaffected by volcanic eruptions and I will follow that approach here. Picking volcanically unperturbed base periods (1860–69 or 1870–79) and end periods (1970–79 or 1996–2005) produces estimates of λhist-average between −1.33 and −1.27 W m−2 K−1, with an average of −1.32 W m−2 K−1.
To verify this estimate, I have also analyzed a 68-member ensemble forced by CO2 increasing at 1% per year (and no volcanoes). Figure 9 shows λ derived from the ensemble average of these 1% runs, hereafter λ1%-average. This is derived using Eq. (2) with ensemble average fluxes and temperatures; ∆ is the difference between the average of the first 10 years of the run and the average of a sliding 10-yr window. While there is some decadal variability, it is clear that λ1%-average has much less variation with time than λhist-average, reflecting more uniform forcing, particularly the lack of volcanoes. Over the entire 150-yr run, the median value of λ1%-average is −1.33 W m−2 K−1 (5%–95% of the values range from −1.29 to −1.41 W m−2 K−1), very close to λhist-average for nonvolcanic periods.
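The sliding-window calculation can be sketched as follows. This is a hypothetical outline rather than the code used here: it assumes annual global-mean series for each member, starts the sliding window after the base period so that the two windows never overlap, and uses synthetic members with a built-in λ.

```python
from statistics import fmean, median


def ensemble_mean(members):
    """Average equal-length member series year by year."""
    return [fmean(values) for values in zip(*members)]


def sliding_lambda(toa_flux, forcing, surface_temp, window=10):
    """lambda from the first `window` years versus each later sliding window."""
    r0 = fmean(toa_flux[:window])
    f0 = fmean(forcing[:window])
    t0 = fmean(surface_temp[:window])
    lambdas = []
    for start in range(window, len(surface_temp) - window + 1):
        stop = start + window
        d_r = fmean(toa_flux[start:stop]) - r0
        d_f = fmean(forcing[start:stop]) - f0
        d_t = fmean(surface_temp[start:stop]) - t0
        lambdas.append((d_r - d_f) / d_t)
    return lambdas


# Two synthetic 150-yr members with a built-in lambda of -1.3 (illustrative).
years = range(150)
forcing = [0.05 * y for y in years]
temps = [[0.02 * y + offset for y in years] for offset in (-0.1, 0.1)]
fluxes = [[f - 1.3 * t for f, t in zip(forcing, temp)] for temp in temps]

t_mean = ensemble_mean(temps)
r_mean = ensemble_mean(fluxes)
lam_series = sliding_lambda(r_mean, forcing, t_mean)
lam_median = median(lam_series)
print(len(lam_series), lam_median)
```

Averaging across members before differencing suppresses internal variability, which is why the ensemble-average series isolates the forced response.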
To estimate the forced pattern effect, I also need an estimate of λ from a more strongly forced equilibrium run, for which I use the 2614-yr abrupt 4×CO2 run. I estimate λ4×CO2 using Eq. (2), with ∆ representing the difference between the time average of the 2000-yr preindustrial control run and the time average of the last 500 years of the run, which covers a period nearly in equilibrium (the trend in TS over this period is 0.02 K century−1). This calculation yields a value of λ4×CO2 of −1.15 W m−2 K−1.
Thus, λ4×CO2 is about 15% less negative than λhist-average. However, the forced pattern effect should be the difference between the historical ensemble average and a 2×CO2 run, but the Grand Ensemble does not have appropriate 2×CO2 runs. Mauritsen et al. (2019) analyzed both 2×CO2 and 4×CO2 runs of the MPI-ESM 1.2 model, a model closely related to the one used here. Using data from Fig. 12 and Table 5 of that paper, I estimate that λ4×CO2 is about 7% less negative than λ2×CO2. Previous work on this (Meraner et al. 2013; Mauritsen et al. 2019) suggests that increasing λ with warming is due to increasing strength of the water vapor feedback, related to an increase in height of the tropopause, and an increasingly positive cloud feedback.
I therefore conclude that ECStrue (ECS in the 2×CO2 run) is about 8% larger, corresponding to 0.2 K, than ensemble-average ECShist. This estimate of the forced pattern effect is smaller than suggested by previous analyses (Armour 2017; Proistosescu and Huybers 2017) but close to values found by other analyses (Mauritsen and Pincus 2017; Lewis and Curry 2018).
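The arithmetic behind this estimate can be laid out explicitly. The sketch below uses the rounded values quoted in the text and reads "7% less negative" as λ4×CO2 = (1 − 0.07) × λ2×CO2, which is an assumption about how the percentage is defined. With these rounded inputs the offset comes out near 0.2 K, or roughly 7%, consistent with the quoted ~8% given the rounding.

```python
F_2XCO2 = 3.7         # W m-2, doubled-CO2 forcing quoted for this model
LAM_4X = -1.15        # W m-2 K-1, from the abrupt 4xCO2 run
ECS_HIST_MEAN = 2.79  # K, ensemble-average ECShist
REDUCTION = 0.07      # assumed: lambda_4x is 7% smaller in magnitude than lambda_2x

# Scale lambda_4x back to an estimate of lambda at 2xCO2.
lam_2x = LAM_4X / (1.0 - REDUCTION)

# ECStrue from the 2xCO2 feedback parameter.
ecs_true = -F_2XCO2 / lam_2x

# Forced pattern effect: ECStrue minus ensemble-average ECShist.
offset_k = ecs_true - ECS_HIST_MEAN
offset_pct = 100.0 * offset_k / ECS_HIST_MEAN

print(f"lambda_2x = {lam_2x:.2f} W m-2 K-1, ECStrue = {ecs_true:.2f} K")
print(f"forced pattern effect = {offset_k:.2f} K ({offset_pct:.0f}%)")
```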
Estimates of ECStrue require long forced runs to near-equilibrium conditions. Because such runs are relatively rare, most previous estimates of the pattern effect have used effective climate sensitivity (where λeff is estimated from a Gregory regression of an abrupt 4×CO2 run) instead of true climate sensitivity. For this model, λeff derived from regression of the first 150 years of the 4×CO2 run is −1.36 W m−2 K−1, which is more negative than λhist-average. Using effective climate sensitivity in place of ECStrue would therefore give ECStrue < ECShist, implying a negative forced pattern effect. However, the result is quite sensitive to the period selected for the regression. Regressing years 20–150 yields a λeff of −1.09 W m−2 K−1, which implies a forced pattern effect larger than found here. I expect this may be quite different for different models, so one should be cautious interpreting estimates of the pattern effect based on Gregory regressions over arbitrary periods.
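To illustrate the sensitivity to the regression period, here is a minimal sketch of a Gregory regression: an ordinary least-squares fit of the TOA imbalance against surface warming, in which the slope estimates λeff and the intercept estimates the forcing. The run below is synthetic, with a feedback that deliberately weakens after 3 K of warming. It is not output from the model analyzed here, and the function name is hypothetical.

```python
from statistics import fmean


def gregory_regression(delta_t, toa_imbalance, first_year=1, last_year=150):
    """OLS fit of TOA imbalance on warming over years first_year..last_year.

    Years are 1-indexed and inclusive. Returns (slope, intercept): the slope
    estimates lambda_eff and the intercept estimates the forcing.
    """
    x = delta_t[first_year - 1:last_year]
    y = toa_imbalance[first_year - 1:last_year]
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic abrupt-forcing run whose feedback weakens once warming exceeds 3 K.
delta_t = [5.0 * (1.0 - 0.97 ** year) for year in range(1, 151)]
toa = [8.0 - 1.5 * t if t <= 3.0 else 3.5 - 1.0 * (t - 3.0) for t in delta_t]

lam_full, f_full = gregory_regression(delta_t, toa)
lam_late, f_late = gregory_regression(delta_t, toa, first_year=20)
print(lam_full, lam_late)
```

Dropping the first 19 years removes points from the steeper early part of the curve, so the fitted slope becomes less negative, which mirrors the direction of the change reported above for this model.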
7. Causes of the forced pattern effect
Figure 10 shows global-average feedbacks from the 1% ensemble and abrupt 4×CO2 run, as well as the differences between them. The forced pattern effect shown here is almost entirely due to SW cloud feedbacks, as also noted by Andrews et al. (2015).
Figure 11 shows the spatial pattern of the difference in the total cloud feedback. I plot the total cloud feedback (LW + SW) because that feedback correlates better with low-cloud changes. And I plot the cloud feedback rather than the CRF feedback because the CRF feedback has large values in the polar regions associated with cloud masking of changes in surface albedo rather than changes in clouds.
Maxima in Δλcloud occur in regions where warming is delayed: the Southern Ocean, eastern Pacific, and Atlantic Ocean south of Greenland. As the surface in these regions eventually warms in the future, the stability of the atmosphere decreases, leading to a reduction in low clouds, thereby increasing the magnitude of the cloud feedback as the climate warms (Senior and Mitchell 2000; Ceppi and Gregory 2017; Andrews et al. 2018).
8. Conclusions
In this paper, I have addressed the question: Is ECS estimated from historical observations (ECShist) a good estimate of the true ECS (ECStrue) of our climate system? I have investigated two reasons why the answer may be “no.” First, the historical observational record is just one member of an infinity of theoretical climate trajectories for the Earth since preindustrial. Different climate trajectories over this period yield estimates of ECShist that can differ from the ensemble average ECShist by −0.26 to +0.46 K, corresponding to −9% to +16% (5%–95% confidence interval).
This unforced pattern effect arises mainly from two sources: unforced variability in loss of sea ice, particularly in the Antarctic, which leads to variability in the surface albedo feedback, and unforced variability in tropical tropospheric warming, which leads to variability in the cloud feedbacks. Variability in these two parameters correlates with well-known climate indices (e.g., ENSO, AMO), suggesting that the unforced pattern effect is controlled by known modes of internal variability. This may give us a way to evaluate where the observed historical record lies within the envelope of all possible records. Doing so should obviously be a high priority for the community.
The second reason that ECShist may not be a good estimate of ECStrue is that the average transient warming pattern over the twentieth century is expected to be different from the equilibrium pattern of warming for doubled CO2, and this can also lead to differences between ECShist and ECStrue. Because this effect is related to forced warming, I refer to this as the forced pattern effect.
I estimate that ECStrue is 8% larger than ensemble-average ECShist, corresponding to a bias of 0.2–0.3 K. This forced pattern effect is mainly due to a less negative low-cloud feedback at equilibrium arising in oceanic regions where warming is delayed, namely the Southern Ocean, east Pacific, and the Atlantic south of Greenland (Senior and Mitchell 2000; Andrews et al. 2015; Ceppi and Gregory 2017). Attempting to estimate the magnitude of the forced pattern effect with respect to effective climate sensitivity (estimated from regressions of the first 150 years or so of a 4×CO2 run) can yield widely varying results and those estimates should be treated with caution.
There are similarities between the forced and unforced pattern effects. Both operate primarily in the SW, and in both cases Δλcloud SW ≈ 0.2 W m−2 K−1 (Figs. 3 and 10). The main difference is that the unforced pattern effect has an additional contribution from Δλalbedo, which does not play a significant role in the forced pattern effect. Another difference is that we can have high confidence in the direction of the forced pattern effect: ensemble-average ECShist should be lower than ECStrue. For the unforced pattern effect, the sign is less clear, although previous analyses (Gregory and Andrews 2016; Marvel et al. 2018; Andrews et al. 2018) have argued that the observed climate trajectory of the twentieth century produces an ECShist that is lower than ensemble-average ECShist. If so, the forced and unforced pattern effects would add, leading to an ECShist from observations that could be 0.5 K or 17% less than ECStrue.
Any bias from this calculation will add to other biases in the calculation. For example, incomplete and changing spatial coverage of the surface temperature record, as well as the fact that historical surface temperature measurements are blends of SST over ocean and air temperature over land and ice (Cowtan and Way 2014; Cowtan et al. 2015) also bias ECShist low. These temperature biases can by themselves lead to a 20%–30% low bias for ECShist (Richardson et al. 2016; Adams and Dessler 2019). Combining this with a bias from the pattern effect could lead to very large low biases in ECShist. Considering all of the potential biases between ECShist and ECStrue, it seems premature to argue that climate models are overestimating ECStrue based on comparisons to observational estimates of ECShist.
This analysis highlights potential areas of future research. First, this analysis uses a single model ensemble. Do other model ensembles confirm these results? Second, how can we estimate the impact of the unforced pattern effect on the actual historical climate record? This will require a better understanding of how internal variability impacts the important climate parameters (in our analysis, sea ice and tropospheric temperature) as well as an estimate of how these have changed over the historical period.
In the short term, these uncertainties in estimates of ECShist may be difficult to reduce or eliminate. In that case, better estimates of ECS may rely on developing alternative methods of estimating ECS that are less impacted by these pattern effects, such as using short-term internal variability over the last few decades to estimate ECStrue (e.g., Dessler and Forster 2018).
This work was supported by National Science Foundation Grant AGS-1661861 to Texas A&M University. I thank Thorsten Mauritsen, Chen Zhou, and Mark Zelinka for helpful comments and the Max-Planck-Institut für Meteorologie for providing access to the Grand Ensemble. Code and selected data can be found at https://zenodo.org/record/3625002; the full data set analyzed here can be obtained from the Earth System Grid Federation CMIP6 archive, https://esgf-node.llnl.gov/search/cmip6/.