In 2018, Lewis and Curry presented a method for estimating the transient climate response (TCR) of the climate system from the temperature change between two time windows: an early baseline period in the nineteenth century and a modern period primarily in the twenty-first century. The results suggest a lower value of TCR than estimates from climate model simulations. Previous studies have identified uncertainty in the historical forcings, the impact of the time evolution of the forcing on temperature response, and observational issues as contributory factors to this disagreement. We investigate a further factor: uncertainty in the bias corrections applied to historical sea surface temperature data. This uncertainty can particularly affect the estimation of variables on decadal time scales and therefore affect the estimation of TCR using the window method as well as estimates of internal variability. We demonstrate that use of the whole historical record can mitigate the impacts of working with short time windows to some extent, particularly with respect to the early part of the record.
Several recent studies, including Lewis and Curry (2018) and Otto et al. (2013), use the ratio of the change in temperature to the change in forcing between two time windows as an estimator for transient climate response (TCR) and produce lower estimates of TCR than climate model simulations or other methods that are based on past change (Knutti et al. 2017). Previous studies have identified differences in the inferred forcings, differences in the temperature impact of historical versus transient forcing changes, and data type and coverage as potential explanatory factors for this difference (Storelvmo et al. 2016; Armour 2017; Richardson et al. 2016). In 2016 the authors of all of the major sea surface temperature (SST) datasets drew attention to major unresolved biases in historical sea surface temperature records (Kent et al. 2017), which may affect our understanding of both historical warming and internal variability. We demonstrate that these biases can also affect the results of the window method when estimating TCR, and we explore to what extent this may be mitigated by using more of the data.
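The window estimator described above can be sketched as follows. This is a minimal illustration, not the code of Lewis and Curry (2018) or Otto et al. (2013): the function names, the annual-mean input series, and the forcing for doubled CO2 (`F_2XCO2`, set here to an illustrative 3.7 W m^-2) are all assumptions made for this example.

```python
import numpy as np

F_2XCO2 = 3.7  # W m^-2; illustrative value for doubled-CO2 forcing (assumption)


def window_mean(years, values, start, end):
    """Mean of `values` over the years start..end, inclusive."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    mask = (years >= start) & (years <= end)
    if not mask.any():
        raise ValueError("window contains no data")
    return values[mask].mean()


def tcr_window_estimate(years, temperature, forcing, early, late, f2x=F_2XCO2):
    """Scale the ratio of temperature change to forcing change by f2x.

    `early` and `late` are (start_year, end_year) tuples.
    """
    d_temp = window_mean(years, temperature, *late) - window_mean(years, temperature, *early)
    d_forc = window_mean(years, forcing, *late) - window_mean(years, forcing, *early)
    return f2x * d_temp / d_forc
```

Because only the two window means enter the estimate, any bias that differs between the early and late windows passes directly into the result.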
Lewis and Curry (2018) choose windows at the start and end of the historical temperature record as the basis for their TCR calculation. The early time window (1869–82) was nominally chosen to avoid major volcanic eruptions, in particular the Krakatoa eruption of 1883. However, coverage of the “water hemisphere” (Boggs 1945) is almost nonexistent in the 1860s [Kennedy (2014) and the Hadley Centre SST dataset, version 3 (HadSST3), gridded data]. Infilled records (Hansen and Lebedeff 1987; Rohde et al. 2013; Cowtan and Way 2014) can mitigate coverage issues for recent decades but can only meaningfully address data “holes” of up to ~1000 km in radius (Hansen and Lebedeff 1987; Cowtan et al. 2018) and cannot reconstruct a missing hemisphere of data. Nineteenth-century temperatures are contingent on large “bucket corrections” to SST observations, the evolution of which is poorly constrained by metadata, and they show substantial differences between observational products (Folland and Parker 1995; Kent et al. 2017; Cowtan et al. 2017). An alternative early window (1930–50) used by Lewis and Curry (2018) spans the World War II period and is also the subject of large discrepancies among SST products (Kent et al. 2017).
We examine the impact of the choice of dates for the early and late windows and evaluate the impact of using short data windows rather than all of the data. The potential impact of volcanic events is addressed by application of the window method not to the observed data but to the difference between the observations and the mean of climate model simulations from phase 5 of the Coupled Model Intercomparison Project (CMIP5) using data from the historical and representative concentration pathway 4.5 (RCP4.5) scenarios. Masking the model outputs to match the observational coverage also allows us to control for the impact of changing coverage. Lehner et al. (2016) suggest that climate model simulations overestimate the volcanic response, although this may be a result of internal variability and other factors masking the volcanic response (Stevenson et al. 2017; Liu et al. 2018). Linear regression was therefore used to remove the residual volcanic contribution to the difference temperature series by using the stratospheric aerosol optical depth (Sato et al. 1993) convolved with an exponential response function with an e-folding time of 1 yr (determined by fit to the data). No correction was made for internal variability; however, if an El Niño term is included in the regression the remaining short-term features in the variability of warming with window choice are slightly reduced.
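The volcanic correction can be illustrated with a short sketch. It assumes monthly series, a causal exponential kernel normalized to unit sum, and a regression with a constant term; these choices, and the function names, are assumptions for illustration and are not taken from the code accompanying this paper.

```python
import numpy as np


def exponential_response(aod, efold_months=12.0):
    """Convolve a monthly AOD series with a causal exponential kernel."""
    aod = np.asarray(aod, dtype=float)
    n = aod.size
    kernel = np.exp(-np.arange(n) / efold_months)
    kernel /= kernel.sum()  # normalization is arbitrary; the regression absorbs it
    # Truncating the full convolution to n samples keeps the filter causal.
    return np.convolve(aod, kernel)[:n]


def remove_volcanic_residual(diff_series, aod, efold_months=12.0):
    """Regress the observation-minus-model series on the lagged AOD response
    and subtract the fitted volcanic term. Returns (corrected, coefficient)."""
    diff_series = np.asarray(diff_series, dtype=float)
    response = exponential_response(aod, efold_months)
    design = np.column_stack([np.ones_like(response), response])
    coef, *_ = np.linalg.lstsq(design, diff_series, rcond=None)
    corrected = diff_series - coef[1] * response
    return corrected, coef[1]
```

An El Niño index could be added as a further column of `design` to reproduce the optional term mentioned above; the e-folding time would in practice be chosen by scanning values and keeping the best fit.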
For this analysis we will focus on the University of East Anglia Climatic Research Unit–Hadley Centre global land-plus-ocean temperature dataset, version 4 (HadCRUT4) as the temperature product (Morice et al. 2012); however, similar issues arise with the other temperature products, and in the case of the Extended Reconstructed Sea Surface Temperature (ERSST)-based products the problems in the early record are more serious (Cowtan et al. 2017). We also used temperature data from 36 CMIP5 models with, in total, 107 historical realizations, extended using RCP4.5 simulations for the period 2006–16 and regridded onto a common 1° × 1° grid. We calculated a multimodel mean gridded temperature series over the 107 simulations using monthly surface air temperature estimates, that is, the “tas” field (Taylor et al. 2012) in CMIP5 terminology (similar results are obtained if all of the simulations for each model are averaged and then an average is calculated across the models). We then converted the temperatures to temperature anomalies using a 1961–90 baseline. We averaged blocks of 5 × 5 grid cells to match the 5° HadCRUT4 grid and calculated a gridded difference map series between the HadCRUT4 gridded observations and the multimodel mean. Last, we determined the mean temperature difference for the common coverage region by using the cosine-weighted mean of the observed grid cells.
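The last two steps (block averaging and the coverage-masked, cosine-weighted mean) can be sketched for a single month as below. The sketch assumes that missing observations are stored as NaN and that an unweighted mean within each 5° block is an adequate approximation; the function names are hypothetical.

```python
import numpy as np


def block_average(field, block=5):
    """Average a (nlat, nlon) field over non-overlapping block x block cells."""
    field = np.asarray(field, dtype=float)
    nlat, nlon = field.shape
    if nlat % block or nlon % block:
        raise ValueError("grid dimensions must be multiples of the block size")
    blocks = field.reshape(nlat // block, block, nlon // block, block)
    return blocks.mean(axis=(1, 3))


def masked_cosine_mean(obs, model, lat_centres):
    """Cosine-of-latitude weighted mean of (obs - model) over cells
    where the observations are present (non-NaN)."""
    diff = np.asarray(obs, dtype=float) - np.asarray(model, dtype=float)
    weights = np.cos(np.deg2rad(lat_centres))[:, None] * np.ones_like(diff)
    valid = np.isfinite(diff)
    return np.sum(diff[valid] * weights[valid]) / np.sum(weights[valid])
```

Applying `masked_cosine_mean` month by month gives the difference temperature series; because the model field is sampled only where observations exist, changes in coverage affect both terms of the difference equally.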
A comparison of early window dates for the HadCRUT4 temperature data (Morice et al. 2012) is shown in Fig. 1a, fixing the late window to 1995–2016, which is the longer option suggested by Lewis and Curry (2018) and is less affected by an uncorrected bias in ship observations (Hausfather et al. 2017). Coordinates represent the start and end dates of the early window, with red regions indicating that observations warmed more than model results and blue regions indicating that modeled results warmed more than observations for a given choice of early window. Different window choices can lead to the conclusion that the model results show significantly faster warming than the observations do or that the observations warm slightly faster than the model results, and this discrepancy is much larger than changes arising from presence or absence of a historical volcanic eruption in the window.
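A scan over early windows of the kind summarized in Fig. 1a might be sketched as follows, given an annual observation-minus-model difference series. The range of start years, the minimum window length, and the latest permitted end year are illustrative assumptions, not the settings used to produce the figure.

```python
import numpy as np


def relative_warming(years, diff, early, late):
    """Late-window mean minus early-window mean of the obs-minus-model series.
    Positive values mean the observations warmed more than the models."""
    years = np.asarray(years)
    diff = np.asarray(diff, dtype=float)

    def mean_in(window):
        mask = (years >= window[0]) & (years <= window[1])
        return diff[mask].mean()

    return mean_in(late) - mean_in(early)


def scan_early_windows(years, diff, late=(1995, 2016),
                       starts=range(1850, 1941), min_length=10, max_end=1950):
    """Map each (start, end) early window to its relative warming."""
    results = {}
    for start in starts:
        for end in range(start + min_length - 1, max_end + 1):
            results[(start, end)] = relative_warming(years, diff, (start, end), late)
    return results
```

Plotting the returned values against start and end year gives a triangular map in which sign changes show where the conclusion depends on the window choice.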
The experiment was repeated using land data only in Fig. 1b. In this case the University of East Anglia Climatic Research Unit–Hadley Centre land temperature dataset, version 4 (CRUTEM4), observations (modified by the Hadley Centre to account for urbanization and exposure biases) warm faster than the models for a window of any reasonable length. The HadSST3 observations are compared with the model marine air temperatures in Fig. 1c: these two tests show that variability in the results for different window dates arises primarily from the ocean data.
SST observations generally come from the top 10 m of the ocean and should strictly be compared with temperatures at a corresponding depth in the models. Cowtan et al. (2015) used the CMIP5 ocean surface temperature field (“tos” in CMIP5 nomenclature) for this purpose. Lewis and Curry argue that this field is not the top layer of the bulk ocean surface temperature. Richardson et al. (2016) examined 28 model configurations: in 22 of these configurations the tos field is identical (18 cases) or almost identical (4 cases) to the top layer of the bulk ocean temperature (“thetao” in CMIP5 nomenclature). We extend this analysis to 33 model configurations: for 20 of 33 model configurations the sea surface temperature field is essentially identical to the top layer of the bulk ocean temperature field. For 12 further model configurations (ACCESS1.0, ACCESS1.3, BCC_CSM1.1, CSIRO-Mk3.6-0, EC-EARTH, GISS-E2-H, GISS-E2-H-CC, GISS-E2-R, GISS-E2-R-CC, MPI-ESM-LR, MPI-ESM-MR, MRI-CGCM3, MRI-ESM1, and NorESM1-M; expansions of acronyms are available online at http://www.ametsoc.org/PubsAcronymList) the differences between tos and the upper thetao are noiselike and do not impact the trend. The remaining model (GFDL-ESM2G) shows large differences between tos and upper thetao that are suggestive of a data deposition or processing error.
The effect of window choice for observed and modeled SSTs (as opposed to modeled air temperatures) is shown in Fig. 1d. Use of model SSTs increases the warming of the observations relative to the models by approximately 0.1°C for any choice of window.
The period from 1850 to 1930 represents a change from the use of wooden buckets to poorly insulated canvas buckets in the measurement of SSTs, the latter requiring a large bias correction. The early features of Figs. 1c and 1d could be explained if this change occurred primarily between 1890 and 1910, as suggested by comparison of SSTs with coastal weather station observations (Jones et al. 1991; Folland and Parker 1995; Cowtan et al. 2017). After World War II, HadSST3 may be affected by incorrect inference of some observation types and other biases (Carella et al. 2018; Davis et al. 2018). Internal multidecadal variability may also contribute to the features of Fig. 1d, although the Pacific contribution is likely to be small in the nineteenth century because of poor coverage, and the coastal temperature difference is not localized to either the Pacific or Atlantic Oceans.
A similar experiment was conducted for the late window while holding the early window fixed at 1869–82 (Fig. 2). When using land data alone, all window choices of reasonable length lead to faster warming of the observations than the models. The sea surface temperature data show slower warming in the observations except for windows ending before 1975, because of the unusual warmth of HadSST3 relative to both models and ERSST between 1950 and 1980 (Kent et al. 2017; Cowtan et al. 2017; Carella et al. 2018; Davis et al. 2018). Windows starting after 2005 show a greater difference between observations and models: a residual bias in the sea surface temperatures for recent years (Hausfather et al. 2017) and the overestimation of forcings (Huber and Knutti 2014; Tatebe et al. 2019; Volodin and Gritsun 2018) are expected to contribute to a difference between modeled and observed warming for windows running to the present.
Multidecadal biases are present in all current SST products, including the ERSST temperature data (Huang et al. 2017) that are used in the other main temperature products not used by Lewis and Curry. ERSST shows little or no evidence of a lower early bias due to the use of wooden buckets (Kent et al. 2017), in contradiction of the observational metadata, suggesting the need for caution with respect to nineteenth-century temperatures in this product. ERSST is cooler than HadSST3 for the period 1930–50, except during World War II when it is too warm as a result of an uncorrected bias in the marine air temperatures and temporal smoothing in the ERSST algorithm suppressing the World War II bias correction (Cowtan et al. 2017).
The results of the window method are influenced by decisions concerning the criteria for window selection. We analyze the effect of window selection by evaluating the regression coefficient that fits the multimodel mean temperature change for the RCP4.5 simulations to the observational data, comparing model land air temperatures with land-based observations, model marine air temperatures with SST observations, and model SSTs with SST observations. Regression coefficients fitting the corrected model data to the observations were determined for different data selections and are given in Table 1, with values of greater than 1 indicating observations warming faster than the models, and vice versa.
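A regression coefficient of this kind can be sketched as an ordinary least-squares fit of the observed series against the multimodel mean. The inclusion of an intercept and the handling of missing values are assumptions made for this example; the exact regression setup behind Table 1 is not specified here.

```python
import numpy as np


def scaling_coefficient(model, obs):
    """Least-squares slope of obs against model (with an intercept).
    Values above 1 indicate observations warming faster than the model mean."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = np.isfinite(model) & np.isfinite(obs)
    design = np.column_stack([model[valid], np.ones(valid.sum())])
    coef, *_ = np.linalg.lstsq(design, obs[valid], rcond=None)
    return coef[0]
```

Restricting `model` and `obs` to particular years before the call reproduces the different data selections; using the full record gives the intervening decades weight in the fit, which is the contrast with the two-window approach.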
Land temperature observations warm faster than the models for any of the chosen data selections, with some variation resulting from window choice (i.e., the values in the CRUTEM/tas column of Table 1 are always greater than unity). SST observations warm more slowly than modeled marine air temperatures for long windows running to the present (i.e., the values in the HadSST/tas column are less than unity). SST observations warm slightly faster than modeled SSTs for long windows (i.e., the values in the HadSST/tos column are greater than unity). Regression coefficients using model SSTs are typically ~15% higher than those using marine air temperatures (based on the ratios of the HadSST/tos to the HadSST/tas columns when using long windows). Observed SSTs warm more quickly than modeled SSTs prior to the twenty-first century, but the difference is reduced on inclusion of the last 20 years of data, consistent with the underestimation of recent SST observations and the overestimation of forcings. The inclusion of the intervening decades of data mitigates most of the variability resulting from choice of the early window, but has limited benefit with respect to the late window because the rapid temperature change at the end of the record gives the final decades greater leverage in determining the regression coefficient.
In summary, the use of short time windows and the difference between air and sea surface warming, as indicated by temperature, can influence conclusions concerning whether observations are warming faster than indicated by models, with the differences primarily arising in the sea surface temperatures. Since warming in model results is strongly correlated with forcing, this also impacts TCR estimates determined using window methods. In comparisons between observations and climate model simulations, use of longer spans of data can reduce the impact of early window choice, but varying the end point of the data still affects the results (with the implication that conclusions from historical data can change in future). It is vital that use of historical temperature data for the estimation of climate sensitivity or internal variability be informed by the literature on the limitations and biases in those products, which generally incorporates more recent results than the datasets themselves. On the basis of current data it is not possible to conclude that models show faster warming than observations do, and as a result discrepancies between model-based TCR estimates and those deduced from the observations must arise primarily from inconsistencies in TCR evaluation method, incompatibility of modeled and observed temperature estimates, and/or differences between the modeled and historical forcings.
The data and computer code used in this paper, along with additional figures, are available online (https://doi.org/10.15124/92466e73-6012-4ab4-ad10-cd7fdc075cb3).
This article has a companion article which can be found at http://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-17-0667.1.