Current state-of-the-art global climate models produce different values for Earth’s mean temperature. When comparing simulations with each other and with observations, it is standard practice to compare temperature anomalies with respect to a reference period. It is not always appreciated that the choice of reference period can affect conclusions, both about the skill of simulations of past climate and about the magnitude of expected future changes in climate. For example, observed global temperatures over the past decade are toward the lower end of the range of the phase 5 of the Coupled Model Intercomparison Project (CMIP5) simulations irrespective of what reference period is used, but exactly where they lie in the model distribution varies with the choice of reference period. Additionally, we demonstrate that projections of when particular temperature levels are reached, for example, 2 K above “preindustrial,” change by up to a decade depending on the choice of reference period. In this article, we discuss some of the key issues that arise when using anomalies relative to a reference period to generate climate projections. We highlight that there is no perfect choice of reference period. When evaluating models against observations, a long reference period should generally be used, but how long depends on the quality of the observations available. The Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5) choice to use a 1986–2005 reference period for future global temperature projections was reasonable, but a case-by-case approach is needed for different purposes and when assessing projections of different climate variables. Finally, we recommend that any studies that involve the use of a reference period should explicitly examine the robustness of the conclusions to alternative choices.
Careful consideration of sensitivities to the choice of climate reference period is required to reliably compare climate models with observations and to produce robust projections of future climate.
Current state-of-the-art global climate models produce different values for Earth’s mean temperature. For this reason, projections of changes in Earth’s temperature over time are usually presented relative to a reference period, following Hansen et al. (1988). It is not widely appreciated, especially outside the climate science community, that the choice of reference period has important consequences for conclusions about two basic questions: Are climate model simulations of the past consistent with observations? What do climate models predict for the future? The importance of these questions has been highlighted by the recent debate about the difference between observed and projected multimodel-mean warming of global surface temperatures (Stott et al. 2013; Huber and Knutti 2014; Schmidt et al. 2014).
In this article, we demonstrate how the choice of reference period affects the conclusions drawn in relation to both these questions and discuss consequences for connecting climate model projections of global temperature change to the real world. We also discuss further the reasons why anomalies have long been used for comparing observational data and model output, noting important limitations. Last, we discuss the implications for near-term and long-term projections of global-mean surface temperature provided in the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5) and provide recommendations for good practice.
WHY IS A REFERENCE PERIOD NEEDED FOR OBSERVATIONS?
In 1935, the World Meteorological Organization (WMO) first discussed defining a recommended “normal” period to set a standard and allow comparisons between different observational data. The length of a normal period was chosen to be 30 years, and the period 1901–30 was selected initially (Trewin 2007). The WMO has recently adopted a two-tier approach, maintaining 1961–90 as a fixed standard reference period, along with a regularly updated period, which is currently 1981–2010.
Observation-based temperature datasets also use a reference period because mean temperatures can vary over very short spatial scales (∼1 km), whereas the correlation scale for temperature anomalies is usually much larger (∼1,000 km) (Hansen and Lebedeff 1987). Fewer stations are therefore required to estimate changes in global temperatures (Jones et al. 1997). For example, Callendar (1938) first demonstrated the Earth was warming using just 147 stations, and his calculations match modern estimates well (Hawkins and Jones 2013).
Several factors enter the choice of an appropriate observational reference period. Ideally, the period should be representative of the most recent conditions but long enough not to be overly influenced by random fluctuations, be a period the public can relate to, not need updating too often, maximize the number of observations available, and be simple to calculate. In addition, the mean over a reference period is insufficient to represent the climate, as higher-order statistics are also required (e.g., Landsberg 1944). Huang et al. (1996) demonstrated that a normal period updated every year was optimal for making predictions for the following year.
Reference periods are also used to aid communication of the unusual (or not) nature of an observation, such as an increase in global temperatures, or of a particular event such as an extreme flood or heatwave. For example, the warm global temperatures of 2014 were not particularly unusual compared to other years since 2000 but were very unusual compared to temperatures before 1900.
Observation quality is also important for the choice of reference period. The uncertainty on the observed estimate of global and regional temperatures is larger in the past, especially pre-1900. In the case of the Hadley Centre/Climatic Research Unit, version 4.3 (HadCRUT4.3), dataset (Morice et al. 2012), the reference period is 1961–90 because of the high availability of observations during this period. However, the surface temperature observations available for this period still do not cover the whole planet. Jones et al. (1999) estimated the observed 1961–90 global-mean temperature as 14.0° ± 0.5°C, and Fig. 1 illustrates that different atmospheric reanalyses have global-mean temperatures within that range.
WHY IS A REFERENCE PERIOD NEEDED FOR MODEL SIMULATIONS?
Simulating the absolute value of many climate variables, such as global-mean surface temperature, is challenging because they represent the balance between many different physical processes. It is not currently possible to tune global climate models (GCMs) to produce accurate values for all climate variables. However, some variables have a higher priority than others. For example, it is essential to produce a model with a near-zero net top-of-atmosphere (TOA) energy balance. Without such a balance the model climate drifts and does not provide a stable baseline against which to measure the response to changing radiative forcings. As discussed below, the simulated value for global-mean temperature matters, but it is less essential—for projections of global-mean temperature—to simulate the observed value precisely. Thus, global-mean temperature is generally given less weight than TOA energy balance when climate models are tuned (although there are some exceptions; e.g., Mauritsen et al. 2012). As a result, the range in simulated global-mean temperatures is far larger than the observational uncertainty.
It may be a surprise to some readers that an accurate simulation of global-mean temperature is not necessarily an essential prerequisite for accurate global temperature projections. Some supporting evidence comes from climate models and theoretical considerations. The mean global temperatures among simulations of the historical period with the latest phase 5 of the Coupled Model Intercomparison Project (CMIP5) GCMs (see appendix A) differ by up to 3 K (with a standard deviation of 0.7 K), but the changes over time are similar (Fig. 1). Importantly, there is no robust correlation between projected future warming and historical simulated mean global temperature in the CMIP5 simulations (Fig. 2), although there are no simulations with a high mean global temperature and large future warming in this particular set of GCMs. In addition, Mauritsen et al. (2012) created four different parallel versions of the MPI-ESM-LR, tuned differently, and found only modest variations in climate sensitivity across the ensemble. This evidence suggests that these differences in mean global temperature may not be crucial for projecting future global temperature changes, given current uncertainties in climate feedbacks [also see Fig. 9.42 of IPCC AR5 (Flato et al. 2013) and the blog discussion of Schmidt (2014)].
Theoretical insight into these climate model results is provided in appendix B, which uses a simple 1D energy balance model to show that differences in mean global temperature are relatively unimportant for projections of global-mean temperature, if the feedbacks are linear. However, there is much discussion in the literature about the extent to which feedbacks may be nonlinear and what the implications would be (e.g., Good et al. 2012; Gregory et al. 2015; Bloch-Johnson et al. 2015).
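The linear-feedback argument can be sketched with a zero-dimensional energy balance. This is a simplified illustration and not the 1D model of appendix B; the symbols C (heat capacity), Q (absorbed solar radiation), A and B (coefficients of a linearized outgoing longwave flux), and F (radiative forcing) are assumptions introduced here.

```latex
% Zero-dimensional energy balance with a linear feedback (illustrative assumption)
C \frac{dT}{dt} = Q - (A + B\,T) + F(t)

% Unforced equilibrium (F = 0): the mean state depends on Q and A
T_0 = \frac{Q - A}{B}

% Substituting T = T_0 + \Delta T removes Q and A from the anomaly equation
C \frac{d\,\Delta T}{dt} = F(t) - B\,\Delta T
```

Under these assumptions, two models that differ only in A or Q have different mean temperatures T_0 but identical anomalies ΔT(t) for the same forcing. If B itself depended on T (a nonlinear feedback), the cancellation would no longer be exact, which is the caveat raised in the literature cited above.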
The evidence discussed above summarizes the arguments that are typically presented in support of the common practice of using a reference period when comparing climate models with observations and when generating projections of global-mean temperature. However, it does not tell us what specific reference period we should choose and to what extent the choice matters. We turn to these issues next.
WHY DOES THE CHOICE OF REFERENCE PERIOD MATTER?
It is standard practice when comparing simulations of climate change with observed changes, and with each other, to use a common reference period and define “anomalies,” for example,
$$\Delta T(t) = T(t) - \overline{T}^{\mathrm{ref}}, \tag{1}$$
where T(t) is a time series of a particular variable, $\overline{T}^{\mathrm{ref}}$ is the time average of T(t) over a reference period, and ΔT(t) is the anomaly. This procedure is usually performed on the observations and any model simulations for the same reference period.
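As a minimal sketch of the anomaly calculation described above, the following Python example subtracts a reference-period mean from a time series. The function name `anomaly` and the two synthetic "model" series are hypothetical and chosen only for illustration; they are not CMIP5 output.

```python
import numpy as np

def anomaly(years, temps, ref_start, ref_end):
    """Return temps minus their mean over the inclusive reference period."""
    years = np.asarray(years)
    temps = np.asarray(temps, dtype=float)
    in_ref = (years >= ref_start) & (years <= ref_end)
    if not in_ref.any():
        raise ValueError("reference period does not overlap the time series")
    return temps - temps[in_ref].mean()

# Hypothetical series: different absolute temperatures and slightly different trends.
years = np.arange(1861, 2006)
model_a = 13.5 + 0.005 * (years - 1861)
model_b = 14.5 + 0.007 * (years - 1861)

# Early reference period: the series agree near the start and diverge later.
early_a = anomaly(years, model_a, 1861, 1890)
early_b = anomaly(years, model_b, 1861, 1890)

# Late reference period: the series agree near the end and diverge earlier.
late_a = anomaly(years, model_a, 1976, 2005)
late_b = anomaly(years, model_b, 1976, 2005)
```

With the early reference period the faster-warming series appears warmer at the end of the record, whereas with the late reference period it appears cooler at the start, even though the underlying data are unchanged.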
Figure 1 illustrates that the value of ΔT(t) changes when using different reference periods. In addition, the relative comparison of the different atmospheric reanalyses with each other, and with the simulations, also changes. For instance, in the example shown in Fig. 1, the simulations appear mostly warmer than the reanalyses with one choice of reference period but appear mostly cooler than the reanalyses with an alternative choice. Further, the NCEP CFSR reanalysis appears to be a slight outlier, but whether these differences are most apparent at the start or end of the simulation depends on the reference period. There is clearly sensitivity to the choice of reference period in any similar comparison (also see sidebar on “Illustrating the effect of reference period choice”).
We illustrate the sensitivity to the choice of reference period with an analogy and schematic. Different time series can be thought of as stiff (and “wiggly”) wires that are required to pass through a fixed length of tube. Different length reference periods correspond to tubes of different lengths, with longer tubes required to have wider diameters. There is little constraint on how the wires spread outside the tube, and for longer tubes, there is less constraint on how they vary within the tube, thanks to a larger diameter. In the extreme, a tube that is one time point long would have zero diameter because all of the wires can be forced to pass through the same point. The constraint on where the wires are positioned vertically, relative to each other and relative to the tube, varies as the tube is slid horizontally along the loose bundle of wires.
Interpreting the wires as time series of annual-mean global-mean temperature illustrates the effect of choosing a reference period (Fig. SB1). Which is the warmest time series at later times depends on the choice of reference period (or tube position). The black dashed lines show the range of possible futures for a larger set of time series demonstrating that the uncertainty shrinks for later reference periods (as discussed later in Fig. 7).
An animated version of Fig. SB1 is shown in Fig. ES1 (more information can be found online in the supplemental information available at http://dx.doi.org/10.1175/BAMS-D-14-00154.2), which also highlights the sensitivity to length of reference period.
Evaluating historical simulations.
One important test of the climate models used by the IPCC is their ability to simulate the climate of the instrumental period since around 1850. This evaluation depends strongly on the choice of reference period used.
Figure 3 shows the CMIP5 simulations of global-mean temperature from 1861 to 2005, with different percentile ranges denoted by the blue bands. The HadCRUT4.3 observations (Morice et al. 2012) and associated uncertainties are shown in black and gray, respectively. However, HadCRUT4.3 is not spatially complete. Cowtan and Way (2014, hereafter CW14) recently used spatial interpolation to fill the gaps in HadCRUT4.3, and this CW14 dataset is shown in red (also see appendix C). The four panels perform the comparison with different reference periods, first using the whole period (1861–2005) and then using three different 30-yr periods.
The following observations may be made: First, the percentile ranges clearly change with the choice of reference period—they tend to be narrower during the chosen reference period than at other times. Second, the observations fall outside the 5%–95% ranges at different times when using the different reference periods. Hence, any conclusions about the consistency of models and observations that may be inferred from analyses of this type are sensitive to the choice of reference period.
For example, there has been much attention on global temperatures over the past 15 years, which have risen more slowly than projected by the mean of the CMIP5 simulations (Fyfe et al. 2013). Figure 4 highlights that exactly where the most recent decade of observations falls within the CMIP5 simulated range depends on the choice of reference period, although the observations are always toward the lower end of the range. [However, appendix C highlights that part of the difference between the multimodel mean and observations arises because the comparison is not quite like with like, owing to the incomplete coverage of the observations and the type of observation used (Hawkins 2013; Cowtan et al. 2015).]
The importance of this comparison is highlighted by an article in the media that stated that there is “irrefutable evidence that official predictions of global climate warming have been catastrophically flawed” (Rose 2013), based on a version of Fig. 4 for one particular choice of reference period that was published on a blog (Hawkins 2013). Other subsequent media articles more correctly discussed the implications of the most recent period using the same figure (e.g., Economist 2013).
There are different frameworks to interpret multimodel ensembles of climate simulations (e.g., Annan and Hargreaves 2010; Sanderson and Knutti 2012). Here, we consider a simple way of evaluating the reliability of the CMIP5 ensemble by examining whether the observations fall within each percentile the appropriate number of years. For example, in a reliable ensemble, the observations should be above the 95th (or below the 5th) percentile about 1 year in 20 (or 5% of the time). Table 1 shows the number of years that fall outside various percentile ranges for a range of reference periods, including the whole period (1861–2005) and subperiods. When using the whole period, the CMIP5 ensemble is close to reliable, and perhaps slightly too wide in the tails, as indicated by the smaller number of years outside the 5th and 95th percentiles than expected. However, when evaluated using other reference periods, the ensemble appears far from reliable. This behavior is likely due to the phasing of internal variability in the particular realization of climate that we have observed and will also be influenced by errors in the specified historical forcings and in the simulated response to those forcings. One might conclude that the reference period should be as long as possible to reduce the influence of variability, but a counter argument is that both the forcing uncertainties (e.g., Carslaw et al. 2013; Stevens 2013) and observation uncertainties (Morice et al. 2012) may be larger further back in time (also see appendix C). An alternative approach to assess reliability is to use trends in temperature, which are independent of the reference period (e.g., van Oldenborgh et al. 2013; Marotzke and Forster 2015). However, the analysis of trends is also influenced by the forcing uncertainties.
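The simple reliability check described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the procedure used to produce Table 1: the function name `count_outside` is hypothetical, both inputs are assumed to be anomalies for the same reference period, and the data are random numbers standing in for observations and simulations.

```python
import numpy as np

def count_outside(obs, ensemble, lower=5, upper=95):
    """Count years in which obs lies below the lower or above the upper
    ensemble percentile.

    obs has shape (n_years,); ensemble has shape (n_members, n_years).
    """
    obs = np.asarray(obs, dtype=float)
    ensemble = np.asarray(ensemble, dtype=float)
    lo, hi = np.percentile(ensemble, [lower, upper], axis=0)
    return int((obs < lo).sum()), int((obs > hi).sum())

# Synthetic stand-in data: 40 "members" and 145 "years" (1861-2005).
rng = np.random.default_rng(0)
ensemble = rng.normal(size=(40, 145))
obs = rng.normal(size=145)

n_below, n_above = count_outside(obs, ensemble)
expected_each_tail = 0.05 * obs.size  # about 7 years per tail if reliable
```

Repeating the count after recomputing the anomalies for each candidate reference period gives the kind of comparison summarized in Table 1.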
Projections of global-mean temperature.
Future projections derived from climate models are similarly sensitive to the choice of reference period. The IPCC AR5 used a 1986–2005 reference period for generating climate projections, but previous assessment reports used earlier periods. To understand the impact of the choice of reference period on projections it is helpful to express results relative to a common baseline.
In the United Nations Framework Convention on Climate Change (UNFCCC) process, the change in global-mean temperature since preindustrial times has become an important metric for discussions of mitigation policy. A difficulty with this metric is that preindustrial climate is not well defined because of a lack of observations before 1850 and a nonstationary climate due to natural external forcings such as solar variability and volcanic eruptions. However, a pragmatic approach is to use an early period in the instrumental record, such as 1850–1900, to define a baseline. Such a baseline should not strictly be described as preindustrial but does provide a useful reference point and was used in IPCC AR5 (Kirtman et al. 2013).
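Projections relative to such a “preindustrial” baseline ΔWfuture can be constructed in two ways. First, the raw model output can be referenced to the preindustrial period:
$$\Delta W_{\mathrm{future}}(t) = T_{\mathrm{mod}}(t) - \overline{T}_{\mathrm{mod}}^{\mathrm{base}}, \tag{2}$$
where the overbar denotes a time average over the baseline (base) or reference (ref) period.
This is perhaps the simplest method but may not be optimal because both the observations and radiative forcings in the past are uncertain, as discussed above. Instead, Joshi et al. (2011) constructed projections by combining the observed warming from a preindustrial period to a recent reference period and used the model projections to project future warming relative to the same recent reference period. This reduces the impact of the uncertainty in past radiative forcings and ties the projections to more recent observations. This approach was also used by Vautard et al. (2014) when considering changes in European temperatures and was adopted by the IPCC AR5 (Kirtman et al. 2013).
Using this approach, the observed warming up to the chosen reference period is
$$\Delta W_{\mathrm{obs}} = \overline{T}_{\mathrm{obs}}^{\mathrm{ref}} - \overline{T}_{\mathrm{obs}}^{\mathrm{base}}, \tag{3}$$
and the simulated temperature anomaly above the preindustrial baseline is then
$$\Delta W_{\mathrm{future}}(t) = T_{\mathrm{mod}}(t) - \overline{T}_{\mathrm{mod}}^{\mathrm{ref}} + \Delta W_{\mathrm{obs}}. \tag{4}$$
As both Tmod and $\overline{T}_{\mathrm{obs}}^{\mathrm{base}}$ are reference period independent, an important quantity is
$$\Delta W_{\mathrm{diff}} = \overline{T}_{\mathrm{obs}}^{\mathrm{ref}} - \overline{T}_{\mathrm{mod}}^{\mathrm{ref}}. \tag{5}$$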
If this quantity were constant for any choice of reference period, then the reference period would not matter. However, it is not constant because that would require a perfect correlation between Tobs and Tmod.
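The two ways of constructing projections relative to a baseline can be compared in a short sketch: referencing the model directly to the baseline, or adding the observed baseline-to-reference warming to the model change relative to the reference period. All function names and the linear synthetic series below are hypothetical, and the 1850–1900 baseline and 1986–2005 reference period are used only as example choices.

```python
import numpy as np

def period_mean(years, series, start, end):
    """Mean of series over the inclusive period [start, end]."""
    mask = (years >= start) & (years <= end)
    return series[mask].mean()

def warming_direct(years, t_mod, base):
    """Model output referenced directly to the baseline period."""
    return t_mod - period_mean(years, t_mod, *base)

def observed_warming(years, t_obs, base, ref):
    """Observed warming from the baseline period to the reference period."""
    return period_mean(years, t_obs, *ref) - period_mean(years, t_obs, *base)

def warming_via_reference(years, t_mod, dw_obs, ref):
    """Model change relative to the reference period plus observed warming."""
    return t_mod - period_mean(years, t_mod, *ref) + dw_obs

base, ref = (1850, 1900), (1986, 2005)

# Synthetic series: the "model" is warmer in absolute terms and warms faster.
obs_years = np.arange(1850, 2015)
t_obs = 13.8 + 0.004 * (obs_years - 1850)
mod_years = np.arange(1850, 2101)
t_mod = 14.6 + 0.006 * (mod_years - 1850)

dw_obs = observed_warming(obs_years, t_obs, base, ref)
direct = warming_direct(mod_years, t_mod, base)
via_ref = warming_via_reference(mod_years, t_mod, dw_obs, ref)

# The two constructions differ by a constant offset: observed minus simulated
# warming between the baseline and reference periods.
offset = via_ref - direct
```

Because the offset equals the observed minus the simulated warming between the two periods, it changes whenever a different reference period is chosen, unless the model tracks the observations perfectly.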
Figure 5 shows the effect of choice of reference period for four different GCMs. The projected global temperatures have a strong dependence on the reference period for some GCMs: the impact on projected temperature changes relative to the baseline can be as much as 0.5 K. Other models show much less sensitivity. The black line, which is often the warmest, uses an early reference period and is close to the simple approach of Eq. (2). In HadGEM2-ES, it is the chosen IPCC AR5 reference period that is warmest. Note that a strong dependence on reference period is likely due to an incorrect simulation of the forcings or feedbacks, but a weak dependence could simply be due to cancelling errors. Large-amplitude internal decadal variability may also be important.
Figure 6 shows how ΔWdiff changes during the historical simulations for various CMIP5 models using rolling 30-yr reference periods. It is clear that different models behave in very different ways, warming more or less than the observations at different times. Note, for example, that ΔWdiff generally increases in the early to mid-twentieth century, when the observations warm faster than the simulations and vice versa during the recent “slowdown.”
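A rolling-reference-period calculation of this kind might be sketched as below. The sign convention (observed minus simulated reference-period mean) is an assumption, chosen so that the quantity increases when the observations warm faster than the simulation, as described in the text; the function names and the synthetic series are hypothetical.

```python
import numpy as np

def rolling_mean(series, window):
    """Running mean over `window` consecutive values (no padding)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

def rolling_dw_diff(t_obs, t_mod, window=30):
    """Observed minus simulated mean for each rolling reference period."""
    return rolling_mean(t_obs, window) - rolling_mean(t_mod, window)

# Synthetic example in which the "observations" warm faster than the "model".
years = np.arange(1861, 2006)
t_obs = 13.8 + 0.010 * (years - 1861)
t_mod = 14.4 + 0.005 * (years - 1861)

dw_diff = rolling_dw_diff(t_obs, t_mod, window=30)
# Entry k corresponds to the reference period years[k] .. years[k + 29].
```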
This metric may help understand how the CMIP5 models are responding differently to different types of radiative forcing. For example, consider two models with the same overall warming, but with different amplitudes of response to aerosols or volcanic eruptions. The evolution of temperature change over the twentieth century will look different and ΔWdiff will change substantially with time. This metric merits further investigation.
Assessing multimodel projections.
From the previous results, it is clear that the choice of reference period will influence both the mean and the range of the projected global-mean temperature. Figure 7 shows projections of global-mean temperature using representative concentration pathway (RCP) 4.5 (Thomson et al. 2011) for four different reference periods. Note that using an early 1861–90 reference period produces a larger magnitude and wider range for the future than more recent reference periods. The reduction in ensemble spread when using a more recent reference period is because there is less time for the ensemble to diverge (see appendix D). The 1986–2005 period was used by IPCC AR5, and ΔWobs = 0.61 K in this case (Kirtman et al. 2013). Updating to a more recent 1995–2014 reference period reduces the projections by about 0.1 K. The choice of reference period affects the bounds on the projected range of global-mean temperature for specific time periods (e.g., 2016–35, 2046–65, and 2080–99, as indicated in the figure and as used in AR5) by up to 0.2 K. As a proportion of the total projected change (relative to the reference period or to the baseline), this sensitivity is considerably larger for the near term (2016–35) than for the long term (2080–99).
The most commonly used magnitude of global temperature change discussed in the context of climate change policy is 2°C above preindustrial. Using the projections and an early 1861–90 reference period, the projected median year of crossing this threshold is 2049. However, as discussed above, this choice is not likely to be optimal for making future projections. Using alternative choices, the projected median year of crossing this threshold changes from 2052 using the 1861–2005 reference period to 2063 for the most recent reference period—an apparent delay of a decade.
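One possible way to estimate a median crossing year from an ensemble is sketched below. It is an assumption about the procedure, not a description of how the numbers above were obtained: each member's first crossing year is found and the median taken across members, members that never cross are ignored, and no smoothing is applied (annual series would normally be smoothed first). The function names and synthetic members are hypothetical.

```python
import numpy as np

def first_crossing_year(years, warming, threshold=2.0):
    """First year in which warming reaches the threshold, or NaN if never."""
    above = np.asarray(warming) >= threshold
    return float(years[np.argmax(above)]) if above.any() else np.nan

def median_crossing_year(years, ensemble_warming, threshold=2.0):
    """Median over members of the first crossing year (NaN members ignored)."""
    crossings = [first_crossing_year(years, m, threshold) for m in ensemble_warming]
    return float(np.nanmedian(crossings))

# Synthetic members warming linearly at different rates (K per year).
years = np.arange(2000, 2101)
rates = [0.125, 0.0625, 0.03125]
members = [rate * (years - 2000) for rate in rates]

median_year = median_crossing_year(years, members)

# A reference period that implies 0.25 K more warming to date shifts every
# member upward and brings the median crossing year forward.
shifted_year = median_crossing_year(years, [m + 0.25 for m in members])
```

The shift between `median_year` and `shifted_year` illustrates how an offset of a few tenths of a kelvin, of the size associated with the choice of reference period, translates into a change of several years in the crossing date.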
Which is the most appropriate reference period to use to make such projections? There is no straightforward answer as it will depend on the role of natural variability in recent observed changes as well as the simulated response to both greenhouse gases and volcanic eruptions. It will also depend on the quality of the observations and radiative forcings as discussed above. An additional issue is sensitivity to the length of the reference period. This is explored in appendix D, which suggests that the 20-yr period length used by the IPCC AR5 is a reasonable choice, but the optimal choice depends on the climate variable and region of interest. In summary, the sensitivity to choice of the reference period in any similar analysis needs to be examined.
Projections of global-mean temperature presented in the IPCC AR5.
The IPCC AR5 presented assessed likely ranges for global-mean temperature in the near term (Kirtman et al. 2013) and long term (Collins et al. 2013), where “likely” refers to >66% probability of occurrence. For the long term, ranges were presented for each of the RCP scenarios. Importantly, the likely ranges were based on an assessment of all the evidence available at the time. This evidence included, but was not limited to, CMIP5 climate model projections expressed relative to the 1986–2005 reference period. For the near-term assessment, the sensitivity to the choice of reference period was discussed explicitly (Kirtman et al. 2013, section 11.3.6.3). Taken together with other lines of evidence, this resulted in the assessed likely range for global-mean temperature in 2016–35 being significantly cooler than was suggested by the “raw” CMIP5 projections expressed relative to 1986–2005. For the long-term assessment, the sensitivity to the reference period was not discussed explicitly, but as noted above, this sensitivity is a much smaller proportion of the change signal than is the case for the near term, except for RCP2.6. In addition, the sensitivities described in this article are unlikely to affect any of the IPCC AR5 assessment statements on the likelihood of crossing particular temperature levels by certain times because these assessments were based on conservative assumptions.
REGIONAL TEMPERATURES AND OTHER CLIMATE VARIABLES
The previous sections have focused entirely on projections of global-mean temperature, which is an important variable for summarizing future climatic changes. However, the impacts of climate change depend strongly on changes in regional temperatures, precipitation, and other climate variables. The use of a reference period for such projections raises more fundamental questions, since the (previously discussed) arguments advanced to justify this approach for global-mean temperature cannot be readily transferred to regional scales.
For example, the simulated mean temperature may be a critical issue in regions where phase transitions between water and ice are common, for example, in the presence of sea ice (e.g., Wang and Overland 2009; Mahlstein and Knutti 2012) or permafrost. Mean temperature, rainfall, and evapotranspiration will all likely be important in regions where soil moisture may become limited. Further work is needed to examine the sensitivity of regional climate projections to errors in the simulation of the mean state.
The implications for climate impact studies are profound. If, for example, daily output from GCM simulations is used as an input to a climate impact model or if a temperature threshold is used to calculate integrated measures of temperature exceedance, then an adequate simulation of the mean and variance (at least) of the variables used is necessary. Often this criterion is not met, and various bias correction techniques are adopted that introduce additional uncertainties (e.g., Christensen et al. 2008; Piani et al. 2010; Ho et al. 2012; Hawkins et al. 2013; Koehler et al. 2013). A case-by-case approach is required to assess the implications of mean state errors for climate impact studies.
Issues concerning the availability and quality of observational records are also challenging for regional projections. This is partly because optimal reference periods are typically much longer, as the variability is larger relative to forced changes (see appendix D).
SUMMARY AND RECOMMENDATIONS
Because climate models produce different values for Earth’s global-mean surface temperature, it is standard practice to define a reference period when comparing simulations and projections of temperature change with observations and with each other. While there are some justifications for this approach, it necessarily involves approximations that have limited validity. Further investigating the limitations of this approach is an important area for further research. In addition, this article has highlighted the following points:
There is no perfect choice of reference period, but relevant considerations include
the need for a sufficiently long time period to reduce the effects of multidecadal natural climate fluctuations,
the quality and global coverage of the available observations, and
the quality of information about past radiative forcings that drive climate change.
The first point argues for using as long a period as possible, whereas the second and third points argue in favor of using a recent period for which better quality and more complete observations are available.
Conclusions concerning (i) the consistency of simulations with observations (e.g., over the recent slowdown period, 1998–2013) and (ii) the magnitude of projected future changes in climate both exhibit sensitivity to the choice of reference period.
A strong recommendation is that any studies that seek to draw quantitative conclusions from analyses that involve the use of a reference period should explicitly examine the robustness of those conclusions to alternative choices of reference period. This approach was taken in the assessment of near-term (2016–35) changes in global-mean temperature in the IPCC AR5 (Kirtman et al. 2013) but has not been used systematically in climate research. An alternative approach is to focus on trends in climate that do not require the definition of a reference period [see box 11.2 of Kirtman et al. 2013, van Oldenborgh et al. (2013), or Marotzke and Forster (2015)].
When presenting temperature projections relative to a fixed baseline, the impact of the choice of reference period can be several tenths of a kelvin for some models. This is a significant issue for near-term projections of climate change but less significant for longer-term projections of climate change. Similarly, the reference period choice affects the projected ensemble spread of the CMIP5 models by up to 0.2 K. The same sensitivity can affect estimates of the time at which policy-relevant temperature targets (e.g., 2 K above preindustrial climate) may be exceeded by as much as 15 years.
The optimal length and timing of the reference period for producing projections depends on the climate variable under consideration. The most recent 20 years [as used by IPCC Fourth Assessment Report (AR4) and AR5] is a reasonable choice for global-mean temperature (see appendix D). For other variables, such as precipitation, the optimal reference periods are likely to be much longer, as the variability is large relative to the changes.
The issues associated with the use of anomalies relative to a reference period are particularly serious for regional climate projections. Errors in simulating the mean (and higher-order moments) of regional climate variables may have consequences for regional climate and impact projections that need to be assessed on a case-by-case basis.
We thank Gavin Schmidt, Steve Smith, and Robert Vautard for useful discussions, the three reviewers and editor for their helpful comments, Francis Zwiers for suggesting the wire and tube analogy used in Fig. SB1, and Geert Jan van Oldenborgh for making much of the CMIP5 and reanalysis data available through the Climate Explorer website. We also acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modelling groups for producing and making available their model output.
APPENDIX A: CMIP5 MODELS
ACCESS1.0 Australian Community Climate and Earth-System Simulator, version 1.0
ACCESS1.3 Australian Community Climate and Earth-System Simulator, version 1.3
BCC_CSM1.1 Beijing Climate Center, Climate System Model, version 1.1
BCC_CSM1.1(m) Beijing Climate Center, Climate System Model, version 1.1 (moderate resolution)
BNU-ESM Beijing Normal University–Earth System Model
CCSM4 Community Climate System Model, version 4
CESM1(BGC) Community Earth System Model, version 1 (Biogeochemistry)
CESM1(CAM5) Community Earth System Model, version 1 (Community Atmosphere Model, version 5)
CMCC-CM Centro Euro-Mediterraneo per I Cambiamenti Climatici Climate Model
CMCC-CMS Centro Euro-Mediterraneo per I Cambiamenti Climatici Stratosphere-resolving Climate Model
CNRM-CM5 Centre National de Recherches Météorologiques Coupled Global Climate Model, version 5
CSIRO Mk3.6.0 Commonwealth Scientific and Industrial Research Organisation Mark 3.6.0
CanESM2 Second Generation Canadian Earth System Model
EC-EARTH European Consortium Earth System Model
FGOALS-g2.0 Flexible Global Ocean–Atmosphere–Land System Model, gridpoint version 2.0
FIO-ESM First Institute of Oceanography Earth System Model
GFDL CM3 Geophysical Fluid Dynamics Laboratory Climate Model, version 3
GFDL-ESM2G Geophysical Fluid Dynamics Laboratory Earth System Model with Generalized Ocean Layer Dynamics (GOLD) component
GFDL-ESM2M Geophysical Fluid Dynamics Laboratory Earth System Model with Modular Ocean Model (MOM), version 4 component
GISS-E2-H Goddard Institute for Space Studies (GISS) Model E2, coupled with Hybrid Coordinate Ocean Model (HYCOM)
GISS-E2-H-CC GISS Model E2, coupled with HYCOM and interactive terrestrial carbon cycle (and oceanic biogeochemistry)
GISS-E2-R GISS Model E2, coupled with the Russell ocean model
GISS-E2-R-CC GISS Model E2, coupled with the Russell ocean model and interactive terrestrial carbon cycle (and oceanic biogeochemistry)
HadGEM2-AO Hadley Centre Global Environment Model, version 2—Atmosphere and Ocean
HadGEM2-CC Hadley Centre Global Environment Model, version 2—Carbon Cycle
HadGEM2-ES Hadley Centre Global Environment Model, version 2—Earth System
INM-CM4.0 Institute of Numerical Mathematics Coupled Model, version 4.0
IPSL-CM5A-LR L’Institut Pierre-Simon Laplace Coupled Model, version 5A, low resolution
IPSL-CM5A-MR L’Institut Pierre-Simon Laplace Coupled Model, version 5A, midresolution
IPSL-CM5B-LR L’Institut Pierre-Simon Laplace Coupled Model, version 5B, low resolution
MIROC5 Model for Interdisciplinary Research on Climate, version 5
MIROC-ESM Model for Interdisciplinary Research on Climate, Earth System Model
MIROC-ESM-CHEM Model for Interdisciplinary Research on Climate, Earth System Model, Chemistry Coupled
MPI-ESM-LR Max Planck Institute Earth System Model, low resolution
MPI-ESM-MR Max Planck Institute Earth System Model, medium resolution
MRI-CGCM3 Meteorological Research Institute Coupled Atmosphere–Ocean General Circulation Model, version 3
NorESM1-ME NorESM1-M with carbon cycling (and biogeochemistry)
NorESM1-M Norwegian Earth System Model, version 1 (intermediate resolution)
APPENDIX B: DOES GLOBAL MEAN TEMPERATURE MATTER FOR CLIMATE SENSITIVITY?
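Inspired by Schmidt (2007), a simple 1D energy balance model of Earth can be written as
$$S + \lambda A = G \quad \text{(surface)},$$
$$\lambda G = 2\lambda A \quad \text{(atmosphere)},$$
$$S = \lambda A + (1 - \lambda) G \quad \text{(planetary)},$$
where λ is the emissivity of the atmosphere (i.e., the strength of the greenhouse effect), and S = S*(1 – a)/4, where a is Earth’s albedo and S* is the solar constant. In terms of temperature, $A = \sigma T_a^4$ and $G = \sigma T_g^4$, where σ is the Stefan–Boltzmann constant with $T_a$ and $T_g$ representing the temperature of the atmosphere and surface, respectively. The solution of these equations is
$$G = \frac{S}{1 - \lambda/2},$$
and the surface temperature,
$$T_g = \left[\frac{S}{\sigma\,(1 - \lambda/2)}\right]^{1/4}.$$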
Figure B1a shows how the mean surface temperature changes with a and λ. As a “standard” model, Tg = 14.0°C for a = 0.3 and λ = 0.7643. For similar reference models with a fixed albedo (a = 0.3), a range of global temperatures (Tg = 13° and 15°C) can be produced for small changes in λ (0.7470 and 0.7814, respectively). Alternatively, for a fixed emissivity (λ = 0.7643), Tg = 13° and 15°C for a = 0.31 and 0.29, respectively. Note that the observed albedo is 0.29–0.30 (Stephens et al. 2015) and that the same global temperature can be produced with widely different parameter settings.
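A change in forcing can be introduced by varying S* to change S. In the case where there are no feedbacks (a and λ held fixed), $T_g \propto S^{1/4}$, so that for a small change ΔS
$$\Delta T_g \approx \frac{T_g}{4S}\,\Delta S.$$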
Figure B1b shows that the warming for a ΔS = 1 W m−2 forcing change is rather insensitive to the initial mean global temperature. For example, in the range of reference models given above with initial temperatures from 13° to 15°C, this no-feedback (or Planck) sensitivity varies only from 0.298 to 0.305 K (W m−2)−1.
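The reference models and the no-feedback sensitivity can be checked with a minimal Python sketch. It assumes the closed-form solution G = S/(1 − λ/2) for the one-layer balance and a solar constant of S* = 1361 W m−2 (a value chosen here because it approximately reproduces the quoted Tg = 14.0°C). The function names are illustrative only.

```python
SIGMA = 5.670374e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
S_STAR = 1361.0      # solar constant, W m^-2 (assumed; not given in the text)


def absorbed_solar(albedo, s_star=S_STAR):
    """Globally averaged absorbed solar flux S = S*(1 - a)/4, in W m^-2."""
    return s_star * (1.0 - albedo) / 4.0


def surface_temperature(s, emissivity):
    """Surface temperature T_g in kelvin, from G = S / (1 - emissivity/2)."""
    g = s / (1.0 - 0.5 * emissivity)
    return (g / SIGMA) ** 0.25


def planck_sensitivity(albedo, emissivity, delta_s=1.0):
    """No-feedback warming per unit forcing, K (W m^-2)^-1, by finite difference."""
    s = absorbed_solar(albedo)
    warmed = surface_temperature(s + delta_s, emissivity)
    return (warmed - surface_temperature(s, emissivity)) / delta_s


# Reference models with fixed albedo and varying emissivity, as quoted in the text.
for albedo, emissivity in [(0.30, 0.7643), (0.30, 0.7470), (0.30, 0.7814)]:
    t_g = surface_temperature(absorbed_solar(albedo), emissivity) - 273.15
    print(f"a={albedo:.2f} lambda={emissivity:.4f} "
          f"T_g={t_g:.2f} C  Planck={planck_sensitivity(albedo, emissivity):.3f}")
```

The finite difference is used rather than an analytic derivative so that the same function can be reused unchanged if the balance is modified.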
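The Planck sensitivity is amplified by various feedbacks (albedo, water vapor, lapse rate, clouds, etc.). As a simple example, we parameterize the albedo and emissivity to be linearly dependent on temperature change:
$$a = a_0 + k\,\Delta T_g, \qquad \lambda = \lambda_0 + m\,\Delta T_g,$$
where $a_0$ and $\lambda_0$ are the values in the reference model.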
If we set k = –0.003 and m = 0.005, then Fig. B1c shows the corresponding temperature change for a 1 W m−2 forcing. The ratio of k and m is set by the changes in a and λ for the range of reference models discussed above. The resulting warming is larger than without the feedbacks but still rather independent of global-mean temperature; there is a near orthogonality between the contours of mean temperature and climate sensitivity. For our reference model, the climate sensitivity with feedbacks is 2.78 K for a forcing of 3.7 W m−2, equivalent to a doubling of CO2.
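The feedback calculation can be sketched in Python as follows. The sketch assumes the closed form G = S/(1 − λ/2), a solar constant of S* = 1361 W m−2, linear feedbacks of the form a = a0 + kΔT and λ = λ0 + mΔT, and a forcing applied by scaling S*. The bisection solver and all function names are choices made for this illustration, not the authors' code. Under these assumptions the result comes out close to the 2.78 K quoted above.

```python
SIGMA = 5.670374e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
S_STAR = 1361.0      # solar constant, W m^-2 (assumed; not given in the text)


def surface_temperature(s_star, albedo, emissivity):
    """T_g in kelvin from the closed form G = S / (1 - emissivity/2)."""
    s = s_star * (1.0 - albedo) / 4.0
    return (s / (SIGMA * (1.0 - 0.5 * emissivity))) ** 0.25


def warming_with_feedbacks(forcing, a0=0.30, lam0=0.7643, k=-0.003, m=0.005):
    """Equilibrium warming (K) for a forcing (W m^-2) applied by scaling S*."""
    t0 = surface_temperature(S_STAR, a0, lam0)
    # Increase S* so that S rises by `forcing` at the reference albedo.
    s_star_forced = S_STAR + 4.0 * forcing / (1.0 - a0)

    def residual(dt):
        # Albedo and emissivity respond linearly to the warming dt.
        t_new = surface_temperature(s_star_forced, a0 + k * dt, lam0 + m * dt)
        return (t_new - t0) - dt

    # Bisection: residual(lo) > 0 > residual(hi) for a positive forcing.
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(f"Warming for 3.7 W m^-2: {warming_with_feedbacks(3.7):.2f} K")
```

Setting k = 0 and m = 0 in the same function recovers the no-feedback (Planck) response, which makes the amplification by the feedbacks easy to inspect.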
Although this is only a toy model of global climate, it provides some simple physical explanations for why climate sensitivity may not depend strongly on global-mean temperature as long as it does not vary too much from the observed value, as seen in the CMIP5 models (Fig. 2). However, Bloch-Johnson et al. (2015) discuss the possible consequences of nonlinear dependence of feedbacks on temperature and find slightly larger sensitivity to the mean state.
APPENDIX C: CONSIDERING OBSERVATIONAL ISSUES
HadCRUT4.3 is not a spatially complete dataset, as observations are not available everywhere. Therefore, comparing HadCRUT4.3 with the full global temperature from GCMs is not necessarily a fair comparison. To test the sensitivity to the lack of complete observational coverage, Fig. C1 first compares the global temperatures using HadCRUT4.3 (Morice et al. 2012) and the interpolated version of CW14. The estimates of CW14 fall inside the HadCRUT4.3 uncertainties for the vast majority of years. The most recent decade, however, is at the upper edge of the uncertainties, suggesting that the missing regions are warming more rapidly than the global average in the last few years.
Figure C1 also shows the effect on simulated global temperatures when computed only where there is observational coverage in HadCRUT4.3, on a month-by-month basis. This “masking” introduces a slight cool bias in the simulations; that is, the simulations warm less when masked with the observational coverage, again implying that the real world may have actually warmed slightly more than observed with HadCRUT4.3. In addition, assuming the observational coverage does not improve, the lack of complete coverage may reduce the observed future change by around 0.07 K in the long term and by around 0.02 K in the near term (Fig. C1).
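The effect of masking on a global mean can be illustrated with a small Python sketch. Everything in it is synthetic and hypothetical: the 10° grid, the warming pattern that increases toward the poles, and the coverage mask that drops all cells poleward of 70° are stand-ins chosen only to show the mechanism, not the HadCRUT4.3 coverage or CMIP5 output. In practice the mask would change from month to month.

```python
import math


def area_weighted_mean(field, lats_deg, coverage=None):
    """Mean of field[i][j] weighted by cos(latitude).

    Cells where coverage[i][j] is False are skipped; coverage=None uses all cells.
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats_deg):
        weight = math.cos(math.radians(lat))
        for j, value in enumerate(field[i]):
            if coverage is None or coverage[i][j]:
                total += weight * value
                weight_sum += weight
    return total / weight_sum


# Synthetic warming field (K) on a 10-degree grid, larger toward the poles.
lats = [-85 + 10 * i for i in range(18)]
n_lon = 36
warming = [[1.0 + abs(lat) / 90.0 for _ in range(n_lon)] for lat in lats]

# Hypothetical coverage: no observations poleward of 70 degrees.
coverage = [[abs(lat) < 70 for _ in range(n_lon)] for lat in lats]

full = area_weighted_mean(warming, lats)
masked = area_weighted_mean(warming, lats, coverage)
print(f"full coverage: {full:.3f} K, masked: {masked:.3f} K")
```

Because the unobserved cells in this synthetic case are the ones that warm most, the masked mean is lower than the full mean, which is the sense of the cool bias described above.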
Figure C1 also shows the difference between CW14 and HadCRUT4.3, which broadly matches the estimates from the simulations. The differences between the two datasets apparent in the last few years are not particularly large compared to the range of corrections expected from the CMIP5 simulations.
In addition, Cowtan et al. (2015) highlight a further complication with such a comparison. The observations are constructed from sea surface temperatures (SSTs) over the ocean and near-surface air temperature over the land, whereas the models are normally presented using averaged air temperatures everywhere. As the SSTs warm slightly more slowly than the corresponding air temperatures over the oceans, the simulated changes are slightly larger when using air temperatures everywhere than when using SSTs over the ocean.
Overall, the masking and surface-type effects account for around a third of the difference between the multimodel mean and HadCRUT4.3 when using a 1961–90 reference period (Cowtan et al. 2015).
APPENDIX D: HOW LONG SHOULD A REFERENCE PERIOD BE?
To consider the question of an appropriate length of reference period (or tube; see the sidebar on “Illustrating the effect of reference period choice”) to make future projections, we use a toy simulator of temperatures. We assume that temperature θ changes linearly with time t as
$$\theta(t) = \alpha t + \varepsilon(t)$$
from 1970 to 2100. We consider realizations of temperature using different models (or sensitivity α) sampled from $N(\bar{\alpha}, \sigma^2)$ and red noise ε with variance γ² and the AR(1) parameter fixed at 0.5.
We generate 1,000 realizations (or climate models) of temperature (with and without the noise component). These simulations are then referenced to different periods of length L = 1–30 years but all ending in 2005. The shortest reference period is then only using 2005, and the longest is 1976–2005.
The total uncertainty (the standard deviation across the 1,000 realizations) in future temperatures can be separated into a component due to differences in model sensitivity and a component due to the noise. Figure D1 shows the total uncertainty (black) and the components due to model sensitivity (blue) and noise (red) for three different future time periods (columns) and two different sets of toy simulator parameters (rows). The two sets of parameters are chosen to approximately represent global-mean temperature (top row; mean sensitivity $\bar{\alpha}$ = 0.21 K decade−1, σ = 0.055 K decade−1, and γ = 0.12 K) and European land temperature (bottom row; $\bar{\alpha}$ = 0.23 K decade−1, σ = 0.10 K decade−1, and γ = 0.75 K) in the CMIP5 GCMs.
As L increases, the noise uncertainty component decreases because of averaging over more years in the reference period. The model sensitivity uncertainty component increases linearly with L because there is more time between the middle of the reference period and the verification time, allowing for a longer period of model uncertainty growth. The total uncertainty therefore has a minimum. For global temperature, this minimum occurs for L = 1–5 yr, whereas for regional temperature, the optimal L is longer, around 15−20 yr, depending on the verification time. For climate variables with larger variability, such as precipitation, the optimal L may increase further.
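The competing effects can be reproduced with a minimal Python sketch of the toy simulator. It assumes θ(t) = αt + ε(t) with t measured in years from 1970, α drawn from a normal distribution, and γ² interpreted as the stationary variance of the AR(1) noise. The target period 2016–2035, the random seed, and the function names are hypothetical choices for illustration only.

```python
import random


def ar1_series(n, gamma, phi, rng):
    """AR(1) noise of length n with stationary standard deviation gamma."""
    innovation_sd = gamma * (1.0 - phi ** 2) ** 0.5
    value = rng.gauss(0.0, gamma)
    series = []
    for _ in range(n):
        series.append(value)
        value = phi * value + rng.gauss(0.0, innovation_sd)
    return series


def mean(values):
    return sum(values) / len(values)


def projection_spread(ref_length, alpha_mean, alpha_sd, gamma,
                      target=(2016, 2035), n_real=1000, phi=0.5, seed=0):
    """Standard deviation across realizations of the target-period anomaly (K),
    relative to a reference period of ref_length years ending in 2005."""
    rng = random.Random(seed)
    years = list(range(1970, 2101))
    anomalies = []
    for _ in range(n_real):
        alpha = rng.gauss(alpha_mean, alpha_sd) / 10.0  # K/decade -> K/yr
        noise = ar1_series(len(years), gamma, phi, rng)
        theta = {y: alpha * (y - 1970) + e for y, e in zip(years, noise)}
        reference = mean([theta[y] for y in range(2006 - ref_length, 2006)])
        future = mean([theta[y] for y in range(target[0], target[1] + 1)])
        anomalies.append(future - reference)
    centre = mean(anomalies)
    variance = sum((x - centre) ** 2 for x in anomalies) / (len(anomalies) - 1)
    return variance ** 0.5


# European-land-like parameters quoted in the text; the target period is hypothetical.
spread_short = projection_spread(1, 0.23, 0.10, 0.75)
spread_long = projection_spread(20, 0.23, 0.10, 0.75)
print(f"L=1: {spread_short:.2f} K, L=20: {spread_long:.2f} K")
```

Looping `projection_spread` over L = 1–30 traces out a total-uncertainty curve for a given parameter set. Setting `gamma=0` or `alpha_sd=0` isolates the sensitivity and noise components, respectively.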
We have used the toy model to demonstrate that there are competing effects when choosing a reference period for making projections. A similar procedure can be performed for simulated global temperature and European land temperatures to test whether these effects are seen in the CMIP5 GCMs. In this case, the total CMIP5 uncertainty is shown in Fig. D1 as the gray lines. This should not be expected to match the toy simulator perfectly because the CMIP5 trends are nonlinear. However, a similar structure in the change in total uncertainty for different L is seen but with less sensitivity to L. For global and regional temperatures, reference period lengths of 10 and 20 years, respectively, are close to optimal. The IPCC AR5 decision to use L = 20 appears to have been a good choice for presenting future changes in temperature.
A supplement to this article is available online (10.1175/BAMS-D-14-00154.2)