Reconstructions of past climate show notable temperature variability over the past millennium, with relatively warm conditions during the Medieval Climate Anomaly (MCA) and a relatively cold Little Ice Age (LIA). Multimodel simulations of the past millennium are used together with a wide range of reconstructions of Northern Hemispheric mean annual temperature to separate climate variability from 850 to 1950 CE into components attributable to external forcing and internal climate variability. External forcing is found to contribute significantly to long-term temperature variations irrespective of the proxy reconstruction, particularly from 1400 onward. Over the MCA alone, however, the effect of forcing is only detectable in about half of the reconstructions considered, and the response to forcing in the models cannot explain the warm conditions around 1000 CE seen in some reconstructions. The residual from the detection analysis is used to estimate internal variability independent from climate modeling, and it is found that the recent observed 50- and 100-yr hemispheric temperature trends are substantially larger than any of the internally generated trends even using the large residuals over the MCA. Variations in solar output and explosive volcanism are found to be the main drivers of climate change from 1400 to 1900, but for the first time a significant contribution from greenhouse gas variations to the cold conditions during 1600–1800 is also detected. The proxy reconstructions tend to show a smaller forced response than is simulated by the models. This discrepancy is shown, at least partly, to be likely associated with the difference in the response to large volcanic eruptions between reconstructions and model simulations.
Climate variability originates from two fundamentally different mechanisms: (i) changes in the large-scale (often global) energy budget of the planet due to influences external to the climate system and (ii) chaotic interactions within and between climate system components, which generate substantial variability over a broad range of time scales (e.g., Hasselmann 1976) and are unrelated to this external forcing. The externally forced component can be subdivided into that due to anthropogenic forcing (e.g., due to changes in land use and fossil fuel burning producing greenhouse gases and aerosols) and natural external forcings (e.g., solar variations and large volcanic eruptions). Changes in greenhouse gases over the last millennium have been strongly influenced by humans since the industrial revolution, while earlier changes, such as the dip over the Little Ice Age, may be at least in part due to Earth system feedbacks (see, e.g., Cox and Jones 2008; Frank et al. 2010).
To determine the relative importance of each forcing, studies often utilize detection and attribution analysis. This first determines whether an externally forced signal can be detected in observations, given our understanding of the expected response to the forcing and internal variability, and then attempts to attribute the observed response to a particular combination of individual forcings (for a review, see Hegerl et al. 2007b). Hence, detection and attribution studies require reliable estimates of internal climate variability.
Much of our understanding of the climate system originates from observations during the twentieth century, a period covered by high-quality instrumental data (for a review, see Trenberth et al. 2007). However, it is difficult to estimate internal climate variability from the twentieth-century record alone, as this period is too short to obtain well-sampled estimates of variability on multidecadal time scales. In addition, climate over the twentieth century experienced substantial anthropogenic radiative forcing, which has to be accounted for in order to derive estimates of climate variability.
Consequently, climate models are usually used to determine the characteristics of internal variability and its possible contribution to the recent warming, with the model dependence of this estimate understood as a source of uncertainty (see, e.g., Hegerl and Zwiers 2011). Reconstructions of temperature over the last millennium can provide alternative estimates of internal variability. While such estimates are prone to uncertainties (see Jansen et al. 2007; Jones et al. 2009), they nevertheless provide valuable information on the role of internal climate variability on interdecadal and longer time scales. However, to obtain these estimates we first need to separate internal variability from the externally forced component of change over the last millennium. This paper attempts to do that.
Our knowledge about the climate of the past millennium originates from two main sources: proxy reconstructions and climate modeling. Reconstructions attempt to determine past climate variability by combining information from a number of different proxies, such as tree-ring widths and/or tree-ring densities, corals, documentary evidence, ice cores, speleothems, boreholes, and sedimentary deposits (for a review, see, e.g., Jones et al. 2009). Climate modeling, in contrast, aims to simulate past climate variability based on our understanding of the underlying physics. The models are driven by reconstructions of climate forcings, such as volcanic eruptions, fluctuations in solar irradiance, orbital changes, variations in CO2, sulfate aerosols, and land-use changes (see, e.g., Schmidt et al. 2011, 2012; Forster et al. 2007). Both the forcing histories and the response of the models to the forcing are sources of uncertainty. This uncertainty implies that model-based estimates of the forced component present in proxy reconstructions are incomplete, which in turn implies uncertainty in estimates of internal variability derived by removing these estimated forced components from actual reconstructions. Nevertheless, these empirically derived estimates can provide a valuable cross-check against purely model-based estimates of internal climate variability.
Previous analyses that aimed at separating forced and internal variability over the past millennium have typically used only a limited number of climate reconstructions; few, often simple, climate models (e.g., Hegerl et al. 2007a; Weber 2005); and a very limited sample of internal climate variability. Many new reconstructions of temperature variability over the past millennium have recently become available. These reconstructions make use of an expanding body of proxy evidence in combination with improved statistical techniques aiming to better preserve variance (Ammann and Wahl 2007, Juckes et al. 2007; Mann et al. 2008, 2009; Moberg et al. 2005; D'Arrigo et al. 2006; Frank et al. 2007; Christiansen and Ljungqvist 2011, Hegerl et al. 2007a) and more thorough exploration of the sensitivity of reconstructions to the choice of proxy data and the reconstruction methods. This includes additional studies that test reconstruction methods using model output (see, e.g., Hegerl et al. 2007a; Mann et al. 2007; Jones et al. 2009; Smerdon 2012).
In addition, a relatively large number of simulations with fully coupled GCMs have recently been completed for the whole of the last millennium (section 3). These were predominantly performed as part of the fifth phase of the Coupled Model Intercomparison Project (CMIP5; see Taylor et al. 2012) and the third phase of the Paleoclimate Modelling Intercomparison Project (PMIP3; Braconnot et al. 2012). Here we make use of these new model simulations and the newly expanded range of proxy reconstructions to improve our knowledge of natural variability and its potential implications for detection and attribution studies.
The reconstructions used in this paper are introduced in section 2 and the model simulations are described in section 3. Section 4 presents results aimed at calculating the relative importance of external forcing over the past millennium. This is done by first examining the variance explained by the forced component in the reconstructions. Then a detection and attribution analysis is carried out, followed by a discussion of results and their implication for studies of recent climate change. The relative importance of the various external forcings is analyzed in section 5, followed by a summary (section 6).
A list of the reconstructions used in this paper is given in Table 1. These reconstructions were calibrated to three different geographical regions: 0°–90°N land and sea (Ammann and Wahl 2007; Juckes et al. 2007; Mann et al. 2009; Moberg et al. 2005), 20°–90°N land only (D'Arrigo et al. 2006; Frank et al. 2007), and 30°–90°N land only (Christiansen and Ljungqvist 2011; Hegerl et al. 2007a). Some reconstructions are based on a fixed number of sites (Christiansen and Ljungqvist 2011; Hegerl et al. 2007a; although the sampling within sites may decline back in time), and some are based on varying numbers of proxy sites over time (e.g., D'Arrigo et al. 2006; Frank et al. 2007; Mann et al. 2009). Hence, it is expected that uncertainties will increase further back in time. Some reconstructions are based on averaging across the available sites and then calibrating to the target of the reconstruction (e.g., D'Arrigo et al. 2006; Hegerl et al. 2007a), in some cases calibrating high- and low-frequency bands separately (e.g., Moberg et al. 2005), while others are based on reconstructing the underlying spatial patterns using multilinear regression techniques (Mann et al. 2009; Ammann and Wahl 2007). Overall, the large number of reconstructions available, based on a mix of data and methods, provides a reasonable estimate of uncertainty due to varying methodological assumptions and choices of data.
The reconstructions are shown in Fig. 1 and generally show a warmer period around the start of the millennium from around 900 to 1200 [the Medieval Climate Anomaly (MCA)], followed by a cooler period from around 1450 to 1800 [the Little Ice Age (LIA)]. They also show relatively abrupt periods of cooling associated with volcanic eruptions (e.g., following the eruption of Mount Tambora in 1815). Figure 1 shows the University of East Anglia–Met Office Hadley Centre Climate Research Unit temperature, version 4 (HadCRUT4) instrumental data (Morice et al. 2012) from 1850 to 2000 as well. All reconstructions, except Christiansen and Ljungqvist (2011), show similar trends to the HadCRUT4 data over the instrumental period. Whereas all the other reconstructions scale the proxy record in some way to the instrumental data, the Christiansen and Ljungqvist reconstruction represents an unweighted average of a number of different proxies scaled locally. To ensure consistency during the modern interval with the instrumental record over the region sampled (extratropical NH land), we have rescaled that reconstruction using an inverse regression onto the instrumental temperature series (note that the inverse regression assumes that instrumental error and noise is negligible relative to that for the proxy reconstruction; see Christiansen and Ljungqvist 2011; Hegerl et al. 2007a). Results for both the scaled and unscaled Christiansen and Ljungqvist reconstruction will be shown throughout the paper.
We first smooth all annual reconstructions and model simulations using a 10-yr Butterworth filter (see Mann 2008; also used in Mann et al. 2009), reducing power by one-half on 10-yr time scales. This ensures that both simulations and data are comparable, and that the analysis focuses on the better-reconstructed interdecadal variability (see, e.g., Frank et al. 2007; D'Arrigo et al. 2006). In our standard analysis, this is followed by an 11-yr boxcar filter in order to focus on truly interdecadal time scales. To determine the sensitivity to the smoothing length, our analysis has been repeated both without the additional smoothing and using a 21-yr boxcar filter instead of an 11-yr boxcar. This tests the sensitivity to focusing the analysis on multidecadal rather than interdecadal time scales [which, e.g., Christiansen and Ljungqvist (2011) argued is more faithfully reconstructed]. Results in an earlier paper (Hegerl et al. 2006) showed that calibration of a tree-ring-based reconstructions on interdecadal time scales yielded similar estimates of climate sensitivity compared to one using a multidecadal filtered version of the same reconstruction (Cook et al. 2004), supporting the approach taken here. Extensive sensitivity tests in earlier papers (Hegerl et al. 2003, 2006, 2007a) showed little sensitivity of detection results to the shape and length of the filter between the limits of 5 yr (where the signal-to-noise ratios of forced versus internal variability become increasingly low) and secular time scales (at which it becomes increasingly difficult to distinguish the effects of different external forcings). The sensitivity of our results to the choice of smoothing length is discussed later in the paper.
3. Model simulations
Table 2 contains details of all the climate model simulations which are used in the multimodel mean fingerprint used in this paper [CCSM4 (Landrum et al. 2013), MPI-ECHAM5 (Jungclaus et al. 2010), MPI-ESM-P (Giorgetta et al. 2013), HadCM3 (Pope et al. 2000; Gordon et al. 2000), GISS-E2-R (Schmidt et al. 2006), and BCC-CSM1.1 (Wu 2012)], and one additional model (CSIRO Mk3L; Phipps et al. 2011, 2012) whose results contributed to the calculation of the individually forced fingerprints. The surface air temperatures (SATs) of the different models are shown in Fig. 2. All the model simulations are smoothed the same way as the reconstructions and are calculated as the mean over the three different geographical regions represented by the different reconstructions. Only results for the 0°–90°N, land plus sea case are shown in Fig. 2. The GISS-E2-R simulations (Fig. 2a) included a significant initial model drift that was removed from the control simulation by fitting a second-order polynomial to the control simulation [this is the same correction technique as applied in Tett et al. (2007)].
The forcings used in the model simulations are listed in Table 2. Where two forcings are given in the solar forcing column, the simulations have been driven with a combination of two solar forcings that have been spliced together, following the guidance given by Schmidt et al. (2011, 2012). For the CCSM4 model and GISS-E2-R models, the land-use forcing has been merged into the Hurtt et al. (2009) land-use dataset after 1850, following Schmidt et al. (2011, 2012).
For the period 1850–2000, other anthropogenic forcings have been included. The CCSM4, GISS-E2-R, MPI-ESM-P, and BCC-CSM1.1 model simulations used the CMIP5 anthropogenic historical forcings. The HadCM3 simulation followed the forcings used in Tett et al. (2007), while the MPI-ECHAM5 model simulation has been driven with aerosol concentrations following Lefohn et al. (1999) (see Jungclaus et al. 2010). These differences in the treatment of the anthropogenic forcings likely explain the discrepancies in the twentieth-century trends seen in Fig. 2a.
The natural forcing datasets used by these studies are uncertain (see Schmidt et al. 2011, 2012). There is uncertainty in the amplitude of solar forcing [see, e.g., the difference between Steinhilber et al. (2009) and Shapiro et al. (2011)] and the forcing by individual volcanic eruptions [see, e.g., the difference between Crowley et al. (2008) and Gao et al. (2008)]. There is also the possibility of systematic bias owing to the scaling between the observed sulfate spikes found in ice cores and the aerosol optical depth used by the models (see Hegerl et al. 2006), although the different volcanic reconstructions span a range of assumptions. The level of land-use change in preindustrial times is also debated (see Pongratz et al. 2008; Kaplan et al. 2009).
There are also known model limitations in the response to these forcings. For example, it is likely that the models described in this paper may not be capable of fully capturing the dynamic response to solar forcing that has been proposed by several studies and involves an amplification of the response by ozone feedback within the stratosphere (see, e.g., Shindell et al. 2006; review by Gray et al. 2010). Many of the models used here do not have a fully resolved stratosphere and contain no interactive ozone chemistry. Such dynamic responses would, however, affect the hemispheric annual mean response studied here less than regional and seasonal responses. There is also evidence that the models may not be capturing the dynamic response to volcanic forcing (see, e.g., Driscoll et al. 2012), while some may be responding too strongly (see, e.g., Gent et al. 2011). There is therefore still considerable uncertainty in the model-simulated response to climate forcing over the past millennium.
The simulations driven with all forcings are shown in Fig. 2a and show similar features to the reconstructions (for a comparison, see Fig. 2b): The model simulations are slightly warmer in the MCA, although the timing of the warming is different in models and reconstructions (see Jungclaus et al. 2010). The simulations are also substantially colder than the millennial average for much of the LIA, and all show a strong increase in temperature over the twentieth century.
Perhaps the most prominent features of the simulations are the pronounced cooling episodes following large volcanic eruptions (the largest of which are highlighted by gray bars in Figs. 2a,b), in particular those in 1258 (origin unknown), mid-1450s (Kuwae), and 1815 (Mount Tambora). Note there may be uncertainty in the dating of some volcanic eruptions, particularly Kuwae (Plummer et al. 2012). The volcanic cooling simulated for these large eruptions appears far larger than that seen in the reconstructions (see Fig. 2b). This is particularly true for the 1258 eruption, which causes a large cooling in the simulations that is hardly seen in the reconstructions. This discrepancy is further explored later in this paper.
Figure 2c shows the results from composite simulations, which include the effect of solar and volcanic forcing only. For the HadCM3 and MPI-ECHAM5 models, this is the linear combination of simulations forced by volcanic and solar forcing only. For the CSIRO model, this is calculated by subtracting simulations with just orbital and greenhouse gas forcings from simulations including orbital, greenhouse gas, solar, and volcanic forcings. In all of these simulations, the solar forcing is weak, so that the combined fingerprint is dominated by volcanic forcings. The behavior of the combined simulations in Fig. 2c is similar to the all-forced simulations shown in Fig. 2a for the preindustrial periods, with a correlation of +0.87 for the period 1401–1900. This suggests that, in the model world at least, these are the most important preindustrial forcings. The composite simulations diverge from the all-forced simulations significantly from 1850 onward as anthropogenic forcings become increasingly important.
Figure 2d shows the results from simulations forced by well-mixed greenhouse gases only. For the CSIRO model, these results were calculated by subtracting simulations including just the orbital forcing from simulations with orbital and greenhouse gas forcing. The effect of the greenhouse gas forcing is clearly visible, causing a steady increase of temperature beginning around 1800. In addition to this recent warming there are also preindustrial long-term variations in greenhouse gas only simulations, with a noticeable cooling around 1600 in response to a small dip in the abundance of CO2 (see discussion below).
Each of the models that provided forced simulations also has an equivalent unforced control simulation of varying length (not shown). These were used to construct the internal variability samples required for the detection and attribution analysis discussed in section 4b.
4. Results: The role of external forcing
a. Explained variance
Before analyzing the entire millennium or substantial parts of it in a detection and attribution analysis, changes in the role and importance of forcing are explored over 200-yr windows. This serves to test for variation in the role of external forcing versus internal variability over the millennium in model simulations and addresses the extent to which these variations are reflected in reconstructions.
We define the explained variance as the squared correlation between the model simulations and individual reconstructions. For this test the period encompassing the large 1258 eruption was ignored, since the large discrepancy in response to this eruption between the simulations and reconstructions (see Fig. 2b) is likely to dominate our results early in the millennium. Where a correlation is negative, the explained variance is set to zero, as only positive correlations are meaningful measures of the correspondence between simulations and reconstructions.
The explained variance for 200-yr periods is shown in Fig. 3, where each colored symbol represents the variance within a reconstruction explained by the multimodel mean. The average of all the variances calculated for each reconstruction is also shown. While there is substantial variation between reconstructions, some common features emerge: The largest explained variances are found over the most recent 200 yr (1750–1950) with average values over 60% (also see Stott et al. 2000). The explained variance then decreases to an average of about 30% for 200-yr periods between 1400 and 1900. Before 1300, the explained variances begin to decline further, and for the periods 900–1100, 950–1150, and 1000–1200 the explained variance is negligible (note that this is robust with respect to the exclusion of the 1258 eruption). Could this decline be due to a decreasing role of external forcings back in time?
To address this question, we performed a “perfect model” test. In this analysis, the explained variance was calculated from the correlations between individual model simulations and a fingerprint that was derived from all the other simulations, which are then averaged. If the models have a similar level of internal variability to the observations and the simulated response to external forcing is accurate, then this perfect model correlation should be similar to that between the multimodel mean and the reconstructions. Errors in the external forcing used in the model simulations, errors in the model physics, and errors and additional noise in the reconstructions will reduce the explained variance relative to the average explained variance obtained from the simulations. Therefore, we expect the average of the explained variance obtained from the simulations to yield an approximate upper limit of explained variance given varying forcing levels over time (see dashed line in Fig. 3). If there was a strong divergence between the perfect model result and the explained variance in reconstructions, this would suggest that there is an increasing role of data uncertainty, that the true forcing uncertainty is larger than that represented in the forcings used in the simulations, that there are systematic biases in the responses of the models to external forcings, or any combination of these.
As expected the highest explained variance for the perfect model test is found for the most recent 200-yr period, since this is when multidecadal forcing is strongest because of anthropogenic emissions. As seen in the results for the multimodel mean versus reconstruction comparison the explained variance in the perfect model study also decreases back in time. However, if the 1258 eruption is not removed, the variance decreases by a smaller amount because of the presence of a strong cooling event common to all the model simulations (not shown). The perfect model explained variance remains within the range of results from the reconstructions from about 1200. This shows that a decrease in the importance of external forcing relative to internal variability can explain much of the observed decrease in explained variance in the simulation–reconstruction comparison.
A striking result of this perfect model study is that the explained variance during the MCA is quite low even in the perfect model study, on the order of 20%, suggesting that given the forcings used this period should be dominated by internal variability rather than strongly forced (with the exception of the enigmatic 1258 eruption, which was excluded). This is possibly due to the substantially reduced volcanic activity (other than the 1258 eruption) during this period. Therefore, values of the explained variance of the order of 20% are expected. However, the correlations between the models and the reconstructions for this period are substantially lower than the perfect model values. As discussed in section 2, increased sampling error in the reconstructions (e.g., due to decreasing availability of proxy data) could be partly responsible for the reduction in correlations with the model simulations [e.g., Frank et al. (2007) and D'Arrigo et al. (2006) caution use of their reconstructions prior to 1200 and 1117, respectively; see also Esper and Frank 2009]. Unusually pronounced internal variability during this period may also account for the reduction in explained variance (see Goosse et al. 2012). Of course, the observed discrepancies may result from some combination of these factors.
b. Detection and attribution analysis
The previous results show that there is agreement between the model simulations and the reconstructions, particularly for time periods after 1200, demonstrating at least some role for external forcing in the climate of the past millennium over most 200-yr segments. Here we use detection and attribution techniques to estimate the magnitude of the forced change, separating the climate response into forced and internal variability.
The multimodel mean response, smoothed in order to focus on multidecadal frequencies, provides a fingerprint for forced variability in the reconstructions. The contribution by the fingerprint of external forcings to reconstructed NH temperature has been estimated using a total least squares (TLS) detection and attribution technique (for details, see Allen and Stott 2003), which estimates a scaling factor β to best match the time-dependent fingerprint Xi(t) to the reconstructions Y(t),
The fingerprint of external forcing Xi is provided by the mean of an ensemble of climate model simulations (averaged over the same region as the reconstruction) and represents the time fingerprint of NH mean temperature in vector form. As only a limited ensemble of forced simulations is available, each fingerprint Xi(t) will still contain internal variability generated within the simulation υi(t), whose variance is reduced by averaging over the ensemble. The reconstruction is assumed to have an associated internal variability υ0. The method assumes a ratio of noise variance between reconstructions and that in the fingerprint, which we set to 1/n with n equal to 11, the number of ensemble members.
The scaling factors are determined following Allen and Stott 2003 by calculating the singular value decomposition (SVD) of ,
where is a matrix formed by combining the model fingerprints and reconstructions (after scaling to equal noise variance; see Allen and Stott 2003),
We can estimate the true underlying response to forcing represented in the model simulations and reconstructions Ž where Ž is calculated following Eq. (38) in Allen and Stott (2003),
and is taken from the SVD.
The uncertainty in scaling factors can be approximated analytically (see Allen and Stott 2003) but is here calculated by superimposing 2000 random samples of internal variability taken from the control simulations onto both the noise reduced observations and model fingerprints Ž. To construct these model-based samples of internal variability we use segments of control simulations of the same length as the analysis period, taken from the same models that are used to form the model mean fingerprint. The 5%–95% uncertainty range for β is based on the sampling distribution derived from these multiple samples and should be a credible range over which to quantify uncertainty, given the 12 000 yr of control simulation used (see, e.g., Hegerl et al. 1996; Allen and Tett 1999).
A fingerprint is detected in the reconstructions if the scaling factor β is significantly larger than zero. This means that the effect of external forcing is detected at the 5% confidence level in the reconstructions if the calculated 5%–95% scaling range does not encompass zero.
To evaluate the consistency of the fit, the residuals of the regression were checked against the estimates of model-based internal variability. If a fit to a reconstruction yields a residual with a χ-squared value [Allen and Stott 2003, their Eq. (26)] that is smaller than the sum of squares of 90% of the control samples, then the amplitude of υ0 is said to be consistent with the internal variability as sampled by the control simulations.
The detection and attribution analysis was performed for several different time periods: a recent time period where reconstructions are based on a larger database (1401–1950; see Jansen et al. 2007), a short period encompassing the MCA (851–1400), the full time period including the MCA (851–1950), and the corresponding preindustrial time periods (851–1850 and 1401–1850). The results for the full time period for three representative reconstructions are shown in Fig. 4a. Figure 4c shows detection results for all time periods and all eight reconstructions plus the rescaled version of the Christiansen and Ljungqvist (2011) reconstruction. The results show that the fingerprint for external forcing is detectable in all reconstructions for four of the time periods to a 5% significance level. While both the reconstructions of external forcing and of temperature are uncertain, the uncertainties between the two should be independent from each other, making spurious detection of the fingerprint highly unlikely. Thus, our results confirm a clear and important role of external forcing during the last millennium, even when the last 150 years are excluded. For the time period 851–1400, however, the external forcing is only detected in half of the reconstructions. This is perhaps unsurprising given the poor correlations found in the previous section over this period and is at least partly due to the smaller role of forcing as estimated by the perfect model correlations.
Scaling factors were also calculated using an ordinary least squares (OLS) fit (see Allen and Tett 1999). The results are very similar, with the fingerprint for external forcing detectable in all reconstructions for all time periods except for about half the cases for the period 851–1400 (results not shown). OLS analysis places all the internal variability onto the reconstructions, so this is the limiting case where error and internal variability in the fingerprint is negligible relative to that in the reconstructions. This analysis shows that the results are insensitive to the assumed ratio of internal variability between the reconstructions and models used in the TLS approach.
Figure 4b shows the internal variability samples calculated as part of the analysis υ0. As can be seen from this figure, as well as the bars shown in Fig. 4c, the residual variabilities of several of the reconstruction-derived samples are not consistent with the models' internal variabilities. This is especially true for the time periods containing the MCA. To test whether this potentially larger variability in certain reconstructions exerts any leverage on our detection results, the variance of the samples of internal variability taken from the control simulations used to calculate the range of scaling factors was scaled to fit the variance of the TLS-generated sample of internal variability, if the latter was larger, prior to repeating the detection analysis. As Fig. 4c shows, the external forcing is still detectible when testing against this inflated variability in all but one reconstruction (excluding the period 851–1400) and even for that reconstruction for all but one of four time periods.
Questions have been raised about the faithfulness of the low-frequency climate signal recorded by climate proxy data (for a review, see, e.g., Jones et al. 2009). One recent study (Esper et al. 2012), for example, argues, based on a comparison of tree-ring width and tree-ring density record estimates from one location in the Arctic, that tree-ring width records and therefore potentially any reconstructions using them may underestimate millennial-scale trends such as those associated with orbital forcing. Whether this effect actually impacts hemispheric temperature reconstructions, which reflect a mix of proxy data and sample diverse seasonal windows and latitudinal ranges, is less clear. If a long-term trend, such as that suggested by Esper et al. (2012), was missing in any of the reconstructions studied here, it should lead to a positive trend in the residuals shown in Fig. 4b (note that the simulations account for orbital forcing). We find, however, that, for all of the multiproxy reconstructions, the residuals exhibit a negative long-term trend [ranging from −0.23°C (1000 yr)−1 in Mann et al. (2009) to −0.03°C (1000 yr)−1 in Juckes et al. (2007)], suggesting if anything an overestimation of any potential long-term cooling trend. Interestingly the two tree-ring only reconstructions (D'Arrigo et al. 2006; Frank et al. 2007) do exhibit a positive long-term trend and quite a substantial one in the case of the Frank et al. (2007) reconstruction [0.17°C (1000 yr)−1] that is consistent with the potential bias noted by Esper et al. (2012). Attributing some of this trend to a stronger response to orbital forcing than simulated is difficult, however, because of the large uncertainties in the reconstructions themselves and given that internal climate variability (e.g., at the time of the MCA; residuals shown in Fig. 4b) projects onto the trend.
Figure 5 shows scatterplots for the externally forced model fingerprints plotted against the reconstructions (based on the decadal smoothed data used for the regression) and the regression lines calculated in the above analysis. This plot further highlights differences in the estimated amplitude of the forced response for different reconstructions. Several of the reconstructions have periods during the LIA that are clearly colder in the reconstructions than in the models; equally, there are several reconstructions that have periods of the MCA that are significantly warmer in the reconstructions than in the model simulations. Neither of these features is present for every reconstruction, however, indicating that there is substantial uncertainty in the level to which the MCA and LIA can be reproduced because of external forcing (see also Fig. 4b and Fernández-Donado et al. 2013). Also present in many of the regressions, particularly those for the period 851–1950, are tails where the models are far cooler than the reconstructions. These tails result from volcanic cooling and highlight that the reconstructions tend to exhibit considerably less of a cooling response to the largest volcanic eruptions than is simulated by the models (Fig. 2b).
c. Possible explanations for the model data mismatch in amplitude
A striking result in Fig. 4c is that the multimodel fingerprint appears to have too strong a response when compared to the reconstructions, as indicated by many scaling factors being significantly less than unity. A scaling factor less than 1 means that the model response needs to be reduced in amplitude to match those reconstructions. We first check the dependence of this effect upon the degree of smoothing that the model simulations and reconstructions undergo prior to the analysis. Results for when no additional smoothing is added (on top of the decadal Butterworth filter) are shown in Fig. 6a, and results for when an additional 21-yr boxcar filter is used (rather than the normal smoothing length of 11 yr) are shown in Fig. 6b. These figures show that there is some dependence of the calculated scaling values on the smoothing length. With less smoothing the scaling ranges are less consistent with unity, indicating a larger discrepancy in response to forcings in the model simulations compared to the reconstructions. When the smoothing is increased, however, to focus on lower-frequency responses, the model response becomes more consistent with the reconstructions. Many reconstructions now yield scaling factors that are consistent with unity, at least for the more reliable more recent periods. It is worth noting that, although the values of β may be sensitive to the smoothing length, the detectability of the external forcing is not.
It is also possible that the modeled response to volcanism is systematically too large. Comparisons between simulated and observed twentieth-century records suggest a stronger simulated response than that of the observations (see Hegerl et al. 2007b); however, the observations are within the uncertainty range, and the cooling response may have been masked by substantial El Niño events closely following several large eruptions. As the uncertainties in reconstructed forcing and model response are larger prior to the twentieth century, the possibility of an excessively large model response can neither be ruled out nor confirmed based on present data. However, if the response of the multimodel mean to every forcing was systematically too large, then the observed response should be smaller than the model response regardless of the choice of smoothing length. This is not the case (Fig. 6c).
As our previous analyses indicate, this problem seems to be linked to the high-frequency response. Volcanic eruptions play a substantial role over much of the last millennium (see Hegerl et al. 2003, 2007a,b; Miller et al. 2012; Weber 2005) and show the strongest response on short time scales. A visual inspection reveals that some of these seem to be excessively large in the fingerprint relative to the reconstructions. This forcing has large short-term effects; therefore the low scaling factors observed in the high-frequency response could plausibly result from discrepancies between the simulated and observed responses to volcanic forcing, rather than a systematic error in the model response. One possible factor may be errors in the volcanic forcing history. For example, Hegerl et al. (2006) estimated a total uncertainty in the magnitude of the overall volcanic forcing time series of ~35% due to uncertainty relating to the scaling of sulfate measurements in ice cores to the aerosol forcing. This would therefore indicate that a scaling factor as small as about 0.7 might not be inconsistent with the data given forcing uncertainties, which would yield a multimodel mean response consistent with many more of the reconstructions, at least over the best reconstructed periods (Fig. 4). It is further possible that inaccuracies in the implementation or response to the volcanic forcing could play a role (see, e.g., Driscoll et al. 2012; Gent et al. 2011; Timmreck 2012), especially for larger eruptions such as the 1258 eruption because of the coagulation of sulfate aerosol particles (Timmreck et al. 2009). On the other hand, Mann et al. (2012a) showed that a reconstruction displayed less cooling than energy balance models even when forced using the smallest published volcanic forcing estimates (Mann et al. 2012a), although an older density-based record (Briffa et al. 2001) showed volcanic cooling in the past few centuries that was very similar to that simulated by an energy balance model (Hegerl et al. 2003).
Other recent work (Mann et al. 2012a) suggest that this discrepancy could arise from limitations in certain types of proxy information used in temperature reconstructions, in particular tree-ring width temperature proxies which are typically obtained from tree-line proximal environments. This finding has been challenged by Anchukaitis et al. (2012), which in turn has been challenged by Mann et al. (2012b).
To test whether the low scaling factors could be arising solely because of the differences in response to large volcanic eruptions, the detection analysis was repeated with the years surrounding the largest volcanic eruptions masked out. For this analysis, large volcanic eruptions were defined as periods when the aerosol optical depth in the tropics within the Crowley et al. (2008) dataset (which many of the models implement; see Table 2) exceeds 0.25. All these events (viz., three major eruptions in the thirteenth century, Kuwae in the mid-fifteenth century, and Tambora in 1815) plus 5 yr on either side were masked out (indicated by gray bars in Fig. 2) prior to the detection analysis. The results are shown in Fig. 6c and are similar to those calculated using 21-yr smoothing (Fig. 6b). The majority of the scaling factors now lie around unity, indicating that the model response is consistent with the reconstructions. The uncertainty ranges have also increased. This is to be expected, as the large volcanic eruptions represent some of the strongest signals in the record. By masking out large volcanic eruptions, substantial constraints on the scaling factors are removed and the signal-to-noise ratio is reduced.
d. Implications for the detection of recent climate change
We now turn to examining internal climate variability on long time scales. We have two alternative samples of internal variability: one taken from model control simulations and one given by the residual variability in the reconstructed temperature not explained by the fingerprint for external forcing, calculated from Ž [see Eq. (4)]. For the TLS regression to be self-consistent, the variability of the residuals should be comparable to that of the control simulations. Figure 4c shows that the residual from six out of eight reconstructions [ignoring the unscaled Christiansen and Ljungqvist (2011) reconstruction] is consistent with at least one control simulation for the period 1401–1850. The other two show a larger residual over part of the LIA (see Fig. 5) and have poor correlations with the model simulations (see Fig. 3). In contrast, residuals from only four reconstructions are consistent for the longer time period 851–1850 because the largest residuals occur early in the millennium (Fig. 4b), during the MCA, whose peak does not coincide with periods of strong forcing (e.g., high solar activity; see Ammann et al. 2007, and Jungclaus et al. 2010) and which, if the model fingerprints are correct, would point either toward unusually pronounced internal variability (Goosse et al. 2012) or perhaps increased sampling uncertainty and data noise in the reconstructions and/or forcings.
If the control simulations do not adequately sample the full range of the climate's internal variability then it could have a profound impact on many detection studies carried out over the last couple of decades (see Hegerl and Zwiers 2011), as these have mainly relied on samples of internal variability derived from models. To examine if the recent warming is detectably different from internal variability, given the estimates of residual variability calculated here, we examine the largest trends in these estimates of internal variability and compare them to the recent period. Figure 7 shows the recent 50-yr trend (corresponding to 1960–2010) calculated from the HadCRUT4 data (Morice et al. 2012) for all domains considered here compared to estimates of internal climate variability from the reconstructions. For all the reconstructions investigated, this alternative sample of internal variability calculated from the residuals of the regression has 50-yr trends that are much smaller than the recent instrumental trend in the domain reconstructed (this conclusion also holds for 100-yr trends; not shown). Thus, reconstructed temperatures of the last millennium confirm that the contribution by internal climate variability to the recent warming is small, strengthening the claim that internal variability alone is “extremely unlikely” to explain recent warming (Hegerl et al. 2007b).
The recent observed trends are also unusual in the context of total natural climate variability (forced and unforced) since the maximum trends calculated from all the raw reconstructions for preindustrial periods (850–1850) are found to be significantly smaller than the recent 50-yr trend (not shown). This is also true for the multimodel mean; however, several of the individual model simulations contain a small number of slightly larger 50-yr trends associated with the largest volcanic eruptions.
5. Which forcings are important?
To address the question of which external forcing is most important to explain the changes observed, individually forced simulations are required. Here we use multimodel fingerprints from three different GCMs (see Figs. 2c,d and Table 2) to investigate the contribution from natural external forcings (solar and volcanic forcing combined) and from changes in the concentrations of well mixed greenhouse gases, particularly the dip in CO2 recorded over parts of the LIA (see, e.g., MacFarling Meure et al. 2006). The fingerprint method is based on the period 1400–1900, after which other anthropogenic forcings, particularly anthropogenic aerosols and, to a lesser extent, land-use change become increasingly important (e.g., Hegerl et al. 2007b; Tett et al. 2007). This analysis used the TLS detection and attribution method [Eq. (1)], where several scaling factors βi were estimated to fit the fingerprints Xi(t) to the reconstructions Y(t). Several of the model simulations which are used to calculate the fingerprints (see Figs. 2c,d) are themselves calculated as the sum of two simulations, and this was taken into account when estimating the ratio of internal variability in the fingerprints to that in the reconstructions.
The detailed results for three reconstructions and the scaling factors for a larger range of reconstructions are shown in Fig. 8. The combined volcanic and solar fingerprint is detectable in all the reconstructions used and causes large cooling episodes in the mid-fifteenth, seventeenth, and early nineteenth centuries. Since the volcanic signal dominates the volcanic plus solar fingerprint, at least in the models, these results suggest that volcanic forcing is the dominant driver of forced variability in preindustrial SATs for the time period studied here. However, independently from solar and volcanic forcing, a significant temperature change has been detected in response to pre-twentieth-century greenhouse gas variations in all but three reconstructions. This forcing caused a small but sustained cooling during much of the sixteenth and seventeenth centuries with a best estimate of up to ~0.1°–0.2°C (depending on the reconstruction used) relative to the mean temperature for the period 1400–1900 (see Fig. 8a).
The cause of this decrease in CO2 has not been conclusively determined. Some authors (e.g., Ruddiman 2003; Faust et al. 2006; Nevle and Bird 2008) have argued that it could be a consequence of human land-use activity, attributing the decrease in CO2 to a decrease in agricultural usage and therefore a subsequent increase in natural vegetation following the conquest of the Americas (from ~1519 to ~1700). However, Pongratz et al. (2011) suggest that this is unlikely. Yet other studies (Joos et al. 1999; Trudinger et al. 2002) attribute the drop to natural forcings, such as solar and volcanic forcing. It is also possible that internal climate variability could partly explain some of the dip (see, e.g., Jungclaus et al. 2010). Despite this uncertainty in the origin of the reduced greenhouse gas concentration over that period, our paper shows for the first time that this decrease in CO2 and the subsequent slow increase caused a detectible temperature response to greenhouse gases prior to 1900 in many of the records, highlighting the role of greenhouse gas forcing prior to the more recent period of industrial greenhouse gas emissions.
6. Discussions and conclusions
The work presented in this paper examines the role of external forcings on the climate of the last millennium. Consistent with earlier studies (Crowley 2000; Yoshimori et al. 2005; Hegerl et al. 2007a), we find the LIA likely to have been in large part externally forced, since a large fraction of the variance in most reconstructions can be explained by the model simulations and since the model fingerprint for forced variability is detectable at the 5% level in all the reconstructions analyzed.
The variance of the residuals that is not explained by the response to external forcing as simulated in the models is, for the majority of reconstructions, consistent with the variance of control simulations if analyzed over the past 600 yr. There are, however, large differences between the different reconstructions. Several are only poorly correlated to the model simulations and have large residuals that cannot be explained by the estimated radiative forcing even over this shorter interval. Since the uncertainties in the model simulations and reconstructions are independent of each other, the high correlation between the models and some reconstructions is unlikely to be due to chance alone. From attribution analysis using fingerprints of natural (volcanic and solar) and anthropogenic (greenhouse gas forcing), it can be shown that explosive volcanism and changes in solar output combined are the dominant drivers of forced variability over the second half of the last millennium, although greenhouse gas variations are also likely to have significantly contributed to the cold conditions during the period 1600–1800.
The variance of the residuals calculated from the detection analysis encompassing the MCA is for many of the reconstructions larger than the variance of the control simulations during this period. This could be due to increased uncertainty in the reconstructions, for example, due to the declining number of proxies or to errors in the forcing datasets used to drive the models. It could also be due to strong and anomalous periods of internal variability or both.
The 50-yr trends in the samples of internal variability resulting from the detection analysis for the full period analyzed here (850–1950) were compared to the recent 50-yr temperature trend. This shows that, for all the samples of internal variability calculated (even those with higher variance than the control simulations), the largest 50- and 100-yr trend found in reconstructions after removing the forced component is much smaller than that found in the last 50–100 yr of the instrumental record (1960–2010 and 1910–2010). This substantially strengthens the claim that internal variability alone is “extremely unlikely” to explain recent warming (Hegerl et al. 2007b).
For the majority of the reconstructions the detection analysis estimates scaling factors significantly less than unity, indicating that the response to external forcing in the models is stronger than that inferred from the proxy reconstructions. While we cannot rule out that this discrepancy is due to an excessively large response in the multimodel mean to all forcings, this would not explain our finding that the discrepancy between the simulated and reconstructed responses is no longer apparent when disregarding a short period immediately following the largest volcanic eruptions of the past millennium. Possible explanations for this latter observation, as noted earlier, are (i) better fidelity of the low-frequency signal in proxy reconstructions or (ii) possible loss of fidelity of certain types of proxy data (particularly tree-ring data) in resolving very large volcanic cooling episodes. Other possible factors are (iii) uncertainties in the magnitude of the volcanic forcing used in the multi model ensemble used here, (iv) uncertainty in the representation of volcanic forcing within the models, (v) errors in the response of the models to volcanic cooling, or some combination of all of these factors.
To conclude, this paper builds on previous studies looking at the detection and attribution of the causes of climate change in NH temperature reconstructions, such as those by Hegerl et al. (2003, 2007a). This work uses an ensemble of GCM simulations, many of which have only just become available as part of the CMIP5/PMIP3 initiative, as well as many more reconstructions compared to earlier results using fewer simulations, less reliable forcing estimates and sometimes energy balance models. Our analysis also pushes detection of the forced response back to 850 in many cases.
Our results have enabled us to better place the recent warming in the context of long-term change, have strengthened the evidence for the importance of natural forcing in the climate of the last millennium, and have highlighted that the model–reconstruction discrepancy in the response to volcanic eruptions, as well as significant differences in the magnitude of the MCA, cannot be fully explained by our understanding of internal variability. We also detect, for the first time, a preindustrial greenhouse gas signal prior to 1900 in many of the records.
The publication was supported by NERC Grant NE/G019819/1 and NCAS via a CMIP5 grant and a long-term grant. M.E.M. acknowledges support from the ATM program of the National Science Foundation (Grant ATM-0902133).
We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table 2 of this paper) for producing and making available their model output. For CMIP, the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.
We acknowledge the help of Bette Otto-Bliesner, Johann Jungclaus, Gavin Schmidt, and Jie Zhang for help obtaining and using the model data and Gareth Jones for making a long HadCM3 control simulation available for our use and thank three anonymous reviewers for the insightful comments that helped improve the manuscript.