1. Introduction
Evidence of how climate has changed in past centuries can inform our assessment of the anthropogenic role in observed twentieth-century warming (e.g., Folland et al. 2001). The lack of widespread instrumental surface temperature estimates prior to the mid-nineteenth century (e.g., Jones et al. 1999) places particular emphasis on the need to reconstruct the history of climate changes accurately, which can only be achieved via the careful use of long-term empirical evidence. Such empirical evidence comes from “proxies” of climate variability derived from the environment itself and from documentary evidence (Le Roy Ladurie 1971; Wigley et al. 1981; Crowley and North 1991; Bradley and Jones 1995; Bradley 1999; Jones et al. 2001a).
Particularly useful in this context are high-resolution (annually or seasonally resolved) proxies such as tree rings (e.g., Fritts et al. 1971; Fritts 1991; Briffa et al. 1994, 1998a, b, 2001), corals (e.g., Evans et al. 2002; Hendy et al. 2002), ice cores (O’Brien et al. 1995; Appenzeller et al. 1998; Meeker and Mayewski 2002), lake sediments (Hughen et al. 2000), and long documentary and instrumental series (Pfister et al. 1998; Luterbacher et al. 1999), all of which may be combined into “multiproxy” assemblages (Bradley and Jones 1993; Overpeck et al. 1997; Mann 2002a, b; Mann et al. 1998; Crowley and Lowery 2000; Folland et al. 2001; Jones et al. 1998, 2001a; Cook et al. 2002; Luterbacher et al. 2002a, b). A critical advantage of using such high-resolution proxy data is the possibility of comparing the proxies against long temporally overlapping instrumental records both to estimate the climate signal in the data (calibration) and independently test the reliability of the signal (verification or cross validation).
Annually resolved proxy indicators have been used to reconstruct spatial climate fields such as sea level pressure (SLP; Fritts 1991; Luterbacher et al. 2002a, b; Meeker and Mayewski 2002), terrestrial surface air temperature (SAT; Briffa et al. 1994, 1998a, 2002b), continental drought (Cook et al. 1999), sea surface temperature (SST; Evans et al. 2002), and the combined global SAT–SST temperature field (Mann et al. 1998, 1999). These reconstructed fields have been spatially averaged to yield estimates of hemispheric mean temperature (e.g., Osborn et al. 2004, manuscript submitted to Global Planet. Change, hereafter OSB; Mann et al. 1998, 1999) or circulation/SST indices such as the Niño-3 index of El Niño–Southern Oscillation (ENSO; Mann et al. 2000a, b) and the North Atlantic Oscillation (NAO; Luterbacher et al. 2002b; Cook 2002). Unlike hemispheric mean reconstructions, spatial field reconstructions retain vital information that can provide insight into the mechanisms or forcing underlying observed variability (e.g., Briffa et al. 1994, 2002a, b; Cook et al. 1997; Delworth and Mann 2000; Shindell et al. 2001; Waple et al. 2002; Braganza et al. 2003).
Annually resolved proxy networks have also been used to directly reconstruct indices of climate variability such as the NAO (D’Arrigo et al. 1993; Appenzeller et al. 1998; Cullen et al. 2001; Mann 2002b; Cook et al. 2002), the Pacific decadal oscillation (PDO; Biondi et al. 2001; Gedalof et al. 2002), ENSO [including the Niño-3 (Mann et al. 2000a, b) and Southern Oscillation (Stahle et al. 1998) indices], and hemispheric mean temperature series (Jacoby and D’Arrigo 1989; Bradley and Jones 1993; Overpeck et al. 1997; Jones et al. 1998; Briffa et al. 1998a, 2001, 2002a; Crowley and Lowery 2000; Mann and Jones 2003). Such approaches are potentially limited by the assumed relationship between local variables recorded by the proxies (temperature and precipitation) and larger-scale climate patterns, since the relationship between local and large-scale influences may change over time (e.g., Jones et al. 2003b).
Of particular interest in this study are various recent reconstructions of NH temperature from proxy data networks (Bradley and Jones 1993; Overpeck et al. 1997; Briffa et al. 1998a, b, 2001; Jones et al. 1998; Mann et al. 1998, 1999; Mann 2002a; Crowley and Lowery 2000). Most reconstructions show notable overall similarity (Mann 2000, 2001, 2002a; Briffa and Osborn 2002; Jones et al. 1998, 2001a; Folland et al. 2001; Mann and Jones 2003; Mann et al. 2003a, b). For example, the late-twentieth-century warmth is unprecedented in the context of the past 1000 yr in all reconstructions given the published estimates of uncertainty in the reconstructions (e.g., Folland et al. 2001; Jones et al. 2001a; Mann et al. 2003b; Cook et al. 2004). In addition, the empirical reconstructions generally show considerable similarity to independent climate model simulations (Free and Robock 1999; Crowley 2000; Shindell et al. 2001; Gerber et al. 2003; Bertrand et al. 2002; Bauer et al. 2003), with isolated exceptions (Gonzalez-Rouco et al. 2003).
Some differences do exist, however, among hemispheric temperature reconstructions, with certain reconstructions (e.g., Esper et al. 2002) indicating greater peak cooling in past centuries than others (see also Briffa and Osborn 2002; Mann and Hughes 2002; Mann 2002a; Mann et al. 2003b). It is important to try to understand the sources of the differences between the various NH temperature reconstructions. This undertaking is complicated by the fact that several distinct factors in varying combinations could be responsible for the differences between reconstructions. One factor is the method employed to assimilate the information from proxy data networks into a reconstruction of past climate. The simplest method is to construct an unweighted average of a set of “standardized” proxy series believed to represent a particular quantity (e.g., temperature, or an index of ENSO). The single composite series can then be scaled against an appropriate target index. For example, a composite of proxy indicators known (or assumed or shown by correlation) to reflect local surface temperatures can be scaled against the instrumental Northern Hemispheric mean temperature record during the period when proxy and instrumental data overlap. The scaled series is then interpreted as an NH mean temperature reconstruction based on the proxy data (e.g., Bradley and Jones 1993; Jones et al. 1998; Crowley and Lowery 2000; Mann and Jones 2003). Similarly, one can composite indicators believed to be sensitive to ENSO and scale the composite to the instrumental Southern Oscillation index (SOI) to yield an SOI reconstruction (Stahle et al. 1998). Alternatively, a large number of local or regional regressions between proxy indicators and instrumental data can be used to build up a reconstruction of an entire field. Such “local calibration” approaches assume a local relationship between predictor (e.g., maximum tree-ring latewood density) and climate variable (e.g., summer surface air temperature; Briffa et al. 1998a, 2001, 2002a, b).
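As an illustration of the simplest method described above, the following is a minimal sketch of a composite-and-scale reconstruction. It assumes that "scaling" means matching the mean and standard deviation of the composite to those of the target index over the overlap period; regression-based scaling is another possibility. The function name `composite_and_scale` and its arguments are hypothetical and are not taken from any of the cited studies.

```python
import numpy as np

def composite_and_scale(proxies, target, calib):
    """Composite-and-scale sketch (hypothetical helper, not a published code).

    proxies : (n_years, n_proxies) array of proxy series
    target  : (n_years,) instrumental index; only values where calib is True are used
    calib   : (n_years,) boolean mask marking the proxy/instrumental overlap
    """
    # Standardize each proxy over the calibration (overlap) interval.
    mu = proxies[calib].mean(axis=0)
    sd = proxies[calib].std(axis=0, ddof=1)
    z = (proxies - mu) / sd
    # Unweighted average of the standardized series.
    comp = z.mean(axis=1)
    # Assumption: scale by matching mean and variance over the overlap.
    c_mu, c_sd = comp[calib].mean(), comp[calib].std(ddof=1)
    t_mu, t_sd = target[calib].mean(), target[calib].std(ddof=1)
    return t_mu + (comp - c_mu) * (t_sd / c_sd)
```

Variance matching guarantees that the reconstruction has the same calibration-period amplitude as the target, whereas a regression-based scaling would reduce the amplitude in proportion to the proxy noise.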
A more elaborate approach is to use a climate field reconstruction (CFR) technique (see Smith et al. 1996; Kaplan et al. 1997; Schneider 2001; Mann and Rutherford 2002; Rutherford et al. 2003) to reconstruct a large-scale field from a proxy data network through multivariate calibration of the large-scale information in the proxy data network against instrumental data (see also Fritts et al. 1971; Guiot 1985, 1988; Fritts 1991; Cook et al. 1994; Mann et al. 1998, 1999, 2000a, b; Mann and Rutherford 2002; Luterbacher et al. 2002a, b; Evans et al. 2002; Pauling et al. 2003). The CFR approach does not assume any a priori local relationship between proxy indicator and the climatic field being reconstructed. For example, a proxy sensitive to convection/rainfall in the central tropical Pacific (indicative of ENSO variability) can be used to calibrate the surface temperature patterns associated with ENSO even though the proxy itself is not related to local temperature. In this manner, a large-scale climate field can often be efficiently reconstructed through CFR techniques from a relatively modest network of indicators (e.g., Bradley 1996; Evans et al. 1998; Mann and Rutherford 2002; Zorita et al. 2003). Such methods arguably depend more heavily on assumptions about the stationarity of relationships between proxy indicators and large-scale patterns of climate variability than the local calibration approach. Model experiments suggest that this probably is not problematic for the range of variability inferred for recent past centuries (Rutherford et al. 2003). Reconstructions of the more distant past [e.g., the mid-Holocene (Bush 1999; Clement et al. 2000)] would require, however, a more careful consideration of stationarity issues.
A second complicating factor in comparing different reconstructions involves the potentially different character of the proxy network used to produce the reconstruction. Some proxy networks consist of only one specific type of proxy information [e.g., tree-ring maximum latewood density measurements (Briffa et al. 1998a, b, 2001)], while other multiproxy networks combine several types of proxy information [tree-ring width and density measurements, indicators derived from corals, ice cores, lake sediments, and historical documentary records (Mann et al. 1998, 1999, 2000a, b)]. Proxy networks can differ in their sensitivity to specific meteorological variables. Extratropical high-latitude tree-ring networks typically provide warm-season temperature information, while tree-ring information from lower-latitude semiarid/Mediterranean or tropical environments, corals, ice cores, and documentary records provides variable seasonal information regarding distinct climate variables. In addition, proxy networks often differ dramatically in the number of proxies used, ranging from a handful of very long proxies (Jones et al. 1998; Crowley and Lowery 2000; Esper et al. 2002) to a potentially much larger [hundreds (e.g., Mann et al. 1998, 1999; Briffa et al. 1998a, b, 2001)] but temporally variable set of proxies. The sampling error in hemispheric estimates based on the latter is likely to be smaller than that in the former, but errors will increase back in time leading to expanding uncertainties in earlier periods (e.g., Mann et al. 1999; Jones et al. 2001a).
An additional factor is the target season of the reconstruction (annual mean, boreal warm season, or boreal cold season), as discussed by Briffa and Osborn (2002) and Jones et al. (2003a). The target season is, to some extent, constrained by the particular proxies used. However, it is not always possible to know the precise mix of seasonal information in the proxy network a priori. This is particularly true in large-scale CFR where a precipitation-sensitive proxy may, for example, be an important predictor of a large-scale temperature pattern, as discussed above. In this case, the optimal target seasonal window can nonetheless be evaluated through calibration and cross-validation exercises (e.g., Mann et al. 2000b).
A final related factor is the target region of the reconstruction. Tropical SSTs are, for example, typically less variable than extratropical, continental surface air temperatures, so a reconstruction targeting the entire NH [land and ocean, tropics and extratropics (e.g., Jones et al. 1998; Mann et al. 1999; Crowley and Lowery 2000; Mann and Jones 2003)] is likely to yield smaller amplitude variability than one targeting extratropical continental regions only (e.g., Briffa et al. 1998a, b, 2001; Esper et al. 2002).
One approach used to compare reconstructions based on proxy data with different seasonal or regional emphases is to rescale the reconstruction against an appropriate target index. For example, a reconstruction based on extratropical land-only proxies might still be rescaled to the full NH instrumental mean series, or a reconstruction based on annual proxies might still be scaled to a warm-season instrumental hemispheric mean series (Briffa and Osborn 2002). However, there are some pitfalls to this approach. Any similarity between patterns of temperature change in different seasons and regions over the instrumental record may be relatively unique to the late nineteenth and twentieth centuries. Seasonal temperature trends show greater differences in prior centuries (see, e.g., Jones et al. 2003a; Luterbacher et al. 2004), and preanthropogenic, natural forcing appears to have a different spatial and seasonal temperature signature from anthropogenic forcing (Shindell et al. 2003). Moreover, although tropical and extratropical temperature trends are similar during the instrumental period, there is some evidence that they may have been quite different in past centuries (Hendy et al. 2002; Cobb et al. 2003). More sophisticated approaches to dealing with differing seasonal and spatial emphases are thus preferable.
The intent of this study is to provide a systematic assessment of the relative impacts of these four factors on published large-scale surface temperature reconstructions. We do this based on the use of two different reconstruction techniques: (a) local calibration and (b) large-scale CFR. We analyze two nearly independent networks of predictors, one that is globally extensive (land and ocean, tropical and extratropical), represents various seasons, and consists of multiple proxy types [multiproxy (Mann et al. 1998)], and another that is extratropical and terrestrial, is reflective primarily of warm-season conditions, and is based entirely on maximum latewood tree-ring density (MXD; Briffa et al. 1998a, b, 2001, 2002a, b). For each of the two proxy networks, reconstructions are performed for three different target seasons (annual mean, boreal cold season, and boreal warm season), and the resulting NH mean temperature reconstructions are compared based on averages over distinct spatial domains (full NH land and ocean and extratropical land regions only). Additional insights are obtained from comparisons with other published NH temperature reconstructions (Mann et al. 1998, 1999; Briffa et al. 1998a, b, 2001; OSB; Esper et al. 2002).
2. Data
a. Instrumental surface temperature data
We use the 5° latitude × 5° longitude Climatic Research Unit (CRU) grid-box surface temperature dataset available from 1856 to the present to calibrate and reconstruct the surface temperature field from proxy data networks (note: the surface temperature data are available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a). The data consist of SAT over land and SST over the oceans (Jones et al. 1999, 2001b; Jones and Moberg 2003). We use the HadCRUT dataset (Jones et al. 2001b) rather than the more recently published HadCRUT2 (Jones and Moberg 2003) version. We use the restricted period 1856–1971 for calibration as discussed below but consider the resulting reconstructions in the context of the entire record (1856–1998). This instrumental surface temperature dataset exhibits some differences from the older instrumental surface temperature dataset (Jones 1994) used by Mann et al. (1998).
We averaged the monthly mean data into annual (calendar), boreal warm (April to September) and cold (October to March) seasonal averages, the target seasons for subsequent reconstructions. In the case of cold-season averages, our convention is to designate the year as corresponding to the early (October–December) rather than late (January–March) half of the 6-month interval. For example, the 1815 cold season is October 1815 through March 1816. The averaged (and raw) data are both temporally and spatially incomplete because of a lack of available data at a given location and time. In particular, the data coverage during the nineteenth century is relatively sparse compared to that of the latter half of the twentieth century. To produce a complete instrumental field, we infilled the missing instrumental values for seasonal and annual mean values using the regularized expectation maximization (REGEM) method as described by Schneider (2001; see also Mann and Rutherford 2002; Rutherford et al. 2003; Zhang et al. 2004). Only the Northern Hemisphere data (grid centers at 2.5°N through 67.5°N; the instrumental data are extremely sparse poleward of 67.5°N) were used in this study. Spatial means, including the NH mean, are constructed from areally weighted averages of the grid-box data. The correlation between the time series of the NH mean based on available data only and the REGEM infilled field is r = 0.98 (over the 1856–1971 period).
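The seasonal averaging and areal weighting described above can be sketched as follows. This is an illustrative outline only: it assumes equal weighting of months regardless of their length, and it approximates grid-box area by the cosine of the latitude of the box center. The function names are hypothetical.

```python
import numpy as np

def seasonal_means(monthly, first_year):
    """monthly: 1-D array of monthly means starting in January of first_year,
    with a length that is a multiple of 12 (assumed for simplicity)."""
    m = np.asarray(monthly, dtype=float).reshape(-1, 12)  # rows = calendar years
    years = first_year + np.arange(m.shape[0])
    annual = m.mean(axis=1)                # January to December
    warm = m[:, 3:9].mean(axis=1)          # April to September
    # Cold season of year y = October-December of y plus January-March of y+1,
    # so it is labeled by its early half and is undefined for the final year.
    cold = np.full(m.shape[0], np.nan)
    cold[:-1] = (m[:-1, 9:12].sum(axis=1) + m[1:, 0:3].sum(axis=1)) / 6.0
    return years, annual, warm, cold

def area_weighted_mean(field, lat_centers_deg):
    """field: (n_lat, n_lon) grid-box values, NaN where missing.
    Weights are proportional to cos(latitude), an approximation to box area."""
    w = np.cos(np.deg2rad(lat_centers_deg))[:, None] * np.ones_like(field)
    valid = ~np.isnan(field)
    return np.sum(field[valid] * w[valid]) / np.sum(w[valid])
```

With this labeling, the value returned for 1815 in the `cold` array covers October 1815 through March 1816, matching the convention stated in the text.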
b. Proxy data
We used two largely independent predictor networks to assess the sensitivity of the temperature reconstructions to the network used. The first of these is a multiproxy dataset used by Mann and coworkers (Mann et al. 1998, 1999, 2000a, b; Mann 2002b) to reconstruct global patterns of annual mean surface temperature (SAT over land and SST over ocean) in past centuries. The second dataset consists entirely of MXD data used by Briffa and coworkers (Briffa et al. 1998a, b, 2001, 2002a, b; OSB) to reconstruct extratropical terrestrial warm-season SAT. Strictly speaking, the two networks are not entirely independent because they share a small number of tree-ring density series [19, or 4.6%, of the 415 series used by Mann et al. (1998) and 5% of the 387 series contributing to the MXD network, are common to both networks]. In addition, many of the ring-width series from Russia used by Mann et al. (1998) were from sites for which density data were used by Briffa and coworkers. We also prepared a third “combined” network by combining both networks.
1) Multiproxy–PC dataset
The multiproxy–principal component (PC) network (Mann et al. 1998) is a combination of annually resolved proxy indicators including tree-ring chronologies (ring width and density), ice cores (stable isotope, ice melt, and ice accumulation data), coral records (stable isotope and fluorescence data), and long historical and instrumental records (temperature and precipitation) from the Tropics and extratropics of both hemispheres. (The data in the multiproxy–PC network are available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a.) The individual proxies in the network were chosen not for their reliability as local indicators of temperature, but for their potential relationship with some seasonal meteorological or climatic variable tied to larger-scale patterns of climate, and surface temperature, change. In areas with spatially dense tree-ring networks, principal components analysis was used to extract the leading PCs from the network. Although 415 individual proxy series were used, data reduction by using leading PCs of tree-ring networks results in a smaller set of 112 indicators in the multiproxy–PC network available back to 1820 (Fig. 1a), with a decreasing number of indicators available progressively further back in time. Twenty-two of the indicators (representing 95 individual proxy series) extend back to at least a.d. 1400. Many of the indicators in the network end at or near 1980, motivating a termination of the calibration interval at 1980 by Mann et al. (1998), with a modest subset of series terminating between 1971 and 1980, infilled by persistence of the final available value through to 1980. We terminate the calibration period at 1971 in this study to avoid any possible influence of the infilling process used by Mann et al. (1998).
It should be noted that some reported putative “errors” in the Mann et al. (1998) proxy data claimed by McIntyre and McKitrick (2003) are an artifact of (a) the use by these latter authors of an incorrect version of the Mann et al. (1998) proxy indicator dataset and (b) their apparent misunderstanding of the methodology used by Mann et al. (1998) to calculate PC series of proxy networks over progressively longer time intervals. In the Mann et al. (1998) implementation, the PCs are computed over different time steps so that the maximum amount of data can be used in the reconstruction. For example, if a tree-ring network comprises 50 individual chronologies that extend back to a.d. 1600 and only 10 of those 50 extend to a.d. 1400, then calculating one set of PCs from 1400 to 1980 [the end of the Mann et al. (1998) calibration period] would require the elimination of 40 of the 50 chronologies available back to a.d. 1600. By calculating PCs for two different intervals in this example (1400–1980 and 1600–1980) and performing the reconstruction in a stepwise fashion, PCs of all 50 series that extend back to a.d. 1600 can be used in the reconstruction back to a.d. 1600 with PCs of the remaining 10 chronologies used to reconstruct the period from 1400 to 1600. The latter misunderstanding apparently led McIntyre and McKitrick (2003) to eliminate roughly 70% of the proxy data used by Mann et al. (1998) prior to a.d. 1600 (McIntyre and McKitrick 2003, their Table 7), including 77 of the 95 proxy series used by Mann et al. (1998) prior to a.d. 1500. This elimination of data gave rise to spurious warmth during the fifteenth century in their reconstruction, sharply at odds with virtually all other empirical and model-based estimates of hemispheric temperature trends in past centuries (see, e.g., Jones and Mann 2004).
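The stepwise PC calculation in the 50-chronology example above can be sketched as follows. This is a schematic illustration under simplifying assumptions, not the Mann et al. (1998) code: PCs are obtained from a singular value decomposition of series standardized over each interval, and the number of PCs retained (`n_pcs`) is a hypothetical parameter.

```python
import numpy as np

def leading_pcs(data, n_pcs):
    """data: complete (n_years, n_series) array. Returns (n_years, n_pcs) PC series."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

def stepwise_pcs(data, years, start_years, n_pcs):
    """data: (n_years, n_series) with NaN before each series begins.
    For every step, PCs are computed from all series that are complete
    from that step's start year through the end of the record."""
    out = {}
    for start in start_years:
        block = data[years >= start]
        usable = ~np.isnan(block).any(axis=0)
        out[start] = (int(usable.sum()), leading_pcs(block[:, usable], n_pcs))
    return out
```

Called with start years 1400 and 1600 on a network in which 10 of 50 chronologies reach back to a.d. 1400, the 1600 step draws on all 50 series while the 1400 step draws on the 10 longest, so no chronology is discarded from the interval it actually covers.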
2) Maximum latewood density dataset
The MXD network (Briffa et al. 2001, 2002a, b) is primarily a reflection of growing (warm season) conditions, though some limited cold-season information is also apparent in the data (Briffa et al. 2002a). The version of the MXD dataset used here was compiled using a combination of grid-box estimates based on traditionally standardized MXD records (with limited low-frequency information) and regional estimates developed to retain low-frequency information (OSB; the data in the MXD network are available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a). The latter were developed using the age-band decomposition (ABD) method of standardization, wherein density data from trees of similar ages are averaged to create long chronologies with minimal effect of tree age and size (Briffa et al. 2001). The ABD method is designed to preserve low-frequency information in tree-ring data that may be reduced when more traditional methods to remove long-term growth trends are used (see Cook et al. 1995). Because the age-banding method requires large numbers of samples throughout the time period being studied, it has been applied only at a regional scale for the MXD network used here, rather than at the level of the 387 original site chronologies.
OSB therefore worked first with the traditionally standardized data at the individual chronology scale and gridded them to provide values in 115 5° by 5° grid boxes (26 available back to a.d. 1400) in the extratropical NH (Fig. 1b). They then developed temperature reconstructions by the local calibration of the MXD grid-box data against the corresponding instrumental grid-box temperatures. The “missing” low-frequency temperature variability was then identified as the difference between the 30-yr smoothed regional reconstructions of Briffa et al. (2001) and the corresponding 30-yr smoothed regional averages of the gridded reconstructions. OSB add this missing low-frequency variability to each grid box in a region. After roughly 1960, the trends in the MXD data deviate from those of the collocated instrumental grid-box SAT data for reasons that are not yet understood (Briffa et al. 1998b, 2003; Vaganov et al. 1999). To circumvent this complication, we use only the pre-1960 instrumental record for calibration/cross validation of this dataset in the CFR experiments.
3. CFR reconstruction method
a. REGEM approach
Various mathematical techniques have been applied to the problem of CFR from sparse data (Smith et al. 1996; Kaplan et al. 1997; Schneider 2001), including applications to paleoclimate field reconstruction (Cook et al. 1994; Mann et al. 1998; Luterbacher et al. 2002a, b; Evans et al. 2002). Here we use the REGEM method described by Schneider (2001), which offers several theoretical advantages over other methods of CFR. (Note: Matlab scripts are available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a.) The REGEM method is an iterative method for estimating missing data through the estimation of means and covariances from an incomplete data field to impute missing values in a manner that makes optimal use of the spatial and temporal information in the dataset. When a reconstruction is sought from proxy data based on calibration against modern instrumental measurements, the combined (proxy plus instrumental) dataset can be viewed as an incomplete data matrix, which contains both instrumental data (surface temperature grid-box values arranged with rows representing the years and columns representing grid boxes) and proxy data (proxy time series with rows representing the years and columns representing the proxy used). The columns of the matrix (i.e., the instrumental gridpoint data and proxy indicators) are standardized to have zero mean and unit standard deviation over the calibration interval.
Missing values in this matrix represent the unknown preinstrumental surface temperature values and are considered as values to be imputed through an iterative infilling of the data matrix making use of the covariance information between all available (instrumental and proxy) data. By analogy with conventional paleoclimate reconstruction approaches (see, e.g., Rutherford et al. 2003), a calibration interval can be defined as the time interval over which the proxy and instrumental data overlap, while a verification interval is defined by additional cross-validation experiments in which an appropriate subset of the available instrumental data are withheld from the calibration process (e.g., through their specification as missing values in the initial matrix). Schneider (2001) provides a detailed description of the REGEM algorithm, including a comparison with conventional methods such as principal components regression and application to the infilling of missing values in climate field data, while Rutherford et al. (2003), Mann and Rutherford (2002), and Zhang et al. (2004) discuss specific applications to paleoclimate reconstruction. The REGEM method has been shown to perform well even in the presence of nonstationary climate forcing, as long as the leading patterns underlying low-frequency variability are captured in calibration (Rutherford et al. 2003).
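To make the idea of iterative infilling concrete, the following is a deliberately simplified sketch of a regularized, EM-style imputation on an incomplete data matrix. It is not the REGEM algorithm of Schneider (2001): it uses a fixed ridge parameter (`ridge`, a hypothetical argument) rather than an adaptively chosen one, and it omits the residual covariance correction in the covariance update. It is intended only to show how the mean and covariance estimates and the imputed values are updated in alternation.

```python
import numpy as np

def simple_regularized_em(x, ridge=0.1, n_iter=50):
    """x: (n_years, n_vars) array with NaN marking missing values.
    Returns a filled copy. Simplified sketch, see caveats in the text."""
    x = np.array(x, dtype=float)
    miss = np.isnan(x)
    # Initial guess: column means of the available values.
    filled = np.where(miss, np.nanmean(x, axis=0), x)
    patterns = np.unique(miss, axis=0)       # distinct missing-value patterns
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        cov = np.cov(filled, rowvar=False)
        for pat in patterns:
            if not pat.any() or pat.all():
                continue                     # nothing to impute, or nothing to use
            rows = (miss == pat).all(axis=1)
            avail = ~pat
            s_aa = cov[np.ix_(avail, avail)]
            s_am = cov[np.ix_(avail, pat)]
            # Ridge-regularized regression of missing on available variables.
            reg = s_aa + ridge**2 * np.diag(np.diag(s_aa))
            b = np.linalg.solve(reg, s_am)
            filled[np.ix_(rows, pat)] = mu[pat] + (filled[np.ix_(rows, avail)] - mu[avail]) @ b
    return filled
```

In a reconstruction setting, the instrumental columns would be set to NaN for all preinstrumental years, and withholding part of the instrumental record in the same way provides a verification interval.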
We have modified the application of the method in two ways to improve its performance for long-term CFR. This includes, first, implementing a stepwise approach, where we reconstruct the field back in time in discrete steps to accommodate changing availability of data and, second, incorporating a hybrid frequency domain approach where both the proxies and the instrumental calibration data are decomposed into two frequency bands prior to reconstruction. These modifications are discussed below.
b. Stepwise modification of REGEM
The REGEM approach was in all cases applied in a stepwise fashion to make increasingly better use of low-frequency information in the calibration process. The reconstruction is performed one step at a time, using all available climate field information (both instrumental field and proxy-reconstructed extension thereof) in the calibration process for the reconstruction of each subsequent step back in time. For example, in the first reconstruction step, the (infilled) instrumental data are available from 1856 to 1971, and the proxies extend back to a.d. 1400. This leaves 456 yr (1400–1855) in which the entire 1008 NH grid boxes (2.5° to 67.5°N at 5° centers) are missing. Rather than reconstructing all 456 yr at one step, we first reconstruct 1800–55, producing a complete NH field (1008 grid boxes) from 1800 to 1971. We then use the completed 1800–1971 data as input into the next step with the proxies extending back to 1700. In this step, the interval 1700–99 is reconstructed. The process continues until the reconstruction is complete back to the targeted beginning date (in this case, a.d. 1400). In the case of the multiproxy–PC and combined networks, the step lengths are constrained by the network because the PCs of the dense tree-ring networks are recalculated over discrete time intervals. Thus the multiproxy–PC and combined networks require some type of stepwise approach with step lengths dictated by the calculation of the PCs. For consistency with the Mann et al. (1998) approach, we use the same step lengths here for both the multiproxy–PC and combined networks. The MXD network has no such constraints, and we chose a step length of 100 yr, but the results are insensitive to the exact step length chosen.
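The bookkeeping for the stepwise procedure with a fixed 100-yr step can be sketched as follows. The helper names are hypothetical, and the sketch assumes that step boundaries fall on multiples of the step length, as in the 1800–55 and 1700–99 intervals given above.

```python
def step_schedule(first_year, instr_start, step):
    """Return (start, end) reconstruction intervals, most recent first,
    covering first_year through instr_start - 1."""
    steps = []
    end = instr_start - 1
    while end >= first_year:
        start = max(first_year, (end // step) * step)
        steps.append((start, end))
        end = start - 1
    return steps

def n_grid_boxes(lat_min=2.5, lat_max=67.5, res=5.0):
    """Number of grid boxes for box centers from lat_min to lat_max at all longitudes."""
    n_lat = int(round((lat_max - lat_min) / res)) + 1
    n_lon = int(round(360.0 / res))
    return n_lat * n_lon

if __name__ == "__main__":
    for start, end in step_schedule(1400, 1856, 100):
        print(start, end)
    print(n_grid_boxes())
```

Each interval returned by `step_schedule` would be reconstructed in turn, with the field completed in earlier steps added to the calibration data for the next step back in time.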
Because of the shortness of the instrumental record, one cannot gauge the relative performance of the stepwise versus nonstepwise approaches through cross-validation experiments using the actual instrumental record. Instead, we used a network of synthetic proxy data (“pseudoproxy”) derived from long control and forced integrations of the Geophysical Fluid Dynamics Laboratory’s R30 coupled ocean–atmosphere model (Knutson et al. 1999) to test the relative performance of the two methods. We used the approach described by Mann and Rutherford (2002) to derive networks of synthetic proxy data from the model surface temperature field. In these tests, 450 yr of the control run were combined with 150 yr of the forced run to create a continuous and complete temperature field qualitatively similar in character to reconstructed temperature histories over the past six centuries. We constructed 112 pseudoproxies (the same number as is in the multiproxy–PC indicator network back to 1820), from the modeled temperature field, and selected an increasingly sparse subset of the 112 indicators back in time to emulate the decrease in the size of the actual proxy networks back in time. The final 150 yr were used for calibration to reconstruct the preceding 450 yr using only the information available in the pseudoproxy network. The stepwise approach performed as well as or better than the nonstepwise approach in cross validation in each of these experiments. The results of these pseudoproxy experiments give us some confidence that the primary conclusions presented in this study are insensitive to whether the stepwise or nonstepwise approach is used.
c. Hybrid frequency domain modification of REGEM
We modified the REGEM method (Schneider 2001; Mann and Rutherford 2002; Rutherford et al. 2003) to employ a hybrid frequency domain calibration approach, in which the combined proxy/instrumental dataset is split into two distinct datasets, through application of a low-pass filter to the data. The low-pass component of the data defines the low-frequency component, while the residual defines the high-frequency component. The frequency split boundary can be varied arbitrarily, but reasonable constraints on the appropriate choice are, at the high-frequency end, the Nyquist frequency (f = 0.5 cpy for annual or seasonal mean data) divided by two or so (i.e., f = 0.25 cpy) and, at the low-frequency end, the Rayleigh frequency (f = 0.01 cpy for, e.g., 100 yr of data) multiplied by two or so (i.e., f = 0.02 cpy). This corresponds to a high-frequency–low-frequency band split at periods between 4 and 50 yr for a 100-yr interval. As described below, cross-validation experiments motivate the choice f = 0.05 cpy (20-yr period) for the split frequency in almost all cases.
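The band split and the bounds on the split frequency can be illustrated with a short sketch. The low-pass filter used here is an idealized "brick-wall" filter applied in the Fourier domain, which is an assumption made for simplicity; the text does not specify the filter, and a practical implementation would need to treat the ends of the series with more care. The function names are hypothetical.

```python
import numpy as np

def split_bounds(n_years, dt=1.0):
    """Rough lower and upper limits (in cycles per year) for the split frequency."""
    nyquist = 0.5 / dt
    rayleigh = 1.0 / (n_years * dt)
    return 2.0 * rayleigh, nyquist / 2.0

def split_bands(x, f_split=0.05, dt=1.0):
    """Return (low, high): the low-pass component and the residual."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=dt)
    spec = np.fft.rfft(x)
    # Assumption: ideal low-pass filter that keeps frequencies up to f_split.
    low = np.fft.irfft(np.where(freqs <= f_split, spec, 0.0), n=x.size)
    return low, x - low
```

Because the high-frequency component is defined as the residual, the two bands always sum to the original series, so recombining separately calibrated bands loses no part of the signal.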
There are two primary motivations for the hybrid frequency domain approach. Different types of proxy data exhibit fundamentally different frequency domain fidelity characteristics (Jones et al. 1998). Conventionally standardized tree-ring data, if based on short constituent segments, are unlikely to resolve century or longer time-scale variability (e.g., Cook et al. 1995), while very conservatively standardized tree-ring data based on long constituent segments may resolve century-scale and longer variability (Briffa et al. 1996; Cook et al. 1995). Other proxy indicators, such as annually laminated lake sediments or ice core variables subject to diffusion (Fisher et al. 1996), may preferentially resolve decadal and lower-frequency variability (e.g., Bradley 1999). Furthermore, some proxies may themselves exhibit different climate responses at low and high frequencies (e.g., LaMarche 1974; Osborn and Briffa 2000; Hughes and Funkhouser 2003). The underlying patterns of climate variability may also exhibit time-scale dependence. Interannual time-scale variability may be dominated by processes such as ENSO and the NAO, while lower-frequency variability may be dominated by modes involving the overturning ocean circulation (e.g., Delworth and Mann 2000) or the response to global radiative forcing (e.g., Crowley 2000). Distinguishing between patterns of high- and low-frequency variability may thus provide a more efficient means of calibration of the large-scale patterns of climate variability and permit the use of a wider range of natural archives.
Our hybrid frequency domain calibration approach involves the use of two distinct frequency bands in the calibration process. In the limit of an increasingly large number of distinct frequency bands, this approach would become analogous to the spectral canonical regression approach described by Guiot (1985), in which the calibration process is performed explicitly in the frequency domain rather than the time domain. In such a case, however, the small number of statistical degrees of freedom in calibrating the lowest-frequency bands of variance leads to a poorly constrained characterization of variability in the lowest frequencies. Employing a two-band hybrid calibration approach represents a trade-off between the ability to adequately distinguish distinct patterns of variability with respect to time scale and yet retain adequate statistical degrees of freedom to characterize and calibrate both bands of variability.
The REGEM method is applied separately to the calibration of proxy and instrumental data in the high- and low-frequency bands. The results of the two independent reconstructions are then recombined to yield a complete reconstruction. Each proxy record is weighted by a bandwidth retention factor defined as the percent of its total variance within the particular frequency band under consideration. For example, a proxy record dominated by interannual variability, with very little low-frequency variability (e.g., a very data-adaptive standardized tree-ring record) would be assigned a high weight (near one) in the high-frequency band calibration and a low weight (near zero) in the low-frequency band calibration. Generally, all proxy series have weights between zero and one in each frequency band with greater weight in the frequency band with the greatest concentration of variance in the unfiltered series. This approach ensures that, for example, a proxy with a small amount of variability in the low-frequency band (which might be residual noise) does not have the same impact as a proxy with much greater low-frequency variability. However, it has the disadvantage that a high-frequency-dominated record containing a nonetheless faithful record of low-frequency fluctuations (e.g., an indicator of ENSO wherein the interannual variability is intrinsically dominant) might be unduly discounted.
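The bandwidth retention factor can be illustrated with a short sketch. It assumes the variance in each band is estimated from the periodogram of the unfiltered series, which is one possible reading of the definition above; the function name `band_weights` and the example series are hypothetical.

```python
import numpy as np

def band_weights(series, split_freq=0.05, dt=1.0):
    """Return (w_low, w_high): the fraction of total variance in each band.

    Assumption: variance per band is taken from the periodogram of the
    mean-removed series. The two weights lie in [0, 1] and sum to one.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt)
    # One-sided spectrum: interior bins count twice, the Nyquist bin once
    power[1:] *= 2.0
    if x.size % 2 == 0:
        power[-1] /= 2.0
    total = power.sum()
    if total == 0.0:
        raise ValueError("series has zero variance")
    w_low = float(power[freqs <= split_freq].sum() / total)
    return w_low, 1.0 - w_low

# A record with a 50-yr cycle (variance 0.5) and a 4-yr cycle (variance 0.125)
t = np.arange(100)
example = np.sin(2 * np.pi * t / 50.0) + 0.5 * np.sin(2 * np.pi * t / 4.0)
w_low, w_high = band_weights(example, split_freq=0.05)
```

Here the low-frequency band holds 0.5 / 0.625 = 0.8 of the variance, so this hypothetical record would receive a weight of 0.8 in the low-frequency calibration and 0.2 in the high-frequency calibration.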
d. Experimental approach
We performed an array of REGEM CFR experiments based on different target seasons and proxy networks and tested variants of the approach including (a) both the conventional and hybrid frequency domain approach, the latter with varying split frequency; (b) allowing and not allowing for lags between predictor (proxy indicator) and predictand (instrumental surface temperature data); and (c) use of both prewhitened and raw predictor/predictand data (see, e.g., Cook et al. 1999; Zhang et al. 2004). We compared our results against previous reconstructions based on common predictor datasets (multiproxy–PC and MXD), and alternative reconstruction methodologies [the eigenvector-based CFR approach of Mann et al. (1998) and the local-calibration approach used by OSB, respectively] to assess the impact of using different reconstruction methodologies with common data. We areally averaged spatial reconstructions over both the full NH domain and subregions of the domain (e.g., extratropics and/or continents only) to examine the sensitivity of NH “hemispheric mean” estimates to the actual region sampled.
CFR experiments were performed using each of the three proxy networks, multiproxy–PC, MXD, and combined, and three seasonal target windows for the surface temperature predictand (boreal warm season, boreal cold season, and annual calendar mean). In the hybrid frequency domain approach, alternative frequency boundaries were tested (5-, 10-, 20-, and 25-yr period) within the practical constraints discussed in section 3c.
We also performed experiments in which the proxy indicators were lagged (both forward and backward) relative to the instrumental data, under the assumption that some proxies may reflect, at least in part, a lagged or running average response to climate. We lagged the proxy data at −1, 0, and +1 yr both independently and in various combinations (e.g., −1 and 0 only; +1 and 0 only; and −1, 0, and +1). Use of lagged versions of the proxy indicator network in addition to the nominal network itself increases the effective size of the predictor network. Including the proxy network at both lag 0 and at lag −1, for example, produces a maximum predictor network of 224 indicators (twice the nominal maximum of 112 indicators) for the multiproxy–PC network.
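The growth of the predictor network under lagging can be sketched as below. The sign convention (lag k pairs the proxy value of year t + k with the predictand of year t), the use of NaN for values that fall outside the record, and the function name `lagged_network` are all assumptions made for illustration.

```python
import numpy as np

def lagged_network(proxies, lags=(0,)):
    """Stack lagged copies of a (n_years, n_proxies) array column-wise.

    Assumed convention: for lag k, row t holds the proxy values of year t + k.
    Rows with no lagged value available are filled with NaN (treated as missing).
    """
    proxies = np.asarray(proxies, dtype=float)
    blocks = []
    for lag in lags:
        shifted = np.full_like(proxies, np.nan)
        if lag > 0:
            shifted[:-lag] = proxies[lag:]
        elif lag < 0:
            shifted[-lag:] = proxies[:lag]
        else:
            shifted[:] = proxies
        blocks.append(shifted)
    return np.hstack(blocks)

# 112 indicators at lags -1 and 0 give a 224-column predictor network
proxies = np.arange(100 * 112, dtype=float).reshape(100, 112)
network = lagged_network(proxies, lags=(-1, 0))
```

The doubled column count reproduces the 224-indicator maximum quoted above for the multiproxy–PC network.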
We also performed reconstructions in which predictors were prewhitened prior to calibration, followed by the reintroduction of the estimated level of serial correlation into the predictand. In drought reconstructions based on tree-ring networks, this procedure has been shown to lead to modest improvements in reconstructive skill (Cook et al. 1999; Zhang et al. 2004).
The relative skill of the reconstructions with respect to the different variants of the CFR approach is addressed by cross-validation experiments described in section 4a.
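A prewhitening step of this kind might look like the following sketch. It assumes a first-order autoregressive, AR(1), model applied to anomaly series; the model order is not stated here, and the helper names `lag1_autocorr`, `prewhiten`, and `redden` are hypothetical.

```python
import random

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series (used as the AR(1) coefficient)."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    return sum(a * b for a, b in zip(d[:-1], d[1:])) / sum(a * a for a in d)

def prewhiten(x, phi):
    """Remove AR(1) persistence: e[t] = x[t] - phi * x[t-1], with e[0] = x[0]."""
    return [x[0]] + [x[t] - phi * x[t - 1] for t in range(1, len(x))]

def redden(e, phi):
    """Reintroduce AR(1) persistence: x[t] = e[t] + phi * x[t-1]."""
    x = [e[0]]
    for t in range(1, len(e)):
        x.append(e[t] + phi * x[-1])
    return x

# Synthetic AR(1) series with coefficient 0.6 (illustrative values only)
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.6 * series[-1] + rng.gauss(0.0, 1.0))

phi_hat = lag1_autocorr(series)
white = prewhiten(series, phi_hat)
restored = redden(white, phi_hat)   # inverse of prewhiten for the same phi
```

Because only lag-1 persistence is modeled, this captures a limited part of the temporal dependence structure of a series.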
e. Cross-validation procedure
A series of verification diagnostics was calculated to evaluate the skill of the reconstructions. First, we conducted full field verifications by removing the instrumental surface temperature data from the CFR analysis between 1856 and 1900 and reconstructing the surface temperature field over that interval using only the information in the predictor networks calibrated during the twentieth century (1901–71 for the PC/multiproxy network for reasons discussed below, and 1901–60 for the MXD and combined networks, for reasons discussed above). We calculated verification scores using only available instrumental data from grid boxes that were 95% complete (a total of 210 grid boxes) prior to initial infilling with REGEM (see section 2a). We refer to these verification scores as the “full field” verification scores. We assessed verification scores both for the full predictor network (available back to at least 1820), and using the increasingly sparse predictor networks available on a century-by-century basis, to assess the fidelity of the reconstruction back in time as the predictor network becomes increasingly sparse. We refer to these verification scores as the “available predictor” scores. As a cross check, experiments were also performed for the multiproxy–PC network for both the full network and the sparse network available back to 1400 in which an earlier period, 1856–1928, was used for calibration and the more recent 1929–71 period was used for cross validation. In these cases, the cross-validation scores are equal to or better than those for the standard verification period of 1856–1900.
We also used 10 long, annual mean instrumental grid-box temperature series, 9 of which are from western Europe and England with 1 from North America, to extend cross-validation exercises back to 1755 (3 of the 10 records are available back to that date, and all 10 are available back to at least 1820) on a more spatially restricted basis. These instrumental records are part of the multiproxy–PC predictor network used by Mann et al. (1998) and this study. However, these records can also serve as verification (for the annual mean reconstructions) by removing them from the predictor network and reconstructing them using the information available in the other (noninstrumental) proxy predictors.
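The verification skill diagnostics used are the reduction of error (RE) and the coefficient of efficiency (CE):
$$\mathrm{RE} = 1 - \frac{\sum (x_i - \hat{x}_i)^2}{\sum (x_i - \bar{x}_c)^2}, \qquad \mathrm{CE} = 1 - \frac{\sum (x_i - \hat{x}_i)^2}{\sum (x_i - \bar{x}_v)^2}.$$
The sums are over the reconstructed values, $x_i$ is the observed value, $\hat{x}_i$ is the reconstructed value, $\bar{x}_c$ is the mean of the calibration period, and $\bar{x}_v$ is the mean of the verification period.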
If the reconstruction is simply the mean of the calibration period, then RE = 0, which is the threshold for no skill in the reconstruction. Similarly, if the reconstruction is simply the mean of the verification period, then CE = 0. Thus, depending on the standard, the zero values of these statistics define the threshold for “skill” in the reconstruction. Therefore, CE ≥ 0 is a more challenging threshold since, unlike RE, CE does not reward the reconstruction of an observed change in mean relative to the calibration period.
For each experiment, we calculated RE (Tables 1, 2 and 3) and CE (available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a) verification skill diagnostics. While one could seek to estimate verification skill with the square of the Pearson correlation measure (r²), this metric can be misleading when, as is the case in paleoclimate reconstructions of past centuries, changes are likely in mean or variance outside the calibration period. To aid the reader in interpreting the verification diagnostics, and to illustrate the shortcomings of r² as a diagnostic of reconstructive skill, we provide some synthetic examples that show three possible reconstructions of a series and the RE, CE, and r² scores for each (supplementary material available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a).
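The behavior of the three diagnostics can be illustrated with a small sketch. It assumes the standard definitions, RE = 1 − SSE/Σ(x − calibration mean)² and CE = 1 − SSE/Σ(x − verification mean)², which are consistent with the zero-skill thresholds described above. The function name `skill_scores` and the numbers are hypothetical and are not the synthetic examples of the supplementary material.

```python
def skill_scores(obs, rec, calib_mean):
    """Return (RE, CE, r2) for observed and reconstructed verification values.

    Assumed standard definitions:
      RE = 1 - SSE / sum((obs - calibration mean)^2)
      CE = 1 - SSE / sum((obs - verification mean)^2)
    r2 is returned as NaN when either series has zero variance.
    """
    n = len(obs)
    verif_mean = sum(obs) / n
    rec_mean = sum(rec) / n
    sse = sum((o - r) ** 2 for o, r in zip(obs, rec))
    ss_calib = sum((o - calib_mean) ** 2 for o in obs)
    ss_verif = sum((o - verif_mean) ** 2 for o in obs)
    ss_rec = sum((r - rec_mean) ** 2 for r in rec)
    cov = sum((o - verif_mean) * (r - rec_mean) for o, r in zip(obs, rec))
    r2 = cov ** 2 / (ss_verif * ss_rec) if ss_rec > 0 and ss_verif > 0 else float("nan")
    return 1 - sse / ss_calib, 1 - sse / ss_verif, r2

# Verification-period observations (anomalies); the calibration mean is zero
obs = [-0.4, -0.2, -0.3, -0.1, -0.5]
calib_mean = 0.0

# (a) reconstruction equal to the calibration mean: RE = 0
re_a, ce_a, r2_a = skill_scores(obs, [calib_mean] * len(obs), calib_mean)

# (b) reconstruction equal to the verification mean: CE = 0 but RE > 0
verif_mean = sum(obs) / len(obs)
re_b, ce_b, r2_b = skill_scores(obs, [verif_mean] * len(obs), calib_mean)

# (c) right year-to-year pattern, wrong mean and amplitude: r2 = 1, RE and CE < 0
re_c, ce_c, r2_c = skill_scores(obs, [2 * o + 1 for o in obs], calib_mean)
```

Case (c) shows why r² alone can mislead: it is insensitive to errors in mean and amplitude, which RE and CE penalize heavily.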
To test whether or not the REGEM reconstruction approach might systematically either overestimate or underestimate the variance in the reconstruction, we performed an additional set of verification experiments in which the reconstructions were systematically rescaled by an inflation factor between 0.5 and 2 (where a factor of 1 leaves the reconstruction unchanged) after calibration. If there were any systematic overestimate or underestimate of variance in the calibration process, improved verification statistics should be achieved for scale factors significantly different from 1. Instead, we found that the optimal scale factor was close to unity for reconstructions using each of the three networks. The optimal RE statistic (supplementary material available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a) for the Northern Hemisphere mean is centered approximately at unity for the annual (combined network) reconstruction, slightly below unity (approximately 0.75) for the warm-season (MXD network) reconstruction, and slightly greater than unity (approximately 1.25) for the cold-season (multiproxy–PC network) reconstruction. These results indicate that any substantial (i.e., factor of 2 or greater) underestimate of variance is unlikely for all three reconstructions.
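The rescaling test can be sketched as follows. The sketch assumes the inflation factor is applied to reconstructed anomalies about the calibration-period mean and that RE follows its standard definition; the function names and the toy series are hypothetical.

```python
def reduction_of_error(obs, rec, calib_mean):
    """RE = 1 - SSE / sum((obs - calibration mean)^2) (assumed standard form)."""
    sse = sum((o - r) ** 2 for o, r in zip(obs, rec))
    return 1 - sse / sum((o - calib_mean) ** 2 for o in obs)

def best_scale_factor(obs, rec, calib_mean, factors):
    """Rescale reconstructed anomalies by each factor; return the RE-optimal one."""
    scores = {}
    for f in factors:
        scaled = [calib_mean + f * (r - calib_mean) for r in rec]
        scores[f] = reduction_of_error(obs, scaled, calib_mean)
    best = max(scores, key=scores.get)
    return best, scores

# Inflation factors from 0.5 to 2.0 in steps of 0.25
factors = [0.5 + 0.25 * i for i in range(7)]

obs = [-0.4, -0.2, -0.3, -0.1, -0.5]
damped = [0.5 * o for o in obs]   # a reconstruction with half the true amplitude

best_damped, scores_damped = best_scale_factor(obs, damped, 0.0, factors)
best_exact, scores_exact = best_scale_factor(obs, obs, 0.0, factors)
```

For the deliberately damped toy reconstruction the optimal factor is 2, whereas an unbiased reconstruction yields an optimum at 1. An optimum near unity is therefore evidence against a systematic loss of variance.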
We estimated self-consistent uncertainties using the available predictor verification residuals for each grid box back in time after establishing that the residuals were consistent with Gaussian white noise (supplementary material available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a). Grid-box uncertainties were propagated to estimate the uncertainty in spatial means, taking into account spatial correlation.
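One way to propagate grid-box uncertainties to a spatial mean while accounting for spatial correlation is through the full residual covariance matrix, as sketched below. The cosine-of-latitude area weights, the function names, and the synthetic residuals are assumptions for illustration; the exact procedure used is not detailed in this section.

```python
import numpy as np

def spatial_mean_sigma(residuals, lats_deg):
    """Standard error of an area-weighted mean from grid-box residuals.

    residuals: (n_years, n_boxes) verification residuals.
    Uses sigma^2 = w^T C w, where C is the residual covariance matrix and
    w are normalized cos(latitude) weights (an assumed weighting).
    """
    residuals = np.asarray(residuals, dtype=float)
    w = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    w = w / w.sum()
    cov = np.cov(residuals, rowvar=False)
    return float(np.sqrt(w @ cov @ w))

def independent_sigma(residuals, lats_deg):
    """Same, but ignoring spatial correlation (diagonal of C only)."""
    residuals = np.asarray(residuals, dtype=float)
    w = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    w = w / w.sum()
    return float(np.sqrt(np.sum(w ** 2 * residuals.var(axis=0, ddof=1))))

# Synthetic residuals: 45 verification years, 4 grid boxes sharing a common error
rng = np.random.default_rng(0)
common = rng.normal(size=(45, 1))
local = rng.normal(size=(45, 4))
residuals = 0.2 * common + 0.1 * local
lats = [12.5, 32.5, 52.5, 72.5]

sigma_corr = spatial_mean_sigma(residuals, lats)
sigma_indep = independent_sigma(residuals, lats)
```

When residuals are positively correlated across grid boxes, as in this synthetic case, ignoring the off-diagonal covariance terms understates the uncertainty of the spatial mean.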
4. Results
a. Comparisons between variants of REGEM approach
We first considered the sensitivity of the results to the effect of prewhitening predictors and predictand prior to calibration. In two test cases (multiproxy–PC annual and MXD boreal warm season), such a procedure did not result in any consistent improvement of the verification scores. We thus concluded that this step was both unnecessary and, given the importance of faithfully retaining low-frequency variance, probably undesirable in this context, since the approach admits only a limited representation of the temporal dependence structure of the data.
We then considered the impact of allowing for lagged relationships between predictor and predictand (including combinations of lags, e.g., 0 and +1). The cross-validation exercises indicated that a lag of zero (i.e., no lag) produced the optimal skill diagnostics in all cases, with the following provisos for the cold-season reconstructions. Because the cold-season mean encompasses parts of two calendar years, it is important to define the cold-season convention. For the MXD network, optimal results were achieved for cold-season reconstructions when predictors were temporally aligned with the predictand during the year in which the cold season ends. This finding is not surprising since a tree growing during the warm season cannot respond to the climate of the following cold season but can potentially respond to the climate of the preceding cold season through antecedent soil moisture or soil temperature conditions. For the multiproxy–PC cold-season reconstructions, two lag choices give similar verification scores, “case 1,” in which predictors are aligned with the predictand during the year in which the cold season ends and “case 2,” in which predictors are aligned with the predictand during the year in which the cold season begins. We adopt case 2 because, though it performs slightly worse for the multivariate statistics (RE and CE lower by about 0.03), it performs considerably better for the hemispheric mean statistics (RE higher by 0.04 and CE higher by 0.08). Case 2 nonetheless seems inappropriate from a biological response point of view and suggests the importance of a more general approach, beyond the scope of the present study, which allows for variable lags among the different indicators that make up the multiproxy network. Apart from the interannual variability, the hemispheric mean reconstruction is not sensitive to the choice of case 1 or case 2. 
Henceforth, only the optimal results with respect to choice of lag, as described above, are presented for the various seasonal reconstructions based on the various predictor networks.
We then examined the dependence of skill on the frequency band split boundary (5-, 10-, 20-, and 25-yr period) used in the hybrid frequency band calibration approach, finding the 20-yr period boundary to give superior results in almost all cases (cross-validation skill was either equal to or greater than that for any other choice in all cases). We thus consider henceforth in this study both the standard nonhybrid method (referred to as “nonhybrid”) and the hybrid method with an f = 0.05 cpy (20-yr period) frequency boundary (referred to as “hybrid-20”). A comparison of the NH mean temperature reconstruction for the two approaches (nonhybrid and hybrid-20) is shown in Fig. 2 for the multiproxy–PC network annual mean reconstruction. While the two reconstructions are seen to be broadly similar, the hybrid-20 reconstruction exhibits greater low-frequency variability, particularly prior to a.d. 1600 when the multiproxy network becomes relatively sparse. The hybrid-20 reconstruction is observed in this case (see discussion below) to demonstrate greater skill in cross validation for the earlier centuries, suggesting that the greater variability is likely meaningful. As discussed below, whether the nonhybrid or hybrid-20 approach gives optimal results generally depends on the particular predictor network and target season used in the reconstruction.
Another point that must be made is that, although we settle on “optimal” reconstructions, it is not always clear from the verification scores which network, lag, and method implementation (hybrid or nonhybrid) is the optimal one for a given situation. One set of possibilities (e.g., network and lag) may produce a better NH mean verification than another, but at the cost of a degraded multivariate verification, or the hybrid method may outperform the nonhybrid with a sparse network, but the opposite might be true with a more extensive network. In short, it can be difficult to determine which is the “best” reconstruction when verification skill differences are small. In light of this consideration, we present reconstructions below for each network and season but recognize that there is a larger suite of reconstructions that might be acceptable based on verification scores. In addition, it is not possible to perform verification experiments on long time scales because of the limitations of the instrumental data. Although we use the few long instrumental records that are available for verification, spatially extensive, long time-scale verification can only be done using output from long GCM simulations.
b. Comparison of REGEM results for different networks and seasonal windows
The results of the cross-validation exercises for the various experiments are summarized in Table 1 for the full network available back to 1820 and Table 2 for the increasingly sparse available predictor networks back in time (CE statistics provided in supplementary material available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a). The hybrid-20 exhibits the greatest skill (RE = 0.72 and CE = 0.46) for annual hemispheric mean reconstructions using the multiproxy–PC network back to 1820. The nonhybrid method, however, exhibits better multivariate skill (RE = 0.22 and CE = 0.04), but similar hemispheric mean and multivariate skill if instrumental predictors are withheld (Table 1). However, for proxy networks available further back in time (Table 2), the hybrid-20 approach produces cross-validation skill as good as or better than the nonhybrid approach. We thus favor the use of hybrid-20 for long-term annual reconstructions using the multiproxy–PC network. Similarly, hybrid-20 is favored for the annual reconstructions using the combined network prior to a.d. 1600. By contrast, the nonhybrid approach is favored by the cross-validation results for the MXD-based summer temperature reconstructions. The separate selection of optimal weightings in distinct frequency bands afforded by the hybrid-20 approach likely offers a greater advantage for a network of multiple proxy types (multiproxy–PC network) than for a more homogeneous (tree-ring MXD) proxy network.
While the different networks often differ by less than 0.03 in terms of cross-validation RE or CE scores, and comparisons of multivariate and hemispheric-mean skill scores sometimes lead to differing inferences, a few general conclusions can be drawn from the cross-validation results summarized in Tables 1 and 2: 1) the hybrid-20 approach produces the best verification skill scores in general, but there are important exceptions (i.e., the MXD summer temperature reconstructions); 2) the multiproxy–PC network appears best suited for annual and cold-season reconstructions; 3) as expected on the basis of previous work (e.g., Briffa et al. 2002b), the MXD network appears best suited for annual and warm-season reconstructions and appears to provide the best warm-season reconstructions of all three networks; and 4) the combined network exhibits the best skill of all networks in annual reconstruction and skill in cold-season reconstruction that is comparable to the multiproxy–PC network. The fact that the MXD network outperforms the combined network for the warm season indicates that the addition of more proxy series does not always produce better results, even if some of those additional proxies are of high quality (e.g., the long instrumental series in the multiproxy–PC network).
The fact that the combined network performs, at best, only marginally better than the two independent networks alone suggests that most of the degrees of freedom in the Northern Hemisphere surface temperature field are already sampled by either network alone. We conclude that the primary limiting factor governing the skillfulness of current proxy-based large-scale temperature reconstructions may be the quality of the network (e.g., the effective signal-to-noise ratios of the proxy data and the availability of records in key regions such as the tropical Pacific), rather than the size of the network. This conclusion is reinforced by a comparison of the verification skill for the multiproxy–PC network with and without the long instrumental records (Table 1), which emphasizes that a few high-quality indicators can significantly improve reconstructive skill. Furthermore, the addition of poor or inappropriate seasonal indicators to a network can degrade the skill of the reconstruction.
c. Comparisons between REGEM results and results with other methods
We compare results using the REGEM method with previously published results that used the same proxy networks but different methods (and a potentially different target region). In these comparisons, we control for the proxy network used and the target season.
1) Comparison with Mann et al. (1998) annual NH reconstruction
Although both the Mann et al. (1998) and REGEM methods make use of covariance information in the calibration/reconstruction process, they do so in a quite different manner (Schneider 2001; Rutherford et al. 2003). We compared the REGEM reconstruction with the Mann et al. (1998) surface temperature reconstruction employing the same predictor network, the same calendar annual target season, and the same global target region as Mann et al. (1998). We eliminated the infilled values from a.d. 1400–03 used by Mann et al. (1998) to complete one of the Jacoby and D’Arrigo (1989) “Northern Treeline” series back to a.d. 1400. This is easily done in the REGEM method by treating those values as missing, something that could not have been done in Mann et al. (1998). We terminated the calibration period in 1971 to address the criticism by McIntyre and McKitrick (2003) of the use by Mann et al. (1998) of a modest number of infilled missing proxy values in the multiproxy–PC network between 1971 and 1980. However, we also show the verification results for the case where the calibration interval ends in 1980 for direct comparison with the Mann et al. (1998) results. Cross-validation results are compared (Table 3) with those of Mann et al. (1998) for the multiproxy–PC network available back to 1820, using the same (219) grid boxes used for verification by Mann et al. (1998) over the period 1856–1900 (top section of Table 3) and for the 10 temperature grid boxes available back to 1820 (bottom section of Table 3—in this case long instrumental indicators have been withheld from the predictor network). These comparisons indicate similar levels of skill in the REGEM (both nonhybrid and hybrid-20) and Mann et al. (1998) reconstructions, with any preference dependent on the precise metric of reconstructive skill. 
We deduce from the available predictor skill diagnostics (Table 2) that the hybrid-20 REGEM reconstruction is increasingly preferable over the nonhybrid reconstruction as the predictor network becomes sparser back in time (for annual NH temperature, at least).
A remarkably close similarity is observed (Fig. 3) between the REGEM and Mann et al. (1998) NH annual mean surface temperature reconstructions. The two reconstructions are indistinguishable well within their two-sigma uncertainties. The REGEM NH reconstruction using all available individual proxy records [rather than replacing spatially dense tree-ring networks with their leading principal components as in the Mann et al. (1998) multiproxy–PC network] again yields nearly indistinguishable estimates (Fig. 2). The close reproducibility of the Mann et al. (1998) reconstruction based on both (a) the use of an independent CFR method and (b) the use of the individual proxies used by Mann et al. (1998) rather than the multiproxy–PC representation used by Mann et al. (1998) disproves the arguments put forth by McIntyre and McKitrick (2003) in support of their putative “correction” to the Mann et al. (1998) reconstruction.
2) Comparison involving previous MXD-based warm-season extratropical NH reconstruction
Here we compare the REGEM warm-season MXD-based NH mean reconstruction with that of OSB, the latter based on an areally weighted mean of 115 locally calibrated MXD 5° by 5° grid boxes (Fig. 1b). This reconstruction (Fig. 4) is similar, though not identical, to that presented by Briffa et al. (2001) using the same MXD data; the minor differences arise because Briffa et al. (2001) used a principal component regression of regionally averaged MXD data, rather than the average of locally calibrated reconstructions generated by OSB. In this comparison, we control for the proxy network (both use the MXD network) and the target season (both target the boreal warm season mean) and investigate the effects of both the target region and reconstruction method.
Figure 4a compares the OSB MXD reconstruction and the REGEM hybrid-20 NH reconstruction of the full NH mean. The OSB reconstruction exhibits greater interannual variability and is on average slightly cooler in past centuries than the REGEM reconstruction. Since the proxy network and the target season are identical, the observed differences must be due to a combination of differing methods and target regions. To progressively control for target region, we first mask the REGEM spatial reconstruction for only the terrestrial extratropical (i.e., north of 20°N) grid boxes (Fig. 4a) and finally the precise 115 grid boxes averaged by OSB to obtain a hemispheric mean reconstruction (Fig. 4b). The latter masking of the REGEM reconstruction yields a hemispheric mean estimate that is nearly indistinguishable from the OSB reconstruction, suggesting that the initial differences evident in Fig. 4a result largely from differing initial target regions. The remaining modest differences (Fig. 4b), which are mostly evident during the relatively data-sparse initial centuries, are presumably due to the differences between methods (REGEM CFR method versus spatial average of the locally calibrated grid-box data).
Finally, we include a comparison with an alternative warm-season continental surface temperature reconstruction based on an even more restricted spatial distribution (a maximum of 14 sites) of tree-ring width data (Esper et al. 2002). This reconstruction exhibits greater variability than most other published reconstructions (see Briffa and Osborn 2002; Mann and Hughes 2002; Mann 2002a; Mann et al. 2003a, b). However, when restricted to the grid-box locations corresponding to the modest number of sites used in this reconstruction (excepting one grid box that is unavailable from the instrumental record and one that is outside our reconstruction domain), the REGEM MXD warm-season NH reconstruction shows a remarkably similar character to the Esper et al. (2002) reconstruction (Fig. 4c). This result suggests that the greater variability evident in the Esper et al. reconstruction likely results from the restricted sampling provided by the network used, though some residual differences may be due to different methods of tree-ring standardization (Esper et al. 2002; Briffa and Osborn 2002; Mann and Hughes 2002; Cook et al. 2004) and differences in reconstruction method.
From these comparisons, we can draw an important conclusion that might have been anticipated from spatial sampling considerations alone: reconstructions of full hemispheric means are likely to exhibit lower amplitude variability than those based on a more restricted subdomain of the field, as a result of the tendency for the cancellation of anomalies of different signs and magnitudes in different regions (see, e.g., Mann et al. 2003b).
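The effect of the target region on a "hemispheric mean" can be illustrated with a minimal sketch of masked, area-weighted averaging. The cosine-of-latitude weighting, the four-box toy field, and the function name `area_weighted_mean` are assumptions chosen for illustration and do not represent the actual reconstruction grid.

```python
import math
from statistics import pvariance

def area_weighted_mean(values, lats_deg, mask=None):
    """cos(latitude)-weighted mean of grid-box values for one time step.

    mask: optional list of booleans selecting the grid boxes to average.
    """
    if mask is None:
        mask = [True] * len(values)
    num = den = 0.0
    for value, lat, keep in zip(values, lats_deg, mask):
        if keep:
            weight = math.cos(math.radians(lat))
            num += weight * value
            den += weight
    if den == 0.0:
        raise ValueError("mask selects no grid boxes")
    return num / den

# Toy field: four boxes, with the tropical box opposing the extratropical ones
lats = [10.0, 30.0, 50.0, 70.0]
extratropics = [lat > 20.0 for lat in lats]   # boxes north of 20N
signs = [-1.0, 1.0, 1.0, 1.0]

full_series, sub_series = [], []
for year in range(20):
    common = 0.1 * math.sin(2 * math.pi * year / 10)    # shared signal
    regional = 0.5 * math.cos(2 * math.pi * year / 5)   # sign differs by region
    field = [common + s * regional for s in signs]
    full_series.append(area_weighted_mean(field, lats))
    sub_series.append(area_weighted_mean(field, lats, extratropics))

var_full = pvariance(full_series)
var_sub = pvariance(sub_series)
```

In this toy case `var_sub` exceeds `var_full` because the opposing regional anomalies partly cancel only when the full domain is averaged.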
d. Comparisons of hemispheric mean series
Figure 5 shows the annual, warm-season, and cold-season NH mean reconstructions produced using the different predictor networks. The annual (Fig. 5a) reconstructions are quite similar for all three predictor networks back to approximately a.d. 1700 and are largely within the statistical uncertainties of each other back through a.d. 1400. A similar statement holds for the cold-season reconstructions, though the differences are slightly greater during certain time intervals. For the warm season, only the MXD network provides a skillful reconstruction back to a.d. 1400, but all reconstructions are similar over the interval in which the cross-validation experiments indicate a skillful reconstruction (1750 for the multiproxy–PC network and 1500 for the combined network). It is clear from the similarity of the MXD and combined network warm-season reconstructions that the combined network reconstruction is dominated by the MXD predictors, as one would expect based on the verification scores.
Finally, we compare (Figs. 5d–e) the REGEM NH reconstructions based on what appears to be the optimal predictor network for each season (see section 4b: MXD—warm season, multiproxy–PC—cold season, and combined—annual). These reconstructions show that the cold-season mean reconstruction generally exhibits the greatest interannual variability (particularly when a larger number of indicators are available). The warm-season reconstructions often show stronger cooling events, in many cases associated with large explosive volcanic events (e.g., after the a.d. 1600 eruption). As discussed further below, this observation is consistent with the modeled response to volcanic forcing, which shows cooling to dominate during the warm season, particularly over continental regions (Kirchner et al. 1999; Shindell et al. 2003). Reconstructions that emphasize the warm season and continental regions are thus likely to exhibit greater summer cooling during periods of intense explosive volcanic activity.
e. Spatial patterns
It is instructive to examine the spatial and seasonal details evident in the actual reconstructed patterns. We thus focus on the reconstructed temperature patterns for some selected years (Fig. 6), using reconstructions based on the optimal networks for each season as discussed above. We consider the year 1601 (cold season: 1600/01) following the Huaynaputina (Peru) eruption (February 1600); 1783, the year of the Laki eruption in Iceland and an exceptionally cold winter in parts of North America and Europe; 1791, an established unusually strong El Niño year (Quinn and Neal 1992); 1816, the “year without a summer” 1 yr after the explosive Tambora eruption of April 1815; 1817, two years after the eruption; and finally, 1834, an exceptionally warm year in Europe as evidenced by the central England temperature record (Manley 1974; see also Mann et al. 2000b; Briffa et al. 1998a, 2002b).
There is a tendency for opposite seasonal surface temperature responses to tropical volcanic forcing. Strong warm-season continental cooling is apparent in the summers after the volcanic years 1600 and 1815, contrasting with the tendency for an offsetting pattern of continental warming during the winter following those eruptions (or even two winters, following the 1815 eruption). This pattern has been observed in model simulations of the dynamical response to an explosive tropical eruption (Groisman 1992; Graf et al. 1993; Robock and Mao 1995; Kirchner et al. 1999; Shindell et al. 2003). The tendency for cooler summers and warmer winters appears to be responsible for the reduced annual mean cooling response to volcanic forcing (Shindell et al. 2003) evident in the annual mean reconstructions. Large-scale warmth both in the tropical Pacific and in the extratropics is clearly evident for the El Niño year of 1791 during all seasons but is particularly evident in the cold-season (i.e., 1791/92) pattern.
5. Conclusions
Comparisons both within the suite of reconstructions presented in this study and between these reconstructions and others previously developed (Mann et al. 1998; OSB; Esper et al. 2002) allow us to evaluate the impacts of method, target season, target region, and underlying proxy data network on large-scale surface temperature reconstructions. (The reconstructions performed in this study are available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a.) These evaluations suggest that differing methods of reconstruction (e.g., different CFR techniques or local calibration approaches) yield nearly indistinguishable results if differences in underlying proxy network, target season, and target region are controlled for. We conclude that proxy-based temperature reconstructions are robust with respect to a wide array of alternative statistical approaches. Differences in target region appear to lead to more substantial differences. Spatial averages over increasingly sparse domains (e.g., extratropical continents only or a small number of isolated regions of the extratropical continents only) yield “hemispheric mean” estimates with increasingly greater variability. Meaningful comparisons of different hemispheric mean estimates are thus only possible when differences in target spatial domain are taken into account, for example, through an appropriate spatial masking of the surface temperature field reconstructions. Differences in target seasonal window are also important, with different predictor networks (e.g., the multiproxy–PC versus MXD versus combined networks), each indicating preferential reconstructive skill for different seasonal windows. 
The resulting optimal seasonal (cold season, warm season, and annual mean) reconstructions indicate modest differences for the main hemispheric mean temperature changes, and more substantial differences spatially, consistent with the distinct spatial and seasonal features typically associated with climate signals such as El Niño or the response to volcanic radiative forcing.
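The point about spatial masking made above can be illustrated with a minimal sketch. It is not the procedure used in this study; it only assumes a regular latitude–longitude grid, cosine-of-latitude area weights, and a hypothetical 3 × 4 anomaly field with made-up values. The function name `masked_area_mean` and both masks are illustrative assumptions.

```python
import numpy as np

def masked_area_mean(field, lats, mask):
    """Area-weighted mean of field[lat, lon] over cells where mask is True."""
    # cos(latitude) approximates relative grid-cell area on a regular grid
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    # Cells outside the target domain get zero weight
    weights = np.where(mask, weights, 0.0)
    return float((field * weights).sum() / weights.sum())

# Hypothetical temperature-anomaly field (illustrative values only)
lats = np.array([0.0, 30.0, 60.0])
field = np.array([[0.1, 0.1, 0.1, 0.1],
                  [0.2, 0.2, 0.2, 0.2],
                  [0.8, 0.8, 0.8, 0.8]])

# Full domain versus a hypothetical sparse domain (60N row only)
full_domain = np.ones(field.shape, dtype=bool)
high_lat_only = np.zeros(field.shape, dtype=bool)
high_lat_only[2, :] = True

full_mean = masked_area_mean(field, lats, full_domain)
sparse_mean = masked_area_mean(field, lats, high_lat_only)
print(full_mean, sparse_mean)
```

In this toy case the two "means" differ simply because they average over different domains. Applying the same mask to each reconstruction before averaging puts the estimates on a common footing.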
In addition, we find that the number of proxies can be less important than the quality of the proxy and its suitability for reconstructing a particular season. As an example, the MXD network alone clearly outperforms the combined network in warm-season verification tests. Furthermore, it is not always easy to determine the best network, lag, and method to use in every situation because differences in verification scores can be small. In the situations we examined, however, differences in the reconstructions are also small.
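As a rough illustration of how verification scores for two candidate reconstructions can be compared, the sketch below computes the reduction of error (RE) statistic, a verification score commonly used in paleoclimate reconstruction. Its use here is an assumption for illustration. The function name and all data values are hypothetical and are not taken from this study.

```python
import numpy as np

def reduction_of_error(observed, reconstructed, calibration_mean):
    """RE = 1 - SSE(reconstruction) / SSE(calibration-period mean as predictor)."""
    observed = np.asarray(observed, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    sse = np.sum((observed - reconstructed) ** 2)
    sse_ref = np.sum((observed - calibration_mean) ** 2)
    return 1.0 - sse / sse_ref

# Hypothetical verification-period observations and two candidate reconstructions
observed = [0.0, 0.2, -0.1, 0.3]
calibration_mean = 0.5  # assumed mean of the calibration period
recon_a = [0.1, 0.1, 0.0, 0.2]
recon_b = [0.05, 0.15, -0.05, 0.25]

re_a = reduction_of_error(observed, recon_a, calibration_mean)
re_b = reduction_of_error(observed, recon_b, calibration_mean)
print(re_a, re_b)
```

RE equals 1 for a perfect reconstruction, and positive values indicate more skill than simply using the calibration-period mean. When two candidates score this closely, the score alone may not clearly favor one over the other.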
Finally, the evidence for exceptional late-twentieth-century warmth in the context of the period since A.D. 1400 (in warm-season, cold-season, and annual mean temperatures) is a robust conclusion with respect to all of the factors considered.
Acknowledgments
This work was supported by the NSF- and NOAA-funded “Earth Systems History” program (MEM, SR, RSB, and MKH; NOAA award NA16GP2913), the Office of Science (BER), U.S. Department of Energy, Grants DE-FG02-98ER62601 (PDJ) and DE-FG02-98ER62604 (RSB), and the European Community under the SOAP project: EVK2-CT-2002-00160 (TJO and KRB).
REFERENCES
Appenzeller, C., T. F. Stocker, and M. Anktin, 1998: North Atlantic Oscillation dynamics recorded in Greenland ice cores. Science, 282, 446–449.
Bauer, E., M. Claussen, and V. Brovkin, 2003: Assessing climate forcings of the earth system for the past millennium. Geophys. Res. Lett., 30, 1276, doi:10.1029/2002GL016639.
Bertrand, C., M. F. Loutre, M. Crucifix, and A. Berger, 2002: Climate of the last millennium: A sensitivity study. Tellus, 54A, 221–244.
Biondi, F., A. Gershunov, and D. R. Cayan, 2001: North Pacific decadal climate variability since 1661. J. Climate, 14, 5–10.
Bradley, R. S., 1996: Are there optimum sites for global paleotemperature reconstruction? Climatic Variations and Forcing Mechanisms of the Last 2000 Years, P. D. Jones, R. S. Bradley, and J. Jouzel, Eds., NATO ASI Series, Springer-Verlag, 603–624.
Bradley, R. S., 1999: Paleoclimatology: Reconstructing Climates of the Quaternary. Harcourt Academic Press, 610 pp.
Bradley, R. S., and P. D. Jones, 1993: “Little Ice Age” summer temperature variations: Their nature and relevance to recent global warming trends. Holocene, 3, 367–376.
Bradley, R. S., and P. D. Jones, 1995: Climate since A.D. 1500. Routledge, 706 pp.
Braganza, K., D. Karoly, T. Hirst, M. E. Mann, P. Stott, R. J. Stouffer, and S. Tett, 2003: Indices of global climate variability and change: Part I—Variability and correlation structure. Climate Dyn., 20, doi:10.1007/s00382-002-0286-0.
Briffa, K. R., and T. J. Osborn, 2002: Blowing hot and cold. Science, 295, 2227–2228.
Briffa, K. R., P. D. Jones, and F. H. Schweingruber, 1994: Summer temperatures across northern North America: Regional reconstructions from 1760 using tree-ring densities. J. Geophys. Res., 99, 25835–25844.
Briffa, K. R., P. D. Jones, F. H. Schweingruber, W. Karlen, and S. G. Shiyatov, 1996: Tree-ring variables as proxy-climate indicators: Problems with low-frequency signals. Climatic Variations and Forcing Mechanisms of the Last 2000 Years, P. D. Jones, R. S. Bradley, and J. Jouzel, Eds., NATO ASI Series, Springer-Verlag, 9–41.
Briffa, K. R., P. D. Jones, F. H. Schweingruber, and T. J. Osborn, 1998a: Influence of volcanic eruptions on Northern Hemisphere summer temperatures over the past 600 years. Nature, 393, 450–454.
Briffa, K. R., F. H. Schweingruber, P. D. Jones, T. J. Osborn, S. G. Shiyatov, and E. A. Vaganov, 1998b: Reduced sensitivity of recent tree-growth to temperature at high northern latitudes. Nature, 391, 678–682.
Briffa, K. R., T. J. Osborn, F. H. Schweingruber, I. C. Harris, P. D. Jones, S. G. Shiyatov, and E. A. Vaganov, 2001: Low-frequency temperature variations from a northern tree ring density network. J. Geophys. Res., 106, 2929–2941.
Briffa, K. R., T. J. Osborn, F. H. Schweingruber, P. D. Jones, S. G. Shiyatov, and E. A. Vaganov, 2002a: Tree-ring width and density data around the Northern Hemisphere: Part 1, local and regional climate signals. Holocene, 12, 737–757.
Briffa, K. R., T. J. Osborn, F. H. Schweingruber, P. D. Jones, S. G. Shiyatov, and E. A. Vaganov, 2002b: Tree-ring width and density data around the Northern Hemisphere: Part 2, spatio-temporal variability and associated climate patterns. Holocene, 12, 759–789.
Briffa, K. R., T. J. Osborn, and F. H. Schweingruber, 2003: Large-scale temperature inferences from tree rings: A review. Global Planet. Change, 40, doi:10.1016/S0921-8181(03)00095-X.
Bush, A. B. G., 1999: Assessing the impact of mid-Holocene insolation on the atmosphere–ocean system. Geophys. Res. Lett., 26, 99–102.
Clement, A. C., R. Seager, and M. A. Cane, 2000: Suppression of El Niño during the mid-Holocene by changes in the Earth’s orbit. Paleoceanography, 15, 731–737.
Cobb, K. M., C. D. Charles, H. Cheng, and R. L. Edwards, 2003: El Niño–Southern Oscillation and tropical Pacific climate during the last millennium. Nature, 424, 271–276.
Cook, E. R., 2002: Multi-proxy reconstructions of the North Atlantic Oscillation (NAO) Index: A critical review and a new well-verified winter NAO Index reconstruction back to AD 1400. North Atlantic Oscillation, J. W. Hurrell et al., Eds., Amer. Geophys. Union, 63–81.
Cook, E. R., K. R. Briffa, and P. D. Jones, 1994: Spatial regression methods in dendroclimatology: A review and comparison of two techniques. Int. J. Climatol., 14, 379–402.
Cook, E. R., K. R. Briffa, D. M. Meko, D. A. Graybill, and G. Funkhouser, 1995: The “segment length curse” in long tree-ring chronology development for palaeoclimatic studies. Holocene, 5, 229–237.
Cook, E. R., D. M. Meko, and C. W. Stockton, 1997: A new assessment of possible solar and lunar forcing of the bidecadal drought rhythm in the western United States. J. Climate, 10, 1343–1356.
Cook, E. R., D. M. Meko, D. W. Stahle, and M. K. Cleaveland, 1999: Drought reconstructions for the continental United States. J. Climate, 12, 1145–1162.
Cook, E. R., R. D. D’Arrigo, and M. E. Mann, 2002: A well-verified, multiproxy reconstruction of the winter North Atlantic Oscillation since A.D. 1400. J. Climate, 15, 1754–1764.
Cook, E. R., J. Esper, and R. D. D’Arrigo, 2004: Extra-tropical Northern Hemisphere land temperature variability over the past 1000 years. Quat. Sci. Rev., 23, 2063–2074.
Crowley, T. J., 2000: Causes of climate change over the past 1000 years. Science, 289, 270–277.
Crowley, T. J., and G. R. North, 1991: Paleoclimatology. Oxford University Press, 349 pp.
Crowley, T. J., and T. Lowery, 2000: How warm was the medieval warm period? Ambio, 29, 51–54.
Cullen, H., R. D. D’Arrigo, E. Cook, and M. E. Mann, 2001: Multiproxy-based reconstructions of the North Atlantic Oscillation over the past three centuries. Paleoceanography, 15, 27–39.
D’Arrigo, R. D., E. R. Cook, G. C. Jacoby, and K. R. Briffa, 1993: NAO and sea surface temperature signatures in tree-ring records from the North Atlantic sector. Quat. Sci. Rev., 12, 431–440.
Delworth, T. L., and M. E. Mann, 2000: Observed and simulated multidecadal variability in the Northern Hemisphere. Climate Dyn., 16, 661–676.
Esper, J., E. R. Cook, and F. H. Schweingruber, 2002: Low frequency signals in long tree-ring chronologies for reconstructing past temperature variability. Science, 295, 2250–2253.
Evans, M. N., A. Kaplan, and M. A. Cane, 1998: Optimal sites for coral-based reconstruction of global sea surface temperature. Paleoceanography, 13, 502–516.
Evans, M. N., A. Kaplan, and M. A. Cane, 2002: Pacific sea surface temperature field reconstruction from coral δ18O data using reduced space objective analysis. Paleoceanography, 17, 1007, doi:10.1029/2000PA000590.
Fisher, D. A., R. M. Koerner, K. Kuivinen, H. B. Clausen, S. J. Johnsen, J.-P. Steffensen, N. Gundestrup, and C. U. Hammer, 1996: Intercomparison of ice core δ (O18) and precipitation records from sites in Canada and Greenland over the last 3500 years and over the last few centuries in detail using EOF techniques. Climatic Variations and Forcing Mechanisms of the Last 2000 Years, P. D. Jones, R. S. Bradley, and J. Jouzel, Eds., NATO ASI Series, Springer-Verlag, 297–328.
Folland, C. K., and Coauthors, 2001: Observed climate variability and change. Climate Change 2001: The Scientific Basis, J. T. Houghton et al., Eds., Cambridge University Press, 99–181.
Free, M., and A. Robock, 1999: Global warming in the context of the Little Ice Age. J. Geophys. Res., 104, 19057–19070.
Fritts, H. C., 1976: Tree Rings and Climate. Academic Press, 567 pp. and xii.
Fritts, H. C., 1991: Reconstructing Large-Scale Climatic Patterns from Tree-Ring Data. The University of Arizona Press, 286 pp.
Fritts, H. C., T. J. Blasing, B. P. Hayden, and J. E. Kutzbach, 1971: Multivariate techniques for specifying tree-growth and climate relationships and for reconstructing anomalies in paleoclimate. J. Appl. Meteor., 10, 845–864.
Gedalof, Z., N. J. Mantua, and D. L. Peterson, 2002: A multi-century perspective of variability in the Pacific Decadal Oscillation: New insights from tree rings and coral. Geophys. Res. Lett., 29, 2204, doi:10.1029/2002GL015824.
Gerber, S., F. Joos, P. P. Bruegger, T. F. Stocker, M. E. Mann, and S. Sitch, 2003: Constraining temperature variations over the last millennium by comparing simulated and observed atmospheric CO2. Climate Dyn., 20, 281–299.
Gonzalez-Rouco, F., H. von Storch, and E. Zorita, 2003: Deep soil temperature as proxy for surface air-temperature in a coupled model simulation of the last thousand years. Geophys. Res. Lett., 30, 2116, doi:10.1029/2003GL018264.
Graf, H.-F., I. Kirchner, A. Robock, and I. Schult, 1993: Pinatubo eruption winter climate effects: Model versus observations. Climate Dyn., 9, 81–93.
Groisman, P. Y., 1992: Possible regional climate consequences of the Pinatubo eruption: An empirical approach. Geophys. Res. Lett., 19, 1603–1606.
Guiot, J., 1985: The extrapolation of recent climatological series with spectral canonical regression. J. Climatol., 5, 325–335.
Guiot, J., 1988: The combination of historical documents and biological data in the reconstruction of climate variations in space and time. Palaeoclimatforschung, 7, 93–104.
Hendy, E. J., M. K. Gagan, C. A. Alibert, M. T. McCulloch, J. M. Lough, and P. J. Isdale, 2002: Abrupt decrease in tropical Pacific sea surface salinity at end of Little Ice Age. Science, 295, 1511–1514.
Hughen, K. A., J. T. Overpeck, and R. Anderson, 2000: Recent warming in a 500-year paleoclimate record from Upper Soper Lake, Baffin Island, Canada. Holocene, 10, 9–19.
Hughes, M. K., and G. Funkhouser, 2003: Frequency-dependent climate signal in upper and lower forest border trees in the mountains of the Great Basin. Climate Change, 59, 233–244.
Jacoby, G. C., and R. D’Arrigo, 1989: Reconstructed Northern Hemisphere annual temperature since 1671 based on high-latitude tree-ring data from North America. Climate Change, 14, 39–59.
Jones, P. D., 1994: Hemispheric surface air temperature variations: A reanalysis and an update to 1993. J. Climate, 7, 1794–1802.
Jones, P. D., and A. Moberg, 2003: Hemispheric and large-scale surface air temperature variations: An extensive revision and an update to 2001. J. Climate, 16, 206–223.
Jones, P. D., and M. E. Mann, 2004: Climate over past millennia. Rev. Geophys., 42, RG2002, doi:10.1029/2003RG000143.
Jones, P. D., K. R. Briffa, T. P. Barnett, and S. F. B. Tett, 1998: High-resolution paleoclimatic records for the last millennium: Interpretation, integration and comparison with circulation model control-run temperatures. Holocene, 8, 455–471.
Jones, P. D., M. New, D. E. Parker, S. Martin, and J. G. Rigor, 1999: Surface air temperature and its changes over the past 150 years. Rev. Geophys., 37, 173–199.
Jones, P. D., T. J. Osborn, and K. R. Briffa, 2001a: The evolution of climate over the last millennium. Science, 292, 662–667.
Jones, P. D., T. J. Osborn, K. R. Briffa, C. K. Folland, B. Horton, L. V. Alexander, D. E. Parker, and N. A. Rayner, 2001b: Adjusting for sample density in grid-box land and ocean surface temperature time series. J. Geophys. Res., 106, 3371–3380.
Jones, P. D., K. R. Briffa, and T. J. Osborn, 2003a: Changes in the Northern Hemisphere annual cycle—Implications for paleoclimatology? J. Geophys. Res., 108, 4588, doi:10.1029/2003JD003695.
Jones, P. D., T. J. Osborn, and K. R. Briffa, 2003b: Pressure-based measures of the NAO: A comparison and an assessment of changes in the strength of the NAO and in its influence on surface climate parameters. North Atlantic Oscillation, J. W. Hurrell et al., Eds., Amer. Geophys. Union, 51–62.
Kaplan, A., Y. Kushnir, M. A. Cane, and M. B. Blumenthal, 1997: Reduced space optimal analysis for historical data sets: 136 years of Atlantic sea surface temperatures. J. Geophys. Res., 102 (C13), 27835–27860.
Kirchner, I., G. L. Stenchikov, H. F. Graf, A. Robock, and J. C. Antuña, 1999: Climate model simulation of winter warming and summer cooling following the 1991 Mount Pinatubo volcanic eruption. J. Geophys. Res., 104, 19039–19055.
Knutson, T. R., T. L. Delworth, K. Dixon, and R. J. Stouffer, 1999: Model assessment of regional surface temperature trends (1949–97). J. Geophys. Res., 104, 30981–30996.
LaMarche, V. C., 1974: Frequency-dependent relationships between tree-ring series along an ecological gradient and some dendroclimatic implications. Tree-Ring Bull., 34, 1–20.
Le Roy Ladurie, E., 1971: Times of Feast, Times of Famine: A History of Climate since the Year 1000. Doubleday, 426 pp.
Lorenz, E. N., 1956: Empirical orthogonal functions and statistical weather prediction. Statistical Forecasting Project, Science Rep. 1, MIT, Cambridge, MA, 48 pp.
Luterbacher, J., C. Schmutz, D. Gyalistras, E. Xoplaki, and H. Wanner, 1999: Reconstruction of monthly NAO and EU indices back to AD 1675. Geophys. Res. Lett., 26, 2745–2748.
Luterbacher, J., and Coauthors, 2002a: Reconstruction of sea level pressure fields over the Eastern North Atlantic and Europe back to 1500. Climate Dyn., 18, 545–561.
Luterbacher, J., and Coauthors, 2002b: Extending North Atlantic Oscillation reconstructions back to 1500. Atmos. Sci. Lett., 2, doi:10.1006/asle.2001.0044.
Luterbacher, J., D. Dietrich, E. Xoplaki, M. Grosjean, and H. Wanner, 2004: European annual and seasonal temperature variability, trends and extremes since 1500. Science, 303, 1499–1503.
Manley, G., 1974: Central England temperatures: Monthly means 1659 to 1973. Quart. J. Roy. Meteor. Soc., 100, 389–405.
Mann, M. E., 2000: Lessons for a new millennium. Science, 289, 253–254.
Mann, M. E., 2001: Climate during the past millennium. Weather, 56, 91–102.
Mann, M. E., 2002a: The value of multiple proxies. Science, 297, 1481–1482.
Mann, M. E., 2002b: Large-scale climate variability and connections with the Middle East in past centuries. Climate Change, 55, 287–314.
Mann, M. E., and M. K. Hughes, 2002: Tree-ring chronologies and climate variability. Science, 296, 848–849.
Mann, M. E., and S. Rutherford, 2002: Climate reconstruction using ‘Pseudoproxies.’ Geophys. Res. Lett., 29, 1501, doi:10.1029/2001GL014554.
Mann, M. E., and P. D. Jones, 2003: Global surface temperatures over the past two millennia. Geophys. Res. Lett., 30, 1820, doi:10.1029/2003GL017814.
Mann, M. E., R. S. Bradley, and M. K. Hughes, 1998: Global-scale temperature patterns and climate forcing over the past six centuries. Nature, 392, 779–787.
Mann, M. E., R. S. Bradley, and M. K. Hughes, 1999: Northern Hemisphere temperatures during the past millennium: Inferences, uncertainties, and limitations. Geophys. Res. Lett., 26, 759–762.
Mann, M. E., R. S. Bradley, and M. K. Hughes, 2000a: Long-term variability in the El Niño–Southern Oscillation and associated teleconnections. El Niño and the Southern Oscillation: Multiscale Variability and Its Impacts on Natural Ecosystems and Society, H. F. Diaz and V. Markgraf, Eds., Cambridge University Press, 357–412.
Mann, M. E., E. Gille, R. S. Bradley, M. K. Hughes, J. T. Overpeck, F. T. Keimig, and W. Gross, 2000b: Global temperature patterns in past centuries: An interactive presentation. Earth Interactions, 4. [Available online at http://EarthInteractions.org.]
Mann, M. E., S. Rutherford, R. S. Bradley, M. K. Hughes, and F. T. Keimig, 2003a: Optimal surface temperature reconstructions using terrestrial borehole data. J. Geophys. Res., 108, 4203, doi:10.1029/2002JD002532.
Mann, M. E., and Coauthors, 2003b: On past temperatures and anomalous late-20th century warmth. Eos, Trans. Amer. Geophys. Union, 84, 256–258.
McIntyre, S., and R. McKitrick, 2003: Corrections to the Mann et al. (1998) proxy data based and Northern Hemispheric average temperature series. Energy Environ., 14, 751–771.
Meeker, L. D., and P. A. Mayewski, 2002: A 1400-year high-resolution record of atmospheric circulation over the North Atlantic and Asia. Holocene, 12, 257–266.
O’Brien, S. R., P. A. Mayewski, L. D. Meeker, D. A. Meese, M. S. Twickler, and S. I. Whitlow, 1995: Complexity of Holocene climate as reconstructed from a Greenland ice core. Science, 270, 1962–1964.