Constraining the low-frequency (LF) behavior of general circulation models (GCMs) requires reliable observational estimates of LF variability. This two-part paper presents multiproxy reconstructions of Niño-3.4 sea surface temperature over the last millennium, applying two techniques [composite plus scale (CPS) and hybrid regularized expectation maximization (RegEM) truncated total least squares (TTLS)] to a network of tropical, high-resolution proxy records. This first part presents the data and methodology before evaluating their predictive skill using frozen network analysis (FNA) and pseudoproxy experiments. The FNA results suggest that about half of the Niño-3.4 variance can be reconstructed back to A.D. 1000, but they show little LF skill during certain intervals. More variance can be reconstructed in the interannual band where climate signals are strongest, but this band is affected by dating uncertainties (which are not formally addressed here). The CPS reliably estimates interannual variability, while LF fluctuations are more faithfully reconstructed with RegEM, albeit with inevitable variance loss. The RegEM approach is also tested on representative pseudoproxy networks derived from two millennium-long integrations of a coupled GCM. The pseudoproxy study confirms that reconstruction skill is significant in both the interannual and LF bands, provided that sufficient variance is exhibited in the target Niño-3.4 index. It also suggests that FNA severely underestimates LF skill, even when LF variability is strong, resulting in overly pessimistic performance assessments. The centennial-scale variance of the historical Niño-3.4 index falls somewhere between the two model simulations, suggesting that the network and methodology presented here would be able to capture the leading LF variations in Niño-3.4 for much of the past millennium, with the caveats noted above.
The tropical Pacific Ocean is the stage for powerful air–sea interactions, whose oceanic expression ripples through the climate system on a variety of time scales (e.g., Dijkstra and Neelin 1995; Cane 2005; Emile-Geay 2006). The epitome of such interactions is the El Niño–Southern Oscillation (ENSO) in the interannual range, although variability exists on decadal (Graham 1994), interdecadal (Deser et al. 2004), centennial (Cobb et al. 2003), and longer time scales (Tudhope et al. 2001). The nature of future tropical Pacific trends in sea surface temperature (SST) will shape the response of the large-scale atmospheric circulation to global warming, with profound implications for regional hydroclimate regimes (Seager et al. 2009; Seager and Vecchi 2010). Despite its importance, the evolution of tropical Pacific SST under continued anthropogenic forcing is still subject to considerable uncertainties (Meehl et al. 2007; DiNezio et al. 2009; Collins et al. 2010; Vecchi and Wittenberg 2010). Constraining the tropical Pacific’s sensitivity to greenhouse forcing requires knowledge of its response to natural (solar and volcanic) radiative perturbations on decadal to centennial time scales. This type of problem is well posed for general circulation models (GCMs; Meehl et al. 2003), provided that their output can be validated against some estimate of natural climate variability. Because nonlinearities in the climate system complicate the separation of responses to natural and anthropogenic forcing, estimating the envelope of natural climate variability compels the investigator to look beyond the onset of the anthropogenic era, which corresponds to the best-observed interval in Earth’s history. In the tropical Pacific, the sparse network of instrumental climate observations rarely extends before 1900 and is marked by considerable discrepancies between SST datasets (Vecchi et al. 2008; Deser et al. 2010). Adding to this challenge, there exist very few long, high-resolution paleoclimate records from the area. Published records from the circum-Pacific region paint a coarse and heterogeneous picture of its climate of the past millennium (Moy et al. 2002; Graham et al. 2007; Conroy et al. 2009; Sachs et al. 2009; Tierney et al. 2010), illustrating the difficulty of extrapolating large-scale climate patterns from a select handful of individual proxy records, and highlighting the need for a unifying mathematical framework in which all the relevant proxy data can be synthesized.
Multiproxy reconstructions of tropical Pacific SST aim to combine the common signals contained in a comprehensive network of proxy records to provide continuous, quantitative reconstructions that can be rigorously compared against radiative forcing histories and climate model output. Despite several papers published to date, there is little consistency among reconstructions (Gergis et al. 2006). As recently reviewed by Wilson et al. (2010, hereafter W10), the earliest attempt combined a dozen annually resolved ENSO-sensitive proxy records, mostly corals and tree rings, to reconstruct ENSO variability over the last 300 years (Stahle et al. 1998). Using a more comprehensive network of 112 annually resolved proxy records (both tropical and extratropical), Mann et al. (2000) produced Niño-31 reconstructions for the past four centuries, which correlate highly with that of Stahle et al. (1998) over their period of overlap, as expected from the number of common series (Adams et al. 2003). Evans et al. (2002) used a network of tropical Pacific coral records to reconstruct the first two modes of tropical Pacific SST, with limited reliability prior to A.D. 1800 or so. Other attempts focused on interannual oscillations obtained from long, replicated tree-ring chronologies (D’Arrigo et al. 2005) and other records (Braganza et al. 2009; Gergis et al. 2006; Gergis and Fowler 2009), in lieu of reconstructing low-frequency (LF) variations. W10 expand on previous work by considering several carefully chosen networks of high-resolution tropical records to reconstruct ENSO using three statistical methods. However, they explicitly filtered out LF signals from corals to focus on interannual variations, which limits their assessment of LF variability. Their reconstructions extend back to approximately A.D. 1600, but they report little consistency among the three methodologies they implemented before A.D. 1800 or so, even with identical proxy networks. This highlights the need for improved statistical methodologies and/or more proxies from the central tropical Pacific to extract robust signals from such networks.
Using a global multiproxy database and a climate field reconstruction technique, Mann et al. (2009, hereafter M09) were able to extend a near-global surface temperature field back to A.D. 500, yielding the most comprehensive picture to date of the climate of the past millennium. In the tropical Pacific, they found the Niño-3 index to exhibit a slow, millennial-scale warming trend characterized by a relative cooling during the Medieval Climate Anomaly (MCA) compared to the Little Ice Age (LIA), which was absent from the numerical simulations performed with two coupled GCMs [Goddard Institute for Space Studies Model E-R (GISS-ER) and the National Center for Atmospheric Research (NCAR) Climate System Model, version 1.4 (CSM1.4)]. Because the method was global and the underlying network contained only a modest number of records from the tropical Pacific, the Niño-3 signal depends substantially on large-scale teleconnections that, in principle, might not be stationary. Its strict reliance on continuous records also precludes the use of some valuable ENSO records (e.g., Cobb et al. 2003). Indeed, most efforts to date have focused on annually resolved, continuous proxy records, resulting in a bias toward tree rings from the extratropics.
In this study, we use a complementary approach to reconstruct tropical Pacific SST variability over the past millennium. By focusing on tropical records from circum-Pacific regions, and by relaxing the requirement of continuous, annually resolved proxies, we explicitly target LF tropical Pacific variability, as well as intermittent interannual signals over the past millennium. We apply the same method as Mann et al. (2008)—hybrid regularized expectation maximization (RegEM) truncated total least squares (TTLS), described in section 3b—to reconstruct a widely used ENSO index on annual to multicentennial time scales. This differs from the M09 methodology in that we target a single index, not a climate field. The focus on a single index, although eliminating spatial information, yields a simpler estimate that better lends itself to sensitivity analyses. It also allows the simpler composite-plus-scale (CPS) method (Bradley and Jones 1993; Jones and Mann 2004) to be used as a point of comparison. In the first part of this two-part paper, we extensively evaluate both methods using frozen network and pseudoproxy tests, assessing the reliability of the reconstructed indices. In the second part (Emile-Geay et al. 2013, hereafter Part II), we apply these methods to extend three instrumental datasets [Extended Reconstructed SST, version 3 (ERSSTv3), the Second Hadley Centre SST dataset (HadSST2), and Kaplan SST] over the past millennium. It will be shown that, to some extent, details of the statistical methodology are secondary, and that much of the uncertainty between these preinstrumental tropical SST reconstructions stems from discrepancies among the historical SST analysis products themselves. Nonetheless, they enable robust conclusions that challenge our current understanding of LF tropical Pacific variability, with important implications for climate prediction beyond the interannual scale.
The paper is structured as follows: we start with a description of the proxy database (section 2) and statistical methodology (section 3); we then validate both methods over the historical interval via frozen network analysis (section 4) and pseudoproxy experiments (section 5). Results are discussed in section 6, providing an outlook toward Part II, in which several reconstructions of Niño-3.4 are analyzed and implications for GCM representations of the tropical Pacific climate response to radiative forcing are discussed.
a. Target temperature data
To alleviate computational complexity, we focus on a simple measure of ENSO (the Niño-3.4 index, defined as the average SST anomaly in the region 5°N–5°S, 170°–120°W; e.g., Trenberth 1997). The Niño-3.4 index is closely related to the first principal component (PC1) of monthly SST in all three major historical SST analyses: HadSST2 (Rayner et al. 2006), Kaplan SST (Kaplan et al. 1998), and ERSSTv3 (Smith et al. 2008). Both Kaplan and ERSSTv3 use some form of statistical interpolation to impute missing values. Because HadSST2 does not, we performed the imputation via the RegEM algorithm (Schneider 2001) with individual ridge regressions.2 We refer to the resulting dataset as HadSST2i. Niño-3.4 indices calculated from the three datasets are plotted in Fig. 1.
Since ENSO reaches its peak amplitude and displays its most pronounced teleconnections during Northern Hemisphere winter (Bjerknes 1969; Horel and Wallace 1981; Ropelewski and Halpert 1987), we annualized the datasets by computing December–February (DJF) averages, where it is understood that January and February belong to year 0 while December belongs to year −1, in the terminology of Rasmusson and Carpenter (1982). To account for the strong decrease in proxy availability after 1995, our historical period ends that year. Although the three datasets share a large fraction of measurements [from the International Comprehensive Ocean–Atmosphere Data Set (ICOADS); Worley et al. 2005], it is clear that the various quality-control and interpolation strategies lead to significant discrepancies before approximately A.D. 1950, with ERSSTv3 displaying notably less variance than the other two datasets, especially before 1877. This is at odds with documentary accounts (Quinn 1993; Ortlieb and Macharé 1993; Whetton and Rutherfurd 1994; DeVries et al. 1997; Davis 2001) and annually banded coral data (Urban et al. 2000; Evans et al. 2000; Ault et al. 2009), which depict intense ENSO activity during that period. The Kaplan SST analysis, on the other hand, was designed to retain the variance of the leading (large scale) eigenmodes, which sometimes requires overwriting data entries (Kaplan et al. 2003), and it exhibits the largest decadal variability before approximately 1875. However, Kaplan et al. (1997) warn that a caveat of their methodology is that it artificially reddens the resulting SST field; that is, it increases LF variability at the expense of high-frequency (HF) variability. The HadSST2i data go back farthest in time (to A.D. 1850), albeit with large amounts of missing values before approximately A.D. 1950, whose imputation did not overwrite existing data. Its late-nineteenth-century variability falls between that of Kaplan and ERSSTv3 (Fig. 1a).
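For illustration, the DJF annualization with this year-labeling convention can be sketched as follows. This is a minimal example, assuming a monthly Niño-3.4 series carried as a pandas Series with a datetime index; the function and variable names are ours, not part of the datasets or methods described above.

```python
import pandas as pd

def djf_mean(nino34_monthly: pd.Series) -> pd.Series:
    """DJF average of a monthly index; December is assigned to the following
    calendar year, so Dec(year-1) + Jan(year) + Feb(year) -> DJF(year)."""
    djf = nino34_monthly[nino34_monthly.index.month.isin([12, 1, 2])]
    # Label each December with the year of the following January/February
    year = djf.index.year + (djf.index.month == 12).astype(int)
    out = djf.groupby(year).mean()
    # Drop incomplete seasons at the ends of the record
    counts = djf.groupby(year).size()
    return out[counts == 3]
```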
A look at LF instrumental variability (Fig. 1b) reveals large discrepancies among Niño-3.4 trends, whose magnitudes vary by more than a factor of 2: the smallest trend is found for Kaplan SST (0.18°C century−1), followed by HadSST2i (0.38°C century−1) and ERSSTv3 (0.44°C century−1). These discrepancies are well documented (Vecchi et al. 2008; Karnauskas et al. 2009) and are known to be principally due to differing interpolation methods, treatments of satellite-derived data, and instrumental bias adjustments (Deser et al. 2010; Yasunaka and Hanawa 2010). Such discrepancies between instrumental SST data products will prove to be crucial sources of uncertainty in our estimates of Niño-3.4 variability before the historical era. However, whenever illustration is warranted we use HadSST2i as our target, as a middle-of-the-road estimate of the historical Niño-3.4 trajectory.
b. Proxy network
Our network gathers paleoclimate archives from ENSO-teleconnected regions (Horel and Wallace 1981; Ropelewski and Halpert 1987; Trenberth et al. 1998) using the following criteria: (i) located equatorward of 35°, (ii) temporal resolution of 5 yr or finer, (iii) absolute dating uncertainties of less than ±5 yr, (iv) documented sensitivity to surface temperature and/or precipitation, and (v) availability before A.D. 1851 and after A.D. 1974 (though not necessarily continuously). These criteria being arguably subjective, we screened the network for significant correlation with the raw Niño-3.4 index (section 3a), providing an objective measure of a proxy series’ relevance to a tropical Pacific climate reconstruction. (The full proxy database is available online at http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:4253936273493430::::P1_STUDY_ID:13684 and is catalogued in the appendix in Table A1, with boldface indicating those records that are significantly correlated with the HadSST2i Niño-3.4 index.) All records are publicly available from the National Climatic Data Center (NCDC), with one exception (updated Palmyra record; K. M. Cobb et al. 2013, unpublished manuscript). The results of the screening differ slightly for each of the three target SST datasets used (by at most one series retained).
Our database differs significantly from those constructed by M09 in pursuit of global temperature reconstructions and by W10 in pursuit of multiproxy reconstructions of tropical Pacific climate. M09 consider records that are continuous and without latitudinal limits, including some sedimentary or speleothem records with resolution coarser than 5 yr. In contrast, the W10 study uses continuous, annually or better-resolved tropical records (corals, tree rings, and ice cores) to reconstruct Niño-3.4 since A.D. 1650. In loosening the requirement for continuous, annually resolved tropical proxy data, we seek to provide a more accurate view of decadal- to centennial-scale tropical Pacific climate variability back through the MCA. Ours is also the first ENSO reconstruction effort to incorporate discontinuous series (the Palmyra and Mafia Island coral and Lake Challa records), and it comprises the latest available data.
Figure 2a displays the geographical distribution of the proxy database by proxy type. Coral sites are distributed throughout tropical oceans, mainly the Pacific. Tree-ring records are mostly confined to areas outside the tropics where seasonality is large enough for visible annual rings to enable construction of absolute annual chronologies. Speleothems are available from caves in several tropical sites, but none of these records passed the screening test (section 3a). Tropical ice cores and lake records are located farthest from the ENSO center of action, and are therefore more vulnerable to shifting teleconnections and/or climate variability not endogenous to the tropical Pacific. Only one marine sediment core (Black et al. 2007) contains sufficiently high-resolution data to be included in our database. After screening for significant Niño-3.4 correlations, only one ice core record (Quelccaya; Thompson et al. 2006a) is included in the screened proxy network, which is therefore mostly composed of coral and tree-ring records (all annually resolved).
The temporal coverage of our proxy network is displayed in Fig. 2b. The reader will note the steep decline in the availability of coral records back in time, with virtually no series available prior to A.D. 1650, with the notable exception of the Palmyra fossil coral chronology (Cobb et al. 2003), which extends back to A.D. 932, albeit discontinuously, covering about 600 of the past 1000 years (K. M. Cobb et al. 2013, unpublished manuscript).
All series were preprocessed according to the following protocol:
Annualization: Series with subannual resolution were annualized by extracting the December–March months, corresponding to the season of most pronounced ENSO teleconnections. Proxies with a time resolution coarser than one year were linearly interpolated to an annual time scale.
Completeness: Only proxies available before A.D. 1850 and at least until A.D. 1975 were included, to ensure a long enough calibration.
Power transform: Those series that deviated from Gaussianity were subjected to a power transform (Box and Cox 1964), a classical way to render them approximately Gaussian so they can be incorporated into this parametric statistical framework.
Screening: Following Mann et al. (2008), we performed reconstructions on both unscreened and screened predictor networks. Predictors retained by the screening procedure were those that exhibited a significant linear correlation with the raw Niño-3.4 index over the calibration period. To gauge significance, we performed Monte Carlo tests with an ensemble of 1000 first-order autoregressive [AR(1)] processes with the same lag-1 autocorrelation γ as the proxy of interest. The absolute value of the correlation between Niño-3.4 and each red-noise series is recorded; the 90th quantile of their distribution then defines an approximate 90% significance threshold under the null hypothesis of no true correlation.
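The screening test just described can be illustrated with the following minimal sketch, in which a proxy is retained if its absolute correlation with Niño-3.4 exceeds the 90th quantile of correlations obtained from AR(1) surrogates with matched persistence. Function and argument names are ours, for illustration only.

```python
import numpy as np

def ar1_screen(proxy, nino34, n_surr=1000, q=0.90, seed=0):
    """Retain a proxy if |corr(proxy, Nino-3.4)| beats the q-quantile of
    correlations obtained from AR(1) surrogates with the proxy's lag-1
    autocorrelation (a one-sided test on the absolute correlation)."""
    rng = np.random.default_rng(seed)
    x = (np.asarray(proxy) - np.mean(proxy)) / np.std(proxy)
    y = (np.asarray(nino34) - np.mean(nino34)) / np.std(nino34)
    n = len(x)
    gamma = np.corrcoef(x[:-1], x[1:])[0, 1]          # lag-1 autocorrelation
    r_obs = abs(np.corrcoef(x, y)[0, 1])
    # AR(1) surrogates with the same persistence and unit variance
    innov = rng.standard_normal((n_surr, n)) * np.sqrt(1.0 - gamma**2)
    surr = np.zeros((n_surr, n))
    for t in range(1, n):
        surr[:, t] = gamma * surr[:, t - 1] + innov[:, t]
    r_null = np.abs([np.corrcoef(s, y)[0, 1] for s in surr])
    return r_obs > np.quantile(r_null, q)             # True: keep the proxy
```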
Screening further reduces the number of proxy predictors: with a HadSST2i target, the network shrinks from 57 to 36 series (see Fig. 2 of Part II). Of these, only 3 go back to A.D. 1000. The first is the PC1 of the Palmer Drought Severity Index (PDSI) reconstruction from the North American Drought Atlas (NADA; Cook 2008), including only grid points equatorward of 35°N.3 The second is derived from a network of drought-sensitive trees from Chile (Le Quesne et al. 2006). The third is the Lake Challa varve thickness record of Wolff et al. (2011).
b. Reconstruction methods
There is now a rich history of climate reconstruction techniques from proxy records; we refer to Jones et al. (2009) for a recent and comprehensive review. While few records are available from the tropical Pacific (strictly defined), one can take advantage of the large-scale teleconnectivity of tropical climate to observe ENSO behavior from remote locations (Evans et al. 2002). Indeed, historical use of the term “teleconnection” itself is intimately tied to the discovery of the ENSO phenomenon (Walker and Bliss 1932; Bjerknes 1969; Wallace and Gutzler 1981; Horel and Wallace 1981) and its physical basis is well understood (e.g., Hoskins and Karoly 1981; Sardeshmukh and Hoskins 1988; Ting and Held 1990; Trenberth et al. 1998; Liu and Alexander 2007; Seager et al. 2003, 2005). The tropics (particularly in the circum-Pacific region) are dotted with ENSO-sensitive sites (Ropelewski and Halpert 1987) where paleoclimate archives can be successfully used to monitor past ENSO conditions (Evans et al. 2000, 2002). This was also the approach taken in early ENSO reconstructions (Quinn 1993; Whetton and Rutherfurd 1994), as well as more recent attempts (Gergis and Fowler 2009; Braganza et al. 2009), albeit in a more qualitative sense. Climate reconstruction techniques make use of the same physical relationships but allow climate anomalies to be quantitatively assessed from paleoclimate proxies, together with an estimate of uncertainty. They rest on the assumption that each proxy records some aspect of tropical Pacific SSTs in an approximately linear manner, and with a stationary covariance structure. These assumptions are testable and seem to hold to first order (Evans et al. 2000; Sterl et al. 2007; Furtado et al. 2009).
1) Hybrid RegEM reconstruction
Following recent work (Mann et al. 2007b, 2008; M09), we use the RegEM technique (Schneider 2001) in a hybrid context, that is, reconstructing separately the high- and low-pass-filtered data (with a frequency split fs = 0.1 yr−1). RegEM is a variant of the expectation maximization (EM) algorithm (Dempster et al. 1977; Little and Rubin 2002) designed for the imputation of missing values in spatiotemporal datasets typically encountered in climatology. It consists of regressing missing values over available ones using maximum-likelihood estimates of the mean and covariance matrix of the data. Its central assumptions are that (i) the data are missing at random, in the sense that the probability of a data point being missing is conditionally independent of the value of the measurement, and (ii) the data are normally distributed.
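To illustrate the frequency split underlying this hybrid approach, the sketch below separates an annual series into components above and below f_s = 0.1 yr⁻¹ (periods shorter and longer than 10 yr). The choice of a Butterworth filter here is ours, for illustration; the paper does not prescribe a particular filter.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def split_frequencies(x, fs_split=0.1, order=4):
    """Split an annual series into low-pass (periods > 10 yr) and high-pass
    components about fs_split = 0.1 yr^-1, as in the hybrid setup."""
    # Annual sampling: Nyquist frequency is 0.5 yr^-1, so normalize by it
    b, a = butter(order, 2.0 * fs_split, btype="low")
    low = filtfilt(b, a, np.asarray(x, dtype=float))
    return low, x - low
```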
The RegEM algorithm has been widely applied to paleoclimate reconstructions (Mann and Rutherford 2002; Rutherford et al. 2003; Mann et al. 2005; Rutherford et al. 2005; Mann et al. 2007b, 2008; Riedwyl et al. 2008; M09; W10). Smerdon and Kaplan (2007) note that LF variability is poorly preserved in RegEM when using Tikhonov regularization [also known as “ridge regression”; Tikhonov and Arsenin (1977)], but regularization by TTLS (Fierro et al. 1997) was shown to be more robust in this regard (Mann et al. 2007a). A downside to TTLS is that, unlike ridge regression, a discrete truncation number—the number of modes retained in the singular-value decomposition of the augmented data matrix—must be specified a priori for the proxy network and time interval at hand. Truncation parameters for both high- and low-frequency networks were assigned for each 100-yr interval (A.D. 1000–99, A.D. 1100–99, …, A.D. 1800–50) based on the criteria of Mann et al. (2008). The HF truncation level is chosen as the number of “significant” principal components of the proxy data matrix in each 100-yr interval (estimated as the number of singular values of the proxy data matrix that lie above a least squares fit to the trailing singular values in a log-linear scree plot). The LF truncation level is chosen so as to retain at least 33% of the variance in this spectral range.4 Hence the truncation choice is broadly adaptive and reflects the need for increased regularization back in time, as imposed by data sparsity (see Fig. 2b). Just as minor departures from Gaussianity do not invalidate normal statistics altogether, the missing-at-random assumption is not strictly satisfied in the paleoclimate context (Smerdon et al. 2008; Tingley et al. 2012), yet the pseudoproxy studies cited above, together with this study (section 5), show that reasonable reconstructions can nonetheless be obtained.
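One possible implementation of these truncation rules is sketched below. It is our own minimal interpretation of the criteria just described, not the authors' code; in particular, the number of trailing singular values used for the log-linear fit is an assumption made here for illustration.

```python
import numpy as np

def hf_truncation(proxy_matrix, n_trailing=10):
    """HF rule: number of singular values lying above a least squares
    log-linear fit to the trailing singular values (scree criterion)."""
    X = proxy_matrix - proxy_matrix.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    k = np.arange(1, len(s) + 1)
    tail = slice(max(len(s) - n_trailing, 1), len(s))
    coef = np.polyfit(k[tail], np.log(s[tail]), 1)   # log-linear "noise floor"
    noise_floor = np.exp(np.polyval(coef, k))
    return int(np.sum(s > noise_floor))

def lf_truncation(proxy_matrix_lowpass, var_frac=0.33):
    """LF rule: smallest number of modes retaining at least 33% of variance."""
    X = proxy_matrix_lowpass - proxy_matrix_lowpass.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(frac, var_frac) + 1)
```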
2) Composite-plus-scale reconstruction
We also apply the relatively simple CPS methodology (Bradley and Jones 1993; Esper et al. 2002; Jones and Mann 2004; D’Arrigo et al. 2005; Wilson et al. 2006; D’Arrigo et al. 2009) to our proxy network, following W10, who underscored the value of CPS in reconstructing ENSO. They restored the mean and variance of each nest5 to those of the most replicated proxy nest, resulting in built-in homoskedasticity (maintenance of variance through time). This was done to avoid the loss of variance associated with the declining number of proxy predictors further back in time, but this artificial inflation of variance also increases reconstruction bias. Nevertheless, CPS serves as a straightforward counterpart to RegEM, and will prove to provide comparable estimates of decadal variability. To ensure meaningful averaging, we require that at least four proxy series be available at any given time; this is only the case after A.D. 1146, so our CPS reconstructions begin only then.
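A bare-bones, single-nest version of the CPS step might read as follows. In this sketch the composite is scaled to the instrumental target over the calibration period; the nest-to-nest variance matching used by W10 and described above is omitted for brevity, and all names are ours.

```python
import numpy as np

def cps_nest(proxies, target, calib):
    """Composite-plus-scale for one nest: average standardized proxies, then
    restore the calibration-period mean and variance of the target.
    `proxies` is (n_years, n_proxies); `calib` is a boolean mask of
    calibration years; at least four proxies are required per nest."""
    if proxies.shape[1] < 4:
        raise ValueError("need at least four proxy series in the nest")
    z = (proxies - proxies.mean(axis=0)) / proxies.std(axis=0)
    comp = z.mean(axis=1)                              # simple composite
    scale = target[calib].std() / comp[calib].std()    # variance matching
    return (comp - comp[calib].mean()) * scale + target[calib].mean()
```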
4. Validation using frozen network analysis
A common way to assess the quality of statistical predictions is cross-validation (e.g., Hastie et al. 2008). In our context, it consists of estimating the instrumental target (here the historical Niño-3.4 index) using proxy data, with part of the instrumental data withheld from the calibration period to compute out-of-sample prediction error (Wilks 1995). When the diminishing availability of proxy data back in time is taken into account, this validation exercise is called a frozen network analysis (FNA; see, e.g., Mann et al. 2007b). Following McShane and Wyner (2011), we selected contiguous, sliding holdout blocks as verification sets. The blocks are 45 years long and were slid in 2-yr steps, resulting in 48 blocks covering the A.D. 1857–1995 interval (the instrumental interval available in all three SST datasets). Calibration sets are defined as the complement of the verification set within the instrumental set; for example, an A.D. 1931–75 verification interval corresponds to a [A.D. 1857–1930]∪[A.D. 1976–95] calibration interval. Compared to traditional calibration/verification exercises (which consider at most two disjoint intervals), this allows the estimation of a distribution of verification scores for each network. We note that only about three truly independent 45-yr blocks fit within such an interval, so the effective sample size is lower than 48. We consider nine time segments to define the networks: A.D. 1000–99, A.D. 1100–99, …, A.D. 1700–99, and A.D. 1800–50.
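The generation of these sliding holdout blocks is straightforward; the sketch below assumes an annual time axis given as integer years, and the names are ours.

```python
import numpy as np

def sliding_blocks(years, block_len=45, step=2):
    """Yield (calibration, verification) boolean masks for contiguous
    sliding holdout blocks, e.g., 48 blocks of 45 yr over A.D. 1857-1995
    with a 2-yr step."""
    years = np.asarray(years)
    for start in range(years.min(), years.max() - block_len + 2, step):
        verif = (years >= start) & (years < start + block_len)
        yield ~verif, verif   # calibration = complement of verification
```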
To ensure that no information from the verification sets enters the prediction of Niño-3.4, the proxy screening process is repeated separately for each block, using calibration data only. Following standard practice, we report three statistics as objective measures of quality: 1) the reduction of error (RE), 2) the “coefficient of efficiency” (CE; Briffa et al. 1988; Cook et al. 1994),6 and 3) the coefficient of determination R2 (e.g., Wilks 1995), the square of Pearson’s product moment correlation. For observed and estimated time series $z$ and $\hat{z}$ over a verification interval $\mathcal{V}$,

$$\mathrm{RE} = 1 - \frac{\sum_{t\in\mathcal{V}} (z_t - \hat{z}_t)^2}{\sum_{t\in\mathcal{V}} (z_t - \mu_c)^2}, \qquad \mathrm{CE} = 1 - \frac{\sum_{t\in\mathcal{V}} (z_t - \hat{z}_t)^2}{\sum_{t\in\mathcal{V}} (z_t - \mu_v)^2},$$

where $\mu_c$ and $\mu_v$ are the means of $z$ over the calibration and verification intervals, respectively. A perfect reconstruction will exhibit unit scores, but there is debate as to which scores delineate a “skillful” reconstruction. Ammann and Wahl (2007) point out that R2 is often an inadequate metric of LF skill, being overly prone to false negatives and false positives, so its results should be viewed with caution. The most stringent of the three metrics is CE, which by definition is always less than RE. Bürger (2007) showed that CE, which rewards reconstructions tracking shifts in the mean with appropriate amplitude, derives from a skill score (Wilks 1995). RE does not, but has a more natural interpretation as the amount of variance captured by a reconstruction, with the caveat that stationary stochastic processes can produce positive RE scores by chance alone. To gauge the influence of persistence on validation scores, we perform 1000 Monte Carlo simulations with random time series fitted to the instrumental target (at both high and low frequencies). The chosen model is a second-order autoregressive [AR(2)] process, fitted using the “ARfit” algorithm (Schneider and Neumaier 2001). From these distributions we extract the 95% quantile, yielding the critical values REcrit, CEcrit, and R2crit (Table 1).
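For completeness, the three verification statistics can be computed directly from the definitions above; the short transcription below uses boolean masks for the calibration and verification years, with names of our choosing.

```python
import numpy as np

def verification_stats(z, z_hat, calib, verif):
    """RE, CE, and R^2 over the verification interval, following the
    definitions above. `calib` and `verif` are boolean masks over time."""
    err = np.sum((z[verif] - z_hat[verif]) ** 2)
    re = 1.0 - err / np.sum((z[verif] - z[calib].mean()) ** 2)
    ce = 1.0 - err / np.sum((z[verif] - z[verif].mean()) ** 2)
    r2 = np.corrcoef(z[verif], z_hat[verif])[0, 1] ** 2
    return re, ce, r2
```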
a. RegEM validation
Distributions of RE scores for the frozen network analysis are shown in Fig. 3 for a HadSST2i target. Total scores remain relatively constant over time, in the vicinity of 0.5. This is primarily explained by the large HF scores, while LF scores fluctuate widely. The fact that median RE scores do not increase monotonically as proxy networks grow denser may be due to several factors: colinearity (redundancy) between proxies, overfitting to noise, or an overly simplistic statistical model—we cannot distinguish among these possibilities. Solid lines represent scores obtained with screened proxy networks, and generally show more skill in the post-A.D. 1600 period, suggesting that these problems are partially mitigated by the screening procedure. Still, screened-network LF RE scores only breach zero for three networks: A.D. 1100, A.D. 1200, and A.D. 1600. A potential problem in the assessment of LF skill by FNA is the shorter calibration period, reduced from 145 to 100 yr (ERSSTv3 and HadSST2i) and from 139 to 94 yr (Kaplan) compared to a real reconstruction. This question will be explored using pseudoproxy experiments in section 5c, where it will be argued that LF scores are systematically underestimated by FNA compared to what would be obtained from longer calibration and verification periods, thereby inflating the apparent uncertainties.
A more comprehensive picture is offered in Fig. 4, which assesses screened reconstructions for all three targets. Since R2 often fails to discriminate between good and poor reconstructions, only the median RE and CE scores are reported here for conciseness, although all three statistics are reported in Table 1. Sensitivity tests (not shown) demonstrate that the results are not sensitive to small (5–10 yr) variations in the length of the verification period or the overlap between blocks, as long as a few decades are available for verification.
Several results are noteworthy:
Critical values of RE and CE are usually negative, except at low frequencies, where they reach up to 0.23 (Kaplan). Critical R2 scores are on the order of 0.5 for LF, and 0.13 for total scores.
Consistent with previous work, verification scores do not necessarily improve as denser networks of predictors are used, confirming the impression gleaned from Fig. 3. The RE or CE scores do not increase monotonically over time with any target.
The HF verification scores considerably exceed the critical values for all metrics for all networks. This generally good performance is due to the fact that Niño-3.4 displays most of its variance in the interannual band, which is also where the signal-to-noise ratio is highest in most proxy measurements. Accordingly, we find that total scores are always above critical values even when LF scores are not, which may overstate our true confidence in the reconstructions’ quality given the presence of age uncertainties in the preinstrumental era.
As a consistency check, we analyzed regression residuals for randomly selected verification periods. Residuals were tested for Gaussianity via the Spiegelhalter test (Spiegelhalter 1983) and for autocorrelation via the Durbin–Watson test (Durbin and Watson 1951; Kanzler 1998) (not shown). Skillful reconstructions generally exhibit test p values exceeding 0.05, indicating that residuals are statistically indistinguishable from uncorrelated, Gaussian residuals at the 95% level. Conversely, reconstructions that do not pass the CE or RE thresholds consistently exhibit non-Gaussian and autocorrelated residuals, confirming the poor quality of such predictions.
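Such residual diagnostics are simple to compute; a minimal sketch follows. The Durbin-Watson statistic is written out directly, and scipy's D'Agostino-Pearson normality test is used here as a stand-in for the Spiegelhalter test, which is not available in common libraries; the function name is ours.

```python
import numpy as np
from scipy import stats

def residual_checks(residuals):
    """Durbin-Watson statistic (values near 2 indicate no lag-1
    autocorrelation) and a normality-test p value (D'Agostino-Pearson,
    standing in for the Spiegelhalter test used in the paper)."""
    e = np.asarray(residuals, dtype=float)
    dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
    _, p_norm = stats.normaltest(e)
    return dw, p_norm
```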
b. CPS validation
In Fig. 5 we plot the RE score for the CPS reconstructions. By design, CPS rescales the composite in each nest (which changes every time a proxy drops in or out of the network), so an RE value is computed for each such nest (sometimes as short as a few decades). The reconstructions display consistently positive HF RE scores (about 0.6, comparable to the RegEM reconstructions) for all times after A.D. 1150. In contrast, LF RE scores are mostly insignificant, indicative of skill-less reconstructions—a result found to stem from poor estimates of the mean in each nest. That CPS should fail to reconstruct centennial variability is not surprising given that it centers composites at every nest, thereby erasing the previous nest's memory of the mean. Therefore, one can only expect CPS to perform well on time scales shorter than the average nest length. For centennial variability to be skillfully reconstructed with CPS, one would need proxy series available continuously through several centuries, not a network characterized by a progressive dropout of proxy series back in time, such as ours (Fig. 2). Furthermore, CPS implicitly assumes that each proxy has an equal say in the final reconstruction, and that the central limit theorem will take care of reducing errors. Since every proxy expresses Niño-3.4 variability in its own way, the very structure of a regression framework enables it to capture such differences, whereas CPS cannot. These limitations lead us to favor RegEM for LF reconstructions, though CPS reconstructions will provide a useful reference point in Part II.
Overall, we derive four major conclusions from this exercise:
In the absence of dating errors, our network enables the skillful reconstruction of Niño-3.4 interannual variance with RegEM and CPS, typically around 60% thereof. In accordance with previous studies, we find LF variability relatively more difficult to reconstruct because most of the signal is in the interannual (ENSO) band.
The RegEM solutions capture LF variability more skillfully than CPS in all cases, but LF scores vary strongly with the target SST dataset.
For all three targets, screened networks generally improve prediction skill.
Differences in median total RE scores are overall fairly minor (Fig. 4, rightmost column), and one may ask whether they are statistically significant given the width of the distributions (illustrated in Fig. 3). A caveat of frozen network approaches is that the length of the calibration period is reduced from ~150 to ~100 yr, which affects both the screening process (tending to select a slightly different list of proxies than screening over the entire historical interval would) and the ability to capture centennial-scale variability. These considerations may result in overly conservative conclusions with regard to LF skill. Conversely, neglecting dating errors likely leads to an overly optimistic assessment of HF skill. We now turn to pseudoproxy experiments for a complementary appraisal of our methodology.
5. Validation using pseudoproxy experiments
a. Experimental design
Although the previous analysis constitutes an important crosscheck of the methodology, it only evaluates its performance vis-à-vis the decreasing number of predictors back in time (i.e., the network depopulation). Pseudoproxy experiments (PPEs; Smerdon 2012) offer another useful benchmark: the numerical laboratory provided by long integrations of coupled general circulation models (CGCMs), wherein the effects of stochasticity, varying proxy quality, and frequency-specific calibrations can be investigated (Mann and Rutherford 2002; von Storch et al. 2004; Mann et al. 2005, 2007b; Hegerl et al. 2007; Küttel et al. 2007; Riedwyl et al. 2008; Smerdon et al. 2010b, 2011).7
Here we use the pseudoproxy framework to test the robustness of the hybrid RegEM reconstruction of Niño-3.4, using the output of two 1156-yr-long simulations performed with the NCAR Community Climate System Model, version 4.0 (CCSM4.0), CGCM:8
LM: A simulation of the last millennium forced with estimates of volcanic and solar forcing, covering the A.D. 850–2005 period.
PI: An unforced simulation of the last millennium, using fixed A.D. 1850 top-of-the-atmosphere conditions.
These simulations follow the phase three of the Paleoclimate Modelling Intercomparison Project (PMIP3) protocols for last millennium experiments described in Schmidt et al. (2012). Their implementation in CCSM4.0 is documented by Landrum et al. (2013). The model’s simulation of tropical Pacific climate is described extensively in Deser et al. (2012). It is characterized by a realistic seasonal evolution of atmospheric and oceanic fields, realistic asymmetry between El Niño and La Niña events, and a realistic pattern of teleconnections. Interannual ENSO amplitude, however, is overestimated by approximately 30%, while the decadal component is weaker than observed and displays subdued extratropical linkages. The LF variability in the PI control simulation must be intrinsically generated by the coupled system, since it is not subjected to changing external forcings such as greenhouse gas concentration, volcanic aerosols, or solar irradiance variations. This permits an assessment of Niño-3.4 reconstruction skill in the face of intrinsically generated global variability only.
Simulated Niño-3.4 spectra are plotted in Fig. 6, alongside instrumental Niño-3.4 spectra from the three historical SST analyses under consideration. In addition to the points mentioned above, it can be seen that the three instrumental spectra indeed fall in between the LM and PI end members at centennial scales, and that the magnitude of their centennial variability scales like the linear trends of Fig. 1. The greater multidecadal variability in the instrumental records could be due in part to anthropogenic aerosol forcings, which are omitted from the PI control simulation and enter into the LM simulation only at the end of the time series. The ENSO-band variability is quite realistic in both simulations, but overly energetic at the quadrennial scale, particularly in LM. This greater ratio of HF (interannual) to LF variability in the simulations must be kept in mind to interpret our results.
In seeking to construct meaningful pseudoproxy experiments to test our methodology, a few points deserve consideration:
Contrary to the usual assumption used in PPEs, many proxies in our network were selected on the basis of their correlation to Niño-3.4 SST, not local SST. This is particularly true of coral δ18O records, which often record stronger hydrological than thermal signals (Gagan et al. 2000) but can nevertheless faithfully reconstruct large-scale SST patterns by virtue of the strong coupling between tropical SSTs and rainfall (Evans et al. 1998). Our network must thus reflect this diversity of physical controls.
In the absence of forward models calibrated to each proxy, a simplifying assumption must be made to construct pseudoproxies P at each location x and time t. We follow previous studies (e.g., Mann et al. 2007b; Christiansen et al. 2009; Smerdon et al. 2010b, 2011) in using a white-noise model:

$$P(x, t) = V(x, t) + \varepsilon(x, t), \qquad \varepsilon \sim \mathcal{N}\!\left(0,\; \sigma_V^2/\mathrm{SNR}^2\right),$$

where ε is Gaussian white noise whose variance is set by the prescribed signal-to-noise ratio (SNR, defined as a ratio of standard deviations).
The climate variable V(x, t) was assigned as follows: all corals are assumed to record local temperature; ice cores and speleothems are assumed to record (power transformed) precipitation; tree-ring records are assumed to record drought conditions as described by the PDSI; Lake Challa varve thickness (Wolff et al. 2011) is assumed to be controlled by zonal wind, while Cariaco basin Mg/Ca (Black et al. 2007) is assumed to be controlled by local temperature.
For consistency with our methodology, we screen each pseudoproxy for a significant correlation to the target Niño-3.4.
Given the rapid decrease in proxy availability back in time (Fig. 2), and the marked temporal dependence of verification statistics (section 4), it is clear that any realistic pseudoproxy network should exhibit a similar pattern of temporal availability. To do this, we rank the screened pseudoproxies by their correlations to the simulated Niño-3.4. Similarly, we rank the screened real proxies by their correlations with HadSST2i Niño-3.4 SST over the historical epoch. Then we pair up the pseudo- and real proxies by rank, and assign the temporal mask from the real proxy to its rank-partner pseudoproxy. We generate a reconstruction using the masked pseudoproxies, as we do for real proxies. This process is repeated 200 times for each noise level.
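The pseudoproxy construction and masking steps above can be sketched as follows; SNR is taken as the ratio of signal to noise standard deviations, and the function names, arguments, and availability mask are ours, for illustration only.

```python
import numpy as np

def make_pseudoproxy(climate_var, snr=1.0, rng=None):
    """Pseudoproxy = standardized climate variable + Gaussian white noise
    with standard deviation 1/SNR (SNR = inf returns the noise-free signal)."""
    rng = np.random.default_rng() if rng is None else rng
    v = (np.asarray(climate_var) - np.mean(climate_var)) / np.std(climate_var)
    if np.isinf(snr):
        return v
    return v + rng.standard_normal(v.shape) / snr

def apply_mask(pseudoproxy, real_proxy_available):
    """Impose the rank-matched real proxy's temporal availability pattern
    (True where the real proxy has data) on the pseudoproxy."""
    return np.where(real_proxy_available, pseudoproxy, np.nan)
```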
These features make our PPEs more realistic than previous studies in some respects and less so in others. On the one hand, the design takes advantage of the fact that real-world proxies record a range of physical variables (not just temperature) related to Niño-3.4 SST via teleconnections, which are allowed to evolve with the model’s climate state. It is also the first (to our knowledge) to factor in real-world proxy availability. On the other hand, our setup radically simplifies the physical controls on each proxy, neglecting the multivariate nature of most proxy systems (with attendant constructive or competing effects between variables), especially those involving oxygen isotopes. While this design is far from ideal, it is a step toward a more comprehensive treatment, which must ultimately involve high-quality GCM simulations coupled to validated forward models of proxy formation. At present, biased GCMs and incomplete (or nonexistent) forward proxy models limit our ability to assess paleoreconstruction skill. However, our study emphasizes important questions that can help guide future work.
The synthetic and actual proxies are compared in Fig. 7, where it is evident that the pseudoproxy design tends to overestimate true correlations for some proxies and underestimate them for others. Note also that for sufficiently low correlations, the addition of noise to the local climate variable sometimes acts to increase absolute correlations to Niño-3.4 above their noise-free reference level. Hence, pseudoproxies constructed with SNR = 1.0 may exhibit slightly higher correlations than those constructed with SNR = ∞ for proxy indices above 20 or so, although their median value is much lower than for SNR = ∞. In the extreme case where noise-free correlations are close to nil (index above 45 or so), the majority of noise realizations for all other SNRs exhibit higher absolute correlations than for SNR = ∞; this is, however, not sufficient to make them significant, so they are always screened out of the pseudoproxy pool and so do not contribute to the reconstructions.
A spatial view of such correlations is presented in Fig. 8. As expected, the higher the SNR, the more proxies prove significant in each noise realization. Given the discrepancies noted above, however, one may ask which SNR case, if any, constitutes the closest analog to the observed Niño-3.4 signal strength. Overall, the histogram of observed absolute correlations to Niño-3.4 (not shown) is most comparable to the SNR = 1.0 case. This notion is confirmed by looking at the effective SNR over time—that is, the collective ability of the pseudoproxy network to capture a relationship to Niño-3.4 over time in a manner that mimics reality. To quantify this, we define the partial SNR as
$$\mathrm{SNR}_i = \frac{r_i}{\sqrt{1 - r_i^2}},$$

where $r_i$ is the absolute correlation of the ith proxy (or pseudoproxy) to Niño-3.4. This expression—only valid for standardized variables—is a direct consequence of the definition of the pseudoproxies and the linearity of the correlation operator. The effective SNR is then the sum of all partial SNRs for all proxies available at a given point in time. It is plotted in Fig. 9, which shows the network SNR = 1.0 to be most comparable to the observed network.
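Using the partial-SNR relation above, the effective network SNR through time can be tallied as in the small sketch below; the boolean availability matrix and the names are ours, introduced only for illustration.

```python
import numpy as np

def effective_snr(abs_corr, available):
    """Effective SNR(t): sum of partial SNRs r_i / sqrt(1 - r_i^2) over the
    proxies available in each year. `abs_corr` has one |correlation| per
    proxy; `available` is a boolean (n_years, n_proxies) matrix."""
    r = np.asarray(abs_corr, dtype=float)
    partial = r / np.sqrt(1.0 - r**2)          # partial SNR per proxy
    return np.asarray(available, dtype=float) @ partial
```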
b. Results: High- and low-frequency performance
We compute the RE and CE statistics for the raw and low-pass-filtered time series over the A.D. 1001–1850 verification interval (with respect to an A.D. 1851–1995 calibration), and display the results in Fig. 10 for the LM simulation and in Fig. 11 for the PI simulation. As in the previous section, we tested our results against distributions drawn from an ensemble of noninformative predictors, taken as 1000 realizations of an AR(2) process fitted to the target Niño-3.4. Hence, a good reconstruction should exhibit both RE > 0 and beat the noninformative prediction (in the sense that its sampling distribution of RE and CE scores should be to the right of the noninformative prediction, with little to no overlap). To account for the effects of stochasticity, 200 random realizations of the noise matrix were generated, yielding 200 independent pseudoproxy tests. This large ensemble size is important in order to derive the most general conclusions (Christiansen et al. 2009; see also Rutherford et al. 2010).
In the LM case, the total and LF RE scores beat noninformative predictors at reconstructing Niño-3.4 in all cases but SNR = 0.25, and the SNR = 1.0 case rivals or even exceeds the SNR = ∞ case (Fig. 10). That performance should increase despite a decrease in SNR deserves an explanation: it is a consequence of the fact, discussed above, that sampling variability may at times increase a noisy pseudoproxy’s correlation to Niño-3.4 above its noise-free value, hence marginally improving the effective network quality. The case most comparable to the real network (SNR = 1.0) displays relatively high LF RE scores (median = 0.75), but important variance losses are apparent, as is a tendency toward a warm bias, confirmed by the negative LF CE scores (median = −0.3; not shown). This bias is due to the presence of a strong trend over the calibration interval, which is the warmest period of the whole record.
In the PI case, reconstruction skill is comparable for the raw Niño-3.4 (Fig. 11b) but drastically lower at LF (Fig. 11c). This is not surprising, as LF variability is subdued in the PI simulation, and hence is not detectable by a sparse pseudoproxy network. That total RE scores should be similar to the LM case underscores the dominance of interannual variability in CCSM4.0, with or without forcing. Given that the magnitude of LF variability in all three instrumental products (Fig. 6) is bracketed by those two end members, we can expect the LM case to provide an upper bound, and the PI case a lower bound, on LF skill.
To visualize performance in frequency space, Fig. 12 shows the multitaper method (MTM) spectra of the target and reconstructed Niño-3.4 indices for SNR = 1.0. In the LM case, a large overlap between the 95% confidence regions indicates that reconstructions accurately depict the target Niño-3.4 spectrum out to periods of approximately 10 yr. For all lower frequencies, the reconstructions systematically underestimate variability, as expected from the “regression dilution” phenomenon (Frost and Thompson 2000). This effect is likely to be at play in real-world reconstructions as well. In the PI case, the lack of substantial multidecadal to centennial variability is accurately captured by the reconstructions, albeit generally with incorrect phase, as attested by the poor reconstruction statistics (Fig. 11). The reconstructions also correctly capture the spectral signature of variability at periods shorter than 10 yr, but slightly underestimate it in the 10–20-yr range.
Extending these results to the real world raises a few issues. First, the pseudoproxy model used here is a simplistic representation of how climate information is incorporated into proxy data. Second, there is no way of knowing which of the two GCM end members is closer to the true tropical Pacific climate. The existence of important LF variability in many sedimentary records of ENSO (cf. section 1) tends to suggest that real variability was—regardless of its cause—closer in character to the forced LM simulation than the PI simulation, implying that for realistic SNR values, RegEM TTLS should yield skillful reconstructions for both high and low frequencies using the available proxy network. Finally, dating uncertainties in real proxy records are expected to deteriorate HF (interannual) skill relative to the pseudoproxy case, but the effect is expected to be negligible at longer periods owing to our stringent dating control (5 yr or finer). So while our reconstructions will likely underestimate the very long-term (millennial) trends, these results suggest that they should faithfully capture decadal to centennial oscillations, albeit with inevitable variance losses (i.e., for RE = 0.5, the amplitude will be damped by a factor of √2, but the phase will be correctly estimated). This claim of a faithful reconstruction of LF signals, however, is at odds with the frozen network analysis of section 4, a problem to which we now turn.
c. Results: Accuracy of frozen network analyses
The PPE framework permits an evaluation of how well FNA captures the true distribution of verification scores in an internally consistent manner. Here we wish to quantify the ability of calibration/verification exercises to assess reconstruction skill when the answer is independently known and many noise realizations are available. To do this, we repeat the analysis of section 4 on the LM and PI Niño-3.4 targets, using the same protocol as above. We assume that only 145 years of “instrumental” data are available, partition them into a 100-yr-long calibration set and 45-yr-long verification sets as before, and perform FNA on those sets. We then compare the results to the true distributions of RE scores obtained with a 145-yr-long calibration interval and 100-yr-long verification intervals (A.D. 1000–99, A.D. 1100–99, …, A.D. 1800–50). Because the two types of distributions are evaluated from samples of different size, we apply a kernel density smoothing with a bandwidth of 0.1 to remove granularity arising from sampling variability. For brevity, we only describe the SNR = 1.0 case, but results are qualitatively similar for all noise levels.
The LM results are shown in Fig. 13 for RE scores (similar conclusions hold for CE scores). Pseudoproxy (PP) verification scores (dash–dotted lines) are seen to worsen gradually over time, as expected from network depopulation (Fig. 9). Yet the FNA distributions (solid lines) are consistently shifted left of the PP distributions, especially at low frequencies, whereas HF distributions overlap largely with their PP counterparts, sometimes slightly under- or overestimating them. The only exceptions are the A.D. 1000–99 and A.D. 1100–99 nests, where FNA overestimates total skill. However, since their median never breaches zero, one would judge such reconstructions unreliable regardless of the assessment method. Interestingly, median LF PP RE scores for the forced simulation are high (>0.7) for all networks after A.D. 1200, so the total scores presented in Fig. 10 are brought down by the two earliest nests.
In the PI case (not shown), the opposite happens at LF: FNA tends to overestimate LF RE distributions, but in no instance do RE scores rise above zero, indicative of skill-less reconstructions. Because of the lack of substantial LF variability in this simulation, distributions of HF and total variability largely coincide, consistent with previous findings, and in all cases those scores are significantly underestimated by FNA compared to PP.
These results suggest that FNA can significantly err in its estimation of verification skill, especially at low frequencies. The case of most interest is LM, where LF variability is most comparable to that observed, and for which LF skill is consistently underrepresented in FNA compared to the PP case (by about 0.3 units over most nests). Although the aforementioned caveats limit these results’ direct application to our real-world FNA (section 4), they suggest that reconstruction skill is generally underestimated by this assessment—presumably because of the low number of degrees of freedom available for calibration and/or verification. Note, however, that the FNA samples are not independent (two neighboring verification periods share 95% of data in common), so their size and spacing must reflect a compromise between sample size and the degree of overlap. We find that 30-yr holdout blocks advocated by McShane and Wyner (2011) are even less suited to the estimation of LF skill, and that increasing the spacing to 5 yr does not qualitatively affect our results. Overall, the chosen block size (45 yr) and spacing (2 yr) were found by trial and error to yield the least biased assessments of skill, although this bias remains strong. Finally, using an unscreened proxy network yields almost identical results, indicating that the short screening period used in FNA is not responsible for the lower LF skill.
In this first part, we have set the stage for multiproxy reconstructions of tropical Pacific SST (Niño-3.4 index) over the past millennium. Using a network of accurately dated paleoclimate proxies back to A.D. 1000, we presented two largely independent methodologies (RegEM and CPS), which were validated by systematic cross-validation exercises and (for RegEM) pseudoproxy studies.
The frozen network experiments raised several key points: first, all methods used here are vulnerable to noise in the data, so adding more proxy predictors does not necessarily reduce out-of-sample prediction error as measured by the RE, CE, or R2 statistics. Efforts are underway to devise more robust and accurate statistical methods tailored to this problem. Second, we found LF (decadal and longer) variability inherently harder to reconstruct than HF (interannual) variability, consistent with previous studies (Gergis et al. 2006; Braganza et al. 2009; W10). Although we found that in all cases skillful LF reconstructions could be obtained back to A.D. 1200 with a HadSST2i or ERSSTv3 target (A.D. 1000 with a Kaplan target), FNA skill scores are far from uniform in time and are highly dependent on the instrumental target. This result is surprising and underscores the need for a better understanding of the divergence between SST data products at decadal to centennial time scales. Third, we found RegEM TTLS to be more skillful at reconstructing LF fluctuations than CPS, although HF skill was comparable between the two methods, with a slight advantage for RegEM. This leads us to favor RegEM in the millennium-long reconstructions of Part II. An important caveat of the frozen network framework is that it may compromise the quality of centennial-scale calibration by shortening the calibration and verification periods, thereby providing an overly pessimistic assessment of the quality of LF reconstructions based on longer training and verification periods. This is cause for concern, because cross-validation exercises of this type are currently the only way to evaluate the reliability of real-world climate reconstructions, and our results suggest that this may be very difficult with inevitably short calibration/verification periods. Furthermore, the frozen network exercise is inherently oblivious to the growth of proxy age uncertainties back in time. Because slight offsets between proxies can readily cause destructive interference between interannual signals, ignoring them will tend to overestimate HF reconstruction skill in preinstrumental times. However, this problem is alleviated at low frequencies given a dating control of 5 yr or better.
We further probed RegEM’s performance via a suite of realistic pseudoproxy tests using two different GCM simulations of the past millennium, one driven by best estimates of solar and volcanic forcing (LM) and the other driven by constant forcing (PI). The range of tropical Pacific variability spanned by these two cases was meant to bracket nature’s behavior, and indeed encompasses the behavior of the three instrumental Niño-3.4 time series considered in this study (Fig. 6). Our pseudoproxy design incorporates key aspects of the real network, including the decreasing availability of predictors back in time, and uses pertinent climate fields as input for the pseudoproxies. It does, however, make very stringent assumptions regarding the physical controls of each proxy, sometimes overestimating and sometimes underestimating their relationship to Niño-3.4. The case of most interest (SNR = 1.0) suggests that about 40% of the total Niño-3.4 variance (75% of LF variance) can be reliably reconstructed back in time, consistent with frozen network experiments, although with important variance losses at LF and a warm bias induced by the presence of a strong trend over the calibration interval. The large LF RE scores found in this case very likely constitute an upper bound on the real-world performance of our network and method. Discrepancies between FNA and PPEs are partially reconciled by considering the inability of FNA to approximate the true PP distributions (section 5c).
Nevertheless, important caveats limit the application of those results to the real world: 1) the spectral characteristics of each GCM’s simulated Niño-3.4 (neither of which emulates instrumental estimates) and 2) the simplicity of the pseudoproxy design. The first point highlights the necessity of using as realistic a GCM as possible in pseudoproxy tests. Millennial integrations run as part of the PMIP3 project (see https://pmip3.lsce.ipsl.fr/wiki/doku.php/pmip3:design:lm:final) are bringing the most comprehensive models to bear on the simulation of the tropical Pacific response to natural forcing over the past millennium, and a similar analysis conducted over a wider ensemble of simulations would put our findings on a stronger footing. The second point highlights the importance of increasing the realism of pseudoproxy tests. To the best of our knowledge, this study is the first attempt to date at incorporating a realistic pattern of temporal availability in the pseudoproxy network; that being said, the univariate, white-noise model used here offers only a simplistic representation of the manifold processes whereby climate signals are incorporated into proxy archives. Should the spectral signature of the proxy genesis process assume a more complex form than our simple model, reconstruction skill would be affected differently across frequency bands, perhaps nonlinearly so. In addition, this pseudoproxy framework (as in all studies to date) assumes perfect age control, which is obviously an idealization. As in FNA, we expect dating uncertainties to negligibly impact LF skill, but to deteriorate HF skill. A more rigorous framework for testing reconstruction methods using pseudoproxies consists of using forward, process-based proxy models, incorporating physical, chemical, and biological mechanisms of proxy genesis to simulate synthetic proxy records with realistic properties (Anchukaitis et al. 2006; Evans 2007; Cobb et al. 2008; Thompson et al. 2011; Mann et al. 2012). This field is still in its infancy, and further development is needed before such models can be used operationally to validate climate reconstruction methods.
To what extent do these results carry over to real-world reconstructions? Discrepancies between FNA and PPEs illustrate the difficulty of quantitatively assessing LF reconstruction skill: on the one hand, the pessimistic assessments of FNA are most likely excessive; on the other hand, LM may be too favorable a test case because of its relatively high-amplitude LF signals, which exceed those found in all observational datasets considered here. Overall the results suggest that RegEM TTLS can faithfully capture the phase of LF fluctuations, albeit with an amplitude that will inevitably be damped as proxy series drop out of the network—so our estimates are likely to be on the low end of possible centennial variability. It is noteworthy that verification skill depends critically on the instrumental target, which drives significant differences in reconstructed LF behavior. In Part II, we apply the methodologies described and evaluated herein to obtain a range of plausible histories of Niño-3.4 variability over the past 1000 years, before discussing some dynamical implications of the results.
We acknowledge funding from NOAA Grant NA06OAR4310116. The authors thank Scott Rutherford, Emanuele DiLorenzo, Judith Curry, Kettyah Chaak, Tapio Schneider, Julie Richey, David Black, Kevin Anchukaitis, Paige Green, Glenn Alexander, Athanassios Protopapas, and Toby Ault for providing data, technical help, and support.
Table A1 contains the characteristics of the proxy database that was obtained from NCDC.
1. Defined as the average SST anomaly in the region 5°N–5°S, 150°–90°W.
2. Ridge regression with a data-adaptive choice of regularization parameter was shown by Schneider (2001) to outperform truncated total least squares for monthly SST imputations. Such is not the case with paleoclimate reconstructions, as we shall see shortly.
3. This series is closely related to the NADA PDSI PC1 series used by Li et al. (2011) to reconstruct ENSO variance over time, the only difference being the latitudinal restriction.
5. Defined as an interval over which proxy availability stays constant.
7. We note that an error in model gridbox locations impacted conclusions of Mann et al. (2005, 2007b) bearing specifically on Niño-3 reconstructions (Smerdon et al. 2010a), but it left the overall conclusions of the study unchanged (Rutherford et al. 2013). None of those results are relevant to the present study and so should not be used as a point of comparison.
8. To be consistent with our reconstruction choices, only the A.D. 1000–1995 period DJF averages will be analyzed here.