## Abstract

Are temperature proxy records linear recorders of past temperature conditions? A statistical test for linearity is applied to 15 millennial-long proxy records with an annual resolution that was shown to significantly respond to Northern Hemisphere annual mean temperature selected from a collection of 30 proxies. The test, based on generalized additive modeling, shows that most of the proxies can indeed be shown to be linear functions of annual mean temperature, but two proxy records do not appear to have a linear relationship with temperature—this supports the assumption of linearity in most climate reconstruction work. The method tests for nonlinearity in a proxy relative to the group of proxies with which it is being used together. The robustness of the results is tested, and it was found that the results are stable to the choice of proxies. The linearity-testing method is quite general and could in the future be used for larger and more extensive sets of proxies.

## 1. Introduction

Considering the fact that a systematic network of instrumental temperature measurements around the globe started as late as the second half of the nineteenth century, we are dependent on other types of information to understand temperature variability prior to that. Such information is essential to evaluate whether the recent global warming falls outside the range of the natural variability of the last one or two millennia in either magnitude or rate (NRC 2006; Jansen et al. 2007), as well as providing material that may be used to evaluate paleoclimate model output and the paleomodel forcings. Our knowledge of past temperature variability must be drawn from temperature-sensitive proxy data. Such data can be extracted from both historical records and from various natural recorders of climate variability, such as corals, fossil pollen, ice cores, lake and marine sediments, speleothems, and tree-ring width and density. A review of different types of temperature proxy data is given in, for example, Bradley (1999) and Jones et al. (2009), and the availability of millennia-long temperature proxy records in the Northern Hemisphere is synthesized in Ljungqvist et al. (2012).

Multiproxy temperature reconstructions are based on training a model on datasets that include both observations and proxy data. Then, using the trained model on proxies alone, a model-based value for the observational quantity is extracted. Most current reconstruction methods used are linear—a linear proportionality between a signal in the proxy and in the real world is assumed. But do we know a priori whether the relationship actually is linear? We expect that linear methods will produce better reconstructions if the proxies are linear temperature recorders. While knowledge of the biological and geological systems that lay down the proxy records helps to understand which systems are linear (Frank et al. 2010), it is of interest to have quantitative tests for linearity.

There has been an increasing awareness that climate proxy records do not always show a linear relationship to temperature (Tingley et al. 2012). This has been most evident in the field of dendroclimatology, where the so-called divergence problem, the fact that some high-latitude tree-ring records show a lessened or even negative response to higher temperatures in the late twentieth century, has been studied by numerous researchers (Andreu-Hayles et al. 2011; Briffa et al. 1998a,b; D’Arrigo et al. 2008; Loehle 2009; Wilson et al. 2007; and the references therein). The nonlinear properties of some tree-ring width records have in pioneering studies been assessed by, for example, Carrer and Urbinati (2001), Fritts (1976), Graumlich and Brubaker (1986), and D’Arrigo et al. (2004). The nonlinear response of some temperature-sensitive proxy records to temperature has also been brought to attention regarding low-resolution proxy archives such as chironomid, diatom, and pollen. Nonparametric methods have been applied for reconstructing temperature from such proxies by, among others, Birks (1995), and Birks et al. (2010), and Bayesian reconstruction techniques that feature nonlinear modeling have been used by Haslett et al. (2006), Korhola (2002), Toivonen et al. (2001), and Vasko et al. (2000).

We present here methods and tests that can reveal the nonlinear relationship between proxies and temperature, so-called nonparametric methods. These methods are well known in statistics—see, for example, Teräsvirta et al. (2010, chapter 10)—but only occasionally utilized in paleoclimatic research. By using a nonparametric method, the aim is to avoid assumptions on the parametric form of the relationship in question. We let the data “speak for themselves,” enabling us to find a function that describes the available data well. This is in contrast to parametric modeling, where a specific model with parameters is assumed to generate the data in question. One such method in climate reconstruction is the simple “direct regression” method (pioneered by Groveman and Landsberg 1979). In such a parametric model, it can be easy to do inference, and great gains in efficiency are possible, however, only if the model is (almost) true. If the assumed model is incorrect, then inferences can be useless, leading to misleading interpretations of the data. Nonparametric models provide a simple way to find structures in datasets without imposing a parametric model, and it is also possible to test whether relationships are linear.

In this paper we will examine two such models, a nonparametric additive model and a semiparametric additive model, that allow for some linear and some nonlinear regressors. For a purely illustrative purpose, we will compare these two methods with reconstructions using the simple direct regression model. The rest of the paper is organized as follows. In section 2 we give a brief overview of the nonparametric models and the corresponding estimation methodology. In section 3 we present the proxy data and temperature series used, while section 4 reports the results of the fitted models and tests of linearity. Finally, section 5 presents a summary. In the appendix we demonstrate the additive nonparametric method in a test example to show that it works on realistic data with known properties.

## 2. Methods

### a. Linear regression model

In the multivariate regression—used, for instance, in Groveman and Landsberg (1979)—a Northern Hemisphere (NH) mean temperature *Y* is expressed as a linear sum over *d* selected proxies *X*, as follows:

“Training,” to determine the coefficients (*α*, *β _{j}*), is performed during an interval where both

*Y*and the

*X*’s are available, and the determined coefficients are then used to build a model for use at all other times. Assuming data stationarity is central to this step. This method is still in some use although several alternatives exist; see Christiansen et al. (2009) for a current review. We stress that it is used here for illustrative purposes only.

### b. Nonparametric regression models

The aim of nonparametric models is to relax assumptions on the form of a regression function, and to let the data search for a suitable function that describes the available data well. These approaches are powerful in exploring fine structural relationships and provide very useful diagnostic tools for parametric models.

In a multivariate regression problem, we want to study the relationship between the response variable *Y* and the vector of covariates **X** = (*X*_{1}, … , *X _{d}*)

^{T}via

It is useful to model the unknown regression function *m*(**x**) additively, that is,

Usually an intercept term is added, that is, E(*Y*) = *α*. This gives us the following additive nonparametric model:

where *m*_{1}, … , *m _{d}* are unknown univariate functions, E(

*e*) = 0, Var(

*e*) =

*σ*

^{2}, and

*e*is independent of the vector of covariates

**X**. To ensure identifiability,

*m*

_{1}, … ,

*m*are required to satisfy

_{d}which implies that E(*Y*) = *α*.

Estimation of the unknown functions *m*_{1}, … , *m _{d}* is done by the backfitting algorithm, introduced by Breiman and Friedman (1985) and Buja et al. (1989). Note, first, that if the additive model, (3), is correct, then

This relationship suggests an iterative procedure for the estimation of the unknown functions. Thus, for a known constant *α* and given functions *m _{j}*,

*j*≠

*k*, the function

*m*can be estimated by a univariate regression fit based on the observations ,

_{k}*i*= 1, … ,

*n*, where is the

*i*th observation of the

*k*th additive variable. Denote the univariate smoother of

*m*by

_{k}*S*. The algorithm works as follows:

_{k}Step 1. Initialization: for

*k*= 1, …*d.*Step 3. Repeat step 2 until convergence.

The idea behind this algorithm is to carry out a fit, calculate partial residuals from that fit, and refit again, which is why the iteration scheme is called backfitting. The starting functions can be obtained in various ways, for example, from a linear regression fit of *Y* on the covariates *X _{k}*. The smoothing operator

*S*can be other nonparametric regression estimators, such as kernel methods—see, for example, Fan and Gijbels (1996).

_{k}This way of modeling nonlinear relationships between a response and several predictors has been extensively used within the statistical community for some years. In fact, the additive model above is a version of a wider model called generalized additive model (GAM); see Hastie and Tibshirani (1990). Here the conditional mean (*m*(**X**)) of a response *Y* is related to an additive function of the predictors via a link function *g* as follows:

See also Hastie and Tibshirani (1990) and Hastie et al. (2009) for further technical details regarding the backfitting algorithm and nonparametric regression models. An alternative to the backfitting algorithm is the marginal integration method; see Tjøstheim and Auestad (1994).

In the linear regression model, each regressor represents 1 degree of freedom (Df)—in the additive model, more degrees of freedom are used up; see Hastie and Tibshirani (1990)—it is therefore generally advantageous to consider models that contain both linear and nonlinear terms; these are called semiparametric models. The methods are implemented in R (see www.r-project.org), and the software used in this paper is available on request.

## 3. Data

A wide range of different kinds of proxies with annual resolution have been used in this study: one speleothem microlayer thickness record, three ice-core *δ*^{18}O records (one of which is a composite of three individual cores; see below), seven varved thickness sediment records, 13 tree-ring width records, four tree-ring maximum latewood density records, one tree-ring height-increment record, and one tree-ring *δ*^{13}C record. Information about each of these 30 proxy records is given in Table 1, such as the name of the proxy, latitude and longitude, type of proxy, season bias, dating uncertainty, and reference to the original publication. We only use proxies with annual resolution—this places the proxies on a comparable basis to the instrumental data they are calibrated against but excludes most presently available millennia-long proxies.

A general problem for any reconstruction is the risk of “overfitting”—that is, that we have so many proxies for which we have to find a coefficient or, in the case of GAM, an additive smoothed model term, that we are using up all or almost all the degrees of freedom and therefore can, in the extreme case, explain all the variance in the calibration dataset and at the same time have a limited or even reduced explanatory power in independent data. As we have restricted our proxies to those with annual resolution, we have eliminated many other, long, series and thereby minimized the risk of overfitting.

We test the set of 30 proxies for relevance by screening them in terms of how well they correlate to both the Northern Hemispheric mean temperature and the local temperatures; see Table 1. As noted by Juckes et al. (2007), choosing proxies by screening their correlation to temperature is not an entirely unproblematic approach. Even when screening proxies to temperature, there still exists a risk that poor proxies are included because of insufficient instrumental temperature data for the screening process. The uncertainty estimation becomes more problematic if temperature measurements have been used already in the data selection. We found that 15 of the proxies are both significantly correlated (standard two-sided *t* test, *p* = 0.01) to global and local temperatures, and these form the set we use in the subsequent analysis. We use a two-sided test since, a priori, we do not know if the proxy responds with an increase or a decrease when the temperature rises. Most temperature proxy records have temporal and seasonal limitations (Bradley 1999; Bradley and Jones 1992; Jones et al. 2009). The proxies have different optimal season response and few proxies are likely to fully reflect annual mean temperature (Ljungqvist 2010). The response to the proxies’ optimal season can be higher than the correlation to annual mean temperature used here.

Most of the series we have here are continuous, but the series for Iceberg Lake and Jämtland have gaps, and the Greenland composite and the southern Sierra Nevada end before 1990—the missing values, in the standardized series, are replaced with 0’s. This is an unacceptable practice were the reconstructions to be used as such, but they are calculated and shown in this paper only to help illustrate the nonlinearity testing method. To train the models, we use the Northern Hemisphere annual mean temperature data from the 5° × 5° gridded Hadley Centre and Climate Research Unit surface temperature, version 3 (HadCRUT 3v) dataset from Brohan et al. (2006).

## 4. Results

In this section we use the nonparametric additive method to reveal whether nonlinear relationships exist between 15 selected proxies and the mean NH temperature. We also perform a multiproxy temperature reconstruction based on a multilinear regression, a nonparametric model, and a semiparametric model based on observed values of the proxies from a.d. 800. It is well known that low-frequency variability is underestimated by the multilinear regression method (Christiansen et al. 2009), so we stress that the reconstruction is performed to have a reference against which the nonlinearity test can be viewed—the direct reconstruction is not presented here for other uses.

To calibrate our models, we use annual observations of the NH mean temperature data from the 5° × 5° gridded HadCRUT3v dataset (Brohan et al. 2006) and the proxies from 1850 through 1969—that is, 120 observations—which is the calibration period. The NH temperature is centered over the calibration period. The proxies are centered and normalized with the mean and standard deviation from the calibration period. This choice of calibration period ensures that there is no missing data in any of the proxies used; that is, there is no use of zero padding in the calibration of the models.

We first fit the classical multiple linear regression model to our data, also known as the direct global method, that is, a regression with the NH temperature as the response and selected proxies as predictors. The results are given in Table 2. Here the estimated coefficients and corresponding *p* value for the significance test is reported.

Next, we fit a nonparametric model, that is, the model from Eq. (3), to the proxies selected above. We use the software program R, and the function gam() from the package gam, with default settings; that is, we use the identity link function and smoothing splines with 4 degrees of freedom—see, for example, Green and Silverman (1994)—as the smoothing operator. An approximate *F* test is used to evaluate the significance of the nonlinearity, to determine whether including the nonlinear component of each smooth term in the model resulted in a significantly better fit than a linear relationship. Although the test statistics do not have exact or even asymptotic *F* distributions, Hastie and Tibshirani (1990) report that simulations show them to be useful approximations. The test results for nonlinear effects are shown in Table 2 and show two significantly nonlinear proxies: the tree-ring width records Indigirka and Yamal. Figure 1 shows the fitted functions for each proxy, and some of them seem to have a nonlinear relationship with the temperature; however, only two of them are found significantly nonlinear. The dashed curves in each plot are the upper and lower pointwise twice-standard-error curves. The vertical marks along the bottom of each graph illustrate the distribution of the values of the proxies. Further details regarding the nonlinear effects test are found in Hastie and Tibshirani (1990).

We compare these two models, that is, the linear model and the nonparametric model, by a comparison test; see Table 3. The model comparison test is an approximate *F* test, where the test statistic is

where RSS_{i} is the residual sum of squares for model *i* and DF_{i} is the approximate degrees of freedom for model *i*; see Hastie and Tibshirani (1990) for further details. We see that the reduction in RSS for the nonparametric model compared with the linear model is not significant.

Since we have found that 2 of the 15 proxies relate nonlinearly to temperature, we fit a semiparametric model, that is, the other 13 proxies are modeled linearly. We thus also reduce problems with overfitting. The results for the semiparametric model are seen in Table 2; that is, the same nonlinear effects test as above is performed for the two proxies. Again, the tests indicate that Indigirka and Yamal relate nonlinearity to temperature. Figure 2 shows the estimated linear and nonparametric curves.

We then compare the explanatory power of the linear and the semiparametric model in a comparison test as above (Table 3). We find that the reduction in residual sum of squares *is* significant, indicating a better fit for the semiparametric model over the linear model.

Finally, we calculate the in-sample correlation (i.e., during the training interval) of the three methods and the NH temperature; see Table 4. These nonparametric methods give better in-sample fit than the linear method.

### a. Robustness test

The results regarding proxy nonlinearity are interpretable in a relative sense—that is, the nonlinearity is present in a proxy with respect to the other proxies, and a natural question is whether the list of nonlinear proxies detected will be different if a different set of proxies is used. We test this in a very simple way, by removing one proxy at the time and recalculating, noting which proxies test positive for nonlinearity, then replacing the omitted proxy, and going on to the next one. Thus, 15 tests are performed. Table 5 shows the results. Evidently there are two proxies that overwhelmingly test positive for nonlinearity—Indigirka and Yamal—while a handful of others now and then also pass the test, depending on which proxy is omitted in the leave-one-out analysis. This gives a robust indication that the two proxies are special with respect to the other proxies as a group.

### b. Reconstruction uncertainties

The reconstructions presented here are for comparative purposes only, but it is of interest to see how specific they are given the proxies used and whether they are individually different once a picture of their uncertainties are at hand. To obtain this information, we perform bootstrapping with replacement on the observations. This is a well-known procedure that is occasionally coming into use in climate reconstruction circles (Till and Guiot 1990; Guiot et al. 2005). We performed 1000 resamples, performed the training and reconstruction each time, and compiled the reconstructions. In the end we had 1000 values for each year of the reconstruction period and found the lower 2.5% and the upper 97.5% percentiles and plot these, along with the reconstructions, in Fig. 3.

The reconstructions are from a.d. 800 to 1990, to minimize problems with missing proxy data. However, for these reconstructions, zero padding in the standardized proxy series is applied for missing observations in the Greenland composite (1974–90), the southern Sierra Nevada (1989–90), Jämtland (888–908), and the Columbia Icefield (800–949). As already mentioned, such a replacement of missing data may affect the reconstructions. To estimate such an effect, we have excluded the Columbia Icefield tree-ring record in a separate analysis, as this is the proxy with the most missing data. That is, first, we have used the other 14 proxies and the mean NH temperature to calibrate three models: a complete linear, a nonparametric, and a semiparametric model with Indigirka and Yamal as nonlinear. Second, reconstructions are performed with these three models. A comparison between these reconstructions (with 14 proxies) and the reconstructions in Fig. 3 (with 15 proxies) indicates that the zero padding has almost no effect on the linear and semiparametric reconstructions, and just a small effect on the nonparametric reconstruction, which leads us to believe that even though zero padding has been applied, the reconstructions in this case seem relatively robust.

We note that the intervals of uncertainty in Fig. 3 are quite wide—so wide that here is hardly any difference between the reconstructions—but particularly wide in the case of the fully nonlinear method. This is consistent with the possibility of overfitting in the training interval. Note the much larger uncertainty, relatively, for the period near a.d. 1000 in the nonlinear method.

## 5. Summary and discussion

We have introduced testing for nonlinearity in proxy-based temperature reconstructions and a nonlinear reconstruction method. An alternative to the approximate test for nonlinear effects we have used here is the generalized likelihood ratio (GLR) test developed by Fan and Jiang (2005). By using such a test, it is possible to test a parametric null hypothesis—for example, that the mean NH temperature is a linear model of the available proxies—against an alternative, not a linear model. We note that even though the asymptotic null distribution of the GLR statistic is available, in finite samples one would resort to conditional bootstrap methods to obtain the null distributions. We therefore defer application of this test for future research.

We show that these nonparametric methods do give better in-sample fit and further, we should expect that temperature reconstructions utilizing such methods also could be more correct. Of course, we should have in mind problems with overfitting when using nonlinear models.

Furthermore, autocorrelation in the residuals can cause problems during reconstructions, such as biased level or biased uncertainty estimates. This problem may still be present, and in future research we aim to control for this fact by introducing autoregressive error terms. Also, using a better selection procedure for relevant proxies for the nonparametric additive regression model would be of interest, but we note that inference in additive models is not well developed—see, for example, Fan and Jiang (2005).

Our analysis shows that relative to the present group of proxies, two are nonlinear relative to the NH mean temperature. The series are the tree-ring width records Indigirka and Yamal. One possible physical explanation for the nonlinear behavior of the Yamal tree-ring width record (Briffa 2000; Briffa et al. 2008) may be the observed change in growth form of the trees in this region during the latter part of the twentieth century caused by the warmer and thus more favorable growth conditions (Devi et al. 2008). Concerning the Indigirka tree-ring width record (Moberg et al. 2006), only the last 600 yr of the record have actually been published in a specific article that critically evaluates the record (under the name Yakutia) (Hughes et al. 1999). It is interesting to note, however, that the correlation between the two records only amounts to 0.77 over the period a.d. 1400–1993, which they both have in common.

That only 2 out of 15 proxies tested positive for nonlinearity is support for the general assumption that temperature proxies can be used in linear reconstruction attempts. Loehle (2009) discusses how nonlinearity can come about in the context of trees’ response functions; it is evident how the response can be not only nonlinear but monotonic and also bivalued, although then with oppositely signed response rates. Our own result for Yamal seems to be of the former kind (Devi et al. 2008), while the result for Indigirka hints at the latter (or an even more complex case). We note that data nonlinearities do not just arise due to direct causes affecting trees and their environment but can also be due to mundane things like data collection issues (Esper et al. 2012). None of the nonlinearities we have detected was found in other forms of temperature proxies.

To validate the models, we would like to introduce cross validation on data not used to train the model—a common and important procedure in the reconstruction field but difficult to carry out due to the scarcity of independent data. “Cross validation,” also known as “leave one out,” validation is a possibility but is complicated by the presence of autocorrelation in the data.

Finally, testing reconstruction methods on ensembles (Christiansen et al. 2009) of pseudodata allows one to find not only the scatter of the reconstructions but also the bias. The use of the bootstrap routine is, however, a very economical way of extracting information on the scatter of a reconstruction from limited data. Further research should also include comparing nonparametric reconstruction methods with the more commonly used reconstruction methods, such as the iterative regularized expectation maximization (RegEM) method (Dempster et al. 1977; Schneider 2001; Mann and Rutherford 2002; Smerdon and Kaplan 2007), and also examining their performance in an ensemble study as in Christiansen et al. (2009).

## Acknowledgments

We wish to express our gratitude to the scholars that provided us with proxy data that are not available from public databases: Christophe Corona, CNRS/Aix-Marseille Universit, and Ivan A. Kalugin, Siberian branch of the Russian Academy of Sciences. This work was supported by the Danish Climate Centre at the Danish Meteorological Institute. We thank the reviewers, who helped us improve this manuscript.

### APPENDIX

#### An Application of Nonparametric Regression to an Artificial Case

In this appendix we document the application of the additive nonparametric model to an artificial case, that is, we specify a nonlinear model for some of the proxies and then check whether the method detects this model. For simplicity, we assume an additive model consisting of just three standardized proxies, as follows:

where the three proxies *X*_{1}, *X*_{2}, and *X*_{3} are the Southern Colorado Plateau (as its coefficient is found significant in the linear model), Indigirka, and Yamal (as these are found significantly nonlinear), respectively. We further assume that the regression function for the Southern Colorado Plateau is linear (i.e., *m*_{1} = 0.06*X*_{1}), the regression function for Indigirka is nonlinear [*m*_{2} = 0.1 cos^{2}(*X*_{2})], and the regression function for Yamal is also nonlinear (). The functions are chosen to resemble the calibrated functions from the semiparametric model.

By using the calibration data from 1850 to 1969 for these proxies, we generate the artificial “temperature” *Y* from the model above. In addition, we add noise (*e*), which is from a normal distribution with zero mean and standard deviation equal to 0.15, to the artificial temperature. We then fit a nonparametric model to these data. The result from the estimation is shown in Table A1 and Fig. A1. The test correctly detects *m*_{2} and *m*_{3} to be nonlinear, and rejects *m*_{1} as nonlinear. In the figure the estimated functions are plotted as dashed curves, and the true underlying functions are plotted as solid curves. Clearly, the method is capable of detecting the true underlying functions, but not perfectly, mainly due to the added noise. In the bottom-right plot, the points are the artificial temperature *Y* with noise. Clearly, they deviate quite a bit from the solid curve, which is the artificial temperature *Y* without noise. The dashed curve is the reconstruction obtained from the calibrated nonparametric model. The reconstruction is quite close to the true curve, and we conclude that the method works.

## REFERENCES

_{2}on a millennium-long tree-ring carbon isotope record