## 1. Introduction

Christiansen (2011) introduced a climate reconstruction method called LOC (local method), which is designed to avoid underestimation of low-frequency variability. The method is based on three steps:

- The proxies are screened so that proxies without significant relation to the local temperature are excluded.
- The local temperatures at the locations of the remaining proxies are reconstructed using one-dimensional linear regression with the local temperature as the independent variable.
- The reconstructed local temperatures are averaged (with weights equal to the cosine of the latitudes) to get the extratropical Northern Hemisphere (NH) mean.

The LOC method was tested in a series of realistic pseudoproxy experiments and it was confirmed that low-frequency variability such as preindustrial level, trends, and long-lasting warm periods were indeed well reconstructed. However, the preservation of low-frequency variability comes with the cost of an increased high-frequency variability.

In their comment, Tingley and Li (2012) acknowledge both the importance of relating proxies to local temperatures and the importance of choosing the proxy as the dependent variable. However, they raise questions about the behavior of the regression model and claim that Christiansen (2011) did not consider the uncertainty of the LOC method. They also strongly advocate for a hierarchical Bayesian approach for estimation of uncertainty. While it is true that the chosen regression model (step 2 above) is ill behaved for small slopes, it is for the present purpose superior when proxies without correlations to the local temperatures have been eliminated (step 1 above). In Christiansen (2011) the uncertainty was not considered by “propagation of errors.” Instead the skill of the reconstructions was evaluated in a series of ensemble pseudoproxy experiments; a method that in this context is better than the suggested Bayesian approach as it can also address the problem of biases introduced by model choices. Christiansen and Ljungqvist (2011) used the pseudoproxy approach to provide confidence intervals for the reconstructions. We will deal with these points in more detail below.

## 2. Variance of the reconstruction

*P*denotes a proxy and

*T*denotes the corresponding local temperature. We assume that both have been centered. The method referred to in Christiansen (2011) as indirect regression and used in step 2 of LOC,and the method Christiansen (2011) referred to as direct regression,are often called classical calibration and inverse calibration, respectively. Here

*x*⊥

*y*means that

*x*and

*y*are independent. Predictions are given by the estimators

*T*=

^{I}*P*/

*λ*and

^{I}*T*=

^{D}*λ*, respectively. Thus, classical calibration (indirect regression) has

^{D}P*T*as the independent variable, while inverse calibration (direct regression) uses

*P*as the independent variable.

Tingley and Li (2012) worry about the behavior of *T ^{I}* when

*λ*is close to zero. This is a valid consideration that has already been treated comprehensively in the older literature about calibration. The relative merits of classical and inverse regression have been discussed in Krutchkoff (1969), Williams (1969), Shukla (1972), and Brown (1982) and have more recently been reviewed by Osborne (1991).

^{I}*ξ*

_{eq}is noise with variance

*σ*

^{2}. The slope is found by the method of least squares. It then follows from the discussion in the literature that for classical calibration the prediction estimator,

*T*, has infinite mean square error (mse) if all slopes,

^{I}*λ*, are considered. On the other hand the inverse estimator,

*T*, is always biased even for an infinite sample size if the predictor,

^{D}*P*, is different from zero. If only slopes significantly different from zero are considered (Shukla and Datta 1985), the classical estimator,

*T*(now known as the truncated classical estimator), has a lower mse than the inverse estimator,

^{I}*T*, when the predictor is not close to zero. The reason is that for the inverse estimator,

^{D}*T*, the bias grows fast when the predictor moves away from zero. Shukla (1972) gives expansions to first order in 1/

^{D}*n*(where

*n*is the sample size) of the mse, bias, and variance for both the truncated classical estimator and the inverse estimator when

*σ*/

*λ*is small. More fundamentally, it can be shown that an unbiased estimator with finite variance does not exist (Williams 1969).

As shown previously, and most systematic in Christiansen et al. (2009), the main problems of many climate construction methods are systematic biases (see also the next session). For the reconstruction problem it therefore is sensible to avoid the bias by using truncated classical calibration. The contribution from the variance to the mse can be dampened by spatial and temporal averaging as discussed in Christiansen (2011) (and also in the older literature under the subject of repeated measurements). It is the truncated classical calibration that is used in LOC as described above. Tingley and Li (2012) suggest a Bayesian approach where the problems for small *λ* are alleviated by adopting an informative prior. This is indeed an alternative that fits well into a more systematic approach to the reconstruction problem. However, the truncated classical estimator is well studied and its free parameter (the threshold of the screening in step 1) is easily interpreted. Without serial correlations in the proxies and local temperatures the screening can be done with a *t* test. When serial correlations are present (reducing the number of degrees of freedom) a Monte Carlo method (Christiansen 2000; Christiansen and Ljungqvist 2011) can be applied. The threshold of the screening (the *p* value) must be chosen small enough to remove proxies with only spurious relations to the local temperature. In Christiansen and Ljungqvist (2011) it is shown that the reconstructions are not very sensitive to the choice of threshold.

We illustrate the differences between classical and inverse regression by a simple example. This example highlights the adverse effects of using a model with bias. It also demonstrates why indirect calibration may predict low-frequency variability well, while high-frequency variability is exaggerated.

We assume that the physical process generates the data according to Eq. (3). We draw a sample of 100 pairs (*P*, *T*) for calibration and a single pair (*P*_{new}, *T*_{new}) where *T*_{new} is to be predicted from *P*_{new}. We assume that the *T* is Gaussian with unit variance, *λ* = 1, and *ξ*_{eq} is Gaussian with variance *P* and *T* around 0.4 (Christiansen and Ljungqvist 2011). Keeping *P*_{new} fixed and using the equations above we calculate the estimators *T ^{D}* and

*T*, which can be compared to the known target

^{I}*T*

_{new}. Repeating this 1000 times we can calculate the mse and the bias for the predictions for the chosen value of

*P*

_{new}.

We repeat the calculation of the mse and the bias for different values of *P*_{new} in the interval [0, 5]. Figure 1 shows the mse and the bias as function of *P*_{new}. For the classical estimator the mse is large but increases only slowly with *P*_{new}, while for the inverse estimator the mse is small for small *P*_{new}, but increases fast as *P*_{new} increases (top-left panel). For *P*_{new} larger than about 2.8 the mse for inverse calibration (direct regression) exceeds that for classical calibration (indirect regression). For the inverse estimator, *T ^{D}*, the mse is dominated by the bias, while for the classical estimator,

*T*, the mse is dominated by the variance (top-right panel). This is an important difference that suggests that the mse for the classical calibration (indirect regression) can be reduced by averaging, while the mse for the inverse calibration (direct regression) cannot. This is demonstrated in the bottom panels where

^{I}*T*

_{new}is predicted as the average of 10 repeated measurements. We now find that the mse for the classical estimator,

*T*, has decreased drastically compared to the unaveraged case, while the mse for the inverse estimator,

^{I}*T*, has not. It is the small bias and the large mse that allows indirect regression (as in the LOC method) to correctly estimate low-frequency variability, while high-frequency variability is overestimated.

^{D}## 3. Uncertainty

In their comment Tingley and Li (2012) claim that the uncertainty was not considered in Christiansen (2011). However, most of Christiansen (2011) was devoted to the study of the skills and uncertainties of the LOC method.

The skills were estimated by using an ensemble pseudoproxy approach based on a long experiment with a forced climate model. This approach allows the comparison of the reconstructions to the known targets. The ensemble pseudoproxy approach generates a distribution of the differences between the reconstructed diagnostic and that of the target. This distribution includes the effects of the stochasticity in the reconstruction process. Summarizing statistics such as bias, variance, and mse can then be calculated from the distribution [Figs. 4, 6, 8, and 12 in Christiansen (2011)]. In Christiansen and Ljungqvist (2011) the pseudoproxy approach was used to provide confidence intervals for the reconstructions.

As mentioned in the previous section the main problems of many climate construction methods are systematic biases (Christiansen et al. 2009). These biases are found in diagnostics such as preindustrial level, trends, and amplitude of low-frequency variability [as in the strong previous warming experiments of Christiansen et al. (2010a) and Christiansen (2011)]. For such diagnostics the bias is often more important than the variance—sometimes the target lies completely outside the whole distribution of the modeled diagnostics.

Therefore, the possibility to assess the bias is of utmost importance. This possibility exists for the ensemble pseudoproxy approach but not for the Bayesian approach where the target is unknown. It is because the bias has been shown to be a major problem for climate reconstructions that the ensemble pseudoproxy approach is preferable.

The hierarchical Bayesian approach is systematic and can span the different steps in the reconstruction process. However, so can the ensemble pseudoproxy approach. In Christiansen (2011) the pseudoproxies are generated with the same linear relation to the local temperatures as the original proxies. The whole reconstruction process is then repeated in all detail including the screening for significant proxies, the parameter estimation, and the spatial averaging of the reconstructed local temperatures. Other steps can also be incorporated; for example, Christiansen and Ljungqvist (2011) give an example where the in-filling of missing values in the observed temperatures is included.

The bias in the reconstructions can originate from using a wrong model. Two aspects are probably most important here—the regression model and the covariance model used for calculating the spatial mean. Possible regression models are direct regression, indirect regression, errors in variables, multivariate regression, etc. Potential covariance models include the simple mean used in LOC, empirical models based on a truncated set of empirical orthogonal functions, and theoretical covariance functions such as those discussed in Tingley et al. (2012). However, spatial covariance patterns depend strongly on the time scale considered and it may be impossible to capture the low-frequency structure from annual values. An extreme case is found in sea level reconstructions (Christiansen et al. 2010b) where the low-frequency variability (trend) would be extremely underestimated if the covariance structure was based on 10–20 years of data. In this case an additional constant pattern is needed to capture the trend. In fact, Christiansen et al. (2010b) found that the simple mean in many ways outperform the sea level reconstructions based on empirical orthogonal functions. This was the argument for using the simple mean in LOC (Christiansen 2011). The fact that some local reconstructions are more reliable than others can easily be incorporated by using weights proportional to the correlations between the proxy and local temperature (Christiansen and Ljungqvist 2011).

We illustrate the limitations of the Bayesian approach by a simple example. We draw a sample of 100 pairs (*P*, *T*) for calibration from Eq. (3) and we wish to predict the *T*_{new} when *P*_{new} = 3. Other parameters are as in the previous section. For the two models in Eqs. (1) and (2), we get the marginal conditional probability densities of *T*_{new} after integrating *λ* and *λ* and the variances. It is obvious (Fig. 2) that the model based on direct regression (inverse calibration) has a large bias and that the probability distribution hardly covers the true value. The model based on indirect regression (classical calibration) is centered on the true value but has a larger variance. This illustrates that the ability to propagate the errors in the Bayesian approach does not include the bias resulting from a wrong model choice.

## 4. Further considerations

In their comment Tingley and Li (2012) relate the LOC method to the Bayesian schemes used by Tingley and Huybers (2010) and Li et al. (2010). Such comparisons are always interesting and may provide insight into the behavior of the models.

Tingley and Li (2012) show that LOC is a special case of the method in Li et al. (2010) with the caveat that the latter calibrates against NH mean temperatures rather than local temperatures. They also mention that the advantage of reconstructing local temperatures instead of directly targeting the NH mean temperature is unclear. This is, however, an important difference. For a simple illustration of the differences let the average temperature *m* stations *T _{i}*,

*i*= 1 …

*m*. Let the local temperatures depend on the average temperature as

*P*=

_{i}*c*+

_{i}T_{i}*ξ*,

_{i}*ξ*⊥

_{i}*T*. LOC then reconstructs

_{i}*a*change—reflecting a change in the covariance structure of the temperature—the alternative method will distort the weightings of the local temperatures. Such distortion is not found for LOC, but can be easily deduced also for other methods (e.g., the indirect global method of Christiansen et al. 2009). Furthermore, Christiansen and Ljungqvist (2011) show that correlations between the NH mean temperature and the proxies are much more dominated by the recent trend than correlations between the local temperatures and the proxies.

_{i}Both Tingley and Huybers (2010) and Li et al. (2010) used a model with noise on both the dependent and independent variables. This errors-in-variables model is unidentifiable under normality as discussed in Christiansen (2011) and also mentioned in Tingley and Li (2012). This means that the likelihood function will contain ridges of constant probability and that a maximum likelihood estimator cannot be uniquely determined. For the Bayesian approach this may mean that the priors will have considerable influence and that convergence of Monte Carlo methods may be delayed. The solution to this problem should be based on deliberate assumptions and not hidden somewhere in a complicated, hierarchical model with a huge number of parameters. The solution chosen by LOC is to ignore the noise on the temperature for reasons discussed in appendix B of Christiansen (2011). One reason is that the noise

## 5. Conclusions

When using indirect regression (classical calibration), as in step 2 in LOC, there is a risk of divergence for small slopes. This risk is avoided by including only proxies with correlations to the local temperatures that are significantly different from zero (step 1). Compared to direct regression (inverse calibration), indirect regression (classical calibration) has smaller mse when the predictor is not close to zero. The mse for indirect regression (classical calibration) is dominated by variance, while the mse for direct regression (inverse calibration) is dominated by bias. This means that the mse will decrease for indirect regression (classical calibration), but not for direct regression (inverse calibration) when the estimator is averaged or temporally smoothed.

Christiansen (2011) considered the skills and uncertainties of LOC-based reconstructions. This was accomplished by an ensemble pseudoproxy approach based on a climate model experiment. In the pseudoproxy approach the target is known and mse, variance, and bias can be estimated. The latter is of utmost importance as previous studies have shown that systematic biases are the main problem of many climate construction methods. The LOC method was found to reconstruct the low-frequency variability well with very little bias. In Christiansen and Ljungqvist (2011) the pseudoproxy approach provided confidence intervals for real-world reconstructions.

The pseudoproxy studies in Christiansen et al. (2009, 2010a), Christiansen (2011), and Christiansen and Ljungqvist (2011) included the effects of the screening of proxies, the parameter estimation, and the spatial averaging of the reconstructed local temperatures. Other sources of stochasticity can be included as well; for example, Christiansen and Ljungqvist (2011) include the effect of in-filling of missing values in the observed temperatures. Note, that the noise added to local temperatures to get the pseudoproxies is not white, but has the same power spectrum as the true residuals (Christiansen et al. 2009).

Tingley and Li have advocated for a hierarchical Bayesian approach in a series of papers. The Bayesian approach is systematic and flexible and allows for the “propagation of errors.” However, a main point here is that the Bayesian approach will not inform of us about the bias but only about the variance. This is because the Bayesian method is conditioned on the chosen model(s) and only gives us the spread about the model’s mean response. The ensemble pseudoproxy method gives us both the variance and the bias and is therefore the best method to estimate the errors for the reconstruction problem. The pseudoproxy method is not free of assumptions, though, the main assumption being that the climate model experiment and the pseudoproxies are realistic. Pseudoproxies are usually generated by degrading the local temperature, while real proxies might also respond to precipitation and therefore, through teleconnections, to temperatures in other regions. The influence of such complications on the conclusions drawn from pseudoproxy experiments needs to be explored.

This work was supported by the Danish Climate Centre at the Danish Meteorological Institute. I would like to thank Matthew Schofield, University of Otago, for directing me to the older literature about calibration.

## REFERENCES

Brown, P. J., 1982: Multivariate calibration.

,*J. Roy. Stat. Soc.***44B**, 287–321.Christiansen, B., 2000: A model study of the dynamical connection between the Arctic Oscillation and stratospheric vacillations.

,*J. Geophys. Res.***105**, 29 461–29 474.Christiansen, B., 2011: Reconstructing the NH mean temperature: Can underestimation of trends and variability be avoided?

,*J. Climate***24**, 674–692.Christiansen, B., , and F. C. Ljungqvist, 2011: Reconstruction of the extratropical NH mean temperature over the last millennium with a method that preserves low-frequency variability.

,*J. Climate***24**, 6013–6034.Christiansen, B., , T. Schmith, , and P. Thejll, 2009: A surrogate ensemble study of climate reconstruction methods: Stochasticity and robustness.

,*J. Climate***22**, 951–976.Christiansen, B., , T. Schmith, , and P. Thejll, 2010a: Reply.

,*J. Climate***23**, 2839–2844.Christiansen, B., , T. Schmith, , and P. Thejll, 2010b: A surrogate ensemble study of sea level reconstructions.

,*J. Climate***23**, 4306–4326.Krutchkoff, R. G., 1969: Classical and inverse regression methods of calibration in extrapolation.

,*Technometrics***11**, 605–608.Li, B., , W. D. Nychka, , and C. M. Ammann, 2010: The value of multiproxy reconstruction of past climate.

,*J. Amer. Stat. Assoc.***105**, 883–895.Osborne, C., 1991: Statistical calibration: A review.

,*Int. Stat. Rev.***59**, 309–336.Shukla, G. K., 1972: On the problem of calibration.

,*Technometrics***14**, 547–553.Shukla, G. K., , and P. Datta, 1985: Comparison of the inverse estimator with the classical estimator subject to a preliminary test in linear calibration.

,*J. Stat. Plann. Inference***12**, 93–102.Tingley, M. P., , and P. Huybers, 2010: A Bayesian algorithm for reconstructing climate anomalies in space and time. Part I: Development and applications to paleoclimate reconstruction problems.

,*J. Climate***23**, 2759–2781.Tingley, M. P., , and B. Li, 2012: Comments on “Reconstructing the NH mean temperature: Can underestimation of trends and variability be avoided?”

,*J. Climate***25**, 3441–3446.Tingley, M. P., , P. F. Craigmile, , M. Haran, , B. Li, , E. Mannshardt-Shamseldin, , and B. Rajaratnam, 2012: Piecing together the past: Statistical insights into paleoclimatic reconstructions.

,*Quat. Sci. Rev.***35**, 1–22.Williams, E. J., 1969: A note on regression methods in calibration.

,*Technometrics***11**, 189–192.