## Abstract

Two common approaches for estimating a linear trend are 1) simple linear regression and 2) the epoch difference with possibly unequal epoch lengths. The epoch difference estimator for epochs of length *M* is defined as the difference between the average value over the last *M* time steps and the average value over the first *M* time steps divided by *N* − *M*, where *N* is the length of the time series. Both simple linear regression and the epoch difference are unbiased estimators for the trend; however, it is demonstrated that the variance of the linear regression estimator is always smaller than the variance of the epoch difference estimator for first-order autoregressive [AR(1)] time series with lag-1 autocorrelations less than about 0.85. It is further shown that under most circumstances if the epoch difference estimator is applied, the optimal epoch lengths are equal and approximately one-third the length of the time series. Additional results are given for the optimal epoch length at one end when the epoch length at the other end is constrained.

## 1. Introduction

Quantifying the change in a variable over time is one of the most common calculations in climate science. Two standard techniques are 1) simple linear regression and 2) the epoch difference—that is, the difference between the average value over the end of the time series and the average value over the beginning of the time series. While simple linear regression directly provides an estimate of the linear trend over time, the epoch difference provides the net change between two periods. However, under the assumption that the underlying trend is in fact linear, one can divide the epoch difference by a characteristic time scale (discussed below) to also provide an estimate of the linear trend.

The summary for policymakers of the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5) quantified the change in global surface temperature over the historical record using both linear regression and an epoch difference (IPCC 2013). Ding and Steig (2013) used both linear regression and epoch differences to estimate trends in observed surface temperature at stations in Antarctica. Deser et al. (2012) explicitly took both approaches to quantify their model’s climate response over the 2005–60 period. They used 10-yr epoch differences and linear regression and commented that both methods produced similar results. However, it is not readily apparent whether both methods are necessary or whether the two methods will always yield similar results when the underlying trends are indeed linear.

In some cases, even if the trend is expected to be linear, simple linear regression is not an option. For example, phase 3 of the Coupled Model Intercomparison Project (CMIP3) requested that daily data be archived for specific 20-yr periods over the twenty-first century (e.g., 2046–65 and 2081–2100) (Meehl et al. 2007), and, thus, large gaps in the time series made linear regression impossible. In such situations, epoch differences are an obvious alternative. But what length of epoch should be used? One finds a wide range of epoch lengths in the literature for assessing changes in climate over the twentieth and twenty-first centuries—for example, 5 years (e.g., IPCC 2007), 10 years (e.g., Deser et al. 2012), 20 years (e.g., Collins et al. 2013), and 40 years (e.g., Kidston and Gerber 2010). Furthermore, in some studies, the epochs cover only the ends of the time series (e.g., Deser et al. 2012), while in others the epochs split the time series in half (e.g., Ding and Steig 2013).

Here, we pose two questions for the case of a linear trend:

1) Which method is better for trend estimation: simple linear regression or the epoch difference?

2) What is the optimal length of an epoch?

We demonstrate that under most circumstances, simple linear regression is preferred to an epoch difference in estimating a linear trend. We further demonstrate that if an epoch difference estimator is used, the optimal epoch length is approximately one-third the length of the time series when the memory is small.

Climate trends are often described by their linear component, and many variables (e.g., temperature and precipitation) exhibit linear trends in response to climate change over the twenty-first century in model simulations (e.g., Thompson et al. 2015). In addition, simple linear regression is relatively straightforward and easily replicated. For example, Hartmann et al. (2013) explicitly use linear regression to quantify trends over the historical record because of its simplicity and frequent use in climate research. However, there is no a priori reason to expect that climate variables will exhibit linear trends, and when they do not, simple linear regression may not be the most appropriate method (e.g., Seidel and Lanzante 2004). In this case, epoch differences are still an appropriate method for quantifying the net change over two periods of time. Thus, we stress that the comparisons made here between simple linear regression and epoch differences are applicable only in the specific case that the underlying trends are linear.

## 2. Problem setup

We begin with a first-order autoregressive [AR(1); red noise] time series *x*_{n} of length *N* (Box et al. 2008):

$$x_n = \alpha x_{n-1} + \beta \epsilon_n, \tag{1}$$

where *n* = 1, 2, …, *N*, *x*_{0} = 0 by assumption, the coefficients *α* and *β* satisfy *α*^{2} + *β*^{2} = 1, and the *ϵ*_{n} are independent, identically distributed Gaussian noise with mean 0 and variance *σ*^{2}. The coefficient *α* is between 0 and 1 and captures the memory in the time series such that *α* = 0 denotes a time series with no memory from one time step to the next. The value of *α* can be estimated as the lag-1 autocorrelation of the time series (e.g., Box et al. 2008).

Next, we impose a trend upon *x*_{n} such that

$$y_n = x_n + \gamma n, \tag{2}$$

where *γ* is the slope of the imposed linear trend [see (A1) in appendix A for *y*_{n} written as a function of *y* alone].

The question is, how should we estimate *γ*? Here, we explore two methods. The first method is simple linear regression; we denote the regression estimator of the trend (i.e., the slope) as $\hat{\gamma}_r$. The second method is the epoch difference of length *M* converted into a trend, that is, the difference between the average value over the last *M* time steps and the average value over the first *M* time steps divided by *N* − *M* (see Fig. 1). We denote the epoch difference estimator of the trend as $\hat{\gamma}_e$.
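The two estimators can be sketched numerically as follows. This is a minimal illustration rather than code from the study: it assumes the AR(1) recursion *x*_{n} = *α* *x*_{n−1} + *β* *ϵ*_{n} with *x*_{0} = 0 and a trend term *γn*, and the function names (`simulate_series`, `regression_trend`, `epoch_trend`) are hypothetical.

```python
import numpy as np

def simulate_series(N, alpha, gamma, sigma, rng):
    """Red noise x_n = alpha*x_{n-1} + beta*eps_n (x_0 = 0) plus a trend gamma*n."""
    beta = np.sqrt(1.0 - alpha**2)          # enforces alpha^2 + beta^2 = 1
    eps = rng.normal(0.0, sigma, size=N)    # i.i.d. Gaussian noise
    x = np.empty(N)
    prev = 0.0                              # x_0 = 0 by assumption
    for i in range(N):
        prev = alpha * prev + beta * eps[i]
        x[i] = prev
    n = np.arange(1, N + 1)
    return x + gamma * n

def regression_trend(y):
    """Least-squares slope of y against the time index n = 1..N."""
    n = np.arange(1, len(y) + 1)
    slope, _intercept = np.polyfit(n, y, 1)
    return slope

def epoch_trend(y, M):
    """Mean of the last M values minus mean of the first M values, over N - M."""
    N = len(y)
    return (y[-M:].mean() - y[:M].mean()) / (N - M)

# Illustrative parameter values only (not taken from any dataset).
rng = np.random.default_rng(0)
y = simulate_series(N=100, alpha=0.3, gamma=0.008, sigma=0.1, rng=rng)
print(regression_trend(y), epoch_trend(y, M=10))
```

Both functions return the slope per time step, so for a noise-free line they agree exactly; they differ only in how the noise projects onto the estimate.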

An example of these two methods is given in Fig. 1a, where we plot the annual mean land surface–air temperature anomalies (base period 1951–80) from GISTEMP (Hansen et al. 2010; downloaded from http://data.giss.nasa.gov/gistemp/) between 1913 and 2012. A similar linear fit and epoch difference was calculated in the AR5 summary for policymakers, although the epochs were of different lengths and the epoch difference was not converted into a trend (IPCC 2013). The two methods of estimating the trend over the 100 years yield slightly different answers; the 10-yr epoch difference estimator yields $\hat{\gamma}_e$ = 0.1°C decade^{−1} and the linear regression estimator yields $\hat{\gamma}_r$ = 0.08°C decade^{−1}.

To determine which trend estimator is optimal, we define the “optimal estimator” as the estimator with the smallest variance. As an example, Fig. 1b shows the distribution of estimated trends calculated from 100 000 synthetic white noise time series (*α* = 0) of the form of (2) with yearly variance *σ*^{2} = 0.01°C^{2} (estimated from the detrended temperatures over 1970–2012 in Fig. 1a) and linear trend *γ* = 0.08°C decade^{−1}. The different colors denote the different estimation methods. The width of the distribution of trends estimated using simple linear regression (dashed black) is smaller than the widths of all of the epoch difference estimators, with *M* = 5 yr exhibiting the largest spread and *M* = 30 yr the smallest spread. In appendixes A, B, and C, we analytically derive the variance of these distributions (i.e., the variances of the regression estimator $\hat{\gamma}_r$ and the epoch difference estimator $\hat{\gamma}_e$); however, we present only the final results here for brevity.
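A Monte Carlo experiment of this kind can be sketched as below. It is an illustration under stated assumptions, not the code behind Fig. 1b: it uses white noise with *σ*^{2} = 0.01 and a trend of 0.008 per year (0.08 per decade) over *N* = 100 years as in the text, but only 20 000 synthetic series to keep the run short, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, gamma, sigma = 100, 0.008, 0.1   # trend per year; sigma**2 = 0.01
n_series = 20_000                   # assumption: fewer series than in the text

n = np.arange(1, N + 1)
# Each row is one synthetic white-noise series with the imposed trend.
y = gamma * n + rng.normal(0.0, sigma, size=(n_series, N))

# The least-squares slope is a fixed weighted sum of the data.
w_reg = (n - n.mean()) / np.sum((n - n.mean()) ** 2)
reg_trends = y @ w_reg
var_reg = reg_trends.var()

def epoch_trends(y, M):
    """Epoch difference trend for every row of y, equal epochs of length M."""
    return (y[:, -M:].mean(axis=1) - y[:, :M].mean(axis=1)) / (y.shape[1] - M)

# Spread of the epoch difference estimator for several epoch lengths.
spread = {M: epoch_trends(y, M).var() for M in (5, 10, 30)}

print("regression variance:", var_reg)
for M, v in spread.items():
    print(f"epoch M={M}: variance ratio to regression = {v / var_reg:.2f}")
```

Because every estimator is evaluated on the same synthetic series, the ordering of the sample variances is stable even with a modest number of realizations.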

## 3. Results

Figure 2 shows the variance of the epoch difference estimator $\hat{\gamma}_e$ divided by the variance of the regression estimator $\hat{\gamma}_r$. This ratio is a function of time series memory *α* and the epoch difference length *M* for a time series of length *N* = 100. The most striking feature of this figure is that the variance of $\hat{\gamma}_e$ is greater than the variance of $\hat{\gamma}_r$ for nearly all combinations of *α* and *M* (warm colors). In some cases (small *M*), the variance of the epoch difference estimator is more than 4 times that of the linear regression estimator. Only for time series with large memory (*α* ≳ 0.85) does the linear regression estimator variance exceed that of the epoch difference. The exact cutoff at which the linear regression estimator variance exceeds that of the epoch difference is a function of *N*, and we find that as long as *N* ≥ 10 and *α* ≤ 0.5, simple linear regression always exhibits the smaller variance (not shown).

Further inspection of Fig. 2 reveals that for *α* ≲ 0.85 there is a minimum in the ratio of estimator variances when the epoch length is around 30 time steps. This minimum is more easily seen in Fig. 3 (orange line), where we plot a cross section of Fig. 2 along *α* = 0 (the special case of no memory). Figure 3 also shows results for time series of length *N* = 25, 50, and 75. In all cases, the ratio of variances exhibits a minimum; that is, there is an optimal *M*, denoted *M**, where the variance of the epoch difference estimator is minimized.

In the special case of no memory (*α* = 0), the variance of the epoch difference estimator $\hat{\gamma}_e$ is given by

$$\mathrm{Var}(\hat{\gamma}_e) = \frac{2\sigma^2}{M(N-M)^2} \tag{3}$$

[see (C19)]. Differentiating (3) with respect to *M* and setting the resulting expression equal to 0 yields a remarkably simple form for *M**:

$$M^* = \frac{N}{3}. \tag{4}$$

Thus, in the case of no memory, the optimal epoch length is one-third the length of the time series. This general result is shown in Fig. 3.
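The differentiation step can be written out explicitly. The sketch below assumes the no-memory variance has the form 2*σ*^{2}/[*M*(*N* − *M*)^{2}], which follows from two independent epoch means, each with variance *σ*^{2}/*M*, divided by *N* − *M*:

```latex
\frac{d}{dM}\left[\frac{2\sigma^2}{M(N-M)^2}\right]
  = -2\sigma^2\,\frac{(N-M)^2 - 2M(N-M)}{M^2 (N-M)^4}
  = -2\sigma^2\,\frac{N-3M}{M^2 (N-M)^3}
```

For 0 < *M* < *N* the derivative vanishes only at *M* = *N*/3; it is negative for *M* < *N*/3 and positive for *M* > *N*/3, so this stationary point is a minimum of the variance.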

When *α* ≠ 0, one can use the more general equation for the variance of the epoch difference estimator $\hat{\gamma}_e$ [see (C16) in appendix C] to determine the optimal epoch length. Results of this calculation are shown in Fig. 4, where the optimal *M* is plotted as a function of time series length *N*. The different colors denote different values of *α*. For small memories (*α* ≤ 0.5; purple and blue dots) the optimal epoch length remains near *N*/3; however, when *α* increases to 0.7 (green dots) the optimal epoch length decreases for short time series. In the rather extreme case of *α* = 0.9 (orange dots), the optimal epoch length is *M** = 1 for all *N*. In this limit, the autocorrelation is so large that adding additional data points to the epoch difference does not help the trend estimation.
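A numerical search for the optimal epoch length with memory can be sketched as follows. This is an assumption-laden illustration, not the calculation behind Fig. 4: it builds the noise covariance directly from the recursion *x*_{n} = *α* *x*_{n−1} + *β* *ϵ*_{n} with *x*_{0} = 0, treats the epoch difference as a weighted sum of the data, and searches integer epoch lengths up to *N*/2. The function names are hypothetical.

```python
import numpy as np

def ar1_covariance(N, alpha, sigma=1.0):
    """Covariance of x_1..x_N for x_n = alpha*x_{n-1} + beta*eps_n, x_0 = 0."""
    beta = np.sqrt(1.0 - alpha**2)
    lag = np.eye(N, k=-1)                        # ones on the first lower diagonal
    A = np.linalg.inv(np.eye(N) - alpha * lag)   # x = beta * A @ eps
    return (beta * sigma) ** 2 * A @ A.T

def epoch_variance(N, M, cov):
    """Variance of the equal-epoch difference estimator for noise covariance cov."""
    w = np.zeros(N)
    w[:M] = -1.0 / (M * (N - M))    # weights on the first epoch
    w[-M:] = 1.0 / (M * (N - M))    # weights on the last epoch
    return w @ cov @ w

def optimal_epoch_length(N, alpha):
    """Integer M in 1..N//2 that minimizes the estimator variance."""
    cov = ar1_covariance(N, alpha)
    lengths = np.arange(1, N // 2 + 1)
    variances = [epoch_variance(N, M, cov) for M in lengths]
    return int(lengths[np.argmin(variances)])

print(optimal_epoch_length(100, 0.0))   # 33 for white noise, i.e. about N/3
for alpha in (0.5, 0.7, 0.9):
    print(alpha, optimal_epoch_length(100, alpha))
```

Only the white-noise value is annotated above because it follows from the closed-form result; the values for *α* > 0 depend on the covariance assumptions made here and should be compared against Fig. 4 rather than taken as a reproduction of it.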

Returning to Fig. 3, it is also apparent that the ratio of variances approaches a lower limit of 1.125 in the case of *α* = 0. Even when the optimal epoch length is chosen, the variance of the epoch difference estimator is about 12.5% larger than that of the linear regression estimator. This limit is straightforward to derive when *α* = 0 using (3) and the equation for the variance of the regression estimator $\hat{\gamma}_r$ [see (B17) in appendix B]:

$$\mathrm{Var}(\hat{\gamma}_r) = \frac{12\sigma^2}{N(N^2-1)}. \tag{5}$$

Taking the ratio of (3) to (5) and setting *M* = *N*/3 leads to a minimum ratio of 1.125 in the limit of large *N*.
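The limit can be checked in a few lines. The sketch assumes the no-memory variances 2*σ*^{2}/[*M*(*N* − *M*)^{2}] for the epoch difference estimator and 12*σ*^{2}/[*N*(*N*^{2} − 1)] for the regression estimator, the latter being the standard white-noise result for a least-squares slope on equally spaced times:

```latex
\frac{\mathrm{Var}(\hat{\gamma}_e)}{\mathrm{Var}(\hat{\gamma}_r)}
  = \frac{2\sigma^2}{M(N-M)^2}\cdot\frac{N(N^2-1)}{12\sigma^2}
  = \frac{N(N^2-1)}{6M(N-M)^2}
  \;\xrightarrow{\;M = N/3\;}\;
  \frac{9}{8}\left(1 - \frac{1}{N^2}\right)
```

The factor 9/8 = 1.125 is approached from below as *N* grows; for *N* = 100 the ratio at *M* = *N*/3 is already within 0.01% of that value.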

When performing an epoch difference, one is not limited to defining the two epochs to be the same length. For example, the AR5 summary for policymakers analyzes changes in observed surface air temperature between 1850 and 2012 using one epoch of length 51 years (1850–1900) and the other of length 10 years (2003–12) (IPCC 2013). The ratio of the variance of the epoch difference estimator, computed using epochs of different lengths *M*_{1} and *M*_{2}, to the variance of the simple linear regression estimator is shown in Fig. 5a (for *N* = 100 and *α* = 0). As one might suspect from Fig. 4, the minimum variance is found when *M*_{1} = *M*_{2} = *N*/3 ≈ 33 [(C20)]. Also evident is that a larger *M*_{1} does not compensate for a smaller *M*_{2}. Choosing *M*_{1} = 45 and *M*_{2} = 21, for example, does not produce an estimator variance as small as that when *M*_{1} = *M*_{2} = 33.

Finally, Fig. 5a suggests that if one has certain restrictions on the length of *M*_{1}, there is an optimal length of the second epoch *M*_{2}. The optimal *M*_{2} given *M*_{1} (denoted as $M_2^*$) is given by (C21):

$$M_2^* = \frac{-3M_1 + \sqrt{M_1^2 + 16 M_1 N}}{4}. \tag{6}$$

Solutions to this equation are plotted in Fig. 5b, and it is clear that using equal-length epochs (dashed line) is not the optimal solution unless *M*_{1} = *M*_{2} = 33. For example, if *M*_{1} = 10, then setting *M*_{2} = 24 will minimize the variance of the epoch difference estimator of the trend. The general relationship is that the optimal second epoch length $M_2^*$ increases nonlinearly with *M*_{1}.
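The constrained optimum can be evaluated directly. The sketch below is an illustration under two assumptions: white noise (*α* = 0), and an epoch difference divided by *N* − (*M*_{1} + *M*_{2})/2, the separation between the two epoch midpoints. Minimizing the resulting variance over *M*_{2} gives the closed form coded in `optimal_second_epoch`; both function names are hypothetical.

```python
import math

def epoch_variance(M1, M2, N, sigma2=1.0):
    """White-noise variance of the epoch difference trend with unequal epochs."""
    separation = N - 0.5 * (M1 + M2)   # distance between epoch midpoints
    return sigma2 * (1.0 / M1 + 1.0 / M2) / separation**2

def optimal_second_epoch(M1, N):
    """Positive root of 2*M2**2 + 3*M1*M2 + M1**2 - 2*M1*N = 0."""
    return (-3.0 * M1 + math.sqrt(M1**2 + 16.0 * M1 * N)) / 4.0

N = 100
for M1 in (5, 10, 20, N / 3):
    print(M1, optimal_second_epoch(M1, N))

# Brute-force check over integer lengths for M1 = 10.
best = min(range(1, 90), key=lambda M2: epoch_variance(10, M2, N))
print(best)   # 24, matching the example in the text
```

The quadratic comes from setting the derivative of the variance with respect to *M*_{2} to zero; at *M*_{1} = *N*/3 it returns *M*_{2} = *N*/3, recovering the symmetric optimum.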

## 4. Discussion

In this short note we show that using simple linear regression to estimate a trend in a red noise time series with an imposed linear trend is almost always preferred over the use of an epoch difference trend estimator, although the degree to which the regression estimator is optimal is strongly dependent on the epoch length. We further demonstrate that if an epoch difference is used, the optimal epoch length is approximately one-third the length of the time series (*N*/3) under most circumstances. These conclusions break down in the limit that the time series exhibits extremely large autocorrelation. Finally, we demonstrate that if the epoch length at one end is constrained to a suboptimal value then the optimal epoch length at the other end is, in general, neither *N*/3 nor symmetric.

It is perhaps not surprising that simple linear regression is superior to the epoch difference trend estimation under most circumstances. While simple linear regression uses all of the data in the time series, maximizing the degrees of freedom, the epoch difference estimation uses only the data at the beginning and end of the time series and thus ignores information. In most cases, more information provides a tighter constraint on the trend. This is, however, not the case when the autocorrelation of the time series is large. In this instance, the data in the middle of the time series are no longer independent of the data at the end points; the epoch difference estimation then outperforms the linear regression estimation of the trend.

As discussed in the introduction, the results presented here are based on the assumption that the trend being estimated is linear and that the variability superposed on this linear trend is first-order autoregressive [AR(1)]. While the climate system is known to exhibit variability that is bimodal or oscillatory (e.g., the Madden–Julian oscillation), many aspects of the climate system are well modeled as AR(1) (e.g., Hartmann and Lo 1998; Feldstein 2000; Newman et al. 2003). Spectral methods may be used to carry out an analysis similar to that presented in this paper when considering autoregressive models of higher order (e.g., Bloomfield 1992; Bloomfield and Nychka 1992).

In addition, the assumption of a linear climate trend is often made in climate research, and, in fact, it is embraced by all of the IPCC reports. If, however, the trend is not linear or the underlying variability is not AR(1), simple linear regression may not be the most appropriate method (e.g., Seidel and Lanzante 2004).

## Acknowledgments

We are indebted to two anonymous reviewers and Editor J. Barsugli for their thoughtful comments that greatly helped improve an earlier version of this paper. EAB thanks C. Deser for posing the question that led to this work. EAB is supported in part by the Climate and Large–Scale Dynamics Program of the National Science Foundation under Grant 1419818.

### APPENDIX A

#### Setup and Notation

We consider the red noise time series with a linear trend imposed:

$$y_n = \alpha y_{n-1} + \gamma\,[n - \alpha(n-1)] + \beta \epsilon_n \tag{A1}$$

for *n* = 1, 2, …, *N*, where all quantities are as described in the main text.

Our ultimate goal is to derive the variance of the linear trend estimators; to do this, it is easiest to put the equations in matrix form. We define four *N* × 1 matrices:

We also define the *N* × *N* permutation matrix:

We rewrite (A1) using our compact matrix notation as

We rearrange (A4) to isolate on the left side and write

where the *N* × *N* matrix − *α* has a simple form: 1’s along the main diagonal and −*α*’s along the first lower diagonal. Thus, the matrix − *α* is invertible, and we can maintain the compactness of our equations by defining

Equation (A7) has the form of a basic linear equation, with following a linear trend with slope *γ*.

We note that the expected value *E*(⋅) of is an *N* × 1 matrix of zeros and that the variance Var(⋅) of is *σ*^{2} times an *N* × *N* identity matrix.

### APPENDIX B

#### Linear Regression Estimator

We wish to estimate the slope of the linear trend from the data. We first apply simple linear regression, and we denote this estimated trend as , given by

where ^{T} is the 1 × *N* matrix (Draper and Smith 1981):

We note that

and

We compute the expected value of the estimator by substituting (A7) into (B1) and applying standard matrix statistics (Graybill 1983):

Thus, the simple linear regression estimator is an unbiased estimator of the trend.
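The unbiasedness argument can be illustrated numerically by writing the least-squares slope as a weighted sum of the data. This is a sketch under the assumption of equally spaced times *n* = 1, …, *N*; the variable names are hypothetical and the matrix notation of this appendix is not reproduced.

```python
import numpy as np

N, sigma2 = 100, 0.01
n = np.arange(1, N + 1)

# Least-squares slope = sum_n w_n * y_n with these weights.
w_reg = (n - n.mean()) / np.sum((n - n.mean()) ** 2)

sum_of_weights = w_reg.sum()        # ~0: a constant offset drops out
response_to_unit_trend = w_reg @ n  # ~1: a trend gamma*n returns gamma

# White-noise variance of the slope versus the closed form 12*sigma^2/(N*(N^2-1)).
var_from_weights = sigma2 * (w_reg @ w_reg)
var_closed_form = 12.0 * sigma2 / (N * (N**2 - 1))

print(sum_of_weights, response_to_unit_trend)
print(var_from_weights, var_closed_form)
```

The two conditions on the weights (summing to zero and returning one when applied to *n*) are what make the estimator unbiased for zero-mean noise, regardless of its autocorrelation.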

We compute the variance of the estimator in a similar fashion:

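In the special case of no memory (*α* = 0, so that *β* = 1 and the noise is uncorrelated in time), the variance of the regression estimator $\hat{\gamma}_r$ given by (B14) reduces to

$$\mathrm{Var}(\hat{\gamma}_r) = \frac{12\sigma^2}{N(N^2-1)}. \tag{B17}$$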

### APPENDIX C

#### Epoch Difference Estimator

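We next apply epoch differences to estimate the trend in the time series of (A7). We allow for different epoch lengths: *M*_{1} at the beginning and *M*_{2} at the end. We use $\hat{\gamma}_e$ to denote the estimate of the trend based on the difference between the two epoch averages:

$$\hat{\gamma}_e = \frac{1}{N - (M_1 + M_2)/2}\left(\frac{1}{M_2}\sum_{n=N-M_2+1}^{N} y_n - \frac{1}{M_1}\sum_{n=1}^{M_1} y_n\right). \tag{C1}$$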

We can rewrite (C1) using our matrix notation as

where is an *N* × 1 matrix of zeros, except the first *M*_{1} rows are given by

and the last *M*_{2} rows are given by

We assume that *M*_{1} + *M*_{2} < *N*. Then, although it is not immediately obvious,

and

We compute the expected value of the estimator by substituting (A7) into (C2) and applying standard matrix statistics:

Thus, the epoch difference estimator is an unbiased estimator of the trend.
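The same check can be made for the epoch difference estimator with unequal epochs. The sketch assumes the difference of epoch means is divided by *N* − (*M*_{1} + *M*_{2})/2, the separation between the epoch midpoints, and that *M*_{1} + *M*_{2} < *N*; `epoch_weights` is a hypothetical helper.

```python
import numpy as np

def epoch_weights(N, M1, M2):
    """Weights w such that the epoch difference trend equals w @ y."""
    separation = N - 0.5 * (M1 + M2)
    w = np.zeros(N)
    w[:M1] = -1.0 / (M1 * separation)     # first epoch enters with a minus sign
    w[N - M2:] = 1.0 / (M2 * separation)  # last epoch enters with a plus sign
    return w

N, M1, M2, sigma2 = 100, 10, 24, 0.01
n = np.arange(1, N + 1)
w = epoch_weights(N, M1, M2)

sum_of_weights = w.sum()        # ~0: a constant offset drops out
response_to_unit_trend = w @ n  # ~1: a trend gamma*n returns gamma

# White-noise variance from the weights versus the closed form.
var_from_weights = sigma2 * (w @ w)
var_closed_form = sigma2 * (1.0 / M1 + 1.0 / M2) / (N - 0.5 * (M1 + M2)) ** 2

print(sum_of_weights, response_to_unit_trend)
print(var_from_weights, var_closed_form)
```

That the weights return exactly one when applied to *n* is the less obvious of the two conditions; it holds because the mean time of the last epoch minus the mean time of the first epoch equals *N* − (*M*_{1} + *M*_{2})/2.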

We compute the variance of the estimator in a similar fashion:

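In the special case of no memory (*α* = 0, so that *β* = 1 and the noise is uncorrelated in time), the variance of the epoch difference estimator $\hat{\gamma}_e$ given by (C16) reduces to

$$\mathrm{Var}(\hat{\gamma}_e) = \frac{\sigma^2\left(\dfrac{1}{M_1} + \dfrac{1}{M_2}\right)}{\left[N - (M_1 + M_2)/2\right]^2}. \tag{C18}$$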

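When *M*_{1} = *M*_{2} = *M*, (C18) for the variance of the epoch difference estimator $\hat{\gamma}_e$ reduces to

$$\mathrm{Var}(\hat{\gamma}_e) = \frac{2\sigma^2}{M(N-M)^2}. \tag{C19}$$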

We compute the optimal size of the epochs (i.e., the values of *M*_{1} and *M*_{2} that minimize the variance of the epoch difference estimator), which we denote $M_1^*$ and $M_2^*$, for a fixed *N* by differentiating (C18) with respect to *M*_{1} and with respect to *M*_{2} and setting the resulting two expressions simultaneously equal to 0. The result is surprisingly simple:

$$M_1^* = M_2^* = \frac{N}{3}. \tag{C20}$$

We compute the optimal *M*_{2} for a given *M*_{1} and *N*, which we denote $M_2^*$, by differentiating (C18) with respect to *M*_{2}, setting the result equal to zero, and solving for $M_2^*$:

$$M_2^* = \frac{-3M_1 + \sqrt{M_1^2 + 16 M_1 N}}{4}. \tag{C21}$$

Similarly, the optimal *M*_{1} for a given *M*_{2} and *N*, which we denote $M_1^*$, is given by

$$M_1^* = \frac{-3M_2 + \sqrt{M_2^2 + 16 M_2 N}}{4}. \tag{C22}$$

## REFERENCES

*Time Series Analysis: Forecasting and Control*. 4th ed. John Wiley and Sons, 746 pp.

*Climate Change 2013: The Physical Science Basis*, T. F. Stocker et al., Eds., Cambridge University Press, 1029–1136. [Available online at https://www.ipcc.ch/pdf/assessment-report/ar5/wg1/drafts/fgd/WGIAR5_WGI-12Doc2b_FinalDraft_Chapter12.pdf.]

*Applied Regression Analysis*. 2nd ed. John Wiley and Sons, 709 pp.

*Matrices with Applications in Statistics*. Wadsworth International, 461 pp.

*Climate Change 2013: The Physical Science Basis*, T. F. Stocker et al., Eds., Cambridge University Press, 159–254. [Available online at https://www.ipcc.ch/pdf/assessment-report/ar5/wg1/WG1AR5_Chapter02_FINAL.pdf.]

*Climate Change 2007: The Physical Science Basis*, S. Solomon et al., Eds., Cambridge University Press, 1–18.

*Climate Change 2013: The Physical Science Basis*, T. F. Stocker et al., Eds., Cambridge University Press, 1–29.

*Geophys. Res. Lett.*, **37**, L09708.

*J. Geophys. Res.*, **109**, D14108.