## 1. Introduction

Running trends analysis (RTA) is one of several methods used in climate research to analyze univariate time series and time series association. For a given time series {*y*_{t}}, observed at *n* equally spaced points in time *t*_{1}, *t*_{2}, …, *t*_{n}, RTA involves defining *n* − *L* + 1 overlapping time windows (*W*_{1}, *W*_{2}, …, *W*_{n−L+1}) each with exactly *L* consecutive time points (2 ≤ *L* ≤ *n* − 1), and then evaluating the least squares estimates of the trend for each time window *W*_{j}. This produces a new series of length *n* − *L* + 1 called the “running trends series,” which is used as a summary statistic of the original series {*y*_{t}} for further analysis.

In recent years RTA has been widely used in climate applied research as part of more complex time series analysis. Holgate and Woodworth (2004), for example, use 10-yr global mean sea level (GMSL) running trends to study acceleration of GMSL from 1948 to 2002 and to obtain a GMSL reconstruction for the same period. Santer et al. (2014) use 120-month running trends of changes in the temperature of the lower troposphere (from satellite measurements made by the Microwave Sounding Unit on National Oceanic and Atmospheric Administration polar-orbiting satellites) for the period 1979–2012, to analyze volcanic contributions to observed changes in warming rates.

Hamlington et al. (2014, 2013) use 20-yr running trends of the Pacific decadal oscillation (PDO) and of annual mean sea level to study the contribution of the PDO to mean sea level trends both globally (Hamlington et al. 2013) and regionally (Hamlington et al. 2014). Palmer and McNeall (2014) compute 10-yr running trends of the total energy (TE) in the Earth system, the global surface temperature (GST), and the total near-global ocean heat content (OHC) to investigate the relationship among these variables [the model data analyzed are multicentury preindustrial control simulations from phase 5 of the Coupled Model Intercomparison Project (CMIP5) model archive]. In the same work, the authors use the correlations between running trends in TE and running trends in OHC, for a range of periods (running trends of length 2–36 months are considered), to estimate the time scale on which the ocean becomes the dominant term in the planetary energy budget. Risbey et al. (2014) use 15-yr global mean surface air temperature (GMST) running trends evaluated using data from CMIP5 models and observations for the period 1880–2012 to study the CMIP5 models’ performance (in terms of the models’ ability to reproduce the 15-yr observed GMST trends).

There is no doubt that RTA might be a useful descriptive tool; however, despite its general use in applied research, precisely what it reveals about the underlying time series and time series association is unclear and, as a result, its interpretation is unclear too. In this respect a more rigorous study of the information that RTA conveys about the underlying time series and time series association and the definition of additional statistics to adequately interpret and validate RTA would be desirable. This paper is intended as a first step in this direction. We present two main results. First of all, we provide an explicit formula for the set of time series that share a given series of running trends by showing that such a set is the solution set (of dimension *L* − 1) of a linear system with *n* − *L* + 1 equations and *n* unknowns. This allows us to show that running trends, alone, perform very poorly as summary statistics for univariate time series and time series association. In addition, we show that RTA is equivalent to the estimation of the (possibly nonlinear) trend component of a time series using a weighted moving average filter with window length *L* and triangular-shaped weighting pattern (i.e., the maximum weight is assigned to the observation at the evaluation point and the weight decreases symmetrically as we move away in time from the evaluation point). Such equivalence provides a solid ground for RTA implementation and for the definition of additional statistics to adequately interpret RTA.

The first main result, which is the explicit formula for the set of time series with a given series of running trends, is presented in section 2 and illustrated with an example in section 3. The worked example also serves to discuss some important practical implications of the derived formula. The second main result, which clarifies the relationship between RTA and time series smoothing using a weighted moving average, is presented in section 4. Statistical tools to validate the RTA and choose the optimal time window length *L* are also discussed in section 4. Methodological issues related to trend extraction and their relevance for RTA are discussed in section 5. In section 6, we apply the proposed methodology to a real data example. Concluding remarks are presented in section 7.

## 2. Main result 1

*n*equally spaced time points

*t*

_{1},

*t*

_{2}, …,

*t*

_{n}, and let Δ be the common length of the time interval between two consecutive points in time. We haveLet

*L*be a positive integer such that 2 ≤

*L*≤

*n*− 1 and let

*W*

_{1},

*W*

_{2}, …,

*W*

_{n−L+1}be overlapping time windows of length

*L*Δ defined starting from

*t*

_{1}and advancing by Δ at a time:For a generic time series

*y*

_{t}} as the series

*W*

_{j}. That is,

*m*

_{j}in the linear regressionwhere

*n*−

*L*+ 1 arbitrary values. We provide a recursive formula to obtain the set of all time series

*L*− 1. The details are provided in the following theorem (for the proof see appendix A).

### Theorem 1

Let ^{n−L+1}.

- The set
of all time series for which the corresponding trend series is exactly is a vector space of dimension *L*− 1. - The generic element in
is a time series , where can be arbitrarily chosen in ℝ ^{L−1}, andcan be found using the recursive relation

Once we have arbitrarily fixed the values of

## 3. Example 1

As an illustration of the main result in the previous section, we consider *n* = 15 equally spaced time points *t*_{1} = 2000, *t*_{2} = 2001, …, *t*_{15} = 2014 (i.e., Δ = 1) and 11 overlapping time windows *W*_{j} of length *L* = 5 (*n* − *L* + 1 = 11). To show the general applicability of the main result we consider two scenarios. Each scenario corresponds to a choice of the series of running trends, and for each scenario we use the main result in section 2 to generate five time series of length *n* = 15 that share the same series of running trends. The two scenarios considered are scenario 1, where the series of running trends is approximately linearly increasing (it follows a linear function plus noise), and scenario 2, where the series of running trends is a realization of a random noise process. In particular the running trends are generated independently from a normal distribution with mean of 300 and standard deviation of 50.

The series of running trends for the two scenarios above are plotted in the top-left panel of Figs. 1 and 2. Notice that the series of running trends for the second scenario is generated independently of the running trends series for the first scenario. This will be used later in the example.

Let *n* = 15 whose running trends series is the one corresponding to scenario *h* (=1 and 2). According to the main result in section 2, each space *L* −1 = 4 and in order to specify a series in *n* = 15 with the same running trends. Four of the five choices are common across scenarios; in particular we choose *n* = 15 for scenarios 1 and 2 are plotted in Figs. 1 and 2, respectively. Since each series corresponds to a choice of

Despite its simplicity, the worked example highlights some important consequences of the main result in section 2.

- Time series with a very different behavior might share exactly the same running trends series [cf. the series (i)–(v) in each of Figs. 1 and 2]. This is true in general. Different choices of the free variables
in theorem 1 (increasing, decreasing, constant, periodic, etc.), will lead to time series with the same running trends but whose behavior is very different. This remark has two important implications. First of all, a series of running trends, alone, provides a poor description of the underlying time series. That is, a running trends series alone has a poor performance, as summary statistics of univariate time series. Second, strong association between running trends series, alone, does not imply any association between the underlying time series. Consider, for example, the time series (ii) and (v) in Fig. 1. Suppose that series (ii) represents simulations of a certain variable *X*(for the period 2000–14) from a given model and that series (v) represents the observed values of*X*for the same period. If we use the correlation between 5-yr running trends of series (ii) and (v) to assess the model’s performance, we would conclude that the model performance is very good [the correlation between the two running trends series is 1 since (ii) and (v) share exactly the same series of running trends]. However, this result would be misleading. If we compare series (ii) and (v) we would say that the model’s performance is extremely poor. The fact that strong association between running trends series, alone, does not imply any association between the underlying time series is particularly evident if we focus on the last time window*W*_{n−L+1}. According to the main result in section 2, in fact, we can arbitrary fix the value of*L*− 1 out of the*L*points in*W*_{n−L+1}and then use the recursive formula in (2) to obtain a series with the desired running trends. In the example above this was exemplified by our arbitrary choice of the free variables. - Time series with very different running trends series might be almost identical. For example, compare the series (ii) in Figs. 1 and 2. We observe that the two series are almost identical [the correlation between the two (ii) series is 0.999999!] despite the fact that the corresponding series of running trends are at best only weakly associated. The implication of this remark is that absence of association between running trends series does not imply absence of association between the original series. Once again, consider series (ii) in Figs. 1 and 2. Suppose that series (ii) in Fig. 1 represents simulations of a certain variable
*X*(for the period 2000–14) from a given model and that series (ii) in Fig. 2 represents the observed values of*X*for the same period. If we use the association between 5-yr running trends of series (ii) in Fig. 1 and series (ii) in Fig. 2 to assess the model’s performance, we would conclude that the model performance is very poor (the correlation between the two running trends series is 0.035). However this result would be misleading. If we compare the series (ii) in scenarios 1 and 2, we would say that the model’s performance is excellent. Again this is true in general and can be explained by the difference in magnitude between the running trends series and the free variables. Given a series of running trends , we can always choose the magnitude of the free variables sufficiently large to make negligible the contribution of the running trends in the recursive formula in (2). Thus, it is possible to have two time series { *x*_{t}} and {*y*_{t}} that are almost identical (and thus, that show very high association) but whose running trends series are practically independent.

In our opinion, however, the worked example serves its scope, which is not to describe common misinterpretation of RTA in practice (we do not believe, in fact, that papers in climate research tend to confuse similar or dissimilar running trends series with similar or dissimilar time series), but rather to illustrate the main result in section 2 and remind the community that RTA should be used and interpreted with care.

## 4. Main result 2

In the previous sections we show that the set of time series which share the same series of running trends can be in fact very different. An inverse statement is also demonstrated to be true: time series with different running trends can be nearly identical. The conclusion is that RTA *alone* is a poor summary statistics for univariate time series and time series association and should be used with care. Although theoretically interesting, this result is of limited practical use for climate scientists since it fails to address two fundamental questions. Question 1 (Q1) is, for a given window length *L*, what is the information that an RTA conveys about the underlying time series? Question 2 (Q2) is, how should one choose the window length *L* in the applications? Both questions are related to the interpretation of RTA output. In this section we provide a theoretical result that clarifies such an interpretation.

*y*

_{t}} aswhere

*M*(

*t*) is a (possibly nonlinear) trend component (a smooth and continuous function of

*t*), and {

*z*

_{t}} is a zero mean stationary process that represents a random noise component (as we will see later, if the original series contains also a seasonal component this should be removed before evaluating running trends). Trend extraction is an important topic in climate research (see Hannachi 2007; Barbosa and Andersen 2009; Li et al. 2011). Climate scientists perform RTA to study how the trend component

*M*(

*t*) changes over time. The mathematical interpretation of rate of change is the first derivative. Thus implicit in RTA is the study of the velocity of the trend component

*M*(

*t*) in the representation in (3). When we evaluate running trends for a given window length

*L*, we are trying to estimate the velocity of the trend component

*M*(

*t*). Therefore if we integrate RTA results (i.e., if we integrate the series of running trends {

*y*

_{t}}) we obtain the estimation of the trend component

*M*(

*t*) implicit in RTA. It is possible to show that such estimation is equivalent to the estimation of the trend component of the original series obtained by smoothing {

*y*

_{t}} with a weighted moving average filter with window length

*L*and triangular shaped weighting pattern (i.e., the maximum weight is assigned to the observation at the evaluation point and decrease symmetrically as we move away in time from the evaluation point). The details are provided in the following theorem (for the proof see appendix B). To simplify the exposition and without loss of generality we will assume that the common length of the time interval between two consecutive points in time (Δ) is one.

### a. Theorem 2

Let Δ = 1 and *n* − *L* + 1 obtained by evaluating the integral of the function resulting from linearly interpolate {*W*_{j}). Depending on whether *L* is odd or even we have the following:

- Odd case,
*L*= 2*r*+ 1 for*r*∈ ℕ,where, , and . - Even case,
*L*= 2*r*for*r*∈ ℕ,where, , and .

### b. Diagnostic tools for RTA validation and interpretation

Evaluation of RTA performance for a given *L* (Q1) is equivalent to the assessment of the performance of the moving average filter in (4) or (5) as estimator of the trend component of the original series.

*w*

_{−s}=

*w*

_{s}), if the trend

*M*(

*t*) is approximately linear over the time window

*W*

_{j}, and the weighted average (with weights

*w*

_{s}) of the error term

*z*

_{t}in (3) over the same time window is close to zero, then we have (for the proof see appendix C)That is, RTA provides a very accurate estimate of the (possibly nonlinear) trend component of the underlying series {

*y*

_{t}} at

*M*(

*t*) is approximately linear over the time windows

*W*

_{j}. Thus our proposal for RTA validation is to report together with the running trends series, the RTA estimation of the trend component

*M*(

*t*) and numerical indices that summarize the extent to which the linear models in (1) provide a suitable description of the behavior of the series in each of the overlapping time windows

*W*

_{j}. In particular, we advocate for the use of the average

*R*

^{2}and the percentage of running trends that are statistically significant at a given significant level

*α*. Standard choices for

*α*are

*α*= 0.05 or 0.01. For a given time series, the

*R*

^{2}for the time window

*W*

_{j}measures the proportion of the total variability of the series explained by (the least squares estimate of) the linear model in (1), thus

*R*

^{2}takes value in [0, 1]. An

*R*

^{2}close to (equal to) one for a certain time window

*W*

_{j}would indicate that the time series is approximately (exactly) linear in

*W*

_{j}and thus the estimated linear trend

*n*−

*L*+ 1

*R*

^{2}values corresponding to fitting the linear models in (1) for each time window

*W*

_{j}(average

*R*

^{2}) would thus be a natural summary of the overall performance of running trends as estimator of the trend component of the underlying series.

In addition to these two summary measures of overall performance, we believe that detailed information on statistical significance and confidence intervals for running trends are fundamental for a correct interpretation of RTA results. Running trends that are not statistically significant in a certain time interval should be interpreted to mean that there is no evidence that the trend velocity is different from zero in that interval [i.e., there is no evidence that *M*(*t*) is not constant in that period]. On the other hand, running trends confidence intervals (CIs) are fundamental for a correct interpretation of differences between the running trends of a certain time series or the series of running trends underlying two time series. Large overlap of confidence intervals should be interpreted as that the underlying running trends are indistinguishable, while no overlap will indicate a change (or a difference) in trend velocity. In summary, for RTA validation and interpretation we propose the following:

- To plot the original series superposing the RTA estimation of the trend component using weighted moving average with weights as in (4) or (5) depending on whether
*L*is odd or even. Such a plot will provide useful information on whether the level of smoothness implied by the window length used in RTA is appropriate (we want to avoid oversmoothing and undersmoothing). For interpretation purposes different colors (or different characters) should be used in the plot of, the RTA estimation of the trend component, to distinguish points that correspond to running trends statistically significant at a given significant level *α*from those that correspond to running trends that are not statistically significant. As we observed before, portions of the plot ofthat correspond to running trends that are not statistically significant should be interpreted as that there is no evidence that the trend velocity is different from zero in that region [i.e., there is no evidence that *M*(*t*) is not constant in that region]. The average*R*^{2}and the percentage of running trends that are statistically significant at the significant level*α*should also be reported with the plot. - To plot the series of running trends with the corresponding (1 −
*α*)% confidence intervals. Again for interpretation purposes different colors (or different characters) should be used to distinguish between points that correspond to running trends statistically significant at the significant level*α*from those that correspond to running trends that are not statistically significant.

#### Example 1 revisited

As a first illustration, we apply the proposed RTA validation procedure to selected time series from example 1. As we observed in section 3, the series (ii) for scenarios 1 and 2 of example 1 are nearly identical but, apparently, have very different running trends. On the other hand, the two time series (ii) and (v) for scenario 1, which share the same series of running trends, are, in fact, very different. We want to use the proposed measures for RTA validation and interpretation to assess the information provided by RTA in these two cases.

Figure 3 (left panels) shows the (ii) and (v) series for scenario 1 and the series (ii) for scenario 2 of example 1 (circles), with the RTA estimation of the trend component, *L* = 5 and according to (4) the moving average weights are *w*_{−2} = 2/26, *w*_{−1} = 5/26, *w*_{0} = 6/26, *w*_{1} = 5/26, and *w*_{2} = 2/26. In the same figure (right panels) we also represent the series of running trends associated with each of the three series (solid triangles) together with the corresponding 95% confidence intervals (solid lines). In both cases, solid circles and triangles that correspond to running trends statistically significant at the significant level *α* = 0.05 are represented in black, while those that correspond to running trends that are not statistically significant are represented in gray. The average *R*^{2} and the percentage of running trends that are statistically significant at the significant level *α* = 0.05 are also reported.

If we compare the series (ii) for scenarios 1 and 2, we observe that none of their running trends is statistically significant (percentage of running trends that are statistically significant is 0), that is, in both cases there is no evidence that the velocity of the trend component is different from zero. In agreement with the overall significance of the running trends the RTA estimation of the trend component (gray solid circles) in both cases looks just the same: a horizontal line that indicates no trend. The two series share the same trend pattern, that is, the absence of any trend (this is what we expect since the two series look just the same). The analysis of the confidence intervals for the running trends underlying the two series provides further insights in their comparison. The confidence intervals are huge and the differences between the two running trends series (which can be appreciated in Figs. 1 and 2) become negligible once the uncertainty on running trends estimates is taken into account. In this case it would not make any sense to look at time series association in terms of correlation between the corresponding series of running trends since running trends variations for both series are not statistically significant.

Similarly if we compare series (ii) and (v) for scenario 1 we observe that while none of the running trends is statistically significant for series (ii), all the running trends are statistically significant for series (v). The difference in *R*^{2} is also huge, zero for series (ii) versus 0.99 for series (v). In agreement with the overall significance of the running trends in the two cases, the RTA estimation of the trend component is an approximately horizontal line for series (ii), indicating no trend, and a linear function for series (v), indicating a constant trend acceleration (i.e., a quadratic trend component) for series (v). Once again the analysis of the confidence intervals for the running trends underlying the two series provides further insights in their comparison. Despite the fact that the running trends series is exactly the same for series (ii) and (v) in scenario 1, they look very different when uncertainty on their estimates is taken into account. While the confidence intervals for the running trends of series (ii) are huge and thus the corresponding running trends series looks constant, the confidence intervals for series (v) are much narrower, and a linearly increasing trend velocity (i.e., a constant acceleration of the original series) can be appreciated. Also in this case assessing association between the trend components of the two series looking at the correlation between the corresponding series of running trends would be totally misleading since the same series of running trends has a very different interpretation in each of the two cases.

### c. Choice of the optimal time window length

As a result of theorem 2, the problem of choosing the optimal window length in RTA (Q2) can be reformulated in terms of optimal choice of the window length in a problem of trend estimation by weighted moving average filter. In this context, there is a trade-off between increasing bias (for large *L*) and increasing variance (for small *L*) of the moving average estimator *L* controls the degree of smoothing. The larger *L* is, the more we smooth the original series. Smoothing less (decreasing *L*) increases the variance of the moving average estimator underlying RTA but also leads to less bias. A large *L* leads to an overly smooth estimate of the trend component leaving out possible details of the trend, and creating a large bias. On the other hand, when a small *L* is used, trend estimation is based on few data points, the variance of the estimator is large, and the resulting RTA estimation of the trend component is a wiggly curve. An optimal balance between bias and variance of the estimator *M*(*t*), our smoothing method, and the amount of data we have (see Shalizi 2015). The choice of *L* requires a certain amount of subjective judgement. Trial and error is needed in order to produce good results.

## 5. Methodological issues in linear trend analysis

As we described in the previous section, statistical significance and confidence intervals of the running trends are fundamental for a correct interpretation of RTA results. Both statistical significance and confidence intervals in RTA are derived under the framework of ordinary least squares (LS) regression, which crucially depends on the assumption that the error terms are independent, normally distributed with zero mean and constant variance. Unfortunately these assumptions are violated in many applications in climatic science. Nonnormal data, correlated errors, and especially the presence of outliers in environmental time series can dramatically affect the performance of LS estimation and thus adversely affect the performance of RTA. In particular, as a consequence of even few data outliers, linear trend estimation can be severely biased or trends can be masked (e.g., see Muhlbauer et al. 2009). Thus, diagnostic checking for RTA results should include a rigorous study of the LS regression assumptions in each time window *W*_{j}. Such diagnostic check should include formal tests for normality and independent and equally distributed error terms [e.g., see Brockwell and Davis (2002, section 1.6, 35–38) and Thode (2002)] as well as the use of graphical tools such as the normal quantile–quantile (QQ) plot or the plot of the sample partial and simple autocorrelation functions of the residuals. LS estimates in each time window *W*_{j} should be compared with more robust regression estimates that are less sensitive to outliers and nonnormal data, such as the “MM estimates” proposed by Yohai (1987), and that combine high levels of robustness with high efficiency (e.g., see Gschwandtner and Filzmoser 2012) or bootstrap methods (see Davison and Hinkley 1997).

It should be noted that RTA requires LS assumptions to be satisfied “locally”, that is, within each time window *W*_{j}. In this respect, problems such as autocorrelated errors are less severe at the local scale but at the same time are also more difficult to detect due to the reduced sample size.

Note also that an important feature in applications of RTA, as linear filter, is that regardless of the value of *L* used, RTA is not able to remove a periodic component with period *d* ≥ 3 [this is because the weights in (4) and (5) are all positive, and triangular shaped]. This implies that if a periodic component is present in the data, it should be removed before RTA is performed.

Also note that moving averages do not allow estimates of the trend component *M*(*t*) near the ends of the time series. Thus, as a result of theorem 2, given a time series observed at *n* points in time *t*_{1}, *t*_{2}, …, *t*_{n}, RTA provides no information at the end of the series. For example, if *L* is odd (*L* = 2*r* + 1), RTA provides an estimation of the trend component from *t*_{r+1} to *t*_{n−r} and no information is provided for the first and last *r* periods. Similar results hold for the even case. Any extrapolation of the results of an RTA to the end of the time series is equivalent to extrapolating the time series behavior before and after the period of study (*r* times backward and forward the period of study when *L* = 2*r* + 1). Such extrapolation requires a large degree of subjective judgement and should be acknowledged in presenting RTA results.

## 6. A real data example

In a recent paper, Hamlington et al. (2013) use reconstructed sea level (SL) gridded data from 1950 to 2009 to study the 20-yr trends in sea level. In particular they found that “Over the last 60 years […] PDO has contributed significantly to the 20 year trends in GMSL during certain time periods. In the last 20 years, when the PDO went from generally positive to negative phase […] the PDO contributed 0.49 ± 0.25 mm/year to the trend in GMSL. From 1968 to 1987 when the PDO went from negative to positive phase, however, the PDO contribution lowered the trend in GMSL by 0.70 ± 0.26 mm/year.”

As an illustration of the approach to RTA validation and implementation that we discussed in the previous sections, we apply RTA to the PDO and GMSL time series for the same period studied by Hamlington et al. (2013), using their same data and the same window length *L* = 20 (with the only difference that we do not include the year 2009 in the analysis since the 2009 reconstructed data were incomplete). The GMSL series is obtained by averaging each of the 59 SL maps resulting from computing yearly averaged reconstructed SL data at each grid point in the ocean (from 1950 to 2008). This produces a GMSL time series of length *n* = 59. In Figs. 4a and 4c we show the GMSL and PDO time series (circles) with the RTA trend estimation superposed (solid circles). In Figs. 4b,d we also show the corresponding running trends series (solid triangles) with pointwise 95% confidence intervals (dashed lines). In both cases solid circles and triangles that correspond to running trends that are (are not) statistically significant at the significance level *α* = 0.05 are represented in black (gray). Note that since RTA trend estimation, both for the PDO and GMSL, is based on a 20-yr moving average, the first 10 years and the last 10 years of the period of study are “lost” and thus the two RTA estimations are available only for the period 1959–98.

According to the GMSL trend estimation, based on RTA in Fig. 4a, the data suggest that the GMSL trend is approximately linear for the period 1959–85 and subject to a (constant) positive acceleration afterward (i.e., from 1986 to 1998). All GMSL running trends for the period 1959–98 are statistically significant at the significant level *α* = 0.05, and the fit of the linear GMSL model in (1), for the 40 time windows considered, is quite good with an average *R*^{2} of 0.74. The 95% confidence intervals for GMSL running trends (see Fig. 4b) confirm these results. All the confidence intervals are positive and do not contain zero (thus indicating evidence of a positive velocity of GMSL trend for the period 1959–98). The large overlap of the 95% CI for GMSL running trends for the period 1959–85 suggests a constant velocity (i.e., a linear GMSL trend) for this period. The vertical shift of the 95% CI starting from 1986 to 1998, for GMSL, suggests a linear increase in the velocity (i.e., a constant acceleration) for this period.

On the other hand, according to the PDO trend estimation based on RTA in Fig. 4c, and the 95% CI for PDO running trends in Fig. 4d, the data suggest 1) a positive linear PDO trend for the period 1970–80 and 2) no evidence of any PDO trend for the period 1959–69, and also for the period 1981–98 (with the exception of 1992, which we will discuss later). The fit of the linear PDO model in (1), for the 40 time windows considered, is quite poor with an average *R*^{2} of 0.2 (although if we restrict the attention to the period 1970–80 for which a PDO linear trend has been detected, the average *R*^{2} increases to 0.5). The 95% confidence intervals for the PDO running trends (see Fig. 4d) confirm again these results. All the 95% CI confidence intervals for PDO running trends, except those for the period 1970–80 and for the year 1992, contain the zero (thus indicating no evidence of a positive or negative trend velocity for the two periods 1959–69 and 1981–98). Note that despite the fact that the PDO running trends for the time window centered on 1992 are statistically significant and negative, the corresponding 95% CI suggests that the PDO trend velocity for the year 1992 might be very close to zero (the upper bound of the 95% CI is −0.10 for the year 1992), so only weak evidence of a nonzero PDO trend velocity is provided by the data for the year 1992.

In terms of time series association, our RTA results only suggest a strong association between GMSL and PDO trends for the period 1970–80 (in this period, both trends are approximately linear), but no association for the periods 1959–69 and 1981–98, for which the GMSL trend is again approximately linear, while there is no evidence of PDO trend. Note that also in this case, as in the example in section 3, statistical significance and confidence intervals for PDO and GMSL running trends are fundamental for a correct interpretation of RTA results. In particular, the very low percentage of statistically significant PDO running trends suggests that using correlation between PDO trends and GMSL trends (or between PDO running trends and any other time series) might be very misleading since it ignores uncertainty on the actual values of the PDO running trends. Note also that, as we explained above, RTA in this example provides an estimation of the PDO and GMSL trend component only for the period 1959–98. Extrapolation of the RTA results outside this period would be equivalent to extrapolate the PDO and GMSL series backward and forward the period of study. Such extrapolation would introduce additional uncertainty that should be at least acknowledged.

As a diagnostic check for the validity of the PDO and GMSL linear models in (1), which we used in our analysis, for each of the 40 time windows *W*_{j} we checked (i) the hypothesis of independent and identically distributed residuals using the difference sign test and the rank test [e.g., see Brockwell and Davis (2002, section 1.6, 35–38)] and (ii) the hypothesis of normality using the Shapiro–Wilk, the Lilliefors, and the Pearson chi-square tests [e.g., see Saculinggan and Balase (2013) and Thode (2002)]. Graphical tools such as a normal QQ plot or a plot of the sample autocorrelation functions of the residuals were also used as additional tools for the diagnostics in (i) and (ii). Both for PDO and GMSL no evidence of correlated errors was found. However, the hypothesis of normality was rejected for time windows *W*_{30}–*W*_{32} for GMSL and for time windows *W*_{12}, *W*_{14}, and *W*_{25} for PDO. Inspection of the plot of residuals for those time windows reveals a lack of symmetry in the distribution of residuals. To take into account the violation of the normality assumption as an alternative to the standard 95% confidence intervals for both GMSL and PDO running trends we also considered 95% percentile bootstrapped confidence intervals [see algorithm 6.1 on page 262 in Davison and Hinkley (1997)]. We did not appreciate any important difference between the ordinary and the bootstrapped CI. All these diagnostics should be interpreted with caution because of the small sample size in each window *W*_{j}. With respect to outlying observations we compared the ordinary least squares estimation of the PDO and GMSL running trends with robust MM estimates. Robust and ordinary running trends estimates, both for PDO and GMSL, were in close agreement.

## 7. Conclusions

Running trends series are widely used as summary statistics for univariate time series and time series association. Interpretation of RTA results, however, is unclear. In this paper, we contribute to such interpretation in two ways: 1) we provide an explicit formula for the set of time series with a given series of running trends, which allows us to show that running trends, alone, perform very poorly as summary statistics for univariate time series and time series association; and 2) we establish an equivalence between RTA and the estimation of (a possibly nonlinear) trend component of the underlying time series using a weighted moving average filter. Such equivalence provides a solid ground for RTA interpretation and implementation. As we discussed in section 5, methodological issues related with trend extraction, such as nonnormal data, correlated error terms, and the presence of outliers, are crucial for interpretation of RTA results and thus a diagnostic check of LS assumptions in each time windows *W*_{j} should be part of the diagnostic analysis of any RTA. In this respect, it would be interesting to study how our results generalize when RTA is implemented using more robust (linear) trend estimation methods less sensitive to violations of LS assumptions. Given the equivalence between RTA and linear filtering using a weighted moving average, smoothing splines, kernel regressions, and other nonparametric methods for trend estimation are natural competitors of RTA and a comparison of RTA performance with respect to these alternative estimation methods represents a very interesting topic for future research.

This work has been supported by Projects CGL2010-12153-E and AYA2010-22039-C02-01 from the Spanish Department of Science and Innovation (MICINN). We thank three anonymous reviewers for their useful comments that greatly improved the original version of the paper.

# APPENDIX A

## Proof of Theorem 1

*j*∈ {1, 2, …,

*n*} and let

*b*

_{j}and

*m*

_{j}are given by

*W*

_{j}can be written aswhere T denotes transposition.

*n*−

*L*+ 1 equations (the number of time windows

*W*

_{j}) and

*n*unknowns (

*L*− 1, and the generic solution of the system is obtained by assigning an arbitrary value to the free variables

- for
*j*= 0, the recursive formula above provides the value ofas a function of , and - for
*j*= 1, the recursive formula above provides the value ofas a function of , and so on.

# APPENDIX B

## Proof of Theorem 2

*L*= 2

*r*+ 1 and

*r*∈ ℕ. A similar proof applies when

*L*is even. Consider a time series

*W*

_{j}, for

*j*= 1, 2, …,

*n*−

*L*+ 1, according to (A2) we havewhere the last equality follows by noticing thatand the constant

*C*in (4) in theorem 2. Let

*α*is an integration constant. Choose

*α*to be the running weighted mean of the

*y*

_{t}values in the first window

*W*

_{1},The proof is by induction on

*j*. For

*j*=1, (4) holds [because of (B3) and (B4)]. Assuming that (4) holds for an arbitrary

*j*, we need to show that it also holds for

*j*+ 1. So let us assume thatAccording to (B3) we have

# APPENDIX C

## REFERENCES

Barbosa, S. M., , and O. B. Andersen, 2009: Trend patterns in global sea surface temperature.

,*Int. J. Climatol.***29**, 2049–2055, doi:10.1002/joc.1855.Brockwell, P., , and R. Davis, 2002:

Springer, 434 pp.*Introduction to Time Series and Forecasting.*Davison, A., , and D. Hinkley, 1997:

Cambridge University Press, 594 pp.*Bootstrap Methods and Their Application.*Gschwandtner, M., , and P. Filzmoser, 2012: Computing robust regression estimators: Developments since Dutter (1977).

,*Aust. J. Stat.***41**, 45–58.Hamlington, B. D., , R. R. Leben, , M. W. Strassburg, , R. S. Nerem, , and K.-Y. Kim, 2013: Contribution of the Pacific decadal oscillation to global mean sea level trends.

,*Geophys. Res. Lett.***40**, 5171–5175, doi:10.1002/grl.50950.Hamlington, B. D., , M. W. Strassburg, , R. R. Leben, , W. Han, , R. S. Nerem, , and K.-Y. Kim, 2014: Uncovering an anthropogenic sea-level rise signal in the Pacific Ocean.

,*Nat. Climate Change***4**, 782–785, doi:10.1038/nclimate2307.Hannachi, A., 2007: Pattern hunting in climate: A new method for finding trends in gridded climate data.

,*Int. J. Climatol.***27**, 1–15, doi:10.1002/joc.1375.Holgate, S. J., , and P. L. Woodworth, 2004: Evidence for enhanced coastal sea level rise during the 1990s.

*Geophys. Res. Lett.*,**31**, L07305, doi:10.1029/2004GL019626.Li, G., , B. Ren, , J. Zheng, , and C. Yang, 2011: Trend singular value decomposition analysis and its application to the global ocean surface latent heat flux and SST anomalies.

,*J. Climate***24**, 2931–2948, doi:10.1175/2010JCLI3743.1.Muhlbauer, A., , P. Spichtinger, , and U. Lohmann, 2009: Application and comparison of robust linear regression methods for trend estimation.

,*J. Appl. Meteor. Climatol.***48**, 1961–1970, doi:10.1175/2009JAMC1851.1.Palmer, M. D., , and D. J. McNeall, 2014: Internal variability of Earth’s energy budget simulated by CMIP5 climate models.

*Environ. Res. Lett.*,**9**, 034016, doi:10.1088/1748-9326/9/3/034016.Risbey, J. S., , S. Lewandowsky, , C. Langlais, , D. P. Monselesan, , T. J. O. Kane, , and N. Oreskes, 2014: Well-estimated global surface warming in climate projections selected for ENSO phase.

,*Nat. Climate Change***4**, 835–840, doi:10.1038/nclimate2310.Saculinggan, M., , and E. Balase, 2013: Empirical power comparison of goodness of fit tests for normality in the presence of outliers.

*J. Phys. Conf. Ser.*,**435**, 012041, doi:10.1088/1742-6596/435/1/012041.Santer, B. D., and Coauthors, 2014: Volcanic contribution to decadal changes in tropospheric temperature.

,*Nat. Geosci.***7**, 185–189, doi:10.1038/ngeo2098.Shalizi, C., 2015:

*Advanced Data Analysis from an Elementary Point of View*. Cambridge University Press, in press.Thode, J., 2002:

Marcel Dekker, 368 pp.*Testing for Normality.*Yohai, V., 1987: High breakdown-point and high efficiency robust estimates for regression.

,*Ann. Stat.***15**, 642–656, doi:10.1214/aos/1176350366.