## Abstract

This study proposes an empirical approach to account for lag-1 autocorrelation in detecting mean shifts in time series of white or red (first-order autoregressive) Gaussian noise using the penalized maximal *t* test or the penalized maximal *F* test. This empirical approach is embedded in a stepwise testing algorithm, so that the new algorithms can be used to detect single or multiple changepoints in a time series. The detection power of the new algorithms is analyzed through Monte Carlo simulations. It has been shown that the new algorithms work very well and fast in detecting single or multiple changepoints. Examples of their application to real climate data series (surface pressure and wind speed) are presented. An open-source software package (in R and FORTRAN) for implementing the algorithms, along with a user manual, has been developed and made available online free of charge.

## 1. Introduction

As has been stated in many previous studies (e.g., Peterson et al. 1998; Wang et al. 2007), on the one hand, homogeneity of climate data is indispensable for many aspects of climate research, especially for a realistic and reliable assessment of historical climate trends and variability, and also for the calculation of related statistics that are needed and used to define the state of climate and climate extremes. On the other hand, discontinuities are often inevitable in long-term climate data records because of inevitable changes that took place in the period of data record (such as changes in instrumentation/observer, in station location/exposure, and in observing practices/procedure). Some of these artificial changes are documented in the related metadata database, while others are not (because of inaccurate and/or incomplete metadata). Therefore, more and more effort has been devoted to detecting and adjusting for artificial shifts in climate data series (e.g., Vincent et al. 2002; Menne and Williams 2005; DeGaetano 2006; Wang 2006; Wan et al. 2007; among others), and several statistical methods have been proposed for detecting “undocumented shifts” (e.g., Lund and Reeves 2002; Wang 2003, 2008; Caussinus and Mestre 2004; Wang et al. 2007). There have also emerged several studies that review and compare different methods for testing artificial shifts (e.g., Easterling and Peterson 1995; Peterson et al. 1998; DeGaetano 2006; Reeves et al. 2007).

Note that the “undocumented shifts” here refer to shifts that are statistically significant *even without* metadata support. Some of them may turn out to be documented, that is, supported by metadata, while some are just truly undocumented (i.e., have no metadata support). For better clarity, they are hereafter referred to as type-1 shifts, while type-0 shifts refer to shifts that would be statistically significant *only if* they are supported by metadata. The criteria for testing type-1 and type-0 are different (see section 3).

Recently, Wang et al. (2007) tackled the issue of uneven distribution of false-alarm rate (FAR) of the maximal *t*-type tests [equivalence of the standard normal homogeneity (SNH) test; Alexandersson 1986]. They proposed a penalized maximal *t* (PMT) test for mean shifts in time series of zero trend, using an empirically constructed penalty function to even out the U-shaped FAR distribution over the relative position in the time series of the point being tested. They also showed that the PMT test has significantly higher power of detection than the SNH-type tests, while ensuring the FAR close to the nominal level no matter where the point being tested is located in the time series.

Similarly, Wang (2008) proposed a penalized maximal *F* (PMF) test for mean shifts in time series without trend change, using an empirically constructed penalty function to even out the W-shaped FAR distribution of the common trend two-phase regression based test (TPR3; Wang 2003). It has been shown that the PMF test also has higher power of detection than its unpenalized version TPR3, while ensuring the FAR close to the nominal level for all points being tested.

Note that, like most of the existing changepoint detection techniques, the PMT and PMF tests are designed for time series of identically and independently distributed (IID) Gaussian errors. However, autocorrelation is inherent in most climate data series, especially consecutive monthly series or series of finer time scales (e.g., daily), which changes the effective sample size of the data series and hence the FAR and detection power of a homogeneity test. The use of good reference series *usually* cannot diminish autocorrelation (the memory in the noise component of the climate system), although it could greatly diminish the climate signal (i.e., the deterministic components such as linear trend and periodicity in the mean) in the data series being tested. [Note that estimates of either linear trend or autocorrelation could be biased if the trend and autocorrelation are not modeled in tandem, for the two components are often confounded (e.g., see von Storch and Navarra 1995). The cross correlation between the base and reference series is usually calculated from the base and reference series in which their linear trend and periodic mean components are not removed; thus it is usually dominated by the similarity between the deterministic components in these series. Such an estimate of correlation shall be distinguished from the memory in the noise component that is discussed above.] Recently, Lund et al. (2007) showed the effects of autocorrelation in changepoint detection and modified the TPR3 model of Wang (2003) to account for the inherent autocorrelation and periodicity in climate data series. Their new model is theoretically sound and appealing. However, before a quicker algorithm is developed for the inevitable estimation of the critical value (or *p* value) for each individual series (or each segment of it) being tested, their algorithm is too computation demanding to be used in practice. This is because the distribution of the *F*_{max} statistic in Lund et al. (2007) depends on the values of the time series parameters, so that a critical value (or a *p* value) has to be estimated through simulations for each individual case. For a 50-yr monthly series, before a quicker algorithm is developed, it would take at least one day of CPU to obtain a 95th percentile that is accurate to one decimal place; and this percentile is only good for detecting a single changepoint in the series. After a changepoint is identified, two more percentiles need to be estimated for testing the homogeneity of each of the two segments of the series (before and after the first changepoint found), which usually have different sample sizes. Thus, testing the homogeneity of a 50-yr monthly series (*N* = 600) could take days of CPU time using the new algorithm of Lund et al. (2007). In addition, the problem of uneven FAR distribution remains to be resolved for the algorithm of Lund et al. (2007).

The present study aims to account for autocorrelation in changepoint detection using the PMT and/or PMF test, proposing a practically feasible empirical approach. The proposed algorithms can also handle time series of multiple changepoints.

The remainder of the paper is structured as follows. The effects of lag-1 autocorrelation on the PMT and PMF test statistics are assessed in section 2, with algorithms to account for such effects being described in section 3. The power of detection of the new algorithms is assessed in section 4, with examples of application to climate data series being presented in section 5. Section 6 completes the study with a few concluding remarks.

## 2. Effects of autocorrelation on the PMT and PMF tests

### a. Effects of autocorrelation on the PMT test statistic

The PMT test (Wang et al. 2007) is constructed for detecting mean shifts in time series of zero-trend and IID Gaussian errors. Existence of positive autocorrelation in the time series being tested would increase the number of unjustified detections (i.e., false alarms) if its effects are ignored. To show/estimate the effects of lag-1 autocorrelation *ϕ* on the PMT test statistic PT_{max} [see Wang et al. (2007) for details about this statistic], 1 × 10^{7} independent simulations of a homogeneous first-order autoregressive series [i.e., AR(1) series] of length *N* were generated for each choice of *ϕ* value; then the 90th, 95th, and 99th percentiles of PT_{max} were estimated from the 1 × 10^{7} simulations of this statistic. Such simulations were conducted for each combination of the selected *N* and *ϕ* values, for 26 selected values of *N* (ranging from 10 to 1200), and 43 selected values of *ϕ*: {−0.200, −0.150, −0.100, −0.050, 0, 0.025, . . . , 0.950} (i.e., each increment of 0.025 between 0 and 0.950). All the resulting PT_{max} percentiles are stored in a table for practical use (see section 3 below). They are included in our software package RHtestV2 (Wang and Feng 2007; available online at http://cccma.seos.uvic.ca/ETCCDMI/software.shtml).

Note that climate data series usually have positive *ϕ* values. The negative *ϕ* values are included here for accounting the uncertainty in the estimate *ϕ̂*; the lower bound of *ϕ̂* (more details given later in this section) could be negative when the true *ϕ* value is close to zero.

Furthermore, as shown in Fig. 1a, the PT_{max} percentiles also depend on *N*_{min}, the minimum length of segment [i.e., the subseries between two neighboring changepoints or between the first (last) changepoint to the beginning (end) of the series]. Thus, the same *N*_{min} value shall be used both in estimating the percentiles and in the tests that use these percentiles. For example, one should not use the percentiles estimated using *N*_{min} = 1 for the tests in which it is set that *N*_{min} > 1 (e.g., *N*_{min} = 5). In the simulations above, the most probable changepoint is sought out from the candidate points *N*_{min} ≤ *k* ≤ (*N* − *N*_{min}) [instead of 1 ≤ *k* ≤ (*N* − 1)], where *N*_{min} = 5 (this is also used for the PMTred algorithm in the RHtestV2 package). Thus, there are at least *N*_{min} = 5 data values available for estimating one parameter (one sample mean in this setting).

The estimated 95th percentiles of PT_{max} are shown as a function of the autocorrelation *ϕ* for each of the 26 selected values of *N* in Fig. 2, and as a function of series length *N* for each of the 43 selected values of *ϕ* in Fig. 3. Clearly, for a fixed series length *N*, PT_{max} appears to be an exponential-type function of *ϕ*; the larger the *ϕ* values, the more drastic the increase of PT_{max} with an increase in *ϕ* (Fig. 2). With the (95th) percentiles of PT_{max} estimated for each of the 43 *ϕ* values densely spaced in [−0.200, 0.950], one can obtain with good accuracy the PT_{max} percentile for any *ϕ* value of climate data series by interpolating linearly between the pair of percentiles that correspond to the two immediate neighboring *ϕ* values.

As shown in Fig. 3, PT_{max} varies with *N* very slowly when *N* is large (e.g., *N* > 400), but very drastically when *N* is small (especially for the cases of *N* < 40 and *ϕ* < 0.60). (This is because the asymptotic properties of the statistic are met when *N* is large, in which case the estimate of the statistic is stable. This is, however, not the case for small *N.*) Thus, for a fixed value of *ϕ*, the value of PT_{max} for any *N* > 1200 can be obtained with good accuracy by linear extrapolation of the corresponding curves shown in Fig. 3 (in the RHtestV2, we use the rate of change as estimated between *N* = 600 and *N* = 1200 for the linear extrapolation). This feature was taken into account when choosing the 26 values of *N* for the simulations above (i.e., using a small increment when *N* is small, and a large one when *N* is large).

### b. Effects of autocorrelation on the PMF test statistic

The PMF test (Wang 2008) is constructed for detecting mean shifts without trend change in time series of a constant linear trend and IID Gaussian errors. Here, the effects of autocorrelation *ϕ* on the PMF test statistic PF_{max} [see Wang (2008) for details about this statistic] are assessed in a way similar to what is described in section 2a. Ten million (1 × 10^{7}) independent simulations of an AR(1) series of length *N* were generated for each choice of *ϕ* value, and a linear trend (*β* = 0.01) was then added to each of these series; that is, *X _{j}*(

*t*) =

*β*+

_{t}*R*(

_{j}*t*) for all

*j*∈ [1, 1 × 10

^{7}]. These series were then used to obtain 1 × 10

^{7}simulations of the PF

_{max}statistic, which were then used to estimate the desired PF

_{max}percentiles. Such simulations were conducted for each combination of the selected

*N*and

*ϕ*values, for 18 selected values of

*N*(ranging from 20 to 1200), and 44 selected values of

*ϕ*:

*ϕ*= {−0.200, −0.150, −0.100, −0.050, 0, 0.025, . . . , 0.975}. Here, the most probable changepoint is sought out from the candidate points

*N*

_{min}≤

*k*≤ (

*N*−

*N*

_{min}) with

*N*

_{min}= 10 (this is also used for the PMFred algorithm in RHtestV2). Thus, there are also at least

*N*

_{min}= 5 data values available for estimating each parameter. As shown in Fig. 1b, the PF

_{max}percentiles also depend on

*N*

_{min}. Thus, the same

*N*

_{min}value shall also be used both in estimating the percentiles and in the tests that use these percentiles.

All the resulting PF_{max} percentiles are stored in a table for practical use (see section 3 below); they are included in the RHtestV2 package. Having this table of PF_{max} percentiles, one can obtain with good accuracy the PF_{max} percentile for any *ϕ* value of climate data series by interpolating linearly between the pair of percentiles that correspond to the two immediate neighboring *ϕ* values.

The estimated 95th percentiles of PF_{max} are shown as a function of *ϕ* for each selected value of *N* in Fig. 4, and as a function of *N* for each selected value of *ϕ* in Fig. 5. The features of these functions (curves) are very similar to those shown in Figs. 2 –3, except that the PF_{max} values are much greater than the PT_{max} values, and so are the PF_{max} differences between two neighboring *ϕ* values. For a fixed value of *ϕ*, the value of PF_{max} for any *N* > 1200 can also be obtained with good accuracy by linear extrapolation of the corresponding curves shown in Fig. 5.

## 3. Algorithms to account for the autocorrelation effects

Two algorithms are introduced in this section, namely, the PMTred and PMFred algorithms for accounting for lag-1 autocorrelation in detecting mean shifts using the PMT and PMF tests, respectively. Both algorithms are empirical, which utilize the results of percentiles estimated earlier in section 2. Note that the computation-demanding method of Lund et al. (2007) is also an empirical approach in the sense that the theoretical distribution of the test statistic is unknown; in addition, it has a drawback that the percentiles of its test statistic depend on the time series parameters and hence need to be estimated through simulations for each individual series (or each segment of it) being tested. Similarly, all the existing regression based methods for detecting type-1 changepoints in time series of IID Gaussian errors (i.e., Wang et al. 2007; Wang 2008; Lund and Reeves 2002; Alexandersson 1986) are also empirical approaches, although the distribution of the test statistic is independent of the time series parameters in these IID Gaussian cases (and hence a table of percentiles for a typical range of series length *N* can be estimated a priori, which is applicable to all time series of the structure assumed by the method being used).

The new algorithms can handle time series of multiple changepoints; they are designed to detect both type-1 and/or type-0 changepoints in time series of IID or AR(1) Gaussian errors. An AR(1) process is also called a red noise. Thus, the algorithm that accounts for the effects of red noise autocorrelation on the PMT (or PMF) test results is referred to as the PMTred (or PMFred) algorithm. The PMTred and PMFred algorithms are very similar to each other; they are embedded with the same stepwise testing algorithm. Therefore, the PMTred algorithm is described in details below, with the differences between PMTred and PMFred being pointed out in the end of this section.

The initial procedure is summarized in step 1 in Fig. 6. First of all, the first most probable changepoint *c*_{0} is sought out from all candidate points *t* ∈ {*N*_{min}, *N*_{min} + 1, . . . , (*N* − *N*_{min})}. Similarly, the next most probable changepoints *c*_{01} and *c*_{03} are sought out from the segment of series over *t* = {*N*_{min}, . . . , (*c*_{0} − *N*_{min})} and over *t* = {(*c*_{0} + 1 + *N*_{min}), . . . , (*N* − *N*_{min})}, respectively. Then, from the segment between *c*_{01} and *c*_{03}, the most probable changepoint *c*_{02} is identified, which is a new estimate of *c*_{0} (it may or may not equal *c*_{0}). To find the most significant changepoint among these changepoints (*c*_{01}, *c*_{02}, and *c*_{03}), it is necessary to estimate the PT_{max} statistic and *p* value for each of them. For a changepoint *c*, the PT_{max} is calculated from the model fits with and without this changepoint (i.e., full and null model fits) in the series of full-length *N.* To estimate the *p* value of changepoint *c*, first the residual series {*R _{t}*} of the full model fit with changepoints (including

*c*) is used to estimate the lag-1 autocorrelation

*ϕ̂*and to obtain the prewhitened series

Here, *X̂*_{t} denotes the full model fit to the series {*X _{t}*}. The prewhitened series is then used to calculate the Student’s

*t*statistic and its

*p*value for changepoint

*c*. The changepoint that is associated with the maximum PT

_{max}statistic among the three is then identified. If found to be significant at the nominal level (see the next paragraph for the criteria), this point is taken as the very first changepoint identified in series {

*X*} (

_{t}*t*= 1, 2, . . . ,

*N*) and subsequent tests for multiple changepoints proceed (as described later in this section); otherwise, the series {

*X*} is declared homogeneous at the nominal

_{t}*α*level of significance. The above initial procedure is necessary since the estimates of position (and significance) of the first changepoint might be biased by the possible existence of other changepoints in the series. Note that the autocorrelation estimate in procedure 1 is obtained from the residual series of the full model fit with all the three changepoints:

*c*

_{01},

*c*

_{02}, and

*c*

_{03}. This is in consideration that the bias in

*ϕ*estimation will be much lower if one or several insignificant changepoints are accounted for than if a significant one is omitted (this is also the reason for using the full model residual series to estimate

*ϕ*).

The criteria for determining whether or not a changepoint is significant at the nominal level depends on whether one wishes to test for type-1 or type-0 changepoints [see section 1 and Wang et al. (2007)]. When testing only for type-1 changepoints, the PT_{max} value associated with a changepoint *c* is compared with PT_{max,}* _{α}*(

*ϕ̂*,

*N*), that is, the (1 −

*α*)-percentile of PT

_{max}that corresponds to the

*ϕ̂*value as estimated from the residual series of the full model fit with changepoint

*c*in the series of length

*N*. This changepoint is deemed significant at the

*α*level if PT

_{max}> PT

_{max,}

*(*

_{α}*ϕ̂*,

*N*); otherwise, it is deemed insignificant. This is referred to as the PT

_{max,}

*criterion hereinafter. When testing for type-0 changepoints, the*

_{α}*p*value associated with changepoint

*c*, as estimated from the prewhitened series {

*W*}, is compared with the nominal level

_{t}*p*= (1 −

*α*), and this changepoint is deemed significant if its

*p*value is greater than the nominal

*p*. This is referred to as the

*p*value criterion hereinafter. Both criteria are used if one wishes to detect both type-1 and type-0 changepoints.

The following procedure is not necessary if the series is declared homogeneous in procedure 1 above. Otherwise, one changepoint has been identified. Let {*c _{j}*} (

*j*= 1, 2, . . . ,

*N*) denote the list of

_{c}*N*changepoint(s) identified so far. Then,

_{c}*N*= 1 right after procedure 1. As summarized in procedures 2–4 in Fig. 6 (and listed below), tests for additional changepoints are accomplished in three steps involving two major iterative procedures:

_{c}2) Identify the most probable changepoint in each of the (

*N*+ 1) segments of the series. Let_{c}*c*^{n}_{i}denote the most probable changepoint in the*i*th segment. Note that, for any*i*∈ {1, 2, . . . , (*N*+ 1)}, the position estimate of_{c}*c*^{n}_{i}is not affected by any changepoints that exist outside the*i*th segment, which greatly reduces the possibility for the estimate to vary with any change in the list of changepoints identified.3) Estimate the significance of each probable changepoint

*c*^{n}_{i}, using the*ϕ̂*value as estimated from the residual series of the full model fit with (*N*+ 1) changepoints (i.e., the previously identified_{c}*N*changepoints plus_{c}*c*^{n}_{i}) to obtain the corresponding critical value PT_{max,}(_{α}*ϕ̂**, N*) or the prewhitened series {_{i}*W*} for estimating the_{t}*p*value*p*. The fit of the model with the previously identified_{i}*N*changepoints is now the null model fit. (Here, the model with_{c}*N*changepoints is similar to model 2 in the appendix, except that_{c}*β*≡ 0 in the PMTred setting.) During this procedure, the autocorrelation*ϕ̂*is estimated from the full model residual series of full-length*N*(this is true throughout the algorithm), while the PT_{max}is calculated using the*i*th segment only, so its critical value is the one corresponding to*N*, the length of the_{i}*i*th segment. Also, the*t*statistic and its*p*value are estimated from the*i*th segment of the prewhitened series {*W*}._{t}4) Find the most significant changepoint from the candidate list {

*c*^{n}_{i}} [*i*= 1, 2, . . . , (*N*+ 1)] and determine whether it is significant at the nominal level. If it is, add it to the list {_{c}*c*} (_{j}*j*= 1, 2, . . . ,*N*) and delete it from the candidate list {_{c}*c*^{n}_{i}}, and then go to repeat the process from procedure 3. Otherwise, go to repeat from procedure 2, until no more changepoints can be added to the list {*c*}._{j}

The next step here is to reassess the significance of each changepoint *c _{j}* (

*j*= {1, 2, . . . ,

*N*}) in the presence of all other changepoints in the list {

_{c}*c*}. This involves fitting the full model with all the

_{j}*N*changepoints and the null model with (

_{c}*N*− 1) changepoints (excluding

_{c}*c*). Then, we identify the least significant one among the

_{j}*N*changepoints and determine if it is significant at the nominal level. If not, delete it from the list {

_{c}*c*} (thus

_{j}*N*=

_{c}*N*− 1) and repeat the procedure to find the next least significant one in the list. This iteration goes on until all the changepoints in the list {

_{c}*c*} are significant (i.e., no more deletion is possible). This is procedure 5 of this algorithm (see also Fig. 6). During this procedure, the significance of each changepoint is assessed in the presence of all other significant changepoints; it is not biased by the existence of other changepoints in the series. Here, the procedure for estimating the significance of

_{j}*c*is similar to that in procedure 3 above, except that now the full model has the

_{j}*N*changepoints, while the null model has all these changepoints except

_{c}*c*.

_{j}To obtain good approximation to the Gaussian assumption, usually, the difference series {*X _{t}* =

*B*−

_{t}*F*} (

_{t}*t*= 1, 2, . . . ,

*N*), rather than the base series {

*B*}, is to be tested, where {

_{t}*F*} denotes a homogeneous reference series that has the same climate signal (trend and periodic mean components) as the base series. When the reference series {

_{t}*F*} is homogeneous, all changepoints identified for the difference series {

_{t}*X*} can be attributed to the base series. Otherwise, one should attempt to determine whether the identified changepoints are from the base or reference series.

_{t}The last procedure, 6 in Fig. 6, is to obtain the final estimates of significance and magnitude of the detected shifts. Traditionally, the difference series is also used to estimate shift sizes, which is based on the assumption that the reference is perfect; that is, it is homogeneous, free of error, and has exactly the same climate signal as does the base series (so that the difference series has a zero trend). However, these assumptions are not always valid, in which case the estimate of step size may be subject to the effects of data errors in the reference series, in addition to those in the base series (except those that are detected and accounted for), and to the effects of nonzero trend in the difference series as well. Thus, in the PMTred algorithm (and the RHtestV2), the significance of the identified changepoints are estimated using the difference series {*X _{t}*}, but two final estimates of shift size for each shift are provided, one calculated from the difference series, another from fitting a common trend (

*N*+ 1) phase regression model to the deseasonalized base series (details given in the appendix, in which the seasonal means, shift sizes, and linear trend are estimated through iterative processes). The users have the options and are responsible to determine what adjustments to make. In any case, caution shall always be exercised when imposing adjustments to climate data series, especially when there is a significant discrepancy between the two estimates of step size. Such discrepancy can arise from the existence of changepoint(s) in the reference series, or from undetected changepoint(s) in the base series (there is always a probability of

_{c}*α*for such failure). Any undetected changepoint will affect the estimate of step size, no matter the difference or base series is used. Also, it should be pointed out that estimates of step size from the multiphase regression fit to the deseasonalized base series might tend to confound–approximate small shifts with linear trend, and that interannual or decadal oscillations, if they exist, are not accounted for in the multiphase regression. Thus, it might be unrealistic to assume a common linear trend throughout the data record, especially when the data record is long. Detection of trend shifts is discussed later in section 6.

To account for the uncertainty in the autocorrelation estimate, the 95% confidence interval (or uncertainty range) of *ϕ̂*, denoted as [*ϕ̂** ^{L}*,

*ϕ̂*

*], should also be estimated [see 8.2.3 of von Storch and Zwiers (1999) for the estimators], and so should be the corresponding uncertainty range of the PT*

^{U}_{max}percentiles, [PT

_{max,}

*(*

_{α}*ϕ̂*

*,*

^{L}*N*), PT

_{j}_{max,}

*(*

_{α}*ϕ̂*

*,*

^{U}*N*)], or of the

_{j}*p*values [

*p*

^{L}

_{j},

*p*

^{U}

_{j}]. These should be included in the output for inferring the significance of changepoints. Specifically, changepoints of PT

_{max}∈ [PT

_{max,}

_{α}(

*ϕ̂*

^{L},

*N*

_{j}), PT

_{max}

_{,α}(

*ϕ̂*

^{U},

*N*

_{j})] (or

*p*

^{L}

_{j}≤

*p*= (1 −

*α*) ≤

*p*

^{U}

_{j}) should not be declared significant without further scrutinizing (e.g., compare the fits of the model with and without this changepoint to make the decision); while changepoints of PT

_{max}> PT

_{max,α}(

*ϕ̂*

^{U},

*N*

_{j}), or of

*p*

^{L}

_{j}>

*p*= (1 −

*α*) can be declared statistically significant at the nominal level.

If one wishes to detect both type-1 and type-0 changepoints, one should first try to find all significant type-1 changepoints in the series, using the PT_{max,}* _{α}* criterion (this can be done by calling function FindU.wRef in RHtestV2). This is because the criteria for declaring a significant type-1 changepoint is much higher (see e.g., Lund and Reeves 2002), so that the largest shifts, whether or not they are supported by metadata, would be identified first; further tests would work better (less biased) after the largest shifts are accounted for. Then, if the available metadata have been investigated and all documented changes that could cause a shift are known, one can simply add all these documented changepoints to the list of changepoints, if they are not already there, for further assessment (see the next paragraph). Otherwise, detection of type-0 changepoints proceeds in the presence of all these significant type-1 changepoints, using the

*p*value criterion (this can be done by calling function FindUD.wRef in RHtestV2). This is to narrow down the periods for metadata investigation. Often, only a few of these type-0 changepoints are supported by metadata. One should delete all the type-0 changepoints that are not supported by metadata.

After obtaining a list of type-1 and type-0 changepoints, one should reassess the significance and magnitude of each of these changepoints (e.g., by calling function StepSize.wRef in RHtestV2) and delete the least significant one if it is now found to be insignificant. One should repeat such reassessment until all the changepoints retained are deemed significant. Note that this reassessment is necessary, since the significance and magnitude of each changepoint is estimated under the condition of the existence of other changepoints and, thus, any change in the list of changepoints might affect the estimate of significance and magnitude of shifts. For the same reason, one should delete only one insignificant changepoint (the least significant one) at a time, and reestimate the significance and magnitude of each of the remaining changepoints to find the next least significant one; one should not delete all “now insignificant” changepoints at once, because some of these might be significant after the least significant one is deleted.

For the PMFred algorithm, PF_{max} and PF_{max,}* _{α}* are used in place of PT

_{max}and PT

_{max,}

*respectively. The series being tested is denoted as {*

_{α,}*Y*}; so that the residual series for the fit

_{t}*Ŷ*

_{t}is

*R*

_{t}=

*Y*

_{t}−

*Ŷ*

_{t}for

*t*= 1, 2, . . . ,

*N*. Of course, the final estimates of significance and magnitude of shift are obtained only from the common trend multiphase regression fit to the deseasonalized base series, since a reference series is not used in this case (unless the base series is a difference series). The RHtestV2 package includes functions FindU, FindUD, and StepSize for implementing the PMFred algorithm, which correspond respectively to the FindU. wRef, FindUD.wRef, and StepSize.wRef functions for implementing the PMTred algorithm. Readers are referred to Wang and Feng (2007) for details about implementing the PMTred and PMFred algorithms, and for using the RHtestV2 software package. Note that the PMFred algorithm can also be applied to a base-minus-reference series when it is suspected that there might be a difference in the linear trend components between the base and reference series.

## 4. Detection power of the PMTred and PMFred algorithms

### a. Detection power of the PMTred algorithm

First, in order to assess the FAR distribution shape of the PMTred algorithm, [(*N* − 2) × (*N*_{min} + 1)] × 10 000 independent simulations of homogeneous AR(1) time series with *ϕ* = 0.1925 are generated for *N* = 600. Then, at the nominal level *α* = 0.05, the PMTred is applied to each of these homogeneous series, and the FAR distribution is estimated and shown in Fig. 7a (in comparison with that of the SNH and PMT tests). Clearly, the FAR distribution of PMTred is also basically evenly distributed over the relative positions in the time series of the point being tested; its shape is very similar to that of the PMT test. Note that the FAR values of PMTred shown in Fig. 7a were estimated without considering the uncertainty in the estimate *ϕ̂*, which slightly increases the FAR values (from 5% to nearly 6%, on average). However, it is advised in this paper (and in the RHtestV2 user manual) to take into account such uncertainty subjectively, that is, to subjectively determine whether or not to take a changepoint as significant when the corresponding test statistic lies within its uncertainty range. This should alleviate the problem of slightly biased-high value of FAR in the PMTred algorithm.

Second, 10 000 independent simulations of homogeneous AR(1) series are also generated for each of the following three combinations of autocorrelation *ϕ* and series length *N*: 1) *ϕ* = 0 and *N* = 600, 2) *ϕ* = 0.1925 and *N* = 600, and 3) *ϕ* = 0.1366 and *N* = 1200. With *α* = 0.05, the PMTred algorithm is then applied to each of these homogeneous series. For each combination of *ϕ* and *N*, the FAR is estimated as the ratio of the total number of changepoints found to be significant to the total number of simulations (10 000 here). The FAR uncertainty range [FAR* _{L}*, FAR

*] that corresponds to the 95% uncertainty range of the estimate*

_{U}*ϕ̂*is also estimated similarly [by using the

*ϕ̂*

*and*

_{L}*ϕ̂*

*to look up the*

_{U}*p*percentiles PT

_{max,}

*(*

_{α}*ϕ̂*

*,*

^{L}*N*) and PT

_{max,}

*(*

_{α}*ϕ̂*

*,*

^{U}*N*)]. These results are presented in Tables 1a–c.

Clearly, the FAR of PMTred is very close to the nominal level in each case. For time series of IID Gaussian errors (i.e., *ϕ* = 0), both the PMTred and PMT tests perform equally well (cf. FAR_{0} and FAR in Table 1a). However, if a positive autocorrelation is ignored (i.e., if one applies PMT, rather than PMTred, to positively autocorrelated series), the FAR could be much higher than the nominal level (cf. FAR_{0} and FAR in Tables 1b, c). The longer the time series being tested, the smaller the FAR uncertainty range of PMTred.

Further, in order to analyze the detection power of PMTred, one or more mean shifts are inserted at randomly picked positions in each of the 10 000 AR(1) series, with the sign of the shifts (and the order of the shift sizes in case of multiple shifts) also randomly assigned. Several combinations of the changepoint number and shift magnitudes are tried out (see Table 1). For each combination, the PMTred is again applied, at *α* = 0.05 level, to each of the 10 000 series with changepoints (*N _{c}* ≥ 1). Then, the hit rate (HR) is estimated as the ratio of the total number of hits to the true total number of changepoints (

*N*× 10 000 here), with a hit being counted when the changepoint identified is of the nominal significance level

_{c}*and*at a position within the interval [(

*K*− 18), (

*K*+ 18)], where

*K*is the true changepoint position. (Note that the above interval for monthly series is equivalent to the interval [(

*K*− 1), (

*K*+ 1)] for annual series.) Such a hit rate is a strict measure of power of accurate detection. The HR uncertainty range [HR

*, HR*

_{L}*], which corresponds to the 95% uncertainty range of the estimate*

_{U}*ϕ̂*, is also estimated similarly, using PT

_{max,α}(

*ϕ̂*

^{L},

*N*

_{i}) or PT

_{max,α}(

*ϕ̂*

^{U},

*N*

_{i}) in place of PT

_{max,α}(

*ϕ̂*,

*N*

_{i}) (where

*N*is the length of the segment being tested). The results are also presented in Table 1.

_{i}As can be inferred clearly from Table 1, the hit rate of PMTred increases as the relative shift-size Δ/*σ* increases (cf. Tables 1d, e); it also increases as the average length of segment (i.e., *N*_{seg} = *N/N _{c}*) increases (cf. first rows in Tables 1f, g or 1h, j).

Apparently, the PMTred algorithm performs well in terms of both FAR and hit rate. It works well in time series of IID or AR(1) Gaussian errors, with single or multiple changepoints (see Table 1). Further, in terms of computation requirement, it has a tremendous improvement over the algorithm of Lund et al. (2007). For a time series of length *N* = 600 and containing 5 changepoints, the set of FORTRAN codes in our software package takes about 0.015 s (our set of R codes takes about 1 s) of CPU time to finish all the necessary computation (to provide all the estimates; including adjusted series) on a standard computer.

### b. Detection power of the PMFred algorithm

The detection power aspects of the PMFred algorithm are assessed in a way very similar to what is described in section 4a above, except that a linear trend (*β* = 0.01) is added to each of the simulated ten million homogeneous AR(1) series in each case (hence the fit *Ŷ*_{t} includes a linear trend estimate), and that the PF_{max}, PF_{max,α}(*ϕ̂*^{L}, *N*), and PF_{max,α}(*ϕ̂*^{U}, *N*) are used in place of PT_{max}, PT_{max,α}(*ϕ̂*^{L}, *N*), and PT_{max,α}(*ϕ̂*^{U}, *N*), respectively. The simulation results are summarized in Fig. 7b and Table 2.

In general, the power aspects of PMFred are very similar to those of PMTred (see Tables 1 and 2). The FAR of PMFred is also basically evenly distributed and very close to the nominal level in each case (see Fig. 7b and Tables 2a–c); and it is very similar to the FAR of the PMF test when *ϕ* = 0 (cf. FAR_{0} and FAR in Table 2a). The PMFred and PMF tests share very similar FAR values, except for the first and last 20 data points, for which the PMFred has slightly higher FAR values (see Fig. 7b). This bias also has to do with the uncertainty in *ϕ̂* that is not taking into account in the simulations; it can be alleviated by considering the uncertainty subjectively, as suggested in the PMFred algorithm (see section 3). Further, ignoring a positive autocorrelation in the PMFred setting could also result in a much higher than specified FAR (cf. FAR_{0} with FAR in Tables 2b, c). The longer the time series being tested, the smaller the FAR uncertainty range of PMFred.

The hit rate of the PMFred algorithm also increases as the relative shift-size Δ/*σ* increases (cf. Tables 2d, e); it also increases as the average length of segment increases (e.g., compare first rows in Tables 2f, g).

In general, the PMFred algorithm performs well in terms of both FAR and hit rate. It works well in time series of IID or AR(1) Gaussian errors, with single or multiple changepoints (see Table 2).

The advantage of the PMFred algorithm is that it is much more straightforward than the PMTred algorithm because it works without a reference series. Thus, it can be used to roughly check the homogeneity of the reference series to be used, to help determine whether a changepoint detected from the difference series arises from the base or reference series. However, the common trend model tends to confound–approximate small shifts with linear trend, and that interannual or decadal oscillations, if they exist, are not accounted for in the multiphase regression. These should be kept in mind when analyzing the results.

## 5. Examples of application

Both PMTred and PMFred algorithms assume that the time series being tested has IID or AR(1) Gaussian errors, and that the mean-shift magnitude does not vary with season or with synoptic regimes. The PMTred algorithm also assumes that the time series being tested has a zero trend, while the PMFred algorithm assumes a constant linear trend throughout the series. Both are for detecting mean shifts only, not for detecting trend changes. The algorithms can be applied to any climate variable for which the above assumptions are valid. Some data transformation (e.g., logarithm transformation of precipitation data) might be used to obtain a better approximation to the Gaussian assumption. Examples of application to surface pressure and wind speed data series are presented below.

### a. Examples of applying the PMTred algorithm

As the first example, the PMTred algorithm is applied to the series of monthly mean station pressures recorded at Burgeo (Newfoundland, Canada) for the 28-yr period from January 1967 to December 1994, with the homogeneous series of monthly mean station pressures recorded at Yarmouth Airport (Nova Scotia, Canada) being used as a reference series. This is exactly the data used by Wang et al. (2007) as an example of application of the PMT and SNH tests. The Burgeo series was also used as an example in Lund et al. (2007). Thus, the PMTred algorithm can be compared with the PMT test and the algorithm of Lund et al. (2007).

The metadata indicate that the Burgeo series contains a mean shift (a sudden drop) at the end of December 1976, which arises from neglecting the station elevation of 10.6 m in the calculation of station pressure from barometer readings before January 1977 (Wan et al. 2007).

As shown in Fig. 8, the PMTred identifies a changepoint in October (i.e., between October and November) 1976 in the Burgeo − Yarmouth difference series (see Fig. 8a). Note that the PMTred did not identify this shift to be a significant type-1 changepoint; it identified it as a significant type-0 changepoint, after identifying August 1984 as a significant type-1 changepoint (which, however, turns out to be not significant in the presence of the changepoint in October 1976). In the end, however, the PMTred identifies this changepoint quite accurately (only 2 months earlier) and as a significant (at 5% level) type-1 changepoint, since its PT_{max} = 4.66 is above the 95% uncertainty range of PT_{max,0.05}: [2.98, 3.66]. As a documented changepoint, this changepoint is highly significant statistically (*p* > 0.999 95). The shift size is estimated to be Δ̂* _{d}* = −0.996 hPa from the difference series and Δ̂ = −2.878 hPa from the multiphase regression fit to the deseasonalized base series alone (see Figs. 8a–c).

After accounting for this shift, a lag-1 autocorrelation of 0.030 [−0.079, 0.138] (*p* = 0.7033) is estimated for the difference series; while the deseasonalized base (Burgeo) series is estimated to have a lag-1 autocorrelation of 0.024 [−0.083, 0.131] (*p* = 0.6708). Neither is significant. The shift-adjusted pressure series (adjusted to the most recent segment) is shown in Fig. 8d; the linear trend estimate is *β̂* = 0.001 668 hPa month^{−1} with the adjustment Δ̂* _{d}* = −0.996 hPa, and

*β̂*= 0.009 315 hPa month

^{−1}with the adjustment Δ̂ = −2.878 hPa. For comparison, the algorithm of Lund et al. (2007) identifies the changepoint to be August 1976 (i.e., four months earlier) with Δ̂ = −2.893 hPa.

As an example of application in the multiple-changepoints setting, monthly mean surface wind speeds recorded at Nanaimo Airport (British Columbia, Canada) for the 51-yr period from January 1954 to December 2004 is analyzed. The geostrophic wind speed series that is derived from homogenized hourly surface pressure data from a pressure triangle in the region and is best correlated with the Nanaimo surface wind speed series is used as a reference series. We noticed that the periodic mean component is much larger in the geostrophic winds than in the surface winds. In this case, the periodic mean component has to be removed from both series before the difference series is derived (otherwise, a strong seasonality would remain in the difference series, violating the assumptions for this series). Here, the sample means for the 12 calendar months are used to represent the estimate of the periodic mean component for each series. For the base series, this is a rough estimate, because any shift in this series is not accounted for here; however, it has little effect on the test results because the same periodic mean component is removed throughout the whole series. This procedure is also included in the RHtestV2. In other words, both the base and reference series are always deseasonalized before being used to derive the difference series (more about this in the end of this section).

Note that monthly mean wind speed (nonnegative) data are not expected to have a Gaussian distribution. However, the non-Gaussian behavior can be greatly diminished by using the reference series in the PMTred setting or by testing the deseasonalized series with the PMFred algorithm.

As summarized in Table 3, for the wind speed difference series, the PMTred algorithm first identifies 10 type-1 changepoints, with 9 of them being significant *even without* metadata support; the one in September 1995 may or may not be a significant type 1, because its PT_{max} value lies within the 95% confidence interval of its 95th percentile (see Table 3). In addition, the FindUD.wRef function also identifies 18 type-0 changepoints. Investigation of the Nanaimo metadata reveals that 8 of the type-1 changepoints and 2 of the type-0 changepoints can be related to documented changes, with the detected dates of change being generally very close to the documented dates of change (see Table 3). Note that the station inspection reports are missing from July 1994 to April 1997. Thus, no metadata are available to support the two changepoints detected in this period (the last two in Table 3). It was determined that the changepoint in September 1995 would not be taken as a significant one, after comparing the fits with and without this changepoint. The final fits with the 11 changepoints (i.e., all except September 1995 in Table 3) are shown in Figs. 9a–d and summarized in Table 4. All of these changepoints eventually turn out to be significant at least at 5% level even without metadata support. The estimates of shift size that are obtained from the difference series only slightly differ from those that are obtained from the deseasonalized base series, and so do the linear trend estimates (see Table 4 and Fig. 9c). If all the shifts were ignored, the linear trend estimate would be *β̂* = 0.002 232, which is of the opposite sign.

After accounting for the 11 shifts, the difference series is estimated to have an autocorrelation of 0.029 [−0.051, 0.108], which is not significant (*p* = 0.7607); while the deseasonalized base series is estimated to have an autocorrelation of 0.204 [0.127, 0.279], which is highly significant (*p* > 0.999 95).

The quantile–quantile (QQ) plots (not shown) of the prewhitened residuals of the PMTred full model fit to each of the two difference series (shown in Figs. 8a and 9a) reveal that the prewhitened error term in each of these series does approximate well to an IID Gaussian distribution. Thus, the normality assumption in PMTred is valid for these series.

To show the effect of the quality of reference series on the results of homogeneity tests, we replace the geostrophic wind reference (GeoRef) series with a reference series of mean surface wind speeds calculated from averaging the wind speeds recorded at 5 other stations in the same province that are most closely correlated with the Nanaimo wind speed series, and redo the application. Although the average wind reference (AvgRef) series was first subject to homogeneity tests and preliminarily homogenized for two significant type-1 shifts (using the PMFred algorithm), the results are not as good as those obtained using the geostrophic wind reference series. The PMTred identifies 9 type-1 changepoints (including two that may or may not significant) and 44 type-0 changepoints in this application. After incorporating the metadata, the list of 13 changepoints identified using this average wind reference series is given in Table 5, and the related fit is shown in Fig. 9e. There are two significant type-1 changepoints in Table 5 that are not present in Table 3; neither is supported by metadata. The two estimates of shift size for either of these changepoints are also notably different, with larger estimates obtained from the difference series (see the dashed and solid trend lines in Fig. 9e). Thus, it can be determined that the third and fourth shifts in Table 5 (and Fig. 9e) are not from the base series. Applying the adjustments to the base series, especially those estimated from the difference series (the dashed line in Fig. 9e), would greatly bias the base series. In general, when the reference series is not homogeneous, applying the adjustments that are estimated from the difference series could impose additional changepoints (those that exist in the reference series) to the base series. In this example, the adjusted difference series is estimated to have a lag-1 autocorrelation of 0.230 [0.154, 0.304], which is even higher than the autocorrelation estimated for the adjusted base series (which is 0.208 [0.131, 0.283] in this case). Furthermore, there are two documented changepoints (in 1980 and 1982) that were identified as significant type-1 changepoints using the geostrophic wind reference are now identified only as type-0, which means that these would now go undetected if there were no metadata available. Clearly, the quality of the average wind reference series is questionable even after it has been adjusted for two large mean shifts.

It should also be pointed out that the two documented changepoints, in 1965 and 1994, were not detected as type 1 when the deseasonalization procedure was not applied to the base and reference series before the difference series was derived. This indicates that the deseasonalization procedure can improve the detection power, because it diminishes the chance for seasonality to remain in the difference series and makes the difference series better approximate the assumed structure [i.e., has zero trend and AR(1) or IID Gaussian errors].

### b. Examples of applying the PMFred algorithm

The Nanaimo wind speed series is used again as an example in this section. Instead of using a reference series, here the deseasonalized monthly wind speed series is tested using the PMFred algorithm, which identifies 5 type-1 changepoints (see Table 6), as well as 7 type-0 changepoints. However, only 4 of the 7 type-0 changepoints are supported by metadata and hence were added to the list of changepoints, obtaining a total of 9 changepoints for further assessment. The changepoint in October 1954 was found to be the smallest and is insignificant in the presence of the other changepoints and hence was deleted from the list. The results for the fit with the 8 changepoints are presented in Table 6.

In comparison of the results obtained by applying PMTred with the geostrophic wind reference (Table 4), the PMFred algorithm failed to identify 3 of the 4 documented changepoints in 1980s, although it is able to identify the majority of the major shifts (see Fig. 9f and Table 6). Such failure would bias the estimates of shift size and linear trend (see Tables 4 and 6); it would also bias high the autocorrelation estimate (now *ϕ̂* = 0.247[0.171, 0.321]).

Note that the PMFred can also identify the two changepoints in 1985 as type-0 changepoints when it is applied to the difference series (i.e., the base series minus the geostrophic wind series), instead of the deseasonalized base series. Thus, as would be expected, the use of good reference series can increase the detection power.

Also, the QQ plots (not shown) of the prewhitened residuals of the PMFred full model fit to the deseasonalized base series reveal that the prewhitened error term does approximate well to an IID Gaussian distribution. Thus, the normality assumption of the PMFred model is valid for this series. Note that failure to identify one or more significant changepoints could result in outlier(s) in a QQ plot.

## 6. Concluding remarks

In this study, a practical empirical approach is proposed to account for lag-1 autocorrelation in detecting single or multiple mean shift changepoints using the PMT and/or PMF test. The detection power of the new PMTred and PMFred algorithms has been analyzed through Monte Carlo simulations. It has been shown that these algorithms work very well and fast in detecting multiple changepoints, in simulation settings and in real climate data series.

The PMTred algorithm is suitable for detecting multiple changepoints in time series of zero-trend and IID or AR(1) Gaussian errors. Therefore, one should attempt to ensure that the time series to be tested by PMTred is not trended, most often by using a homogeneous reference series that has the same climate signal (trend and periodic components) as the base series. The use of reference series can also help to diminish departure from Gaussian distribution (such as precipitation or wind speed data). In any case, one should check if the Gaussian assumption is notably violated when using the PMTred (or the PMFred) algorithm. Transformation of data may be necessary to ensure good approximation to Gaussian distribution.

In reality, however, a good reference series is often not available and/or its homogeneity cannot be assumed; also, the base and reference series might have different trends and hence the base-minus-reference series does not necessarily have zero trend. The PMFred algorithm allows a constant linear trend in the series and thus can be used without a reference series. It is constructed for detecting multiple mean shifts without trend changes in time series of white or red noise. In particular, one can use the PMFred to test the homogeneity of a reference series and adjust it for significant type-1 changepoints when necessary. The homogenized reference series can then be used to do further tests with the PMTred or PMFred algorithm.

Also, a nonlinear trend or climatic trend change can be accounted for by using a reference series that has the same trend or trend change as the base series, so that the base-minus-reference series has a zero or constant trend, to which the PMTred or PMFred algorithm can be applied. In general, the use of good reference series can increase the detection power. However, a good reference series shall be homogeneous and have the same linear trend and long-term oscillatory components as does the base series. When such a good reference series is available, it is recommended to use the PMTred algorithm rather than the PMFred algorithm. When a reference series is used, the PMFred algorithm is necessary only when it is suspected that the base-minus-reference series has a nonzero trend.

Certainly, a different-trend two-phase regression model approach (e.g., TPR4 in Lund and Reeves 2002) should be used to detect trend changes that accompany mean shifts. Note that the TPR4 also has a U-shaped FAR distribution and does not account for autocorrelation. A penalized version of the TPR4 test that also accounts for the effects of red noise autocorrelation is being developed and will be reported in a separate paper.

Note that the current algorithms assume a constant shift for each changepoint, that is, the shift size does not change seasonally or depend on synoptic regimes. However, for climate data, the magnitude of artificial shift could vary seasonally or even depend on the type of synoptic systems. The current PMTred and PMFred algorithms can be modified to provide estimates of seasonally varying shift size (i.e., Δ̂_{υ}). In this case, the sample size for estimating Δ* _{υ}* is much smaller than the one for a constant shift Δ (e.g., only one-twelfth for monthly Δ̂

_{υ}); the accuracy of the estimates Δ̂

_{υ}is thus compromised. However, the subject of seasonally varying shift size is beyond the scope of the present study. Readers are referred to Auer et al. (2007) for seasonal corrections, and to Della-Marta and Wanner (2006) and Trewin and Trevitt (1996) for an algorithm that deals with synoptic regime dependent shift size.

Further, some climate data series may have longer memory. Detecting changepoints in time series of longer than AR(1) memory is beyond the scope of the present study; it should be a subject for future research.

Last, no matter what statistical tests or algorithms are used, it is always necessary to check the veracity of changepoints identified statistically. This is because statistical tests work with a preset level of significance/confidence; a preset false-alarm rate is inherent. Caution should always be excised when adjusting climate data series for artificial changes.

## Acknowledgments

The author thanks Yang Feng and Hui Wan for their helpful comments obtained from testing use of the FORTRAN codes and Yang Feng for creating the equivalent R codes for implementing these algorithms. Hui Wan is also acknowledged for her help in preparing the example data series and some metadata information. Xuebin Zhang, Yang Feng, and Hui Wan are also acknowledged for their helpful internal review comments on an earlier version of this manuscript. The author also thanks two anonymous reviewers and the editor/author DeGaetano for their helpful review comments.

## REFERENCES

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**.**

**,**

**,**

**,**

### APPENDIX

#### Multiphase Regression Modeling of Periodic and Autocorrelated Series

Let {*B _{t}*} denote a time series of “seasonal” means

*E*,

_{υ}*N*mean shifts at times {

_{c}*c*} (

_{j}*j*= 1, 2, . . . ,

*N*

_{c}; c_{0}= 0 and

*c*

_{Nc + 1}=

*N*), common linear trend

*β*, and AR(1) errors of autocorrelation

*ϕ*. Here,

*υ*is a general seasonal index; it could be season or month or day, etc. In this case, the deseasonalized series {

*Y*=

_{t}*B*−

_{t}*E*} can be represented by a common trend (

_{υ}*N*+ 1)-phase regression model (Wang 2006, 2003):

_{c}where {*ε _{t}*} denotes an AR(1) process/series of autocorrelation

*ϕ*and unknown variance.

To obtain coherent estimates of *E _{υ}*, regression parameters

*β*and

*μ*(

_{j}*j*= 0, 1, . . . ,

*N*), and the lag-1 autocorrelation

_{c}*ϕ*, the following steps are taken:

Obtain a rough estimate of

*E*(denoted as_{υ}*Ê*^{0}_{υ}) without taking the mean shifts into account, that is, the sample mean for each season*υ*as calculated from the base series {*B*}._{t}Obtain the deseasonalized base series {

*Y*_{t}=*B*_{t}−*Ê*^{0}_{υ}}. Then, estimate the (*N*+ 1)-phase regression parameters_{c}*β*and*μ*(_{j}*j*= 0, 1, . . . ,*N*) for the series{_{c}*Y*}, and use the resulting residual series to estimate the lag-1 autocorrelation_{t}*ϕ*. Denote these estimates as*β̂*^{0},*μ̂*^{0}_{j}, and*ϕ̂*^{0}.Obtain the adjusted base series {

*A*=_{t}*B*−_{t}*μ̂*^{0}_{j}} for*t*=*c*+ 1, . . . ,_{j}*c*_{j}_{+1}and*j*= 0, 1, . . . ,*N*(_{c}*c*_{Nc+1}=*N*), as well as the prewhitened series {*Q*= [_{t}*A*−_{t}*ϕ̂*^{0}*A*_{t}_{−1}]/(1 −*ϕ̂*^{0})} (which has the same trend*β*as the series{*A*} or {_{t}*Y*}; see Wang and Swail 2001)._{t}Estimate trend

*β*from the prewhitened series {*Q*} and use this_{t}*β̂*value to obtain a detrended series {*D*=_{t}*Y*−_{t}*β̂*}. Then, for each_{t}*j*∈ {0, 1, . . . ,*N*}, calculate the sample mean of the segment from_{c}*t*=*c*+ 1 to_{j}*t*=*c*_{j+}_{1}of the detrended series {*D*} which is a set of new estimates_{t}*μ̂*_{j}(*j*= 0, 1, . . . ,*N*)._{c}Obtain a new estimate of

*E*(denoted as_{υ}*Ê*_{υ}) from the new adjusted series {*A*=_{t}*B*−_{t}*μ̂*_{j}}, and find the largest absolute difference*D*between_{E}*Ê*_{υ}and*Ê*^{0}_{υ}among all seasons. If*D*is greater than a preset tolerance (e.g.,_{E}*σ*/1000 in our software package, where_{e}*σ*denotes the standard deviation of_{E}*Ê*_{υ}), set*Ê*^{0}_{υ}=*Ê*_{υ}, and go to repeat from procedure 2. Otherwise, proceed to procedure 6.The estimates

*Ê*_{v},*μ̂*_{j},*β̂*, and*ϕ̂*are taken as the final estimates. The final estimate of the magnitude of the*j*th mean shift is Δ̂_{j}= (*μ̂*_{j}−*μ̂*_{j−1}) (*j*= 1, . . . ,*N*)._{c}

## Footnotes

*Corresponding author address:* Dr. Xiaolan L. Wang, Climate Research Division, Atmospheric Science and Technology Directorate, Environment Canada, 4905 Dufferin Street, Toronto, ON M3H 5T4, Canada. Email: xiaolan.wang@ec.gc.ca