Comments on “Detection of Undocumented Changepoints: A Revision of the Two-Phase Regression Model”

Xiaolan L. Wang
Climate Research Branch, Meteorological Service of Canada, Downsview, Ontario, Canada


Corresponding author address: Dr. Xiaolan L. Wang, Climate Research Branch, Meteorological Service of Canada, 4905 Dufferin Street, Downsview, ON M3H 5T4, Canada. Email: xiaolan.wang@ec.gc.ca

Lund and Reeves (2002) revisited the two-phase linear regression test for changepoint detection at an undocumented time. The two-phase regression model is of the following form:
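$$
X_t =
\begin{cases}
\mu_1 + \alpha_1 t + \epsilon_t, & 1 \le t \le c,\\
\mu_2 + \alpha_2 t + \epsilon_t, & c < t \le n,
\end{cases}
\tag{1}
$$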
where {ϵt} are independent zero-mean random errors with constant variance σϵ². This model allows for both step- and trend-type changepoints: the time c is called a changepoint if µ1 ≠ µ2 (step type) and/or α1 ≠ α2 (trend type).
The F statistic for a changepoint at time c ∈ {2, … , n – 1} is
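$$
F_c = \frac{(\mathrm{SSE}_{\mathrm{Red}} - \mathrm{SSE}_{\mathrm{Full}})/2}{\mathrm{SSE}_{\mathrm{Full}}/(n-4)},
\tag{2}
$$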
where SSEFull is the “full model” sum of squared errors:
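$$
\mathrm{SSE}_{\mathrm{Full}} = \sum_{t=1}^{c} (X_t - \hat\mu_1 - \hat\alpha_1 t)^2 + \sum_{t=c+1}^{n} (X_t - \hat\mu_2 - \hat\alpha_2 t)^2,
\tag{3}
$$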
and SSERed is the “reduced model” sum of squared errors:
$$
\mathrm{SSE}_{\mathrm{Red}} = \sum_{t=1}^{n} (X_t - \hat\mu_{\mathrm{Red}} - \hat\alpha_{\mathrm{Red}} t)^2,
\tag{4}
$$
where μ̂Red and α̂Red are estimated under the constraints µ1 = µ2 = µRed and α1 = α2 = αRed. Under the null hypothesis of no changepoint, and assuming Gaussian errors ϵt, Fc has an F distribution with 2 numerator and n − 4 denominator degrees of freedom, denoted F(2, n − 4). An undocumented changepoint is concluded to exist when
$$
F_{\max} = \max_{2 \le c \le n-1} F_c
\tag{5}
$$
is too large to be attributed to chance variation. The most prominent changepoint is estimated as the value of c that maximizes Fc. Lund and Reeves (2002) pointed out errors that had propagated in the related literature and presented the correct percentiles of the Fmax distribution under the null hypothesis of no changepoint. These percentiles are very useful for detecting step- and trend-type changepoints.
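
The following plain-Python sketch illustrates Eqs. (2)–(5). It is not code from Lund and Reeves (2002) or from this comment. The helper names fit_line, f_stat_model1, and f_max_model1 are hypothetical, time is indexed t = 1, …, n, and the toy series was invented for the example: a step of 3 after t = 12, plus a small deterministic wiggle standing in for noise. The sketch scans only c = 2, …, n − 2, because ordinary least squares cannot fit a separate intercept and slope to a one-point segment.

```python
"""Sketch: F_c (Eq. 2) and F_max (Eq. 5) for the two-phase regression model (1)."""


def fit_line(ts, xs):
    """Ordinary least squares for x = mu + alpha * t; returns (mu, alpha, sse)."""
    n = len(ts)
    t_bar = sum(ts) / n
    x_bar = sum(xs) / n
    s_tt = sum((t - t_bar) ** 2 for t in ts)
    s_tx = sum((t - t_bar) * (x - x_bar) for t, x in zip(ts, xs))
    alpha = s_tx / s_tt
    mu = x_bar - alpha * t_bar
    sse = sum((x - mu - alpha * t) ** 2 for t, x in zip(ts, xs))
    return mu, alpha, sse


def f_stat_model1(x, c):
    """F_c of Eq. (2): separate intercept and slope on t = 1..c and t = c+1..n."""
    n = len(x)
    t = list(range(1, n + 1))  # time index t = 1, ..., n
    sse_full = fit_line(t[:c], x[:c])[2] + fit_line(t[c:], x[c:])[2]  # Eq. (3)
    sse_red = fit_line(t, x)[2]  # Eq. (4): one line, mu1 = mu2 and alpha1 = alpha2
    return ((sse_red - sse_full) / 2) / (sse_full / (n - 4))


def f_max_model1(x):
    """Return (F_max, c_hat), scanning c = 2, ..., n - 2 (each segment needs two points)."""
    n = len(x)
    return max((f_stat_model1(x, c), c) for c in range(2, n - 1))


# Hypothetical toy series: a step of 3 after t = 12 plus a small deterministic wiggle.
x = [0.1 * ((7 * t) % 5 - 2) + (3.0 if t > 12 else 0.0) for t in range(1, 25)]
f_max, c_hat = f_max_model1(x)
print(f"F_max = {f_max:.1f} at c = {c_hat}")
```

The resulting Fmax should be compared with the null percentiles of Fmax tabulated by Lund and Reeves (2002), not with an F(2, n − 4) percentile. The F(2, n − 4) distribution applies only to a single, prespecified c.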

However, as mentioned in Lund and Reeves (2002), changes in trend slopes could be rooted in true climate change. Also, a trend-type changepoint could well be just a point between two phases of long quasi-periodic variation (e.g., multidecadal fluctuations). Thus, extra caution should be exercised when using the above changepoint detection technique, especially when a trend-type changepoint is involved. In particular, Eq. (4.2) in Lund and Reeves (2002), which shows how to adjust the time series for the detected changepoint, should be used with caution. If the ultimate goal of the changepoint detection is to form homogeneous climate data series by correcting biases, a trend-type change in climate data series should only be adjusted if there is sufficient evidence showing that it is related to a change at the observing station, such as a change in the exposure or location of the station, or in its instrumentation or observing procedures.

In addition, Lund and Reeves (2002) did not consider explicitly a simpler situation, the one most often encountered when correcting biases in climate observations. This is the case α1 = α2 = α, that is, a two-phase regression model with a common trend α:
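$$
X_t =
\begin{cases}
\mu_1 + \alpha t + \epsilon_t, & 1 \le t \le c,\\
\mu_2 + \alpha t + \epsilon_t, & c < t \le n.
\end{cases}
\tag{6}
$$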
In this case, the F statistic for the changepoint c ∈ {2, … , n – 1} is
$$
F_c = \frac{\mathrm{SSE}_{\mathrm{Red}} - \mathrm{SSE}_{\mathrm{Full}}}{\mathrm{SSE}_{\mathrm{Full}}/(n-3)},
\tag{7}
$$
which is distributed as F(1,n–3) under the null hypothesis of no changepoint and assuming Gaussian errors ϵt. Here,
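$$
\begin{aligned}
\mathrm{SSE}_{\mathrm{Full}} &= \sum_{t=1}^{c} (X_t - \hat\mu_1 - \hat\alpha t)^2 + \sum_{t=c+1}^{n} (X_t - \hat\mu_2 - \hat\alpha t)^2,\\
\mathrm{SSE}_{\mathrm{Red}} &= \sum_{t=1}^{n} (X_t - \hat\mu_{\mathrm{Red}} - \hat\alpha_{\mathrm{Red}} t)^2,
\end{aligned}
\tag{8}
$$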
where μ̂Red and α̂Red are estimated under the constraint µ1 = µ2 = µRed (i.e., Xt = µRed + αRed t + ϵt for 1 ≤ t ≤ n).
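
Under the same caveats, the following is a minimal sketch of Eqs. (7) and (8) for model (6). With separate intercepts and a shared slope, the least squares estimate α̂ is the pooled within-segment slope. Each µ̂i then equals the segment mean of Xt minus α̂ times the segment mean of t. The function names and the toy series (a common slope of 0.05 plus a step of 3 after t = 12) are hypothetical.

```python
"""Sketch: F_c (Eq. 7) and F_max for the common-trend model (6)."""


def sse_common_trend(x, c):
    """SSE_Full of Eq. (8): intercepts mu1 (t <= c) and mu2 (t > c), shared slope alpha."""
    t = list(range(1, len(x) + 1))
    segments = [(t[:c], x[:c]), (t[c:], x[c:])]
    means = [(sum(ts) / len(ts), sum(xs) / len(xs)) for ts, xs in segments]
    s_tt = s_tx = 0.0
    for (ts, xs), (t_bar, x_bar) in zip(segments, means):
        s_tt += sum((ti - t_bar) ** 2 for ti in ts)
        s_tx += sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(ts, xs))
    alpha = s_tx / s_tt  # pooled within-segment slope: the OLS estimate of alpha
    sse = 0.0
    for (ts, xs), (t_bar, x_bar) in zip(segments, means):
        mu = x_bar - alpha * t_bar  # OLS intercept of this segment
        sse += sum((xi - mu - alpha * ti) ** 2 for ti, xi in zip(ts, xs))
    return sse


def sse_single_line(x):
    """SSE_Red of Eq. (8): X_t = mu_Red + alpha_Red * t + eps_t for all t."""
    n = len(x)
    t = range(1, n + 1)
    t_bar, x_bar = (n + 1) / 2, sum(x) / n
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    alpha = s_tx / s_tt
    mu = x_bar - alpha * t_bar
    return sum((xi - mu - alpha * ti) ** 2 for ti, xi in zip(t, x))


def f_stat_model6(x, c):
    """F_c of Eq. (7); F(1, n - 3) under the null only for a single, fixed c."""
    sse_full = sse_common_trend(x, c)
    return (sse_single_line(x) - sse_full) / (sse_full / (len(x) - 3))


def f_max_model6(x):
    """Return (F_max, c_hat) over c = 2, ..., n - 1."""
    return max((f_stat_model6(x, c), c) for c in range(2, len(x)))


# Hypothetical toy series: common slope 0.05, step of 3 after t = 12, small wiggle.
x = [0.05 * t + 0.1 * ((7 * t) % 5 - 2) + (3.0 if t > 12 else 0.0) for t in range(1, 25)]
f_max, c_hat = f_max_model6(x)
print(f"F_max = {f_max:.1f} at c = {c_hat}")
```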

The null-hypothesis percentiles of Fmax for model (6) are given in Table 1. They were obtained by simulating 100 000 Fmax values for each series length n, using a procedure similar to the one described in Lund and Reeves (2002).
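
Below is a Monte Carlo sketch in the spirit of that procedure. It is not the code used to produce Table 1. Under the null hypothesis, Fmax does not depend on µ, α, or σϵ, so the sketch simulates pure N(0, 1) noise. Several choices here are assumptions made to keep the example quick: the names fc_model6 and simulate_fmax_percentiles, the seed, the simple empirical-quantile rule, and the much smaller number of replicates (1000 rather than 100 000). Its percentiles are therefore noisier than those in Table 1.

```python
import random


def fc_model6(x, c):
    """F_c of Eq. (7) for the common-trend model (6)."""
    n = len(x)
    t = range(1, n + 1)

    def centred(idx):
        # (t - segment mean of t, x - segment mean of x) for the given indices
        t_bar = sum(t[i] for i in idx) / len(idx)
        x_bar = sum(x[i] for i in idx) / len(idx)
        return [(t[i] - t_bar, x[i] - x_bar) for i in idx]

    def sse(groups):
        # OLS residual SS with one intercept per group and one slope shared by all groups
        pts = [p for g in groups for p in g]
        s_tt = sum(dt * dt for dt, _ in pts)
        s_tx = sum(dt * dx for dt, dx in pts)
        return sum(dx * dx for _, dx in pts) - s_tx ** 2 / s_tt

    sse_full = sse([centred(range(c)), centred(range(c, n))])  # t <= c and t > c
    sse_red = sse([centred(range(n))])  # a single line for the whole series
    return (sse_red - sse_full) / (sse_full / (n - 3))


def simulate_fmax_percentiles(n, reps, probs, seed=0):
    """Empirical null percentiles of F_max = max of F_c over c = 2, ..., n - 1."""
    rng = random.Random(seed)
    draws = sorted(
        max(fc_model6(x, c) for c in range(2, n))
        for x in ([rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(reps))
    )
    # simple empirical quantile: the sorted value at rank round(p * reps)
    return {p: draws[min(reps - 1, round(p * reps))] for p in probs}


q = simulate_fmax_percentiles(n=20, reps=1000, probs=(0.5, 0.9, 0.95, 0.99))
for p, value in q.items():
    print(f"{100 * p:.0f}th percentile of F_max (n = 20): {value:.2f}")
```

Because Fmax is the maximum over all candidate c, its percentiles exceed the corresponding F(1, n − 3) percentiles. This is why a single-c F table is not appropriate for an undocumented changepoint.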

One advantage of model (6) is that it has one fewer parameter to estimate. Its estimates therefore have smaller sampling variability (in this case the variance of the estimate of α is $\operatorname{var}(\hat\alpha) = \sigma_X^2 / \sum_{t=1}^{n} [t - (n+1)/2]^2$), and it has a higher power of detecting step-type changepoints (i.e., it is more likely to detect a changepoint when one really exists). To illustrate this, for each series length n, we simulated 200 time series of the form Xt = µ + αt + ϵt with arbitrary parameters µ, α, and σϵ. (Under the null hypothesis the Fmax statistic does not depend on the particular values of µ, α, and σϵ.) We then imposed a step ΔX = jσϵ/100 at an arbitrary point c (1 < c < n) in the jth simulated series (j = 1, 2, … , 200), so the imposed steps range from 0.01σϵ to 2σϵ. Both models (6) and (1) were then applied to each of the 200 series to detect the imposed step-type changepoint. The results show that model (6) has a somewhat higher power of detection than model (1). The difference is largest when the series is short (cf. Table 2) or the step-size-to-noise ratio (i.e., ΔX/σϵ) is low (not shown). Each simulated series was drawn from a model with a constant slope, for which model (1) is not suitable, so it is not surprising that its power of detection is lower than that of model (6).
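
The power comparison can be sketched as follows. The steps ΔX = jσϵ/100, j = 1, …, 200, follow the description above. Several other details are not stated in the text and are assumptions, labeled hypothetical in the code:
the trend α is drawn uniformly from [−0.1, 0.1] per time step (with σϵ = 1);
the step position c is drawn uniformly;
a series counts as detected when its Fmax exceeds a 5% critical value simulated from 500 null replicates, and the location of the detected changepoint is not checked;
for simplicity, both models scan c = 2, …, n − 2.
The rates it prints are illustrative only and are not a reproduction of Table 2.

```python
import random


def centred(ts, xs):
    """Pairs (t - mean of t, x - mean of x) for one segment."""
    t_bar, x_bar = sum(ts) / len(ts), sum(xs) / len(xs)
    return [(t - t_bar, x - x_bar) for t, x in zip(ts, xs)]


def residual_ss(groups, shared_slope):
    """OLS residual SS with one intercept per group; slope shared or fitted per group."""
    if shared_slope:
        groups = [[p for g in groups for p in g]]
    total = 0.0
    for g in groups:
        s_tt = sum(dt * dt for dt, _ in g)
        s_tx = sum(dt * dx for dt, dx in g)
        total += sum(dx * dx for _, dx in g) - s_tx ** 2 / s_tt
    return total


def f_max(x, model):
    """F_max of Eq. (5) built from Eq. (7) (model 6) or Eq. (2) (model 1)."""
    n = len(x)
    t = list(range(1, n + 1))
    sse_red = residual_ss([centred(t, x)], shared_slope=True)
    best = 0.0
    for c in range(2, n - 1):  # c = 2..n-2 so each segment can fit its own line
        groups = [centred(t[:c], x[:c]), centred(t[c:], x[c:])]
        if model == 6:
            sse_full = residual_ss(groups, shared_slope=True)
            f = (sse_red - sse_full) / (sse_full / (n - 3))
        else:
            sse_full = residual_ss(groups, shared_slope=False)
            f = ((sse_red - sse_full) / 2) / (sse_full / (n - 4))
        best = max(best, f)
    return best


def detection_rates(n, null_reps=500, seed=1, level=0.05):
    """Fraction of the 200 stepped series in which each model's F_max is significant."""
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Null critical values of F_max, simulated with the same scan range of c
    crit = {}
    for m in (6, 1):
        null = sorted(f_max(noise(), m) for _ in range(null_reps))
        crit[m] = null[min(null_reps - 1, round((1 - level) * null_reps))]
    hits = {6: 0, 1: 0}
    for j in range(1, 201):
        alpha = rng.uniform(-0.1, 0.1)  # hypothetical trend per time step (sigma = 1)
        c = rng.randint(2, n - 2)  # hypothetical step position
        eps = noise()
        x = [alpha * t + eps[t - 1] + (j / 100 if t > c else 0.0) for t in range(1, n + 1)]
        for m in hits:
            if f_max(x, m) > crit[m]:
                hits[m] += 1
    return {m: h / 200 for m, h in hits.items()}


rates = detection_rates(n=20)
print(f"detection rate, model (6): {rates[6]:.3f}; model (1): {rates[1]:.3f}")
```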

In particular, model (6) can better identify a negative step (i.e., µ1 > µ2) in a series with a decreasing trend, or a positive step (i.e., µ1 < µ2) in a series with an increasing trend. Such steps could be “masked” and/or misrepresented as trend-type changes when model (1) is used. These situations were also covered by the power-of-detection simulations described in the previous paragraph, with results shown in Table 2.

Here is a “real world” example. An observer change (from a part-time to a full-time observer) at Gander International Airport (Canada) on 1 April 1997 introduced a step-type changepoint between March and April 1997 in the monthly cloud-cover time series. The full-time observer observes much more frequently within each hour and hence reports more cloudy conditions than the part-time observer, who had other duties in addition to making observations (cf. Fig. 1). Both models (6) and (1) were applied to the time series of monthly counts of occurrences of 1/10 and 9/10 cloud-covered sky conditions (i.e., 1/10 and 9/10 of the sky covered by clouds). The results are shown in Fig. 1. When the step-size-to-noise ratio is high, as for 9/10 cloud cover (Fig. 1, bottom), both models accurately identified the changepoint, although the step and trend sizes estimated by the two models differ. When the step-size-to-noise ratio is lower, as for 1/10 cloud cover (Fig. 1, top), model (6) clearly outperforms model (1). The changepoint identified by model (6) is identical to the actual changepoint. The one identified by model (1), by contrast, is 39 time intervals (months in this case) earlier and involves a notable trend-type change that does not reflect reality.

Having detected a changepoint using model (6), one can further test whether the slopes before and after the changepoint are equal. A standard normal z-score test with $z = (\hat\alpha_1 - \hat\alpha_2)/[\operatorname{var}(\hat\alpha_1) + \operatorname{var}(\hat\alpha_2)]^{1/2}$ would suffice. Again, however, a trend-type change should only be adjusted if there is sufficient evidence that it is not rooted in true climate change or variability.
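
A minimal sketch of this z test follows. The text does not specify how var(α̂1) and var(α̂2) are estimated. Here, as an assumption, each is σ̂²/Σ(t − t̄i)² over its own segment, with σ̂² the pooled residual variance from fitting separate lines on either side of c (n − 4 degrees of freedom). The two-sided p value uses 2[1 − Φ(|z|)] = erfc(|z|/√2). The function names and the toy series (slope 1 up to c = 10, then slope 3, with the changepoint taken as already detected) are hypothetical.

```python
import math


def segment_slope(ts, xs):
    """OLS slope of x on t, plus S_tt = sum (t - t_bar)^2 and the residual SS."""
    t_bar, x_bar = sum(ts) / len(ts), sum(xs) / len(xs)
    s_tt = sum((t - t_bar) ** 2 for t in ts)
    s_tx = sum((t - t_bar) * (x - x_bar) for t, x in zip(ts, xs))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_tx / s_tt, s_tt, s_xx - s_tx ** 2 / s_tt


def slope_change_z(x, c):
    """z = (a1 - a2) / sqrt(var(a1) + var(a2)) and its two-sided normal p value."""
    n = len(x)
    t = list(range(1, n + 1))
    a1, stt1, sse1 = segment_slope(t[:c], x[:c])
    a2, stt2, sse2 = segment_slope(t[c:], x[c:])
    sigma2 = (sse1 + sse2) / (n - 4)  # assumption: pooled residual variance, model (1) fit
    z = (a1 - a2) / math.sqrt(sigma2 / stt1 + sigma2 / stt2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


def wiggle(t):
    """Small deterministic stand-in for noise (hypothetical)."""
    return 0.1 * ((7 * t) % 5 - 2)


# Hypothetical series: slope 1 for t <= 10, then a step and slope 3.
x = [(t if t <= 10 else 12 + 3 * (t - 10)) + wiggle(t) for t in range(1, 21)]
z, p = slope_change_z(x, c=10)
print(f"z = {z:.1f}, two-sided p = {p:.2g}")
```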

Finally, note that both models (6) and (1) can easily accommodate reference series (see Lund and Reeves 2002 for more details).

Acknowledgments

The author thanks Dr. Francis W. Zwiers for his helpful comments on an earlier version of this manuscript. The reviewer Dr. Robert Lund is also acknowledged. The research was partly funded by the Action Plan 2000 science program of the Canadian government.

REFERENCES

Lund, R., and J. Reeves, 2002: Detection of undocumented changepoints: A revision of the two-phase regression model. J. Climate, 15, 2547–2554.

Fig. 1.

Changepoints detected using the step-only model (6) and the step-trend model (1) for (top) 1/10 cloud cover and (bottom) 9/10 cloud cover

Citation: Journal of Climate 16, 20; 10.1175/1520-0442(2003)016<3383:CODOUC>2.0.CO;2

Table 1.

The Fmax percentiles of model (6)

Table 2.

The detection rates (in 1/200) of models (6) and (1) for time series of step-type changepoints (at 5% level of significance)
