Multiple Changepoint Detection Using Metadata

Yingbo Li, Department of Mathematical Sciences, Clemson University, Clemson, South Carolina

and
Robert Lund, Department of Mathematical Sciences, Clemson University, Clemson, South Carolina


Abstract

This paper examines multiple changepoint detection procedures that use station history (metadata) information. Metadata records are available for some climate time series; however, these records are notoriously incomplete, and many station moves and gauge changes are unlisted (undocumented). A shift in a series must be comparatively larger to be declared a changepoint at an undocumented time than at a documented one. Also, the statistical methods for the documented and undocumented scenarios differ radically: a simple t test adequately detects a single mean shift at a documented changepoint time, while a t_max distribution is appropriate for a single undocumented changepoint analysis. Here, the multiple changepoint detection problem is considered via a Bayesian approach, with the metadata record being used to formulate a prior distribution of the changepoint numbers and their location times. This prior distribution is combined with the data to obtain a posterior distribution of changepoint numbers and location times. Estimates of the most likely number of changepoints and times are obtained from the posterior distribution. Simulation studies demonstrate the efficacy of this approach. The methods, which are applicable with or without a reference series, are applied in the analysis of an annual precipitation series from New Bedford, Massachusetts.
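
The contrast between the documented and undocumented single-changepoint cases can be illustrated with a minimal sketch. It is not the Bayesian procedure developed in the paper; it only compares a two-sample t statistic evaluated at one known time with the maximum of that statistic over all candidate times. The function names, the simulated series, and the minimum segment length are assumptions made for illustration.

```python
import math
import random


def t_stat(series, c):
    """Pooled-variance two-sample t statistic for a mean shift after index c.

    The first segment is series[:c]; the second is series[c:].
    """
    left, right = series[:c], series[c:]
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    ss = sum((x - m1) ** 2 for x in left) + sum((x - m2) ** 2 for x in right)
    pooled_var = ss / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def t_max(series, min_seg=2):
    """Largest |t| over all admissible changepoint times, with its location."""
    n = len(series)
    candidates = range(min_seg, n - min_seg + 1)
    best_c = max(candidates, key=lambda c: abs(t_stat(series, c)))
    return abs(t_stat(series, best_c)), best_c


random.seed(0)
# Hypothetical series: 50 values with mean 0 followed by 50 with mean 1.5.
series = ([random.gauss(0.0, 1.0) for _ in range(50)]
          + [random.gauss(1.5, 1.0) for _ in range(50)])

documented_t = abs(t_stat(series, 50))  # time supplied by metadata
scan_t, scan_c = t_max(series)          # time unknown: scan every candidate
```

Because `scan_t` is a maximum over many candidate times, it is never smaller than the statistic at any single time, so its null distribution sits to the right of the ordinary t distribution. That is why a larger shift is needed before a changepoint is declared at an undocumented time.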

Corresponding author address: Yingbo Li, O-110 Martin Hall, Box 340975, Department of Mathematical Sciences, Clemson University, Clemson, SC 29634. E-mail: ybli@clemson.edu
