Some Effects of Finite Sample Size and Persistence on Meteorological Statistics. Part I: Autocorrelations

Kevin E. Trenberth, National Center for Atmospheric Research, Boulder, CO 80307

Abstract

Time series of meteorological variables typically exhibit a pronounced annual cycle and persistence, and samples are of finite size. This paper analyses the impact of these complicating features on certain statistics computed from the time series. The presence of an annual cycle means that statistics are nonstationary unless computed from multiyear samples of limited duration. Persistence leads to a lack of independence among observations. Large-amplitude weather (high-frequency) events induce natural variability at low frequencies, known as climatic noise, that is enhanced by the presence of persistence. This natural variability should be taken into account when estimating population statistics from a finite sample, but generally this has not been done in meteorology.
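
As a rough illustration of how persistence enhances climatic noise, the sketch below evaluates the standard result for the variance of an n-day mean of a stationary series with autocorrelation rho(k): var(mean) = (sigma^2 / n) [1 + 2 sum_{k=1}^{n-1} (1 - k/n) rho(k)]. It assumes a first-order autoregressive model with rho(k) = phi^k. The function name, the 90-day season length, and the phi values are illustrative choices, not taken from the paper.

```python
def mean_variance_inflation(phi: float, n: int) -> float:
    """Ratio of var(n-day mean) for an AR(1) series to the value sigma^2 / n
    that would hold for independent observations.

    phi is the lag-1 autocorrelation (assumed model: rho(k) = phi**k).
    """
    return 1.0 + 2.0 * sum((1.0 - k / n) * phi**k for k in range(1, n))


# Hypothetical 90-day season with increasing day-to-day persistence.
for phi in (0.0, 0.5, 0.8):
    factor = mean_variance_inflation(phi, 90)
    print(f"phi={phi:.1f}  inflation factor={factor:.2f}")
```

A factor well above 1 means that seasonal means differ from year to year far more than independent daily values would suggest, even when the underlying climate is unchanged.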

A number of studies in meteorology have computed statistics from daily data by 1) removing the annual cycle; 2) computing second-moment statistics over each individual season; and 3) averaging the second-moment statistics over all years. This procedure fails to take into account the natural interannual variability that should be present and results in biased estimates of certain statistics. In particular, lagged autocorrelations are systematically negatively biased. It is shown for first-order autoregressive (AR) time series that the autocorrelations computed in this way become negative after just a few days' lag. Consequently, several studies have drawn doubtful conclusions about the stochastic character of meteorological time series, and a few reported results are questionable. A brief discussion of some papers adversely affected by the methodology is given.

The effects of the statistical methodology are illustrated with simulated, and thus known, time series. It is shown that the best possible estimate of autocorrelations for stationary time series is obtained by subtraction of the mean of all available data, rather than subtracting a different mean for each subsample (season) in order to compute the anomalies. Appropriate methods for computing the statistics are discussed.
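
The comparison described above can be sketched with a small simulation. This is a minimal illustration under stated assumptions, not the paper's own experiment: each "season" is an independent 90-day realization of a unit-variance AR(1) process with lag-1 autocorrelation 0.8 and no annual cycle, and the names `simulate_ar1` and `pooled_autocorr`, the seed, and all parameter values are hypothetical.

```python
import random


def simulate_ar1(phi, n, rng):
    """Stationary AR(1) series with unit variance: x[t] = phi*x[t-1] + e[t]."""
    noise_sd = (1.0 - phi * phi) ** 0.5
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, noise_sd))
    return x


def pooled_autocorr(seasons, lag, per_season_mean):
    """Autocorrelation at `lag`, averaging second moments over all seasons.

    per_season_mean=True  subtracts each season's own mean (the biased procedure).
    per_season_mean=False subtracts the mean of all available data.
    """
    n_total = sum(len(s) for s in seasons)
    grand_mean = sum(sum(s) for s in seasons) / n_total
    cov = var = 0.0
    for s in seasons:
        m = sum(s) / len(s) if per_season_mean else grand_mean
        a = [v - m for v in s]
        cov += sum(a[t] * a[t + lag] for t in range(len(a) - lag)) / (len(a) - lag)
        var += sum(v * v for v in a) / len(a)
    return cov / var


rng = random.Random(0)
phi, days, years, lag = 0.8, 90, 400, 20
seasons = [simulate_ar1(phi, days, rng) for _ in range(years)]

r_season = pooled_autocorr(seasons, lag, per_season_mean=True)
r_grand = pooled_autocorr(seasons, lag, per_season_mean=False)
print(f"true rho({lag})       = {phi**lag:.3f}")
print(f"per-season means   = {r_season:.3f}")
print(f"mean of all data   = {r_grand:.3f}")
```

Subtracting each season's own mean removes the year-to-year variability of the seasonal means from the anomalies, which is why the first estimate is expected to fall below the true value at longer lags.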
