Statistical Considerations for Climate Experiments. Part I: Scalar Tests

F. W. Zwiers, Canadian Climate Centre, Downsview, Ontario, Canada

and
H. J. Thiébaux, Dalhousie University, Halifax, Nova Scotia, Canada


Abstract

Statistical tests used in model intercomparisons or model/climate comparisons may be either “scalar” or “multivariate” tests. The former are employed when testing a hypothesis about a single variable observed at a single location, or through a single derived coefficient. The latter are employed when testing a hypothesis about an entire field, or a set of derived coefficients. In this paper we examine several scalar tests for differences of mean and variance. The tests can be broadly classed as “standard” tests, which operate on samples of time averages, and “time-series”-based tests, which operate on samples of time series. The latter have the potential to be more powerful than standard tests because they use more of the information available in the sample, but they have the disadvantage that they are “asymptotic” tests, meaning that the properties of these tests are only well known in the case of very large samples. The properties of these tests in the case of relatively small samples are examined by means of a series of Monte Carlo experiments which are meant to mimic a broad range of stochastic behavior. It is shown that the actual significance level of time-series-based tests, especially those comparing means, can be considerably different from the nominal significance level. Models are developed which relate the true significance level of these tests to sample size and the stochastic properties of the data, and these models are used to make recommendations for the design of experiments using time-series-based tests.
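
The Monte Carlo idea described above can be illustrated with a minimal sketch. It is not the authors' procedure: the AR(1) data model, the lag-1 "equivalent sample size" adjustment, the normal critical value of 1.96, and all function names (`ar1_series`, `mean_and_var_of_mean`, `rejection_rate`) are assumptions chosen for illustration. The sketch draws pairs of independent series with equal means, applies an asymptotic test for a difference of means, and reports the fraction of false rejections as an estimate of the actual significance level.

```python
import math
import random


def ar1_series(n, phi, rng, burn_in=50):
    """Generate n values of a zero-mean AR(1) process x_t = phi * x_{t-1} + e_t."""
    x = 0.0
    out = []
    for t in range(burn_in + n):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the start-up transient
            out.append(x)
    return out


def mean_and_var_of_mean(series):
    """Return the sample mean and an AR(1)-adjusted estimate of its variance."""
    n = len(series)
    m = sum(series) / n
    dev = [v - m for v in series]
    c0 = sum(d * d for d in dev) / n                              # lag-0 autocovariance
    c1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / n       # lag-1 autocovariance
    r1 = max(min(c1 / c0, 0.99), -0.99)                           # clamped lag-1 autocorrelation
    n_eff = n * (1.0 - r1) / (1.0 + r1)                           # assumed equivalent sample size
    return m, c0 / n_eff


def rejection_rate(n, phi, trials=2000, z_crit=1.96, seed=0):
    """Estimate the actual significance level when the null hypothesis is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        m1, v1 = mean_and_var_of_mean(ar1_series(n, phi, rng))
        m2, v2 = mean_and_var_of_mean(ar1_series(n, phi, rng))
        z = (m1 - m2) / math.sqrt(v1 + v2)
        if abs(z) > z_crit:  # nominal two-sided 5% test
            rejections += 1
    return rejections / trials
```

Calling, for example, `rejection_rate(30, 0.6)` and comparing the result with the nominal 0.05 shows the kind of discrepancy the experiments are designed to measure; varying `n` and `phi` traces how it depends on sample size and on the persistence of the data.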
