Alternatives to the Chi-Square Test for Evaluating Rank Histograms from Ensemble Forecasts

Kimberly L. Elmore,* Cooperative Institute for Mesoscale Meteorological Studies, University of Oklahoma, Norman, Oklahoma

Abstract

Rank histograms are a commonly used tool for evaluating an ensemble forecasting system’s performance. Because the sample size is finite, the rank histogram is subject to statistical fluctuations, so a goodness-of-fit (GOF) test is employed to determine whether the rank histogram is uniform to within some statistical certainty. Most often, the χ² test is used to test whether the rank histogram is indistinguishable from a discrete uniform distribution. However, the χ² test is insensitive to the order of the histogram bins and so suffers from troubling deficiencies that may render it unsuitable for rank histogram evaluation. As shown by examples in this paper, the order-dependent Cramér–von Mises family of statistics, in particular the Watson and Anderson–Darling statistics, provides tests that are more powerful, suitable for small sample sizes, and very sensitive to the particular deficiencies that appear in rank histograms.

* Additional affiliation: NOAA/National Severe Storms Laboratory, Norman, Oklahoma

Corresponding author address: Dr. Kimberly L. Elmore, NSSL, 1313 Halley Circle, Norman, OK 73069. Email: kim.elmore@noaa.gov
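
The abstract's point about order insensitivity can be illustrated with a small sketch. The rank counts below are made up for illustration and are not data from the paper. The function names are hypothetical, and the cumulative-sum statistic is written in the general form of a discrete Cramér–von Mises W² statistic as an assumption; the exact definitions and critical values should be taken from the paper and from Choulakian et al. (1994).

```python
from itertools import accumulate

def chi_square_stat(counts):
    """Pearson chi-square statistic against a discrete uniform distribution."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def cvm_w2_stat(counts):
    """Cumulative-sum statistic in the style of a discrete Cramer-von Mises W^2.

    Assumed form: W^2 = (1/n) * sum_j Z_j^2 * p_j, where Z_j is the
    cumulative observed count minus the cumulative expected count and
    p_j = 1/k for a uniform null with k bins.
    """
    n = sum(counts)
    k = len(counts)
    expected = n / k
    p = 1.0 / k
    z = [s - expected * (j + 1) for j, s in enumerate(accumulate(counts))]
    return sum(zj ** 2 * p for zj in z) / n

# Hypothetical rank histograms: the same six counts in two different orders.
sloped = [30, 26, 22, 18, 14, 10]    # steady slope, as a biased ensemble might give
shuffled = [18, 30, 10, 26, 14, 22]  # same counts with no systematic trend

for name, hist in (("sloped", sloped), ("shuffled", shuffled)):
    print(name, chi_square_stat(hist), cvm_w2_stat(hist))
```

Because the chi-square statistic sums per-bin deviations independently, it takes the same value for both histograms, whereas the cumulative-sum statistic is much larger for the sloped histogram, where deviations of the same sign accumulate across adjacent ranks.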
