Indices of Rank Histogram Flatness and Their Sampling Properties

D. S. Wilks, Department of Earth and Atmospheric Sciences, Cornell University, Ithaca, New York


Abstract

Quantitative evaluation of the flatness of the verification rank histogram can be approached through formal hypothesis testing. Traditionally, the familiar χ2 test has been used for this purpose. Recently, two alternatives—the reliability index (RI) and an entropy statistic (Ω)—have been suggested in the literature. This paper presents approximations to the sampling distributions of these latter two rank histogram flatness metrics, and compares the statistical power of tests based on the three statistics, in a controlled setting. The χ2 test is generally most powerful (i.e., most sensitive to violations of the null hypothesis of rank uniformity), although for overdispersed ensembles and small sample sizes, the test based on the entropy statistic Ω is more powerful. The RI-based test is preferred only for unbiased forecasts with small ensembles and very small sample sizes.

© 2019 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: D. S. Wilks, dsw5@cornell.edu
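
The three statistics compared in the paper are straightforward to compute from the bin counts of a rank histogram. The Python sketch below is a minimal illustration using the definitions conventional in the literature: the Pearson χ2 statistic against uniform expected counts, the reliability index as the sum of absolute deviations of the bin relative frequencies from 1/(m + 1), and the entropy statistic normalized by its maximum value ln(m + 1). The formulas, function name, and example counts are illustrative assumptions rather than an excerpt from the paper, and the p-value shown is only for the χ2 test; deriving approximate sampling distributions for RI and Ω is the paper's contribution.

```python
import numpy as np
from scipy import stats

def rank_histogram_flatness(counts):
    """Flatness statistics for a verification rank histogram.

    counts : sequence of length m + 1 (m = ensemble size), where
             counts[i] is the number of cases whose verifying
             observation fell in rank bin i.
    Returns the chi-square statistic and its p-value, the reliability
    index (RI), and the normalized entropy statistic (Omega).
    """
    counts = np.asarray(counts, dtype=float)
    n_bins = counts.size            # m + 1 rank bins
    n = counts.sum()                # number of forecast cases
    expected = n / n_bins           # expected count per bin under flatness

    # Pearson chi-square statistic; under the null hypothesis of rank
    # uniformity it is approximately chi-square with n_bins - 1 df.
    chi2 = np.sum((counts - expected) ** 2 / expected)
    p_chi2 = stats.chi2.sf(chi2, df=n_bins - 1)

    # Reliability index: sum of absolute departures of the observed
    # bin relative frequencies from the uniform probability 1/(m + 1).
    rel_freq = counts / n
    ri = np.sum(np.abs(rel_freq - 1.0 / n_bins))

    # Entropy statistic, normalized by its maximum value ln(m + 1), so a
    # perfectly flat histogram gives Omega = 1; empty bins contribute zero.
    nonzero = rel_freq[rel_freq > 0.0]
    omega = -np.sum(nonzero * np.log(nonzero)) / np.log(n_bins)

    return chi2, p_chi2, ri, omega


# Example: a U-shaped (underdispersed-looking) histogram for a 9-member
# ensemble (10 rank bins) and 200 forecast cases.
print(rank_histogram_flatness([31, 22, 18, 15, 14, 13, 15, 17, 24, 31]))
```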
