Interpretation of Rank Histograms for Verifying Ensemble Forecasts

Thomas M. Hamill, National Center for Atmospheric Research, Boulder, Colorado*

Abstract

Rank histograms are a tool for evaluating ensemble forecasts. They are useful for determining the reliability of ensemble forecasts and for diagnosing errors in their mean and spread. A rank histogram is generated by repeatedly tallying the rank of the verification (usually an observation) relative to the values from an ensemble sorted from lowest to highest. However, uncritical use of the rank histogram can lead to misinterpretation of the qualities of the ensemble. For example, a flat rank histogram, usually taken as a sign of reliability, can still be generated from unreliable ensembles. Similarly, a U-shaped rank histogram, commonly understood as indicating a lack of variability in the ensemble, can also be a sign of conditional bias. It is also shown that flat rank histograms can be generated for some model variables if the variance of the ensemble is correctly specified; yet if covariances between model grid points are improperly specified, rank histograms for combinations of model variables may not be flat. Further, if imperfect observations are used for verification, the observational errors should be accounted for; otherwise, the shape of the rank histogram may mislead the user about the characteristics of the ensemble. If a statistical hypothesis test is to be performed to determine whether the departures from uniformity of rank are statistically significant, then the samples used to populate the rank histogram must be far enough apart in time and space to be considered independent.
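The tallying procedure described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's code: the function names `rank_histogram` and `chi_square_uniformity`, the array layout (one row per forecast case, one column per ensemble member), the zero-based rank convention, and the random breaking of ties are all assumptions made here for the example.

```python
import numpy as np


def rank_histogram(ensemble, verification, rng=None):
    """Tally the rank of each verification value within its ensemble.

    ensemble:     array of shape (n_cases, n_members)   (assumed layout)
    verification: array of shape (n_cases,)
    Returns an integer array of length n_members + 1, where bin k counts
    the cases in which exactly k members fell below the verification.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    verification = np.asarray(verification, dtype=float)
    n_cases, n_members = ensemble.shape
    rng = np.random.default_rng() if rng is None else rng

    # Number of members strictly below, and exactly equal to, the verification.
    below = (ensemble < verification[:, None]).sum(axis=1)
    ties = (ensemble == verification[:, None]).sum(axis=1)

    # Assumption: ties are broken at random, so the verification is placed
    # uniformly among the tied positions (upper bound is exclusive).
    ranks = below + rng.integers(0, ties + 1)

    return np.bincount(ranks, minlength=n_members + 1)


def chi_square_uniformity(counts):
    """Chi-square statistic for departure of the counts from a flat histogram.

    The statistic is only meaningful for a hypothesis test if the cases
    are independent, as the abstract cautions.
    """
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() / counts.size
    return float(((counts - expected) ** 2 / expected).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cases, n_members = 5000, 10

    # Synthetic example: truth and members drawn from the same distribution.
    truth = rng.normal(size=n_cases)
    consistent = rng.normal(size=(n_cases, n_members))
    print(rank_histogram(consistent, truth, rng))

    # Synthetic example: members with too little spread relative to the truth.
    underdispersive = 0.5 * rng.normal(size=(n_cases, n_members))
    print(rank_histogram(underdispersive, truth, rng))
```

In the synthetic demonstration, the first histogram should be roughly flat and the second should pile up in the two extreme bins. As the abstract warns, neither shape is conclusive on its own: a flat histogram can arise from an unreliable ensemble, and a U shape can reflect conditional bias rather than insufficient spread.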

Corresponding author address: Dr. Thomas M. Hamill, NOAA–CIRES Climate Diagnostics Center, R/CDC1, 325 Broadway, Boulder, CO 80309-0216.

