
A Generic Forecast Verification Framework for Administrative Purposes

  • 1 International Research Institute for Climate and Society, Columbia University, Palisades, New York
  • 2 Federal Office of Meteorology and Climatology, MeteoSwiss, Zurich, Switzerland

Abstract

There are numerous reasons for calculating forecast verification scores, and considerable attention has been given to designing and analyzing the properties of scores that can be used for scientific purposes. Much less attention has been given to scores that may be useful for administrative reasons, such as communicating changes in forecast quality to bureaucrats and providing indications of forecast quality to the general public. The two-alternative forced choice (2AFC) test is proposed as a scoring procedure that is sufficiently generic to be usable on forecasts ranging from simple yes–no forecasts of dichotomous outcomes to forecasts of continuous variables, and that can be used with deterministic or probabilistic forecasts without seriously degrading the more complex information when it is available. Although, as with any single verification score, the proposed test has limitations, it has broad intuitive appeal: the expected score of an unskilled set of forecasts (random guessing or perpetually identical forecasts) is 50%, and the score is interpretable as an indication of how often the forecasts are correct, even when the forecasts are expressed probabilistically and/or the observations are not discrete.
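To make the 50% no-skill benchmark concrete, here is a minimal sketch of the 2AFC score for the simplest probabilistic setting described in the abstract: probabilistic forecasts of a dichotomous outcome. In this case the score is the proportion of (event, non-event) forecast pairs in which the event case received the higher probability, with ties counted as half, a quantity equivalent to the trapezoidal area under the ROC curve. The function name and interface below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def two_afc_binary(probs, obs):
    """2AFC score for probabilistic forecasts of a dichotomous event.

    probs : forecast probabilities of the event occurring
    obs   : 0/1 observations (1 = event occurred)

    Returns the fraction of (event, non-event) pairs in which the
    event case received the higher forecast probability, counting
    ties as half; 0.5 is the expected score for unskilled forecasts.
    """
    probs = np.asarray(probs, dtype=float)
    obs = np.asarray(obs)
    p_event = probs[obs == 1]      # probabilities issued before events
    p_nonevent = probs[obs == 0]   # probabilities issued before non-events
    # Compare every event forecast against every non-event forecast.
    diff = p_event[:, None] - p_nonevent[None, :]
    score = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return score / (p_event.size * p_nonevent.size)

# Example: both event cases outrank all three non-event cases,
# so the score is 6/6 = 1.0 (perfect discrimination).
print(two_afc_binary([0.8, 0.6, 0.7, 0.2, 0.4], [1, 0, 1, 0, 0]))
```

For deterministic yes–no forecasts the same pair-counting logic applies with the probabilities replaced by the forecast categories; the abstract indicates that the construction generalizes to ordinal and continuous forecasts and observations as well.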

Corresponding author address: Dr. Simon J. Mason, International Research Institute for Climate and Society, 61 Route 9W, P.O. Box 1000, Columbia University, Palisades, NY 10964-8000. Email: simon@iri.columbia.edu

