H. L. Wagner's Unbiased Hit Rate and the Assessment of Categorical Forecasting Accuracy

Timothy W. Armistead
Armistead Research and Investigative Services, Pinole, California

Abstract

The paper briefly reviews measures that have been proposed since the 1880s to assess accuracy and skill in categorical weather forecasting. The majority of the measures consist of a single expression, for example, a proportion, the difference between two proportions, a ratio, or a coefficient. Two exemplar single-expression measures for 2 × 2 categorical arrays that chronologically bracket the 130-yr history of this effort—Doolittle's inference ratio i and Stephenson's odds ratio skill score (ORSS)—are reviewed in detail. Doolittle's i is appropriately calculated using conditional probabilities, and the ORSS is a valid measure of association, but both measures are limited in ways that variously mirror all single-expression measures for categorical forecasting. The limitations that variously affect such measures include their inability to assess the separate accuracy rates of different forecast–event categories in a matrix, their sensitivity to the interdependence of forecasts in a 2 × 2 matrix, and the inapplicability of many of them to the general k × k (k ≥ 2) problem. The paper demonstrates that Wagner's unbiased hit rate, developed for use in categorical judgment studies with any k × k (k ≥ 2) array, avoids these limitations while extending the dual-measure Bayesian approach proposed by Murphy and Winkler in 1987.
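
The abstract names these measures without stating their formulas. The Python sketch below is a rough illustration only. It is not the author's code and is not taken from the paper, and it computes two of the measures under explicit assumptions. For a 2 × 2 table it assumes a = hits, b = false alarms, c = misses, and d = correct rejections. Under that layout, Stephenson's ORSS is (θ − 1)/(θ + 1) with odds ratio θ = ad/bc, which equals (ad − bc)/(ad + bc) (Yule's Q). For any k × k table it computes Wagner's unbiased hit rate for category i as Hu_i = n_ii² / (n_i+ n_+i). This is the proportion of forecasts of i that verified multiplied by the proportion of observed i events that were forecast. These are the two conditional probabilities, p(x | f) and p(f | x), that appear in Murphy and Winkler's two factorizations of the joint distribution of forecasts f and observations x. The function names, the product-of-marginals chance baseline p_c, the 0.0 convention for empty categories, and the example counts are all assumptions of this sketch.

```python
from typing import List, Sequence


def orss(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio skill score (ORSS) for a 2 x 2 forecast table.

    Assumed layout: a = hits (forecast yes, observed yes),
    b = false alarms (forecast yes, observed no),
    c = misses (forecast no, observed yes),
    d = correct rejections (forecast no, observed no).
    """
    # Odds ratio theta = ad / bc; ORSS = (theta - 1) / (theta + 1),
    # which simplifies to (ad - bc) / (ad + bc).
    # Raises ZeroDivisionError if ad + bc == 0.
    return (a * d - b * c) / (a * d + b * c)


def _marginals(table: Sequence[Sequence[int]]):
    """Return (row totals, column totals, grand total) of a square table."""
    k = len(table)
    rows = [sum(row) for row in table]
    cols = [sum(table[r][j] for r in range(k)) for j in range(k)]
    return rows, cols, sum(rows)


def unbiased_hit_rates(table: Sequence[Sequence[int]]) -> List[float]:
    """Wagner's unbiased hit rate Hu_i for each category of a k x k table.

    Rows are taken as forecast categories and columns as observed
    categories; Hu_i is symmetric in the two, so the transposed layout
    gives the same values.
    """
    rows, cols, _ = _marginals(table)
    rates = []
    for i, (n_row, n_col) in enumerate(zip(rows, cols)):
        hits = table[i][i]
        # Hu_i = (hits / forecasts of i) * (hits / observations of i).
        # A category never forecast or never observed gets 0.0 here,
        # a convention assumed by this sketch.
        rates.append(hits * hits / (n_row * n_col) if n_row and n_col else 0.0)
    return rates


def chance_rates(table: Sequence[Sequence[int]]) -> List[float]:
    """Product-of-marginals chance baseline p_c for each category (assumed)."""
    rows, cols, n = _marginals(table)
    return [r * c / (n * n) for r, c in zip(rows, cols)]


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    two_by_two = [[30, 10], [20, 40]]
    print(round(orss(30, 10, 20, 40), 3))                         # 0.714
    print([round(h, 3) for h in unbiased_hit_rates(two_by_two)])  # [0.45, 0.533]
    print([round(p, 3) for p in chance_rates(two_by_two)])        # [0.2, 0.3]
    three_by_three = [[20, 5, 5], [5, 30, 5], [0, 5, 25]]
    print([round(h, 3) for h in unbiased_hit_rates(three_by_three)])
```

Because Hu_i is computed separately for each category, every forecast category receives its own score. A single-expression measure such as the ORSS summarizes the whole 2 × 2 table in one number, which is the contrast the abstract draws.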

Corresponding author address: Dr. Timothy W. Armistead, Armistead Research and Investigative Services, Ste. 323, 1564-A Fitzgerald Dr., Pinole, CA 94564. E-mail: tarmistead@sbcglobal.net

A comment/reply has been published regarding this article and can be found at http://journals.ametsoc.org/doi/abs/10.1175/WAF-D-14-00004.1 and http://journals.ametsoc.org/doi/abs/10.1175/WAF-D-14-00008.1

References

  • Agresti, A., 1996: An Introduction to Categorical Data Analysis. Wiley-Interscience, 290 pp.

  • Armistead, T. W., 2011: Detecting deception in written statements: The British Home Office study of scientific content analysis (SCAN). Policing, 34, 588–605.

  • Armistead, T. W., 2012: The detection of deception by linguistic means: Unresolved issues of validity, usefulness, and epistemology. Policing, 35, 304–326.

  • Bartlett, M. S., 1935: Contingency table interactions. J. Roy. Stat. Soc., 2 (Suppl. 2), 248–252.

  • Beck, L., and R. S. Feldman, 1989: Enhancing children's decoding of facial expression. J. Nonverbal Behav., 13, 269–278.

  • Bowler, N. E., 2006: Explicitly accounting for observation error in categorical verification of forecasts. Mon. Wea. Rev., 134, 1600–1606.

  • Clayton, H. H., 1889: Verification of weather forecasts. Amer. Meteor. J., 6, 211–219.

  • Clayton, H. H., 1891: Verification of weather forecasts. Amer. Meteor. J., 8, 369–375.

  • Clayton, H. H., 1927: A method of verifying weather forecasts. Bull. Amer. Meteor. Soc., 8, 144–146.

  • Clayton, H. H., 1934: Rating weather forecasts. Bull. Amer. Meteor. Soc., 15, 279–283.

  • Cohen, M. R., and E. Nagel, 1934: An Introduction to Logic and Scientific Method. Harcourt, Brace and Co., 467 pp.

  • Curtis, G. E., 1887: Tornado predictions and their verification. Amer. Meteor. J., 4, 68–74.

  • Davis, M., K. A. Markus, and S. B. Walters, 2006: Judging the credibility of criminal suspect statements: Does the mode of presentation matter? J. Nonverbal Behav., 30, 181–198.

  • Doolittle, M. H., 1884: The verification of predictions. Bull. Philos. Soc. Wash., 7 (3 December 1884), Mathematical Section, 122–127.

  • Doolittle, M. H., 1888: Association ratios. Bull. Philos. Soc. Wash., 10, 83–87 and 94–96.

  • Doswell, C. A., III, and J. A. Flueck, 1989: Forecasting and verifying in a field research project: DOPLIGHT '87. Wea. Forecasting, 4, 97–109.

  • Doswell, C. A., III, R. Davies-Jones, and D. L. Keller, 1990: On summary measures of skill in rare event forecasting based on contingency tables. Wea. Forecasting, 5, 576–585.

  • Elfenbein, H. A., M. K. Mandal, N. Ambady, and S. Harizuka, 2002: Cross-cultural patterns in emotion recognition: Highlighting design and analytical techniques. Emotion, 2, 75–84.

  • Finley, J. P., 1884: Tornado predictions. Amer. Meteor. J., 1, 85–88.

  • Fisher, R. A., 1934: Statistical Methods for Research Workers. Oliver and Boyd, 356 pp.

  • Gilbert, G. F., 1884: Finley's tornado predictions. Amer. Meteor. J., 1, 166–172.

  • Goeleven, E., R. De Raedt, L. Leyman, and B. Verschuere, 2008: The Karolinska directed emotional faces: A validation study. Cognit. Emotion, 22, 1094–1118.

  • Goodman, L. A., and W. H. Kruskal, 1954: Measures of association for cross-classifications, II: Further discussion and references. J. Amer. Stat. Assoc., 54, 123–163.

  • Hawk, S. T., G. A. van Kleef, A. H. Fischer, and J. van der Schalk, 2009: “Worth a thousand words”: Absolute and relative decoding of nonlinguistic affect vocalizations. Emotion, 9, 293–305.

  • Hazen, H. A., 1887: Verification of tornado predictions. Amer. J. Sci., 34, 127–131.

  • Hazen, H. A., 1892: The verification of weather forecasts. Amer. Meteor. J., 8, 392–396.

  • Heidke, P., 1926: Berechnung des Erfolges und der Güte der Windstärkevorhersagen im Sturmwarnungsdienst [Calculation of the success and quality of wind-force forecasts in the storm-warning service]. Geogr. Ann., 8, 301–349.

  • Jolliffe, I. T., and D. B. Stephenson, Eds., 2003: Forecast Verification: A Practitioner's Guide in Atmospheric Science. John Wiley and Sons, 292 pp.

  • Köppen, W., 1893: The best method of testing weather predictions. U.S. Weather Bureau Bull. 11, 29–34.

  • Kraut, R., 1980: Humans as lie detectors: Some second thoughts. J. Commun., 30, 209–218.

  • Marsh, A. A., H. A. Elfenbein, and N. Ambady, 2003: Cultural differences in facial expressions of emotions. Psychol. Sci., 14, 373–376.

  • Mumford, A. A., and M. Young, 1923: The interrelationships of the physical measurements and the vital capacity. Biometrika, 15, 109–133.

  • Murphy, A. H., 1996: The Finley affair: A signal event in the history of forecast verification. Wea. Forecasting, 11, 3–20.

  • Murphy, A. H., and H. Daan, 1985: Forecast evaluation. Probability, Statistics and Decision Making in the Atmospheric Sciences, A. H. Murphy and R. W. Katz, Eds., Westview Press, 379–437.

  • Murphy, A. H., and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338.

  • Newman, M. L., J. W. Pennebaker, D. S. Berry, and J. M. Richards, 2003: Lying words: Predicting deception from linguistic styles. Pers. Soc. Psychol. Bull., 29, 665–675.

  • Niles, H. E., 1922: Correlation, causation, and Wright's theory of “path coefficients.” Genetics, 7, 258–273.

  • Norton, H. W., 1939: Chance in medicine and research. Brit. Med. J. (correspondence), 2(26), 467–468.

  • Park, H. S., and T. R. Levine, 2001: A probability model of accuracy in deception detection experiments. Commun. Monogr., 68, 201–210.

  • Pearson, K., and A. Lee, 1903: Inheritance of physical characteristics. Biometrika, 2, 357–462.

  • Pearson, K., A. Lee, and L. Bramley-Moore, 1899: Genetic (reproductive) selection: Inheritance of fertility in man and of fecundity in thoroughbred racehorses. Philos. Trans. Roy. Soc., 192A, 257–330.

  • Peirce, C. S., 1884: The numerical measure of the success of predictions. Science, 4, 453–454.

  • Russell, J. A., and J. M. Fernández-Dols, Eds., 1997: The Psychology of Facial Expression. Cambridge University Press, 400 pp.

  • Sauter, D. A., and S. K. Scott, 2007: More than one kind of happiness: Can we recognize vocal expressions of different positive states? Motiv. Emotion, 31, 192–199.

  • Scherer, K. R., 2003: Vocal communication of emotion: A review of research paradigms. Speech Commun., 40, 227–256.

  • Stephenson, D. B., 2000: Use of the “odds ratio” for diagnosing forecast skill. Wea. Forecasting, 15, 221–232.

  • Swets, J. A., 1986: Indices of discrimination or diagnostic accuracy: Their ROCs and implied models. Psychol. Bull., 99, 110–117.

  • Thornes, J. E., and D. B. Stephenson, 2001: How to judge the quality and value of weather forecast products. Meteor. Appl., 8, 307–314.

  • Vrij, A., 2008: Detecting Lies and Deceit: Pitfalls and Opportunities. 2nd ed. John Wiley and Sons, 488 pp.

  • Vrij, A., S. Mann, S. Kristen, and R. Fisher, 2007: Cues to deception and ability to detect lies as a function of police interview styles. Law Hum. Behav., 31, 499–518.

  • Wagner, H. L., 1993: On measuring performance in categorical judgment studies of nonverbal behavior. J. Nonverbal Behav., 17, 3–28.

  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Academic Press, 704 pp.

  • Woodworth, R. S., 1938: Experimental Psychology. John Wiley and Sons, 889 pp.

  • Yule, G. U., 1900: On the association of attributes in statistics. Philos. Trans. Roy. Soc., 194A, 257–319.

  • Yule, G. U., 1903: Notes on the theory of association of attributes in statistics. Biometrika, 2, 121–134.
