An Odds Ratio Parameterization for ROC Diagram and Skill Score Indices

Agostino Manzato, Osservatorio Meteorologico Regionale, Agenzia Regionale per la Protezione dell’Ambiente del Friuli Venezia Giulia, Visco, Italy

Abstract

The relative operating characteristic (ROC) diagram is often used to assess the performance of a classification system, such as a categorical forecast of event occurrence. A categorical forecast can be obtained by imposing a threshold on a continuous variable to make it dichotomous. In practice, this threshold can be varied to create different contingency tables. From each table, many statistical indices and skill scores can be derived, all of which are functions of the chosen threshold. The ROC curve is obtained by plotting two of these indices against each other: probability of detection (POD) versus probability of false detection (POFD).
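The threshold sweep described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synthetic data, the function names (`contingency_table`, `roc_points`), and the table labels a, b, c, d (hits, false alarms, misses, correct negatives) are not taken from the paper.

```python
import numpy as np

def contingency_table(score, observed, threshold):
    """Return (a, b, c, d) = (hits, false alarms, misses, correct negatives).

    The event is forecast whenever the continuous score reaches the threshold.
    """
    forecast = score >= threshold
    observed = observed.astype(bool)
    a = int(np.sum(forecast & observed))
    b = int(np.sum(forecast & ~observed))
    c = int(np.sum(~forecast & observed))
    d = int(np.sum(~forecast & ~observed))
    return a, b, c, d

def roc_points(score, observed, thresholds):
    """One (POFD, POD) point per threshold; assumes both classes are present."""
    points = []
    for t in thresholds:
        a, b, c, d = contingency_table(score, observed, t)
        pod = a / (a + c)    # probability of detection
        pofd = b / (b + d)   # probability of false detection
        points.append((pofd, pod))
    return points

# Synthetic example (an assumption): events occur about 20% of the time and
# shift the mean of the continuous predictor by one standard deviation.
rng = np.random.default_rng(0)
observed = rng.random(2000) < 0.2
score = rng.normal(loc=np.where(observed, 1.0, 0.0), scale=1.0)

thresholds = np.linspace(score.min(), score.max(), 21)
roc = roc_points(score, observed, thresholds)
```

The lowest threshold forecasts the event every time, so the first point sits at the upper-right corner of the ROC diagram; raising the threshold moves the point toward the origin.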

In this work, a simple approximation for another of these indices, the odds ratio (O), is proposed. O is parameterized as a function of POFD, which leads to a parameterization of all the theoretical ROC curves. Using this approximation, it is also possible to derive, for each ROC curve, the theoretical maximum Hanssen and Kuipers skill score (KSS) and the theoretical maximum Heidke skill score (HSS). The maximum HSS is found to depend explicitly on the database event frequency (α), whereas the KSS appears to be independent of it.
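As a rough illustration of how an odds ratio specification generates a ROC curve, the sketch below uses the exact identity O = [POD/(1 − POD)] / [POFD/(1 − POFD)], solved for POD. The constant value O = 9 is a hypothetical placeholder and is not the approximation proposed in the paper; the function names are likewise assumptions. KSS and HSS are written in their standard forms in terms of POD, POFD, and the event frequency α.

```python
import numpy as np

def pod_from_odds_ratio(pofd, odds_ratio):
    """POD implied by POFD and O = [POD/(1-POD)] / [POFD/(1-POFD)]."""
    return odds_ratio * pofd / (1.0 + (odds_ratio - 1.0) * pofd)

def kss(pod, pofd):
    """Hanssen and Kuipers skill score: POD - POFD."""
    return pod - pofd

def hss(pod, pofd, alpha):
    """Heidke skill score rewritten in terms of POD, POFD and event frequency.

    p is the frequency of 'yes' forecasts; requires 0 < alpha < 1.
    """
    p = alpha * pod + (1.0 - alpha) * pofd
    return 2.0 * alpha * (1.0 - alpha) * (pod - pofd) / (alpha + p - 2.0 * alpha * p)

pofd = np.linspace(0.0, 1.0, 1001)
odds_ratio = 9.0  # hypothetical constant O, NOT the paper's parameterization
pod = pod_from_odds_ratio(pofd, odds_ratio)

kss_max = kss(pod, pofd).max()
# The same ROC curve evaluated for three hypothetical event frequencies.
hss_max = {alpha: hss(pod, pofd, alpha).max() for alpha in (0.5, 0.1, 0.01)}
```

For this placeholder curve the maximum KSS is the same whatever α is, because α never enters its definition, while the maximum HSS changes with α and coincides with the maximum KSS only at α = 0.5.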

Outside the approximation framework, some general properties of the ROC points corresponding to the maximum KSS, the maximum HSS, and the BIAS = 1 condition are also found. The results suggest that many of these performance measures are influenced by the event frequency, which must be taken into account when comparing classifiers developed on different databases. The study also shows that the KSS is equitable (in the sense introduced by Gandin and Murphy) for a generic “cost ratio” (λ) between misses and false alarms, not only for the original case λ = 1.
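For reference, the dependence on event frequency can be made explicit with the standard 2 × 2 definitions. The notation here is an assumption rather than the paper's: a, b, c, d denote hits, false alarms, misses, and correct negatives, n = a + b + c + d, and p is the frequency of "yes" forecasts. The generalized cost-ratio form of the KSS is not reproduced, since the abstract does not give it.

```latex
\begin{align}
\alpha &= \frac{a+c}{n}, \qquad
\mathrm{POD} = \frac{a}{a+c}, \qquad
\mathrm{POFD} = \frac{b}{b+d}, \qquad
p = \frac{a+b}{n} = \alpha\,\mathrm{POD} + (1-\alpha)\,\mathrm{POFD}, \\
\mathrm{KSS} &= \mathrm{POD} - \mathrm{POFD}, \\
\mathrm{HSS} &= \frac{2(ad - bc)}{(a+c)(c+d) + (a+b)(b+d)}
  = \frac{2\alpha(1-\alpha)\,(\mathrm{POD} - \mathrm{POFD})}{\alpha + p - 2\alpha p}, \\
\mathrm{BIAS} &= \frac{a+b}{a+c} = \frac{p}{\alpha}
  = \mathrm{POD} + \frac{1-\alpha}{\alpha}\,\mathrm{POFD}.
\end{align}
```

Under these definitions, BIAS = 1 is equivalent to α(1 − POD) = (1 − α) POFD, that is, to equal numbers of misses and false alarms (c = b). In the ROC plane this is the straight line POD = 1 − [(1 − α)/α] POFD, whose slope depends on α, whereas lines of constant KSS do not.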

Corresponding author address: Agostino Manzato, Osservatorio Meteorologico Regionale/Agenzia Regionale per la Protezione dell’Ambiente Friuli Venezia Giulia (OSMER/ARPA), Via Oberdan, 18/a, I-33040 Visco (UD), Italy. Email: agostino.manzato@osmer.fvg.it

  • Birdsall, T. G., 1966: The theory of signal detectability: ROC curves and their character. Ph.D. dissertation, University of Michigan, Ann Arbor, MI.

  • Doswell, C. A., III, R. P. Davies-Jones, and D. L. Keller, 1990: On summary measures of skill in rare event forecasting based on contingency tables. Wea. Forecasting, 5, 576–585.

  • Duda, R. O., P. E. Hart, and D. G. Stork, 2000: Pattern Classification. 2d ed. John Wiley, 680 pp.

  • Flueck, J. A., 1987: A study of some measures of forecast verification. Preprints. 10th Conf. on Probability and Statistics in Atmospheric Sciences, Edmonton, AB, Canada, Amer. Meteor. Soc., 69–73.

  • Gandin, L. S., and A. H. Murphy, 1992: Equitable skill scores for categorical forecasts. Mon. Wea. Rev., 120, 361–370.

  • Göber, M., C. A. Wilson, S. F. Milton, and D. B. Stephenson, 2004: Fairplay in the verification of operational quantitative precipitation forecasts. J. Hydrol., 288, 225–236.

  • Hanssen, A. W., and W. J. A. Kuipers, 1965: On the relationship between the frequency of rain and various meteorological parameters. Kon. Neder. Meteor. Inst. Meded. Verhand, 81, 2–15.

  • Heidke, P., 1926: Berechnung des Erfolges und der Gute der Windstarkevorhersagen im Sturmwarnungsdienst. Geogr. Ann., 8, 301–349.

  • Kharin, V. V., and F. W. Zwiers, 2003: On the ROC score of probability forecasts. J. Climate, 16, 4145–4150.

  • Manzato, A., 2003: A climatology of instability indices derived from Friuli Venezia Giulia soundings, using three different methods. Atmos. Res., 67–68, 417–454.

  • Manzato, A., 2005: The use of sounding-derived indices for a neural network short-term thunderstorm forecast. Wea. Forecasting, 20, 896–917.

  • Manzato, A., and G. M. Morgan, 2003: Evaluating the sounding instability with the lifted parcel theory. Atmos. Res., 67–68, 455–473.

  • Marzban, C., 1998a: Bayesian probability and scalar performance measures in Gaussian models. J. Appl. Meteor., 37, 72–82.

  • Marzban, C., 1998b: Scalar measures of performance in rare-event situations. Wea. Forecasting, 13, 753–763.

  • Marzban, C., 2004: The ROC curve and the area under it as a performance measure. Wea. Forecasting, 19, 1106–1114.

  • Marzban, C., and V. Lakshmanan, 1999: On the uniqueness of Gandin and Murphy’s equitable performance measures. Mon. Wea. Rev., 127, 1134–1136.

  • Mason, I. B., 1982: A model for the assessment of weather forecasts. Aust. Meteor. Mag., 30, 291–303.

  • Mason, I. B., 2003: Binary events. Forecast Verification: A Practitioner’s Guide in Atmospheric Science, I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley, 37–76.

  • Peirce, C. S., 1884: The numerical measure of the success of predictions. Science, 4, 453–454.

  • Stephenson, D. B., 2000: Use of the “odds ratio” for diagnosing forecast skill. Wea. Forecasting, 15, 221–232.

  • Swets, J. A., 1973: The relative operating characteristic in psychology. Science, 182, 900–1000.

  • Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.
