Beyond Strictly Proper Scoring Rules: The Importance of Being Local

  • 1 Department of Mathematical Sciences, Durham University, Durham, DH1 3LE, U.K.
  • 2 Centre for the Analysis of Time Series, London School of Economics, London, WC2A 2AE, U.K.

Abstract

The evaluation of probabilistic forecasts plays a central role both in the interpretation and in the use of forecast systems and their development. Probabilistic scores (scoring rules) provide statistical measures to assess the quality of probabilistic forecasts. Often, many probabilistic forecast systems are available while evaluations of their performance are not standardized, with different scoring rules being used to measure different aspects of forecast performance. Even when the discussion is restricted to strictly proper scoring rules, there remains considerable variability between them; indeed, strictly proper scoring rules need not rank competing forecast systems in the same order when none of these systems is perfect. The locality property is explored to further distinguish scoring rules. The nonlocal strictly proper scoring rules considered are shown to have a property that can produce “unfortunate” evaluations. In particular, the fact that the continuous ranked probability score prefers an outcome close to the median of the forecast distribution, regardless of the probability mass assigned to values at or near the median, raises concerns about its use. The only local strictly proper scoring rule, the logarithmic score, has direct interpretations in terms of probabilities and bits of information. The nonlocal strictly proper scoring rules, on the other hand, lack a meaningful direct interpretation for decision support. The logarithmic score is also shown to be invariant under smooth transformations of the forecast variable, whereas the nonlocal strictly proper scoring rules considered may change their preferences under such a transformation. It is therefore suggested that the logarithmic score always be included in the evaluation of probabilistic forecasts.
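
The median behaviour described above can be illustrated with a minimal sketch. It is not taken from the paper: the bimodal forecast (an equal mixture of two Gaussians), the grid limits, and the function names are all assumptions chosen for illustration. Both scores are written so that lower is better, with the continuous ranked probability score computed numerically from its integral definition and the logarithmic score expressed in bits.

```python
import math

# Hypothetical bimodal forecast: equal mixture of N(-2, 0.5^2) and N(+2, 0.5^2).
# Its median is 0, where almost no probability mass is placed.
MEANS = (-2.0, 2.0)
SIGMA = 0.5


def pdf(x):
    """Forecast density at x."""
    norm = SIGMA * math.sqrt(2.0 * math.pi)
    return sum(0.5 * math.exp(-0.5 * ((x - m) / SIGMA) ** 2) / norm for m in MEANS)


def cdf(x):
    """Forecast cumulative distribution function at x."""
    return sum(
        0.5 * 0.5 * (1.0 + math.erf((x - m) / (SIGMA * math.sqrt(2.0))))
        for m in MEANS
    )


def crps(y, lo=-12.0, hi=12.0, n=24000):
    """CRPS(F, y) = integral of (F(x) - 1{x >= y})^2 dx, via the midpoint rule."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        step = 1.0 if x >= y else 0.0
        total += (cdf(x) - step) ** 2 * dx
    return total


def log_score_bits(y):
    """Logarithmic score in bits: -log2 of the forecast density at the outcome."""
    return -math.log2(pdf(y))


if __name__ == "__main__":
    for outcome in (0.0, 2.0):  # the median versus one of the two modes
        print(
            f"outcome={outcome:+.1f}  CRPS={crps(outcome):.3f}  "
            f"log score={log_score_bits(outcome):.2f} bits"
        )
```

Under these assumptions, the outcome at the median (0) receives a CRPS of roughly 0.86, better than the roughly 1.06 given to the outcome at the mode (2), even though the forecast assigns almost no density near 0. The logarithmic score ranks the two outcomes the other way, at roughly 11.9 bits versus 1.3 bits, because it depends only on the density at the outcome.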
