Hedging and Skill Scores for Probability Forecasts

Allan H. Murphy, National Center for Atmospheric Research, Boulder, Colo. 80302

Abstract

An individual skill score (SS) and a collective skill score (CSS) are examined to determine whether these scoring rules are proper or improper. The SS and the CSS are both standardized versions of the Brier, or probability, score (PS) and have been used to measure the “skill” of probability forecasts. The SS is defined in terms of individual forecasts, while the CSS is defined in terms of collections of forecasts. Both are shown to be improper scoring rules and, as a result, both encourage hedging on the part of forecasters.
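As an illustration of how standardization can make a score improper, the following minimal sketch assumes one plausible form of an individual skill score for a binary event: SS = 1 - PS(r, d)/PS(c, d), where r is the stated probability, d is the outcome (1 or 0), c is a climatological probability, and PS(r, d) = (r - d)^2. These definitions, the parameter values, and all function names are assumptions made for illustration; the abstract does not give the formulas. Under these assumptions, a forecaster whose true judgment is p minimizes the expected PS by stating r = p, but maximizes the expected SS by stating a different probability whenever c differs from 0.5.

```python
def ps(r, d):
    """Assumed probability score for a binary event: (r - d)^2, lower is better."""
    return (r - d) ** 2


def expected_ps(r, p):
    """Expected PS when the forecaster's true judgment is p and r is stated."""
    return p * ps(r, 1) + (1 - p) * ps(r, 0)


def expected_ss(r, p, c):
    """Expected value of the assumed SS = 1 - PS(r, d) / PS(c, d); requires 0 < c < 1."""
    return 1 - (p * ps(r, 1) / ps(c, 1) + (1 - p) * ps(r, 0) / ps(c, 0))


def best_on_grid(objective, maximize, steps=1000):
    """Search stated probabilities 0, 1/steps, ..., 1 for the best one."""
    grid = [i / steps for i in range(steps + 1)]
    return (max if maximize else min)(grid, key=objective)


def hedged_optimum(p, c):
    """Closed-form maximizer of expected_ss, obtained by setting its derivative to zero."""
    return p * c**2 / (p * c**2 + (1 - p) * (1 - c) ** 2)


# Illustrative values (assumptions): true judgment p, climatological probability c.
p, c = 0.3, 0.2
r_ps = best_on_grid(lambda r: expected_ps(r, p), maximize=False)
r_ss = best_on_grid(lambda r: expected_ss(r, p, c), maximize=True)

print("stated probability minimizing expected PS:", r_ps)
print("stated probability maximizing expected SS:", r_ss)
print("closed-form SS maximizer:", hedged_optimum(p, c))
```

In this sketch the PS is optimized by reporting the true judgment, while the assumed SS rewards a stated probability well away from it, which is the sense in which an improper rule encourages hedging.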

The results of a preliminary investigation of the nature of the hedging produced by the SS and the CSS indicate that, while the SS may encourage a considerable amount of hedging, the CSS in general encourages only a modest amount, and even this hedging decreases as the sample size K of the collection of forecasts increases. In fact, the CSS is approximately strictly proper for large collections of forecasts (K ≥ 100).
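The dependence of hedging on K can be explored numerically under simplifying assumptions. The sketch below assumes a collective score of the form CSS = 1 - Σ PS(r_k, d_k) / Σ PS(c, d_k) for binary events with PS(r, d) = (r - d)^2, and further assumes that the forecaster holds the same judgment p and states the same probability r on all K occasions, with independent outcomes. The form of the CSS, the equal-judgment simplification, and the function names are all assumptions for illustration, not definitions taken from the paper. Under them, the number of occurrences n is binomial, the expected CSS is quadratic in r, and its maximizer has a closed form.

```python
from math import comb


def css_coefficients(p, c, K):
    """Return (A, B) such that E[CSS] = 1 - (A * (1 - r)**2 + B * r**2).

    Assumes the same judgment p and stated probability r on all K occasions,
    independent outcomes, and 0 < c < 1 so the denominator is never zero.
    """
    A = B = 0.0
    for n in range(K + 1):  # n = number of occasions on which the event occurs
        weight = comb(K, n) * p**n * (1 - p) ** (K - n)  # binomial probability of n
        climate_ps = n * (1 - c) ** 2 + (K - n) * c**2   # summed PS of forecast c
        A += weight * n / climate_ps
        B += weight * (K - n) / climate_ps
    return A, B


def expected_css(r, p, c, K):
    """Expected value of the assumed CSS when r is stated on every occasion."""
    A, B = css_coefficients(p, c, K)
    return 1 - (A * (1 - r) ** 2 + B * r**2)


def optimal_forecast(p, c, K):
    """Stated probability maximizing the expected CSS (zero of the derivative)."""
    A, B = css_coefficients(p, c, K)
    return A / (A + B)


# Illustrative values (assumptions): true judgment p, climatological probability c.
p, c = 0.3, 0.2
for K in (1, 5, 20, 100):
    r_star = optimal_forecast(p, c, K)
    print(f"K={K:3d}  optimal stated probability={r_star:.4f}  hedge={r_star - p:+.4f}")
```

Under these assumptions the maximizing probability lies far from p for K = 1 and close to p for K = 100, consistent in spirit with the statement that the hedging decreases as K increases.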

Finally, we briefly consider two questions related to the standardization of scoring rules: 1) the use of different scoring rules in the assessment and evaluation tasks, and 2) the transformation of strictly proper scoring rules. With regard to the latter, we identify standardized versions of the PS which are strictly proper scoring rules and which, as a result, appear to be appropriate scoring rules to use to measure the “skill” of probability forecasts.
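One family of transformations known to preserve strict propriety is the positive affine family: if the constants used to rescale a strictly proper score do not depend on the outcome, the forecaster's optimal stated probability is unchanged. The sketch below checks this numerically for a hypothetical standardized score 1 - PS(r, d)/B, with PS(r, d) = (r - d)^2 and the fixed constant B = c(1 - c). The choice of B and the function names are assumptions for illustration; the abstract does not say which standardized versions the paper identifies.

```python
def ps(r, d):
    """Assumed probability score for a binary event: (r - d)^2, lower is better."""
    return (r - d) ** 2


def expected_standardized(r, p, c):
    """Expected value of 1 - PS(r, d) / B, where B = c * (1 - c) is outcome-independent."""
    B = c * (1 - c)  # fixed constant: the same divisor whether or not the event occurs
    return 1 - (p * ps(r, 1) + (1 - p) * ps(r, 0)) / B


def best_stated_probability(p, c, steps=1000):
    """Grid search for the stated probability maximizing the expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: expected_standardized(r, p, c))


# For each illustrative climatological probability c, the optimum stays at the true judgment p.
p = 0.3
for c in (0.1, 0.2, 0.5, 0.8):
    print(f"c={c}: optimal stated probability={best_stated_probability(p, c)}")
```

Because the divisor is the same for both outcomes, the expected score is a decreasing affine function of the expected PS, so it is maximized where the expected PS is minimized, at r = p.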
