A sample skill score (SSS), which is based upon a new partition of the probability, or Brier, score (PS) recently described by Murphy, is formulated. The SSS is defined simply as the difference between the PS for the sample relative frequencies, a term in this partition, and the PS for the forecast probabilities. Thus, the SSS is a natural measure of the “skill” of probability forecasts. In addition, the other two terms in the partition of the PS form a useful partition of the SSS. Specifically, the SSS represents the difference between measures of the resolution and the reliability of such forecasts. The nature and properties of the SSS are examined. In this regard, the SSS is shown to be a strictly proper scoring rule (i.e., the SSS discourages hedging on the part of forecasters).
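The definition above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: a two-state (binary) event, the scalar "half" form of the PS (mean squared difference between forecast probability and the 0/1 outcome), and forecasts grouped by their distinct probability values. The function names `brier_score` and `sample_skill_score` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Assumption: binary events and the scalar (half) form of the PS.
    """
    n = len(forecasts)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n


def sample_skill_score(forecasts, outcomes):
    """Return the SSS together with reliability and resolution terms.

    SSS = PS(sample relative frequency) - PS(forecasts).
    Under the grouping used here, SSS also equals resolution - reliability.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n  # sample relative frequency of the event

    ps_forecast = brier_score(forecasts, outcomes)
    # PS obtained by always "forecasting" the sample relative frequency.
    ps_sample = brier_score([base_rate] * n, outcomes)

    # Group the observed outcomes by distinct forecast probability.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, obs in groups.items():
        freq = sum(obs) / len(obs)  # observed relative frequency in this group
        reliability += len(obs) * (f - freq) ** 2
        resolution += len(obs) * (freq - base_rate) ** 2

    return {
        "sss": ps_sample - ps_forecast,
        "reliability": reliability / n,
        "resolution": resolution / n,
    }
```

With this sign convention a positive SSS means the forecasts score better than the sample relative frequencies would, and a smaller reliability term indicates forecast probabilities that sit closer to the observed relative frequencies.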
The SSS is a difference skill score and is based upon sample relative frequencies, while the scoring rules used heretofore to measure the “skill” of probability forecasts have been ratio skill scores and have been based upon climatological probabilities. First, difference and ratio skill scores are defined and compared. An examination of the properties of these two classes of scoring rules indicates that difference skill scores are, in general, strictly proper, while ratio skill scores are, in general, improper. On the other hand, strictly proper ratio skill scores can be formulated if expected scores as well as actual scores are used to standardize the relevant measures of “accuracy,” and a class of strictly proper ratio skill scores, based in part upon expected scores, is briefly described.
Second, the relative merits of using climatological probabilities and sample relative frequencies when formulating skill scores are examined in some detail. The results of this examination indicate that 1) the use of sample relative frequencies instead of climatological probabilities decreases the scores assigned to forecasters by both difference and ratio skill scores, although this decrease is quite small for large collections of forecasts; 2) any adverse psychological effects upon forecasters resulting from the use of sample relative frequencies instead of climatological probabilities (as well as any such effects resulting from the use of the skill scores themselves) can be substantially reduced by subjecting the scoring rules of concern to appropriate linear transformations; 3) the decrease or difference in score resulting from the use of sample relative frequencies does not appear to be a legitimate part of a forecaster's “skill”; 4) adjusting the average forecast probabilities to correspond more closely to the sample relative frequencies does not guarantee that the “skill” of the forecasts will increase; 5) the above-mentioned difference in score, which must be considered when comparing forecasters, forecast offices, etc., should, however, be considered separately from aspects of “skill” such as reliability and resolution; and 6) although, strictly speaking, climatological probabilities and sample relative frequencies are and are not forecasts, respectively (and, as such, are appropriate and inappropriate, respectively, as the bases for skill scores), the difference in score resulting from the use of the latter, which are estimates of the former, will, in general, be quite small.
In summary, the SSS appears to offer certain advantages vis-a-vis other skill scores used heretofore to measure the “skill” of probability forecasts.