# Search Results

## You are looking at 1-10 of 87 items for

- Author or Editor: Allan H. Murphy
- All content

## Abstract

A *sample skill score* (SSS), which is based upon a new partition of the probability, or Brier, score (PS) recently described by Murphy, is formulated. The SSS is defined simply as the difference between the PS for the sample relative frequencies, a term in this partition, and the PS for the forecast probabilities. Thus, the SSS is a *natural* measure of the “skill” of probability forecasts. In addition, the other two terms in the partition of the PS form a useful partition of the SSS. Specifically, the SSS represents the difference between measures of the resolution and the reliability of such forecasts. The nature and properties of the SSS are examined. In this regard, the SSS is shown to be a *strictly proper* scoring rule (i.e., the SSS discourages hedging on the part of forecasters).

The SSS is a difference skill score and is based upon sample relative frequencies, while the scoring rules used heretofore to measure the “skill” of probability forecasts have been ratio skill scores and have been based upon climatological probabilities. First, difference and ratio skill scores are defined and compared. An examination of the properties of these two classes of scoring rules indicates that difference skill scores are, in general, strictly proper, while ratio skill scores are, in general, improper. On the other hand, strictly proper ratio skill scores can be formulated if expected scores as well as actual scores are used to standardize the relevant measures of “accuracy,” and a class of strictly proper ratio skill scores, based in part upon expected scores, is briefly described.

Second, the relative merits of using climatological probabilities and sample relative frequencies when formulating skill scores are examined in some detail. The results of this examination indicate that 1) the use of sample relative frequencies instead of climatological probabilities decreases the scores assigned to forecasters by both difference and ratio skill scores, although this decrease is quite small for large collections of forecasts; 2) any adverse psychological effects upon forecasters resulting from the use of sample relative frequencies instead of climatological probabilities (as well as any such effects resulting from the use of the skill scores themselves) can be substantially reduced by subjecting the scoring rules of concern to appropriate linear transformations; 3) the decrease or difference in score resulting from the use of sample relative frequencies does not appear to be a legitimate part of a forecaster's “skill”; 4) adjusting the average forecast probabilities to correspond more closely to the sample relative frequencies does not guarantee that the “skill” of the forecasts will increase; 5) the above-mentioned difference in score, which must be considered when comparing forecasters, forecast offices, etc., should, however, be considered separately from aspects of “skill” such as reliability and resolution; and 6) although, strictly speaking, climatological probabilities and sample relative frequencies are and are not forecasts, respectively (and, as such, are appropriate and inappropriate, respectively, as the bases for skill scores), the difference in score resulting from the use of the latter, which are estimates of the former, will, in general, be quite small.

In summary, the SSS appears to offer certain advantages vis-a-vis other skill scores used heretofore to measure the “skill” of probability forecasts.
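
The relationships the abstract describes can be made concrete in a short numerical sketch (not part of the paper; the function name and the grouping of forecasts by distinct probability value are assumptions):

```python
import numpy as np

def sample_skill_score(p, x):
    """Sketch of the sample skill score (SSS) for binary-event
    probability forecasts p and 0/1 outcomes x.

    SSS = PS(sample relative frequencies) - PS(forecasts), which the
    partition rewrites as resolution - reliability.
    """
    p, x = np.asarray(p, float), np.asarray(x, float)
    dbar = x.mean()                        # sample relative frequency
    ps_forecast = np.mean((p - x) ** 2)    # probability (Brier) score
    ps_sample = np.mean((dbar - x) ** 2)   # equals dbar * (1 - dbar)
    sss = ps_sample - ps_forecast

    # Partition terms, grouping forecasts by distinct probability value
    rel = res = 0.0
    for pk in np.unique(p):
        use = p == pk
        nk, dk = use.sum(), x[use].mean()
        rel += nk * (pk - dk) ** 2         # reliability (smaller is better)
        res += nk * (dk - dbar) ** 2       # resolution (larger is better)
    rel, res = rel / p.size, res / p.size
    return sss, res - rel                  # the two agree by construction
```

The second return value checks the partition: the SSS computed as a difference of probability scores equals resolution minus reliability.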

## Abstract

Probability of precipitation (PoP) forecasts can often be interpreted as average point probability forecasts. Since the latter are equivalent to (unconditional) expected areal coverage forecasts, PoP forecasts can be evaluated in terms of observed areal coverages in those situations in which observations of precipitation occurrence are available from a network of points in the forecast area. The purpose of this paper is to describe a partition of the average Brier, or probability, score—a measure of the average accuracy of average point probability forecasts over the network of points of concern—that facilitates such an evaluation. The partition consists of two terms: 1) a term that represents the average squared error of the average point probability forecasts interpreted as areal coverage forecasts and 2) a term that represents the average variance of the observations of precipitation occurrence in the forecast area. The relative magnitudes of the terms in this partition are examined, and it is concluded (partly on the basis of experimental data) that the variance term generally makes a significant contribution to the overall probability score. This result, together with the fact that the variance term does not depend on the forecasts, suggests that the squared error term (rather than the overall score) should be used to evaluate PoP forecasts in many situations. The basis for the interpretation of PoP forecasts as average point probability forecasts and some implications of the results presented in this paper for the evaluation of PoP forecasts are briefly discussed.
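
A minimal numerical sketch of the two-term partition (the function name and array layout are assumptions, not from the paper):

```python
import numpy as np

def pop_brier_partition(f, occ):
    """Sketch of the partition of the average Brier score for PoP
    forecasts verified against a network of points.

    f   : one probability forecast per occasion
    occ : 0/1 array of shape (occasions, points), precipitation
          occurrence at each point

    average PS = mean squared error of the forecasts treated as
                 areal-coverage forecasts
               + average spatial variance of the point observations
    """
    f, occ = np.asarray(f, float), np.asarray(occ, float)
    coverage = occ.mean(axis=1)                    # observed areal coverage
    avg_ps = np.mean((f[:, None] - occ) ** 2)      # Brier score over all points
    sq_err = np.mean((f - coverage) ** 2)          # term 1: coverage error
    variance = np.mean(coverage * (1 - coverage))  # term 2: variance term
    return avg_ps, sq_err + variance               # identical by construction
```

Note that the variance term involves only the observations, which is the point the abstract makes: it cannot be changed by issuing better forecasts.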

## Abstract

The general framework for forecast verification described by Murphy and Winkler embodies a statistical approach to the problem of assessing the quality of forecasts. This framework is based on the joint distribution of forecasts and observations, together with conditional and marginal distributions derived from decompositions of the underlying joint distribution. An augmented version of the original framework is outlined in this paper. The extended framework provides a coherent method of addressing the problem of stratification in this context, and it can be used to assess forecast quality—and its various aspects—under specific meteorological conditions. Conceptual examples are presented to illustrate potential applications of this methodological framework. Some issues concerning the extended framework and its application to real-world verification problems are discussed briefly.
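
As an illustrative sketch (names are assumptions; the factorization labels follow the usual terminology for this framework), the joint distribution and its two standard decompositions can be computed directly from verification data:

```python
from collections import Counter

def factorizations(pairs):
    """From (forecast, observation) pairs, form the joint distribution
    p(f, x) and its two standard decompositions:

        p(f, x) = p(x | f) p(f)   (calibration-refinement)
        p(f, x) = p(f | x) p(x)   (likelihood-base rate)

    Stratification amounts to building the same tables from only the
    pairs collected under a given meteorological condition.
    """
    n = len(pairs)
    joint = {fx: c / n for fx, c in Counter(pairs).items()}
    count_f = Counter(f for f, _ in pairs)     # marginal counts of f
    count_x = Counter(x for _, x in pairs)     # marginal counts of x
    x_given_f = {(f, x): n * p / count_f[f] for (f, x), p in joint.items()}
    f_given_x = {(f, x): n * p / count_x[x] for (f, x), p in joint.items()}
    return joint, x_given_f, f_given_x
```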

## Abstract

Comparative operational evaluation of probabilistic prediction procedures in cost-loss ratio decision situations in which the evaluator's knowledge of the cost-loss ratio is expressed in probabilistic terms is considered. First, the cost-loss ratio decision situation is described in a utility framework and, then, measures of the expected utility of probabilistic predictions are formulated. Second, a class of expected-utility measures, the beta measures, in which the evaluator's knowledge of the cost-loss ratio is expressed in terms of a beta distribution, is described. Third, the beta measures are utilized to compare two prediction procedures on the basis of a small sample of predictions. The results indicate the importance, for comparative operational evaluation, of utilizing measures which provide a suitable description of the evaluator's knowledge. In particular, the use of the probability score, a measure equivalent to the *uniform* measure (which is a special beta measure), in decision situations in which the uniform distribution does not provide a suitable description of the evaluator's knowledge, may yield misleading results. Finally, the results are placed in proper perspective by describing several possible extensions to this study and by indicating the importance of undertaking such studies in actual operational situations.
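
A rough numerical sketch of a beta measure under the standard cost-loss decision rule (the expense convention, parameter names, and midpoint-rule integration are assumptions for illustration, not details from the paper):

```python
import math

def beta_measure(p, x, a, b, n=200000):
    """Expected expense (in loss units) for a user who acts on forecast
    probability p when the cost-loss ratio c is uncertain with a
    Beta(a, b) density.  The user protects whenever c < p (expense c)
    and otherwise bears the loss if the event occurs (expense x, the
    0/1 outcome).  Beta(1, 1) gives the uniform measure, which differs
    from half the probability score only by terms not involving p.
    """
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    total = 0.0
    for i in range(n):                 # midpoint-rule integration on (0, 1)
        c = (i + 0.5) / n
        density = c ** (a - 1) * (1 - c) ** (b - 1) / norm
        total += (c if c < p else x) * density / n
    return total
```

For the uniform case the integral has the closed form p²/2 + x(1 − p), which the numerical sketch reproduces.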

## Abstract

Situations sometimes arise in which it is necessary to evaluate and compare the performance of categorical and probabilistic forecasts. The traditional approach to this problem involves the transformation of the probabilistic forecasts into categorical forecasts and the comparison of the two sets of forecasts in a categorical framework. This approach suffers from several serious deficiencies. Alternative approaches are proposed here that consist in (i) treating the categorical forecasts as probabilistic forecasts or (ii) replacing the categorical forecasts with primitive probabilistic forecasts. These approaches permit the sets of forecasts to be compared in a probabilistic framework and offer several important advantages vis-a-vis the traditional approach. The proposed approaches are compared and some issues related to these approaches and the overall problem itself are discussed.
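
Approaches (i) and (ii) can be sketched in a few lines (the helper names and the illustrative primitive probabilities are assumptions, not values from the paper):

```python
def brier(forecasts, outcomes):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

def score_categorical(cat, obs, primitive=None):
    """Score 0/1 categorical forecasts in the probabilistic framework:
    (i)  primitive is None -> treat each categorical forecast as a
         probability of exactly 0 or 1;
    (ii) otherwise -> replace each category with a 'primitive'
         probability, e.g. primitive = {0: 0.2, 1: 0.8} (illustrative
         values only).
    """
    probs = list(cat) if primitive is None else [primitive[c] for c in cat]
    return brier(probs, obs)
```

Either way, both sets of forecasts end up scored on the same probabilistic (Brier) scale, which is what permits the comparison.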

## Abstract

This paper briefly examines the nature of hedging and its role in the formulation of categorical and probabilistic forecasts. Hedging is defined in terms of the difference between a forecaster's *judgment* and his *forecast.* It is then argued that a judgment cannot accurately reflect the forecaster's true state of knowledge unless the uncertainties inherent in the formulation of this judgment are described in a qualitative and/or quantitative manner. Since categorical forecasting does not provide the forecaster with a means of making his forecasts correspond to such judgments, a categorical forecast is generally a hedge. Probabilistic forecasting, on the other hand, presents the forecaster with an opportunity to eliminate hedging by making his (probabilistic) forecasts correspond exactly to his judgments. Thus, contrary to popular belief, the desire to eliminate hedging should encourage forecasters to express more rather than fewer forecasts in probabilistic terms.
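
The claim that probabilistic forecasting removes the incentive to hedge can be illustrated with the expected Brier score (a standard strictly proper score; the function name is an assumption):

```python
def expected_brier(stated_p, judgment_q):
    """Expected Brier score of a stated probability when the
    forecaster's true judgment for the event is judgment_q:

        E[score] = q * (p - 1)**2 + (1 - q) * p**2

    Because the Brier score is strictly proper, this expectation is
    minimized only at stated_p == judgment_q, so any hedge (stating
    p != q) is expected to cost the forecaster.
    """
    q, p = judgment_q, stated_p
    return q * (p - 1) ** 2 + (1 - q) * p ** 2
```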

## Abstract

Scalar and vector partitions of the ranked probability score, RPS, are described and compared. These partitions are formulated in the same manner as the scalar and vector partitions of the probability score, PS, recently described by Murphy. However, since the RPS is defined in terms of cumulative probability distributions, the scalar and vector partitions of the RPS provide measures of the reliability and resolution of scalar and vector *cumulative* forecasts, respectively. The scalar and vector partitions of the RPS provide similar, but not equivalent (i.e., linearly related), measures of these attributes. Specifically, the reliability (resolution) of cumulative forecasts according to the scalar partition is equal to or greater (less) than their reliability (resolution) according to the vector partition. A sample collection of forecasts is used to illustrate the differences between the scalar and vector partitions of the RPS and between the vector partitions of the RPS and the PS.

Several questions related to the interpretation and use of the scalar and vector partitions of the RPS are briefly discussed, including the information that these partitions provide about the reliability and resolution of forecasts (as opposed to cumulative forecasts) and the relative merits of these partitions. These discussions indicate that, since a one-to-one correspondence exists between vector and vector cumulative forecasts, the vector partition of the RPS can also be considered to provide measures of the reliability and resolution of vector forecasts and that the vector partition is generally more appropriate than the scalar partition.
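
A minimal sketch of the RPS itself, to make the role of cumulative distributions explicit (unnormalized form assumed; some definitions divide by K − 1):

```python
def ranked_probability_score(forecast, observed_category):
    """RPS for a single forecast over K ordered categories.
    forecast is a probability vector; observed_category is the index
    of the category that occurred.  The score accumulates squared
    differences between the *cumulative* forecast and the cumulative
    (step) observation, which is why the partitions of the RPS
    naturally describe cumulative forecasts.
    """
    cum_f = cum_o = rps = 0.0
    for k, fk in enumerate(forecast):
        cum_f += fk
        cum_o += 1.0 if k == observed_category else 0.0
        rps += (cum_f - cum_o) ** 2   # final term ~0: both cumulatives reach 1
    return rps
```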
