Search Results
You are looking at 1–10 of 86 items for
- Author or Editor: Allan H. Murphy
Abstract
Probability of precipitation (PoP) forecasts can often be interpreted as average point probability forecasts. Since the latter are equivalent to (unconditional) expected areal coverage forecasts, PoP forecasts can be evaluated in terms of observed areal coverages in those situations in which observations of precipitation occurrence are available from a network of points in the forecast area. The purpose of this paper is to describe a partition of the average Brier, or probability, score—a measure of the average accuracy of average point probability forecasts over the network of points of concern—that facilitates such an evaluation. The partition consists of two terms: 1) a term that represents the average squared error of the average point probability forecasts interpreted as areal coverage forecasts and 2) a term that represents the average variance of the observations of precipitation occurrence in the forecast area. The relative magnitudes of the terms in this partition are examined, and it is concluded (partly on the basis of experimental data) that the variance term generally makes a significant contribution to the overall probability score. This result, together with the fact that the variance term does not depend on the forecasts, suggests that the squared error term (rather than the overall score) should be used to evaluate PoP forecasts in many situations. The basis for the interpretation of PoP forecasts as average point probability forecasts and some implications of the results presented in this paper for the evaluation of PoP forecasts are briefly discussed.
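A minimal sketch of this partition for a single forecast occasion (notation ours): let f denote the average point probability (PoP) forecast and x_1, …, x_n the binary precipitation observations at the n points of the network. Then

$$\frac{1}{n}\sum_{k=1}^{n}(f - x_k)^2 = (f - \bar{x})^2 + \bar{x}(1 - \bar{x}), \qquad \bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k,$$

where x̄ is the observed areal coverage. The first term is the squared error of f interpreted as an areal coverage forecast; the second is the variance of the point observations and does not depend on f. Averaging each term over a collection of occasions yields the two terms of the partition described above.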
Abstract
This paper is concerned with the value of climatological, categorical, probabilistic and perfect forecasts in the cost-loss ratio situation. Expressions are derived for the expense associated with these different types of forecasts, and measures of value and relative value are formulated in terms of these expressions. Some relationships among these expressions and measures are described, and these relationships are illustrated by examining both hypothetical and real sets of forecasts.
It is demonstrated that, if the probabilistic forecasts of concern are (completely) reliable, then the value of these forecasts is greater than the value of climatological and categorical forecasts for all activities or operations (i.e., for all values of the cost-loss ratio C/L). On the other hand, if the forecasts are unreliable, then the value of climatological and/or categorical forecasts may be greater than the value of probabilistic forecasts for some values of C/L. However, examination of hypothetical and real sets of unreliable forecasts indicates that the relationships between the value of reliable probabilistic forecasts and the value of climatological and categorical forecasts are quite robust in the sense that these relationships appear to hold for most, if not all, values of C/L even for moderately unreliable forecasts.
The results presented in this paper have important implications for operational forecasting procedures and practices. These implications relate to the desirability of formulating and disseminating a wide variety of weather forecasts in probabilistic terms and of achieving and maintaining a high degree of reliability in probabilistic forecasts.
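A brief computational sketch of these expense comparisons under the standard cost-loss assumptions (protection cost C, loss L, reliable forecast probabilities p); the function and variable names are ours, and the 0.5 threshold for categorical forecasts is one common convention, not necessarily the paper's:

```python
import numpy as np

def expected_expenses(p, C, L):
    """Mean expense per occasion for four types of forecasts in the
    cost-loss ratio situation, assuming the probabilities p are reliable."""
    pbar = p.mean()                                    # climatological probability
    climatological = min(C, pbar * L)                  # always protect, or never
    categorical = np.where(p >= 0.5, C, p * L).mean()  # protect iff adverse weather is forecast
    probabilistic = np.minimum(C, p * L).mean()        # protect iff p >= C/L
    perfect = pbar * C                                 # protect only when needed
    return climatological, categorical, probabilistic, perfect

p = np.array([0.05, 0.10, 0.30, 0.60, 0.90])
for name, e in zip(("climatological", "categorical", "probabilistic", "perfect"),
                   expected_expenses(p, C=1.0, L=4.0)):
    print(f"{name:>14}: {e:.3f}")
```

Because min(C, pL) is never larger than either C or pL on any occasion, the probabilistic expense cannot exceed the categorical expense, and it likewise cannot exceed the climatological expense min(C, p̄L) — the reliability result stated above.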
Abstract
In this paper we describe and compare two models of the familiar cost-loss ratio situation. This situation involves a decision maker who must decide whether or not to take protective action, with respect to some activity or operation, in the face of uncertainty as to whether or not weather adverse to the activity will occur. The original model, first described by J.C. Thompson, is based in part upon the (implicit) assumption that taking protective action completely eliminates the loss associated with the occurrence of adverse weather. In the model formulated in this paper, on the other hand, it is assumed that taking protective action may reduce or eliminate this loss. The original model, then, is a special case of this “generalized” model. We show that the decision rule in each model depends upon a cost-loss ratio and that in both models this ratio is simply the cost of protection divided by the protectable portion of the loss. Thus the two models are equivalent from a decision-making point of view. This result also implies that the original model is applicable to a wider class of decision-making situations than has generally been recognized heretofore.
We also formulate measures of the value of probability forecasts within the frameworks of these models. First, the expenses (i.e., costs and losses) are translated into utilities, which are assumed to express the decision maker's preferences for the consequences. Then, probabilistic specifications of the utilities are briefly discussed and general expressions are presented for the appropriate measures of value in cost-loss ratio situations with such specifications, namely expected-utility measures. Finally, we formulate the expected-utility measure associated with each model when the relevant utilities are assumed to possess a uniform probability distribution. Both measures are then shown to be equivalent (i.e., linearly related) to the Brier, or probability, score, a familiar measure of the accuracy of probability forecasts. These results provide additional support for the use of the probability score as an evaluation measure.
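The decision rule common to the two models can be written compactly (notation ours): with protection cost C, total loss L, and protectable portion L_p ≤ L of the loss, not protecting carries expected expense pL, while protecting costs C plus the unprotectable expected loss p(L − L_p), so that protection is preferred exactly when

$$C + p(L - L_p) < pL \quad\Longleftrightarrow\quad p > \frac{C}{L_p}.$$

Setting L_p = L recovers Thompson's original rule p > C/L, which is the sense in which the two models are equivalent for decision making.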
Abstract
Scalar and vector partitions of the ranked probability score, RPS, are described and compared. These partitions are formulated in the same manner as the scalar and vector partitions of the probability score, PS, recently described by Murphy. However, since the RPS is defined in terms of cumulative probability distributions, the scalar and vector partitions of the RPS provide measures of the reliability and resolution of scalar and vector cumulative forecasts, respectively. The scalar and vector partitions of the RPS provide similar, but not equivalent (i.e., linearly related), measures of these attributes. Specifically, the reliability (resolution) of cumulative forecasts according to the scalar partition is equal to or greater (less) than their reliability (resolution) according to the vector partition. A sample collection of forecasts is used to illustrate the differences between the scalar and vector partitions of the RPS and between the vector partitions of the RPS and the PS.
Several questions related to the interpretation and use of the scalar and vector partitions of the RPS are briefly discussed, including the information that these partitions provide about the reliability and resolution of forecasts (as opposed to cumulative forecasts) and the relative merits of these partitions. These discussions indicate that, since a one-to-one correspondence exists between vector and vector cumulative forecasts, the vector partition of the RPS can also be considered to provide measures of the reliability and resolution of vector forecasts and that the vector partition is generally more appropriate than the scalar partition.
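The one-to-one correspondence invoked here is simply the invertible map between a vector forecast and its cumulative form (notation ours): for a forecast f = (f_1, …, f_N) over N ordered categories,

$$F_m = \sum_{k=1}^{m} f_k \quad (m = 1, \ldots, N,\ F_N = 1), \qquad f_m = F_m - F_{m-1} \quad (F_0 = 0),$$

so measures of the reliability and resolution of cumulative forecasts can equally be read as measures on the vector forecasts themselves.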
Abstract
A sample skill score (SSS), which is based upon a new partition of the probability, or Brier, score (PS) recently described by Murphy, is formulated. The SSS is defined simply as the difference between the PS for the sample relative frequencies, a term in this partition, and the PS for the forecast probabilities. Thus, the SSS is a natural measure of the “skill” of probability forecasts. In addition, the other two terms in the partition of the PS form a useful partition of the SSS. Specifically, the SSS represents the difference between measures of the resolution and the reliability of such forecasts. The nature and properties of the SSS are examined. In this regard, the SSS is shown to be a strictly proper scoring rule (i.e., the SSS discourages hedging on the part of forecasters).
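In one common notation (ours, not necessarily the paper's), writing the three-term partition of the probability score as PS(f) = PS(d̄) + REL − RES, where PS(d̄) is the score of the sample relative frequencies, the definition above gives

$$\mathrm{SSS} = \mathrm{PS}(\bar{d}) - \mathrm{PS}(f) = \mathrm{RES} - \mathrm{REL},$$

i.e., the resolution-minus-reliability form of the skill score.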
The SSS is a difference skill score and is based upon sample relative frequencies, while the scoring rules used heretofore to measure the “skill” of probability forecasts have been ratio skill scores and have been based upon climatological probabilities. First, difference and ratio skill scores are defined and compared. An examination of the properties of these two classes of scoring rules indicates that difference skill scores are, in general, strictly proper, while ratio skill scores are, in general, improper. On the other hand, strictly proper ratio skill scores can be formulated if expected scores as well as actual scores are used to standardize the relevant measures of “accuracy,” and a class of strictly proper ratio skill scores, based in part upon expected scores, is briefly described.
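As a hedged sketch of the two classes (the exact forms used in the paper may differ): with a negatively oriented accuracy measure such as the PS and a reference score PS_ref,

$$\mathrm{SS}_{\text{diff}} = \mathrm{PS}_{\text{ref}} - \mathrm{PS}_{f}, \qquad \mathrm{SS}_{\text{ratio}} = \frac{\mathrm{PS}_{\text{ref}} - \mathrm{PS}_{f}}{\mathrm{PS}_{\text{ref}}} = 1 - \frac{\mathrm{PS}_{f}}{\mathrm{PS}_{\text{ref}}}.$$

When PS_ref is the actual score of sample-based reference forecasts, the denominator of the ratio form is itself random, and the expectation of the ratio need no longer be optimized by honest reporting; standardizing with expected rather than actual reference scores removes this source of impropriety, as noted above.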
Second, the relative merits of using climatological probabilities and sample relative frequencies when formulating skill scores are examined in some detail. The results of this examination indicate that 1) the use of sample relative frequencies instead of climatological probabilities decreases the scores assigned to forecasters by both difference and ratio skill scores, although this decrease is quite small for large collections of forecasts; 2) any adverse psychological effects upon forecasters resulting from the use of sample relative frequencies instead of climatological probabilities (as well as any such effects resulting from the use of the skill scores themselves) can be substantially reduced by subjecting the scoring rules of concern to appropriate linear transformations; 3) the decrease or difference in score resulting from the use of sample relative frequencies does not appear to be a legitimate part of a forecaster's “skill”; 4) adjusting the average forecast probabilities to correspond more closely to the sample relative frequencies does not guarantee that the “skill” of the forecasts will increase; 5) the above-mentioned difference in score, which must be considered when comparing forecasters, forecast offices, etc., should, however, be considered separately from aspects of “skill” such as reliability and resolution; and 6) although, strictly speaking, climatological probabilities and sample relative frequencies are and are not forecasts, respectively (and, as such, are appropriate and inappropriate, respectively, as the bases for skill scores), the difference in score resulting from the use of the latter, which are estimates of the former, will, in general, be quite small.
In summary, the SSS appears to offer certain advantages vis-à-vis other skill scores used heretofore to measure the “skill” of probability forecasts.
Abstract
In this paper, we compare the ranked probability score (RPS) and the probability score (PS) and examine the nature of the sensitivity of the RPS to distance. First, we briefly describe the nature of and the relationship between the frameworks within which the RPS and the PS were formulated. Second, we consider certain properties of the RPS and the PS, including their range, their values for categorical and uniform forecasts, and their “proper” nature. Third, we describe the RPS and the PS in a manner that reveals the structure of and the relationship between these scoring rules. Fourth, we consider the RPS with reference to two definitions of distance and examine the nature of the sensitivity of the RPS to distance. The comparison of the RPS and the PS suggests that the RPS rather than the PS should be used to evaluate probability forecasts, at least in those situations in which the variable of concern is ordered.
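A runnable sketch of this comparison, using the now-common cumulative squared-error form of the RPS (Epstein's original formulation is linearly related to it) and the corresponding quadratic PS; all names are ours:

```python
import numpy as np

def ps(f, obs):
    """Quadratic probability score for one forecast vector f over N categories;
    obs is the index of the observed category. Lower is better."""
    d = np.zeros_like(f)
    d[obs] = 1.0
    return np.sum((f - d) ** 2)

def rps(f, obs):
    """Ranked probability score: squared errors of the cumulative forecast,
    so probability placed far from the observed category costs more."""
    d = np.zeros_like(f)
    d[obs] = 1.0
    return np.sum((np.cumsum(f) - np.cumsum(d)) ** 2)

near = np.array([0.1, 0.8, 0.1, 0.0])   # stray probability adjacent to the truth
far  = np.array([0.1, 0.8, 0.0, 0.1])   # same stray mass, two categories away
for f in (near, far):
    print(f"PS = {ps(f, obs=1):.2f}, RPS = {rps(f, obs=1):.2f}")
```

Both forecasts receive the same PS (0.06), but the forecast whose stray probability lies farther from the observed category receives the worse RPS (0.03 versus 0.02), which is the sensitivity to distance at issue.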
Abstract
A new vector partition of the probability, or Brier, score (PS) is formulated and the nature and properties of this partition are described. The relationships between the terms in this partition and the terms in the original vector partition of the PS are indicated. The new partition consists of three terms: 1) a measure of the uncertainty inherent in the events, or states, on the occasions of concern (namely, the PS for the sample relative frequencies); 2) a measure of the reliability of the forecasts; and 3) a new measure of the resolution of the forecasts. These measures of reliability and resolution are and are not, respectively, equivalent (i.e., linearly related) to the measures of reliability and resolution provided by the original partition. Two sample collections of probability forecasts are used to illustrate the differences and relationships between these partitions. Finally, the two partitions are compared, with particular reference to the attributes of the forecasts with which the partitions are concerned, the interpretation of the partitions in geometric terms, and the use of the partitions as the bases for the formulation of measures to evaluate probability forecasts. The results of these comparisons indicate that the new partition offers certain advantages vis-à-vis the original partition.
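Written out for the two-state case in a common modern notation (ours; the full vector score simply doubles each term): sort the K forecasts into the T distinct probability values f_t, each issued n_t times with observed relative frequency x̄_t of the event, and let x̄ be the overall relative frequency. Then

$$\mathrm{PS} = \underbrace{\bar{x}(1-\bar{x})}_{\text{uncertainty}} + \underbrace{\frac{1}{K}\sum_{t=1}^{T} n_t (f_t - \bar{x}_t)^2}_{\text{reliability}} - \underbrace{\frac{1}{K}\sum_{t=1}^{T} n_t (\bar{x}_t - \bar{x})^2}_{\text{resolution}},$$

where the uncertainty term is precisely the PS obtained by always forecasting the sample relative frequency x̄.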
Abstract
Scalar and vector partitions of the probability score (PS) in N-state (N > 2) situations are described and compared. In N-state, as well as in two-state (N = 2), situations these partitions provide similar, but not equivalent (i.e., linearly related), measures of the reliability and resolution of probability forecasts. Specifically, the vector partition, when compared to the scalar partition, decreases the reliability and increases the resolution of the forecasts. A sample collection of forecasts is used to illustrate the differences between these partitions in N-state situations.
Several questions related to the use of scalar and vector partitions of the PS in N-state situations are discussed, including the relative merits of these partitions and the effect upon sample size when forecasts are considered to be vectors rather than scalars. The discussions indicate that the vector partition appears to be more appropriate, in general, than the scalar partition, and that when the forecasts in a collection of forecasts are considered to be vectors rather than scalars the sample size of the collection may be substantially reduced.
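A small illustration of the sample-size point (the data and names are hypothetical): treating each component of a three-state forecast as a separate scalar forecast triples the nominal sample, whereas the vector view keeps one forecast per occasion and groups occasions by the entire vector.

```python
from collections import Counter

# Hypothetical collection of K = 6 forecasts in an N = 3 state situation.
forecasts = [(0.2, 0.5, 0.3), (0.2, 0.5, 0.3), (0.6, 0.3, 0.1),
             (0.2, 0.5, 0.3), (0.6, 0.3, 0.1), (0.1, 0.1, 0.8)]

# Scalar view: every component is scored as a separate two-state forecast,
# so the collection contains N * K scalar forecasts.
scalars = [p for f in forecasts for p in f]

# Vector view: one forecast per occasion; the partition groups occasions
# that share an identical forecast vector.
groups = Counter(forecasts)

print(f"{len(scalars)} scalar forecasts vs {len(forecasts)} vector forecasts")
print("distinct vectors and their counts n_t:", dict(groups))
```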
Abstract
An individual skill score (SS) and a collective skill score (CSS) are examined to determine whether these scoring rules are proper or improper. The SS and the CSS are both standardized versions of the Brier, or probability, score (PS) and have been used to measure the “skill” of probability forecasts. The SS is defined in terms of individual forecasts, while the CSS is defined in terms of collections of forecasts. The SS and the CSS are shown to be improper scoring rules, and, as a result, both the SS and the CSS encourage hedging on the part of forecasters.
The results of a preliminary investigation of the nature of the hedging produced by the SS and the CSS indicate that, while the SS may encourage a considerable amount of hedging, the CSS, in general, encourages only a modest amount of hedging, and even this hedging decreases as the sample size K of the collection of forecasts increases. In fact, the CSS is approximately strictly proper for large collections of forecasts (K ≥ 100).
Finally, we briefly consider two questions related to the standardization of scoring rules: 1) the use of different scoring rules in the assessment and evaluation tasks, and 2) the transformation of strictly proper scoring rules. With regard to the latter, we identify standardized versions of the PS which are strictly proper scoring rules and which, as a result, appear to be appropriate scoring rules to use to measure the “skill” of probability forecasts.
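A generic, runnable illustration of impropriety and hedging (deliberately not the paper's SS or CSS, whose exact standardizations are specified there): under the strictly proper Brier score the expected penalty is minimized by reporting one's actual probability, whereas under an improper rule, here the absolute-error score, it is minimized by hedging to a categorical report.

```python
import numpy as np

def expected_score(score, r, p):
    """Expected penalty for reporting r when the forecaster's probability is p."""
    return p * score(r, 1) + (1 - p) * score(r, 0)

brier = lambda r, x: (r - x) ** 2      # strictly proper: best report is r = p
absolute = lambda r, x: abs(r - x)     # improper: best report is 0 or 1

p = 0.7
reports = np.linspace(0.0, 1.0, 101)
for name, score in (("Brier", brier), ("absolute error", absolute)):
    best = reports[np.argmin([expected_score(score, r, p) for r in reports])]
    print(f"{name}: belief p = {p} -> best report r = {best:.2f}")
```

Here the Brier optimum is r = 0.70 while the absolute-error optimum is r = 1.00; the same style of expected-score calculation, applied to the SS and the CSS, underlies the hedging results summarized above.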