# Search Results

## Showing 21–30 of 87 items for

- Author or Editor: ALLAN H. MURPHY

## Abstract

Probability of precipitation (PoP) forecasts can often be interpreted as average point probability forecasts. Since the latter are equivalent to (unconditional) expected areal coverage forecasts, PoP forecasts can be evaluated in terms of observed areal coverages in those situations in which observations of precipitation occurrence are available from a network of points in the forecast area. The purpose of this paper is to describe a partition of the average Brier, or probability, score—a measure of the average accuracy of average point probability forecasts over the network of points of concern—that facilitates such an evaluation. The partition consists of two terms: 1) a term that represents the average squared error of the average point probability forecasts interpreted as areal coverage forecasts and 2) a term that represents the average variance of the observations of precipitation occurrence in the forecast area. The relative magnitudes of the terms in this partition are examined, and it is concluded (partly on the basis of experimental data) that the variance term generally makes a significant contribution to the overall probability score. This result, together with the fact that the variance term does not depend on the forecasts, suggests that the squared error term (rather than the overall score) should be used to evaluate PoP forecasts in many situations. The basis for the interpretation of PoP forecasts as average point probability forecasts and some implications of the results presented in this paper for the evaluation of PoP forecasts are briefly discussed.
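The partition described in this abstract can be checked numerically: for a single forecast occasion, the average Brier score over the network equals the squared error of the PoP interpreted as a coverage forecast plus the variance of the point observations. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def brier_partition(pop, obs):
    """pop: areal-average point probability forecast (scalar in [0, 1]).
    obs: 0/1 precipitation occurrence at each network point."""
    obs = np.asarray(obs, dtype=float)
    coverage = obs.mean()                    # observed areal coverage
    brier = np.mean((pop - obs) ** 2)        # average Brier score over points
    sq_error = (pop - coverage) ** 2         # PoP treated as a coverage forecast
    variance = coverage * (1.0 - coverage)   # variance of the point observations
    return brier, sq_error, variance

bs, se, var = brier_partition(0.6, [1, 1, 0, 1, 0])
# Identity: average Brier score = squared coverage error + observation variance
assert abs(bs - (se + var)) < 1e-12
```

Because the variance term depends only on the observations, a perfect coverage forecast (as above, where the PoP equals the observed coverage) still receives a nonzero Brier score, which is the abstract's argument for scoring the squared error term separately.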

## Abstract

This paper is concerned with the value of climatological, categorical, probabilistic and perfect forecasts in the cost-loss ratio situation. Expressions are derived for the expense associated with these different types of forecasts, and measures of value and relative value are formulated in terms of these expressions. Some relationships among these expressions and measures are described, and these relationships are illustrated by examining both hypothetical and real sets of forecasts.

It is demonstrated that, if the probabilistic forecasts of concern are (completely) reliable, then the value of these forecasts is greater than the value of climatological and categorical forecasts for all activities or operations (i.e., for all values of the cost‐loss ratio *C*/*L*). On the other hand, if the forecasts are unreliable, then the value of climatological and/or categorical forecasts may be greater than the value of probabilistic forecasts for some values of *C*/*L*. However, examination of hypothetical and real sets of unreliable forecasts indicates that the relationships between the value of *reliable* probabilistic forecasts and the value of climatological and categorical forecasts are quite robust in the sense that these relationships appear to hold for most if not all values of *C*/*L* even for moderately unreliable forecasts.

The results presented in this paper have important implications for operational forecasting procedures and practices. These implications relate to the desirability of formulating and disseminating a wide variety of weather forecasts in probabilistic terms and of achieving and maintaining a high degree of reliability in probabilistic forecasts.
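The expense comparison underlying these value measures can be sketched as follows, assuming a cost *C* to protect and a loss *L* if adverse weather occurs unprotected: the climatological strategy takes the cheaper fixed action, perfect forecasts trigger protection exactly when needed, and probabilistic forecasts trigger protection whenever the probability exceeds *C*/*L*. All names and data here are illustrative:

```python
import numpy as np

def expected_expenses(probs, events, C, L):
    """Average expense per occasion for three reference strategies."""
    probs = np.asarray(probs, dtype=float)
    events = np.asarray(events, dtype=float)
    base_rate = events.mean()
    clim = min(C, base_rate * L)        # climatological: one fixed action
    perfect = base_rate * C             # protect exactly when the event occurs
    protect = probs > C / L             # probabilistic decision rule
    prob = np.mean(np.where(protect, C, events * L))
    return clim, prob, perfect

clim, prob, perfect = expected_expenses(
    probs=[0.9, 0.8, 0.1, 0.2, 0.1], events=[1, 1, 0, 0, 0], C=2.0, L=10.0)
# For reliable forecasts: perfect expense <= probabilistic <= climatological
assert perfect <= prob <= clim
```

A relative-value measure then standardizes the probabilistic expense between the climatological and perfect expenses; for unreliable forecasts the middle inequality above can fail, which is the case the abstract examines.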

## Abstract

Scalar and vector partitions of the ranked probability score, RPS, are described and compared. These partitions are formulated in the same manner as the scalar and vector partitions of the probability score, PS, recently described by Murphy. However, since the RPS is defined in terms of cumulative probability distributions, the scalar and vector partitions of the RPS provide measures of the reliability and resolution of scalar and vector *cumulative* forecasts, respectively. The scalar and vector partitions of the RPS provide similar, but not equivalent (i.e., linearly related), measures of these attributes. Specifically, the reliability (resolution) of cumulative forecasts according to the scalar partition is equal to or greater (less) than their reliability (resolution) according to the vector partition. A sample collection of forecasts is used to illustrate the differences between the scalar and vector partitions of the RPS and between the vector partitions of the RPS and the PS.

Several questions related to the interpretation and use of the scalar and vector partitions of the RPS are briefly discussed, including the information that these partitions provide about the reliability and resolution of forecasts (as opposed to cumulative forecasts) and the relative merits of these partitions. These discussions indicate that, since a one-to-one correspondence exists between vector and vector cumulative forecasts, the vector partition of the RPS can also be considered to provide measures of the reliability and resolution of vector forecasts and that the vector partition is generally more appropriate than the scalar partition.
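Since the RPS is defined on cumulative probability distributions, a minimal computation for a single vector forecast looks like the sketch below (names are illustrative; some later formulations also normalize by the number of categories minus one):

```python
import numpy as np

def rps(forecast, observed_category):
    """forecast: probabilities over ordered categories (sums to 1).
    observed_category: index of the category that occurred."""
    forecast = np.asarray(forecast, dtype=float)
    F = np.cumsum(forecast)              # cumulative forecast vector
    O = np.zeros_like(F)
    O[observed_category:] = 1.0          # cumulative observation vector
    return np.sum((F - O) ** 2)

# A sharp, correct forecast scores 0; spreading probability away from the
# observed category increases the score.
assert rps([0.0, 1.0, 0.0], 1) == 0.0
```

Because the score compares cumulative vectors, partitions of the RPS naturally measure the reliability and resolution of *cumulative* forecasts, which is the distinction the abstract draws.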

## Abstract

In this paper we describe and compare two models of the familiar cost-loss ratio situation. This situation involves a decision maker who must decide whether or not to take protective action, with respect to some activity or operation, in the face of uncertainty as to whether or not weather adverse to the activity will occur. The original model, first described by J.C. Thompson, is based in part upon the (implicit) assumption that taking protective action *completely* eliminates the loss associated with the occurrence of adverse weather. In the model formulated in this paper, on the other hand, it is assumed that taking protective action may reduce *or* eliminate this loss. The original model, then, is a special case of this “generalized” model. We show that the decision rule in each model depends upon a cost-loss ratio and that *in both models* this ratio is simply the cost of protection divided by the protectable portion of the loss. Thus the two models are equivalent from a decision-making point of view. This result also implies that the original model is applicable to a wider class of decision-making situations than has generally been recognized heretofore.

We also formulate measures of the value of probability forecasts within the frameworks of these models. First, the expenses (i.e., costs and losses) are translated into utilities, which are assumed to express the decision maker's preferences for the consequences. Then, probabilistic specifications of the utilities are briefly discussed and general expressions are presented for the appropriate measures of value in cost-loss ratio situations with such specifications, namely expected-utility measures. Finally, we formulate the expected-utility measure associated with each model when the relevant utilities are assumed to possess a uniform probability distribution. Both measures are then shown to be *equivalent* (i.e., linearly related) to the Brier, or probability, score, a familiar measure of the accuracy of probability forecasts. These results provide additional support for the use of the probability score as an evaluation measure.
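The decision rule shared by the two models can be sketched directly: protection is worthwhile exactly when the probability of adverse weather exceeds the cost of protection divided by the *protectable* portion of the loss. In the hypothetical function below, setting the unprotectable loss to zero recovers the original Thompson model:

```python
def should_protect(p, cost, loss_total, loss_unprotectable=0.0):
    """Compare expected expenses of protecting vs. not protecting.

    Protecting costs `cost` and still incurs the unprotectable loss if the
    adverse event occurs; not protecting risks the whole loss.
    """
    expense_protect = cost + p * loss_unprotectable
    expense_no_protect = p * loss_total
    return expense_protect < expense_no_protect

# Equivalent threshold form: protect iff p > cost / (protectable loss).
assert should_protect(0.3, cost=2.0, loss_total=10.0) == (0.3 > 2.0 / 10.0)
assert should_protect(0.3, cost=2.0, loss_total=10.0,
                      loss_unprotectable=4.0) == (0.3 > 2.0 / 6.0)
```

The second assertion shows why the generalized model matters in practice: with the same cost and total loss, a partially protectable loss raises the probability threshold at which protection pays.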

## Abstract

No abstract available.

## Abstract

No abstract available.

## Abstract

A *sample skill score* (SSS), which is based upon a new partition of the probability, or Brier, score (PS) recently described by Murphy, is formulated. The SSS is defined simply as the difference between the PS for the sample relative frequencies, a term in this partition, and the PS for the forecast probabilities. Thus, the SSS is a *natural* measure of the “skill” of probability forecasts. In addition, the other two terms in the partition of the PS form a useful partition of the SSS. Specifically, the SSS represents the difference between measures of the resolution and the reliability of such forecasts. The nature and properties of the SSS are examined. In this regard, the SSS is shown to be a *strictly proper* scoring rule (i.e., the SSS discourages hedging on the part of forecasters).

The SSS is a difference skill score and is based upon sample relative frequencies, while the scoring rules used heretofore to measure the “skill” of probability forecasts have been ratio skill scores and have been based upon climatological probabilities. First, difference and ratio skill scores are defined and compared. An examination of the properties of these two classes of scoring rules indicates that difference skill scores are, in general, strictly proper, while ratio skill scores are, in general, improper. On the other hand, strictly proper ratio skill scores can be formulated if expected scores as well as actual scores are used to standardize the relevant measures of “accuracy,” and a class of strictly proper ratio skill scores, based in part upon expected scores, is briefly described.

Second, the relative merits of using climatological probabilities and sample relative frequencies when formulating skill scores are examined in some detail. The results of this examination indicate that 1) the use of sample relative frequencies instead of climatological probabilities decreases the scores assigned to forecasters by both difference and ratio skill scores, although this decrease is quite small for large collections of forecasts; 2) any adverse psychological effects upon forecasters resulting from the use of sample relative frequencies instead of climatological probabilities (as well as any such effects resulting from the use of the skill scores themselves) can be substantially reduced by subjecting the scoring rules of concern to appropriate linear transformations; 3) the decrease or difference in score resulting from the use of sample relative frequencies does not appear to be a legitimate part of a forecaster's “skill”; 4) adjusting the average forecast probabilities to correspond more closely to the sample relative frequencies does not guarantee that the “skill” of the forecasts will increase; 5) the above-mentioned difference in score, which must be considered when comparing forecasters, forecast offices, etc., should, however, be considered separately from aspects of “skill” such as reliability and resolution; and 6) although, strictly speaking, climatological probabilities and sample relative frequencies are and are not forecasts, respectively (and, as such, are appropriate and inappropriate, respectively, as the bases for skill scores), the difference in score resulting from the use of the latter, which are estimates of the former, will, in general, be quite small.

In summary, the SSS appears to offer certain advantages vis-a-vis other skill scores used heretofore to measure the “skill” of probability forecasts.
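The SSS and its resolution-minus-reliability partition can be checked numerically by grouping forecasts on their distinct probability values, as in Murphy's partition of the PS. A sketch with illustrative names and data:

```python
import numpy as np

def sss_partition(probs, events):
    """Return the sample skill score and its resolution - reliability form."""
    probs = np.asarray(probs, dtype=float)
    events = np.asarray(events, dtype=float)
    n = len(probs)
    base = events.mean()                          # sample relative frequency
    ps_forecast = np.mean((probs - events) ** 2)  # PS for forecast probabilities
    ps_sample = base * (1.0 - base)               # PS for the base rate itself
    sss = ps_sample - ps_forecast
    rel = res = 0.0
    for p in np.unique(probs):                    # group by distinct probabilities
        mask = probs == p
        ob = events[mask].mean()
        rel += mask.sum() / n * (p - ob) ** 2     # reliability contribution
        res += mask.sum() / n * (ob - base) ** 2  # resolution contribution
    return sss, res - rel

sss, decomposed = sss_partition([0.8, 0.8, 0.2, 0.2], [1, 1, 0, 1])
assert abs(sss - decomposed) < 1e-12   # SSS = resolution - reliability
```

A negative SSS, as in this toy sample, means the forecasts' reliability penalty outweighed their resolution, i.e., they scored worse than the sample relative frequency used as the reference.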

## Abstract

Several issues related to the mode of expression of forecasts of rare events (RSEs) are addressed in this paper. These issues include the correspondence between forecasters' judgments and their forecasts, the problem of overforecasting, and the use of forecasts as a basis for rational decision making. Neither forecasters nor users are well served by current practices, according to which operational forecasts of RSEs are generally expressed in a categorical format.

It is argued here that sound scientific and economic reasons exist for expressing forecasts of RSEs in terms of probabilities. Although quantification of uncertainty in forecasts of RSEs—and the communication of such information to users—presents some special problems, evidence accumulated from a multitude of operational and experimental probabilistic weather forecasting programs suggests that these problems involve no insurmountable difficulties. Moreover, when a probabilistic format is employed, forecasts of RSEs can correspond to forecasters’ true judgments, the forecasting and decision-making tasks can be disentangled, the rationale for overforecasting RSEs is eliminated, and the needs of *all* users can be met in an optimal manner.

Since the probabilities of RSEs seldom achieve high values, it might be desirable to provide users with information concerning the likelihood of such events relative to their climatological likelihood. Alternatively, the relative odds—that is, the ratio of an event's forecast odds to its climatological odds—could be reported. This supplemental information should help to focus users’ attention on those occasions on which the probability of RSEs is *relatively* high.
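The relative-odds suggestion is straightforward to compute; a small illustrative sketch (names are ours):

```python
def odds(p):
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)

def relative_odds(p_forecast, p_climatology):
    """Ratio of the event's forecast odds to its climatological odds."""
    return odds(p_forecast) / odds(p_climatology)

# A 10% forecast for an event with a 1% climatological probability is a
# large relative signal even though the probability itself is small.
ratio = relative_odds(0.10, 0.01)
```

Reporting `ratio` alongside the probability conveys exactly the point of the abstract: the forecast probability is low in absolute terms but roughly an order of magnitude above climatology.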

## Abstract

Skill scores measure the accuracy of the forecasts of interest relative to the accuracy of forecasts based on naive forecasting methods, with either climatology or persistence usually playing the role of the naive method. In formulating skill scores, it is generally agreed that the naive method that produces the most accurate forecasts should be chosen as the standard of reference. The conditions under which climatological forecasts are more accurate than persistence forecasts—and vice versa—were first described in the meteorological literature more than 30 years ago. At about the same time, it was also shown that a linear combination of climatology and persistence produces more accurate forecasts than either of these standards of reference alone. Surprisingly, these results have had relatively little if any impact on the practice of forecast verification in general and the choice of a standard of reference in formulating skill scores in particular.

The purposes of this paper are to describe these results and discuss their implications for the practice of forecast verification. Expressions for the mean-square errors of forecasts based on climatology, persistence, and an optimal linear combination of climatology and persistence—as well as expressions for the respective skill scores—are presented and compared. These pairwise comparisons identify the conditions under which each naive method is superior as a standard of reference. Since the optimal linear combination produces more accurate forecasts than either climatology or persistence alone, it leads to lower skill scores than the other two naive forecasting methods. Decreases in the values of the skill scores associated with many types of operational weather forecasts can be anticipated if the optimal linear combination of climatology and persistence is used as a standard of reference. The conditions under which this practice might lead to substantial decreases in such skill scores are identified.
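The comparison can be illustrated on synthetic data: for a stationary series, the optimal linear combination weights the departure of the previous value from the mean by the lag-1 autocorrelation, and its mean-square error is no larger than that of climatology or persistence alone. A sketch using an AR(1) series (all names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
r_true, n = 0.6, 20000
x = np.zeros(n)
for t in range(1, n):                        # synthetic AR(1) series
    x[t] = r_true * x[t - 1] + rng.normal()

mean = x.mean()
r = np.corrcoef(x[:-1], x[1:])[0, 1]         # estimated lag-1 autocorrelation

mse_clim = np.mean((mean - x[1:]) ** 2)      # climatology: forecast the mean
mse_pers = np.mean((x[:-1] - x[1:]) ** 2)    # persistence: forecast last value
combo = mean + r * (x[:-1] - mean)           # optimal linear combination
mse_combo = np.mean((combo - x[1:]) ** 2)

# The combination is at least as accurate as either reference alone, so a
# skill score referenced to it is lower than one referenced to climatology
# or persistence.
assert mse_combo <= min(mse_clim, mse_pers)
```

With an autocorrelation above 0.5, persistence beats climatology here, but the combination beats both, which is why adopting it as the standard of reference would lower operational skill scores as the abstract anticipates.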

## Abstract

Differences of opinion exist among forecasters—and between forecasters and users—regarding the meaning of the phrase “good (bad) weather forecasts.” These differences of opinion are fueled by a lack of clarity and/or understanding concerning the nature of goodness in weather forecasting. This lack of clarity and understanding complicates the processes of formulating and evaluating weather forecasts and undermines their ultimate usefulness.

Three distinct types of goodness are identified in this paper: 1) the correspondence between forecasters’ judgments and their forecasts (type 1 goodness, or *consistency*), 2) the correspondence between the forecasts and the matching observations (type 2 goodness, or *quality*), and 3) the incremental economic and/or other benefits realized by decision makers through the use of the forecasts (type 3 goodness, or *value*). Each type of goodness is defined and described in some detail. In addition, issues related to the measurement of consistency, quality, and value are discussed.

Relationships among the three types of goodness are also considered. It is shown by example that the level of consistency directly impacts the levels of both quality and value. Moreover, recent studies of quality/value relationships have revealed that these relationships are inherently nonlinear and may not be monotonic unless the multifaceted nature of quality is respected. Some implications of these considerations for various practices related to operational forecasting are discussed. Changes in these practices that could enhance the goodness of weather forecasts in one or more respects are identified.
