# Search Results

## You are looking at 51–60 of 87 items for

- Author or Editor: ALLAN H. MURPHY

## Abstract

The sufficiency relation, originally developed in the context of the comparison of statistical experiments, provides a sound basis for the comparative evaluation of forecasting systems. The importance of this relation resides in the fact that if forecasting system *A* can be shown to be sufficient for forecasting system *B*, then all users will find *A*'s forecasts of greater value than *B*'s forecasts regardless of their individual payoff structures.

In this paper the sufficiency relation is applied to the problem of comparative evaluation of prototypical climate forecasting systems. The primary objectives here are to assess the basic applicability of the sufficiency relation in this context and to investigate the implications of this approach for the relationships among the performance characteristics of such forecasting systems.

The results confirm that forecasting system *A* is sufficient for forecasting system *B* when the former uses more extreme probabilities more frequently than the latter. Further, for the relatively simple forecasting systems considered here, it is found that system *A* may be sufficient for system *B* even if the former uses extreme forecasts less frequently, provided that *A*'s forecasts are, to a certain degree, more extreme than *B*'s forecasts. Conversely, system *A* cannot be shown to be sufficient for system *B* if the former uses less extreme forecasts more frequently than the latter. The advantages of the sufficiency relation over traditional performance measures in this context are also demonstrated.

Several issues related to the general applicability of the sufficiency relation to the comparative evaluation of climate forecasts are discussed. Possible extensions of this work, as well as some implications of the results for verification procedures and practices in this context, are briefly described.

## Abstract

A time-dependent version of the cost-loss ratio situation is described and the optimal use and economic value of meteorological information are investigated in this decision-making problem. The time-dependent situation is motivated by a decision maker who contemplates postponing the protect/do not protect decision in anticipation of obtaining more accurate forecasts at some later time (i.e., shorter lead time), but who also recognizes that the cost of protection will increase as lead time decreases. Imperfect categorical forecasts, calibrated according to past performance, constitute the information of primary interest. Optimal decisions are based on minimizing expected expense and the value of information is measured relative to the expected expense associated with climatological information.

Accuracy and cost of protection are modeled as exponentially decreasing functions of lead time, and time-dependent expressions for expected expense and value of information are derived. An optimal lead time is identified that corresponds to the time at which the expected expense associated with imperfect forecasts attains its minimum value. The effects of the values of the parameters in the accuracy and cost-of-protection models on expected expense, optimal lead time, and forecast value are examined. Moreover, the optimal lead time is shown to differ in some cases from the lead time at which the economic value of imperfect forecasts is maximized. Numerical examples are presented to illustrate the various results. The implications of these results are discussed and some possible extensions of this work are suggested.
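
The structure of this trade-off can be sketched numerically. In the toy parameterization below, protection cost decays toward a floor as lead time grows while forecast accuracy decays toward climatology; the functional forms and all parameter values are illustrative, not the paper's:

```python
import numpy as np

p, L = 0.3, 1.0                    # climatological probability, loss
C_min, C_max, b, k = 0.32, 1.0, 1.0, 0.2

t = np.linspace(0.0, 10.0, 1001)   # lead times
C = C_min + (C_max - C_min) * np.exp(-b * t)   # cost of protecting at lead t
a = np.exp(-k * t)                 # accuracy: 1 at t = 0, 0 is climatology
p1 = p + (1 - p) * a               # P(event | "adverse" forecast)
p0 = p * (1 - a)                   # P(event | "no adverse" forecast)
# Calibration check: p * p1 + (1 - p) * p0 == p at every lead time.

# The decision maker protects iff the conditional probability of the
# event exceeds C(t) / L, so the expected expense at each lead time is:
ee = p * np.minimum(C, p1 * L) + (1 - p) * np.minimum(C, p0 * L)
t_opt = t[np.argmin(ee)]           # optimal lead time
print(t_opt, ee.min())
```

With these numbers the minimum expected expense occurs at an intermediate lead time (near t ≈ 2): at short lead times protection is too expensive, while at long lead times the forecasts are too close to climatology to act on.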

## Abstract

In this paper the sufficiency relation is used to compare objective and subjective probability of precipitation (PoP) forecasts. The theoretical significance of the sufficiency relation in comparative evaluation arises from the fact that if it can be shown that forecasting system A is sufficient for forecasting system B, then A's forecasts are necessarily of higher quality and greater value to all users than B's forecasts. However, since the sufficiency relation is an incomplete order (it is not always possible to show that system A is sufficient for system B, or vice versa), the practical significance of this relation warrants further investigation.

An operational method of comparing forecasting systems using the sufficiency relation has recently been described in the forecasting literature. This method involves the construction of a so-called *forecast sufficiency characteristic* (FSC) for each forecasting system, based on a representative set of forecasts and observations. In terms of this characterization, system A is sufficient for system B if A's FSC is superior to B's FSC.

Objective and subjective PoP forecasts for six National Weather Service offices are compared here in terms of their respective FSCs. Sufficiency was found in only two of the 24 cases defined by various combinations of forecast office, season, and lead time. In these two cases, both involving 12–24 hour forecasts, the subjective forecasts were sufficient for the objective forecasts. Several other cases exhibited a condition described as “almost sufficient,” but caution must be exercised in drawing conclusions regarding the relative quality and/or relative value of forecasts in such cases. Comparison of the FSCs of PoP forecasts and the FSCs of categorical forecasts derived from the PoP forecasts reveals that, as expected, the former are always superior to the latter. The implications of these results for comparative evaluation of weather forecasting systems are discussed and some possible extensions of this work are identified.

## Abstract

Many skill scores used to evaluate categorical forecasts of discrete variables are inequitable, in the sense that constant forecasts of some events lead to better scores than constant forecasts of other events. Inequitable skill scores may encourage forecasters to favor some events at the expense of other events, thereby producing forecasts that exhibit systematic biases or other undesirable characteristics.

This paper describes a method of formulating *equitable skill scores* for categorical forecasts of nominal and ordinal variables. Equitable skill scores are based on scoring matrices, which assign scores to the various combinations of forecast and observed events. The basic tenets of equitability require that (i) all constant forecasts (and random forecasts) receive the same expected score, and (ii) the elements of scoring matrices do not depend on the elements of performance matrices. Scoring matrices are assumed here to be symmetric and to possess other reasonable properties related to the nature of the underlying variable. To scale the elements of scoring matrices, the expected scores for constant and random forecasts are set equal to zero and the expected score for perfect forecasts is set equal to one. Taken together, these conditions are necessary but generally not sufficient to determine uniquely the elements of a scoring matrix. To obtain a unique scoring matrix, additional conditions must be imposed or some scores must be specified a priori.

Equitable skill scores are illustrated here by considering specific situations as well as numerical examples. These skill scores possess several desirable properties: (i) the score assigned to a correct forecast of an event increases as the climatological probability of the event decreases, and (ii) scoring matrices in (*n*+1)-event and *n*-event situations may be made consistent, in the sense that the former approaches the latter as the climatological probability of one of the events approaches zero. Several possible extensions and applications of this method are discussed.
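
In the two-event case, the scaling conditions described above determine the scoring matrix uniquely; a short numerical check (the climatological probabilities are illustrative):

```python
import numpy as np

# Events 1 and 2 with climatological probabilities p1 and p2 = 1 - p1.
# Unknowns: symmetric scoring-matrix entries s11, s22, s12.  Conditions:
#   p1*s11 + p2*s12 = 0   constant forecasts of event 1 score zero
#   p2*s22 + p1*s12 = 0   constant forecasts of event 2 score zero
#   p1*s11 + p2*s22 = 1   perfect forecasts score one
p1, p2 = 0.2, 0.8
A = np.array([[p1, 0.0, p2],
              [0.0, p2, p1],
              [p1, p2, 0.0]])
s11, s22, s12 = np.linalg.solve(A, [0.0, 0.0, 1.0])
print(s11, s22, s12)   # closed form: s11 = p2/p1, s22 = p1/p2, s12 = -1
```

Random forecasts automatically receive an expected score of zero as well, since any random forecast is a mixture of constant forecasts. Note also that the correct-forecast score s11 = p2/p1 grows as event 1 becomes climatologically rarer, which is the first desirable property listed above.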

## Abstract

Attributes of the anomaly correlation coefficient, as a model verification measure, are investigated by exploiting a recently developed method of decomposing skill scores into other measures of performance. A mean square error skill score based on historical climatology is decomposed into terms involving the anomaly correlation coefficient, the conditional bias in the forecast, the unconditional bias in the forecast, and the difference between the mean historical and sample climatologies. This decomposition reveals that the square of the anomaly correlation coefficient should be interpreted as a measure of *potential* rather than actual skill.

The decomposition is applied to a small sample of geopotential height field forecasts, for lead times from one to ten days, produced by the medium range forecast (MRF) model. After about four days, the actual skill of the MRF forecasts (as measured by the “climatological skill score”) is considerably less than their potential skill (as measured by the anomaly correlation coefficient), due principally to the appearance of substantial conditional biases in the forecasts. These biases, and the corresponding loss of skill, represent the penalty associated with retaining “meteorological” features in the geopotential height field when such features are not predictable. Some implications of these results for the practice of model verification are discussed.
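
The decomposition can be checked numerically. The sketch below uses synthetic forecasts and observations and references the *sample* climatology, which drops the historical-versus-sample term; the remaining terms are the squared correlation, the conditional bias, and the unconditional bias:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)                        # "observations"
f = 0.8 * x + 0.3 + 0.4 * rng.normal(size=1000)  # biased, noisy "forecasts"

ss = 1 - np.mean((f - x) ** 2) / np.var(x)       # MSE skill vs sample climatology
r = np.corrcoef(f, x)[0, 1]                      # correlation coefficient
cond_bias = (r - f.std() / x.std()) ** 2         # conditional bias term
uncond_bias = ((f.mean() - x.mean()) / x.std()) ** 2  # unconditional bias term

assert np.isclose(ss, r ** 2 - cond_bias - uncond_bias)
print(r ** 2, ss)   # potential skill (r squared) exceeds actual skill
```

Because the two bias terms are nonnegative, the squared correlation is an upper bound on the actual skill score, which is the sense in which it measures *potential* rather than actual skill.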

## Abstract

The concept of sufficiency, originally introduced in the context of the comparison of statistical experiments, has recently been shown to provide a coherent basis for comparative evaluation of forecasting systems. Specifically, forecasting system A is said to be sufficient for forecasting system B if B's forecasts can be obtained from A's forecasts by a stochastic transformation. The sufficiency of A's forecasts for B's forecasts implies that the former are of higher quality than the latter and that all users will find A's forecasts of greater value than B's forecasts. However, it is not always possible to establish that system A is sufficient for system B or vice versa. This paper examines the concept of sufficiency in the context of comparative evaluation of simple probabilistic weather forecasting systems and investigates its interpretations and implications from perspectives provided by a recently developed general framework for forecast verification.

It is shown here that if system A is sufficient for system B, then the basic performance characteristics of the two systems are related via sets of inequalities and A's forecasts are necessarily more accurate than B's forecasts. Conversely, knowledge of a complete set of performance characteristics makes it possible to infer whether A is sufficient for B, B is sufficient for A, or the two systems are insufficient for each other. In general, however, information regarding only relative accuracy, as measured by a performance measure such as the mean square error, will *not* be adequate to determine the presence or absence of sufficiency, except in situations in which the accuracy of the system of interest exceeds some relatively high critical value. These results, illustrated by means of numerical examples, suggest that comparative evaluation of weather forecasting systems should be based on fundamental performance characteristics rather than on overall performance measures.

Possible extensions of these results to situations involving more general forecasting systems, as well as some implications of the results for verification procedures and practices in meteorology, are briefly discussed.
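
For small discrete systems, the stochastic-transformation definition can be checked directly as a linear feasibility problem: A is sufficient for B if a stochastic matrix M exists with PM = Q, where the rows of P and Q hold each system's conditional distributions of forecasts given the observation. A sketch under these assumptions (the likelihood matrices and the helper `is_sufficient` are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def is_sufficient(P, Q):
    """P[x, i]: prob. that system A issues forecast i given observation x;
    Q[x, j]: likewise for system B.  A is sufficient for B iff some
    stochastic matrix M (entries >= 0, rows summing to 1) gives P @ M == Q,
    a zero-objective linear feasibility problem in M's entries."""
    nx, kA = P.shape
    kB = Q.shape[1]
    n = kA * kB                            # unknowns: M flattened row-major
    A_eq, b_eq = [], []
    for x in range(nx):                    # P[x] @ M == Q[x], entrywise
        for j in range(kB):
            row = np.zeros(n)
            row[j::kB] = P[x]              # coefficient P[x, i] on m[i, j]
            A_eq.append(row); b_eq.append(Q[x, j])
    for i in range(kA):                    # each row of M sums to one
        row = np.zeros(n)
        row[i * kB:(i + 1) * kB] = 1.0
        A_eq.append(row); b_eq.append(1.0)
    res = linprog(np.zeros(n), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, 1)] * n, method="highs")
    return res.status == 0                 # 0 = a feasible M was found

P = np.array([[0.8, 0.2], [0.2, 0.8]])     # system A's likelihoods
M = np.array([[0.9, 0.1], [0.2, 0.8]])     # a known stochastic degradation
Q = P @ M                                  # system B's likelihoods
print(is_sufficient(P, Q), is_sufficient(Q, P))
```

By construction B's forecasts are a stochastic degradation of A's, so the first call should report sufficiency while the reverse direction should not, illustrating that sufficiency is an incomplete (one-directional) ordering here.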

## Abstract

A general framework for forecast verification based on the joint distribution of forecasts and observations is described. For further elaboration of the framework, two factorizations of the joint distribution are investigated: 1) the calibration-refinement factorization, which involves the conditional distributions of observations given forecasts and the marginal distribution of forecasts, and 2) the likelihood-base rate factorization, which involves the conditional distributions of forecasts given observations and the marginal distribution of observations. The names given to the factorizations reflect the fact that they relate to different attributes of the forecasts and/or observations. Several examples are used to illustrate the interpretation of these factorizations in the context of verification and to describe the relationship between the respective factorizations.

Some insight into the potential utility of the framework is provided by demonstrating that basic elements and summary measures of the joint, conditional, and marginal distributions play key roles in current verification methods. The need for further investigation of the implications of this framework for verification theory and practice is emphasized, and some possible directions for future research in this area are identified.
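
For a discrete joint distribution, the two factorizations are simple array operations; a sketch with an illustrative three-forecast, two-observation joint distribution:

```python
import numpy as np

# Joint distribution p(f, x): rows index forecasts f in {0.0, 0.5, 1.0},
# columns index observations x in {no event, event}.
joint = np.array([[0.36, 0.04],
                  [0.15, 0.15],
                  [0.04, 0.26]])
assert np.isclose(joint.sum(), 1.0)

# Calibration-refinement: conditional p(x | f) and marginal p(f)
p_f = joint.sum(axis=1)
cal = joint / p_f[:, None]        # each row is p(x | f)

# Likelihood-base rate: conditional p(f | x) and marginal p(x)
p_x = joint.sum(axis=0)
lik = joint / p_x[None, :]        # each column is p(f | x)

# Both factorizations recover the same joint distribution.
assert np.allclose(cal * p_f[:, None], joint)
assert np.allclose(lik * p_x[None, :], joint)
print(cal[2, 1])                  # p(event | f = 1.0), a calibration entry
```

The same arrays expose the familiar summary measures: the rows of `cal` describe calibration, `p_f` describes refinement (how often each forecast is used), and `p_x` is the base rate.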

## Abstract

On most forecasting occasions forecasts are made for several successive periods, but decision-making models have traditionally neglected the impact of the potentially useful information contained in forecasts for periods beyond the initial period. The use and value of multiple-period forecasts are investigated here in the context of a recently developed dynamic model of the basic cost-loss ratio situation. We also extend previous studies of this model by examining the impacts—on forecast use and value—of assuming (i) that weather events in successive periods are dependent and (ii) that the forecasts of interest are expressed in probabilistic terms. In this regard, expressions are derived for the expected expenses associated with the use of climatological, imperfect (categorical or probabilistic) and perfect multiple-period forecasts under conditions of dependence and independence between events.

Numerical results are presented concerning expected expense and economic value, based on artificially generated forecasts that incorporate the effects of the decrease in forecast quality as lead time increases. Comparisons are made between multiple-period and single-period forecasts, between dependent and independent events, and between probabilistic and categorical forecasts. For some values of the relevant parameters (e.g., cost-loss ratio, climatological probability), the availability of information for longer lead times can substantially increase economic value. It appears, however, that (i) current imperfect forecasts achieve relatively little of this potential value and (ii) improvements in forecasts at longer lead times must be accompanied by improvements at the shortest lead times for these benefits to be realized. Dependence (i.e., persistence) between events generally reduces weather-related expected expenses, sometimes quite substantially, and consequently reduces forecast value. The results also demonstrate once again the potential economic benefits of expressing forecasts in a probabilistic rather than a categorical format.
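
The expected-expense expressions rest on the single-period cost-loss building blocks, which can be sketched as follows (all numbers are illustrative): climatological information gives expected expense min(C, pL), perfect forecasts give pC, and calibrated probabilistic forecasts give the expectation of min(C, qL) over the distribution of forecast probabilities q:

```python
import numpy as np

C, L, p = 0.2, 1.0, 0.3      # protection cost, potential loss, climatology

ee_clim = min(C, p * L)      # protect always or never, whichever is cheaper
ee_perfect = p * C           # protect exactly when the event will occur

# Calibrated probabilistic forecasts: forecast probabilities q occurring
# with relative frequencies w, whose mean equals the climatological p.
q = np.array([0.0, 0.2, 0.4, 0.6])
w = np.array([0.25, 0.25, 0.25, 0.25])
assert np.isclose((w * q).sum(), p)
ee_prob = (w * np.minimum(C, q * L)).sum()   # protect iff q * L > C

print(ee_clim, ee_prob, ee_perfect)          # value = ee_clim - ee_prob
```

For calibrated forecasts (with C ≤ L) the ordering ee_perfect ≤ ee_prob ≤ ee_clim always holds, which is why forecast value is measured as the reduction in expected expense relative to climatological information.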

## Abstract

The economic value of current and hypothetically improved seasonal precipitation forecasts is estimated for a regionally important haying/pasturing problem in western Oregon by modeling and analyzing the problem in a decision-analytic framework. Although current forecasts are found to be of relatively little value in this decision-making problem, moderate increases in the quality of the forecasts would lead to substantial increases in their value. The quality/value relationship is sensitive to changes in various economic parameters, including the decision maker's attitude toward risk.

## Abstract

Subjective probability forecasts of wind speed, visibility and precipitation events for six-hour periods have been prepared on an experimental basis by forecasters at Zierikzee in The Netherlands since October 1980. Results from the first year of the experiment were encouraging, but they revealed a substantial amount of overforecasting (i.e., a strong tendency for forecast probabilities to exceed observed relative frequencies) for all events, periods and forecasters. Moreover, this overforecasting was reflected in a rapid deterioration in the skill of the forecasts as a function of lead time. In October 1981 the forecasters were given extensive feedback concerning their individual and collective performance during the first year of the experimental program. The purpose of this paper is to compare the results of the first and second years of the experiment.

Evaluation of the forecasts formulated in the first and second years of the Zierikzee experiment reveals marked improvements in reliability (i.e., reductions in overforecasting) from year 1 to year 2, both overall and for most stratifications of the results by event, period or forecaster. For example, the reliability of the forecasts increased for all events and periods and for three of the four forecasters. The improvements in reliability are reflected in substantial increases in the skill of the forecasts from year 1 to year 2, with overall skill scores for the second (first) year for the wind speed, visibility and precipitation forecasts of 25.4% (13.9%), 22.4% (12.4%) and 0.5% (−24.7%), respectively. These improvements in performance are attributed to the feedback provided to the forecasters at the beginning of the second year of the experiment and to the experience in probability forecasting gained by the forecasters during the first year of the program.

The paper concludes with a brief discussion of the results and their implications for probability forecasting in meteorology.
