Search Results
You are looking at 1 - 10 of 87 items for
- Author or Editor: Allan H. Murphy
Abstract
The general framework for forecast verification described by Murphy and Winkler embodies a statistical approach to the problem of assessing the quality of forecasts. This framework is based on the joint distribution of forecasts and observations, together with conditional and marginal distributions derived from decompositions of the underlying joint distribution. An augmented version of the original framework is outlined in this paper. The extended framework provides a coherent method of addressing the problem of stratification in this context, and it can be used to assess forecast quality—and its various aspects—under specific meteorological conditions. Conceptual examples are presented to illustrate potential applications of this methodological framework. Some issues concerning the extended framework and its application to real-world verification problems are discussed briefly.
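A minimal sketch of the underlying idea, assuming a small hypothetical sample of binary forecasts and observations: the joint distribution p(f, x) is estimated from counts and then factored into p(x | f) p(f) and p(f | x) p(x), the calibration-refinement and likelihood-base rate factorizations of the Murphy-Winkler framework. All array values below are illustrative only.

```python
import numpy as np

# Estimate the joint distribution p(f, x) from a hypothetical sample of binary
# forecasts f and observations x, then form its two factorizations:
# p(x | f) p(f) (calibration-refinement) and p(f | x) p(x) (likelihood-base rate).
f = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])   # forecasts of the event
x = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0])   # observed occurrences

joint = np.zeros((2, 2))
for fi, xi in zip(f, x):
    joint[fi, xi] += 1
joint /= joint.sum()                            # p(f, x)

p_f = joint.sum(axis=1)                         # marginal p(f)
p_x = joint.sum(axis=0)                         # marginal p(x)
p_x_given_f = joint / p_f[:, None]              # calibration distributions p(x | f)
p_f_given_x = joint / p_x[None, :]              # likelihood distributions p(f | x)

print(p_x_given_f)   # calibration-refinement factorization
print(p_f_given_x)   # likelihood-base rate factorization
```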
Abstract
Skill scores defined as measures of relative mean square error—and based on standards of reference representing climatology, persistence, or a linear combination of climatology and persistence—are decomposed. Two decompositions of each skill score are formulated: 1) a decomposition derived by conditioning on the forecasts and 2) a decomposition derived by conditioning on the observations. These general decompositions contain terms consisting of measures of statistical characteristics of the forecasts and/or observations and terms consisting of measures of basic aspects of forecast quality. Properties of the terms in the respective decompositions are examined, and relationships among the various skill scores—and the terms in the respective decompositions—are described.
Hypothetical samples of binary forecasts and observations are used to illustrate the application and interpretation of these decompositions. Limitations on the inferences that can be drawn from comparative verification based on skill scores, as well as from comparisons based on the terms in decompositions of skill scores, are discussed. The relationship between the application of measures of aspects of quality and the application of the sufficiency relation (a statistical relation that embodies the concept of unambiguous superiority) is briefly explored.
The following results can be gleaned from this methodological study. 1) Decompositions of skill scores provide quantitative measures of—and insights into—multiple aspects of the forecasts, the observations, and their relationship. 2) Superiority in terms of overall skill is no guarantor of superiority in terms of other aspects of quality. 3) Sufficiency (i.e., unambiguous superiority) generally cannot be inferred solely on the basis of superiority over a relatively small set of measures of specific aspects of quality.
Neither individual measures of overall performance (e.g., skill scores) nor sets of measures associated with decompositions of such overall measures respect the dimensionality of most verification problems. Nevertheless, the decompositions described here identify parsimonious sets of measures of basic aspects of forecast quality that should prove to be useful in many verification problems encountered in the real world.
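As a rough illustration of the quantities these decompositions start from, the sketch below computes MSE-based skill scores relative to climatology and persistence for a hypothetical sample of binary forecasts and observations. The data and variable names are invented, and the conditional decompositions themselves are not reproduced here.

```python
import numpy as np

# Mean-square-error skill scores relative to climatology and persistence for a
# hypothetical binary sample. SS = 1 - MSE(forecast) / MSE(reference).
f = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0], dtype=float)
x = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0], dtype=float)

mse = np.mean((f - x) ** 2)

# Climatology reference: the sample mean of the observations.
mse_clim = np.mean((x.mean() - x) ** 2)
ss_clim = 1.0 - mse / mse_clim

# Persistence reference: the previous observation forecasts the next one
# (the first case is dropped for lack of a predecessor).
mse_pers = np.mean((x[:-1] - x[1:]) ** 2)
ss_pers = 1.0 - np.mean((f[1:] - x[1:]) ** 2) / mse_pers

print(round(ss_clim, 3), round(ss_pers, 3))
```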
Abstract
Several skill scores are defined, based on the mean-square-error measure of accuracy and alternative climatological standards of reference. Decompositions of these skill scores are formulated, each of which is shown to possess terms involving 1) the coefficient of correlation between the forecasts and observations, 2) a measure of the nonsystematic (i.e., conditional) bias in the forecasts, and 3) a measure of the systematic (i.e., unconditional) bias in the forecasts. Depending on the choice of standard of reference, a particular decomposition may also contain terms relating to the degree of association between the reference forecasts and the observations. These decompositions yield analytical relationships between the respective skill scores and the correlation coefficient, document fundamental deficiencies in the correlation coefficient as a measure of performance, and provide additional insight into basic characteristics of forecasting performance. Samples of operational precipitation probability and minimum temperature forecasts are used to investigate the typical magnitudes of the terms in the decompositions. Some implications of the results for the practice of forecast verification are discussed.
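One widely cited form of such a decomposition, for skill measured relative to sample climatology, expresses the skill score as the squared correlation minus a conditional-bias term minus an unconditional-bias term. The sketch below checks that identity numerically on invented data; it illustrates the kind of relationship the abstract describes rather than reproducing every decomposition in the paper.

```python
import numpy as np

# Numerical check: for skill relative to sample climatology,
# SS = r^2 - (r - s_f/s_x)^2 - ((mean(f) - mean(x)) / s_x)^2,
# where r is the forecast-observation correlation and s_f, s_x are the
# (population) standard deviations. Sample values are illustrative only.
f = np.array([0.9, 0.6, 0.2, 0.8, 0.1, 0.4, 0.7, 0.3])   # forecasts
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0])   # observations

r = np.corrcoef(f, x)[0, 1]
sf, sx = f.std(), x.std()              # population standard deviations (ddof=0)
ss_direct = 1.0 - np.mean((f - x) ** 2) / np.mean((x.mean() - x) ** 2)
ss_decomp = r**2 - (r - sf / sx) ** 2 - ((f.mean() - x.mean()) / sx) ** 2

print(np.isclose(ss_direct, ss_decomp))   # True: the two expressions agree
```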
Abstract
Situations sometimes arise in which it is necessary to evaluate and compare the performance of categorical and probabilistic forecasts. The traditional approach to this problem involves the transformation of the probabilistic forecasts into categorical forecasts and the comparison of the two sets of forecasts in a categorical framework. This approach suffers from several serious deficiencies. Alternative approaches are proposed here that consist of (i) treating the categorical forecasts as probabilistic forecasts or (ii) replacing the categorical forecasts with primitive probabilistic forecasts. These approaches permit the sets of forecasts to be compared in a probabilistic framework and offer several important advantages vis-à-vis the traditional approach. The proposed approaches are compared, and some issues related to these approaches and the overall problem itself are discussed.
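A minimal sketch of approach (i), with hypothetical data: the categorical forecasts are read as degenerate probability forecasts of 0 or 1, so that both sets of forecasts can be scored with the same probabilistic measure (the Brier score is used here purely as an example).

```python
import numpy as np

# Score probabilistic and categorical forecasts of a binary event in a common
# probabilistic framework by treating categorical (yes/no) forecasts as
# probabilities of 1 or 0. All numbers are hypothetical.
x = np.array([1, 0, 0, 1, 1, 0, 0, 1])            # observed events
prob_fcst = np.array([0.8, 0.2, 0.1, 0.6, 0.7, 0.4, 0.2, 0.9])
cat_fcst = np.array([1, 0, 0, 1, 1, 1, 0, 1])     # categorical forecasts

def brier(p, obs):
    return np.mean((p - obs) ** 2)

print(brier(prob_fcst, x))                 # probabilistic forecasts
print(brier(cat_fcst.astype(float), x))    # categorical forecasts as 0/1 probabilities
```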
Abstract
Meteorologists have devoted considerable attention to studies of the use and value of forecasts in a simple two-action, two-event decision-making problem generally referred to as the cost-loss ratio situation. An N-action, N-event generalization of the standard cost-loss ratio situation is described here, and the expected value of different types of forecasts in this situation is investigated. Specifically, expressions are developed for the expected expenses associated with the use of climatological, imperfect, and perfect information, and these expressions are employed to derive formulas for the expected value of imperfect and perfect forecasts. The three-action, three-event situation is used to illustrate the generalized model and the value-of-information results, by considering examples based on specific numerical values of the relevant parameters. Some possible extensions of this model are briefly discussed.
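The sketch below illustrates the decision-analytic recipe the abstract refers to in a three-action, three-event setting: expected expense under climatological information (best single action), under imperfect forecasts (best action conditional on each forecast), and under perfect information (best action for each event). The expense matrix and joint distribution are hypothetical, and this is a generic Bayesian decision-analysis sketch rather than the paper's specific parameterization.

```python
import numpy as np

# Expected expenses with climatological, imperfect, and perfect information in
# a hypothetical three-action, three-event decision problem.
expense = np.array([[0., 5., 20.],     # expense[a, e]: action a taken, event e occurs
                    [2., 2., 10.],
                    [6., 6., 6.]])

joint = np.array([[0.30, 0.05, 0.01],  # joint[f, e]: forecast f issued, event e occurs
                  [0.08, 0.20, 0.06],
                  [0.02, 0.08, 0.20]])
p_e = joint.sum(axis=0)                # climatological event probabilities
p_f = joint.sum(axis=1)                # forecast probabilities
p_e_given_f = joint / p_f[:, None]     # event probabilities conditional on each forecast

e_clim = (expense @ p_e).min()                                      # best single action
e_perfect = (p_e * expense.min(axis=0)).sum()                       # best action for each event
e_imperfect = (p_f * (p_e_given_f @ expense.T).min(axis=1)).sum()   # best action for each forecast

# Expected value of imperfect and perfect forecasts relative to climatology.
print(e_clim - e_imperfect, e_clim - e_perfect)
```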
Abstract
Scalar and vector partitions of the ranked probability score, RPS, are described and compared. These partitions are formulated in the same manner as the scalar and vector partitions of the probability score, PS, recently described by Murphy. However, since the RPS is defined in terms of cumulative probability distributions, the scalar and vector partitions of the RPS provide measures of the reliability and resolution of scalar and vector cumulative forecasts, respectively. The scalar and vector partitions of the RPS provide similar, but not equivalent (i.e., linearly related), measures of these attributes. Specifically, the reliability (resolution) of cumulative forecasts according to the scalar partition is equal to or greater (less) than their reliability (resolution) according to the vector partition. A sample collection of forecasts is used to illustrate the differences between the scalar and vector partitions of the RPS and between the vector partitions of the RPS and the PS.
Several questions related to the interpretation and use of the scalar and vector partitions of the RPS are briefly discussed, including the information that these partitions provide about the reliability and resolution of forecasts (as opposed to cumulative forecasts) and the relative merits of these partitions. These discussions indicate that, since a one-to-one correspondence exists between vector and vector cumulative forecasts, the vector partition of the RPS can also be considered to provide measures of the reliability and resolution of vector forecasts and that the vector partition is generally more appropriate than the scalar partition.
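For reference, here is a minimal sketch of the ranked probability score itself for a hypothetical sample of three-category probability forecasts; it is the use of cumulative distributions that makes the partitions discussed above measure reliability and resolution of cumulative forecasts. Some definitions divide the score by K - 1 (K being the number of ordered categories), which would not affect the partitions.

```python
import numpy as np

# Ranked probability score: squared error between cumulative forecast and
# cumulative observation distributions, summed over categories and averaged
# over cases. Data are hypothetical.
forecasts = np.array([[0.7, 0.2, 0.1],    # probabilities for K = 3 ordered categories
                      [0.2, 0.5, 0.3],
                      [0.1, 0.3, 0.6]])
observed = np.array([0, 1, 2])            # index of the observed category per case

obs_onehot = np.eye(3)[observed]
rps = np.mean(np.sum((np.cumsum(forecasts, axis=1)
                      - np.cumsum(obs_onehot, axis=1)) ** 2, axis=1))
print(rps)
```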
Abstract
Probability of precipitation (PoP) forecasts can often be interpreted as average point probability forecasts. Since the latter are equivalent to (unconditional) expected areal coverage forecasts, PoP forecasts can be evaluated in terms of observed areal coverages in those situations in which observations of precipitation occurrence are available from a network of points in the forecast area. The purpose of this paper is to describe a partition of the average Brier, or probability, score—a measure of the average accuracy of average point probability forecasts over the network of points of concern—that facilitates such an evaluation. The partition consists of two terms: 1) a term that represents the average squared error of the average point probability forecasts interpreted as areal coverage forecasts and 2) a term that represents the average variance of the observations of precipitation occurrence in the forecast area. The relative magnitudes of the terms in this partition are examined, and it is concluded (partly on the basis of experimental data) that the variance term generally makes a significant contribution to the overall probability score. This result, together with the fact that the variance term does not depend on the forecasts, suggests that the squared error term (rather than the overall score) should be used to evaluate PoP forecasts in many situations. The basis for the interpretation of PoP forecasts as average point probability forecasts and some implications of the results presented in this paper for the evaluation of PoP forecasts are briefly discussed.
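A compact derivation of the two-term partition for a single forecast occasion, under the assumption of a PoP forecast f issued for a network of N points with binary observations x_k and areal coverage c = (1/N) Σ x_k (the notation is chosen here for illustration):

```latex
\frac{1}{N}\sum_{k=1}^{N}\left(f - x_k\right)^2
  \;=\; f^2 - 2fc + \frac{1}{N}\sum_{k=1}^{N} x_k^2
  \;=\; (f - c)^2 + c\,(1 - c),
```

since x_k^2 = x_k for binary observations. The first term is the squared error of f interpreted as an areal coverage forecast; the second is the variance of the point observations and is independent of the forecast, which is why averaging over occasions yields the squared-error and variance terms described above.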
Abstract
Two fundamental characteristics of forecast verification problems—complexity and dimensionality—are described. To develop quantitative definitions of these characteristics, a general framework for the problem of absolute verification (AV) is extended to the problem of comparative verification (CV). Absolute verification focuses on the performance of individual forecasting systems (or forecasters), and it is based on the bivariate distribution of forecasts and observations and its two possible factorizations into conditional and marginal distributions.
Comparative verification compares the performance of two or more forecasting systems, which may produce forecasts under 1) identical conditions or 2) different conditions. The first type of CV is matched comparative verification, and it is based on a 3-variable distribution with 6 possible factorizations. The second and more complicated type of CV is unmatched comparative verification, and it is based on a 4-variable distribution with 24 possible factorizations.
Complexity can be defined in terms of the number of factorizations, the number of basic factors (conditional and marginal distributions) in each factorization, or the total number of basic factors associated with the respective frameworks. These definitions provide quantitative insight into basic differences in complexity among AV and CV problems. Verification problems involving probabilistic and nonprobabilistic forecasts are of equal complexity.
Dimensionality is defined as the number of probabilities that must be specified to reconstruct the basic distribution of forecasts and observations. It is one less than the total number of distinct combinations of forecasts and observations. Thus, CV problems are of higher dimensionality than AV problems, and problems involving probabilistic forecasts or multivalued nonprobabilistic forecasts exhibit particularly high dimensionality.
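A quick worked example of these counts, assuming (hypothetically) probability-of-precipitation forecasts restricted to the 11 values 0.0, 0.1, ..., 1.0 and a binary observation:

```python
# Dimensionality = number of distinct forecast/observation combinations minus one,
# for hypothetical 11-valued probability forecasts and a binary observation.
n_f, n_x = 11, 2
print(n_f * n_x - 1)               # absolute verification: 21
print(n_f * n_f * n_x - 1)         # matched comparative verification (two systems): 241
print((n_f * n_x) ** 2 - 1)        # unmatched comparative verification: 483
```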
Issues related to the implications of these concepts for verification procedures and practices are discussed, including the reduction of complexity and/or dimensionality. Comparative verification problems can be reduced in complexity by making forecasts under identical conditions or by assuming conditional or unconditional independence when warranted. Dimensionality can be reduced by parametric statistical modeling of the distributions of forecasts and/or observations.
Failure to take account of the complexity and dimensionality of verification problems may lead to an incomplete and inefficient body of verification methodology and, thereby, to erroneous conclusions regarding the absolute and relative quality and/or value of forecasting systems.