Search Results

Allan H. Murphy

Abstract

The general framework for forecast verification described by Murphy and Winkler embodies a statistical approach to the problem of assessing the quality of forecasts. This framework is based on the joint distribution of forecasts and observations, together with conditional and marginal distributions derived from decompositions of the underlying joint distribution. An augmented version of the original framework is outlined in this paper. The extended framework provides a coherent method of addressing the problem of stratification in this context, and it can be used to assess forecast quality, and its various aspects, under specific meteorological conditions. Conceptual examples are presented to illustrate potential applications of this methodological framework. Some issues concerning the extended framework and its application to real-world verification problems are discussed briefly.
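
As a rough illustration of the distributions-oriented idea summarized in this abstract, the following sketch estimates the joint distribution of forecasts and observations from a sample, derives the marginal and conditional distributions, and repeats the calculation within strata. The sample records, the regime labels "A" and "B", and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter

def joint_distribution(pairs):
    """Relative frequency p(f, x) of each (forecast, observation) pair."""
    n = len(pairs)
    return {fx: count / n for fx, count in Counter(pairs).items()}

def marginal(joint, axis):
    """Marginal over forecasts (axis=0) or over observations (axis=1)."""
    out = {}
    for fx, p in joint.items():
        out[fx[axis]] = out.get(fx[axis], 0.0) + p
    return out

def conditional(joint, axis):
    """p(x | f) when axis=0, p(f | x) when axis=1, keyed by (f, x)."""
    marg = marginal(joint, axis)
    return {fx: p / marg[fx[axis]] for fx, p in joint.items()}

# Hypothetical records: (stratum label, forecast probability, observation).
records = [
    ("A", 0.2, 0), ("A", 0.2, 0), ("A", 0.8, 1), ("A", 0.8, 1),
    ("B", 0.2, 1), ("B", 0.2, 0), ("B", 0.8, 0), ("B", 0.8, 1),
]

# Pooled (unstratified) joint distribution and its two factorizations.
pairs = [(f, x) for _, f, x in records]
joint = joint_distribution(pairs)
p_f, p_x = marginal(joint, 0), marginal(joint, 1)
p_x_given_f = conditional(joint, 0)  # conditioning on the forecasts
p_f_given_x = conditional(joint, 1)  # conditioning on the observations

# Stratified version: one joint distribution per stratum.
stratified = {
    label: joint_distribution([(f, x) for s, f, x in records if s == label])
    for label in ("A", "B")
}
```

In this sketch the pooled joint distribution can be recovered either from p(x | f) p(f) or from p(f | x) p(x), and the per-stratum dictionaries allow the same quantities to be compared across conditions.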

Full access
Allan H. Murphy

Abstract

Skill scores defined as measures of relative mean square error—and based on standards of reference representing climatology, persistence, or a linear combination of climatology and persistence—are decomposed. Two decompositions of each skill score are formulated: 1) a decomposition derived by conditioning on the forecasts and 2) a decomposition derived by conditioning on the observations. These general decompositions contain terms consisting of measures of statistical characteristics of the forecasts and/or observations and terms consisting of measures of basic aspects of forecast quality. Properties of the terms in the respective decompositions are examined, and relationships among the various skill scores—and the terms in the respective decompositions—are described.

Hypothetical samples of binary forecasts and observations are used to illustrate the application and interpretation of these decompositions. Limitations on the inferences that can be drawn from comparative verification based on skill scores, as well as from comparisons based on the terms in decompositions of skill scores, are discussed. The relationship between the application of measures of aspects of quality and the application of the sufficiency relation (a statistical relation that embodies the concept of unambiguous superiority) is briefly explored.

The following results can be gleaned from this methodological study. 1) Decompositions of skill scores provide quantitative measures of—and insights into—multiple aspects of the forecasts, the observations, and their relationship. 2) Superiority in terms of overall skill is no guarantor of superiority in terms of other aspects of quality. 3) Sufficiency (i.e., unambiguous superiority) generally cannot be inferred solely on the basis of superiority over a relatively small set of measures of specific aspects of quality.

Neither individual measures of overall performance (e.g., skill scores) nor sets of measures associated with decompositions of such overall measures respect the dimensionality of most verification problems. Nevertheless, the decompositions described here identify parsimonious sets of measures of basic aspects of forecast quality that should prove to be useful in many verification problems encountered in the real world.
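
The two kinds of decomposition mentioned in this abstract can be sketched for the simplest case, a skill score whose reference is the sample climatology. The sketch assumes the standard identities obtained by conditioning the mean square error on the forecasts or on the observations; the term names, the function names, and the hypothetical binary sample are illustrative, and the paper's own terms (including those for persistence-based references) may be defined differently.

```python
from collections import defaultdict

def mean(values):
    return sum(values) / len(values)

def mse(f, x):
    """Mean square error of forecasts f relative to observations x."""
    return mean([(fi - xi) ** 2 for fi, xi in zip(f, x)])

def conditional_decomposition(given, other):
    """Return (variance, bias, spread) such that
    MSE = variance + bias - spread, where `other` is grouped by `given`."""
    n = len(given)
    groups = defaultdict(list)
    for g, o in zip(given, other):
        groups[g].append(o)
    other_mean = mean(other)
    variance = mean([(o - other_mean) ** 2 for o in other])
    # Squared gap between each conditioning value and its conditional mean.
    bias = sum(len(v) * (g - mean(v)) ** 2 for g, v in groups.items()) / n
    # Spread of the conditional means about the overall mean.
    spread = sum(len(v) * (mean(v) - other_mean) ** 2 for v in groups.values()) / n
    return variance, bias, spread

# Hypothetical sample of probability forecasts and binary observations.
f = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
x = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

var_x, rel_f, res_f = conditional_decomposition(f, x)  # conditioning on forecasts
var_f, rel_x, res_x = conditional_decomposition(x, f)  # conditioning on observations

# Skill relative to sample climatology, whose MSE equals the variance of x.
skill = 1.0 - mse(f, x) / var_x
skill_from_terms = (res_f - rel_f) / var_x
```

Both decompositions reproduce the same mean square error, but they attribute it to different aspects of the forecasts and observations, which is the sense in which a single skill value can hide differences between forecasting systems.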

Full access
Allan H. Murphy

Abstract

Heretofore it has been widely accepted that the contributions of W. E. Cooke in 1906 represented the first works related to the explicit treatment of uncertainty in weather forecasts. Recently, however, it has come to light that at least some aspects of the rationale for quantifying the uncertainty in forecasts were discussed prior to 1900 and that probabilities and odds were included in some weather forecasts formulated more than 200 years ago. An effort to summarize these new historical insights, as well as to clarify the precise nature of the contributions made by various individuals to early developments in this area, appears warranted.

The overall purpose of this paper is to extend and clarify the early history of probability forecasts. Highlights of the historical review include 1) various examples of the use of qualitative and quantitative probabilities or odds in forecasts during the eighteenth and nineteenth centuries, 2) a brief discussion in 1890 of the economic component of the rationale for quantifying the uncertainty in forecasts, 3) further refinement of the rationale for probability forecasts and the presentation of the results of experiments involving the formulation of quasi-probabilistic and probabilistic forecasts during the period 1900–25 (in reviewing developments during this early twentieth century period, the noteworthy contributions made by W. E. Cooke, C. Hallenbeck, and A. K. Ångström are described and clarified), and 4) a very concise overview of activities and developments in this area since 1925.

The early treatment of some basic issues related to probability forecasts is discussed and, in some cases, compared to their treatment in more recent times. These issues include 1) the underlying rationale for probability forecasts, 2) the feasibility of making probability forecasts, and 3) alternative interpretations of probability in the context of weather forecasts. A brief examination of factors related to the acceptance of—and resistance to—probability forecasts in the meteorological and user communities is also included.

Full access
Allan H. Murphy

Abstract

An individual skill score (SS) and a collective skill score (CSS) are examined to determine whether these scoring rules are proper or improper. The SS and the CSS are both standardized versions of the Brier, or probability, score (PS) and have been used to measure the “skill” of probability forecasts. The SS is defined in terms of individual forecasts, while the CSS is defined in terms of collections of forecasts. The SS and the CSS are shown to be improper scoring rules, and, as a result, both the SS and the CSS encourage hedging on the part of forecasters.

The results of a preliminary investigation of the nature of the hedging produced by the SS and the CSS indicate that, while the SS may encourage a considerable amount of hedging, the CSS, in general, encourages only a modest amount of hedging, and even this hedging decreases as the sample size K of the collection of forecasts increases. In fact, the CSS is approximately strictly proper for large collections of forecasts (K ≥ 100).

Finally, we briefly consider two questions related to the standardization of scoring rules: 1) the use of different scoring rules in the assessment and evaluation tasks, and 2) the transformation of strictly proper scoring rules. With regard to the latter, we identify standardized versions of the PS which are strictly proper scoring rules and which, as a result, appear to be appropriate scoring rules to use to measure the “skill” of probability forecasts.
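
To make the notion of a strictly proper scoring rule concrete, the sketch below searches a grid of stated probabilities for the one that optimizes the expected score when the forecaster's true judgment is p. It uses a two-state, single-event form of the PS and a hypothetical standardization with a fixed reference value of 0.25; neither is the SS or the CSS defined in the paper, whose exact forms are not given in this abstract.

```python
def expected_ps(r, p):
    """Expected squared-error score for stated probability r when the
    forecaster's true judgment of the event probability is p."""
    return p * (r - 1.0) ** 2 + (1.0 - p) * r ** 2

def expected_skill(r, p, reference=0.25):
    """Hypothetical standardization 1 - PS / reference with a FIXED
    reference value (an assumption made for this illustration)."""
    return 1.0 - expected_ps(r, p) / reference

def best_forecast(p, steps=100):
    """Stated probability on a grid that minimizes the expected PS."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda r: expected_ps(r, p))

def best_standardized_forecast(p, steps=100):
    """Stated probability on a grid that maximizes the expected skill."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: expected_skill(r, p))

# With a strictly proper rule the optimal statement equals the judgment.
optimal = {p: best_forecast(p) for p in (0.1, 0.3, 0.75)}
optimal_standardized = {p: best_standardized_forecast(p) for p in (0.1, 0.3, 0.75)}
```

Because the reference here is a constant, the standardized score is a linear transformation of the PS and the optimal statement is unchanged. Hedging can arise when the standardizing quantity itself depends on the forecasts or on the sample, which is the situation the abstract examines.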

Full access
Allan H. Murphy

Abstract

A new vector partition of the probability, or Brier, score (PS) is formulated and the nature and properties of this partition are described. The relationships between the terms in this partition and the terms in the original vector partition of the PS are indicated. The new partition consists of three terms: 1) a measure of the uncertainty inherent in the events, or states, on the occasions of concern (namely, the PS for the sample relative frequencies); 2) a measure of the reliability of the forecasts; and 3) a new measure of the resolution of the forecasts. These measures of reliability and resolution are and are not, respectively, equivalent (i.e., linearly related) to the measures of reliability and resolution provided by the original partition. Two sample collections of probability forecasts are used to illustrate the differences and relationships between these partitions. Finally, the two partitions are compared, with particular reference to the attributes of the forecasts with which the partitions are concerned, the interpretation of the partitions in geometric terms, and the use of the partitions as the bases for the formulation of measures to evaluate probability forecasts. The results of these comparisons indicate that the new partition offers certain advantages vis-à-vis the original partition.

Full access
Allan H. Murphy

Abstract

Scalar and vector partitions of the probability score (PS) in N-state (N > 2) situations are described and compared. In N-state, as well as in two-state (N = 2), situations these partitions provide similar, but not equivalent (i.e., linearly related), measures of the reliability and resolution of probability forecasts. Specifically, the vector partition, when compared to the scalar partition, decreases the reliability and increases the resolution of the forecasts. A sample collection of forecasts is used to illustrate the differences between these partitions in N-state situations.

Several questions related to the use of scalar and vector partitions of the PS in N-state situations are discussed, including the relative merits of these partitions and the effect upon sample size when forecasts are considered to be vectors rather than scalars. The discussions indicate that the vector partition appears to be more appropriate, in general, than the scalar partition, and that when the forecasts in a collection of forecasts are considered to be vectors rather than scalars the sample size of the collection may be substantially reduced.
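
The difference between the two groupings can be sketched with a two-term partition in which the PS is the sum of a reliability term and a within-group variability term. In the scalar view every component probability is pooled with all other components that share the same value; in the vector view components are pooled only when the whole forecast vector matches. The two-term form, the function names, and the sample are assumptions made for illustration and may not match the paper's notation.

```python
from collections import defaultdict

def indicator(j, n_states):
    """Observation vector: 1.0 for the observed state, 0.0 otherwise."""
    return [1.0 if s == j else 0.0 for s in range(n_states)]

def probability_score(forecasts, outcomes):
    n = len(forecasts[0])
    return sum(
        sum((r - d) ** 2 for r, d in zip(f, indicator(j, n)))
        for f, j in zip(forecasts, outcomes)
    ) / len(forecasts)

def two_term_partition(triples, k):
    """triples: (group key, forecast value, outcome indicator); the forecast
    value is constant within a group. Returns (reliability, variability)."""
    groups = defaultdict(list)
    for key, r, d in triples:
        groups[key].append((r, d))
    reliability = variability = 0.0
    for members in groups.values():
        m = len(members)
        r = members[0][0]
        d_bar = sum(d for _, d in members) / m
        reliability += m * (r - d_bar) ** 2
        variability += m * d_bar * (1.0 - d_bar)
    return reliability / k, variability / k

def scalar_partition(forecasts, outcomes):
    """Pool components by scalar probability value alone."""
    n = len(forecasts[0])
    triples = [(r, r, d) for f, j in zip(forecasts, outcomes)
               for r, d in zip(f, indicator(j, n))]
    return two_term_partition(triples, len(forecasts))

def vector_partition(forecasts, outcomes):
    """Pool components only within identical forecast vectors."""
    n = len(forecasts[0])
    triples = [((tuple(f), s), r, d) for f, j in zip(forecasts, outcomes)
               for s, (r, d) in enumerate(zip(f, indicator(j, n)))]
    return two_term_partition(triples, len(forecasts))

# Hypothetical three-state sample in which the same scalar values
# appear in different positions of two distinct forecast vectors.
forecasts = [(0.5, 0.3, 0.2)] * 4 + [(0.2, 0.3, 0.5)] * 4
outcomes = [0, 0, 1, 2, 2, 2, 2, 0]

ps = probability_score(forecasts, outcomes)
scalar_rel, scalar_var = scalar_partition(forecasts, outcomes)
vector_rel, vector_var = vector_partition(forecasts, outcomes)
```

Both partitions sum to the same PS. Because the scalar grouping is coarser, its reliability term cannot exceed the vector one, so the forecasts look more reliable and less resolved under the scalar partition, in line with the comparison stated in the abstract. The vector grouping also leaves fewer occasions per group, which relates to the sample-size concern raised above.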

Full access
Allan H. Murphy

Abstract

Scalar and vector partitions of the probability score (PS) in the two-state situation are described and compared. These partitions, which are based upon expressions for the PS in which probability forecasts are considered to be scalars and vectors, respectively, provide similar, but not equivalent (i.e., linearly related), measures of the reliability and resolution of the forecasts. Specifically, the reliability (resolution) of the forecasts according to the scalar partition is, in general, greater (less) than their reliability (resolution) according to the vector partition. A sample collection of forecasts is used to illustrate the differences between these partitions.

Several questions related to the use of scalar and vector partitions of the PS in the two-state situation are discussed, including the interpretation of the results of previous forecast evaluation studies and the relative merits of these partitions. The discussions indicate that the partition most often used in such studies has been a special “scalar” partition, a partition which is equivalent to the vector partition in the two-state situation, and that the vector partition is more appropriate than the scalar partition.

Full access
Allan H. Murphy

Abstract

This paper considers the comparative operational evaluation of probabilistic prediction procedures in cost-loss ratio decision situations in which the evaluator's knowledge of the cost-loss ratio is expressed in probabilistic terms. First, the cost-loss ratio decision situation is described in a utility framework, and measures of the expected utility of probabilistic predictions are formulated. Second, a class of expected-utility measures, the beta measures, in which the evaluator's knowledge of the cost-loss ratio is expressed in terms of a beta distribution, is described. Third, the beta measures are utilized to compare two prediction procedures on the basis of a small sample of predictions. The results indicate the importance, for comparative operational evaluation, of utilizing measures which provide a suitable description of the evaluator's knowledge. In particular, the use of the probability score, a measure equivalent to the uniform measure (which is a special beta measure), in decision situations in which the uniform distribution does not provide a suitable description of the evaluator's knowledge, may yield misleading results. Finally, the results are placed in perspective by describing several possible extensions to this study and by indicating the importance of undertaking such studies in actual operational situations.
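
The link between the uniform measure and the probability score can be sketched numerically. The sketch below assumes a simple expense formulation (protect whenever the forecast probability exceeds the cost-loss ratio, with expenses expressed per unit loss) and averages the expense over a beta distribution for the cost-loss ratio using a midpoint rule. The expense convention, the function names, the beta parameters, and the two hypothetical procedures are illustrative assumptions, not the paper's definitions.

```python
import math

def beta_pdf(ratio, alpha, beta):
    """Density of a beta(alpha, beta) distribution at `ratio` in (0, 1)."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * ratio ** (alpha - 1.0) * (1.0 - ratio) ** (beta - 1.0)

def expense(ratio, r, d):
    """Expense per unit loss: pay the cost (ratio) if the forecast r leads
    to protection, otherwise incur the loss when the event occurs (d = 1)."""
    return ratio if r > ratio else float(d)

def beta_measure(forecasts, outcomes, alpha, beta, steps=2000):
    """Mean expense averaged over a beta-distributed cost-loss ratio."""
    total = 0.0
    for i in range(steps):
        ratio = (i + 0.5) / steps  # midpoint rule avoids the endpoints 0 and 1
        weight = beta_pdf(ratio, alpha, beta) / steps
        mean_expense = sum(
            expense(ratio, r, d) for r, d in zip(forecasts, outcomes)
        ) / len(forecasts)
        total += weight * mean_expense
    return total

# Hypothetical small sample and two hypothetical prediction procedures.
outcomes = [0, 0, 1, 0, 1, 1, 0, 0]
procedure_a = [0.2, 0.2, 0.8, 0.2, 0.8, 0.2, 0.2, 0.8]
procedure_b = [0.4, 0.4, 0.6, 0.4, 0.6, 0.6, 0.4, 0.4]

uniform_a = beta_measure(procedure_a, outcomes, 1.0, 1.0)  # beta(1, 1) is uniform
low_ratio_a = beta_measure(procedure_a, outcomes, 2.0, 8.0)
low_ratio_b = beta_measure(procedure_b, outcomes, 2.0, 8.0)

# Under the uniform distribution the measure is a linear function of the
# mean squared error of the forecasts: half that error plus a term that
# depends only on the observations.
half_brier_a = sum((r - d) ** 2 for r, d in zip(procedure_a, outcomes)) / len(outcomes)
uniform_from_ps_a = 0.5 * half_brier_a + 0.5 * sum(outcomes) / len(outcomes)
```

With a non-uniform beta distribution the weights concentrate on a particular range of cost-loss ratios, so two procedures need not be ranked the same way as they are by the probability score.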

Full access