Search Results

You are looking at 31 - 40 of 87 items for

  • Author or Editor: ALLAN H. MURPHY
  • Refine by Access: All Content
Allan H. Murphy

Abstract

In this paper, we compare the ranked probability score (RPS) and the probability score (PS) and examine the nature of the sensitivity of the RPS to distance. First, we briefly describe the nature of and the relationship between the frameworks within which the RPS and the PS were formulated. Second, we consider certain properties of the RPS and the PS, including their range, their values for categorical and uniform forecasts, and their “proper” nature. Third, we describe the RPS and the PS in a manner that reveals the structure of and the relationship between these scoring rules. Fourth, we consider the RPS with reference to two definitions of distance and examine the nature of the sensitivity of the RPS to distance. The comparison of the RPS and the PS suggests that the RPS rather than the PS should be used to evaluate probability forecasts, at least in those situations in which the variable of concern is ordered.
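
For readers unfamiliar with the two scoring rules, a common modern formulation is as follows (the normalization used in the paper may differ): for K ordered categories, with forecast probabilities p_1, ..., p_K and d_k = 1 for the observed category and 0 otherwise,

    PS = \sum_{k=1}^{K} (p_k - d_k)^2,    RPS = \sum_{k=1}^{K} \Big( \sum_{j=1}^{k} p_j - \sum_{j=1}^{k} d_j \Big)^2 .

Because the RPS compares cumulative probabilities, probability assigned to categories far from the observed category is penalized more heavily than probability assigned to nearby categories; this is the sensitivity to distance examined in the paper.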

Full access
Allan H. Murphy

Abstract

This paper is concerned with the use of the coefficient of correlation (CoC) and the coefficient of determination (CoD) as performance measures in forecast verification. Aspects of forecasting performance that are measured—and not measured (i.e., ignored)—by these coefficients are identified. Decompositions of familiar quadratic measures of accuracy and skill are used to explore differences between these quadratic measures and the coefficients of correlation and determination. A linear regression model, in which forecasts are regressed on observations, is introduced to provide insight into the interpretations of the CoC and the CoD in this context.

Issues related to the use of these coefficients as verification measures are discussed, including the deficiencies inherent in one-dimensional measures of overall performance, the pros and cons of quadratic measures of accuracy and skill vis-à-vis the coefficients of correlation and determination, and the relative merits of the CoC and the CoD. These coefficients by themselves do not provide an adequate basis for drawing firm conclusions regarding absolute or relative forecasting performance.
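
As a point of reference (a sketch in generic notation, not the paper's): with forecasts f, observations x, and correlation r between them, the CoD is simply r^2, whereas a quadratic skill score of the form SS = 1 - MSE(f, x) / MSE(\bar{x}, x) is reduced both by conditional bias (systematic errors that depend on the forecast value) and by unconditional bias (a difference between the mean forecast and the mean observation), neither of which affects r^2. The decompositions referred to above make this difference explicit.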

Full access
Allan H. Murphy

Abstract

In 1884 a paper by J.P. Finley appeared in the American Meteorological Journal describing the results of an experimental tornado forecasting program in the central and eastern United States. Finley's paper reported “percentages of verifications” exceeding 95%, where this index of performance was defined as the percentage of correct tornado/no-tornado forecasts. Within six months, three papers had appeared that identified deficiencies in Finley's method of verification and/or proposed alternative measures of forecasting performance in the context of this 2×2 verification problem. During the period from 1885 to 1893, several other authors in the United States and Europe, in most cases stimulated either by Finley's paper or by the three early responses, made noteworthy contributions to methods-oriented and practices-oriented discussions of issues related to forecast verification in general and verification of tornado forecasts in particular.
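
The central deficiency identified in those early responses is easy to state: because tornadoes are rare, the percentage of correct forecasts in a 2×2 table is dominated by the abundant no-tornado cases, so a forecaster who never forecast a tornado would have achieved an even higher “percentage of verification” than Finley did (an observation usually credited to G. K. Gilbert).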

The burst of verification-related activities during the period 1884–1893 is referred to here as the “Finley affair.” It marked the beginning of substantive conceptual and methodological developments and discussions in the important subdiscipline of forecast verification. This paper describes the events that constitute the Finley affair in some detail and attempts to place this affair in proper historical context from the perspective of the mid-1990s. Whatever their individual strengths and weaknesses, the measures introduced during the period from 1884 to 1893 have withstood important tests of time—for example, these measures have been rediscovered on one or more occasions and they are still widely used today (generally under names assigned since 1900). Moreover, many of the issues vis-à-vis forecast verification that were first raised during the Finley affair remain issues of considerable importance more than 100 years later.

Full access
Allan H. Murphy

Abstract

Heretofore it has been widely accepted that the contributions of W. E. Cooke in 1906 represented the first works related to the explicit treatment of uncertainty in weather forecasts. Recently, however, it has come to light that at least some aspects of the rationale for quantifying the uncertainty in forecasts were discussed prior to 1900 and that probabilities and odds were included in some weather forecasts formulated more than 200 years ago. An effort to summarize these new historical insights, as well as to clarify the precise nature of the contributions made by various individuals to early developments in this area, appears warranted.

The overall purpose of this paper is to extend and clarify the early history of probability forecasts. Highlights of the historical review include 1) various examples of the use of qualitative and quantitative probabilities or odds in forecasts during the eighteenth and nineteenth centuries, 2) a brief discussion in 1890 of the economic component of the rationale for quantifying the uncertainty in forecasts, 3) further refinement of the rationale for probability forecasts and the presentation of the results of experiments involving the formulation of quasi-probabilistic and probabilistic forecasts during the period 1900–25 (in reviewing developments during this early twentieth century period, the noteworthy contributions made by W. E. Cooke, C. Hallenbeck, and A. K. Ångström are described and clarified), and 4) a very concise overview of activities and developments in this area since 1925.

The early treatment of some basic issues related to probability forecasts is discussed and, in some cases, compared to their treatment in more recent times. These issues include 1) the underlying rationale for probability forecasts, 2) the feasibility of making probability forecasts, and 3) alternative interpretations of probability in the context of weather forecasts. A brief examination of factors related to the acceptance of—and resistance to—probability forecasts in the meteorological and user communities is also included.

Full access
Allan H. Murphy
and
Edward S. Epstein

Abstract

Attributes of the anomaly correlation coefficient, as a model verification measure, are investigated by exploiting a recently developed method of decomposing skill scores into other measures of performance. A mean square error skill score based on historical climatology is decomposed into terms involving the anomaly correlation coefficient, the conditional bias in the forecast, the unconditional bias in the forecast, and the difference between the mean historical and sample climatologies. This decomposition reveals that the square of the anomaly correlation coefficient should be interpreted as a measure of potential rather than actual skill.

The decomposition is applied to a small sample of geopotential height field forecasts, for lead times from one to ten days, produced by the medium range forecast (MRF) model. After about four days, the actual skill of the MRF forecasts (as measured by the “climatological skill score”) is considerably less than their potential skill (as measured by the anomaly correlation coefficient), due principally to the appearance of substantial conditional biases in the forecasts. These biases, and the corresponding loss of skill, represent the penalty associated with retaining “meteorological” features in the geopotential height field when such features are not predictable. Some implications of these results for the practice of model verification are discussed.
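
A sketch of the algebra behind this interpretation, using sample climatology as the reference (the paper's historical-climatology version contains an additional term, as noted above): with r the anomaly correlation between forecasts f and observations x, s_f and s_x their standard deviations, and \bar{f} and \bar{x} their means, the MSE-based skill score can be written

    SS = r^2 - ( r - s_f / s_x )^2 - ( (\bar{f} - \bar{x}) / s_x )^2 .

The second and third terms (conditional and unconditional bias, respectively) are nonnegative, so r^2 is an upper bound on SS; the anomaly correlation therefore measures the skill attainable if the biases were removed, i.e., potential rather than actual skill.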

Full access
Allan H. Murphy
and
Robert L. Winkler

Abstract

A general framework for forecast verification based on the joint distribution of forecasts and observations is described. For further elaboration of the framework, two factorizations of the joint distribution are investigated: 1) the calibration-refinement factorization, which involves the conditional distributions of observations given forecasts and the marginal distribution of forecasts, and 2) the likelihood-base rate factorization, which involves the conditional distributions of forecasts given observations and the marginal distribution of observations. The names given to the factorizations reflect the fact that they relate to different attributes of the forecasts and/or observations. Several examples are used to illustrate the interpretation of these factorizations in the context of verification and to describe the relationship between the respective factorizations.

Some insight into the potential utility of the framework is provided by demonstrating that basic elements and summary measures of the joint, conditional, and marginal distributions play key roles in current verification methods. The need for further investigation of the implications of this framework for verification theory and practice is emphasized, and some possible directions for future research in this area are identified.
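
In symbols, with f denoting a forecast and x the matching observation, the two factorizations are

    p(f, x) = p(x | f) p(f)    (calibration-refinement)
    p(f, x) = p(f | x) p(x)    (likelihood-base rate).

The conditional distributions p(x | f) describe calibration (what actually happens when a particular forecast is issued), the marginal distribution p(f) describes refinement (how often each forecast is issued), the p(f | x) are the likelihoods, and p(x) is the base rate, i.e., the climatological distribution of the observations.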

Full access
Edward S. Epstein
and
Allan H. Murphy

Abstract

On most forecasting occasions forecasts are made for several successive periods, but decision-making models have traditionally neglected the impact of the potentially useful information contained in forecasts for periods beyond the initial period. The use and value of multiple-period forecasts are investigated here in the context of a recently developed dynamic model of the basic cost-loss ratio situation. We also extend previous studies of this model by examining the impacts—on forecast use and value—of assuming (i) that weather events in successive periods are dependent and (ii) that the forecasts of interest are expressed in probabilistic terms. In this regard, expressions are derived for the expected expenses associated with the use of climatological, imperfect (categorical or probabilistic) and perfect multiple-period forecasts under conditions of dependence and independence between events.

Numerical results are presented concerning expected expense and economic value, based on artificially generated forecasts that incorporate the effects of the decrease in forecast quality as lead time increases. Comparisons are made between multiple-period and single-period forecasts, between dependent and independent events, and between probabilistic and categorical forecasts. For some values of the relevant parameters (e.g., cost-loss ratio, climatological probability), the availability of information for longer lead times can substantially increase economic value. It appears, however, that (i) current imperfect forecasts achieve relatively little of this potential value and (ii) improvements in forecasts at longer lead times must be accompanied by improvements at the shortest lead times for these benefits to be realized. Dependence (i.e., persistence) between events generally reduces weather-related expected expenses, sometimes quite substantially, and consequently reduces forecast value. The results also demonstrate once again the potential economic benefits of expressing forecasts in a probabilistic rather than a categorical format.
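
For orientation, the single-period building block of the model is the standard cost-loss ratio situation: the decision maker either protects at cost C or leaves the activity unprotected and incurs loss L if adverse weather occurs. Given a probability p of adverse weather, expected expense is minimized by protecting whenever p > C/L; with only the climatological probability \bar{p} available, the optimal expected expense per period is min(C, \bar{p}L), and with perfect forecasts it is \bar{p}C. The dynamic model studied here extends this reasoning across successive periods, which is where multiple-period forecasts and dependence between events enter.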

Full access
Martin Ehrendorfer
and
Allan H. Murphy

Abstract

The concept of sufficiency, originally introduced in the context of the comparison of statistical experiments, has recently been shown to provide a coherent basis for comparative evaluation of forecasting systems. Specifically, forecasting system A is said to be sufficient for forecasting system B if B's forecasts can be obtained from A's forecasts by a stochastic transformation. The sufficiency of A's forecasts for B's forecasts implies that the former are of higher quality than the latter and that all users will find A's forecasts of greater value than B's forecasts. However, it is not always possible to establish that system A is sufficient for system B or vice versa. This paper examines the concept of sufficiency in the context of comparative evaluation of simple probabilistic weather forecasting systems and investigates its interpretations and implications from perspectives provided by a recently developed general framework for forecast verification.
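
In the notation of that verification framework, the definition can be written explicitly (a sketch, not necessarily the paper's notation): system A, issuing forecasts f_A, is sufficient for system B, issuing forecasts f_B, if there exists a stochastic transformation h(f_B | f_A), nonnegative and summing to one over f_B, such that for every observation x

    p_B(f_B | x) = \sum_{f_A} h(f_B | f_A) \, p_A(f_A | x).

In words, B's likelihoods can be reproduced by randomly relabeling A's forecasts without reference to the observation, so B's forecasts contain no information about x beyond what A's forecasts already contain.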

It is shown here that if system A is sufficient for system B, then the basic performance characteristics of the two systems are related via sets of inequalities and A's forecasts are necessarily more accurate than B's forecasts. Conversely, knowledge of a complete set of performance characteristics makes it possible to infer whether A is sufficient for B, B is sufficient for A, or the two systems are insufficient for each other. In general, however, information regarding only relative accuracy, as measured by a performance measure such as the mean square error, will not be adequate to determine the presence or absence of sufficiency, except in situations in which the accuracy of the system of interest exceeds some relatively high critical value. These results, illustrated by means of numerical examples, suggest that comparative evaluation of weather forecasting systems should be based on fundamental performance characteristics rather than on overall performance measures.

Possible extensions of these results to situations involving more general forecasting systems, as well as some implications of the results for verification procedures and practices in meteorology, are briefly discussed.

Full access
Lev S. Gandin
and
Allan H. Murphy

Abstract

Many skill scores used to evaluate categorical forecasts of discrete variables are inequitable, in the sense that constant forecasts of some events lead to better scores than constant forecasts of other events. Inequitable skill scores may encourage forecasters to favor some events at the expense of other events, thereby producing forecasts that exhibit systematic biases or other undesirable characteristics.

This paper describes a method of formulating equitable skill scores for categorical forecasts of nominal and ordinal variables. Equitable skill scores are based on scoring matrices, which assign scores to the various combinations of forecast and observed events. The basic tenets of equitability require that (i) all constant forecasts—and random forecasts—receive the same expected score, and (ii) the elements of scoring matrices do not depend on the elements of performance matrices. Scoring matrices are assumed here to be symmetric and to possess other reasonable properties related to the nature of the underlying variable. To scale the elements of scoring matrices, the expected scores for constant and random forecasts are set equal to zero and the expected score for perfect forecasts is set equal to one. Taken together, these conditions are necessary but generally not sufficient to determine uniquely the elements of a scoring matrix. To obtain a unique scoring matrix, additional conditions must be imposed or some scores must be specified a priori.

Equitable skill scores are illustrated here by considering specific situations as well as numerical examples. These skill scores possess several desirable properties: (i) the score assigned to a correct forecast of an event increases as the climatological probability of the event decreases, and (ii) scoring matrices in (n+1)-event and n-event situations may be made consistent, in the sense that the former approaches the latter as the climatological probability of one of the events approaches zero. Several possible extensions and applications of this method are discussed.
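
In a generic notation (a sketch, not necessarily the paper's): let s_ij denote the score awarded when event j is forecast and event i is observed, and let p_i be the climatological probability of event i. A constant forecast of event j then has expected score \sum_i p_i s_ij, and the equitability and scaling conditions described above amount to

    \sum_i p_i s_ij = 0 for every j,    and    \sum_i p_i s_ii = 1,

so that all constant forecasts (and hence all random forecasts) receive expected score zero, while perfect forecasts receive expected score one.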

Full access
Allan H. Murphy
and
Robert L. Winkler

Abstract

An experiment was conducted at the National Severe Storms Forecast Center during 1976 and 1977 in which National Weather Service forecasters formulated probabilistic forecasts of several tornado events in conjunction with both severe weather outlooks and severe thunderstorm and tornado watches. The results indicate that the probabilistic forecasts associated with the outlooks were quite reliable and exhibited positive skill, relative to forecasts based on sample climatological probabilities. The probabilistic forecasts associated with the watches, however, were less reliable and skillful. In view of the lack of prior experience at making probabilistic tornado forecasts, as well as the absence of feedback, comparable objective probabilistic guidance, and even appropriate past data on which to base climatological probabilities, the results of the experiment are quite encouraging. Some suggestions for further work in probabilistic tornado forecasting are provided.
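
As a general note on terminology (not the paper's specific computations): a set of probability forecasts is called reliable if, among occasions on which probability p is forecast, the event occurs with relative frequency close to p, and skill relative to sample climatology is conventionally summarized by a score of the form SS = 1 - PS / PS_clim, where PS is the mean squared difference between the forecast probabilities and the event outcomes (0 or 1) and PS_clim is the same quantity for constant forecasts of the sample climatological probability.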

Full access