Search Results

Showing items 31–40 of 87 for Author or Editor: Allan H. Murphy
Allan H. Murphy

Abstract

This paper is concerned with the use of the coefficient of correlation (CoC) and the coefficient of determination (CoD) as performance measures in forecast verification. Aspects of forecasting performance that are measured—and not measured (i.e., ignored)—by these coefficients are identified. Decompositions of familiar quadratic measures of accuracy and skill are used to explore differences between these quadratic measures and the coefficients of correlation and determination. A linear regression model, in which forecasts are regressed on observations, is introduced to provide insight into the interpretations of the CoC and the CoD in this context.

Issues related to the use of these coefficients as verification measures are discussed, including the deficiencies inherent in one-dimensional measures of overall performance, the pros and cons of quadratic measures of accuracy and skill vis-à-vis the coefficients of correlation and determination, and the relative merits of the CoC and the CoD. These coefficients by themselves do not provide an adequate basis for drawing firm conclusions regarding absolute or relative forecasting performance.
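The distinction drawn above is easy to reproduce numerically. The sketch below (synthetic data; every number is illustrative) compares the CoC and CoD with an MSE-based skill score for forecasts that are well correlated with the observations but carry a systematic bias:

```python
import numpy as np

# Hypothetical verification data: forecasts with a constant warm bias.
rng = np.random.default_rng(0)
obs = rng.normal(15.0, 5.0, size=500)            # observed temperatures
fcst = obs + 3.0 + rng.normal(0.0, 1.0, 500)     # biased but well-correlated forecasts

# Coefficient of correlation (CoC) and coefficient of determination (CoD).
coc = np.corrcoef(fcst, obs)[0, 1]
cod = coc ** 2

# Quadratic skill score based on mean squared error, with climatology
# (the mean observation) as the reference forecast.
mse = np.mean((fcst - obs) ** 2)
mse_clim = np.mean((obs.mean() - obs) ** 2)
ss = 1.0 - mse / mse_clim

print(f"CoC = {coc:.3f}, CoD = {cod:.3f}, MSE skill score = {ss:.3f}")
```

The CoC and CoD remain high despite the 3-degree bias, while the quadratic skill score is penalized by it, which is exactly the kind of information the coefficients ignore.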

Allan H. Murphy

Abstract

In 1884 a paper by J.P. Finley appeared in the American Meteorological Journal describing the results of an experimental tornado forecasting program in the central and eastern United States. Finley's paper reported “percentages of verifications” exceeding 95%, where this index of performance was defined as the percentage of correct tornado/no-tornado forecasts. Within six months, three papers had appeared that identified deficiencies in Finley's method of verification and/or proposed alternative measures of forecasting performance in the context of this 2×2 verification problem. During the period from 1885 to 1893, several other authors in the United States and Europe, in most cases stimulated either by Finley's paper or by the three early responses, made noteworthy contributions to methods-oriented and practices-oriented discussions of issues related to forecast verification in general and verification of tornado forecasts in particular.

The burst of verification-related activities during the period 1884–1893 is referred to here as the “Finley affair.” It marked the beginning of substantive conceptual and methodological developments and discussions in the important subdiscipline of forecast verification. This paper describes the events that constitute the Finley affair in some detail and attempts to place this affair in proper historical context from the perspective of the mid-1990s. Whatever their individual strengths and weaknesses, the measures introduced during the period from 1884 to 1893 have withstood important tests of time—for example, these measures have been rediscovered on one or more occasions and they are still widely used today (generally under names assigned since 1900). Moreover, many of the issues vis-à-vis forecast verification that were first raised during the Finley affair remain issues of considerable importance more than 100 years later.
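The arithmetic at the heart of the affair is easy to reproduce. The sketch below uses the 2×2 counts commonly quoted in secondary accounts of Finley's program (treat them as illustrative) to show why the "percentage of verifications" flattered the forecasts, together with a hit-rate-minus-false-alarm-rate measure of the type proposed in the early responses:

```python
# 2x2 contingency table for tornado forecasts; counts as commonly quoted
# in secondary accounts of Finley's program (illustrative).
hits, false_alarms, misses, correct_negs = 28, 72, 23, 2680
n = hits + false_alarms + misses + correct_negs

# Finley's "percentage of verifications": all correct yes/no forecasts.
pct_correct = (hits + correct_negs) / n

# The classic objection: never forecasting a tornado scores even higher.
pct_always_no = (false_alarms + correct_negs) / n

# Peirce-style skill: hit rate minus false-alarm rate,
# which is zero for any constant forecast.
hit_rate = hits / (hits + misses)
false_alarm_rate = false_alarms / (false_alarms + correct_negs)
pss = hit_rate - false_alarm_rate

print(f"{pct_correct:.1%} correct, {pct_always_no:.1%} for 'always no', "
      f"skill = {pss:.2f}")
```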

Allan H. Murphy

Abstract

No abstract available

Allan H. Murphy

Abstract

No abstract available

Allan H. Murphy
and
Thomas E. Sabin

Abstract

This paper describes the results of a study of trends in the quality of National Weather Service (NWS) forecasts from 1967 to 1985. Primary attention is focused on forecasts of precipitation probabilities, maximum temperatures, and minimum temperatures. A skill score based on the Brier score is used to verify the precipitation probability forecasts, whereas the temperature forecasts are evaluated using the mean absolute error and percentage of errors greater than 10°F. For each element, trends are examined for objective forecasts produced by numerical-statistical models and for subjective forecasts formulated by NWS forecasters. In addition to weather element, type of forecast, and verification measure, results are stratified by season (cool and warm), lead time (three or four periods), and NWS region (four regions and all regions combined).
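The verification measures named above can be sketched in a few lines (synthetic data; all numbers are illustrative):

```python
import numpy as np

# Hypothetical PoP forecasts and binary precipitation observations.
pop = np.array([0.1, 0.3, 0.8, 0.0, 0.5, 0.9, 0.2, 0.6])
rain = np.array([0,   0,   1,   0,   1,   1,   0,   1  ])

# Brier score and the skill score relative to climatology.
bs = np.mean((pop - rain) ** 2)
bs_clim = np.mean((rain.mean() - rain) ** 2)
bss = 1.0 - bs / bs_clim

# Temperature verification: mean absolute error and the percentage
# of errors exceeding 10 degrees F.
t_fcst = np.array([62.0, 55.0, 71.0, 48.0, 90.0])
t_obs = np.array([60.0, 58.0, 69.0, 61.0, 88.0])
err = np.abs(t_fcst - t_obs)
mae = err.mean()
pct_large = 100.0 * np.mean(err > 10.0)

print(f"BSS = {bss:.2f}, MAE = {mae:.1f} F, errors > 10 F: {pct_large:.0f}%")
```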

At the national level, the forecasts for these three weather elements exhibit positive and highly significant trends in quality for almost all of the various stratifications. Exceptions to this general result are associated solely with the minimum temperature forecasts, primarily for the 60 h lead time. These national trends are generally stronger for the objective forecasts than for the subjective forecasts and for the cool season than for the warm season. Regionally, the trends in quality are almost always positive and are statistically significant in a majority of the cases. However, nonsignificant trends occur more frequently at the regional level than at the national level. As a result of the positive trends in performance, current levels of forecast quality for these weather elements are markedly higher than the levels that existed 15–20 years ago.

Robert T. Clemen
and
Allan H. Murphy

Abstract

This paper addresses two specific questions related to the interrelationships between objective and subjective probability of precipitation (PoP) forecasts: Do the subjective forecasts contain information not included in the objective forecasts? Do the subjective forecasts make full use of the objective forecasts? With respect to the first question, an analysis of more than 11 years of data indicates that the subjective PoP forecasts add information above and beyond that contained in the objective PoP forecasts for all combinations of geographical area, lead time, and season investigated in this study. For longer lead times, this conclusion appears to contradict the results of earlier studies in which the two types of PoP forecasts were compared using aggregate skill scores. With regard to the second question, the statistical results demonstrate that the subjective forecasts generally do not make full use of the objective forecasts. However, these latter results are not as strong, in a statistical sense, as the results related to the first question; moreover, they indicate that it is primarily in the vicinity of the climatological probability (i.e., 0.10 to 0.40) that better use could be made of the objective forecasts. This conclusion suggests that it may be possible to combine the objective and subjective forecasts to produce a PoP forecast with even greater information content.

Robert T. Clemen
and
Allan H. Murphy

Abstract

This paper reports the results of an empirical investigation of some methods for improving the quality of precipitation probability forecasts. These methods include 1) techniques for adjusting subjective and objective forecasts using past reliability data and 2) techniques for combining these two types of forecasts via both averaging and a more sophisticated statistical aggregation procedure. The empirical results indicate that forecast performance can be improved through such methods, with the greatest improvements arising from averaging forecasts that have previously been adjusted.
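A minimal sketch of the two kinds of techniques follows (simulated data; the adjustment shown is a plain linear recalibration fitted in-sample, not the paper's specific procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = rng.random(1000)                        # underlying event probabilities
obs = (rng.random(1000) < p_true).astype(float)  # binary precipitation events

# Two imperfect forecast sources: one overconfident, one biased high.
obj = np.clip(0.5 + 1.4 * (p_true - 0.5) + rng.normal(0, 0.05, 1000), 0, 1)
subj = np.clip(p_true + 0.10 + rng.normal(0, 0.05, 1000), 0, 1)

def brier(f, o):
    return np.mean((f - o) ** 2)

def adjust(fcst, obs):
    # Reliability-based adjustment: regress observations on forecasts
    # from past data, then pass forecasts through the fitted line.
    slope, intercept = np.polyfit(fcst, obs, 1)
    return np.clip(intercept + slope * fcst, 0.0, 1.0)

# Combine by averaging the two previously adjusted forecasts.
combined = 0.5 * (adjust(obj, obs) + adjust(subj, obs))

for name, f in [("objective", obj), ("subjective", subj), ("combined", combined)]:
    print(f"{name:10s} Brier score: {brier(f, obs):.4f}")
```

Averaging the adjusted forecasts yields a lower Brier score than either raw source here, in line with the paper's finding that the largest improvements come from averaging forecasts that have previously been adjusted.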

Allan H. Murphy
and
Daniel S. Wilks

Abstract

The traditional approach to forecast verification consists of computing one, or at most very few, quantities from a set of forecasts and verifying observations. However, this approach necessarily discards a large portion of the information regarding forecast quality that is contained in a set of forecasts and observations. Theoretically sound alternative verification approaches exist, but these often involve computation and examination of many quantities in order to obtain a complete description of forecast quality and, thus, pose difficulties in interpretation. This paper proposes and illustrates an intermediate approach to forecast verification, in which the multifaceted nature of forecast quality is recognized but the description of forecast quality is encapsulated in a much smaller number of parameters. These parameters are derived from statistical models fit to verification datasets. Forecasting performance as characterized by the statistical models can then be assessed in a relatively complete manner. In addition, the fitted statistical models provide a mechanism for smoothing sampling variations in particular finite samples of forecasts and observations.

This approach to forecast verification is illustrated by evaluating and comparing selected samples of probability of precipitation (PoP) forecasts and the matching binary observations. A linear regression model is fit to the conditional distributions of the observations given the forecasts and a beta distribution is fit to the frequencies of use of the allowable probabilities. Taken together, these two models describe the joint distribution of forecasts and observations, and reduce a 21-dimensional verification problem to 4 dimensions (two parameters each for the regression and beta models). Performance of the selected PoP forecasts is evaluated and compared across forecast type, location, and lead time in terms of these four parameters (and simple functions of the parameters), and selected graphical displays are explored as a means of obtaining relatively transparent views of forecasting performance within this approach to verification.
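Under the assumptions just described, the model fitting reduces to a few lines. The sketch below uses simulated, well-calibrated PoP forecasts and a method-of-moments beta fit (not any particular estimator from the paper; the set of allowable probabilities is illustrative) to recover the four summary parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
allowed = np.array([0.0, 0.05, 0.1, 0.2, 0.3, 0.4,
                    0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

# Simulate forecasts: draw from a beta(0.6, 1.4), round to the nearest
# allowable probability, then generate calibrated binary observations.
raw = rng.beta(0.6, 1.4, size=2000)
fcst = allowed[np.abs(raw[:, None] - allowed).argmin(axis=1)]
obs = (rng.random(2000) < fcst).astype(float)

# Parameters 1-2: linear regression of observations on forecasts
# (slope near 1 and intercept near 0 indicate good calibration).
slope, intercept = np.polyfit(fcst, obs, 1)

# Parameters 3-4: beta distribution fitted (method of moments) to the
# frequencies of use of the allowable probabilities.
m, v = fcst.mean(), fcst.var()
common = m * (1 - m) / v - 1
alpha, beta_param = m * common, (1 - m) * common

print(f"regression: slope={slope:.2f}, intercept={intercept:.2f}; "
      f"beta: alpha={alpha:.2f}, beta={beta_param:.2f}")
```

Together the four fitted numbers (slope, intercept, alpha, beta) summarize the joint distribution of forecasts and observations in the manner the abstract describes.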

Barbara G. Brown
and
Allan H. Murphy

Abstract

Fire-weather forecasts (FWFs) prepared by National Weather Service (NWS) forecasters on an operational basis are traditionally expressed in categorical terms. However, to make rational and optimal use of such forecasts, fire managers need quantitative information concerning the uncertainty inherent in the forecasts. This paper reports the results of two studies related to the quantification of uncertainty in operational and experimental FWFs.

Evaluation of samples of operational categorical FWFs reveals that these forecasts contain considerable uncertainty. The forecasts also exhibit modest but consistent biases which suggest that the forecasters are influenced by the impacts of the relevant events on fire behavior. These results underscore the need for probabilistic FWFs.

The results of a probabilistic fire-weather forecasting experiment indicate that NWS forecasters are able to make quite reliable and reasonably precise credible interval temperature forecasts. However, the experimental relative humidity and wind speed forecasts exhibit considerable overforecasting and minimal skill. Although somewhat disappointing, these results are not too surprising in view of the fact that (a) the forecasters had little, if any, experience in probability forecasting; (b) no feedback was provided to the forecasters during the experimental period; and (c) the experiment was of quite limited duration. More extensive experimental and operational probability forecasting trials as well as user-oriented studies are required to enhance the quality of FWFs and to ensure that the forecasts are used in an optimal manner.

Allan H. Murphy
and
Martin Ehrendorfer

Abstract

This paper explores the relationship between the quality and value of imperfect forecasts. It is assumed that these forecasts are produced by a primitive probabilistic forecasting system and that the decision-making problem of concern is the cost-loss ratio situation. In this context, two parameters describing basic characteristics of the forecasts must be specified in order to determine forecast quality uniquely. As a result, a scalar measure of accuracy such as the Brier score cannot completely and unambiguously describe the quality of the imperfect forecasts. The relationship between forecast accuracy and forecast value is represented by a multivalued function—an accuracy/value envelope. Existence of this envelope implies that the Brier score is an imprecise measure of value and that forecast value can even decrease as forecast accuracy increases (and vice versa). The generality of these results and their implications for verification procedures and practices are discussed.
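The cost-loss framing is concrete enough to simulate. In the sketch below (synthetic forecasts; a Monte Carlo illustration, not the paper's analytic two-parameter model), a user protects at cost C whenever the forecast probability exceeds C/L, and forecast value is measured relative to climatology and perfect information:

```python
import numpy as np

C, L = 1.0, 4.0          # protection cost and potential loss; C/L = 0.25
rng = np.random.default_rng(3)
p_true = rng.random(5000)
obs = (rng.random(5000) < p_true).astype(float)
fcst = np.clip(p_true + rng.normal(0, 0.15, 5000), 0, 1)  # imperfect forecasts

def mean_expense(protect, obs):
    # Pay C when protecting, L when unprotected and the event occurs.
    return np.mean(np.where(protect, C, obs * L))

e_fcst = mean_expense(fcst > C / L, obs)    # act on the forecasts
e_clim = min(C, obs.mean() * L)             # best constant (climatological) action
e_perf = mean_expense(obs == 1, obs)        # perfect information

# Relative value: 0 at climatology, 1 at perfect information.
value = (e_clim - e_fcst) / (e_clim - e_perf)
print(f"value = {value:.2f}")
```

Because value depends on the decision threshold C/L and not only on a scalar accuracy summary, two forecast systems with equal Brier scores can deliver different value, which is the ambiguity the accuracy/value envelope formalizes.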
