1. Introduction
Operational weather forecasters now recognize that uncertainties in the initial conditions of numerical weather prediction (NWP) models, as well as errors in the models themselves, lead to uncertainty in the forecast. Many forecast centers now attempt to estimate the impact of initial-condition uncertainties by generating ensembles of forecasts (Molteni et al. 1996; Toth and Kalnay 1997). The ensemble members usually differ in their initial conditions, although research is under way into generating ensembles that reflect model error (Houtekamer et al. 1996; Buizza et al. 1999; Stensrud et al. 1999; Evans et al. 2000), and such methods are now becoming operational. The best method of constructing these ensembles is the subject of current research (for a review, see Palmer 2000). Ensemble forecasting is a Monte Carlo approach to sampling the forecast probability distribution function (PDF); explicit calculation of this function in the state space of a modern NWP model is computationally impossible, and possibly ill defined (Smith et al. 1999). Computational limits determine the size of the ensembles generated.
Users of weather forecasts may benefit significantly from the greater amount of information contained in a probabilistic forecast than in a single deterministic forecast (Smith et al. 2001; Richardson 2000, 2001; Roulston and Smith 2002). To ascertain this benefit to a particular user one should create a cost function that takes into account the decisions that the user can make, and the utility associated with possible outcomes. The result will be specific to that particular user.
The question of how to assess the general quality of probabilistic forecasts is a subject of current research in the weather forecasting community. Currently used methods include the Brier score (Brier 1950), ranked probability score (Epstein 1969; Murphy 1971), relative operating characteristics (Swets 1973; Mason 1982), rank histograms (Anderson 1996; Hamill and Colucci 1996; Talagrand et al. 1997), and the generalization of rank histograms to higher dimensions (Smith 2000).
Information theory provides a useful theoretical framework for understanding and quantifying weather and climate predictability (Leung and North 1990; Schneider and Griffies 1999; Kleeman 2002). Leung and North (1990) suggested that a relative-entropy-type measure might be used as the basis of a skill score for deterministic forecasts. Information-theoretic measures, such as entropy, have been used in previous studies to quantify ensemble spread (Stephenson and Doblas-Reyes 2000). In these studies, however, the entropy of the probabilistic forecast was suggested as a predictor of forecast skill, rather than as a measure of it.
In this paper, we propose a wider role for a logarithmic scoring rule (Lindley 1985) by showing how it fits into the context of information theory. Under this scoring the “best” forecast would be one that leads to the highest level of data compression when describing truth; this forecast would also yield the highest expected return if used to place proportional bets on the future. The idea of using data compressibility as a measure of model quality has philosophical appeal (Davies 1991), while the correspondence with gambling returns has some relevance to insurance and weather derivative pricing applications, or any other industry with the option to take action based on a forecast.
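The compression–gambling correspondence can be made concrete with a short numerical sketch (illustrative only; the function names and the even-odds setting are our assumptions, not notation from the paper):

```python
import math

def ignorance(forecast_probs, outcome):
    """Ignorance (logarithmic score): the number of bits needed to
    encode the observed outcome with a code built from the forecast."""
    return -math.log2(forecast_probs[outcome])

# A binary forecast: P(freeze) = 0.8, P(no freeze) = 0.2.
forecast = [0.8, 0.2]
ign = ignorance(forecast, 0)  # bits charged if the freeze occurs

# Gambling view: a gambler who splits wealth in proportion to the
# forecast, at fair even odds (payout multiplier 2 per outcome),
# multiplies wealth by 2 * f_i = 2**(1 - IGN) when outcome i occurs.
wealth_multiplier = 2 * forecast[0]
assert abs(wealth_multiplier - 2 ** (1 - ign)) < 1e-12
```

A forecast that assigns more probability to what actually happens thus simultaneously compresses the record of outcomes into fewer bits and earns a larger wealth multiplier; the two rankings always agree.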
2. Ignorance defined
The aim is to develop a forecast skill score that measures the quality of the forecast PDF. The forecast PDF should be assessed on how similar it is to the true PDF. In this paper, the phrase “true PDF” means the PDF of consistent initial conditions evolved forward in time under the dynamics of the real atmosphere (Smith et al. 1999). This initial PDF is the product of the distribution of observational uncertainty and the distribution of states on the atmospheric attractor (if one exists).
Note that if fi = 0 then, according to Eq. (2), an infinite number of bits is assigned to the ith outcome. This is because an optimal compression scheme would have no way of encoding any outcome deemed impossible a priori. This raises the interesting issue of whether reporting 0 forecast probabilities can ever be justified, especially if the forecast probabilities are estimates obtained from finite ensembles and imperfect models. Forecasters should replace 0 forecast probabilities with small probabilities based on the uncertainties in the forecast PDF. Not to do so means reporting the improbable as the impossible. This would violate “Cromwell's rule,”1 which warns against assigning 0 probability to an event unless it is truly impossible (Lindley 1985).
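One practical way to honor Cromwell's rule is to floor ensemble-derived probabilities before scoring. The sketch below uses Laplace's rule of succession, which is one simple choice among many, not a prescription from the paper:

```python
def floored_probs(counts, n_members):
    """Convert ensemble outcome counts into forecast probabilities
    that are never exactly zero, via Laplace's rule of succession:
    f_i = (k_i + 1) / (n + K), where K is the number of categories."""
    K = len(counts)
    return [(k + 1) / (n_members + K) for k in counts]

# A 51-member ensemble over 3 categories with no member in category 3:
probs = floored_probs([40, 11, 0], 51)
# The empty category receives a small but nonzero probability, so
# ignorance stays finite even if the "impossible" outcome occurs.
assert all(p > 0 for p in probs)
assert abs(sum(probs) - 1) < 1e-12
```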
3. Relationship between ignorance and forecast quality
4. Relationship between ignorance and cost–loss
When considering the economic value of forecasts, the binary cost–loss scenario is commonly used (Katz and Murphy 1987, 1997). In this scenario there are two outcomes (e.g., not freezing and freezing). The user can make a decision to protect or not to protect (e.g., to grit the roads or not to grit the roads). This protection has a cost C but, should the user choose not to protect and adverse weather occurs, the user sustains a loss L. Let the probability of it freezing be p1 = p, and the probability of it not freezing be p2 = 1 − p. The wealth multipliers oi associated with each outcome are o1 = L/C − 1 and o2 = 1. However, in the simple cost–loss scenario, the users cannot spread their wealth arbitrarily between the outcomes. Since the potential loss the users can suffer is L, this is the amount of wealth they can bet on the outcomes. Effectively they must either bet w1 = 0, w2 = 1 or w1 = C/L, w2 = 1 − C/L. They would choose the latter if p is greater than C/L. If p is less than C/L, the user could replicate the proportional betting strategy (by gritting a fraction pL/C of the roads, if this is possible). The cost–loss score is parametric; it depends on the value of C/L. If a uniform distribution of C/L ratios is assumed, it can be shown that the mean cost–loss score is equivalent to the Brier score (Murphy 1966; Richardson 2001). The advantage of ignorance over the cost–loss score is that ignorance easily generalizes beyond the binary decision case; indeed, ignorance can be defined for a continuous distribution ρ(x) as IGN = −log2ρ(xa), where xa is the actual outcome. It can be shown that ignorance is equivalent to the cost–loss score averaged over a distribution of cost–loss ratios that is weighted toward values of C/L close to 0 and unity (see the appendix).
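The binary decision rule described above can be sketched as follows (a minimal illustration; the function names are ours):

```python
def expected_expense(p, C, L, protect):
    """Expected expense in the binary cost-loss scenario: protecting
    always costs C; not protecting risks the loss L with probability p."""
    return C if protect else p * L

def optimal_decision(p, C, L):
    """Protect exactly when the forecast probability exceeds C/L."""
    return p > C / L

# Example: gritting costs 10, an unprotected freeze costs 100,
# and the forecast probability of freezing is 0.3 (> C/L = 0.1).
p, C, L = 0.3, 10.0, 100.0
assert optimal_decision(p, C, L) is True
assert expected_expense(p, C, L, True) < expected_expense(p, C, L, False)
```

Because the user's action collapses to this single threshold, only the position of the forecast probability relative to C/L matters, which is why the cost–loss score for a single ratio discards most of the information in the forecast PDF.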
5. Relationship between ignorance and Brier score
The Brier score is a common skill score for assessing probabilistic forecasts (Brier 1950). It will now be shown that a forecast scheme with a lower expected Brier score than another forecast scheme may not necessarily have a lower value of expected ignorance. In the simple two-outcome case, ignorance is a double-valued function of the Brier score. It shares this property with the cost–loss value for a single cost–loss ratio (Murphy and Ehrendorfer 1987).
Ignorance is a double-valued function of the expected Brier score because, while the expected Brier score is symmetric in f about f = p, ignorance is asymmetric, as is the cost–loss value for a fixed cost–loss ratio.
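This asymmetry is easy to demonstrate numerically. The sketch below (our choice of illustrative values; it uses the one-sided binary Brier convention, which differs from some definitions by a factor of 2) compares two forecasts equidistant from p:

```python
import math

def expected_brier(p, f):
    """Expected two-outcome Brier score (one-sided convention):
    E[BS] = p(1-f)^2 + (1-p)f^2, which is symmetric about f = p."""
    return p * (1 - f) ** 2 + (1 - p) * f ** 2

def expected_ignorance(p, f):
    """E[IGN] = -p log2(f) - (1-p) log2(1-f), asymmetric about f = p."""
    return -p * math.log2(f) - (1 - p) * math.log2(1 - f)

p = 0.25
f_low, f_high = 0.05, 0.45          # both a distance 0.2 from p
assert abs(expected_brier(p, f_low) - expected_brier(p, f_high)) < 1e-12
# ...yet their expected ignorances differ: underforecasting the
# event is penalized far more heavily than overforecasting it.
assert expected_ignorance(p, f_low) > expected_ignorance(p, f_high)
```

Two forecast schemes can therefore tie on expected Brier score while one carries a substantially larger expected ignorance.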
6. Ignorance and continuous forecast variables: An example
7. Using ignorance: Temperature at Heathrow
8. Summary
A skill score for assessing probabilistic forecasts based on the information deficit (or ignorance) given the forecast has been presented. This skill score is directly related to the level of data compression that could be achieved using the forecast to design the compression algorithm. The relationship between data compression and gambling returns implies that this skill score corresponds to the expected returns of a gambler placing optimal (i.e., proportional) bets on the possible outcomes. Despite this gambling interpretation, ignorance is not generally equivalent to the cost–loss score, which is used in simple studies of the economic value of forecasts. The correspondence between gambling returns and ignorance holds only if the user is free to adopt the optimal proportional ("Kelly") betting strategy (Kelly 1956); in the cost–loss scenario this is not the case. Also, ignorance easily generalizes beyond binary decision scenarios.
The ignorance score does not indicate which effects are contributing to a loss of skill (e.g., greater ensemble spread, or truth lying outside the ensemble). No skill score that attempts to summarize probabilistic forecast skill in a single number can describe such effects. If such a single-number summary is required, however, ignorance has advantages over other scores such as the Brier score and the cost–loss ratio.
Ignorance also has a more robust philosophical justification than the Brier score. Ignorance directly measures the average information deficit of someone in possession of a particular forecasting model. Using ignorance naturally connects the problems of practically evaluating real forecasts to the information-theoretic framework for weather and climate prediction, which has been constructed by other workers in the field (Leung and North 1990; Kleeman 2002).
Ignorance can be calculated either from categorical forecasts constructed from ensembles or from rank histograms, by considering how much information is required to specify the location of truth in the ordered ensemble.
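The rank-histogram route can be sketched as follows (our construction, under the assumption that the average information needed to locate truth among the rank bins is the entropy of the empirical rank histogram):

```python
import math

def rank_of_truth(ensemble, verification):
    """Rank (0..n) of the verification within the sorted n-member
    ensemble: the number of members falling below the observed value."""
    return sum(1 for member in sorted(ensemble) if member < verification)

def rank_ignorance(rank_counts):
    """Average number of bits needed to specify which of the n+1 rank
    bins truth fell in, given the empirical rank histogram."""
    total = sum(rank_counts)
    probs = [c / total for c in rank_counts]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A flat histogram over 4 bins costs log2(4) = 2 bits per forecast;
# a peaked histogram costs fewer bits to encode, but if the peaks sit
# in the outermost bins it signals truth escaping the ensemble.
assert abs(rank_ignorance([10, 10, 10, 10]) - 2.0) < 1e-12
```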
Given its advantages over other skill scores, ignorance is likely to prove a particularly useful tool in the future evaluation of probabilistic forecasts, a relatively neglected aspect of current meteorological research.
Acknowledgments
The authors thank the two anonymous reviewers whose suggestions greatly improved this paper. This work was supported by ONR DRI Grant N00014-99-1-0056.
REFERENCES
Anderson, J. L., 1996: A method for producing and evaluating probabilistic forecasts from ensemble model integrations. J. Climate, 9 , 1518–1530.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probabilities. Mon. Wea. Rev., 78 , 1–3.
Buizza, R., M. Miller, and T. N. Palmer, 1999: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 125 , 2887–2908.
Cover, T. M., and J. A. Thomas, 1991: Elements of Information Theory. John Wiley, 542 pp.
Davies, P. C. W., 1991: Why is the physical world so comprehensible? Complexity, Entropy and the Physics of Information, W. H. Zurek, Ed., Addison-Wesley, 61–70.
Epstein, E., 1969: A scoring system for probability forecasts of ranked categories. J. Appl. Meteor., 8 , 985–987.
Evans, R. E., M. S. J. Harrison, R. J. Graham, and K. R. Mylne, 2000: Joint medium-range ensembles from The Met. Office and ECMWF systems. Mon. Wea. Rev., 128 , 3104–3127.
Hamill, T. M., and S. J. Colucci, 1996: Random and systematic error in NMC's short-range Eta ensembles. Preprints, 13th Conf. on Probability and Statistics in the Atmospheric Sciences, San Francisco, CA, Amer. Meteor. Soc., 51–56.
Houtekamer, P. L., L. Lefaivre, J. Derome, H. Ritchie, and H. L. Mitchell, 1996: A system simulation approach to ensemble prediction. Mon. Wea. Rev., 124 , 1225–1242.
Katz, R. W., and A. H. Murphy, 1987: Quality/value relationships for imperfect information in the umbrella problem. Amer. Stat., 41 , 187–189.
Katz, R. W., and A. H. Murphy, 1997: Forecast value: Prototype decision-making models. Economic Value of Weather and Climate Forecasts, R. W. Katz and A. H. Murphy, Eds., Cambridge University Press, 183–217.
Kelly, J., 1956: A new interpretation of information rate. Bell Syst. Tech. J., 35 , 916–926.
Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. J. Atmos. Sci., in press.
Leung, L-Y., and G. R. North, 1990: Information theory and climate prediction. J. Climate, 3 , 5–14.
Lindley, D. V., 1985: Making Decisions. John Wiley and Sons, 207 pp.
Mason, I. B., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag., 30 , 291–303.
Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122 , 73–119.
Murphy, A. H., 1966: A note on the utility of probabilistic predictions and the probability score in the cost–loss ratio decision situation. J. Appl. Meteor., 5 , 534–537.
Murphy, A. H., 1971: A note on the ranked probability score. J. Appl. Meteor., 10 , 155–156.
Murphy, A. H., 1997: Forecast verification. Economic Value of Weather and Climate Forecasts, A. H. Murphy and R. W. Katz, Eds., Cambridge University Press, 19–70.
Murphy, A. H., and H. Daan, 1985: Forecast evaluation. Probability, Statistics and Decision Making in the Atmospheric Sciences, A. H. Murphy and R. W. Katz, Eds., Westview Press, 379–437.
Murphy, A. H., and M. Ehrendorfer, 1987: On the relationship between the accuracy and value of forecasts in the cost–loss ratio situation. Wea. Forecasting, 2 , 243–251.
Palmer, T. N., 2000: Predicting uncertainty in forecasts of weather and climate. Rep. Prog. Phys., 63 , 71–116.
Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 126 , 649–667.
Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quart. J. Roy. Meteor. Soc., in press.
Roulston, M. S., and L. A. Smith, 2002: End-to-end ensemble forecasting: Ensemble interpretation in forecasting and risk management. Preprints, Symp. on Observations, Data Assimilation, and Probabilistic Prediction, Orlando, FL, Amer. Meteor. Soc., 123–126.
Schneider, T., and S. M. Griffies, 1999: A conceptual framework for predictability studies. J. Climate, 12 , 3133–3155.
Shannon, C. E., 1948: A mathematical theory of communication. Bell Syst. Tech. J., 27 , 379–423, 623–656.
Smith, L. A., 2000: Disentangling uncertainty and error: On the predictability of nonlinear systems. Nonlinear Dynamics and Statistics, A. I. Mees, Ed., Birkhauser, 31–64.
Smith, L. A., C. Ziehmann, and K. Fraedrich, 1999: Uncertainty dynamics and predictability in chaotic systems. Quart. J. Roy. Meteor. Soc., 125 , 2855–2886.
Smith, L. A., M. S. Roulston, and J. Hordenberg, 2001: End to end ensemble forecasting: Towards evaluating the economic value of the ensemble prediction system. ECMWF Tech. Rep. 336.
Stensrud, D. J., H. E. Brooks, J. Du, M. S. Tracton, and E. Rogers, 1999: Using ensembles for short-range forecasting. Mon. Wea. Rev., 127 , 433–446.
Stephenson, D. B., and F. J. Doblas-Reyes, 2000: Statistical methods for interpreting Monte Carlo ensemble forecasts. Tellus, 52A , 300–322.
Swets, J. A., 1973: The relative operating characteristic in psychology. Science, 182 , 990–999.
Talagrand, O., R. Vautard, and B. Strauss, 1997: Evaluation of probabilistic prediction systems. Proc. ECMWF Workshop on Predictability, Reading, United Kingdom, ECMWF, 1–25.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125 , 3297–3319.
Winkler, R. L., and A. H. Murphy, 1968: “Good” probability assessors. J. Appl. Meteor., 7 , 751–758.
APPENDIX
Skill Scores and Cost–Loss
This appendix derives the relationships between cost–loss scores and the quadratic (Brier) and logarithmic (ignorance) skill scores.
A plot of expected ignorance against expected Brier score for a binary event with p = 0.25. The curves are parameterized by the forecast probability f and intersect when f = p. Model A has f = 0.05; model B has f = 0.475.
Citation: Monthly Weather Review 130, 6; 10.1175/1520-0493(2002)130<1653:EPFUIT>2.0.CO;2
(a) The observed temperature at London's Heathrow airport (thin line) and an average seasonal cycle (thick line). (b) The average ignorance of probabilistic forecasts of whether the temperature will be above or below the seasonal average. The daily forecasts were constructed using operational 51-member ECMWF ensembles
“I beseech you, in the bowels of Christ, think it possible you may be mistaken.” (Oliver Cromwell in a letter to the General Assembly of the Church of Scotland, 3 August 1650.)