Comments on “The Discrete Brier and Ranked Probability Skill Scores”

Michael K. Tippett, International Research Institute for Climate and Society, Palisades, New York

Corresponding author address: M. K. Tippett, International Research Institute for Climate and Society, The Earth Institute of Columbia University, Lamont Campus/61 Route 9W, Palisades, NY 10964. Email: tippett@iri.columbia.edu

1. Introduction

The ranked probability score (RPS) is the sum of the squared differences between cumulative forecast probabilities and cumulative observed probabilities, and measures both forecast reliability and resolution (Murphy 1973). The ranked probability skill score (RPSS) compares the RPS of a forecast with that of some reference forecast such as “climatology” (using past mean climatic values as the forecast), oriented so that RPSS < 0 (RPSS > 0) corresponds to a forecast that is less (more) skillful than climatology.

Categorical forecast probabilities are often estimated from ensembles of numerical model integrations by counting the number of ensemble members in each category. Finite ensemble size introduces sampling error into such probability estimates, and the RPSS of a reliable forecast model with finite ensemble size is an increasing function of ensemble size (Kumar et al. 2001; Tippett et al. 2007). A similar relation exists between correlation and ensemble size (Sardeshmukh et al. 2000). The dependence of RPSS on ensemble size makes it challenging to use RPSS to compare forecast models with different ensemble sizes. For instance, it may be difficult to know whether a forecast system has higher RPSS because it is based on a superior forecast model or because it uses a larger ensemble. This question often arises in the comparison of multimodel and single model forecasts (Hagedorn et al. 2005; Tippett and Barnston 2008). The dependence of RPSS on ensemble size is not a problem when comparing forecast quality. Improved RPSS is associated with improved forecast quality and is desirable whether it results from larger ensemble size or from a better forecast model.

Müller et al. (2005) recently introduced a resampling strategy to estimate the infinite-ensemble RPSS from the finite-ensemble RPSS and called this estimate the “debiased RPSS.” Weigel et al. (2007) derived an analytical formula for the debiased RPSS and proved that it is an unbiased estimate of the infinite-ensemble RPSS in the case of uncorrelated ensemble members, that is, forecasts without skill. Here it is proved that the debiased RPSS is an unbiased estimate of the infinite-ensemble RPSS for any reliable forecasts. It is shown that over- or underconfident forecasts introduce a dependence of the debiased RPSS on ensemble size. Simplification of the results of Weigel et al. (2007) shows that the debiased RPSS is a multicategory generalization of the result of Richardson (2001) for the Brier skill score.

2. RPSS and debiased RPSS

The RPS of a K-category probability forecast is
$$\mathrm{RPS} = \sum_{i=1}^{K} \left[ \sum_{j=1}^{i} (P_j - O_j) \right]^2, \tag{1}$$
where Pi is the forecast probability assigned to the ith category and Oi is 1 when the observation falls into the ith category and 0 otherwise. When forecast probabilities are computed by counting the number of ensemble members in each category, finite ensemble size results in sampling errors that increase RPS.
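As an illustration only, Eq. (1) and the counting estimate of the forecast probabilities can be sketched in Python as follows. The function names `rps` and `ensemble_probs` and the zero-based category index are assumptions introduced for this sketch and do not come from the original note.

```python
from itertools import accumulate


def rps(forecast_probs, observed_category):
    """Ranked probability score of a single K-category forecast.

    forecast_probs: sequence of K category probabilities summing to 1.
    observed_category: zero-based index of the category that verified.
    """
    k = len(forecast_probs)
    obs = [1.0 if i == observed_category else 0.0 for i in range(k)]
    # Squared differences of cumulative forecast and observed probabilities.
    return sum((cf - co) ** 2
               for cf, co in zip(accumulate(forecast_probs), accumulate(obs)))


def ensemble_probs(member_categories, k):
    """Counting estimate: fraction of ensemble members in each category."""
    m = len(member_categories)
    return [member_categories.count(i) / m for i in range(k)]
```

With two categories the sum has a single nonzero term, the squared difference between the first-category probability and the first-category observation, which is the Brier score for that event.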
In the case of two categories, RPS is the Brier score. Richardson (2001) showed the dependence of the Brier score on ensemble size M in a reliable forecast system. Tippett et al. (2007) generalized that result to tercile categories and later (Tippett and Barnston 2008) to an arbitrary number of categories as
$$\langle \mathrm{RPS}(M) \rangle = \left( 1 + \frac{1}{M} \right) \langle \mathrm{RPS}(\infty) \rangle, \tag{2}$$
indicating how decreasing ensemble size increases the expected RPS.
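The factor 1 + 1/M can be checked exactly for small ensembles. The sketch below assumes the simplest reliable setting, in which the observation and the M ensemble members are independent draws from the same categorical distribution p, and enumerates every possible ensemble with exact rational arithmetic. The function names and the example distribution are hypothetical choices made for this illustration.

```python
from fractions import Fraction
from itertools import accumulate, product
from math import prod


def rps(probs, obs_cat):
    """RPS of one forecast; obs_cat is the zero-based verifying category."""
    cum_obs = accumulate(1 if i == obs_cat else 0 for i in range(len(probs)))
    return sum((cf - co) ** 2 for cf, co in zip(accumulate(probs), cum_obs))


def expected_rps_infinite(p):
    """<RPS> when the forecast is p itself and the observation is drawn from p."""
    return sum(p[obs] * rps(p, obs) for obs in range(len(p)))


def expected_rps_finite(p, m):
    """Exact <RPS(M)>: members and observation are independent draws from p."""
    k = len(p)
    total = Fraction(0)
    for members in product(range(k), repeat=m):
        weight = prod(p[c] for c in members)  # probability of this ensemble
        probs = [Fraction(members.count(i), m) for i in range(k)]
        total += weight * sum(p[obs] * rps(probs, obs) for obs in range(k))
    return total


p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]  # assumed example
for m in (1, 2, 3, 4):
    assert expected_rps_finite(p, m) == (1 + Fraction(1, m)) * expected_rps_infinite(p)
```

The identity holds for each forecast distribution separately, so it survives averaging over a set of reliable forecasts.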
The RPSS is
$$\mathrm{RPSS} = 1 - \frac{\langle \mathrm{RPS} \rangle}{\langle \mathrm{RPS}_{Cl} \rangle}, \tag{3}$$
where RPSCl is the RPS of a reference forecast consisting of climatological probabilities and angle brackets denote averaging over forecasts. Sampling error causes RPSS to decrease. Using Eq. (2), the infinite-ensemble RPSS can be expressed in terms of the finite-ensemble RPSS as
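$$\mathrm{RPSS}(\infty) = 1 - \frac{M}{M + 1} \frac{\langle \mathrm{RPS}(M) \rangle}{\langle \mathrm{RPS}_{Cl} \rangle} = \frac{M \, \mathrm{RPSS}(M) + 1}{M + 1}. \tag{4}$$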
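A minimal sketch of this conversion, assuming reliable forecasts so that the factor 1 + 1/M of Eq. (2) applies, is given below. The function names are hypothetical.

```python
def rpss(mean_rps, mean_rps_clim):
    """Skill score from the average forecast RPS and the average climatological RPS."""
    return 1.0 - mean_rps / mean_rps_clim


def rpss_infinite_from_finite(rpss_m, m):
    """Infinite-ensemble RPSS implied by the M-member RPSS of a reliable system."""
    return (m * rpss_m + 1.0) / (m + 1.0)
```

For example, a reliable forecast system without skill has RPSS(M) = −1/M, which the conversion maps back to zero infinite-ensemble skill.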
The strategy introduced by Müller et al. (2005) to estimate RPSS(∞) from RPSS(M) was to artificially increase the error in the reference forecast by computing climatological probabilities using the same number of samples as ensemble members and then to define a debiased RPSS, denoted RPSSD, by
$$\mathrm{RPSS}_D = 1 - \frac{\langle \mathrm{RPS}(M) \rangle}{\langle \mathrm{RPS}_{Cl}(M) \rangle}. \tag{5}$$
Müller et al. (2005) showed in numerical examples with reliable forecasts and tercile categories that RPSSD had little if any dependence on ensemble size.
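The resampling idea can be sketched as follows. This is not the procedure of Müller et al. (2005) in detail: the function names, the use of sampling with replacement, and the synthetic historical record in the usage note are all assumptions made for this illustration.

```python
import random
from itertools import accumulate


def rps(probs, obs_cat):
    """RPS of one forecast; obs_cat is the zero-based verifying category."""
    cum_obs = accumulate(1.0 if i == obs_cat else 0.0 for i in range(len(probs)))
    return sum((cf - co) ** 2 for cf, co in zip(accumulate(probs), cum_obs))


def mean_rps_clim_resampled(record, k, m, n_resamples, seed=0):
    """Average RPS of M-sample climatological forecasts drawn from `record`.

    record: list of zero-based observed categories (the historical record).
    """
    rng = random.Random(seed)
    total = 0.0
    for obs_cat in record:
        for _ in range(n_resamples):
            sample = rng.choices(record, k=m)  # M draws with replacement
            probs = [sample.count(i) / m for i in range(k)]
            total += rps(probs, obs_cat)
    return total / (len(record) * n_resamples)


def rpss_debiased(mean_rps_forecast, mean_rps_clim_m):
    """Debiased skill score: the reference is the M-sample climatology."""
    return 1.0 - mean_rps_forecast / mean_rps_clim_m
```

For a record with equally populated terciles, `mean_rps_clim_resampled(record, 3, m, n_resamples)` should approach (1 + 1/M) times 4/9 as the number of resamples grows, which is the inflated reference score that enters the denominator of the debiased RPSS.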
By using Eq. (2), one can immediately see that RPSSD is the same as RPSS(∞) and is indeed an unbiased estimate for the infinite-ensemble RPSS for all reliable forecasts since
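$$\mathrm{RPSS}_D = 1 - \frac{\langle \mathrm{RPS}(M) \rangle}{\langle \mathrm{RPS}_{Cl}(M) \rangle} = 1 - \frac{(1 + 1/M) \langle \mathrm{RPS}(\infty) \rangle}{(1 + 1/M) \langle \mathrm{RPS}_{Cl} \rangle} = \mathrm{RPSS}(\infty). \tag{6}$$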
The impact of sample size on expected RPS is multiplicative and independent of skill level. Therefore the ratio of the expected RPS of two reliable forecast systems with the same ensemble size is independent of ensemble size.
In Müller et al. (2005), 〈RPSCl(M)〉 was computed by repeatedly sampling from the historical record. Weigel et al. (2007) computed 〈RPSCl(M)〉 analytically using properties of the multinomial distribution and expressed RPSSD as
$$\mathrm{RPSS}_D = 1 - \frac{\langle \mathrm{RPS} \rangle}{\langle \mathrm{RPS}_{Cl} \rangle + D}, \tag{7}$$
where
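$$D = \frac{1}{M} \sum_{k=1}^{K} \sum_{i=1}^{k} \left[ p_i \left( 1 - p_i - 2 \sum_{j=i+1}^{k} p_j \right) \right], \tag{8}$$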
and pi is the climatological probability of the ith category. In light of Eq. (4), it must be the case that
$$D = \frac{\langle \mathrm{RPS}_{Cl} \rangle}{M}. \tag{9}$$
To prove Eq. (9) directly, first the expression for D is simplified. From Eq. (12) of Weigel et al. (2007),
$$D = \sum_{k=1}^{K} \left\langle \left[ \sum_{i=1}^{k} (\hat{p}_i - p_i) \right]^2 \right\rangle, \tag{10}$$
where p̂i is the M-member sample estimate of pi. Since the M-member sample estimates of the cumulative probabilities are binomially distributed, their means are Ci and their variances are Ci(1 − Ci)/M, where the cumulative climatological probability Ci is defined by
$$C_i = \sum_{j=1}^{i} p_j. \tag{11}$$
Therefore, D has the simple form
$$D = \frac{1}{M} \sum_{i=1}^{K} C_i (1 - C_i). \tag{12}$$
Next 〈RPSCl〉 is expressed in terms of the climatological categorical probabilities pi. Explicitly, for a single forecast RPSCl is
$$\mathrm{RPS}_{Cl} = \sum_{i=1}^{K} \left[ \sum_{j=1}^{i} (p_j - O_j) \right]^2. \tag{13}$$
The expected value of RPSCl is simply Eq. (13) summed over all possible outcomes of the observations, weighted by the probabilities of each outcome. That is,
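$$\langle \mathrm{RPS}_{Cl} \rangle = \sum_{k=1}^{K} p_k \sum_{i=1}^{K} \left[ \sum_{j=1}^{i} (p_j - \delta_{jk}) \right]^2, \tag{14}$$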
where the Kronecker delta δij is defined to be 1 when i = j and 0 otherwise. Direct manipulation of this expression gives
$$\langle \mathrm{RPS}_{Cl} \rangle = \sum_{i=1}^{K} C_i (1 - C_i) = M D, \tag{15}$$
thus proving Eq. (9).
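The identity D = 〈RPSCl〉/M can also be checked numerically with exact rational arithmetic. In the sketch below the double-sum form of D is the multinomial variance expansion, which is assumed here to match the expression of Weigel et al. (2007). The function names and the example probabilities are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def d_double_sum(p, m):
    """D written as a double sum over categories (multinomial variances)."""
    k = len(p)
    total = 0
    for kk in range(1, k + 1):
        for i in range(1, kk + 1):
            tail = sum(p[j - 1] for j in range(i + 1, kk + 1))
            total += p[i - 1] * (1 - p[i - 1] - 2 * tail)
    return total / m


def d_simplified(p, m):
    """D in terms of the cumulative climatological probabilities C_i."""
    return sum(c * (1 - c) for c in accumulate(p)) / m


def expected_rps_clim(p):
    """<RPS_Cl>: climatological RPS averaged over all possible outcomes."""
    cum = list(accumulate(p))
    return sum(pk * sum((c - (1 if i >= k_obs else 0)) ** 2
                        for i, c in enumerate(cum))
               for k_obs, pk in enumerate(p))


p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]  # assumed example
m = 7
assert d_double_sum(p, m) == d_simplified(p, m) == expected_rps_clim(p) / m
```

For equiprobable terciles the three expressions give 4/(9M), since 〈RPSCl〉 = 4/9 in that case.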

3. Unreliable forecasts

The results above give no guidance about the dependence of RPSS on ensemble size when the forecasts are unreliable. Ferro et al. (2008) derive a more general estimator for RPSS that is applicable to under- and overconfident ensembles, as long as the ensemble members are “exchangeable.” Although Müller et al. (2005) state that RPSSD is an unbiased estimate of the infinite-ensemble RPSS and is independent of ensemble size, there was no explicit examination of the behavior of RPSSD for unreliable forecasts. The behavior of RPSSD is investigated here in an example in which the forecasts are unreliable.

A simple univariate example is considered here in which the forecasts and observations are normally distributed. The expected correlation between the ensemble mean and observations is r, and the expected correlation between the ensemble mean and an ensemble member is rf; rf measures potential predictability, that is, the ability of the forecast model to predict itself. Explicitly, the observations are normally distributed with mean rs and variance 1 − r^2, denoted N(rs, 1 − r^2), and the forecast distribution is N(rf s, 1 − rf^2); the distribution of the random variable s is N(0, 1). The forecast is reliable when rf = r and overconfident (underconfident) when rf > r (rf < r).

Values of r and rf were chosen corresponding to reliable, weakly overconfident, very overconfident, weakly underconfident, and very underconfident forecast systems, as indicated in Table 1. The expected values of RPSS(M) and RPSSD for tercile-based categorical forecasts were computed from 10^6 simulations of the observations and forecast ensembles. Figure 1 shows the results as a function of ensemble size M. Figure 1a shows that RPSSD is, as proved, an unbiased estimate of RPSS(∞) independent of ensemble size. Figures 1b and 1c show that for overconfident forecasts RPSSD overestimates RPSS(∞), with the discrepancy between RPSSD and RPSS(∞) being greater than that between RPSS(M) and RPSS(∞) for very overconfident forecasts. There is some indication of the tendency of RPSSD to overestimate RPSS(∞) in Figs. 3a and 3b of Weigel et al. (2007), indicating model overconfidence. In the underconfident examples, RPSSD slightly underestimates RPSS(∞).
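An experiment of this kind can be sketched with the Python standard library. The sketch is not the code used for Fig. 1: the function names, the random seed, the much smaller number of simulations, and the values of r and rf in the usage note are assumptions chosen for illustration, since the entries of Table 1 are not reproduced here. RPSSD is computed with the analytical factor 1 + 1/M applied to the climatological RPS, and the forecast spread requires rf < 1.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
# Tercile boundaries of the N(0, 1) climatology.
BOUNDS = (STD_NORMAL.inv_cdf(1 / 3), STD_NORMAL.inv_cdf(2 / 3))


def category(x):
    """Zero-based tercile category of the value x."""
    return sum(x > b for b in BOUNDS)


def rps_from_cumulative(cum_fcst, obs_cat):
    """RPS from the two nontrivial cumulative tercile probabilities."""
    return sum((c - (1.0 if obs_cat <= i else 0.0)) ** 2
               for i, c in enumerate(cum_fcst))


def simulate(r, rf, m, n, seed=0):
    """Estimate (RPSS(inf), RPSS(M), RPSS_D) from n simulated forecasts."""
    rng = random.Random(seed)
    sd_obs = (1 - r * r) ** 0.5
    sd_f = (1 - rf * rf) ** 0.5
    sum_inf = sum_m = sum_cl = 0.0
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)
        obs_cat = category(rng.gauss(r * s, sd_obs))
        # Infinite ensemble: exact cumulative probabilities of N(rf*s, 1 - rf^2).
        cum_inf = [STD_NORMAL.cdf((b - rf * s) / sd_f) for b in BOUNDS]
        # Finite ensemble: fraction of the M members at or below each boundary.
        members = [rng.gauss(rf * s, sd_f) for _ in range(m)]
        cum_m = [sum(x <= b for x in members) / m for b in BOUNDS]
        sum_inf += rps_from_cumulative(cum_inf, obs_cat)
        sum_m += rps_from_cumulative(cum_m, obs_cat)
        sum_cl += rps_from_cumulative((1 / 3, 2 / 3), obs_cat)
    rpss_inf = 1 - sum_inf / sum_cl
    rpss_m = 1 - sum_m / sum_cl
    rpss_d = 1 - sum_m / ((1 + 1 / m) * sum_cl)
    return rpss_inf, rpss_m, rpss_d
```

Calling `simulate(0.5, 0.5, 5, 20000)` gives a reliable case in which RPSSD should agree with RPSS(∞) to within sampling noise, whereas a strongly overconfident choice such as `simulate(0.3, 0.9, 5, 20000)` should place RPSSD above RPSS(∞) and RPSS(M) below it.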

4. Summary

The ranked probability skill score measures the reliability and resolution of categorical probability forecasts relative to the climatology forecast (Murphy 1973). When categorical forecast probabilities are estimated from finite ensembles, sampling error negatively impacts RPSS (Kumar et al. 2001; Tippett et al. 2007). Weigel et al. (2007) recently derived an analytical formula for the debiased RPSS, an estimate of the infinite-ensemble RPSS in terms of the finite-ensemble RPSS, based on the resampling strategy of Müller et al. (2005). Here it has been proved that the debiased RPSS is an unbiased estimate of the infinite-ensemble RPSS for reliable forecasts only. Over- or underconfident forecasts introduce dependence of the debiased RPSS on ensemble size. Analysis of the results of Weigel et al. (2007) shows that the debiased RPSS is a multicategory generalization of the Brier skill score result of Richardson (2001).

Acknowledgments

The author thanks Simon Mason, Andreas Weigel, and Tony Barnston for their comments and suggestions. The author is supported by a grant/cooperative agreement from the National Oceanic and Atmospheric Administration (NA05OAR4311004). The views expressed herein are those of the author and do not necessarily reflect the views of NOAA or any of its subagencies.

REFERENCES

  • Ferro, C. A. T., D. S. Richardson, and A. P. Weigel, 2008: On the effect of ensemble size on the discrete and continuous ranked probability scores. Meteor. Appl., 15, 19–25.
  • Hagedorn, R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting – I. Basic concept. Tellus, 57A, 219–233.
  • Kumar, A., A. G. Barnston, and M. P. Hoerling, 2001: Seasonal predictions, probabilistic verifications, and ensemble size. J. Climate, 14, 1671–1676.
  • Müller, W. A., C. Appenzeller, F. J. Doblas-Reyes, and M. A. Liniger, 2005: A debiased ranked probability skill score to evaluate probabilistic ensemble forecasts with small ensemble sizes. J. Climate, 18, 1513–1523.
  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600.
  • Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quart. J. Roy. Meteor. Soc., 127, 2473–2489.
  • Sardeshmukh, P. D., G. P. Compo, and C. Penland, 2000: Changes of probability associated with El Niño. J. Climate, 13, 4268–4286.
  • Tippett, M. K., and A. G. Barnston, 2008: Skill of multimodel ENSO probability forecasts. Mon. Wea. Rev., in press.
  • Tippett, M. K., A. G. Barnston, and A. W. Robertson, 2007: Estimation of seasonal precipitation tercile-based categorical probabilities from ensembles. J. Climate, 20, 2210–2228.
  • Weigel, A. P., M. A. Liniger, and C. Appenzeller, 2007: The discrete Brier and ranked probability skill scores. Mon. Wea. Rev., 135, 118–124.

Fig. 1.

RPSS(∞) (thick line), RPSS(M) (dashed line), and RPSSD (thin line) plotted as a function of ensemble size M for the cases listed in Table 1.

Citation: Monthly Weather Review 136, 9; 10.1175/2008MWR2594.1

Table 1.

Values of r and rf used in the numerical experiments.
