A Debiased Ranked Probability Skill Score to Evaluate Probabilistic Ensemble Forecasts with Small Ensemble Sizes

  • 1 Swiss Federal Office of Meteorology and Climatology (MeteoSwiss), Zürich, Switzerland
  • 2 European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom
  • 3 Swiss Federal Office of Meteorology and Climatology (MeteoSwiss), Zürich, Switzerland

Abstract

The ranked probability skill score (RPSS) is a widely used measure of the skill of ensemble forecasts. The underlying score is defined by the quadratic norm and is comparable to the mean squared error (MSE), but it is applied in probability space. It is sensitive to both the shape and the shift of the predicted probability distributions. However, as recently shown, the RPSS has a negative bias for ensemble systems with small ensemble sizes. Here, two strategies are explored to address this flaw of the RPSS. First, the RPSS is examined for different norms L (RPSS_L). It is shown that RPSS_{L=1}, which is based on the absolute rather than the squared difference between the forecast and observed cumulative probability distributions, is unbiased, whereas RPSS_L defined with higher-order norms shows a negative bias. However, RPSS_{L=1} is not strictly proper in a statistical sense. A second approach is then investigated, which is based on the quadratic norm but accounts for sampling errors in the climatological probabilities of the reference forecasts. This technique is based on strictly proper scores and results in an unbiased skill score, hereafter denoted the debiased ranked probability skill score (RPSS_D). Both newly defined skill scores are independent of the ensemble size, whereas the associated confidence intervals are a function of the ensemble size and the number of forecasts.

RPSS_{L=1} and RPSS_D are then applied to the winter mean [December–January–February (DJF)] near-surface temperature predictions of the ECMWF Seasonal Forecast System 2. The overall structures of RPSS_{L=1} and RPSS_D are more consistent and largely independent of the ensemble size, unlike those of RPSS_{L=2}. Furthermore, the new skill scores are used to determine the minimum ensemble size required to predict a climate anomaly given a known signal-to-noise ratio. For a hypothetical setup comparable to the ECMWF hindcast system (40 members and 15 hindcast years), statistically significant skill scores were found only for signal-to-noise ratios larger than ∼0.3.
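
The two strategies described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' implementation: the function names (`rps`, `ensemble_probs`, `skill_scores`), the three equiprobable categories, and the no-skill setup in which both the observations and the ensemble members are drawn from climatology are all assumptions made for illustration. The "resampled reference" here stands in for the idea of including climatological sampling errors in the reference forecast, by scoring a reference ensemble of the same size drawn at random from climatology.

```python
import random


def rps(forecast_probs, obs_category, norm=2):
    """Ranked probability score using |difference|**norm on cumulative probabilities."""
    cum_forecast = 0.0
    score = 0.0
    for k, p in enumerate(forecast_probs):
        cum_forecast += p
        cum_obs = 1.0 if k >= obs_category else 0.0  # step function at the observed category
        score += abs(cum_forecast - cum_obs) ** norm
    return score


def ensemble_probs(members, n_cat):
    """Category probabilities estimated as relative frequencies of ensemble members."""
    counts = [0] * n_cat
    for m in members:
        counts[m] += 1
    return [c / len(members) for c in counts]


def skill_scores(n_members, n_cat=3, n_forecasts=20000, norm=2, seed=0):
    """Return (standard skill score, skill score against a resampled reference).

    Assumed toy setup: no real skill, equiprobable categories, and every
    ensemble member is an independent draw from climatology.
    """
    rng = random.Random(seed)
    clim = [1.0 / n_cat] * n_cat
    sum_fc = sum_clim = sum_ref = 0.0
    for _ in range(n_forecasts):
        obs = rng.randrange(n_cat)
        fc = ensemble_probs([rng.randrange(n_cat) for _ in range(n_members)], n_cat)
        ref = ensemble_probs([rng.randrange(n_cat) for _ in range(n_members)], n_cat)
        sum_fc += rps(fc, obs, norm)
        sum_clim += rps(clim, obs, norm)   # exact climatological reference
        sum_ref += rps(ref, obs, norm)     # reference with the same sampling error
    return 1.0 - sum_fc / sum_clim, 1.0 - sum_fc / sum_ref


if __name__ == "__main__":
    for norm in (1, 2):
        standard, resampled = skill_scores(n_members=5, norm=norm)
        print(f"L={norm}: standard={standard:+.3f}, resampled reference={resampled:+.3f}")
```

Under these assumptions, the quadratic-norm score against exact climatology should come out clearly negative for a five-member ensemble (roughly -1/M for M members), while the L=1 score and the score against the resampled reference should both stay close to zero, which is the behaviour the abstract describes.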

Corresponding author address: Dr. W. A. Müller, Kirchgasse 49, D-79291 Merdingen, Germany. Email: climate@gmx.de
