Sampling Uncertainty and Confidence Intervals for the Brier Score and Brier Skill Score

A. Allen Bradley, IIHR-Hydroscience and Engineering, The University of Iowa, Iowa City, Iowa

Stuart S. Schwartz, Center for Urban Environmental Research and Education, University of Maryland, Baltimore County, Baltimore, Maryland

Tempei Hashino, Department of Atmospheric and Ocean Sciences, University of Wisconsin–Madison, Madison, Wisconsin


Abstract

For probability forecasts, the Brier score and Brier skill score are commonly used verification measures of forecast accuracy and skill. Analytical expressions for their sampling uncertainties are derived from sampling theory. The Brier score is an unbiased estimator of forecast accuracy, and an exact expression defines its sampling variance. The Brier skill score (with climatology as the reference forecast) is a biased estimator, and approximations are needed to estimate its bias and sampling variance. The uncertainty estimators depend only on the moments of the forecasts and observations, so they are easy to compute routinely alongside the Brier score and skill score. The resulting uncertainty estimates can be used to construct error bars or confidence intervals for the verification measures, or to perform hypothesis tests.
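As an illustration, the following is a minimal sketch of the quantities involved, not the paper's derived estimators. It assumes independent, identically distributed forecast–observation pairs, probability forecasts in [0, 1], and binary (0/1) observations, and it uses only the generic sampling result for a sample mean: since the Brier score is the mean squared error over N pairs, its sampling variance is Var((f − x)²)/N. The skill score uses the sample climatological frequency as the reference forecast. The function name `brier_summary` and its `level` parameter are hypothetical, and the sketch does not attempt the bias and variance approximations for the skill score that the paper derives.

```python
import math
from statistics import NormalDist


def brier_summary(forecasts, observations, level=0.95):
    """Brier score, its standard error, a normal-approximation confidence
    interval, and the Brier skill score against sample climatology.

    Assumes forecasts are probabilities in [0, 1], observations are 0/1,
    and forecast-observation pairs are independent and identically distributed.
    """
    n = len(forecasts)
    if n != len(observations) or n < 2:
        raise ValueError("need at least two paired forecasts and observations")

    # Squared error of each forecast-observation pair; BS is their sample mean.
    sq_err = [(f - x) ** 2 for f, x in zip(forecasts, observations)]
    bs = sum(sq_err) / n

    # BS is a sample mean, so Var(BS) = Var(squared error) / n.
    # Estimate Var(squared error) with the unbiased sample variance.
    s2 = sum((d - bs) ** 2 for d in sq_err) / (n - 1)
    se_bs = math.sqrt(s2 / n)

    # Two-sided normal-approximation interval for the Brier score.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (bs - z * se_bs, bs + z * se_bs)

    # Reference forecast: the sample climatological frequency xbar. For binary
    # observations its Brier score reduces to xbar * (1 - xbar).
    xbar = sum(observations) / n
    bs_clim = xbar * (1 - xbar)
    bss = 1 - bs / bs_clim if bs_clim > 0 else float("nan")

    return {"bs": bs, "se_bs": se_bs, "ci": ci, "bss": bss}
```

For example, `brier_summary([0.9, 0.1, 0.8, 0.3], [1, 0, 1, 0])` gives a Brier score of about 0.0375 and a skill score of about 0.85. With so few pairs the normal-approximation interval is only indicative; for small samples or low scores it can even extend below zero.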

Monte Carlo experiments with synthetic forecasting examples illustrate how the expressions perform. In general, the estimates provide very reliable information on uncertainty. However, the quality of an estimate depends on both the sample size and the occurrence frequency of the forecast event. The examples also illustrate that, for infrequently occurring events, sampling uncertainties are large enough that verification samples of a few hundred forecast–observation pairs are needed to establish that a forecast is skillful.
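A Monte Carlo check of this kind can be mimicked in a few lines. The setup here is hypothetical and is not the authors' experimental design: forecasts are drawn from a Beta(alpha, beta) distribution and each event occurs with the forecast probability, so the forecasts are perfectly reliable and the event's occurrence frequency is alpha/(alpha + beta). For each synthetic sample the sketch estimates the standard error of the Brier score with the generic sample-mean formula, then compares the average estimate with the actual spread of Brier scores across samples. The function names `bs_and_se` and `monte_carlo_check` are illustrative.

```python
import math
import random
import statistics


def bs_and_se(forecasts, observations):
    """Brier score and its estimated standard error (sample-mean formula)."""
    sq_err = [(f - x) ** 2 for f, x in zip(forecasts, observations)]
    n = len(sq_err)
    bs = sum(sq_err) / n
    return bs, statistics.stdev(sq_err) / math.sqrt(n)


def monte_carlo_check(n_pairs, alpha, beta, n_trials=2000, seed=1):
    """Return (empirical std of BS across trials, mean estimated SE).

    Hypothetical setup: forecasts f ~ Beta(alpha, beta), and each event occurs
    with probability f, so the forecasts are perfectly reliable. The mean
    forecast alpha / (alpha + beta) sets the event's occurrence frequency.
    Requires n_pairs >= 2 and n_trials >= 2.
    """
    rng = random.Random(seed)
    scores, ses = [], []
    for _ in range(n_trials):
        f = [rng.betavariate(alpha, beta) for _ in range(n_pairs)]
        x = [1 if rng.random() < p else 0 for p in f]
        bs, se = bs_and_se(f, x)
        scores.append(bs)
        ses.append(se)
    return statistics.stdev(scores), statistics.mean(ses)
```

A ratio `est / emp` near 1, where `emp, est = monte_carlo_check(...)`, indicates that the standard-error estimate tracks the true sampling variability. Rerunning with a small `n_pairs` and a rare event (for example `alpha=1, beta=19`, an occurrence frequency of 0.05) is one way to explore how sample size and event frequency affect the quality of the estimate.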

Corresponding author address: A. Allen Bradley, IIHR-Hydroscience and Engineering, The University of Iowa, 107 C. Maxwell Stanley Hydraulics Laboratory, Iowa City, IA 52242. Email: allen-bradley@uiowa.edu
