A User-Focused Approach to Evaluating Probabilistic and Categorical Forecasts

Nicholas Loveday, Bureau of Meteorology, Melbourne, Victoria, Australia (https://orcid.org/0009-0000-5796-7069)
Robert Taggart, Bureau of Meteorology, Sydney, New South Wales, Australia
Mohammadreza Khanarmuei, Bureau of Meteorology, Brisbane, Queensland, Australia

Abstract

A user-focused verification approach for evaluating probability forecasts of binary outcomes (also known as probabilistic classifiers) is demonstrated that (i) is based on proper scoring rules, (ii) focuses on user decision thresholds, and (iii) provides actionable insights. It is argued that when categorical performance diagrams and the critical success index are used to evaluate overall predictive performance, rather than the discrimination ability of probabilistic forecasts, they may produce misleading results. Instead, Murphy diagrams are shown to provide a better understanding of the overall predictive performance as a function of user probabilistic decision threshold. We illustrate how to select a proper scoring rule, based on the relative importance of different user decision thresholds, and how this choice impacts scores of overall predictive performance and supporting measures of discrimination and calibration. These approaches and ideas are demonstrated using several probabilistic thunderstorm forecast systems as well as synthetic forecast data. Furthermore, a fair method for comparing the performance of probabilistic and categorical forecasts is illustrated using the fixed risk multicategorical (FIRM) score, which is a proper scoring rule directly connected to values on the Murphy diagram. While the methods are illustrated using thunderstorm forecasts, they are applicable for evaluating probabilistic forecasts for any situation with binary outcomes.
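
To make the threshold-based view concrete, the following minimal Python sketch (not code from the paper; the synthetic data, function name, and the false alarm/miss penalty convention are assumptions) computes the mean elementary score at each probabilistic decision threshold, which is the quantity a Murphy diagram plots, and checks that integrating those scores over all thresholds recovers the Brier score.

```python
# Minimal sketch: the mean elementary score that a Murphy diagram plots,
# for probability forecasts of binary outcomes (illustrative data and names).
import numpy as np

def elementary_score(prob, obs, theta):
    """Mean elementary score at probabilistic decision threshold theta.

    Convention assumed here: a user acts when the forecast probability
    exceeds theta, paying theta for a false alarm and (1 - theta) for a miss.
    """
    act = prob > theta
    false_alarm = act & (obs == 0)
    miss = (~act) & (obs == 1)
    return np.mean(theta * false_alarm + (1 - theta) * miss)

rng = np.random.default_rng(0)
true_prob = rng.uniform(0, 1, 10_000)      # synthetic event probabilities
obs = rng.binomial(1, true_prob)           # synthetic binary outcomes
calibrated = true_prob                     # a well-calibrated probability forecast
hedged = np.clip(true_prob, 0.2, 0.8)      # a forecast "hedged" toward mid-range values

thetas = np.linspace(0.01, 0.99, 99)
for name, fcst in [("calibrated", calibrated), ("hedged", hedged)]:
    curve = [elementary_score(fcst, obs, t) for t in thetas]
    # The Murphy diagram is the plot of `curve` against `thetas`; integrating
    # the curve over all thresholds (times 2) approximately recovers the Brier score.
    print(name,
          "Brier via curve:", round(2 * np.trapz(curve, thetas), 4),
          "Brier direct:", round(np.mean((fcst - obs) ** 2), 4))
```

In this toy setup the hedged forecast is penalized only at the extreme decision thresholds that its clipping affects, while discrimination-based summaries change little, which is the kind of threshold-specific insight the Murphy diagram is meant to expose.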

Significance Statement

Recently, several papers have presented verification results for probabilistic forecasts using so-called categorical performance diagrams, which summarize multiple verification metrics. While categorical performance diagrams measure discrimination ability, we demonstrate how they can potentially lead to incorrect conclusions when evaluating overall predictive performance of probabilistic forecasts. By reviewing recent advances in the statistical literature, we show a comprehensive approach for the meteorological community that (i) does not reward a forecaster who “hedges” their forecast, (ii) focuses on the importance of the forecast user’s decision threshold(s), and (iii) provides actionable insights. Additionally, we present an approach for fairly comparing the skill of categorical forecasts to probabilistic forecasts.
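
As a hedged illustration of the comparison idea, the sketch below scores a warn/no-warn product and a probabilistic product with the same fixed-risk penalties at a single user decision threshold (a binary, single-category special case; the function name, data, and penalty convention are illustrative assumptions rather than the paper's implementation).

```python
# Hedged sketch: comparing a categorical (warn / no-warn) product with a
# probabilistic product at one user decision threshold alpha, using the same
# fixed-risk penalties for both (binary special case; illustrative only).
import numpy as np

def fixed_risk_score(warn, obs, alpha):
    """Mean penalty: alpha per false alarm and (1 - alpha) per miss.

    Warning exactly when the event probability is at least alpha minimizes
    this in expectation, so the score does not reward hedging by either product.
    """
    false_alarm = warn & (obs == 0)
    miss = (~warn) & (obs == 1)
    return np.mean(alpha * false_alarm + (1 - alpha) * miss)

rng = np.random.default_rng(1)
true_prob = rng.uniform(0, 1, 10_000)      # synthetic event probabilities
obs = rng.binomial(1, true_prob)           # synthetic binary outcomes

alpha = 0.3                                # the user's probabilistic decision threshold
prob_product = true_prob                   # probabilistic forecast product
categorical_product = true_prob > 0.5      # warn/no-warn product issued at a fixed 50%

score_prob = fixed_risk_score(prob_product >= alpha, obs, alpha)
score_cat = fixed_risk_score(categorical_product, obs, alpha)
print(f"probabilistic: {score_prob:.4f}  categorical: {score_cat:.4f}")
# The probabilistic product fares better here because its implied warning is
# tailored to alpha = 0.3, while the categorical product was fixed at 50%.
```

Varying alpha repeats the comparison across the range of user decision thresholds, in the same way the Murphy diagram does for the probabilistic product alone.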

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Nicholas Loveday, nicholas.loveday@bom.gov.au
