Sample Stratification in Verification of Ensemble Forecasts of Continuous Scalar Variables: Potential Benefits and Pitfalls

Joseph Bellier, Université Grenoble Alpes, Grenoble INP, CNRS, IRD, IGE, Grenoble, France

Isabella Zin, Université Grenoble Alpes, Grenoble INP, CNRS, IRD, IGE, Grenoble, France

Guillaume Bontron, Compagnie Nationale du Rhône, Lyon, France


Abstract

In the verification field, stratification is the process of dividing the sample of forecast–observation pairs into quasi-homogeneous subsets in order to learn more about how forecasts behave under specific conditions. A general framework for stratification is presented for the case of ensemble forecasts of continuous scalar variables. A distinction is made between forecast-based, observation-based, and external-based stratification, depending on the criterion on which the sample is stratified. The formalism is applied to two widely used verification measures: the continuous ranked probability score (CRPS) and the rank histogram. For both, new graphical representations that synthesize the added information are proposed. Based on the definition of calibration, it is shown that the rank histogram should be used within a forecast-based stratification, while an observation-based stratification leads to significantly nonflat histograms for calibrated forecasts. Nevertheless, as previous studies have warned, statistical artifacts created by a forecast-based stratification may still occur, so a graphical test to detect them is suggested. To illustrate potential insights about forecast behavior that can be gained from stratification, a numerical example with two different datasets of mean areal precipitation forecasts is presented.

© 2017 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Isabella Zin, isabella.zin@univ-grenoble-alpes.fr
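
The contrast between forecast-based and observation-based stratification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not reproduce their graphical representations. Everything in it is an assumption made for illustration: the synthetic Gaussian data, the tercile strata, the helper names (`crps_ensemble`, `rank_counts`, `stratify`), and the use of the true predictive mean `mu` as the forecast-based criterion, which is only available because the data are simulated.

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Empirical CRPS of each ensemble: E|X - y| - 0.5 * E|X - X'|."""
    spread_to_obs = np.mean(np.abs(ens - obs[:, None]), axis=1)
    pairwise = np.abs(ens[:, :, None] - ens[:, None, :])
    return spread_to_obs - 0.5 * np.mean(pairwise, axis=(1, 2))

def rank_counts(ens, obs):
    """Counts of the observation's rank (0..M) among the M members.
    Ties are ignored, which is only safe for continuous data."""
    ranks = np.sum(ens < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

def stratify(criterion, n_strata):
    """Label each case with the quantile class of `criterion`."""
    probs = np.linspace(0.0, 1.0, n_strata + 1)[1:-1]
    edges = np.quantile(criterion, probs)
    return np.searchsorted(edges, criterion, side="right")

# Synthetic calibrated system (assumption): members and observation are
# drawn from the same predictive distribution N(mu, 1).
rng = np.random.default_rng(0)
n_cases, n_members, n_strata = 6000, 10, 3
mu = rng.normal(size=n_cases)
ens = mu[:, None] + rng.normal(size=(n_cases, n_members))
obs = mu + rng.normal(size=n_cases)

crps = crps_ensemble(ens, obs)
by_fcst = stratify(mu, n_strata)   # forecast-based criterion
by_obs = stratify(obs, n_strata)   # observation-based criterion

mean_crps, deviation = {}, {}
for name, labels in [("forecast-based", by_fcst),
                     ("observation-based", by_obs)]:
    mean_crps[name], deviation[name] = [], []
    for k in range(n_strata):
        sel = labels == k
        freq = rank_counts(ens[sel], obs[sel]) / sel.sum()
        # Largest departure of a rank frequency from the flat value 1/(M+1).
        dev = np.max(np.abs(freq - 1.0 / (n_members + 1)))
        mean_crps[name].append(crps[sel].mean())
        deviation[name].append(dev)
        print(f"{name:18s} stratum {k}: "
              f"mean CRPS = {crps[sel].mean():.3f}, "
              f"max rank-frequency deviation = {dev:.3f}")
```

With these synthetic data, the rank histograms should stay close to flat in every forecast-based stratum, while the observation-based strata should depart from flatness (the upper stratum concentrates in the highest ranks) even though the system is calibrated by construction. Replacing `mu` with the sample ensemble mean `ens.mean(axis=1)` is one way to explore whether a forecast-based criterion computed from a finite ensemble introduces artifacts of the kind the abstract mentions.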
