• Aspelien, T., T. Iversen, J. B. Bremnes, and I.-L. Frogner, 2011: Short-range probabilistic forecasts from the Norwegian limited-area EPS: Long-term validation and a polar low study. Tellus, 63A, 564–584, doi:10.1111/j.1600-0870.2010.00502.x.

• Bengtsson, L., and H. Körnich, 2016: Impact of a stochastic parameterization of cumulus convection, using cellular automata, in a mesoscale ensemble prediction system. Quart. J. Roy. Meteor. Soc., 142, 1150–1159, doi:10.1002/qj.2720.

• Berner, J., K. R. Fossell, S.-Y. Ha, J. P. Hacker, and C. Snyder, 2015: Increasing the skill of probabilistic forecasts: Understanding performance improvements from model-error representations. Mon. Wea. Rev., 143, 1295–1320, doi:10.1175/MWR-D-14-00091.1.

• Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436, doi:10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.

• Bouttier, F., L. Raynaud, O. Nuissier, and B. Ménétrier, 2015: Sensitivity of the AROME ensemble to initial and surface perturbations during HyMeX. Quart. J. Roy. Meteor. Soc., 142, 390–403, doi:10.1002/qj.2622.

• Bowler, N. E., A. Arribas, K. R. Mylne, K. B. Robertson, and S. E. Beare, 2008: The MOGREPS short-range ensemble prediction system. Quart. J. Roy. Meteor. Soc., 134, 703–722, doi:10.1002/qj.234.

• Buizza, R., and T. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. J. Atmos. Sci., 52, 1434–1456, doi:10.1175/1520-0469(1995)052<1434:TSVSOT>2.0.CO;2.

• Buizza, R., M. Miller, and T. Palmer, 1999: Stochastic representation of model uncertainties in the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 125, 2887–2908, doi:10.1002/qj.49712556006.

• Buizza, R., M. Leutbecher, and L. Isaksen, 2008: Potential use of an ensemble of analyses in the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 134, 2051–2066, doi:10.1002/qj.346.

• Candille, G., C. Côté, P. L. Houtekamer, and G. Pellerin, 2007: Verification of an ensemble prediction system against observations. Mon. Wea. Rev., 135, 2688–2699, doi:10.1175/MWR3414.1.

• Christensen, H. M., I. M. Moroz, and T. N. Palmer, 2015: Stochastic and perturbed parameter representations of model uncertainty in convection parameterization. J. Atmos. Sci., 72, 2525–2544, doi:10.1175/JAS-D-14-0250.1.

• Craig, G. C., and B. G. Cohen, 2006: Fluctuations in an equilibrium convective ensemble. Part I: Theoretical formulation. J. Atmos. Sci., 63, 1996–2004, doi:10.1175/JAS3709.1.

• Delle Monache, L., T. Nipen, Y. Liu, G. Roux, and R. Stull, 2011: Kalman filter and analog schemes to postprocess numerical weather predictions. Mon. Wea. Rev., 139, 3554–3570, doi:10.1175/2011MWR3653.1.

• Du, J., and M. Tracton, 2001: Implementation of a real-time short range ensemble forecasting system at NCEP: An update. Preprints, Ninth Conf. on Mesoscale Processes, Fort Lauderdale, FL, Amer. Meteor. Soc., P4.9. [Available online at https://ams.confex.com/ams/pdfpapers/23074.pdf.]

• Eckel, F. A., and C. F. Mass, 2005: Aspects of effective mesoscale, short-range ensemble forecasting. Wea. Forecasting, 20, 328–350, doi:10.1175/WAF843.1.

• Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367, doi:10.1007/s10236-003-0036-9.

• Fraley, C., A. E. Raftery, and T. Gneiting, 2010: Calibrating multimodel forecast ensembles with exchangeable and missing members using Bayesian model averaging. Mon. Wea. Rev., 138, 190–202, doi:10.1175/2009MWR3046.1.

• Frogner, I.-L., and T. Iversen, 2002: High-resolution limited-area ensemble predictions based on low resolution targeted singular vectors. Quart. J. Roy. Meteor. Soc., 128, 1321–1341, doi:10.1256/003590002320373319.

• Frogner, I.-L., H. Haakenstad, and T. Iversen, 2006: Limited-area ensemble predictions at the Norwegian Meteorological Institute. Quart. J. Roy. Meteor. Soc., 132, 2785–2808, doi:10.1256/qj.04.178.

• García-Moya, J.-A., A. Callado, P. Escribà, C. Santos, D. Santos-Muñoz, and J. Simarro, 2011: Predictability of short-range forecasting: A multimodel approach. Tellus, 63A, 550–563, doi:10.1111/j.1600-0870.2010.00506.x.

• Gebhardt, C., S. Theis, M. Paulat, and Z. Ben Bouallègue, 2011: Uncertainties in COSMO-DE precipitation forecasts introduced by model perturbations and variation of lateral boundaries. Atmos. Res., 100, 168–177, doi:10.1016/j.atmosres.2010.12.008.

• Gerard, L., and J.-F. Geleyn, 2005: Evolution of a subgrid deep convection parameterization in a limited-area model with increasing resolution. Quart. J. Roy. Meteor. Soc., 131, 2293–2312, doi:10.1256/qj.04.72.

• Gerard, L., J.-M. Piriou, R. Brožková, J.-F. Geleyn, and D. Banciu, 2009: Cloud and precipitation parameterization in a meso-gamma-scale operational weather prediction model. Mon. Wea. Rev., 137, 3960–3977, doi:10.1175/2009MWR2750.1.

• Gneiting, T., A. E. Raftery, A. H. Westveld III, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, doi:10.1175/MWR2904.1.

• Grimit, E. P., and C. F. Mass, 2002: Initial results of a mesoscale short-range ensemble forecasting system over the Pacific Northwest. Wea. Forecasting, 17, 192–205, doi:10.1175/1520-0434(2002)017<0192:IROAMS>2.0.CO;2.

• Hagedorn, R., R. Buizza, T. Hamill, M. Leutbecher, and T. Palmer, 2012: Comparing TIGGE multimodel forecasts with reforecast-calibrated ECMWF ensemble forecasts. Quart. J. Roy. Meteor. Soc., 138, 1814–1827, doi:10.1002/qj.1895.

• Holton, J. R., 2004: An Introduction to Dynamic Meteorology. 4th ed. Elsevier Science, 535 pp.

• Homleid, M., and F. T. Tveter, 2016: Verification of operational weather prediction models September to November 2015. METInfo Rep. 16/2016, Norwegian Meteorological Institute, 74 pp. [Available online at http://met.no/Forskning/Publikasjoner/MET_info/filestore/Verification_report_201509-2015111.pdf.]

• Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126, doi:10.1016/j.physd.2006.11.008.

• Isaksen, L., M. Bonavita, R. Buizza, M. Fisher, J. Haseler, M. Leutbecher, and L. Raynaud, 2010: Ensemble of data assimilations at ECMWF. ECMWF Tech. Memo. 636, 45 pp. [Available online at http://www.ecmwf.int/sites/default/files/elibrary/2010/10125-ensemble-data-assimilations-ecmwf.pdf.]

• Iversen, T., A. Deckmyn, C. Santos, K. Sattler, J. B. Bremnes, H. Feddersen, and I.-L. Frogner, 2011: Evaluation of ‘GLAMEPS’—A proposed multimodel EPS for short range forecasting. Tellus, 63A, 513–530, doi:10.1111/j.1600-0870.2010.00507.x.

• Johnson, C., and R. Swinbank, 2009: Medium-range multimodel ensemble combination and calibration. Quart. J. Roy. Meteor. Soc., 135, 777–794, doi:10.1002/qj.383.

• Junk, C., S. Späth, L. von Bremen, and L. Delle Monache, 2015: Comparison and combination of regional and global ensemble prediction systems for probabilistic predictions of hub-height wind speed. Wea. Forecasting, 30, 1234–1253, doi:10.1175/WAF-D-15-0021.1.

• Leutbecher, M., and T. Palmer, 2008: Ensemble forecasting. J. Comput. Phys., 227, 3515–3539, doi:10.1016/j.jcp.2007.02.014.

• Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, doi:10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.

• Marsigli, C., F. Boccanera, A. Montani, and T. Paccagnella, 2005: The COSMO-LEPS mesoscale ensemble system: Validation of the methodology and verification. Nonlinear Processes Geophys., 12, 527–536, doi:10.5194/npg-12-527-2005.

• Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? Bull. Amer. Meteor. Soc., 83, 407–430, doi:10.1175/1520-0477(2002)083<0407:DIHRPM>2.3.CO;2.

• Masson, V., and Y. Seity, 2009: Including atmospheric layers in vegetation and urban offline surface schemes. J. Appl. Meteor. Climatol., 48, 1377–1397, doi:10.1175/2009JAMC1866.1.

• Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119, doi:10.1002/qj.49712252905.

• Montani, A., D. Cesari, C. Marsigli, and T. Paccagnella, 2011: Seven years of activity in the field of mesoscale ensemble forecasting by the COSMO-LEPS system: Main achievements and open challenges. Tellus, 63A, 605–624, doi:10.1111/j.1600-0870.2010.00499.x.

• Murphy, A., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600, doi:10.1175/1520-0450(1973)012<0595:ANVPOT>2.0.CO;2.

• Niemelä, S., S. Näsman, and P. Nurmi, 2014: FROST-2014—Performance of Harmonie 1 km during Sochi Olympics. ALADIN-HIRLAM Newsletter, No. 3, 79–86. [Available online at http://hirlam.org/index.php/hirlam-documentation/cat_view/77-hirlam-official-publications/285-aladin-hirlam-newsletters.]

• Nipen, T., G. West, and R. Stull, 2011: Updating short-term probabilistic weather forecasts of continuous variables using recent observations. Wea. Forecasting, 26, 564–571, doi:10.1175/WAF-D-11-00022.1.

• Palmer, T., 1993: Extended-range atmospheric prediction and the Lorenz model. Bull. Amer. Meteor. Soc., 74, 49–65, doi:10.1175/1520-0477(1993)074<0049:ERAPAT>2.0.CO;2.

• Park, Y.-Y., R. Buizza, and M. Leutbecher, 2008: TIGGE: Preliminary results on comparing and combining ensembles. Quart. J. Roy. Meteor. Soc., 134, 2029–2050, doi:10.1002/qj.334.

• Peralta, C., Z. Ben Bouallègue, S. Theis, C. Gebhardt, and M. Buchhold, 2012: Accounting for initial condition uncertainties in COSMO-DE-EPS. J. Geophys. Res., 117, D07108, doi:10.1029/2011JD016581.

• Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, doi:10.1175/MWR2906.1.

• Romine, G. S., C. S. Schwartz, J. Berner, K. R. Fossell, C. Snyder, J. L. Anderson, and M. L. Weisman, 2014: Representing forecast error in a convection-permitting ensemble system. Mon. Wea. Rev., 142, 4519–4541, doi:10.1175/MWR-D-14-00100.1.

• Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci., 28, 616–640, doi:10.1214/13-STS443.

• Seity, Y., P. Brousseau, S. Malardel, G. Hello, P. Bénard, F. Bouttier, C. Lac, and V. Masson, 2011: The AROME-France convective-scale operational model. Mon. Wea. Rev., 139, 976–991, doi:10.1175/2010MWR3425.1.

• Shutts, G. J., 2005: A kinetic energy backscatter algorithm for use in ensemble prediction systems. Quart. J. Roy. Meteor. Soc., 131, 3079–3102, doi:10.1256/qj.04.106.

• Sloughter, J. M., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220, doi:10.1175/MWR3441.1.

• Thorarinsdottir, T. L., and T. Gneiting, 2010: Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. J. Roy. Stat. Soc., 173A, 371–388, doi:10.1111/j.1467-985X.2009.00616.x.

• Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 2317–2330, doi:10.1175/1520-0477(1993)074<2317:EFANTG>2.0.CO;2.

• Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319, doi:10.1175/1520-0493(1997)125<3297:EFANAT>2.0.CO;2.

• Wang, Y., and Coauthors, 2011: The Central European limited-area ensemble forecasting system: ALADIN-LAEF. Quart. J. Roy. Meteor. Soc., 137, 483–502, doi:10.1002/qj.751.

• Wang, Y., S. Tascu, F. Weidle, and K. Schmeisser, 2012: Evaluation of the added value of regional ensemble forecasts on global ensemble forecasts. Wea. Forecasting, 27, 972–987, doi:10.1175/WAF-D-11-00102.1.

• Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences: An Introduction. 2nd ed. Elsevier Science, 627 pp.

• Yang, X., 2008: Status of the reference system. HIRLAM Newsletter, No. 55, 202–203. [Available online at http://hirlam.org/index.php/component/docman/doc_view/152-hirlam-newsletter-no-54-paper-27-yang?Itemid=70.]


Ensemble Prediction with Different Spatial Resolutions for the 2014 Sochi Winter Olympic Games: The Effects of Calibration and Multimodel Approaches

Norwegian Meteorological Institute, Oslo, Norway

Abstract

Three ensemble prediction systems (EPSs) with different grid spacings are compared and evaluated with respect to their ability to predict wintertime weather in complex terrain. The experiment period was two-and-a-half winter months in 2014, coinciding with the Forecast and Research in the Olympic Sochi Testbed (FROST) project, which took place during the Winter Olympic Games in Sochi, Russia. The global, synoptic-scale ensemble system used is the IFS ENS from the European Centre for Medium-Range Weather Forecasts (ECMWF), and its performance is compared with both the operational pan-European Grand Limited Area Ensemble Prediction System (GLAMEPS) at 11-km horizontal resolution and the experimental regional convection-permitting HIRLAM–ALADIN Regional Mesoscale Operational NWP in Europe (HARMONIE) EPS (HarmonEPS) at 2.5 km. Both GLAMEPS and HarmonEPS are multimodel systems, and it is seen that a large part of the skill in these systems comes from the multimodel approach, as long as all subensembles are performing reasonably. The number of members has less impact on the overall skill measurement. The relative importance of resolution and calibration is also assessed. Statistical calibration was applied and evaluated. In contrast to what is seen for the raw ensembles, the number of members, as well as the number of subensembles, is important for the calibrated ensembles. HarmonEPS shows greater potential than GLAMEPS for predicting wintertime weather, and also has an advantage after calibration.

Denotes Open Access content.

Publisher’s Note: This article was revised on 21 November 2016 to make it available as open access.

Corresponding author address: Inger-Lise Frogner, Norwegian Meteorological Institute, P.O. Box 43, Blindern, N-0313 Oslo, Norway. E-mail: i.l.frogner@met.no


1. Introduction

The atmosphere is inherently chaotic, and the theoretical basis for this was realized by Lorenz in the early 1960s (Lorenz 1963). As a consequence, a single deterministic forecast represents only one of many possible realizations. An ensemble prediction system (EPS) is a tool for assessing the many realizations of the atmosphere. EPSs at different institutes have been operational for many years, and global EPSs became operational about two decades ago (Toth and Kalnay 1993; Molteni et al. 1996). Limited-area ensemble systems have been run operationally for about 10 years (e.g., Du and Tracton 2001; Marsigli et al. 2005; Frogner et al. 2006; Bowler et al. 2008; Iversen et al. 2011; Aspelien et al. 2011; Wang et al. 2011; García-Moya et al. 2011). More recently, ensemble prediction systems have also been introduced at convection-permitting scales (e.g., Gebhardt et al. 2011; Peralta et al. 2012; Romine et al. 2014; Bouttier et al. 2015).

Several methods and techniques exist to account for the inherent uncertainties in the initial conditions, boundary conditions, surface, and model physics. For initial state perturbations, for example, the singular vector method (Buizza and Palmer 1995), the breeding of growing modes (Toth and Kalnay 1993, 1997), ensemble Kalman filtering (EnKF; Evensen 2003) and its different flavors (e.g., Bishop et al. 2001; Hunt et al. 2007), and ensembles of data assimilations (EDAs; e.g., Buizza et al. 2008) have been tried. Surface perturbations have received more attention with the introduction of EPSs at convection-permitting scales (e.g., Bouttier et al. 2015), and the importance of lateral boundary perturbations was demonstrated in the work of Frogner and Iversen (2002) and more recently in Romine et al. (2014). Stochastic perturbations are one way to account for forecast model uncertainties, and schemes of varying complexity exist. Examples include perturbing the tunable parameters in the parameterizations (Christensen et al. 2015), stochastic model error schemes like the stochastically perturbed physics tendencies scheme (SPPT; Buizza et al. 1999) and energy backscatter schemes [e.g., the stochastic kinetic energy backscatter scheme (SKEBS; Shutts 2005)], and fully stochastic parameterizations (e.g., Craig and Cohen 2006; Bengtsson and Körnich 2016). Another approach involves multiphysics, or multimodels, in which different parameterization schemes within one model (Wang et al. 2011), or different models (Iversen et al. 2011), are used to create the ensemble. Several authors have investigated whether a multimodel ensemble can perform better than a well-calibrated single-model ensemble (Park et al. 2008; Johnson and Swinbank 2009; Fraley et al. 2010; Hagedorn et al. 2012), and they have shown that global multimodel ensembles can outperform the best-calibrated single-model ensemble. The same conclusion holds for limited-area ensembles (Iversen et al. 2011; Berner et al. 2015).

In this paper, three ensemble prediction systems with different spatial resolutions are used for forecasting winter weather in complex terrain. Eckel and Mass (2005), Frogner et al. (2006), Bowler et al. (2008), Montani et al. (2011), and Wang et al. (2012) also compare ensemble prediction systems on different scales, but unlike the present study, their work does not include a system at convection-permitting scales. Junk et al. (2015) compare, as this study does, ensemble prediction systems on three scales, but their focus is solely on 100-m wind speed forecasts. In this study we focus on wintertime weather in complex terrain. Grimit and Mass (2002) showed that slightly different realizations of the synoptic-scale flow over the complex terrain of the Pacific Northwest region of the United States could lead to large variations in mesoscale forecasts, which demonstrates the potential for ensemble prediction in complex topography.

This study is conducted within the context of the Forecast and Research in the Olympic Sochi Testbed (FROST) project, which took place in connection with the XXII Olympic and XI Paralympic Winter Games held in Sochi, Russia, from 7 to 23 February and from 7 to 16 March 2014, respectively. The test period used in this study is longer than the duration of the games, running from 15 January to 31 March 2014. The Olympic sites were separated into two clusters: a coastal cluster along the Black Sea and a mountain cluster located some 45 km from the Black Sea coast at Krasnaya Polyana. This complex topography, stretching from the Black Sea up to mountains with peaks over 2000 m, is a serious challenge for any weather forecast model. However, a global EPS on synoptic scales can better describe the overall weather situation than can a much smaller domain system on convection-permitting scales (Homleid and Tveter 2016). For this area (Fig. 1) and period, an extensive observation program was established, and many deterministic and probabilistic forecasts from different providers were made available to the Olympic forecasters (Kiktev et al. 2016, manuscript submitted to Bull. Amer. Meteor. Soc.). The High Resolution Limited Area Model (HIRLAM) consortium was one contributor to the task of probabilistic forecasting, providing as part of the Forecast Demonstration Projects (FDPs) the Grand Limited Area Ensemble Prediction System (GLAMEPS) at 11-km horizontal resolution, which is a coproduction with Aire Limitée Adaptation Dynamique Developpement International (ALADIN), and as part of the Research Development Projects (RDPs) the HIRLAM–ALADIN Regional Mesoscale Operational NWP in Europe system (HARMONIE) EPS (HarmonEPS) at 2.5-km horizontal resolution.

Fig. 1. (top) GLAMEPS integration area (black) and HarmonEPS integration area (blue). (bottom) Zoom-in of the Sochi area; the observations used for verification are marked with the colored points. The mountain cluster is outlined by the red circle. (From Google maps.)

GLAMEPS and HarmonEPS are compared to the global ensemble prediction system from ECMWF (IFS ENS) at 32 km, and we also calibrate them to assess the relative importance of resolution and calibration. Both GLAMEPS and HarmonEPS are multimodel ensemble systems, and we assess the effect calibration has on multimodel systems. Running an ensemble prediction system is expensive, while calibration is much cheaper in terms of computing cost. We therefore ask whether calibrating a coarser ensemble prediction system can achieve the same level of skill as running a higher-resolution system. Ensembles are often given a probabilistic interpretation. However, when validated against site measurements, weaknesses with respect to reliability are often present. These are due to model biases, spread deficiencies, and model variables not being exactly representative of the local measurements. The latter may be especially evident in complex terrain. The lack of reliability can be overcome by using statistical methods that model the relation between the measurements and the ensemble output. Given a representative training set, these will in theory, if properly fitted, ensure well-calibrated probabilistic forecasts.

The three ensemble systems are described in section 2 together with the calibration method and the experimental setup. Results are presented in section 3, while discussion and our conclusions are offered in section 4.

2. Methodology

In this section, the three ensemble prediction systems are briefly described, together with the calibration method and the verification metrics.

a. IFS ENS

The ensemble prediction system at ECMWF, IFS ENS, became operational in 1992 and has since been continually upgraded, resulting in improved performance (Palmer 1993; Buizza and Palmer 1995; Molteni et al. 1996; Buizza et al. 1999; Leutbecher and Palmer 2008; Isaksen et al. 2010). During FROST, the ensemble consisted of 50 perturbed members and a control forecast and was run operationally twice daily, at 0000 and 1200 UTC, to a lead time of 15 days. The horizontal resolution was T639, corresponding to approximately 32 km, with 91 levels in the vertical. Initial condition uncertainty was estimated by a combination of singular vectors (Buizza and Palmer 1995) and EDA perturbations (Buizza et al. 2008). Model uncertainty was described by two schemes: SPPT (Buizza et al. 1999) and SKEBS (Shutts 2005).

b. GLAMEPS

GLAMEPS is used for operational production as a part of the cooperation between two European consortia, HIRLAM and ALADIN, for short-range NWP. Its purpose is to predict atmospheric features at intermediate spatial scales between the synoptic, covered by leading global EPSs, and convection-permitting scales. Thus, the system has been constructed to be a well-calibrated, pan-European ensemble for short-range NWP that accounts for both initial state and model inaccuracies, as well as lateral boundary uncertainty. During FROST, version 1 (v1) of GLAMEPS, with a horizontal resolution of 11 km, was operational. GLAMEPS is a multimodel ensemble; thus, model uncertainties are taken into account by using a small number of different subensemble models and versions. For GLAMEPS v1 there are a total of 54 members consisting of two versions of the HIRLAM model (Yang 2008), each with 12 perturbed members and one control; one version of the ALADIN and Application de la Recherche à l’Opérationnel à Meso-Echelle combined model (ALARO; Gerard and Geleyn 2005; Gerard et al. 2009), also with 12 perturbed members and one control; 14 perturbed members of IFS ENS; and the high-resolution ECMWF deterministic model as a final member. In this way all 50 + 1 members from IFS ENS are used in GLAMEPS, either directly or indirectly, as boundary and initial condition perturbations. Initial state uncertainties are taken into account in two ways: ensemble perturbations are imported from the global ECMWF 51-member IFS ENS, where the difference between IFS ENS member n (IFS_n) and the IFS ENS control (IFS_c) is added to the control members of GLAMEPS (A_c) to produce the perturbed initial state of member n (IC_n):
IC_n = A_c + (IFS_n − IFS_c),     (1)
where A_c is produced from data assimilation except for the subensemble based on ALARO, which is derived from pure downscaling.
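In code, Eq. (1) is a simple recentering of the IFS ENS perturbations on the limited-area control analysis. The sketch below is a minimal illustration in Python/NumPy with hypothetical array shapes; in the real system, the IFS fields are first interpolated to the limited-area grid and the operation is applied to the full model state.

```python
import numpy as np

# Hypothetical dimensions; the real states are full 3D model fields
# interpolated from the IFS ENS grid to the limited-area grid.
n_members, nlat, nlon = 12, 100, 120
A_c = np.zeros((nlat, nlon))             # LAM control analysis
IFS_c = np.zeros((nlat, nlon))           # IFS ENS control initial state
IFS = np.zeros((n_members, nlat, nlon))  # IFS ENS perturbed members

# Eq. (1): IC_n = A_c + (IFS_n - IFS_c), applied member by member
IC = A_c[None, :, :] + (IFS - IFS_c[None, :, :])
```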

Additional initial state perturbations are included by running two different assimilation cycles in parallel with different models and model versions. IFS ENS also provides perturbations at the lateral boundaries during the prediction period. In contrast to IFS ENS, GLAMEPS runs its main forecasts at 0600 and 1800 UTC (all members to +54 h). At 0000 and 1200 UTC only the controls (which perform upper-air assimilation) are run, up to +6 h. At every cycle, the boundaries (and thus the initial perturbations interpolated from IFS ENS) are typically 6 or 12 h old, since IFS data from the same cycle are not yet available. This means that even though GLAMEPS also performs assimilation at 0000 and 1200 UTC, it never reuses observations already assimilated by the IFS ENS system. All of the LAM members (i.e., the HIRLAM and ALARO members of GLAMEPS) also run with a separate data assimilation cycling for the ground surface, yielding a unique surface analysis for each ensemble member. All subensembles are interpolated to a common grid (the HIRLAM grid). Note that GLAMEPS v1 used in this study has since been replaced by a new version (v2) with a slightly different configuration of subensemble models and increased spatial resolution. The GLAMEPS integration area is presented in Fig. 1.

c. HarmonEPS

HarmonEPS is the ensemble configuration of HARMONIE. This system is intended to give reliable predictions of probabilities for high-impact weather events that are confined in space and time by mesoscale dynamical structures, as well as orographic and other fine-scaled structures. While HarmonEPS in principle could include any model from the HARMONIE system, for the convection-permitting scale (2.5 km) the candidate models (or parameterization schemes) are Application de la Recherche à l’Opérationnel à Meso-Echelle (AROME; Seity et al. 2011) and ALARO (Gerard and Geleyn 2005; Gerard et al. 2009). In AROME, deep convection is assumed to be sufficiently well resolved to not be parameterized, but shallow convection is parameterized. On the other hand, ALARO is designed to be run in the “gray zone,” and thus both shallow and deep convection are parameterized, using a multiscale approach.

During FROST, HarmonEPS was run in real time with a setup consisting of 12 perturbed members and a control, all with AROME physics. However, the period was later rerun with ALARO (version 0) physics, again with 12 perturbed members and a control. Lateral boundary conditions came from the control and the 12 first members of the IFS ENS (at 32-km horizontal resolution), and the same 12 members were used for AROME and ALARO. The AROME and ALARO control members used their own three-dimensional variational data assimilation (3DVAR) methods, while all members used their own separate surface assimilation (without perturbing observations). For the initial conditions of the upper-air atmosphere, the perturbed members also used the control analysis as their basis, but added the boundary perturbations at the initial time (interpolated) from the corresponding member of IFS ENS on top of the control analysis; this is the same method as is applied in GLAMEPS [see Eq. (1)]. Also like GLAMEPS, main forecasts (to +36 h) were run at 0600 and 1800 UTC. At 0000 and 1200 UTC only the control members were run, up to +6 h, to produce a first guess for the next main cycle.

d. Calibration method

Observations available at 31 stations (Fig. 1) within the domain were used to statistically calibrate the model output in order to reduce biases. Calibration was only done for the 31 point locations and not the entire grid. For each of the locations, the raw forecasts were interpolated and corrected based on past observations and forecasts for that point. Each location, forecast lead time, and initialization time was calibrated separately. Calibration aimed at improving the ensemble forecasts through a three-stage approach: 1) removing biases in the central tendency of the ensemble, 2) adjusting the spread of the ensemble to match the uncertainty of the forecasts, and 3) adjusting the distribution by incorporating information from the most recent observation. The stages used were dependent on the forecast variable.

Calibration of temperature employed all three stages. The temperature ensemble was first bias corrected using an adaptive scheme. The systematic bias at time t, B_t, was computed adaptively from the bias estimated on the previous day, B_{t-1}:

B_t = (4/5) B_{t-1} + (1/5) (f_{t-1} − o_{t-1}),     (2)

where f_{t-1} is the ensemble mean and o_{t-1} the corresponding observation. This correction was subtracted from each ensemble member. The scheme weights the most recently measured bias by 1/5 and the previous bias estimate by 4/5. This factor was chosen by trial and error as a compromise between letting the bias estimate adapt to changing biases and filtering out potentially noisy measurements.
The corrected ensemble was then replaced by a Gaussian distribution centered on the ensemble mean, whose variance σ_t^2 was set adaptively to

σ_t^2 = (29/30) σ_{t-1}^2 + (1/30) (f_{t-1} − o_{t-1})^2.     (3)

This means that the spread of the raw ensemble was not used in the calibration; a much more slowly varying factor (29/30) than in Eq. (2) was found to be needed to adapt the variance of the distribution. Finally, the forecasts were updated every hour, as new observations became available, using the approach of Nipen et al. (2011). This method assumes that the percentile of the forecast distribution in which the observation falls is highly correlated in time, and it therefore reduces the error of the ensemble mean and the spread of the forecasts for roughly the first six lead times. The final output ensemble was then regenerated from this distribution by sampling evenly spaced quantiles.
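To make stages 1 and 2 concrete, the following is a minimal Python sketch of the adaptive temperature calibration for one station and lead time. The function name and the NumPy/SciPy implementation are ours, not the operational code, and the hourly stage-3 update of Nipen et al. (2011) is omitted.

```python
import numpy as np
from scipy.stats import norm

def calibrate_temperature(ens, bias_prev, var_prev, f_prev, o_prev, n_out=13):
    """Stages 1-2 for one station/lead time (hypothetical interface).

    ens: raw ensemble members (1D array)
    bias_prev, var_prev: yesterday's bias and variance estimates
    f_prev, o_prev: yesterday's ensemble mean and verifying observation
    """
    # Stage 1, Eq. (2): adaptive bias, weighting the newest error by 1/5
    bias = 4.0 / 5.0 * bias_prev + 1.0 / 5.0 * (f_prev - o_prev)
    ens_corr = ens - bias

    # Stage 2, Eq. (3): slowly adapting variance (factor 29/30); the raw
    # ensemble spread is deliberately not used
    var = 29.0 / 30.0 * var_prev + 1.0 / 30.0 * (f_prev - o_prev) ** 2

    # Regenerate members as evenly spaced quantiles of N(mean, var)
    probs = (np.arange(n_out) + 0.5) / n_out
    members = norm.ppf(probs, loc=ens_corr.mean(), scale=np.sqrt(var))
    return members, bias, var
```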
Calibration of wind speed employed stages 1 and 2. Instead of an additive bias [Eq. (2)], a multiplicative bias scheme was used; that is, each raw ensemble member was multiplied by a factor G_t equal to

G_t = O_t / F_t,     (4)

where O_t and F_t are adaptively updated estimates of the observed and forecast wind speeds:

O_t = (4/5) O_{t-1} + (1/5) o_{t-1},     (5)

F_t = (4/5) F_{t-1} + (1/5) f_{t-1}.     (6)
The same factor was used to scale all ensemble members. This corrected the ensemble’s systematic under- or overprediction of wind speeds. After the correction, the same Gaussian approach as for temperature was used, except that the distribution was truncated at 0 m s−1, in a similar fashion to that of Thorarinsdottir and Gneiting (2010). Stage 3 was not employed since the temporal correlation of verifying quantiles was not strong.
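A corresponding sketch for wind speed combines the multiplicative correction of Eqs. (4)–(6) with a Gaussian truncated at zero, here via scipy.stats.truncnorm; again the function name and interface are hypothetical, and the variance is assumed to be adapted as in Eq. (3).

```python
import numpy as np
from scipy.stats import truncnorm

def calibrate_wind(ens, O_prev, F_prev, o_prev, f_prev, var, n_out=13):
    """Stage 1 (multiplicative bias) and stage 2 (truncated Gaussian).

    O_prev, F_prev: running estimates of observed and forecast wind speed
    o_prev, f_prev: yesterday's observation and ensemble-mean forecast
    var: distribution variance, assumed adapted as in Eq. (3)
    """
    O_t = 4.0 / 5.0 * O_prev + 1.0 / 5.0 * o_prev   # Eq. (5)
    F_t = 4.0 / 5.0 * F_prev + 1.0 / 5.0 * f_prev   # Eq. (6)
    G_t = O_t / F_t                                 # Eq. (4)
    ens_corr = G_t * ens          # same factor scales every member

    # Gaussian centered on the corrected ensemble mean, truncated at 0 m/s
    mu, sigma = ens_corr.mean(), np.sqrt(var)
    a = (0.0 - mu) / sigma        # lower truncation bound, standardized
    probs = (np.arange(n_out) + 0.5) / n_out
    members = truncnorm.ppf(probs, a, np.inf, loc=mu, scale=sigma)
    return members, O_t, F_t
```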

For precipitation, we only used a multiplicative bias scheme (stage 1) and no further calibration.

e. Verification metrics

The relative performances of the ensemble systems are assessed using standard objective verification scores. Ensemble forecasts are compared with in situ observations for the same parameters from surface synoptic observation (SYNOP) stations that are common to all domains, as well as stations that were specially installed as part of the FROST project. In total, 31 observation sites (Fig. 1) were used to compute verification scores. Three-hourly observations of 2-m temperature (T2m), 10-m wind speed (S10m), mean sea level pressure (Pmsl), and 3- and 12-h accumulated precipitation (AccPcp3h and AccPcp12h) are compared with forecasts for the same parameters from each ensemble, horizontally interpolated to the station locations using a four-point bilinear interpolation method. In addition, a simple height correction from model elevation to station elevation, using a moist-adiabatic lapse rate of 6.5 K km−1 (Holton 2004), is applied to T2m for the raw (uncalibrated) ensemble forecasts.
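For concreteness, the height correction amounts to a one-line adjustment; the sketch below (Python, hypothetical variable names) moves a raw T2m forecast from model elevation to station elevation using the 6.5 K km−1 lapse rate.

```python
LAPSE_RATE = 6.5e-3  # K per meter (Holton 2004)

def height_correct_t2m(t2m_model, z_model, z_station):
    """Adjust T2m from model grid elevation to station elevation.

    The forecast warms when the station lies below the model surface
    and cools when it lies above it.
    """
    return t2m_model + LAPSE_RATE * (z_model - z_station)
```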

The following scores, all of which are described in detail by Wilks (2006) and references therein, are used to assess model performance.

  • The continuous ranked probability score (CRPS) summarizes the overall accuracy of the ensemble by comparing a cumulative distribution function computed from all ensemble members with the observed value. The closer the CRPS is to zero, the better the ensemble.
  • The spread–skill relationship compares the spread of an ensemble, computed as the standard deviation of all ensemble members about the ensemble mean, with the skill of the ensemble represented as the RMS error of the ensemble mean compared to the observed value. For a perfect ensemble, the spread should be equal to the skill.
  • The relative operating characteristic (ROC) is a measure of the decision-making skill of an ensemble for a given threshold. ROC curves are constructed by plotting the probability of detection (POD) as a function of the false alarm rate (FAR) for each probability bin. Here, we summarize the ROC over all probabilities for a given threshold by computing the area under the ROC curve. The closer the area under the ROC curve is to 1, the more skillful the ensemble is as a decision-making tool.
  • The Brier score (BS) is a measure of the probabilistic skill of an ensemble for a given threshold, computed as the MSE between the forecast probability and the binary outcome of the observations for that threshold. Here, we decompose the BS following Murphy (1973) into components of reliability (REL), resolution (RES), and uncertainty (UNC), whereby

    BS = REL − RES + UNC.

    Here, REL measures how close the forecast probabilities are to the observed frequencies, RES measures how much the forecast probabilities differ from the climatic average, and UNC is based on the observed frequency of the event and is thus independent of the forecast. The closer BS is to zero, the more probabilistic skill the ensemble has. It therefore follows that for a perfect BS, REL would be zero and RES would equal UNC. (A computational sketch of the CRPS, the spread–skill comparison, and this decomposition follows the list.)
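The sketch below shows, with our own implementation choices, how the CRPS, the spread–skill pair, and the Murphy decomposition can be computed with NumPy. The empirical CRPS estimator E|X − y| − ½E|X − X′| is one standard form and not necessarily the exact one used operationally.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS for one forecast: E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(members, dtype=float)
    return np.abs(x - obs).mean() - 0.5 * np.abs(x[:, None] - x[None, :]).mean()

def spread_and_skill(ens, obs):
    """Spread: sqrt of the mean ensemble variance; skill: RMSE of the
    ensemble mean. ens has shape (n_cases, n_members)."""
    ens, obs = np.asarray(ens, dtype=float), np.asarray(obs, dtype=float)
    spread = np.sqrt(ens.var(axis=1, ddof=1).mean())
    skill = np.sqrt(((ens.mean(axis=1) - obs) ** 2).mean())
    return spread, skill

def brier_decomposition(probs, outcomes, n_bins=11):
    """Murphy (1973) decomposition, BS = REL - RES + UNC, computed from
    binned forecast probabilities and binary outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    base_rate = outcomes.mean()
    unc = base_rate * (1.0 - base_rate)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    rel = res = 0.0
    for k in range(n_bins):
        sel = which == k
        if not sel.any():
            continue
        p_k, o_k, n_k = probs[sel].mean(), outcomes[sel].mean(), sel.sum()
        rel += n_k * (p_k - o_k) ** 2        # reliability contribution
        res += n_k * (o_k - base_rate) ** 2  # resolution contribution
    return rel / probs.size, res / probs.size, unc
```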
In evaluating the differences between the verification scores of the different EPS systems, it is important to take the statistical significance of these differences into account. To achieve this, a bootstrap statistical significance test is applied to the score differences between the experiments to determine whether the signs of the differences are significant at the 95% confidence level. Each score is recomputed 10 000 times for each lead time using a random sample, with replacement, drawn from the 76 forecast days and the differences computed between each experiment. If more than 95% of the differences between the samples have the same sign, the sign of the difference is considered to be statistically significant. All statements made about the differences between experiments are supported by this test for statistical significance unless otherwise stated.
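A minimal sketch of this paired bootstrap test, assuming NumPy and hypothetical per-day score arrays for two experiments at one lead time:

```python
import numpy as np

def bootstrap_sign_test(score_a, score_b, n_boot=10_000, level=0.95, seed=0):
    """Return True if the sign of the mean score difference is robust.

    score_a, score_b: per-forecast-day scores (here, length 76) for two
    experiments at a given lead time. Days are resampled with replacement,
    keeping the pairing between the experiments.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(score_a, dtype=float) - np.asarray(score_b, dtype=float)
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    boot_means = diffs[idx].mean(axis=1)
    frac_pos = (boot_means > 0).mean()
    return max(frac_pos, 1.0 - frac_pos) > level
```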

Given that the stations used in the verification are in close proximity to each other (Fig. 1, bottom), spatial correlations in the forecast errors may become influential in evaluating the significance of the score differences between the experiments. To test whether this is the case, the verification scores were recomputed and the bootstrap significance test reapplied for a selection of stations where the maximum allowed correlation coefficient between the stations was 0.75. This generally reduced the number of stations used in the verification by about a third. It was found that none of the statements made in this paper could be disputed using this approach, so we present results using the full set of stations in computing the verification scores. The results of Candille et al. (2007) suggest that temporal correlations in forecast errors for successive days can be neglected for the short lead times we consider in this paper. Thus, similar to Bouttier et al. (2015), we do not take temporal correlations in the forecast errors into account in computing the verification scores.

3. Results

In the subsequent verification, only 0600 UTC forecasts are shown, although GLAMEPS and HarmonEPS are cycled every 6 h, with full forecast lengths at both 0600 and 1800 UTC. The scores are comparable when both forecast times are used, but diurnal trends are less obvious when the scores include both 0600 and 1800 UTC forecasts. IFS ENS is run at 0000 and 1200 UTC, so in the results its forecasts are 6 h older than those of the two other systems. This might seem to make the comparison unfair, but GLAMEPS and HarmonEPS have considerably shorter production times, so the comparison is reasonable when one takes into account when the forecasts become available for use. The verification shown is for the 31 stations at sport stadiums and locations close to the Olympic activities (Fig. 1). The station elevations range from 6 to 2320 m above sea level. Results are shown from the 6-h lead time onward, as there are spinup issues during the first few hours of the forecasts.

a. Overall performance of the three uncalibrated ensemble prediction systems

Figure 2 shows the CRPSs for IFS ENS, GLAMEPS, and HarmonEPS for Pmsl, T2m, S10m, and AccPcp3h. While IFS ENS and GLAMEPS have a comparable number of ensemble members (51 and 54, respectively), HarmonEPS has about half that number (26). Still, HarmonEPS is the best system in terms of CRPS for wind speed and mostly also for precipitation. For T2m, HarmonEPS is better than IFS ENS in the daytime and worse at night, while for Pmsl it is the least skillful of the three systems. GLAMEPS is at the same level as or better than IFS ENS for all four parameters and is especially good compared to the other systems for T2m. Note the large diurnal cycle for T2m, with much worse scores during nighttime than during the day, especially for HarmonEPS. Niemelä et al. (2014) ran HARMONIE at 1 km during FROST using a setup similar to the AROME part of our HarmonEPS and also found a large underestimation of temperature during nighttime at the mountain stations. They traced this back to the use of the canopy scheme of Masson and Seity (2009). This scheme predicts the evolution of TKE, wind, specific humidity, and temperature and provides diagnostics of the screen-level values. By switching off the canopy scheme, they were able to considerably reduce the nighttime underestimation of T2m in the mountains, while the impact during the day was small. HarmonEPS was not rerun with this change; however, calibration (see section 3c) removes this bias to a large extent.

Fig. 2. CRPS for IFS ENS (black), GLAMEPS (blue), and HarmonEPS (red): (top left) Pmsl (hPa), (top right) T2m (°C), (bottom left) S10m (m s−1), and (bottom right) AccPcp3h (mm). Note that for CRPS a perfect score is zero.

The spread and skill are shown in Fig. 3. IFS ENS generally has the lowest spread for all four parameters. For Pmsl and T2m, GLAMEPS clearly has the best spread–skill relationship, with the two curves closest together. For S10m and AccPcp3h, HarmonEPS is the best in terms of spread–skill. Note that for all four parameters, HarmonEPS, with only half the number of members, yields more spread than IFS ENS. All three systems are underdispersive for all four parameters, except for Pmsl in GLAMEPS for lead times larger than about 20 h, where the relationship between spread and skill is almost perfect, with the two lines almost on top of each other.

Fig. 3. As in Fig. 2, but for spread (dashed) and skill (solid) for IFS ENS (black), GLAMEPS (blue), and HarmonEPS (red).

The area under the ROC curve is presented in Fig. 4, which shows T2m, S10m, and 3- and 12-h accumulated precipitation as a function of the threshold at the 24-h lead time. A set of thresholds was defined as being particularly important for some of the competitions, such as 10-m wind speed restrictions for ski jumping and visibility limitations for biathlon. All the official thresholds of the FROST project are marked along the x axis in Fig. 4 for the respective parameters. We see in Fig. 4 that HarmonEPS is best for wind speed and precipitation, while for temperature GLAMEPS is best at the higher thresholds. The lowest temperature thresholds hardly occurred during our experiment period.

Fig. 4. Area under the ROC curve for 24-h lead time as a function of the threshold for IFS ENS (black), GLAMEPS (blue), and HarmonEPS (red): (top left) T2m, (top right) S10m, (bottom left) AccPcp3h, and (bottom right) AccPcp12h. Perfect score is 1.

It is of interest to look at the decomposition of the Brier score, and we are particularly interested in its resolution component, as improving ensemble resolution is nontrivial. It is sometimes argued that calibration can only improve reliability by reducing the overall bias across stations. However, calibration can also improve the resolution term if it removes biases at each location separately, as this is similar to resolving weather situations into different types. In Fig. 5 the decomposed Brier scores are shown for T2m, S10m, and 3- and 12-hourly accumulated precipitation. Although GLAMEPS has the best BSs for T2m (not shown), we see from the top-left panel in Fig. 5 that this advantage comes from the reliability part of the BS, while the resolution parts are about the same for HarmonEPS and GLAMEPS. IFS ENS has the lowest resolution of the three systems, and its main quality compared to GLAMEPS and HarmonEPS comes from the reliability part of the BS. Also for S10m (top-right panel in Fig. 5) it is evident that HarmonEPS has the best ensemble resolution of the three systems, but in this case it also has the best reliability. It should be noted that all systems are unskillful for S10m in terms of BS, as can be seen in Fig. 5 from the reliability part being larger than the resolution part. For precipitation (bottom two panels in Fig. 5) we see that HarmonEPS is better than IFS ENS in terms of reliability at all lead times, while the resolution component is comparable for the three systems.

Fig. 5. Decomposed BS as a function of lead time for IFS ENS (black), GLAMEPS (blue), and HarmonEPS (red). Reliability is shown as a solid line, resolution as a dashed line, and uncertainty as a dotted line. Shown are (top left) T2m, 0°C threshold; (top right) S10m, 4 m s−1 threshold; (bottom left) AccPcp3h, 0.1-mm threshold; and (bottom right) AccPcp12h, 0.3-mm threshold.

b. Multimodel

To look at the effect of the number of subensembles in GLAMEPS, we have run the verification for two subsets. First, we pick the 12 + 1 members from each of the two HIRLAM subensembles, which gives 26 members; we call this GLAMEPS26. The second subset consists of all four submodels, but with only 6 + 1 members from each. This gives 28 members, and we call it GLAMEPS28. See Table 1 for an overview of the different GLAMEPS configurations. The effect of the number of submodels in GLAMEPS can be seen in Fig. 6. It is clear from Fig. 6 that the GLAMEPS28 scores are about the same as the full GLAMEPS scores, and that GLAMEPS26 is clearly inferior (except for AccPcp3h at longer lead times), despite its ensemble size being only two fewer than that of GLAMEPS28. This is also confirmed when looking at other scores (not shown). Both GLAMEPS26 and GLAMEPS28 are multimodel ensembles, but clearly adding more models on top of the two in GLAMEPS26 is beneficial.

Table 1. Overview of the GLAMEPS configurations.

GLAMEPS: two HIRLAM versions (12 + 1 members each), ALARO (12 + 1 members), 14 IFS ENS members, and the ECMWF high-resolution deterministic forecast; 54 members in total.
GLAMEPS28: all four subensembles with 6 + 1 members each; 28 members in total.
GLAMEPS26: the two HIRLAM subensembles with 12 + 1 members each; 26 members in total.
Fig. 6. CRPS for full GLAMEPS (blue), GLAMEPS26 (red), and GLAMEPS28 (black): (top left) T2m (°C), (top right) S10m (m s−1), (bottom left) AccPcp3h (mm), and (bottom right) AccPcp12h (mm). Note that for CRPS a perfect score is zero.

The same exercise can be done for HarmonEPS. In Fig. 7 we have plotted CRPS for HarmonEPS with a subset of members, taking 13 members from the full HarmonEPS (6 + 1 from one of the subensembles and 5 + 1 from the other) and comparing this with the two single-model ensembles, each with 13 members. We can see in Fig. 7 that the two subensembles have very different qualities, especially for T2m and Pmsl. For S10m, HarmonEPS with both subensembles is the best of the three, but not for Pmsl and T2m. For precipitation full HarmonEPS is better than one of the subensembles, but not better than the other. This indicates that to get a good quality multimodel ensemble, the subensembles need to be of good and comparable quality.

Fig. 7. CRPS for HarmonEPS with 13 members (blue) and HarmonEPS with only one model (black and yellow): (top left) Pmsl (hPa), (top right) T2m (°C), (bottom left) S10m (m s−1), and (bottom right) AccPcp3h (mm).

c. Calibration

Both HarmonEPS and GLAMEPS have been calibrated for station points with the method described in section 2d. We have also looked at what is most important for the calibrated forecasts: the number of ensemble members or the number of subensembles. One such test is calibrating GLAMEPS using fewer members from each subensemble. We used 6 + 1 members from each of the four subensembles, keeping the number of submodels/subversions but reducing the number of members to about half. We call this calibrated GLAMEPS28. The other test is reducing the number of submodels/subversions. In calibrated GLAMEPS26, we use the two HIRLAM subensembles with 12 + 1 members in each (see Table 1). In Fig. 8 the spread and skill are shown for GLAMEPS and the three versions of calibrated GLAMEPS. For T2m and S10m, calibrated GLAMEPS based on the full ensemble has the best skill and, for the most part, the best spread and skill relationship. Although the calibrated spread is closer to the skill, the results do not match exactly because the members in the calibrated ensemble were generated using evenly spaced quantiles, which reduces the spread of the output. An alternative would have been to randomly sample the distribution, yielding an ensemble with spread closer to the spread of the distribution; however, this has been shown to lead to poorer performance for other scores (see, e.g., Schefzik et al. 2013). Also for precipitation calibrated GLAMEPS has good skill, but it is not statistically significantly better than the other systems. As long as the number of subensembles was kept constant, raw GLAMEPS was almost insensitive to the number of members being reduced from 54 to about half of that number (Fig. 6). This is not the case after calibration, and it is most easily seen in the figure for S10m, where the skill is shown to be reduced when going from 54 to 28 members while keeping the four submodels. Calibrated GLAMEPS also shows a dependency on the number of subensembles, as we see an even further reduction in the level of skill when the number of submodels is reduced to two.

Fig. 8. Spread (dashed) and skill (solid) for GLAMEPS (blue), calibrated GLAMEPS (green), calibrated GLAMEPS28 (orange), and calibrated GLAMEPS26 (yellow): (top left) T2m (°C), (top right) S10m (m s−1), (bottom left) AccPcp3h (mm), and (bottom right) AccPcp12h (mm).

The effect of calibrating HarmonEPS can be seen in Fig. 9; also plotted is calibrated GLAMEPS26, which has the same number of members and subensembles as HarmonEPS. For T2m, the most noticeable feature is the reduction of the nighttime error for raw HarmonEPS, which in turn gives a much better spread–skill relationship than for uncalibrated HarmonEPS. Comparing calibrated HarmonEPS and GLAMEPS with the same number of members and subensembles (purple and yellow curves in Fig. 9), calibrated HarmonEPS is the better system for T2m. For S10m, calibrated HarmonEPS has better skill than raw HarmonEPS and has comparable skill to calibrated GLAMEPS26. For precipitation, calibrating HarmonEPS degraded the scores, but calibrated HarmonEPS has a better spread–skill relationship than calibrated GLAMEPS26.

Fig. 9. As in Fig. 8, but for HarmonEPS (red), calibrated HarmonEPS (purple), and calibrated GLAMEPS26 (yellow).

Looking at the Brier scores for GLAMEPS, HarmonEPS, and all of the calibrated systems (Fig. 10), we see that for T2m calibrated HarmonEPS performs best, and calibrated GLAMEPS, based on the full ensemble, is better than raw GLAMEPS for day 1. Calibrated HarmonEPS performs better than both calibrated GLAMEPS26 and calibrated GLAMEPS28, except for lead times of 30 and 33 h. Both calibrated HarmonEPS and calibrated GLAMEPS26 consist of two subensembles with the same number of members, indicating that HarmonEPS has higher potential for probabilistic prediction of T2m than GLAMEPS. HarmonEPS is not run with the same submodels, however, so one cannot conclude from this experiment whether the advantage is due to resolution or to the choice of models. For S10m, calibrated GLAMEPS based on the full ensemble is best, but again calibrated HarmonEPS performs as well as calibrated GLAMEPS28 (except for lead times of 27 and 30 h) and better than calibrated GLAMEPS26 (not statistically significant for lead times of 27 and 30 h). From the two bottom panels in Fig. 10 we see again that calibrating precipitation was not successful. This is likely because the ensemble means do not systematically over- or underestimate precipitation, and hence stage 1 in the calibration has little effect.

Fig. 10. BS (perfect score 0) for GLAMEPS (blue), HarmonEPS (red), calibrated GLAMEPS (green), calibrated GLAMEPS28 (orange), calibrated GLAMEPS26 (yellow), and calibrated HarmonEPS (purple): (top left) T2m for the 0°C threshold, (top right) S10m for the 3 m s−1 threshold, (bottom left) AccPcp3h for the 0.1-mm threshold, and (bottom right) AccPcp12h for the 0.1-mm threshold.

d. Case studies

The previous section showed that calibration improves the ensemble verification scores for both T2m and S10m. However, since these scores are aggregated over many cases, they may mask important weaknesses and strengths of the calibration procedures in certain weather situations. In this section we present two case studies that show the effects of calibration on forecasts in an operational setting. Case 1 involves the passage of a cold front and poor visibility that led to the postponement of the super giant slalom (super-G) part of the women's combined Alpine skiing competition at Roza Khutor; in this case the timing of the frontal passage was poorly forecast by the uncalibrated models. Case 2 is a high wind speed event in which the uncalibrated models underestimated the wind speed at stations above 1500 m.

In Fig. 11 the temperature forecasts are shown for GLAMEPS, calibrated GLAMEPS, HarmonEPS, and calibrated HarmonEPS for case 1. The observed temperatures at 1200 UTC 11 March fall within the range predicted by GLAMEPS, and GLAMEPS captures the range of temperatures during the passage of the cold front between 0600 and 1200 UTC. The cold front is clearly visible in the calibrated GLAMEPS forecast. After the frontal passage, however, raw GLAMEPS fits the observations better than calibrated GLAMEPS. The improvement around the frontal passage seen in calibrated GLAMEPS is likely due to the diurnally varying bias correction, which increased the nighttime temperatures while maintaining the daytime temperatures, thereby producing an overall drop in temperature. This case also highlights an issue with a calibration scheme that is slow to adapt to a new weather regime: the bias properties of the weather leading up to this event clearly do not persist after the frontal passage. For HarmonEPS, the most striking feature is temperatures that are much too low in the early part of the forecast, which calibration improves. Raw HarmonEPS misses the cold front, showing almost constant temperature through the time of the frontal passage, whereas calibrated HarmonEPS captures the passage very well. This is again a side effect of the diurnally varying bias correction, as a calibration scheme based on correcting past systematic errors cannot itself predict a frontal passage that is absent from the raw forecast. As with calibrated GLAMEPS, the calibrated HarmonEPS forecasts overestimate temperatures after the frontal passage.
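The mechanism described above can be illustrated with a minimal sketch of a per-lead-time additive bias correction (an assumed simplification of the section 2d scheme, with entirely synthetic data): because each lead time corresponds to a fixed time of day, the correction is diurnally varying, and it is applied regardless of the weather regime in the forecast being corrected.

```python
# Sketch of an assumed per-lead-time (hence diurnally varying) bias
# correction; NOT the paper's exact scheme, all data synthetic.
import numpy as np

def lead_time_bias(fc, obs):
    """fc: (n_cases, n_leads, n_members); obs: (n_cases, n_leads).
    Mean error of the ensemble mean, estimated separately per lead time."""
    return (fc.mean(axis=2) - obs).mean(axis=0)

def apply_correction(fc, bias):
    # Subtract the training-period bias from every member, per lead time.
    return fc - bias[None, :, None]

# Hypothetical training data: 30 past forecasts, 48 lead times, 26 members.
rng = np.random.default_rng(2)
leads = np.linspace(0, 4 * np.pi, 48)            # two diurnal cycles
obs = 10 + 5 * np.sin(leads)[None, :] + rng.normal(0, 0.5, (30, 48))
imposed = -2.0 - 2.0 * np.cos(leads)             # cold bias, largest at "night"
fc = obs[:, :, None] + imposed[None, :, None] + rng.normal(0, 1.0, (30, 48, 26))

bias = lead_time_bias(fc, obs)                   # recovers roughly the imposed curve
fc_corrected = apply_correction(fc, bias)
```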

Fig. 11. Temperature forecast from 1800 UTC 10 Mar 2014 at Roza Khutor at 2320-m height for GLAMEPS, calibrated GLAMEPS, HarmonEPS (AROME only), and calibrated HarmonEPS. The following percentiles are shown: red, 25th–75th; orange, 10th–90th; and yellow, the full range. The solid black line is the ensemble mean, and the stars are observations.

Forecasts of S10m for case 2 are shown in Fig. 12. Both uncalibrated systems underestimate the maximum wind speed during the morning of 17 March, and there is a marked improvement in the maximum wind speed after calibration for both GLAMEPS and HarmonEPS. For this case, calibrated GLAMEPS is also better than raw GLAMEPS on forecast day 2: the daytime bias leading up to the event persists over the whole forecast length, contrary to what we saw in case 1, and therefore improvements are also seen at longer lead times.

Fig. 12. As in Fig. 11, but for the 10-m wind speed forecast from 1800 UTC 16 Mar 2014 at Roza Khutor.

4. Discussion and conclusions

Three ensemble prediction systems with different spatial resolutions have been evaluated for their ability to predict wintertime weather in complex terrain over a 2.5-month period around the 2014 Sochi Winter Olympics. We have compared the performance of raw ensemble output as well as calibrated ensemble forecasts. Since both GLAMEPS and HarmonEPS are multimodel ensemble prediction systems, we have also investigated the effects that the number of subensembles and the number of members have on the raw and calibrated forecasts, as well as the overall impact of calibrating forecasts of different spatial resolutions. In comparing the systems, we have not taken into account the double-penalty problem that higher-resolution forecasts can suffer from when compared with smoother, lower-resolution forecasts (Mass et al. 2002). This means that HarmonEPS would probably have scored better relative to GLAMEPS and IFS ENS, and GLAMEPS relative to IFS ENS, had postprocessing techniques such as upscaling been used. However, as some weather events are strongly forced by the topography, it is not certain to what extent the double-penalty issue is problematic in complex terrain.
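For illustration, one common form of the upscaling mentioned above is a simple block average of the high-resolution field before verification, so that a small displacement error is not penalized twice. The sketch below is hypothetical; no such step was applied in this study.

```python
# Minimal block-average upscaling sketch on a synthetic precipitation field.
import numpy as np

def upscale(field, n):
    """Block-average a 2D field over n x n boxes (shape must divide evenly)."""
    ny, nx = field.shape
    return field.reshape(ny // n, n, nx // n, n).mean(axis=(1, 3))

rng = np.random.default_rng(3)
hi_res = rng.gamma(shape=0.5, scale=2.0, size=(120, 120))  # pseudo precip (mm)
coarse = upscale(hi_res, 4)  # 4x4 block mean, e.g., 2.5 km -> 10 km grid
```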

As a result of the nighttime T2m bias of HarmonEPS at mountain stations (section 3a), this parameter is not suitable for comparing the three uncalibrated systems. It should be noted, however, that the simple calibration method used provided sufficient improvement to enable suitable comparisons. For all scores for S10m, AccPcp3h, and AccPcp12h, HarmonEPS verifies better than both IFS ENS and GLAMEPS, while raw GLAMEPS verifies better than, or at the same level as, IFS ENS. Taking into account both the double-penalty issue mentioned above and the fact that HarmonEPS has about half as many members as IFS ENS and GLAMEPS, this is an encouraging result for HarmonEPS. The configuration of HarmonEPS used here was a first setup of the system, with simple perturbations of only the lateral boundaries and initial conditions in addition to the multiphysics. Considerable work is currently going into the development of HarmonEPS, with more sophisticated perturbation techniques being investigated for the surface, the initial conditions, and the physics. Furthermore, several HIRLAM institutes are planning operational versions of HarmonEPS in the near future, which is expected to accelerate the development of perturbation strategies.

A large part of the skill in GLAMEPS can be attributed to the multimodel approach. In HarmonEPS, one of the submodels is suboptimal for Pmsl and T2m, which results in inferior scores for the combined ensemble (Fig. 7). For the parameters where the individual subensembles verify with similar scores, the overall score of the multimodel HarmonEPS is improved. This suggests that for a multimodel EPS to work well, all submodels need to perform similarly well. None of the models in HarmonEPS was specifically tuned for the Sochi area, and their performance could probably be improved. One drawback of multimodel EPSs is the need to maintain two or more models or physics options; furthermore, the performance of the system will be affected should one of the components change. Multimodel EPSs are also sometimes criticized for being unphysical and for producing forecasts that cluster according to the subensembles. Ideally, one would like an EPS in which all sources of uncertainty are correctly described within the context of a single model. However, Berner et al. (2015) argue that the model error is so complex that no single existing scheme is able to describe it fully, and combining multimodel/multiphysics approaches with stochastic perturbations has been found to produce the best-performing ensemble.

The calibration was done for the 31 stations at and in the vicinity of the sports stadiums. Since the calibration is performed separately for each forecast time, its effect is seen throughout the forecast range. For T2m in Fig. 8, raw GLAMEPS has higher spread than calibrated GLAMEPS for the first few forecast hours. This might seem contradictory: ensemble prediction systems usually have too little spread, so one could expect calibration to increase it. In an uncalibrated multimodel ensemble, however, differing biases among the models or model versions can inflate the spread of the whole system, and such bias-driven spread is not desirable. Calibration removes these biases, so the spread of the calibrated forecast can become smaller than that of the raw forecast. Even though raw GLAMEPS has larger spread at some lead times, calibrated GLAMEPS has lower RMSEs, so for most lead times calibrated GLAMEPS has the better spread–skill relationship. Calibrating precipitation improves the GLAMEPS verification scores slightly, but not those of HarmonEPS. It should be noted that there were few precipitation events during the period, which is likely to have a considerable impact on both the calibration and the verification scores, so this result should be interpreted with caution. Uncalibrated GLAMEPS was almost insensitive to a reduction in the number of ensemble members as long as the number of subensembles was retained; this is not the case for calibrated GLAMEPS. Calibrated GLAMEPS based on the full ensemble gives the best verification scores, but there is also a clear dependency on both the number of submodels and the number of members (Figs. 8 and 10). Why this dependence on the number of members appears after calibration, but not before, is an open question. It could indicate that the effect of a multimodel approach is not only to increase spread or reduce mean error, but also to account for more basic aspects of model uncertainty. The spread in an ensemble should come from random, nonsystematic components of forecast uncertainty, not from differing biases in the multimodel components. After calibration, the biases of the component ensembles are removed, yet including more models still yields superior verification scores. This suggests that the improved performance of the multimodel ensemble goes beyond the effect of error cancellation on the verification scores.
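As a reference for how the spread–skill comparison is typically computed (a sketch under the usual conventions with synthetic data, not the paper's verification code): skill is the RMSE of the ensemble mean, and spread is the square root of the mean ensemble variance; the two match for a statistically consistent ensemble.

```python
# Minimal spread-skill sketch; a consistent toy ensemble gives spread ~= skill.
import numpy as np

def spread_and_skill(ens, obs):
    """ens: (n_cases, n_members); obs: (n_cases,)."""
    mean = ens.mean(axis=1)
    skill = np.sqrt(np.mean((mean - obs) ** 2))          # RMSE of ensemble mean
    spread = np.sqrt(np.mean(ens.var(axis=1, ddof=1)))   # mean ensemble variance
    return spread, skill

rng = np.random.default_rng(4)
center = rng.normal(0, 3, 1000)                      # forecast "signal" per case
truth = center + rng.normal(0, 1.0, 1000)            # truth drawn around the signal
ens = center[:, None] + rng.normal(0, 1.0, (1000, 26))  # members from same distribution
print(spread_and_skill(ens, truth))                  # ~1.0 vs ~1.02
```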

The combined effect of calibration and spatial resolution can be assessed by comparing calibrated HarmonEPS and calibrated GLAMEPS with the same number of submodels (two) and the same number of members (26). Figure 9 shows that the mean error is mostly smaller for calibrated HarmonEPS, and Fig. 10 shows that it has a better BS, at least for the first forecast day; we therefore conclude that HarmonEPS has greater potential for probabilistic prediction than GLAMEPS. The effect of calibration on the forecasts is unquestionable, and since it is much cheaper computationally than running the ensemble itself, it should be an integral part of any ensemble prediction system. In this study we calibrate only at station points; however, calibration can also be performed at every grid point (see, e.g., Raftery et al. 2005), and such an approach is being developed for GLAMEPS. Grid-point calibration is particularly useful when forecasts are made available to the public for places with no direct observations against which to calibrate.

The calibration approaches used here are quite simple and do not use all of the information provided by the ensemble (such as the ensemble spread). Methods such as ensemble model output statistics (EMOS; Gneiting et al. 2005) and Bayesian model averaging (Raftery et al. 2005; Sloughter et al. 2007) would likely yield larger improvements in cases where the ensemble is able to discriminate between certain and uncertain events. These methods also allow ensemble members to be weighted by their past performance, which could be beneficial given that multimodel ensembles were used in this study. Such approaches could improve the calibrated precipitation forecasts in particular, since here only the ensemble members were scaled and no attempt was made to adjust the rest of the forecast distribution. Finally, as seen in the cold front case study, approaches that adapt quickly to regime changes, such as analog methods (Delle Monache et al. 2011), could yield further improvements.
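As an indication of what such a method involves, the sketch below implements a minimal Gaussian EMOS in the spirit of Gneiting et al. (2005), fitting a predictive distribution N(a + b*xbar, c + d*s2) by minimum CRPS estimation. All data and starting values are hypothetical, and a real application would fit separately per station and lead time, as the paper's calibration does.

```python
# Minimal Gaussian EMOS sketch with minimum CRPS estimation; toy data only.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma) against observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def emos_loss(params, xbar, s2, y):
    a, b, c, d = params
    sigma = np.sqrt(np.maximum(c + d * s2, 1e-6))  # keep the variance positive
    return gaussian_crps(a + b * xbar, sigma, y).mean()

# Toy training data: 500 cases, 26 members, a +1 bias in the ensemble.
rng = np.random.default_rng(5)
ens = rng.normal(1.0, 1.0, (500, 26)) + rng.normal(0, 2, 500)[:, None]
y = ens.mean(axis=1) - 1.0 + rng.normal(0, 0.8, 500)
xbar, s2 = ens.mean(axis=1), ens.var(axis=1, ddof=1)

fit = minimize(emos_loss, x0=[0.0, 1.0, 1.0, 0.1], args=(xbar, s2, y),
               method="Nelder-Mead")
a, b, c, d = fit.x  # calibrated mean a + b*xbar, variance c + d*s2 (a ~= -1 here)
```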

The GLAMEPS scores improve when the number of members is increased beyond 26 or 28. It is reasonable to believe that the same holds for HarmonEPS, and we should aim for more members in our convection-permitting ensemble than were used here. Because of limitations in computing power, however, this is not achievable for most centers in an operational setting; as always, there is a compromise among domain size, number of members, and spatial resolution. We would argue that, as weather at the convection-permitting scale is inherently unpredictable, increasing the number of members to better represent this uncertainty is important. The size of the integration area also matters: a sufficiently large area is required for data assimilation, and if the area is too small, any perturbations imposed on the ensemble at initial time will quickly be dominated by synoptic-scale features entering through the lateral boundaries.

The weather during the experiment period did not include many high-impact events, such as strong winds or heavy precipitation. Future work will address the performance of both raw and calibrated GLAMEPS and HarmonEPS for high-impact weather. This will be a focus both when introducing new ways of describing uncertainty in the different components of HarmonEPS and when developing calibration techniques.

Acknowledgments

We wish to thank our Russian colleagues for a well-organized project, Kai Sattler for technical work in providing GLAMEPS in real time, and Pål Sannes for making the observations from Sochi available at MET Norway. Helpful comments from three anonymous reviewers are acknowledged. This work was funded by MET Norway.

REFERENCES

  • Aspelien, T., T. Iversen, J. B. Bremnes, and I.-L. Frogner, 2011: Short-range probabilistic forecasts from the Norwegian limited-area EPS: Long-term validation and a polar low study. Tellus, 63A, 564–584, doi:10.1111/j.1600-0870.2010.00502.x.
  • Bengtsson, L., and H. Körnich, 2016: Impact of a stochastic parameterization of cumulus convection, using cellular automata, in a mesoscale ensemble prediction system. Quart. J. Roy. Meteor. Soc., 142, 1150–1159, doi:10.1002/qj.2720.
  • Berner, J., K. R. Fossell, S.-Y. Ha, J. P. Hacker, and C. Snyder, 2015: Increasing the skill of probabilistic forecasts: Understanding performance improvements from model-error representations. Mon. Wea. Rev., 143, 1295–1320, doi:10.1175/MWR-D-14-00091.1.
  • Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436, doi:10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.
  • Bouttier, F., L. Raynaud, O. Nuissier, and B. Ménétrier, 2015: Sensitivity of the AROME ensemble to initial and surface perturbations during HyMeX. Quart. J. Roy. Meteor. Soc., 142, 390–403, doi:10.1002/qj.2622.
  • Bowler, N. E., A. Arribas, K. R. Mylne, K. B. Robertson, and S. E. Beare, 2008: The MOGREPS short-range ensemble prediction system. Quart. J. Roy. Meteor. Soc., 134, 703–722, doi:10.1002/qj.234.
  • Buizza, R., and T. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. J. Atmos. Sci., 52, 1434–1456, doi:10.1175/1520-0469(1995)052<1434:TSVSOT>2.0.CO;2.
  • Buizza, R., M. Miller, and T. Palmer, 1999: Stochastic representation of model uncertainties in the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 125, 2887–2908, doi:10.1002/qj.49712556006.
  • Buizza, R., M. Leutbecher, and L. Isaksen, 2008: Potential use of an ensemble of analyses in the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 134, 2051–2066, doi:10.1002/qj.346.
  • Candille, G., C. Côté, P. L. Houtekamer, and G. Pellerin, 2007: Verification of an ensemble prediction system against observations. Mon. Wea. Rev., 135, 2688–2699, doi:10.1175/MWR3414.1.
  • Christensen, H. M., I. M. Moroz, and T. N. Palmer, 2015: Stochastic and perturbed parameter representations of model uncertainty in convection parameterization. J. Atmos. Sci., 72, 2525–2544, doi:10.1175/JAS-D-14-0250.1.
  • Craig, G. C., and B. G. Cohen, 2006: Fluctuations in an equilibrium convective ensemble. Part I: Theoretical formulation. J. Atmos. Sci., 63, 1996–2004, doi:10.1175/JAS3709.1.
  • Delle Monache, L., T. Nipen, Y. Liu, G. Roux, and R. Stull, 2011: Kalman filter and analog schemes to postprocess numerical weather predictions. Mon. Wea. Rev., 139, 3554–3570, doi:10.1175/2011MWR3653.1.
  • Du, J., and M. Tracton, 2001: Implementation of a real-time short range ensemble forecasting system at NCEP: An update. Preprints, Ninth Conf. on Mesoscale Processes, Fort Lauderdale, FL, Amer. Meteor. Soc., P4.9. [Available online at https://ams.confex.com/ams/pdfpapers/23074.pdf.]
  • Eckel, F. A., and C. F. Mass, 2005: Aspects of effective mesoscale, short-range ensemble forecasting. Wea. Forecasting, 20, 328–350, doi:10.1175/WAF843.1.
  • Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367, doi:10.1007/s10236-003-0036-9.
  • Fraley, C., A. E. Raftery, and T. Gneiting, 2010: Calibrating multimodel forecast ensembles with exchangeable and missing members using Bayesian model averaging. Mon. Wea. Rev., 138, 190–202, doi:10.1175/2009MWR3046.1.
  • Frogner, I.-L., and T. Iversen, 2002: High-resolution limited-area ensemble predictions based on low resolution targeted singular vectors. Quart. J. Roy. Meteor. Soc., 128, 1321–1341, doi:10.1256/003590002320373319.
  • Frogner, I.-L., H. Haakenstad, and T. Iversen, 2006: Limited-area ensemble predictions at the Norwegian Meteorological Institute. Quart. J. Roy. Meteor. Soc., 132, 2785–2808, doi:10.1256/qj.04.178.
  • García-Moya, J.-A., A. Callado, P. Escribà, C. Santos, D. Santos-Muñoz, and J. Simarro, 2011: Predictability of short-range forecasting: A multimodel approach. Tellus, 63A, 550–563, doi:10.1111/j.1600-0870.2010.00506.x.
  • Gebhardt, C., S. Theis, M. Paulat, and Z. Ben Bouallègue, 2011: Uncertainties in COSMO-DE precipitation forecasts introduced by model perturbations and variation of lateral boundaries. Atmos. Res., 100, 168–177, doi:10.1016/j.atmosres.2010.12.008.
  • Gerard, L., and J.-F. Geleyn, 2005: Evolution of a subgrid deep convection parameterization in a limited-area model with increasing resolution. Quart. J. Roy. Meteor. Soc., 131, 2293–2312, doi:10.1256/qj.04.72.
  • Gerard, L., J.-M. Piriou, R. Brožková, J.-F. Geleyn, and D. Banciu, 2009: Cloud and precipitation parameterization in a meso-gamma-scale operational weather prediction model. Mon. Wea. Rev., 137, 3960–3977, doi:10.1175/2009MWR2750.1.
  • Gneiting, T., A. E. Raftery, A. H. Westveld III, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, doi:10.1175/MWR2904.1.
  • Grimit, E. P., and C. F. Mass, 2002: Initial results of a mesoscale short-range ensemble forecasting system over the Pacific Northwest. Wea. Forecasting, 17, 192–205, doi:10.1175/1520-0434(2002)017<0192:IROAMS>2.0.CO;2.
  • Hagedorn, R., R. Buizza, T. Hamill, M. Leutbecher, and T. Palmer, 2012: Comparing TIGGE multimodel forecasts with reforecast calibrated ECMWF ensemble forecasts. Quart. J. Roy. Meteor. Soc., 138, 1814–1827, doi:10.1002/qj.1895.
  • Holton, J. R., 2004: An Introduction to Dynamic Meteorology. 4th ed. Elsevier Science, 535 pp.
  • Homleid, M., and F. T. Tveter, 2016: Verification of operational weather prediction models September to November 2015. METInfo Rep. 16/2016, Norwegian Meteorological Institute, 74 pp. [Available online at http://met.no/Forskning/Publikasjoner/MET_info/filestore/Verification_report_201509-2015111.pdf.]
  • Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126, doi:10.1016/j.physd.2006.11.008.
  • Isaksen, L., M. Bonavita, R. Buizza, M. Fisher, J. Haseler, M. Leutbecher, and L. Raynaud, 2010: Ensemble of data assimilations at ECMWF. ECMWF Tech. Memo. 636, 45 pp. [Available online at http://www.ecmwf.int/sites/default/files/elibrary/2010/10125-ensemble-data-assimilations-ecmwf.pdf.]
  • Iversen, T., A. Deckmyn, C. Santos, K. Sattler, J. B. Bremnes, H. Feddersen, and I.-L. Frogner, 2011: Evaluation of ‘GLAMEPS’—A proposed multimodel EPS for short range forecasting. Tellus, 63A, 513–530, doi:10.1111/j.1600-0870.2010.00507.x.
  • Johnson, C., and R. Swinbank, 2009: Medium-range multimodel ensemble combination and calibration. Quart. J. Roy. Meteor. Soc., 135, 777–794, doi:10.1002/qj.383.
  • Junk, C., S. Späth, L. von Bremen, and L. Delle Monache, 2015: Comparison and combination of regional and global ensemble prediction systems for probabilistic predictions of hub-height wind speed. Wea. Forecasting, 30, 1234–1253, doi:10.1175/WAF-D-15-0021.1.
  • Leutbecher, M., and T. Palmer, 2008: Ensemble forecasting. J. Comput. Phys., 227, 3515–3539, doi:10.1016/j.jcp.2007.02.014.
  • Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, doi:10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
  • Marsigli, C., F. Boccanera, A. Montani, and T. Paccagnella, 2005: The COSMO-LEPS mesoscale ensemble system: Validation of the methodology and verification. Nonlinear Processes Geophys., 12, 527–536, doi:10.5194/npg-12-527-2005.
  • Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? Bull. Amer. Meteor. Soc., 83, 407–430, doi:10.1175/1520-0477(2002)083<0407:DIHRPM>2.3.CO;2.
  • Masson, V., and Y. Seity, 2009: Including atmospheric layers in vegetation and urban offline surface schemes. J. Appl. Meteor. Climatol., 48, 1377–1397, doi:10.1175/2009JAMC1866.1.
  • Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119, doi:10.1002/qj.49712252905.
  • Montani, A., D. Cesari, C. Marsigli, and T. Paccagnella, 2011: Seven years of activity in the field of mesoscale ensemble forecasting by the COSMO-LEPS system: Main achievements and open challenges. Tellus, 63A, 605–624, doi:10.1111/j.1600-0870.2010.00499.x.
  • Murphy, A., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600, doi:10.1175/1520-0450(1973)012<0595:ANVPOT>2.0.CO;2.
  • Niemelä, S., S. Näsman, and P. Nurmi, 2014: FROST-2014—Performance of Harmonie 1 km during Sochi Olympics. ALADIN-HIRLAM Newsletter, No. 3, 79–86. [Available online at http://hirlam.org/index.php/hirlam-documentation/cat_view/77-hirlam-official-publications/285-aladin-hirlam-newsletters.]
  • Nipen, T., G. West, and R. Stull, 2011: Updating short-term probabilistic weather forecasts of continuous variables using recent observations. Wea. Forecasting, 26, 564–571, doi:10.1175/WAF-D-11-00022.1.
  • Palmer, T., 1993: Extended-range atmospheric prediction and the Lorenz model. Bull. Amer. Meteor. Soc., 74, 49–65, doi:10.1175/1520-0477(1993)074<0049:ERAPAT>2.0.CO;2.
  • Park, Y.-Y., R. Buizza, and M. Leutbecher, 2008: TIGGE: Preliminary results on comparing and combining ensembles. Quart. J. Roy. Meteor. Soc., 134, 2029–2050, doi:10.1002/qj.334.
  • Peralta, C., Z. Ben Bouallègue, S. Theis, C. Gebhardt, and M. Buchhold, 2012: Accounting for initial condition uncertainties in COSMO-DE-EPS. J. Geophys. Res., 117, D07108, doi:10.1029/2011JD016581.
  • Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, doi:10.1175/MWR2906.1.
  • Romine, G. S., C. S. Schwartz, J. Berner, K. R. Fossell, C. Snyder, J. L. Anderson, and M. L. Weisman, 2014: Representing forecast error in a convection-permitting ensemble system. Mon. Wea. Rev., 142, 4519–4541, doi:10.1175/MWR-D-14-00100.1.
  • Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci., 28, 616–640, doi:10.1214/13-STS443.
  • Seity, Y., P. Brousseau, S. Malardel, G. Hello, P. Bénard, F. Bouttier, C. Lac, and V. Masson, 2011: The AROME-France convective-scale operational model. Mon. Wea. Rev., 139, 976–991, doi:10.1175/2010MWR3425.1.
  • Shutts, G. J., 2005: A kinetic energy backscatter algorithm for use in ensemble prediction systems. Quart. J. Roy. Meteor. Soc., 131, 3079–3102, doi:10.1256/qj.04.106.
  • Sloughter, J. M., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220, doi:10.1175/MWR3441.1.
  • Thorarinsdottir, T. L., and T. Gneiting, 2010: Probabilistic forecasts of wind speed: Ensemble model output statistics by using heteroscedastic censored regression. J. Roy. Stat. Soc., 173A, 371–388, doi:10.1111/j.1467-985X.2009.00616.x.
  • Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 2317–2330, doi:10.1175/1520-0477(1993)074<2317:EFANTG>2.0.CO;2.
  • Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319, doi:10.1175/1520-0493(1997)125<3297:EFANAT>2.0.CO;2.
  • Wang, Y., and Coauthors, 2011: The central European limited-area ensemble forecasting system: ALADIN-LAEF. Quart. J. Roy. Meteor. Soc., 137, 483–502, doi:10.1002/qj.751.
  • Wang, Y., S. Tascu, F. Weidle, and K. Schmeisser, 2012: Evaluation of the added value of regional ensemble forecasts on global ensemble forecasts. Wea. Forecasting, 27, 972–987, doi:10.1175/WAF-D-11-00102.1.
  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences: An Introduction. 2nd ed. Elsevier Science, 627 pp.
  • Yang, X., 2008: Status of the reference system. HIRLAM Newsletter, No. 55, 202–203. [Available online at http://hirlam.org/index.php/component/docman/doc_view/152-hirlam-newsletter-no-54-paper-27-yang?Itemid=70.]
