Nonhomogeneous Boosting for Predictor Selection in Ensemble Postprocessing

Jakob W. Messner, Georg J. Mayr, and Achim Zeileis
University of Innsbruck, Innsbruck, Austria

Abstract

Nonhomogeneous regression is often used to statistically postprocess ensemble forecasts. Usually, only ensemble forecasts of the predictand variable are used as input, and other potentially useful information sources are ignored. Although it is straightforward to add further input variables, overfitting can easily degrade forecast performance as the number of input variables grows. This paper proposes a boosting algorithm that estimates the regression coefficients while automatically selecting the most relevant input variables by keeping the coefficients of less important variables at zero. A case study with ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) shows that this approach effectively selects important input variables and clearly improves minimum and maximum temperature predictions at five central European stations.


Current affiliation: Technical University of Denmark, Lyngby, Denmark.

Corresponding author address: Jakob W. Messner, Technical University of Denmark, Elektrovej, Building 325, Kgs. Lyngby, Denmark. E-mail: jwmm@elektro.dtu.dk
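
As a rough illustration of the selection mechanism described in the abstract, the following Python/NumPy sketch implements coordinate-wise gradient boosting for a nonhomogeneous Gaussian model with mean mu = X·beta and log standard deviation log(sigma) = Z·gamma. It is a minimal sketch of the general idea under simplifying assumptions, not the authors' implementation; the function name `ngr_boost` and the tuning arguments `mstop` (number of iterations) and `nu` (step size) are placeholders chosen for this example, and the non-intercept predictor columns are assumed to be standardized.

```python
import numpy as np

def ngr_boost(X, Z, y, mstop=100, nu=0.1):
    """Toy coordinate-wise gradient boosting for a nonhomogeneous
    Gaussian regression  y ~ N(mu, sigma^2)  with
    mu = X @ beta  and  log(sigma) = Z @ gamma.

    The first column of X and Z should be a constant (intercept);
    the remaining columns are assumed to be standardized. Only the
    single coefficient whose predictor best explains the current
    gradient is updated per iteration, so coefficients of predictors
    that are never selected stay exactly zero (variable selection).
    """
    n, p = X.shape
    _, q = Z.shape
    beta = np.zeros(p)
    gamma = np.zeros(q)

    for _ in range(mstop):
        mu = X @ beta
        sigma = np.exp(Z @ gamma)

        # Negative gradients of the Gaussian negative log-likelihood
        # with respect to mu and log(sigma).
        grad_mu = (y - mu) / sigma**2
        grad_ls = (y - mu)**2 / sigma**2 - 1.0

        # Least-squares fit of each candidate column to "its" gradient;
        # the score is the squared-error reduction that column achieves.
        ssx = np.sum(X**2, axis=0)
        ssz = np.sum(Z**2, axis=0)
        b_mu = (X.T @ grad_mu) / ssx
        b_ls = (Z.T @ grad_ls) / ssz
        score_mu = b_mu**2 * ssx
        score_ls = b_ls**2 * ssz

        # Update only the best-fitting coefficient by a small step nu.
        j_mu, j_ls = np.argmax(score_mu), np.argmax(score_ls)
        if score_mu[j_mu] >= score_ls[j_ls]:
            beta[j_mu] += nu * b_mu[j_mu]
        else:
            gamma[j_ls] += nu * b_ls[j_ls]

    return beta, gamma
```

Because only the single best-fitting coefficient is updated in each iteration, the number of boosting iterations controls how many input variables enter the model; in practice this stopping point would be chosen by cross-validation or an information criterion rather than fixed in advance.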
