A Bayesian Framework for Multimodel Regression

Timothy DelSole, George Mason University, Fairfax, Virginia, and Center for Ocean–Land–Atmosphere Studies, Calverton, Maryland


Abstract

This paper presents a framework, based on Bayesian regression and constrained least squares methods, for incorporating prior beliefs in a linear regression problem. Prior beliefs are essential in regression theory when the number of predictors is not a small fraction of the sample size, a situation that leads to overfitting, that is, to fitting variability due to sampling errors. Under suitable assumptions, both the Bayesian estimate and the constrained least squares solution reduce to standard ridge regression. New generalizations of ridge regression, based on priors relevant to multimodel combinations, are also presented. In all cases, the strength of the prior is measured by a parameter called the ridge parameter. A “two-deep” cross-validation procedure is used to select the optimal ridge parameter and to estimate the prediction error.
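As a rough illustration of the procedure described above, the following sketch combines standard ridge regression with a nested ("two-deep") leave-one-out cross-validation. It is not taken from the paper: the function names, the candidate grid `lams`, the synthetic data, and the omission of an intercept are all assumptions made for brevity. The outer loop withholds one sample to estimate prediction error, and the inner loop selects the ridge parameter using only the remaining samples, so the withheld sample never influences the choice.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Standard ridge estimate: solve (X^T X + lam I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def loo_error(X, y, lam):
    """Leave-one-out mean squared error for a fixed ridge parameter."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta = ridge_fit(X[keep], y[keep], lam)
        errs[i] = (y[i] - X[i] @ beta) ** 2
    return errs.mean()

def two_deep_cv(X, y, lams):
    """Outer loop: estimate prediction error on the withheld sample.
    Inner loop: pick the ridge parameter from the remaining samples only."""
    n = len(y)
    errs = np.empty(n)
    chosen = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        inner = [loo_error(Xi, yi, lam) for lam in lams]
        lam_best = lams[int(np.argmin(inner))]
        beta = ridge_fit(Xi, yi, lam_best)
        errs[i] = (y[i] - X[i] @ beta) ** 2
        chosen[i] = lam_best
    return errs.mean(), chosen

# Synthetic data (hypothetical): 20 samples, 7 predictors.
rng = np.random.default_rng(0)
n, p = 20, 7
X = rng.standard_normal((n, p))
y = X @ np.full(p, 1.0 / p) + 0.5 * rng.standard_normal(n)
lams = np.array([0.01, 0.1, 1.0, 10.0, 100.0])

mse, chosen = two_deep_cv(X, y, lams)
print("two-deep CV error estimate:", mse)
print("ridge parameters chosen per outer fold:", chosen)
```

The selected ridge parameter can differ from one outer fold to the next, which gives a direct view of how sensitive the selection step is to the sample.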

The proposed regression estimates are tested on the Development of a European Multimodel Ensemble System for Seasonal to Interannual Prediction (DEMETER) hindcasts of seasonal mean 2-m temperature over land. Surprisingly, none of the regression models proposed here consistently beats the skill of a simple multimodel mean, even though one of the regression models recovers the multimodel mean in a suitable limit. This discrepancy arises because the methods used to select the ridge parameter are themselves sensitive to sampling errors. It is plausible that incorporating the prior belief that regression parameters are “large scale” can reduce overfitting and improve performance relative to the multimodel mean. Nevertheless, results from the multimodel mean demonstrate that seasonal mean 2-m temperature is predictable for at least three months in several regions.
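The statement that a regression model can recover the multimodel mean in a suitable limit can be illustrated with a small sketch. The abstract does not give the form of the prior, so the version below is an assumption: a ridge penalty centered on equal weights 1/M rather than on zero, with hypothetical names and synthetic "hindcasts". Under that assumption, a very large ridge parameter forces every weight toward 1/M, and the regression prediction approaches the simple multimodel mean.

```python
import numpy as np

def ridge_toward_mean(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b - w0||^2, where w0 = 1/M for each
    of the M models (an assumed prior, not necessarily the paper's)."""
    m = X.shape[1]
    w0 = np.full(m, 1.0 / m)
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y + lam * w0)

# Synthetic data (hypothetical): 22 years, 7 models sharing a common signal.
rng = np.random.default_rng(1)
n, m = 22, 7
truth = rng.standard_normal(n)
X = truth[:, None] + 0.8 * rng.standard_normal((n, m))  # model hindcasts
y = truth + 0.3 * rng.standard_normal(n)                # observations

# Weights for no prior, a moderate prior, and a very strong prior.
for lam in (0.0, 10.0, 1e8):
    print(lam, np.round(ridge_toward_mean(X, y, lam), 3))

w_large = ridge_toward_mean(X, y, 1e8)
multimodel_mean = X.mean(axis=1)
```

With `lam = 0` the weights are the ordinary least squares solution; as `lam` grows they are pulled toward equal weighting, which is one way to see why the choice of the ridge parameter governs how far the estimate departs from the multimodel mean.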

Corresponding author address: Timothy DelSole, Center for Ocean–Land–Atmosphere Studies, 4041 Powder Mill Rd., Suite 302, Calverton, MD 20705-3106. Email: delsole@cola.iges.org
