Abstract
This paper presents a framework based on Bayesian regression and constrained least squares methods for incorporating prior beliefs in a linear regression problem. Prior beliefs are essential in regression theory when the number of predictors is not a small fraction of the sample size, a situation that leads to overfitting—that is, to fitting variability due to sampling errors. Under suitable assumptions, both the Bayesian estimate and the constrained least squares solution reduce to standard ridge regression. New generalizations of ridge regression based on priors relevant to multimodel combinations also are presented. In all cases, the strength of the prior is measured by a parameter called the ridge parameter. A “two-deep” cross-validation procedure is used to select the optimal ridge parameter and estimate the prediction error.
The proposed regression estimates are tested on the Development of a European Multimodel Ensemble System for Seasonal to Interannual Prediction (DEMETER) hindcasts of seasonal mean 2-m temperature over land. Surprisingly, none of the regression models proposed here can consistently beat the skill of a simple multimodel mean, despite the fact that one of the regression models recovers the multimodel mean in a suitable limit. This discrepancy arises from the fact that methods employed to select the ridge parameter are themselves sensitive to sampling errors. It is plausible that incorporating the prior belief that regression parameters are “large scale” can reduce overfitting and result in improved performance relative to the multimodel mean. Despite this, results from the multimodel mean demonstrate that seasonal mean 2-m temperature is predictable for at least three months in several regions.
Corresponding author address: Timothy DelSole, Center for Ocean–Land–Atmosphere Studies, 4041 Powder Mill Rd., Suite 302, Calverton, MD 20705-3106. Email: delsole@cola.iges.org