Cross-Validation in Statistical Climate Forecast Models

Joel Michaelsen, Department of Geography, University of California, Santa Barbara, CA 93106


Abstract

Cross-validation is a statistical procedure that produces an estimate of forecast skill that is less biased than the usual hindcast skill estimates. The method systematically deletes one or more cases from a dataset, derives a forecast model from the remaining cases, and tests it on the deleted case or cases. The procedure is nonparametric and can be applied to any automated model-building technique. It can also provide important diagnostic information about influential cases in the dataset and about the stability of the model. Two experiments were conducted using cross-validation to estimate forecast skill for different predictive models of North Pacific sea surface temperatures (SSTs). The results indicate that bias, or artificial predictability (defined here as the difference between the usual hindcast skill and the forecast skill estimated by cross-validation), increases with each decision drawn from the data, whether screening potential predictors or fixing the value of a coefficient. Bias introduced by variable screening depends on the size of the pool of potential predictors, while bias produced by fitting coefficients depends on the number of coefficients. The results also indicate that winter SSTs are predictable with a skill of about 20%–25%. Of the several models compared, the more flexible ones, which allow the data to guide variable selection, generally show poorer skill than the relatively inflexible models in which variables are selected a priori. The cross-validation estimates of artificial skill were compared with estimates derived from other methods. Davis and Chelton's method agreed closely with the cross-validation results for the a priori models, but Monte Carlo estimates and cross-validation estimates did not agree well for the predictor screening models. The results indicate that the amount of artificial skill depends on the amount of true skill, so Monte Carlo techniques that assume no true skill cannot be expected to perform well when some true skill is present.
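To make the procedure concrete, the following is a minimal sketch in Python (not from the paper), using synthetic data and ordinary least squares in place of the paper's SST models. It shows how the leave-one-out loop yields a cross-validated skill estimate and how artificial skill is computed as the difference from the usual hindcast skill; all variable names and data here are illustrative assumptions.

```python
# Minimal sketch of leave-one-out cross-validation for a linear regression
# forecast model, illustrating the hindcast vs. cross-validated skill
# comparison described in the abstract. Data and names are synthetic
# placeholders, not the models or datasets used in the paper.
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 3                          # e.g., 40 winters, 3 a priori predictors
X = rng.standard_normal((n, p))       # stand-in for predictor anomalies
beta = np.array([0.5, 0.3, 0.0])      # assumed true coefficients (some true skill)
y = X @ beta + rng.standard_normal(n) # stand-in for winter SST anomalies

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept; returns predictions for X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

# Hindcast skill: fit and evaluate on the full dataset (optimistically biased).
y_hind = fit_predict(X, y, X)
hindcast_skill = 1.0 - np.sum((y - y_hind) ** 2) / np.sum((y - y.mean()) ** 2)

# Cross-validated skill: delete one case at a time, refit on the remaining
# cases, and forecast the deleted case.
y_cv = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    y_cv[i] = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
cv_skill = 1.0 - np.sum((y - y_cv) ** 2) / np.sum((y - y.mean()) ** 2)

# Artificial predictability, as defined in the abstract: the difference between
# the usual hindcast skill and the cross-validation estimate of forecast skill.
artificial_skill = hindcast_skill - cv_skill
print(f"hindcast skill:   {hindcast_skill:.3f}")
print(f"CV skill:         {cv_skill:.3f}")
print(f"artificial skill: {artificial_skill:.3f}")
```

With predictors fixed a priori, as in this sketch, the gap between the two skill estimates is modest; the abstract's experiments indicate that the gap grows when predictors are screened from a large pool or when many coefficients are fitted, since each such data-driven decision adds artificial skill.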
