A Degeneracy in Cross-Validated Skill in Regression-based Forecasts

NWS/NMC/Climate Analysis Center, Washington, D.C.

Abstract

Highly negative skill scores may occur in regression-based experimental forecast trials in which the data being forecast are withheld in turn from a fixed sample and the remaining data are used to develop regression relationships (that is, in exhaustive cross-validation methods). A small negative bias in skill is amplified when forecasts are verified using the correlation between forecasts and actual data. The same outcome occurs when forecasts are amplitude-inflated in conversion to a categorical system and scored in a “number of hits” framework. The effect becomes severe when predictor-predictand relationships are weak, as is often the case in climate prediction. Some basic characteristics of this degeneracy are explored for regression-based cross-validation.
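One way to see the mechanism is the limiting case in which the predictor carries no information, so that each cross-validated regression forecast collapses to the mean of the retained data. The short derivation below is an illustrative sketch under that assumption, with one point withheld at a time; the symbols (y_i for the predictand, f_i for the forecast of the withheld point, n for the sample size) are notation introduced here, not taken from the paper.

```latex
% Assumption: zero regression slope, one point withheld per trial.
\[
  f_i \;=\; \frac{1}{n-1}\sum_{j \ne i} y_j \;=\; \frac{n\,\bar{y} - y_i}{n-1},
  \qquad
  \bar{f} \;=\; \frac{1}{n}\sum_{i=1}^{n} f_i \;=\; \bar{y},
\]
\[
  f_i - \bar{f} \;=\; -\,\frac{y_i - \bar{y}}{n-1}
  \quad\Longrightarrow\quad
  \operatorname{corr}(f, y) \;=\; -1 .
\]
```

The forecast anomalies are smaller than the observed anomalies by a factor of n - 1, yet the correlation, which ignores amplitude, reports a perfectly negative score. This is consistent with the abstract's point that a small negative bias is amplified by correlation-based verification.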

Simulations using both randomized and designed datasets indicate that the correlation skill score degeneracy becomes important when nearly all of the available sample is used to develop forecast equations for the remaining (very few) points, and when the predictability in the full dependent sample falls short of the conventional threshold for statistical significance at that sample size. The undesirable effects can be reduced with one of the following methodological adjustments: 1) excluding more than a very small portion of the sample from the development group for each cross-validation forecast trial or 2) redefining the “total available sample” within one cross-validation exercise. A more complete elimination of the effects is achieved by 1) reducing the magnitude of negative correlation skills in proportion to forecast amplitude, 2) regarding negative correlation skills as zero, or 3) using a forecast verification measure other than correlation, such as root-mean-square error.
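The randomized-data case described above can be sketched with a small simulation. This is a minimal illustration, not the paper's experimental design: the sample size, number of trials, random seed, and all function names are assumptions chosen for the example. It draws unrelated predictor and predictand series, makes leave-one-out regression forecasts, and compares the raw correlation skill with two of the listed treatments (treating negative correlations as zero, and scoring by root-mean-square error).

```python
import numpy as np


def loo_regression_forecasts(x, y):
    """Leave-one-out forecasts from a simple linear regression of y on x."""
    n = len(y)
    forecasts = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # withhold point i from the development sample
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        forecasts[i] = slope * x[i] + intercept
    return forecasts


def loo_mean_forecasts(y):
    """Leave-one-out forecasts that use only the mean of the retained data."""
    n = len(y)
    return (y.sum() - y) / (n - 1)


def correlation_skill(forecasts, obs):
    """Pearson correlation between forecasts and observations."""
    return float(np.corrcoef(forecasts, obs)[0, 1])


def rmse(forecasts, obs):
    """Root-mean-square error of the forecasts."""
    return float(np.sqrt(np.mean((forecasts - obs) ** 2)))


# Hypothetical experiment settings (assumptions for this sketch).
rng = np.random.default_rng(0)
n_points, n_trials = 30, 200

skills = np.empty(n_trials)
errors = np.empty(n_trials)
for t in range(n_trials):
    x = rng.standard_normal(n_points)  # predictor, unrelated to y by construction
    y = rng.standard_normal(n_points)  # predictand
    f = loo_regression_forecasts(x, y)
    skills[t] = correlation_skill(f, y)
    errors[t] = rmse(f, y)

mean_skill = float(skills.mean())
clamped_mean_skill = float(np.maximum(skills, 0.0).mean())  # negative skill set to zero
mean_rmse = float(errors.mean())

# Limiting case with no predictor at all: correlation is -1 up to rounding.
y_demo = rng.standard_normal(n_points)
mean_only_skill = correlation_skill(loo_mean_forecasts(y_demo), y_demo)

print(f"mean cross-validated correlation: {mean_skill:.3f}")
print(f"mean with negatives set to zero:  {clamped_mean_skill:.3f}")
print(f"mean RMSE:                        {mean_rmse:.3f}")
print(f"mean-only forecast correlation:   {mean_only_skill:.3f}")
```

With unrelated series the average raw correlation comes out negative rather than near zero, while the clamped score cannot fall below zero and the RMSE stays close to the predictand's standard deviation, which gives a less misleading picture of a forecast with no real skill.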

When the correlation skill score degeneracy is acknowledged and treated appropriately, cross-validation remains an effective and valid technique for estimating predictive skill for independent data.
