Correspondence among the Correlation, RMSE, and Heidke Forecast Verification Measures; Refinement of the Heidke Score

Anthony G. Barnston, Climate Analysis Center, NMC/NWS/NOAA, Washington, D.C.

Abstract

The correspondence among the following three forecast verification scores, based on forecasts and their associated observations, is described: 1) the correlation score, 2) the root-mean-square error (RMSE) score, and 3) the Heidke score (based on categorical matches between forecasts and observations). These relationships are provided to facilitate comparisons among studies of forecast skill that use these differing measures.

The Heidke score would be more informative, more “honest,” and easier to interpret at face value if the severity of categorical errors (i.e., one-class errors versus two-class errors, etc.) were included in the scoring formula. When categorical error severity is not taken into account, the meaning of Heidke scores depends heavily on the categorical definitions (particularly the number of categories), which makes intercomparison between Heidke and correlation (or RMSE) scores, or even among Heidke scores, quite difficult.

When categorical error severity is taken into account in the Heidke score, its correspondence with other verification measures more closely approximates that of more sophisticated scoring systems such as the experimental LEPS score.
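
For orientation, the three measures named above can be sketched in a few lines of Python. The abstract gives no formulas, so this sketch assumes the standard textbook definitions: the Pearson correlation, the root-mean-square difference, and a Heidke score of the form (hits - expected hits) / (total - expected hits), with the chance expectation taken from the marginal category frequencies. The function names are illustrative, and the paper's own definitions (in particular its severity-weighted refinement, which is not reproduced here) may differ in detail.

```python
import math
from collections import Counter


def correlation(forecasts, observations):
    """Pearson correlation between forecast and observed values."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, observations))
    var_f = sum((f - mean_f) ** 2 for f in forecasts)
    var_o = sum((o - mean_o) ** 2 for o in observations)
    return cov / math.sqrt(var_f * var_o)


def rmse(forecasts, observations):
    """Root-mean-square error of the forecasts."""
    n = len(forecasts)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)


def heidke(forecast_cats, observed_cats):
    """Unweighted Heidke score: every categorical miss counts the same.

    Assumption: the number of hits expected by chance is computed from
    the marginal frequencies of the forecast and observed categories.
    The score is undefined (division by zero) if that expectation
    equals the sample size.
    """
    n = len(forecast_cats)
    hits = sum(1 for f, o in zip(forecast_cats, observed_cats) if f == o)
    f_counts = Counter(forecast_cats)
    o_counts = Counter(observed_cats)
    expected = sum(f_counts[c] * o_counts[c] for c in f_counts) / n
    return (hits - expected) / (n - expected)


# Three categories (0 = below, 1 = near, 2 = above normal), six forecasts.
score = heidke([0, 1, 2, 0, 1, 2], [0, 1, 2, 1, 2, 0])
```

In this small example there are three hits out of six forecasts and two hits expected by chance, so the score is (3 - 2) / (6 - 2) = 0.25. Note that a one-class miss and a two-class miss contribute identically here, which is the limitation the abstract points to.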