The Evaluation of Yes/No Forecasts for Scientific and Administrative Purposes

Frank Woodcock Bureau of Meteorology, Melbourne, Victoria, Australia


Abstract

The basis upon which skill scores for evaluating yes/no categorical forecasts for scientific and administrative purposes depend is discussed, and many of the common discriminants (formulas from which skill scores are derived) are reviewed and compared. The common process of subjecting forecasts to a trial consisting of a mixture of event and non-event occasions is outlined.

Those discriminants which prove to be measures of a forecasting technique's skill are shown, with the exception of Hanssen and Kuipers' (1965) discriminant, to give skill scores which depend upon the mixture of events and non-events in the trial. These discriminants give incompatible rankings of forecasts because they are based on different standards of skill. It is shown that this discrepancy is resolved by ensuring that the trials under which forecasts are compared have equal numbers of event and non-event occasions; under these conditions, rankings become compatible. Hanssen and Kuipers' discriminant is shown to give, on an "unequal" trial, the best estimate of the score expected if equalization were enforced. Hence, it is argued that Hanssen and Kuipers' discriminant is universally acceptable for evaluating yes/no forecasts for scientific and administrative purposes. Finally, the variance of Hanssen and Kuipers' discriminant is given to enable the statistical significance of the difference between two scores to be assessed and thereby make comparisons between techniques more meaningful.
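Hanssen and Kuipers' discriminant (also known as the Peirce skill score or true skill statistic) is computed from a 2×2 contingency table of hits, false alarms, misses, and correct negatives. A minimal sketch of the calculation, with illustrative variable names not taken from the paper:

```python
# Hedged sketch: Hanssen and Kuipers' discriminant from a 2x2 contingency
# table. The function name and argument names are illustrative assumptions.

def hanssen_kuipers(hits, false_alarms, misses, correct_negatives):
    """Return the HK discriminant: hit rate minus false-alarm rate.

    HK = hits/(hits + misses) - false_alarms/(false_alarms + correct_negatives)

    It ranges from -1 to 1, with 0 indicating no skill relative to chance.
    Because the hit rate is conditioned on events and the false-alarm rate
    on non-events, the score does not depend on the event/non-event mix
    in the trial.
    """
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    return hit_rate - false_alarm_rate

# Example: 30 hits and 10 misses on 40 events; 20 false alarms and
# 70 correct negatives on 90 non-events.
score = hanssen_kuipers(hits=30, false_alarms=20, misses=10, correct_negatives=70)
print(round(score, 3))  # 0.75 - 0.222 = 0.528
```

Note that doubling the number of non-event occasions (scaling both false alarms and correct negatives) leaves the score unchanged, which is the independence from the trial mixture claimed above.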
