Abstract
The concept of sufficiency, originally introduced in the context of the comparison of statistical experiments, has recently been shown to provide a coherent basis for comparative evaluation of forecasting systems. Specifically, forecasting system A is said to be sufficient for forecasting system B if B's forecasts can be obtained from A's forecasts by a stochastic transformation. The sufficiency of A's forecasts for B's forecasts implies that the former are of higher quality than the latter and that all users will find A's forecasts of greater value than B's forecasts. However, it is not always possible to establish that system A is sufficient for system B or vice versa. This paper examines the concept of sufficiency in the context of comparative evaluation of simple probabilistic weather forecasting systems and investigates its interpretations and implications from perspectives provided by a recently developed general framework for forecast verification.
It is shown here that if system A is sufficient for system B, then the basic performance characteristics of the two systems are related via sets of inequalities and A's forecasts are necessarily more accurate than B's forecasts. Conversely, knowledge of a complete set of performance characteristics makes it possible to infer whether A is sufficient for B, B is sufficient for A, or the two systems are insufficient for each other. In general, however, information regarding only relative accuracy, as measured by a performance measure such as the mean square error, will not be adequate to determine the presence or absence of sufficiency, except in situations in which the accuracy of the system of interest exceeds some relatively high critical value. These results, illustrated by means of numerical examples, suggest that comparative evaluation of weather forecasting systems should be based on fundamental performance characteristics rather than on overall performance measures.
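The relationship between sufficiency and accuracy can be illustrated with a small sketch. It assumes a binary event (x = 1 or 0), a system A whose probability forecasts are reliable, and a system B obtained from A by a stochastic transformation. B's forecast values are set to the event frequencies that its categories imply, so B is reliable by construction. All numbers and function names (`transform_joint`, `reliable_values`, `mse`, and so on) are hypothetical and chosen only for illustration. This is not the paper's procedure. It only shows that, under these assumptions, A's mean square error is no larger than B's. It does not show the converse limitation discussed above, where accuracy alone fails to reveal sufficiency.

```python
# Illustrative sketch only: binary event x in {0, 1}; hypothetical numbers and helper names.


def is_stochastic(T, tol=1e-12):
    """True if every row of T is a probability distribution (non-negative, sums to 1)."""
    return all(
        all(t >= 0.0 for t in row) and abs(sum(row) - 1.0) < tol
        for row in T
    )


def transform_joint(joint_a, T):
    """Push A's joint distribution of forecasts and events through the stochastic matrix T.

    joint_a[i] = (P(f_i, x=1), P(f_i, x=0))
    T[i][j]    = P(B issues forecast category j | A issues forecast f_i)
    Returns joint_b[j] = [P(g_j, x=1), P(g_j, x=0)].
    """
    joint_b = [[0.0, 0.0] for _ in range(len(T[0]))]
    for i, (p1, p0) in enumerate(joint_a):
        for j, t in enumerate(T[i]):
            joint_b[j][0] += p1 * t
            joint_b[j][1] += p0 * t
    return joint_b


def reliable_values(joint):
    """Assumption: each of B's forecast values equals the observed relative frequency of x = 1."""
    return [p1 / (p1 + p0) if p1 + p0 > 0 else 0.0 for p1, p0 in joint]


def mse(values, joint):
    """Mean square error (Brier score) of probability forecasts of a binary event."""
    return sum(p1 * (1.0 - f) ** 2 + p0 * f ** 2 for f, (p1, p0) in zip(values, joint))


# System A: forecasts 0.2 or 0.8, each issued half the time, and reliable.
f_a = [0.2, 0.8]
joint_a = [(0.5 * f, 0.5 * (1.0 - f)) for f in f_a]

# Hypothetical stochastic transformation that produces system B from system A.
T = [[0.75, 0.25],
     [0.25, 0.75]]
assert is_stochastic(T)

joint_b = transform_joint(joint_a, T)
f_b = reliable_values(joint_b)

mse_a = mse(f_a, joint_a)
mse_b = mse(f_b, joint_b)
print(f"A: MSE = {mse_a:.4f}")
print(f"B: forecasts = {[round(v, 2) for v in f_b]}, MSE = {mse_b:.4f}")
```

With these numbers, B issues forecasts of 0.35 and 0.65. Its mean square error is 0.2275, compared with 0.16 for A. Because B is only a randomized relabelling of A's forecasts, it cannot discriminate between occurrences and non-occurrences any better than A can.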
Possible extensions of these results to situations involving more general forecasting systems, as well as some implications of the results for verification procedures and practices in meteorology, are briefly discussed.