In a recent paper, Krzysztofowicz (1999, hereafter K99) analyzed relations between the probability π of occurrence of rain within a period T and the marginal probabilities π1 and π2 of rain within nonoverlapping subperiods T1 and T2 such that T1 ∪ T2 = T. The most nontrivial part of the study is devoted to estimating π in the case when π1 and π2 are known from a forecast, and two approaches were put forward for this purpose. At the heart of both is the estimation of the stochastic dependence between rains within the subperiods.
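To make explicit why that dependence is the crux, let p11 denote the joint probability of rain within both T1 and T2 (anticipating the notation pij used below, with index 1 standing for rain, which is our labeling convention, not K99's). Since rain occurs within T = T1 ∪ T2 exactly when it occurs within T1 or within T2, inclusion–exclusion gives

  π = π1 + π2 − p11.

The marginals alone thus only bound π, namely max(π1, π2) ≤ π ≤ min(1, π1 + π2); the missing ingredient is precisely the joint probability p11, that is, the stochastic dependence between the subperiods.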
Of course, the probability of rain within T conditional on any event of nonzero probability may be calculated. However, unless we know that this event has actually occurred, any inference drawn from that conditional probability has nothing in common with Bayesian analysis, which is based on what actually did happen, not on what might have happened but did not (Jaynes 1986). Using another partitioning of Δ and such an inference procedure, one can obtain any point within Δ as an estimate of π.
When only partial information about the probability distribution of the elementary events is given, a natural, and probably the only possible, way of reconstructing the whole statistics is to invoke prior knowledge. In this sense, the second approach seems preferable because it mobilizes more information about the phenomenon of interest. However, following this scheme, one cannot use the whole dataset but only those meteorological observations for which a prediction has been made. Further, even working with this reduced dataset, the forecaster does not extract all the information contained in the observations but only the correlation coefficient. Though the data provide us with the sampling probabilities μij of occurrence of rain and no rain within T1 and T2, the proposed approach cannot incorporate these data and simply rejects them.
The probabilities provided by historical records and by forecasting are not, and must not be, identical; otherwise there would be no need for any weather prediction. Thus, combining these different types of information into a single solution is a very delicate task, and setting characteristics of the forecast statistics equal to those derived from observations is at least questionable. A point that was not recognized in K99 is the fundamental difference between the probabilities μij and pij. The former express our a priori knowledge, which comes before forecasting. The objective of weather prediction is to update that a priori information. Consequently, no property of μij can be expected, in general, to survive that update.
A procedure that enables one to take into account, in a consistent way, the total information contained in the meteorological data and the forecast, and to solve much more general problems, is the principle of maximum entropy (PME). Among comparable approaches, it probably has the longest and most spectacular history. The concept of entropy originated about a century ago within the context of a fundamental but specific physical problem; it has since found widespread application in dynamical systems theory, ergodic theory, inverse problems, communication theory, numerical analysis, the theory of functions, decision theory, and so on, that is, in problems straying far from classical thermodynamics. Probably the application closest to weather forecasting was recently presented by Urban (1996).
The heart of the PME is choosing those pk's that maximize (7) subject to whatever constraints we wish them to satisfy. This approach provides us with a probability distribution that is as uncertain as the imposed constraints allow. In other words, the maximum entropy distribution is free from any information beyond that which we use as input. It may be helpful to point out that classical equilibrium statistical mechanics can be deduced in its entirety from the PME if Liouville's measure is chosen as the prior probability distribution over the phase space. On the other hand, failures in certain predictions pointed not to a failure of the PME but, as was realized later, to an inadequate prescription of the prior. The rise of quantum theory made it possible to resolve those difficulties by assigning a new prior over a deeper hypothesis space (Jaynes 1957).
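For concreteness, and only as an assumption since Eq. (7) itself lies outside this excerpt, suppose (7) is the Shannon (1948) entropy H = −Σk pk ln pk. With the normalization Σk pk = 1 and linear constraints Σk pk fm(k) = Fm, m = 1, ..., M, the Lagrange-multiplier solution is the familiar exponential family

  pk = Z−1 exp[−Σm λm fm(k)],  Z = Σk exp[−Σm λm fm(k)],

with the multipliers λm fixed by the constraints. When a nonuniform prior μk is available, the same program applies to the relative entropy −Σk pk ln(pk/μk) (Kullback 1959), yielding pk ∝ μk exp[−Σm λm fm(k)], which reduces to the previous form for a uniform prior.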
A final point that seems worth noting is the meaning of the probabilities pij. These are better thought of not as relative frequencies of rain or no rain over a particular period but as the forecaster's degree of belief in those events. In this sense, the pij are "subjective" quantities rather than attributes of underlying stochastic atmospheric processes. However, the pij calculated by means of the PME are the most "objective" choice available, since they contain only the information we actually have and nothing more. Any other choice of pij rests on an extremely insecure foundation built from assumptions about the probability distribution that were not drawn from the data. On the other hand, if we obtain experimental proof that our predictions based on the PME are incorrect, this will be evidence that the inputs (πi, μij) are wrong. Modern physics has evolved in just this way.
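As an illustration of how the forecast marginals πi and the climatological joint frequencies μij could be fused under the PME, the following minimal sketch (ours, not K99's procedure; the function name, the index convention 1 = rain, and the input numbers are all hypothetical) finds the distribution pij closest to the prior μij in the Kullback (1959) sense among all distributions with the prescribed marginals. The multiplicative form pij ∝ μij ai bj implied by the PME is computed here by iterative proportional fitting:

import numpy as np

def max_entropy_joint(pi1, pi2, mu, tol=1e-12, max_iter=1000):
    """Minimum cross-entropy estimate of the joint rain/no-rain
    distribution p[i, j] over subperiods (T1, T2), given forecast
    marginals pi1, pi2 and a prior mu (index 1 = rain, a convention
    of this sketch).  The multiplicative solution p ~ mu * a_i * b_j
    implied by the PME is found by iterative proportional fitting."""
    row = np.array([1.0 - pi1, pi1])             # target marginals for T1
    col = np.array([1.0 - pi2, pi2])             # target marginals for T2
    p = np.asarray(mu, dtype=float) / np.sum(mu)
    for _ in range(max_iter):
        p = p * (row / p.sum(axis=1))[:, None]   # enforce T1 marginals
        p = p * (col / p.sum(axis=0))[None, :]   # enforce T2 marginals
        if np.abs(p.sum(axis=1) - row).max() < tol:
            return p
    return p

# Hypothetical inputs: climatological joint frequencies mu and a
# forecast (pi1, pi2) = (0.7, 0.4).
mu = np.array([[0.55, 0.10],
               [0.15, 0.20]])                    # mu[i, j], 1 = rain
p = max_entropy_joint(0.7, 0.4, mu)
print(p)                                         # forecast-consistent p_ij
print(1.0 - p[0, 0])                             # P(rain within T)

With a uniform prior the same routine returns the independent table, p11 = π1π2, that is, exactly what one obtains by discarding the climatological record; any dependence in the resulting pij is therefore inherited from μij and from nothing else.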
REFERENCES
Jaynes, E. T., 1957: Information theory and statistical mechanics. Phys. Rev., 106, 620–630.
——, 1986: Bayesian methods: General background. Maximum Entropy and Bayesian Methods in Applied Statistics, J. H. Justice, Ed., Cambridge University Press, 1–25.
Kullback, S., 1959: Information Theory and Statistics. John Wiley and Sons, 399 pp.
Krzysztofowicz, R., 1999: Probability for a period and its subperiods: Theoretical relations for forecasting. Mon. Wea. Rev., 127, 228–235.
Shannon, C. E., 1948: A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423, 623–656.
Urban, B., 1996: Retrieval of atmospheric thermodynamical parameters using satellite measurements with a maximum entropy method. Inverse Problems, 12, 779–796.