In a recent paper, Krzysztofowicz (1999, hereafter K99) analyzed relations between the probability *π* of occurrence of rain within a period *T* and the marginal probabilities *π*_{1} and *π*_{2} of rain within nonoverlapping subperiods *T*_{1} and *T*_{2} such that *T*_{1} ∪ *T*_{2} = *T.* The most nontrivial part of the study is devoted to estimating *π* when *π*_{1} and *π*_{2} are known from a forecast, and two approaches were put forward. The heart of both is estimating the stochastic dependence between rains within the subperiods.

In the first approach, the only information employed consists of the forecast values *π*_{1} and *π*_{2}. These bound the unknown *π* to the interval Δ defined by max(*π*_{1}, *π*_{2}) ≤ *π* ≤ min(1, *π*_{1} + *π*_{2}). If no other information on the unknown *π* is available, it is natural to consider all values within Δ as equally probable. Since this homogeneous distribution does not point to the most probable value of *π,* it seems that the best estimate one can obtain is the mean. However, K99 makes another inference. Having the probability distribution of *π,* one can calculate the probabilities of the sign of stochastic dependence, *s:* the dependence is positive if *π* < *π*_{1} + *π*_{2} − *π*_{1}*π*_{2} and negative if *π* exceeds this value. K99 proposes to choose the most probable sign and to consider the distribution of *π* conditional on this choice. The mean *π** with respect to this conditional distribution is proposed as an estimate of *π.* This estimator is claimed to be justified by a principle of Bayesian inference.
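For concreteness, the two estimates discussed above can be sketched numerically. This is an illustration, not code from K99; it assumes only that Δ = [max(*π*_{1}, *π*_{2}), min(1, *π*_{1} + *π*_{2})] and that *π*_{1} + *π*_{2} − *π*_{1}*π*_{2}, the value corresponding to independence, separates positive from negative dependence:

```python
# Sketch (not from K99): the interval Delta of admissible values of pi given
# only the subperiod probabilities pi1 and pi2, the mean of a uniform
# distribution over Delta, and the independence value pi1 + pi2 - pi1*pi2
# that separates positive from negative dependence.

def delta_interval(pi1, pi2):
    """Bounds on pi = P(rain within T) implied by probability theory alone."""
    lower = max(pi1, pi2)        # rain in T is at least as likely as in either subperiod
    upper = min(1.0, pi1 + pi2)  # union bound, capped at certainty
    return lower, upper

def uniform_mean_estimate(pi1, pi2):
    """Mean of pi under a uniform distribution over Delta."""
    lower, upper = delta_interval(pi1, pi2)
    return 0.5 * (lower + upper)

pi1, pi2 = 0.4, 0.3
lower, upper = delta_interval(pi1, pi2)  # Delta = [0.4, 0.7]
independence = pi1 + pi2 - pi1 * pi2     # 0.58
# Under a uniform distribution on Delta, the event pi < 0.58 (positive
# dependence) has probability (0.58 - 0.4)/0.3 = 0.6, so the sign-conditioning
# rule selects positive dependence and then averages pi over (0.4, 0.58),
# giving 0.49, whereas the plain uniform mean over Delta is 0.55.
k99_estimate = 0.5 * (lower + independence)
```

The point of the comparison is that conditioning on the more probable sign shifts the estimate away from the uniform mean, which is the issue criticized in the next paragraph.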

Of course, the probability of *π* conditional on any event of nonzero probability may be calculated. However, unless we have knowledge that this event has occurred, any inference drawn from that conditional probability has nothing in common with Bayesian analysis, which is based on what actually did happen, not on what might have happened but did not (Jaynes 1986). Using another partitioning of Δ and such an inference procedure, one can get any point within Δ as an estimate of *π.*

In the second approach, additional information on the statistical dependence between rains within *T*_{1} and *T*_{2} was employed. It is convenient for further discussion to introduce the following notation. Let *p*_{11}, *p*_{22}, *p*_{12}, and *p*_{21} be the probabilities of the four elementary events: nonoccurrence and occurrence of rain within the whole period *T,* rain at *T*_{1} and no rain at *T*_{2}, and, at last, no rain at *T*_{1} and rain at *T*_{2}, correspondingly. Then

*π*_{1} = *p*_{12} + *p*_{22},  (3)

*π*_{2} = *p*_{21} + *p*_{22}.  (4)

K99 proposed to calculate the Pearson correlation coefficient *α*(*π*_{1}, *π*_{2}) between rains within the subperiods from historical data and observations of the forecast probabilities *π*_{i} and to choose *π* in such a way that the resulting distribution of *p*_{ij} describes the same correlation.

When only partial information about the probability distribution of the elementary events is given, a natural and probably the only possible way of reconstructing the whole statistics is to invoke prior knowledge. In this sense, the second approach seems preferable because it mobilizes more information about the phenomenon of interest. However, following this scheme, one cannot use the whole dataset but only those meteorological observations for which a prediction has been made. Further, even working with this reduced dataset, the forecaster does not extract all the information contained in the observations but only the correlation coefficient. Though the data provide us with the sampling probabilities *μ*_{ij} of occurrence of rain and no rain within *T*_{1} and *T*_{2}, the approach proposed cannot accommodate these data and simply rejects them.

The probabilities provided by historical records and by forecasting are not, and must not be, identical; otherwise there would be no need for any weather prediction. Thus, combining these different types of information in a unique solution is a very delicate task, and setting characteristics of the forecast statistics equal to those derived from observations is at least questionable. A point that was not recognized in K99 is the fundamental difference between the probabilities *μ*_{ij} and *p*_{ij}. The former express our a priori knowledge, which comes before forecasting. The objective of weather prediction is to update that a priori information. Consequently, no property of *μ*_{ij} can be expected to survive in general.

A procedure that enables one to take into account, in a consistent way, the total information contained in the meteorological data and the forecast, and to solve much more general problems, is the principle of maximum entropy (PME). Among other approaches, it has probably the longest and most spectacular history. The concept of entropy originated about a century ago within the context of a fundamental but specific physical problem; it has since found widespread application in dynamical systems theory, ergodic theory, inverse problems, communication theory, numerical analysis, the theory of functions, decision theory, and so on; that is, in problems straying far from classical thermodynamics. Probably the application closest to weather forecasting was recently presented by Urban (1996).

Shannon (1948) showed that the only measure of uncertainty associated with a discrete probability distribution *p*_{k}, *k* = 1, . . . , *n*, is proportional to the Gibbs entropy

*H* = −Σ_{k} *p*_{k} ln *p*_{k},  (6)

if that measure is (i) a continuous function of the probabilities, (ii) an increasing function of the number of possible events in the case when they are equally probable, and (iii) consistent, that is, if there is more than one way of evaluating its value, every possible way yields the same result.

This result was generalized by Kullback (1959), who introduced the cross-entropy

*H* = −Σ_{k} *p*_{k} ln(*p*_{k}/*μ*_{k}),  (7)

where *μ*_{k} is another set of probabilities. The cross-entropy (7) measures the relative uncertainty in *p*_{k} with respect to *μ*_{k} and coincides with (6), up to an additive constant, for the uniform distribution *μ*_{k}.
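As a quick numerical check of the relation between the two measures, the following sketch (function names are illustrative, not from the papers cited) evaluates both and verifies that, for a uniform prior over *n* events, the cross-entropy equals the Gibbs entropy minus the constant ln *n*:

```python
import math

# Sketch: the Gibbs entropy (a la Shannon) and the cross-entropy (a la
# Kullback). For a uniform prior mu_k = 1/n the cross-entropy equals the
# Gibbs entropy minus the constant ln(n).

def gibbs_entropy(p):
    """H = -sum_k p_k ln p_k for a discrete distribution p."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)

def cross_entropy(p, mu):
    """H = -sum_k p_k ln(p_k / mu_k); maximal (zero) when p coincides with mu."""
    return -sum(pk * math.log(pk / mk) for pk, mk in zip(p, mu) if pk > 0.0)

p = [0.5, 0.3, 0.2]
uniform = [1.0 / 3.0] * 3
diff = gibbs_entropy(p) - cross_entropy(p, uniform)  # equals ln(3)
```

Note that cross_entropy(p, mu) is never positive and vanishes only at p = mu, which is why maximizing it pulls the solution toward the prior as far as the constraints allow.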

The heart of the PME is choosing those *p*_{k}’s that maximize (7) subject to certain constraints we wish them to satisfy. This approach provides us with a probability distribution that is as uncertain as the imposed constraints allow. In other words, the maximum entropy distribution is free from any information additional to that which we use as input. It might be helpful to point out that classical equilibrium statistical mechanics is completely deduced from the PME if Liouville’s measure is chosen as the prior probability distribution over the phase space. On the other hand, failures in certain predictions pointed not to a failure of the PME but, as was realized later, to an inadequate prescription of the prior. The rise of quantum theory made it possible to resolve those difficulties by assigning a new prior over a deeper hypothesis space (Jaynes 1957).

To estimate *π,* the forecaster maximizes (7) subject to (3), (4), and normalizability,

*p*_{11} + *p*_{12} + *p*_{21} + *p*_{22} = 1.  (8)

Here *μ*_{ij} are the prior probabilities derived from meteorological records. Since the *p*_{ij} appear in (7) as arguments of the logarithmic function, nonnegativity of all the *p*_{ij} is guaranteed.
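The structure of the constraint set can be illustrated with a short sketch. It assumes the marginal relations *π*_{1} = *p*_{12} + *p*_{22} and *π*_{2} = *p*_{21} + *p*_{22} together with normalization, which leave a single free parameter *π,* and shows that nonnegativity of the four probabilities holds exactly when *π* lies in Δ:

```python
def joint_from_pi(pi, pi1, pi2):
    """Express the four elementary probabilities through pi = P(rain within T),
    assuming the marginals pi1 = p12 + p22 and pi2 = p21 + p22 and sum(p) = 1."""
    p11 = 1.0 - pi        # no rain in either subperiod
    p12 = pi - pi2        # rain in T1 only
    p21 = pi - pi1        # rain in T2 only
    p22 = pi1 + pi2 - pi  # rain in both subperiods
    return p11, p12, p21, p22

pi1, pi2 = 0.4, 0.3
# Inside Delta = [max(pi1, pi2), min(1, pi1 + pi2)] all four are nonnegative:
p = joint_from_pi(0.55, pi1, pi2)  # roughly (0.45, 0.25, 0.15, 0.15)
assert abs(sum(p) - 1.0) < 1e-12
# Outside Delta the parameterization turns negative, e.g. pi = 0.8 > pi1 + pi2:
assert min(joint_from_pi(0.8, pi1, pi2)) < 0.0
```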

Solving the constraints, the *p*_{ij} may be expressed in terms of *π*:

*p*_{11} = 1 − *π,*  *p*_{12} = *π* − *π*_{2},  *p*_{21} = *π* − *π*_{1},  *p*_{22} = *π*_{1} + *π*_{2} − *π,*  (9)

and (7) becomes

*H*(*π*) = −(1 − *π*) ln[(1 − *π*)/*μ*_{11}] − (*π* − *π*_{2}) ln[(*π* − *π*_{2})/*μ*_{12}] − (*π* − *π*_{1}) ln[(*π* − *π*_{1})/*μ*_{21}] − (*π*_{1} + *π*_{2} − *π*) ln[(*π*_{1} + *π*_{2} − *π*)/*μ*_{22}].  (10)

Then, if there is an extreme point of *H*(*π*) inside Δ, it is a solution to the equation

*dH*/*dπ* = ln{(1 − *π*)(*π*_{1} + *π*_{2} − *π*)*μ*_{12}*μ*_{21}/[(*π* − *π*_{1})(*π* − *π*_{2})*μ*_{11}*μ*_{22}]} = 0.  (11)

Equation (11) is reduced to a quadratic equation, and the final expression for *π* depends on the particular values of the input parameters (*π*_{i}, *μ*_{ik}). After some algebra, one arrives at the following solution. At first, it can be seen that (11) has no solution if *D* < 0, where

*D* = [*μ*_{12}*μ*_{21} + *d*(*π*_{1} + *π*_{2})]^{2} − 4*dπ*_{1}*π*_{2}*μ*_{11}*μ*_{22} and *d* = *μ*_{11}*μ*_{22} − *μ*_{12}*μ*_{21}.

In this case, (*dH*/*dπ*) < 0 and the maximum value of *H* is reached when *π* = max(*π*_{1}, *π*_{2}). If *d* = 0, then (11) has a unique solution:

*π* = *π*_{1} + *π*_{2} − *π*_{1}*π*_{2}.

If *D* ≥ 0 and *d* ≠ 0, Eq. (11) has two solutions and only one of them,

*π* = [*d*(*π*_{1} + *π*_{2}) − *μ*_{12}*μ*_{21} + *D*^{1/2}]/(2*d*),  (16)

may in principle belong to Δ. Whether this is the case or not depends on the particular values of *μ*_{ij}. If the right-hand side of (16) is greater (less) than the upper (lower) bound of Δ, then (*dH*/*dπ*) > 0 when *d* > 0 (*d* < 0). Consequently, *H* is maximum at *π* = min(1, *π*_{1} + *π*_{2}). If *d* > 0 (*d* < 0) and the right-hand side of (16) is less (greater) than the lower (upper) bound of Δ, then (*dH*/*dπ*) < 0 and *H* is maximum at *π* = max(*π*_{1}, *π*_{2}).
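The solution procedure can be tested numerically. The sketch below assumes the parameterization *p*_{11} = 1 − *π,* *p*_{12} = *π* − *π*_{2}, *p*_{21} = *π* − *π*_{1}, *p*_{22} = *π*_{1} + *π*_{2} − *π,* uses an arbitrary hypothetical prior *μ*_{ij}, and compares a closed-form candidate root of the stationarity condition with a brute-force grid search over Δ:

```python
import math

# Sketch: find pi maximizing H(pi) = -sum p_ij ln(p_ij / mu_ij) with the p_ij
# expressed through pi, and compare the analytic candidate root of the
# resulting quadratic with a grid search. The prior mu = (mu11, mu12, mu21,
# mu22) and the forecast (pi1, pi2) below are arbitrary test values.

def entropy(pi, pi1, pi2, mu):
    """Cross-entropy H(pi) of the joint distribution relative to the prior mu."""
    p = (1.0 - pi, pi - pi2, pi - pi1, pi1 + pi2 - pi)
    return -sum(pk * math.log(pk / mk) for pk, mk in zip(p, mu) if pk > 0.0)

def analytic_pi(pi1, pi2, mu):
    """Candidate maximizer of H: the admissible root of dH/dpi = 0, clipped to Delta."""
    mu11, mu12, mu21, mu22 = mu
    d = mu11 * mu22 - mu12 * mu21
    if d == 0.0:
        root = pi1 + pi2 - pi1 * pi2  # degenerate case: the independence value
    else:
        D = (mu12 * mu21 + d * (pi1 + pi2)) ** 2 - 4.0 * d * pi1 * pi2 * mu11 * mu22
        root = (d * (pi1 + pi2) - mu12 * mu21 + math.sqrt(D)) / (2.0 * d)
    # Clip to Delta when the stationary point falls outside it.
    return min(max(root, max(pi1, pi2)), min(1.0, pi1 + pi2))

pi1, pi2 = 0.4, 0.3
mu = (0.5, 0.2, 0.1, 0.2)  # hypothetical climatological prior
lo, up = max(pi1, pi2), min(1.0, pi1 + pi2)
grid = [lo + k * (up - lo) / 10000 for k in range(1, 10000)]
best = max(grid, key=lambda x: entropy(x, pi1, pi2, mu))
assert abs(best - analytic_pi(pi1, pi2, mu)) < 1e-3
```

For these test values the stationary point is *π* = 0.5, which one can verify directly: (1 − 0.5)(0.7 − 0.5)·0.2·0.1 = (0.5 − 0.4)(0.5 − 0.3)·0.5·0.2 = 0.002, so both sides of the stationarity condition balance.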

A final point that seems worth noting is the meaning of the probabilities *p*_{ij}. These are better thought of not as relative frequencies of rain or no rain within a particular period but as the forecaster’s degree of belief in those events. In this sense, the *p*_{ij} are “subjective” quantities rather than attributes of underlying stochastic atmospheric processes. However, the *p*_{ij} calculated by means of the PME are the most “objective” ones, since they contain only the information we do have and nothing more. Any other choice of *p*_{ij} rests on an extremely insecure foundation built with the use of assumptions about the probability distribution that were not drawn from the data. On the other hand, if we obtain experimental proof that our predictions based on the PME are incorrect, it will give evidence that the inputs (*π*_{i}, *μ*_{ik}) are wrong. Modern physics has been evolving in just this way.

## REFERENCES

Jaynes, E. T., 1957: Information theory and statistical mechanics. *Phys. Rev.,* **106,** 620–630.

——, 1986: Bayesian methods: General background. *Maximum Entropy and Bayesian Methods in Applied Statistics,* J. H. Justice, Ed., Cambridge University Press, 1–25.

Krzysztofowicz, R., 1999: Probability for a period and its subperiods: Theoretical relations for forecasting. *Mon. Wea. Rev.,* **127,** 228–235.

Kullback, S., 1959: *Information Theory and Statistics.* John Wiley and Sons, 399 pp.

Shannon, C. E., 1948: A mathematical theory of communication. *Bell Syst. Tech. J.,* **27,** 397–423, 623–656.

Urban, B., 1996: Retrieval of atmospheric thermodynamical parameters using satellite measurements with a maximum entropy method. *Inverse Problems,* **12,** 779–796.


* Alfred Wegener Institute for Polar and Marine Research Contribution Number 1794.