
Comments on “Probability for a Period and Its Subperiods: Theoretical Relations for Forecasting”

1 Alfred Wegener Institute for Polar and Marine Research, Bremerhaven, Germany, and St. Petersburg Branch, Shirshov Institute of Oceanology, St. Petersburg, Russia

Corresponding author address: Dr. Gennady A. Kivman, Alfred-Wegener-Institut, Columbusstrasse, Postfach 12 01 61, 27515 Bremerhaven, Germany.

Email: gkivman@awi-bremerhaven.de


In a recent paper, Krzysztofowicz (1999, hereafter K99) analyzed relations between the probability π of occurrence of rain within a period T and the marginal probabilities π1 and π2 of rain within nonoverlapping subperiods T1 and T2 such that T1 ∪ T2 = T. The most nontrivial part of the study is devoted to estimating π when π1 and π2 are known from a forecast, and two approaches are put forward. At the heart of both is the estimation of the stochastic dependence between rain occurrences within the subperiods.

As K99 pointed out, probability theory imposes a restriction,
max(π1, π2) ≤ π ≤ min(π1 + π2, 1),  (1)
on the range Δ of possible values of π. If no other information on the unknown π is available, it is natural to consider all values within Δ as equally probable. Since this homogeneous distribution does not point to a most probable value of π, it would seem that the best estimate one can obtain is the mean. However, K99 makes another inference. Given the probability distribution of π, one can calculate the probabilities of the sign of the stochastic dependence,
s = sgn(π1 + π2 − π1π2 − π)  (2)
between rains within the subperiods and compare them. It is then suggested to choose the most likely sign and to apply Bayes’ theorem to obtain the distribution of π conditional on this choice. The mean π* with respect to this conditional distribution is proposed as an estimate of π. This estimator is claimed to be justified by a principle of Bayesian inference.
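As an illustration of the procedure just described, the following sketch computes the admissible interval Δ, the probabilities of each sign of dependence when π is uniform on Δ, and the resulting conditional-mean estimate. The forecast values pi1, pi2 are made up for illustration and are not taken from K99.

```python
# Sketch of the sign-of-dependence procedure described above.
# pi1, pi2 are hypothetical forecast probabilities (not from K99).

def admissible_interval(pi1, pi2):
    """Bounds on pi implied by probability theory, eq. (1)."""
    return max(pi1, pi2), min(pi1 + pi2, 1.0)

def sign_probabilities(pi1, pi2):
    """P(s > 0) and P(s < 0) for pi uniform on Delta; s > 0 iff
    pi < pi1 + pi2 - pi1*pi2 (the independence point, where s = 0)."""
    lo, hi = admissible_interval(pi1, pi2)
    thr = pi1 + pi2 - pi1 * pi2
    p_pos = min(max((thr - lo) / (hi - lo), 0.0), 1.0)
    return p_pos, 1.0 - p_pos

def conditional_mean(pi1, pi2):
    """Mean of pi conditional on the more probable sign."""
    lo, hi = admissible_interval(pi1, pi2)
    thr = pi1 + pi2 - pi1 * pi2
    p_pos, p_neg = sign_probabilities(pi1, pi2)
    return (lo + thr) / 2 if p_pos >= p_neg else (thr + hi) / 2

pi1, pi2 = 0.3, 0.6
print(admissible_interval(pi1, pi2))  # bounds of Delta
print(sign_probabilities(pi1, pi2))   # negative sign is more likely here
print(conditional_mean(pi1, pi2))     # differs from the unconditional mean 0.75
```

This makes the objection in the text concrete: the estimate depends on which sign happens to be more probable, not on anything that was actually observed.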

Of course, the probability of π conditional on any event of nonzero probability may be calculated. However, unless we have knowledge that this event has occurred, any inference drawn from that conditional probability has nothing in common with Bayesian analysis, which is based on what actually did happen, not on what might have happened but did not (Jaynes 1986). Using another partitioning of Δ and such an inference procedure, one can get any point within Δ as an estimate of π.

In the other approach proposed in K99, a priori information on correlations between rain occurrences within T1 and T2 is employed. It is convenient for the further discussion to introduce the following notation. Let p11, p22, p12, and p21 be the probabilities of the four elementary events: no rain within the whole period T (i.e., in neither subperiod), rain within both subperiods, rain in T1 and no rain in T2, and, finally, no rain in T1 and rain in T2, respectively. Then
π1 = p12 + p22,  (3)
π2 = p21 + p22,  (4)
π = p12 + p21 + p22 = 1 − p11.  (5)
K99 proposed to calculate the Pearson correlation coefficient α(π1, π2) between rain occurrences within the subperiods from historical data and archived forecast probabilities πi, and to choose π in such a way that the resulting distribution of pij describes the same correlation.
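For binary rain/no-rain indicators, the Pearson correlation can be written directly in terms of the joint probabilities pij; the sketch below uses the standard formula for correlated Bernoulli variables, with made-up joint tables for illustration.

```python
# Pearson correlation between the two binary rain indicators, computed
# from the joint probabilities p11, p12, p21, p22 (illustrative numbers).
import math

def pearson_alpha(p11, p12, p21, p22):
    pi1 = p12 + p22          # marginal P(rain in T1)
    pi2 = p21 + p22          # marginal P(rain in T2)
    cov = p22 - pi1 * pi2    # Cov(I1, I2) = P(both) - P(T1) P(T2)
    return cov / math.sqrt(pi1 * (1 - pi1) * pi2 * (1 - pi2))

print(pearson_alpha(0.42, 0.18, 0.28, 0.12))  # independent table: ~ 0
print(pearson_alpha(0.50, 0.10, 0.20, 0.20))  # positively dependent: > 0
```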

When only partial information about the probability distribution of the elementary events is given, a natural and probably the only possible way of reconstructing the whole statistics is to invoke prior knowledge. In this sense, the second approach seems preferable because it mobilizes more information about the phenomenon of interest. However, following this scheme, one cannot use the whole dataset but only those meteorological observations for which a prediction has been made. Further, even working with this reduced dataset, the forecaster does not extract all the information contained in the observations but only the correlation coefficient. Though the data provide us with sampling probabilities μij of occurrence of rain and no rain within T1 and T2, the approach proposed cannot assimilate these data and simply rejects them.

The probabilities provided by historical records and by forecasting are not, and should not be, identical; otherwise there would be no need for any weather prediction. Thus, combining these different types of information into a unique solution is a very delicate task, and setting characteristics of the forecast statistics equal to those derived from observations is at least questionable. A point that was not recognized in K99 is the fundamental difference between the probabilities μij and pij. The former express our a priori knowledge, which comes before forecasting. The objective of weather prediction is to update that a priori information. Consequently, no property of μij can be expected to survive in general.

A procedure that enables one to take into account, in a consistent way, the total information contained in the meteorological data and the forecast, and to solve much more general problems, is the principle of maximum entropy (PME). Among other approaches, it has probably the longest and most spectacular history. The concept of entropy originated about a century ago within the context of a fundamental but specific physical problem; it has since found widespread application in dynamical systems theory, ergodic theory, inverse problems, communication theory, numerical analysis, the theory of functions, decision theory, and so on; in short, in problems straying far from classical thermodynamics. Probably the application closest to weather forecasting was recently presented by Urban (1996).

All this was made possible only after it was recognized, thanks to Shannon (1948), that entropy has a much wider content than had been thought. To be concise, Shannon proved that any measure of uncertainty in a discrete probability distribution pk, k = 1, . . . , n, is proportional to the Gibbs entropy,
H = −∑k pk ln pk,  (6)
if that measure is (i) a continuous function of the probabilities, (ii) an increasing function of the number of possible events when they are equally probable, and (iii) consistent, that is, if there is more than one way of evaluating its value, every possible way yields the same result.
Kullback (1959) generalized (6) and introduced the cross-entropy
H = −∑k pk ln(pk/μk).  (7)
Here μk is another set of probabilities. The cross-entropy (7) measures the relative uncertainty in pk with respect to μk and, for the uniform distribution μk = 1/n, coincides with (6) up to an additive constant.
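A minimal numerical sketch of (6) and (7), with probability values made up for illustration. Note that (7) is the negative of the Kullback–Leibler divergence, so it is nonpositive and vanishes only when pk = μk.

```python
# Gibbs entropy (6) and cross-entropy (7) for a small discrete
# distribution (values made up for illustration).
import math

def gibbs_entropy(p):
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def cross_entropy(p, mu):
    return -sum(pk * math.log(pk / mk) for pk, mk in zip(p, mu) if pk > 0)

p  = [0.5, 0.3, 0.2]
mu = [1/3, 1/3, 1/3]
print(gibbs_entropy(p))
print(cross_entropy(p, p))   # 0: no uncertainty relative to itself
print(cross_entropy(p, mu))  # nonpositive; equals gibbs_entropy(p) - ln 3 here
```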

The heart of the PME is choosing those pk that maximize (7) subject to certain constraints we wish them to satisfy. This approach provides a probability distribution that is as uncertain as the imposed constraints allow. In other words, the maximum-entropy distribution is free from any information beyond that which we use as input. It might be helpful to point out that classical equilibrium statistical mechanics is completely deducible from the PME if Liouville’s measure is chosen as the prior probability distribution over the phase space. On the other hand, failures in certain predictions pointed not to a failure of the PME but, as was realized later, to an inadequate prescription of the prior. The rise of quantum theory made it possible to resolve those difficulties by assigning a new prior over a deeper hypothesis space (Jaynes 1957).
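As a generic illustration of the PME machinery (the classic die example popularized by Jaynes, not part of the rainfall problem), maximizing the Gibbs entropy over the six faces subject to a prescribed mean yields weights proportional to exp(λk); the Lagrange multiplier λ can be found by simple bisection:

```python
# Classic PME illustration (Jaynes's die, not the rainfall problem):
# maximize the Gibbs entropy of p_1..p_6 subject to sum(k p_k) = 4.5.
# The solution is p_k proportional to exp(lam*k); lam found by bisection.
import math

FACES = range(1, 7)

def mean_for(lam):
    w = [math.exp(lam * k) for k in FACES]
    z = sum(w)
    return sum(k * wk for k, wk in zip(FACES, w)) / z

def maxent_die(target_mean, lo=-10.0, hi=10.0, iters=100):
    # mean_for(lam) increases monotonically with lam, so bisect on lam.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [math.exp(lam * k) for k in FACES]
    z = sum(w)
    return [wk / z for wk in w]

p = maxent_die(4.5)
print(p)  # weights increase toward the higher faces
```

The resulting distribution satisfies the constraint exactly and is otherwise as noncommittal as possible, which is precisely the property claimed in the text.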

Application of the PME to the problem considered in K99 is straightforward. To choose π, the forecaster maximizes (7) subject to (3), (4), and the normalization condition
p11 + p12 + p21 + p22 = 1.  (8)
Here μij are prior probabilities derived from meteorological records. Since pij appear in (7) as arguments of the logarithmic function, nonnegativity of all pij is guaranteed.
In this particular example, pij may be expressed in terms of π:
p11 = 1 − π,  p12 = π − π2,  p21 = π − π1,  p22 = π1 + π2 − π,  (9)
and (7) becomes
H(π) = −(1 − π) ln[(1 − π)/μ11] − (π − π2) ln[(π − π2)/μ12] − (π − π1) ln[(π − π1)/μ21] − (π1 + π2 − π) ln[(π1 + π2 − π)/μ22].  (10)
Then, if there is an extreme point of H(π) inside Δ, it is a solution to the equation
dH/dπ = ln[μ12μ21(1 − π)(π1 + π2 − π)] − ln[μ11μ22(π − π1)(π − π2)] = 0.  (11)
Equation (11) reduces to a quadratic equation, and the final expression for π depends on the particular values of the input parameters (πi, μik). After some algebra, one arrives at the following solution. First, it can be seen that (11) has no solution if
D ≡ [μ12μ21 + d(π1 + π2)]² − 4dπ1π2μ11μ22 < 0.  (12)
Here
d = μ11μ22 − μ12μ21.  (13)
In this case, dH/dπ < 0 within Δ and the maximum value of H is reached at the lower bound of Δ,
π = max(π1, π2).  (14)
Next, if d = 0, then (11) has a unique solution:
π = π1 + π2 − π1π2.  (15)
If D ≥ 0 and d ≠ 0, Eq. (11) has two solutions and only one of them
π = [d(π1 + π2) − μ12μ21 + √D] / (2d)  (16)
may in principle belong to Δ. Whether this is the case depends on the particular values of μij. If the right-hand side of (16) is greater (less) than the upper (lower) bound of Δ, then dH/dπ > 0 within Δ when d > 0 (d < 0). Consequently, H is maximum at the upper bound,
π = min(π1 + π2, 1).  (17)
On the other hand, if d > 0 (d < 0) and the right-hand side of (16) is less (greater) than the lower (upper) bound of Δ, then dH/dπ < 0 within Δ and H is maximum at the lower bound,
π = max(π1, π2).  (18)
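The case analysis above can be checked numerically: express the pij through π as in (9), evaluate H(π), and locate its maximum over Δ by a fine grid search. The priors μij and forecast probabilities below are made-up illustrative values; with an independent prior (d = 0), the maximizer should reproduce the independence value π = π1 + π2 − π1π2.

```python
# Numerical check of the maximum-entropy choice of pi: substitute (9)
# into the cross-entropy and maximize over Delta by grid search.
# The priors mu_ij and forecasts pi1, pi2 are made-up illustrative values.
import numpy as np

def pme_pi(pi1, pi2, mu, n=200001, eps=1e-9):
    """mu = [mu11, mu12, mu21, mu22]; returns the maximizer of H(pi)."""
    lo, hi = max(pi1, pi2), min(pi1 + pi2, 1.0)
    pi = np.linspace(lo + eps, hi - eps, n)
    # p11, p12, p21, p22 expressed through pi, cf. (9)
    p = np.array([1.0 - pi, pi - pi2, pi - pi1, pi1 + pi2 - pi])
    m = np.asarray(mu).reshape(4, 1)
    H = -np.sum(p * np.log(p / m), axis=0)
    return pi[np.argmax(H)]

# Independent prior: mu11*mu22 = mu12*mu21, i.e., d = 0, so the answer
# should reproduce the independence value pi1 + pi2 - pi1*pi2 = 0.8.
mu_indep = [0.42, 0.18, 0.28, 0.12]   # prior rain probabilities 0.3 and 0.4
print(pme_pi(0.5, 0.6, mu_indep))     # ~ 0.8
```

For a dependent prior (d ≠ 0), the grid maximizer agrees with the interior root of the quadratic obtained from (11), which is a useful sanity check on the algebra.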

A final point that seems worth noting is the meaning of the probabilities pij. These are better thought of not as relative frequencies of rain or no rain in a particular period but as the forecaster’s degrees of belief in those events. In this sense, the pij are “subjective” quantities rather than attributes of the underlying stochastic atmospheric processes. However, the pij calculated by means of the PME are the most “objective” choice, since they contain only the information we do have and nothing more. Any other choice of pij rests on an extremely insecure foundation built on assumptions about the probability distribution that were not drawn from the data. On the other hand, if we ever have experimental proof that our predictions based on the PME are incorrect, this will give evidence that the inputs (πi, μik) are wrong. Modern physics has evolved in just this way.

REFERENCES

  • Jaynes, E. T., 1957: Information theory and statistical mechanics. Phys. Rev., 106, 620–630.

  • ——, 1986: Bayesian methods: General background. Maximum Entropy and Bayesian Methods in Applied Statistics, J. H. Justice, Ed., Cambridge University Press, 1–25.

  • Krzysztofowicz, R., 1999: Probability for a period and its subperiods: Theoretical relations for forecasting. Mon. Wea. Rev., 127, 228–235.

  • Kullback, S., 1959: Information Theory and Statistics. John Wiley and Sons, 399 pp.

  • Shannon, C. E., 1948: A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423, 623–656.

  • Urban, B., 1996: Retrieval of atmospheric thermodynamical parameters using satellite measurements with a maximum entropy method. Inverse Problems, 12, 779–796.

* Alfred Wegener Institute for Polar and Marine Research Contribution Number 1794.
