Figures

Fig. 1. Dependence structures (prior, likelihood, and posterior) in the BPF for a Markov predictand process and the forecast lead time of 2 days.

Fig. 2. Dependence structure of the prior 1-step transition distribution function of the daily maximum temperature: the meta-Gaussian regression with the 80% central credible interval in the standardized sample space for days (a) k = 32 and (b) k = 214; the linear regression with the 80% central credible interval in the normal sample space for days (c) k = 32 and (d) k = 214 at Savannah.

Fig. 3. Validation of the meta-Gaussian dependence structure for the Markov model of the daily maximum temperature: homoscedasticity of dependence for days (a) k = 32 and (b) k = 214; normality of residuals for days (c) k = 32 and (d) k = 214 at Savannah. [A high quantile–quantile correlation suggests approximate normality of residuals even though the null hypothesis is rejected by the Shapiro–Francia test in each case at p < 0.003 because of the sensitivity of the test for large sample sizes, here 595. The pattern in (b) and the tails in (d) are the artifacts of the precision of measurement.]

Fig. 4. Time series of the sample autocorrelation coefficient (in the normal sample space) of the daily maximum temperature, the envelope, and the fitted fourth-order Fourier series expansion at Savannah.

Fig. 5. Prior marginal distribution function Gk and prior l-step transition distribution functions Hkl conditional on antecedent observations Wk−l = wk−l for two lead times: (a) 24 h (l = 1) for 1 Feb (k = 32) and (b) 96 h (l = 4) for 4 Feb (k = 35). Also shown are density functions corresponding to the distribution functions (c) from (a) and (d) from (b). This figure is for the cool season at Savannah.

Fig. 6. Example of the Markov prior distribution having insignificant effect: (a) two posterior distribution functions resulting from the marginal prior distribution function and (b) two posterior distribution functions resulting from the Markov prior distribution function. Also shown are density functions corresponding to the distribution functions (c) from (a) and (d) from (b). This figure is for 24-h lead time and cool season at Savannah.

Fig. 7. Example of the Markov prior distribution having significant effect: (a) two posterior distribution functions resulting from the marginal prior distribution function and (b) two posterior distribution functions resulting from the Markov prior distribution function. Also shown are density functions corresponding to the distribution functions (c) from (a) and (d) from (b). This figure is for 96-h lead time and warm season at Savannah.

Fig. 8. The width of the posterior 50% CCI, resulting from the Markov prior distribution, as a function of informativeness score γ, autocorrelation coefficient c, and lead time l, when the predictors (X, W0) take on (a)–(c) “confirmatory” values and (d)–(f) “contradictory” values.

Fig. 9. The posterior median, w0.5, and the 50% CCI, (w0.25, w0.75), resulting from the Markov prior distribution, as a function of informativeness score γ, autocorrelation coefficient c, and lead time (l = 1 day, solid line; l = 7 days, broken line), when the predictors take on “contradictory” values: X = 78°F, W0 = 46°F.

Fig. 10. The percent reduction in the width of the 50% CCI using the Markov BPF instead of the climatic model for forecasting with lead time of 1 day, given predictor values X = 60°F and W0 = 64°F.

Fig. 11. The percent reduction in the width of the 50% CCI using the Markov BPF instead of the Markov climatic model for forecasting with lead time of 1 day, given predictor values X = 60°F and W0 = 64°F.

Fig. 12. The percent reduction in the width of the 50% CCI using the Markov BPF instead of the BPF for forecasting with lead time of 1 day, given predictor values X = 60°F and W0 = 64°F.

Fig. 13. The percent reduction in the width of the 50% CCI using the Markov BPF instead of the BPF for forecasting with lead time of 1 day, given two diametrical vectors of “contradictory” predictor values.


The Role of Climatic Autocorrelation in Probabilistic Forecasting

  • 1 University of Virginia, Charlottesville, Virginia

Abstract

A sequence of meteorological predictands of one kind (e.g., temperature) forms a discrete-time, continuous-state stochastic process, which typically is nonstationary and periodic (because of seasonality). Three contributions to the field of probabilistic forecasting of such processes are reported. First, a meta-Gaussian Markov model of the stochastic process is formulated, which provides a climatic probabilistic forecast with the lead time of l days in the form of a (prior) l-step transition distribution function. A measure of the temporal dependence of the process is the autocorrelation coefficient (which is nonstationary). Second, a Bayesian processor of forecast (BPF) is formulated, which fuses the climatic probabilistic forecast with an operational deterministic forecast produced by any system (e.g., a numerical weather prediction model, a human forecaster, a statistical postprocessor). A measure of the predictive performance of the system is the informativeness score (which may be nonstationary). The BPF outputs a probabilistic forecast in the form of a (posterior) l-step transition distribution function, which quantifies the uncertainty about the predictand that remains, given the antecedent observation and the deterministic forecast. The working of the Markov BPF is explained on probabilistic forecasts obtained from the official deterministic forecasts of the daily maximum temperature issued by the U.S. National Weather Service with the lead times of 1, 4, and 7 days. Third, a numerical experiment demonstrates how the degree of posterior uncertainty varies with the informativeness of the deterministic forecast and the autocorrelation of the predictand series. It is concluded that, depending upon the level of informativeness, the Markov BPF is a contender for operational implementation when a rank autocorrelation coefficient is between 0.3 and 0.6, and is the preferred processor when a rank autocorrelation coefficient exceeds 0.6. Thus, the climatic autocorrelation can play a significant role in quantifying, and ultimately in reducing, the meteorological forecast uncertainty.

Corresponding author address: Professor Roman Krzysztofowicz, University of Virginia, P.O. Box 400747, Charlottesville, VA 22904-4747. Email: rk@virginia.edu

1. Introduction

a. Forecasting stochastic process

An element of sensible weather is typically forecasted and observed at predetermined times in the daily cycle. The associated sequence of predictands (variates whose realizations are forecasted) forms a discrete-time stochastic process, or a time series, which can be characterized by its marginal distribution functions and its temporal dependence structure (in particular, the autocorrelation structure). For a continuous element, such as temperature, humidity, or pressure, and a short time step, say 24 h or less, the temporal dependence may be strong enough to allow forecasting future realizations based on antecedent observations. This is, of course, well known and has been exploited in various time series models (e.g., Brown et al. 1984; Murphy and Katz 1985; Wilks 1995). Typically, such statistical models produce deterministic forecasts for a few steps ahead. A recent example is the periodic autoregressive model of the daily temperature time series by Lund et al. (2006).

Much less known in meteorology are (i) the concept of forecasting probabilistically a time series via a stochastic model (e.g., Campbell and Diebold 2005; Gneiting et al. 2006) and (ii) the concept of fusing a probabilistic forecast produced by a stochastic model with a deterministic forecast produced by a numerical weather prediction (NWP) model, or by a human forecaster in a National Weather Service (NWS) field office, or by a statistical postprocessor of the NWP model output, such as the model output statistics technique of the NWS.

b. Markov Bayesian processor of forecast

This article presents the Bayesian theory and the meta-Gaussian model for implementing both concepts. Toward this end, we extend the previously developed Bayesian processor of forecast (BPF) for the National Digital Forecast Database (NDFD). The purpose of that BPF (Krzysztofowicz and Evans 2008) is to process a deterministic forecast (a point estimate of the predictand) into a probabilistic forecast (a distribution function, a density function, and a quantile function). The quantification of uncertainty is accomplished via Bayes theorem, which extracts and fuses two kinds of information from two different sources: (i) Information about the natural variability of the predictand is extracted from a (relatively) long climatic sample of the predictand, and is quantified by a prior distribution function. (ii) Information about the predictive performance of the deterministic forecast system is extracted from a (relatively) short joint sample of the forecast and the predictand, and is quantified by a family of the likelihood functions.

The prior distribution function Gk of predictand Wk constitutes a climatic probabilistic forecast of Wk; in the application reported herein, Wk denotes the maximum temperature on day k of the year (k = 1, . . . , 365) at a given station. When the stochastic process {Wk: k = 1, . . . , 365} is not independent but Markov (of order one), it may be advantageous to formulate a Markov BPF in which both the prior distribution function and the family of the likelihood functions are conditioned on the antecedent observation Wk−l = wk−l, the last observation preceding the forecast time when Wk is forecasted with the lead time of l days. Thus, in the Markov BPF, the prior marginal distribution function Gk is replaced by the prior l-step transition distribution function Hkl(·|wk−l), which constitutes a climatic Markov probabilistic forecast of Wk with lead time l. In particular, Hk1(·|wk−1) is the distribution function of Wk, conditional on the antecedent observation Wk−1 = wk−1; it is the general stochastic model of the Markov process.

c. Stochastic–deterministic model fusion

Three research questions arise: How to obtain a flexible yet simple parametric model for the family Hkl of the l-step transition distribution functions for all lead times l of interest when the stochastic process {Wk: k = 1, . . . , 365} is Markov and nonstationary, as is typically the case in meteorology? How to incorporate this model into the BPF? What advantages can be expected from the Markov BPF, beyond the advantages of the simpler BPF? The last question may be rephrased in the context of weather forecasting: What advantages can be expected from fusing (i) a climatic probabilistic forecast produced by a stochastic model of the autocorrelated predictand time series and (ii) an operational deterministic forecast produced by an NWP model, or a human forecaster, or a statistical postprocessor?

d. Modeling approach

The Markov BPF is a natural extension of the BPF developed previously (Krzysztofowicz and Evans 2008) and represents a specialized application of the Bayesian theory of probabilistic forecasting formulated and tested for various time series over the past two decades (e.g., Krzysztofowicz 1983, 1985; Krzysztofowicz and Kelly 2000; Krzysztofowicz and Herr 2001). This Markov BPF is applicable to any continuous predictand. Herein, it is applied to quantify the uncertainty in a deterministic forecast of the daily maximum temperature—one of the predictands selected by the NWS for development of their technique (Peroutka et al. 2005). This deterministic forecast is the official forecast produced by a NWS field office and stored in the NDFD.

The article is organized as follows. Section 2 outlines the theoretic foundation of the Markov BPF, and defines its major components. Section 3 details the modeling and estimation of the first component: the Markov prior (climatic) distribution function (the first research question). Section 4 does the same for the second component: the family of the conditional likelihood functions. Section 5 gives the forecasting equations (the second research question), presents examples of probabilistic forecasts from the Markov BPF, and compares them with examples of probabilistic forecasts from the BPF. Section 6 reports a numerical experiment designed to uncover the role and the impact of the autocorrelation in the predictand time series on probabilistic forecasts (the third research question). Section 7 summarizes the advantages of the Markov BPF.

2. Bayesian processor

The Bayesian theory of probabilistic forecasting of time series (Krzysztofowicz 1985) provides the structure for modeling the stochastic dependence between the predictand, the predictor, and the antecedent. A schematic of this structure for forecasting a Markov process two steps ahead is shown in Fig. 1. The multivariate dependence structure is decomposed into the prior dependence structure and the likelihood dependence structure; the posterior dependence structure is derived from them.

a. Prior uncertainty

Let Wk denote the predictand on day k of the year (k = 1, . . . , 365). Herein, Wk is the maximum temperature on day k at a given station (or a grid point). Suppose the time series of the predictands {Wk: k = 1, . . . , 365} forms a nonstationary Markov process of order one. The climatic uncertainty (or the prior uncertainty) about such a process is fully characterized in terms of a sequence of families of 1-step transition density functions {rk(·|wk−1): all wk−1} for k = 1, . . . , 365, where rk(·|wk−1) is the density function of Wk, conditional on the antecedent observation Wk−1 = wk−1. (The periodicity convention is assumed throughout the article, whereby day k − 1 is day 365 if k = 1; in general, day k − l is identified with day 365 + k − l if k − l < 1, for any integer l, 1 ≤ l ≤ 364. For simplicity, 29 February is excluded.)

The 1-step transition density function rk(·|wk−1) gives a climatic Markov forecast of Wk with the lead time of one day. To obtain a climatic Markov forecast of Wk with the lead time of l days, the density function hkl(·|wk−l) of Wk, conditional on observation Wk−l = wk−l, must be derived. It is, in fact, the l-step transition density function, which can be obtained recursively:

h_{k1}(w_k | w_{k-1}) = r_k(w_k | w_{k-1}),   (1)

and for l = 2, 3, . . . , L (L < 365),

h_{kl}(w_k | w_{k-l}) = \int_{-\infty}^{\infty} r_k(w_k | w_{k-1}) \, h_{k-1,\,l-1}(w_{k-1} | w_{k-l}) \, dw_{k-1}.   (2)

Given the antecedent observation Wk−l = wk−l, the function hkl(·|wk−l) constitutes the Markov prior density function of Wk. It quantifies the uncertainty about the predictand Wk which exists after the observation wk−l is collected but before a forecast is issued by the NWS.

b. Forecast uncertainty

Let Xkl denote a predictor of Wk, whose realization is available with the lead time of l days. Herein, the realization Xkl = xkl is a deterministic forecast of the maximum temperature Wk on day k issued by the NWS with the lead time of l days. The forecast uncertainty or, more specifically, the stochastic dependence between the predictor Xkl and the predictand Wk, is characterized by the family of conditional density functions {fkl(·|wk, wk−l): all wk, wk−l}, where fkl(·|wk, wk−l) is the density function of Xkl, conditional on the hypothesis that the realization of the predictand is Wk = wk, and given that the antecedent observation is Wk−l = wk−l. Then, for fixed realizations Xkl = xkl and Wk−l = wk−l, there exists a function fkl(xkl|·, wk−l); it is called the conditional likelihood function of Wk. More generally, there exists a family of the conditional likelihood functions {fkl(xkl|·, wk−l): all xkl, wk−l}.

The family of the conditional likelihood functions fkl is doubly nonstationary because, in general, the performance of a deterministic forecast varies throughout the year (hence index k) and with the lead time (hence index l).

c. Bayesian revision

For any day k (k = 1, . . . , 365) and lead time l (l = 1, . . . , N), the Bayesian procedure for information fusion and revision of uncertainty involves two steps. First, the expected density function κkl(·|wk−l) of predictor Xkl, conditional on an antecedent observation Wk−l = wk−l, is derived via the total probability law:

\kappa_{kl}(x_{kl} | w_{k-l}) = \int_{-\infty}^{\infty} f_{kl}(x_{kl} | w_k, w_{k-l}) \, h_{kl}(w_k | w_{k-l}) \, dw_k.   (3)

Second, the posterior density function ϕkl(·|xkl, wk−l) of predictand Wk, conditional on a deterministic forecast Xkl = xkl and an antecedent observation Wk−l = wk−l, is derived via Bayes theorem:

\phi_{kl}(w_k | x_{kl}, w_{k-l}) = \frac{f_{kl}(x_{kl} | w_k, w_{k-l}) \, h_{kl}(w_k | w_{k-l})}{\kappa_{kl}(x_{kl} | w_{k-l})}.   (4)

The posterior density function constitutes the probabilistic forecast of the predictand Wk with the lead time of l days; it quantifies the uncertainty about Wk that remains after the NWS collects the antecedent observation Wk−l = wk−l and issues deterministic forecast Xkl = xkl.

The corresponding posterior distribution function Φkl(·|xkl, wk−l) of predictand Wk is defined by

\Phi_{kl}(w | x_{kl}, w_{k-l}) = \int_{-\infty}^{w} \phi_{kl}(u | x_{kl}, w_{k-l}) \, du.   (5)

The inverse function Φkl^{-1}(·|xkl, wk−l) is called the posterior quantile function. For any number p, such that 0 < p < 1, the posterior p-probability quantile of predictand Wk is the quantity wk^p such that Φkl(wk^p | xkl, wk−l) = p. Therefrom,

w_k^p = \Phi_{kl}^{-1}(p | x_{kl}, w_{k-l}).   (6)

Equations (1)–(4) define the theoretic structure of the Markov BPF. Equations (4)–(6) specify the three outputs, each of which constitutes the probabilistic forecast of Wk, given deterministic forecast xkl and antecedent observation wk−l.
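
A minimal numerical sketch of Eqs. (1)–(4) in Python, assuming the 1-step transition density r and the likelihood f are supplied as vectorizable callables and that a single r may be used within the forecast window (the local-stationarity approximation introduced later in section 3d); it is an illustration of the structure, not the article's implementation:

```python
import numpy as np

def l_step_transition(r, w_grid, w_antecedent, l):
    """Eqs. (1)-(2): build h_{kl}(. | w_{k-l}) on a uniform grid by the recursion,
    using one generic 1-step transition density r(w, w_prev)."""
    dw = w_grid[1] - w_grid[0]                       # uniform grid spacing assumed
    h = r(w_grid, w_antecedent)                      # Eq. (1), the case l = 1
    for _ in range(l - 1):
        # Eq. (2): integrate r(w | w') against the previous-step density over w'
        h = (r(w_grid[:, None], w_grid[None, :]) * h[None, :]).sum(axis=1) * dw
    return h

def bayes_revision(f, h, x, w_grid):
    """Eqs. (3)-(4): posterior density of W_k given forecast X_kl = x and prior h."""
    dw = w_grid[1] - w_grid[0]
    like = f(x, w_grid)                              # likelihood of x under each W_k = w
    kappa = np.sum(like * h) * dw                    # Eq. (3): expected density of the forecast
    return like * h / kappa                          # Eq. (4): Bayes theorem
```

In the meta-Gaussian model developed in the following sections, this grid integration is replaced by closed-form expressions.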

d. Bayesian meta-Gaussian model

The Markov BPF, being a natural extension of the BPF described by Krzysztofowicz and Evans (2008), will be implemented likewise—by adapting the meta-Gaussian model of Krzysztofowicz and Kelly (2000). This model has the structural properties that are quintessential for correct representation of meteorological processes: (i) Each element is allowed to be nonstationary. (ii) The predictand Wk and the predictor Xkl are allowed to have distribution functions of any form. (iii) The temporal dependence structure between Wk−1 and Wk is allowed to be nonlinear and heteroscedastic. (iv) The likelihood dependence structure between Xkl, Wk, and Wk−l is pairwise and is allowed to be nonlinear and heteroscedastic (which is the case with most meteorological forecasts, especially for longer lead times).

e. Modeling and estimation

The remaining sections describe the modeling process, the estimation procedure, the goodness of fit to data, the statistical properties, and the practical advantages of the meta-Gaussian Markov BPF. In all illustrations, the predictand is the daily maximum temperature; the forecast lead times are 24 h (1 day), 96 h (4 days), and 168 h (7 days) after 0000 UTC; the forecast point is Savannah, Georgia.

3. Markov prior distribution function

The development of a meta-Gaussian model for the family of 1-step transition density functions involves two steps: (i) modeling the marginal distribution functions and (ii) modeling the dependence structure. The first step was performed by Krzysztofowicz and Evans (2008) and is summarized in the next section. The second step is detailed in the subsequent sections.

a. Marginal distribution functions

For each variate Wk (k = 1, . . . , 365), define the mean, the variance, and the marginal distribution function:
m_k = E(W_k),   (7a)

s_k^2 = \mathrm{Var}(W_k),   (7b)

G_k(w) = P(W_k \le w),   (7c)
where E stands for expectation, P stands for probability, and w is any point in the sample space of Wk. The standardized maximum temperature,
W'_k = \frac{W_k - m_k}{s_k},   (8)
has stationary first two moments, E(W ′k) = 0 and Var(W ′k) = 1, as is well known, and an approximately stationary marginal distribution function G′, as shown by Krzysztofowicz and Evans (2008): for k = 1, . . . , 365 and at any point w,
G_k(w) = G'\!\left(\frac{w - m_k}{s_k}\right),   (9)
where G′(w′) = P(W ′kw′) at any point w′ in the standardized sample space. The estimation of mk, sk, and G′ is detailed in the previous article.

b. Meta-Gaussian dependence model

The objective is to characterize the dependence structure of the standardized stochastic process {W ′k: k = 1, . . . , 365}. In general, this dependence structure may be nonlinear and heteroscedastic, while the marginal distribution function G′ may be of any form. To allow for such general properties, we employ the meta-Gaussian model of Krzysztofowicz and Kelly (2000). At the heart of this model is the normal quantile transform (NQT):
V_k = Q^{-1}(G'(W'_k)),   (10)
where Q is the standard normal distribution function and Q−1 is its inverse. The NQT guarantees that the marginal distribution of the transformed variate Vk is standard normal (Kelly and Krzysztofowicz 1995, 1997). Our modeling hypothesis is that the 1-step transition distribution from (Vk−1 = υk−1) to Vk is normal as well. This hypothesis has been validated empirically for the river stage process (Krzysztofowicz and Kelly 2000; Krzysztofowicz and Herr 2001), and will be tested herein for the temperature process.
The Gaussian Markov model of the transformed process {Vk: k = 1, . . . , 365} is
V_k = c_k V_{k-1} + \Theta_k,   (11)
where Θk is a variate stochastically independent of Vk−1 and normally distributed with mean zero and variance 1 − ck^2, and ck is the Pearson's product-moment autocorrelation coefficient:
c_k = \mathrm{Cor}(V_{k-1}, V_k).   (12)
It follows that for any p such that 0 < p < 1, the p-probability quantile of Vk, conditional on observation Vk−1 = υk−1, is
v_k(p | v_{k-1}) = c_k v_{k-1} + (1 - c_k^2)^{1/2}\, Q^{-1}(p).   (13)
Thus for any p, the conditional quantile of Vk is a linear function of the antecedent observation υk−1. In particular, the conditional median is υk(0.5|υk−1) = ckυk−1, and is equal to the conditional mean E(Vk|Vk−1 = υk−1) = ckυk−1.
The meta-Gaussian Markov model for the p-probability quantile of W ′k, conditional on observation Wk−1′ = wk−1′, is obtained by embedding the NQT in Eq. (13):
w'_k(p | w'_{k-1}) = G'^{-1}\!\left( Q\!\left[ c_k\, Q^{-1}\{G'(w'_{k-1})\} + (1 - c_k^2)^{1/2}\, Q^{-1}(p) \right] \right).   (14)
As is well known, under the Gaussian model, ck is a fully efficient measure of stochastic dependence between Vk−1 and Vk. Under the meta-Gaussian model, ck remains a fully efficient measure of stochastic dependence between the standardized variates W′k−1 and W′k, as well as between the original variates Wk−1 and Wk (Kelly and Krzysztofowicz 1997); it can be transformed into the Spearman's rank autocorrelation coefficient:
\rho_k = \frac{6}{\pi} \arcsin\!\left(\frac{c_k}{2}\right).   (15)
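
A sketch of Eqs. (10) and (13)–(15), assuming the stationary marginal G′ is represented by a NumPy array of standardized climatic values and using scipy's standard normal for Q; this is an illustrative implementation, not the article's code:

```python
import numpy as np
from scipy.stats import norm

def nqt(u, sample_std):
    """Eq. (10): v = Q^{-1}(G'(u)), with G' approximated by plotting positions
    of the climatic sample of standardized values."""
    g = (np.searchsorted(np.sort(sample_std), u, side="right") + 0.5) / (sample_std.size + 1.0)
    return norm.ppf(np.clip(g, 1e-6, 1.0 - 1e-6))

def conditional_quantile(p, w_prev_std, c_k, sample_std):
    """Eqs. (13)-(14): p-probability quantile of W'_k given W'_{k-1} = w_prev_std."""
    v_p = c_k * nqt(w_prev_std, sample_std) + np.sqrt(1.0 - c_k**2) * norm.ppf(p)  # Eq. (13)
    return np.quantile(sample_std, norm.cdf(v_p))        # inverse NQT, Eq. (14)

def rank_autocorrelation(c_k):
    """Eq. (15): Spearman's rank coefficient implied by the Gaussian-space Pearson c_k."""
    return (6.0 / np.pi) * np.arcsin(c_k / 2.0)
```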

c. Empirical analyses

1) Joint samples

Let {(wk−1(n), wk(n)): n = 1, . . . , M} be the climatic joint sample of the maximum temperatures (Wk−1, Wk) on two consecutive days. Herein, it is an augmented sample in that for each day k data were pooled from the five consecutive days centered on day k. Thus, the record of 119 yr (from 1874 to 2001, with 9 yr missing) gave the sample size M = 119 × 5 = 595.

For each k and k − 1, every realization from the climatic joint sample is first standardized,
w'_{k-1}(n) = \frac{w_{k-1}(n) - m_{k-1}}{s_{k-1}}, \qquad w'_k(n) = \frac{w_k(n) - m_k}{s_k},   (16)
and then processed through the NQT with the stationary marginal distribution function
v_{k-1}(n) = Q^{-1}\{G'(w'_{k-1}(n))\}, \qquad v_k(n) = Q^{-1}\{G'(w'_k(n))\}.   (17)
The realizations are reassembled to form the transformed climatic joint sample {(υk−1(n), υk(n)): n = 1, . . . , M} for each day k (k = 1, . . . , 365). This sample is used to estimate the Pearson’s product-moment correlation coefficient ck between the standard normal variates Vk−1 and Vk, from which the Spearman’s rank correlation coefficient ρk between the original variates Wk−1 and Wk is calculated according to Eq. (15).
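
The estimation just described can be sketched as follows; the record is assumed here to be an array temps of shape (n_years, 365) with no missing values, and G′ is approximated directly from the pooled standardized values rather than by the procedure of Krzysztofowicz and Evans (2008), so this is an illustration of the pipeline rather than a reproduction of it:

```python
import numpy as np
from scipy.stats import norm

def estimate_ck(temps, k, m, s):
    """Estimate c_k (Pearson, normal space) and rho_k (Spearman) for 0-based day k.
    temps: (n_years, 365) daily maxima; m, s: daily means and standard deviations."""
    days = [(k + d) % 365 for d in (-2, -1, 0, 1, 2)]           # pool the 5 days centered on k
    w_prev = np.concatenate([(temps[:, (d - 1) % 365] - m[(d - 1) % 365]) / s[(d - 1) % 365]
                             for d in days])                    # Eq. (16), antecedent day
    w_curr = np.concatenate([(temps[:, d] - m[d]) / s[d] for d in days])   # Eq. (16), day k
    pooled = np.sort(np.concatenate([w_prev, w_curr]))          # crude stand-in for G'
    def nqt(u):                                                 # Eq. (17)
        g = (np.searchsorted(pooled, u, side="right") + 0.5) / (pooled.size + 1.0)
        return norm.ppf(np.clip(g, 1e-6, 1.0 - 1e-6))
    c_k = np.corrcoef(nqt(w_prev), nqt(w_curr))[0, 1]           # Pearson c_k in the normal space
    return c_k, (6.0 / np.pi) * np.arcsin(c_k / 2.0)            # c_k and rho_k, Eq. (15)
```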

2) Model validation

Validation of the meta-Gaussian dependence structure amounts to checking the three requirements of the Gaussian model (11):

  1. Linearity—the regression of Vk on Vk−1 must be linear.
  2. Homoscedasticity—the variance of the residual Θk = Vk − ckVk−1 must be independent of Vk−1.
  3. Normality—the distribution function of Θk must be normal (Gaussian) with mean 0 and variance 1 − ck^2.

These requirements can be validated graphically (as well as through formal hypothesis testing); the results are shown for days k = 32 and k = 214.

Figure 2 shows the empirical dependence structure (the scatterplot of 595 points, some of which overlap) and the parametric dependence structure (the conditional quantile functions for p = 0.1, 0.5, 0.9). In the standardized sample space (Figs. 2a,b), the dependence structure is slightly nonlinear and heteroscedastic; the conditional median of W ′k plots as a slightly concave–convex function of wk−1′; and the width of the 80% central credible interval decreases with wk−1′. In the normal sample space (Figs. 2c,d), the dependence structure is linear (as the conditional median of Vk varies linearly with υk−1) and homoscedastic [as the width of the 80% central credible interval, υk(0.9|υk−1) − υk(0.1|υk−1), is constant with υk−1]. The homoscedasticity is validated in Figs. 3a,b: the scatter of residuals appears independent of the predictor value. The normality is validated in Figs. 3c,d: the quantile–quantile plot of the residuals is predominantly linear. (The grid pattern in Figs. 2b,d and its effect on Figs. 3b,d are the artifacts of the precision of measurement, 1°F, which appear when the observation variability is low and the sample size is large. The pattern affects especially the tails in Fig. 3d, where realizations are sparse.)

3) Nonstationary autocorrelation

The time series of the autocorrelation coefficients {ck: k = 1, . . . , 365} at Savannah (Fig. 4) reveals two properties. First, there is a relatively high variability of estimates from day to day; for operational forecasting, the time series can be smoothed and approximated by fourth-order Fourier series expansion, as shown in Fig. 4. Second, the fitted function and the envelope of estimates show that the autocorrelation of the daily maximum temperatures is moderate (between 0.5 and 0.75) and periodic, with two maxima (a higher in July and a lower in January) and two minima (a lower in April and a higher in October). In other words, the autocorrelation is the strongest in the middle of a season (warm, cold) and the weakest in the transition between the seasons.

The main implication for further modeling is that the process of the standardized daily maximum temperatures {W ′k: k = 1, . . . , 365} is nonstationary, even though it has a stationary marginal distribution function G′.
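
A least-squares fit of the fourth-order Fourier expansion to the 365 daily estimates {ck} can be obtained as in the sketch below; the article does not prescribe the fitting method, and ordinary least squares is one natural choice:

```python
import numpy as np

def smooth_autocorrelation(c_daily, order=4):
    """Fit a Fourier series of the given order to the 365 daily estimates c_k
    (least squares) and return the smoothed values for days 1..365."""
    t = 2.0 * np.pi * np.arange(1, 366) / 365.0
    cols = [np.ones_like(t)]
    for j in range(1, order + 1):
        cols += [np.cos(j * t), np.sin(j * t)]
    X = np.column_stack(cols)                        # design matrix of the expansion
    beta, *_ = np.linalg.lstsq(X, c_daily, rcond=None)
    return X @ beta                                  # smoothed c_k, k = 1, ..., 365
```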

d. Structural assumptions

The analyses of the climatic samples reported in the preceding section and in the previous article (Krzysztofowicz and Evans 2008) support four assumptions upon which the meta-Gaussian model for the Markov prior distribution function will be built.

  1. The predictand time series {Wk: k = 1, . . . , 365} forms a nonstationary Markov process of order one. After the standardization, this process has distribution functions with the following properties.
  2. The marginal distribution function G′ is stationary.
  3. The family of 1-step transition distribution functions Rk is nonstationary.
  4. The family of 1-step transition distribution functions is locally stationary: between day k − l and day k, every 1-step transition is governed by the same family of 1-step transition distribution functions Rk; this applies to every k ∈ {1, . . . , 365} and every l ∈ {1, . . . , L}, when L is small.

Assumption 1 formalizes the basic dependence structure present in daily time series of many meteorological variates, for example the temperature (Lund et al. 2006). Assumption 2 has empirical support for the daily maximum temperature (Krzysztofowicz and Evans 2008); in general, it should be viewed as an approximation that may be reasonable for operational forecasting. Assumption 3 recognizes the empirical evidence supplied by the time series of the autocorrelation coefficients (Fig. 4). Assumption 4 states an approximation. Its graphical interpretation is that the plot of the autocorrelation coefficients in Fig. 4 can be approximated stepwise, using the step width of L days. Given the day-to-day variability of ck within a relatively narrow envelope, it appears reasonable to assume that ck does not vary appreciably within L = 14 days.

e. Transition distribution functions

The implication of assumption 4 for modeling is this: For any k ∈ {1, . . . , 365}, given the 1-step autocorrelation coefficient Cor(Vk−1, Vk) = ck, the l-step autocorrelation coefficient is
\mathrm{Cor}(V_{k-l}, V_k) = c_k^{\,l}.   (18)
Thus, a single function, such as the Fourier series of ck in Fig. 4, is sufficient to model the autocorrelation coefficients for all lead times. This property is obviously convenient for operational forecasting, and it ensures monotonicity of the l-step autocorrelation: ck^l converges toward zero as l increases. For example, ck = 0.50 (the lower bound in Fig. 4) yields ck^4 = 0.06 and ck^7 = 0.01, whereas ck = 0.75 (the upper bound in Fig. 4) yields ck^4 = 0.32 and ck^7 = 0.13. Table 1 reports the ck values used later in the examples.
Heretofore, all elements have been defined from which the meta-Gaussian model of the nonstationary Markov process {Wk: k = 1, . . . , 365} is constructed (Krzysztofowicz and Kelly 2000). For any day k ∈ {1, . . . , 365} and any lead time l ∈ {1, . . . , L}, the prior l-step transition distribution function takes the form
H_{kl}(w | w_{k-l}) = Q\!\left[ \frac{Q^{-1}\{G_k(w)\} - c_k^{\,l}\, Q^{-1}\{G_{k-l}(w_{k-l})\}}{(1 - c_k^{\,2l})^{1/2}} \right].   (19)

For any p such that 0 < p < 1, the prior p-probability quantile of Wk, conditional on observation Wk−l = wk−l, is

w_k(p | w_{k-l}) = G_k^{-1}\!\left( Q\!\left[ c_k^{\,l}\, Q^{-1}\{G_{k-l}(w_{k-l})\} + (1 - c_k^{\,2l})^{1/2}\, Q^{-1}(p) \right] \right).   (20)

The prior l-step transition density function takes the form

h_{kl}(w | w_{k-l}) = \frac{g_k(w)}{q\!\left[Q^{-1}\{G_k(w)\}\right]\,(1 - c_k^{\,2l})^{1/2}}\; q\!\left[ \frac{Q^{-1}\{G_k(w)\} - c_k^{\,l}\, Q^{-1}\{G_{k-l}(w_{k-l})\}}{(1 - c_k^{\,2l})^{1/2}} \right],   (21)

where gk is the density function corresponding to Gk, and q is the standard normal density function.

Figure 5 shows two examples. For a short lead time (l = 1 in Figs. 5a,c), the prior 1-step transition density functions are shifted relative to the prior marginal density function; the shift direction and magnitude depend upon the antecedent observation w31. There is also a reduction in the prior variance of W32; this reduction depends upon w31 because of the heteroscedasticity of the Markov prior dependence structure (as revealed in Fig. 2). For a long lead time (l = 4 in Figs. 5b,d), the effect of the antecedent observation w31 is negligible. This is explained by the declining autocorrelation coefficient (Table 1): c32 = 0.572, but c35^4 = 0.147.
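
Written as code, the prior l-step quantile of Eq. (20) requires only the smoothed ck and the marginal distributions; the sketch below assumes Gk and Gk−l are available as objects exposing cdf and ppf methods, and follows the form of Eq. (20) as written above:

```python
from scipy.stats import norm

def prior_quantile(p, w_antecedent, c_k, l, G_k, G_km_l):
    """Eq. (20): prior p-probability quantile of W_k given W_{k-l} = w_antecedent.
    The l-step autocorrelation coefficient is c_k**l, Eq. (18)."""
    cl = c_k ** l
    v0 = norm.ppf(G_km_l.cdf(w_antecedent))                  # NQT of the antecedent
    return G_k.ppf(norm.cdf(cl * v0 + (1.0 - cl**2) ** 0.5 * norm.ppf(p)))

# With c_k = 0.75 the l-step coefficient decays to 0.75**4 = 0.32 and 0.75**7 = 0.13,
# so the transition quantiles collapse toward the marginal quantiles as l grows.
```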

4. Conditional likelihood function

The formulation and estimation of the meta-Gaussian model for the family of conditional likelihood functions follow the methodology of Krzysztofowicz and Evans (2008). Therefore, details are omitted and only the equations which define the parameters of the model are presented.

a. Likelihood parameters

For each day k ∈ {1, . . . , 365} and each lead time l ∈ {1, . . . , L}, there are now three variates: the predictand Wk having the marginal distribution function Gk, the antecedent Wkl having the marginal distribution function Gkl, and the predictor Xkl having the marginal distribution function Kkl. Each variate is subjected to the NQT:
V_k = Q^{-1}\{G_k(W_k)\},   (22a)

V_{k-l} = Q^{-1}\{G_{k-l}(W_{k-l})\},   (22b)

Z_{kl} = Q^{-1}\{K_{kl}(X_{kl})\}.   (22c)

The likelihood parameters akl, bkl, dkl, and σkl are defined by the Gaussian model:

E(Z_{kl} \mid V_k = v, V_{k-l} = v_{k-l}) = a_{kl}\, v + d_{kl}\, v_{k-l} + b_{kl},   (23a)

and

\mathrm{Var}(Z_{kl} \mid V_k = v, V_{k-l} = v_{k-l}) = \sigma_{kl}^2.   (23b)

Although the likelihood parameters are indexed by k, their values need not change from day to day because the performance of a forecasting system does not change, in a statistical sense, every 24 h. Thus, the frequency of updating the likelihood parameters may be dictated by operational considerations. For instance, under the adaptive scheme for sampling, estimation, and forecasting (Krzysztofowicz and Evans 2008), the standardized time series of each variate is assumed to be stationary and ergodic within the sampling window (about 90–120 days) and the subsequent forecasting window (about 5–10 days). Consequently, the likelihood parameters are reestimated every 5–10 days, after the sampling window shifts forward.

Table 2 reports the estimates obtained from two sampling windows. Of particular interest here are the values of dkl. They are significantly different from zero for l = 1, 4 in the cool season and for l = 1 in the warm season. In other words, the antecedent observation explains (or predicts), in part, the error of a deterministic forecast up to 4 days ahead. Hence, the conditioning of the likelihood function on the antecedent observation Wk−l = wk−l, as dictated by the theory of the Markov BPF (section 2c), serves not merely to ensure coherence but also to improve the probabilistic forecast.
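
Under the Gaussian model written in Eqs. (23a)–(23b) above (a reconstruction of its form), the likelihood parameters can be estimated by ordinary least squares on the NQT-transformed joint sample; a minimal sketch:

```python
import numpy as np

def estimate_likelihood_parameters(z_forecast, v_predictand, v_antecedent):
    """Estimate (a, b, d, sigma) of Eqs. (23a)-(23b) by regressing the transformed
    forecast on the transformed predictand and antecedent: z = a*v + d*v0 + b + error."""
    X = np.column_stack([v_predictand, v_antecedent, np.ones_like(v_predictand)])
    (a, d, b), *_ = np.linalg.lstsq(X, z_forecast, rcond=None)
    sigma = float(np.std(z_forecast - X @ np.array([a, d, b]), ddof=3))
    return a, b, d, sigma
```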

b. Forecast informativeness

To determine the effect of the antecedent Wk−l on the probabilistic forecast of Wk, it is necessary to characterize the informativeness of the predictor Xkl alone. For this purpose, the likelihood function from the original BPF (Krzysztofowicz and Evans 2008) must be recalled. Its parameters ȧkl, ḃkl, and σ̇kl are defined by
E(Z_{kl} \mid V_k = v) = \dot{a}_{kl}\, v + \dot{b}_{kl},   (24a)

\mathrm{Var}(Z_{kl} \mid V_k = v) = \dot{\sigma}_{kl}^2,   (24b)
and the informativeness score of predictor Xkl with respect to predictand Wk is given by
IS_{kl} = |\gamma_{kl}|.   (25)
The score is bounded, 0 ≤ ISkl ≤ 1, with ISkl = 0 for an uninformative predictor, and ISkl = 1 for a perfect predictor. The quantity γkl is the Pearson’s product-moment correlation coefficient:
\gamma_{kl} = \mathrm{Cor}(Z_{kl}, V_k) = \frac{\dot{a}_{kl}}{(\dot{a}_{kl}^2 + \dot{\sigma}_{kl}^2)^{1/2}}.   (26)
Table 3 reports the estimates obtained from two sampling windows.

5. Posterior distribution from Markov prior

a. Posterior parameters

Given the prior parameter and the likelihood parameters, the posterior parameters can be calculated as follows (Krzysztofowicz and Kelly 2000): tkl^2 = 1 − ck^2l, and then

A_{kl} = \frac{a_{kl}\, t_{kl}^2}{a_{kl}^2 t_{kl}^2 + \sigma_{kl}^2},   (27a)

B_{kl} = \frac{c_k^{\,l}\, \sigma_{kl}^2 - a_{kl} d_{kl}\, t_{kl}^2}{a_{kl}^2 t_{kl}^2 + \sigma_{kl}^2},   (27b)

D_{kl} = \frac{-\,a_{kl} b_{kl}\, t_{kl}^2}{a_{kl}^2 t_{kl}^2 + \sigma_{kl}^2},   (27c)

T_{kl}^2 = \frac{t_{kl}^2\, \sigma_{kl}^2}{a_{kl}^2 t_{kl}^2 + \sigma_{kl}^2}.   (27d)
These parameters are for the lead time of l days, and are valid for every day k within the forecasting window, as explained in section 4a.

b. Forecasting equations

Given the antecedent observation Wkl = wkl and the deterministic forecast Xkl = xkl, the probabilistic forecast of Wk on day k with the lead time of l days is specified by one of the following constructs (Krzysztofowicz and Kelly 2000).

The posterior distribution function of Wk, defined in Eq. (5), is specified by the equation
\Phi_{kl}(w \mid x_{kl}, w_{k-l}) = Q\!\left[ \frac{Q^{-1}\{G_k(w)\} - A_{kl}\, Q^{-1}\{K_{kl}(x_{kl})\} - B_{kl}\, Q^{-1}\{G_{k-l}(w_{k-l})\} - D_{kl}}{T_{kl}} \right].   (28)
The posterior p-probability quantile of Wk, defined in Eq. (6), is specified by the equation
w_k^p = G_k^{-1}\!\left( Q\!\left[ A_{kl}\, Q^{-1}\{K_{kl}(x_{kl})\} + B_{kl}\, Q^{-1}\{G_{k-l}(w_{k-l})\} + D_{kl} + T_{kl}\, Q^{-1}(p) \right] \right).   (29)
The posterior density function of Wk, defined in Eq. (4), is specified by the equation
\phi_{kl}(w \mid x_{kl}, w_{k-l}) = \frac{g_k(w)}{T_{kl}\; q\!\left[Q^{-1}\{G_k(w)\}\right]}\; q\!\left[ \frac{Q^{-1}\{G_k(w)\} - A_{kl}\, Q^{-1}\{K_{kl}(x_{kl})\} - B_{kl}\, Q^{-1}\{G_{k-l}(w_{k-l})\} - D_{kl}}{T_{kl}} \right].   (30)
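
Equivalently, the posterior forecast can be computed directly as a conjugate normal update in the transformed space and mapped back through the NQT. The sketch below is derived from the prior and likelihood models stated above (prior Vk | Vk−l = v0 normal with mean ck^l v0 and variance 1 − ck^2l; likelihood Zkl | v, v0 normal with mean a v + d v0 + b and variance σ²); it is offered as an illustration of the computation rather than a transcription of Eqs. (27)–(30):

```python
from scipy.stats import norm

def posterior_quantile(p, x_forecast, w_antecedent, c_k, l, a, b, d, sigma,
                       G_k, G_km_l, K_kl):
    """Posterior p-probability quantile of W_k given forecast x and antecedent w.
    G_k, G_km_l, K_kl expose cdf/ppf for the predictand, antecedent, and forecast."""
    z = norm.ppf(K_kl.cdf(x_forecast))                  # NQT of the deterministic forecast
    v0 = norm.ppf(G_km_l.cdf(w_antecedent))             # NQT of the antecedent observation
    cl, t2 = c_k ** l, 1.0 - c_k ** (2 * l)             # prior slope and prior variance
    denom = a**2 * t2 + sigma**2
    mu = (a * t2 * (z - d * v0 - b) + sigma**2 * cl * v0) / denom   # posterior mean of V_k
    T2 = t2 * sigma**2 / denom                                      # posterior variance of V_k
    return G_k.ppf(norm.cdf(mu + T2**0.5 * norm.ppf(p)))
```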

c. Posterior functions

Figures 6 and 7 show examples of probabilistic forecasts of the daily maximum temperature in Savannah. (The notation in the figures is complete for the density functions but abbreviated for the distribution functions because of lack of space.)

For a cool day and the 24-h lead time (Fig. 6), the prior 1-step transition density function (Fig. 6d) differs from the prior marginal density function (Fig. 6c), but the two posterior density functions are not significantly affected by the choice of the prior density function. The reason is that the informativeness of the deterministic forecast (γ32,1 = 0.920) is high enough to render even a moderate autocorrelation of the predictand series (c32 = 0.572) useless.

For a warm day and the 96-h lead time (Fig. 7), the prior 4-step transition density function (Fig. 7d) differs from the prior marginal density function (Fig. 7c), and the two posterior density functions are significantly affected by the choice of the prior density function. Given the forecast X217,4 = 80°F, the posterior density function resulting from the Markov prior density function (Fig. 7d) is flatter and shifted toward the antecedent observation W213 = 105°F. The larger posterior variance may be explained by the large difference between the forecasted temperature and the antecedent observation. Given the forecast X217,4 = 95°F, the posterior density function resulting from the Markov prior density function (Fig. 7d) is sharper and also shifted toward W213 = 105°F. Overall, the Markov prior density function has a significant effect on the posterior density function. The reason is that the informativeness of the deterministic forecast (γ217,4 = 0.725) is low enough to make even a weak autocorrelation of the predictand series (c217^4 = 0.248) useful.

In summary, the usage of the Markov prior density function instead of the marginal prior density function may, or may not, affect (or improve) the resultant probabilistic forecast. It is important, therefore, to identify the conditions under which a significant effect occurs.

6. Comparison of forecasting models

a. Experimental design

An experiment was designed to isolate and characterize the effect of the interaction between forecast informativeness and predictand autocorrelation. Because the prior marginal distribution function does not vary much within seven days, a single Weibull distribution function G was chosen for every day in the forecasting window. [In the notation of Krzysztofowicz and Evans (2008, their appendix) its parameters (α = 55, β = 6, η = 12) were representative of the cool season in Savannah.] Similarly, to eliminate forecast bias, the marginal distribution function of the forecast variate was set equal to the prior marginal distribution function of the predictand. Because no specific day is being considered, the k subscript can be dropped from the notation. Thus, the following distribution functions are equivalent for all k and l in the experiment:
G_k = G_{k-l} = G,   (31a)

K_{kl} = G.   (31b)
The likelihood parameters were set to the values that make the forecast unbiased and independent of the antecedent observation: a = 1, b = 0, and d = 0. As a result, the informativeness score of the predictor X is given by the equation
IS = \gamma = \frac{1}{(1 + \sigma^2)^{1/2}},   (32)
and, with tl^2 = 1 − c^2l, the posterior parameters are given by the equations
A_l = \frac{t_l^2}{t_l^2 + \sigma^2}, \qquad B_l = \frac{c^{\,l}\, \sigma^2}{t_l^2 + \sigma^2},   (33a)

T_l^2 = \frac{t_l^2\, \sigma^2}{t_l^2 + \sigma^2}.   (33b)
Finally, given a deterministic forecast X = x and an antecedent observation W0 = w0, the posterior quantile of W corresponding to probability p (0 < p < 1) is specified by the equation
w^p = G^{-1}\!\left( Q\!\left[ A_l\, Q^{-1}\{G(x)\} + B_l\, Q^{-1}\{G(w_0)\} + T_l\, Q^{-1}(p) \right] \right).   (34)

One of the simplest probabilistic forecasts, which conveys a minimum of information about uncertainty, consists of the posterior median, w0.5, and the posterior 50% central credible interval (CCI), given by (w0.25, w0.75), whose width, w0.75 − w0.25, is a measure of the posterior uncertainty. In the experiment, this forecast is assumed to be well calibrated (Krzysztofowicz and Sigrest 1999).
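
The experiment reduces to a few lines of code. The sketch below assumes the Weibull form G(w) = 1 − exp{−[(w − 12)/55]^6}, a reading of the quoted parameters that reproduces the quantiles cited later in this section (median ≈ 64°F, 5th percentile ≈ 46°F, 95th percentile ≈ 78°F) and the prior 50% CCI width of 13.4°F; the relation γ = 1/(1 + σ²)^{1/2} follows from a = 1:

```python
import numpy as np
from scipy.stats import norm

ALPHA, BETA, ETA = 55.0, 6.0, 12.0                      # assumed Weibull scale, shape, shift

def G(w):     return 1.0 - np.exp(-(((w - ETA) / ALPHA) ** BETA))
def G_inv(p): return ETA + ALPHA * (-np.log(1.0 - p)) ** (1.0 / BETA)

def cci50_width(gamma, c, l, x=60.0, w0=64.0):
    """Width of the posterior 50% CCI in the idealized experiment (a = 1, b = d = 0)."""
    sigma2 = (1.0 - gamma**2) / gamma**2                # likelihood variance implied by gamma
    cl, t2 = c**l, 1.0 - c**(2 * l)                     # prior slope and variance
    z, v0 = norm.ppf(G(x)), norm.ppf(G(w0))             # NQT of forecast and antecedent
    T2 = t2 * sigma2 / (t2 + sigma2)                    # posterior variance
    mu = (t2 * z + sigma2 * cl * v0) / (t2 + sigma2)    # posterior mean
    q = lambda p: G_inv(norm.cdf(mu + np.sqrt(T2) * norm.ppf(p)))
    return q(0.75) - q(0.25)

# Check: G_inv(0.75) - G_inv(0.25) is about 13.4 F, the width of the prior 50% CCI.
```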

b. Impacts of autocorrelation

Figure 8 shows how the width of the posterior 50% CCI varies with γ, c, and l. The horizontal line in each graph is the case with c^l = 0, effectively the case of the BPF with the marginal prior distribution function. The intercept of this line decreases with γ; the highest intercept is 13.4°F, which is the width of the prior 50% CCI under the marginal prior distribution function G.

In Figs. 8a–c, the antecedent temperature W0 = 64°F equals the climatic median of W, and the forecast temperature X = 60°F is nearby. For every value of γ and c (not only those shown in the three graphs), the posterior 50% CCI resulting from the Markov prior distribution is never wider than the posterior 50% CCI resulting from the marginal prior distribution.

In Figs. 8d–f, the antecedent temperature W0 = 46°F equals the 5th percentile of W under G and thus predicts a rather cold day, whereas the forecast temperature X = 78°F equals the 95th percentile and thus predicts a rather warm day. With the two predictor values being contradictory, there exists a region of γ and c values producing the posterior 50% CCI that is wider than the posterior 50% CCI resulting from the marginal prior distribution. This illustrates an important property of probabilistic forecasts based on two or more predictors: relative to the degree of uncertainty indicated by a single predictor, the degree of uncertainty indicated by two predictors may be smaller, or equal, or larger, depending on the predictor values. Loosely speaking, the predictor values may be either “confirmatory” (as in Figs. 8a–c), or “contradictory” (as in Figs. 8d–f).

With regard to the main objective of this study, Fig. 8 demonstrates that the impact of the autocorrelation in the predictand time series on the posterior uncertainty (i) is nonlinear and (ii) depends on the informativeness of the predictor: the less informative the predictor is, the greater is the impact of the autocorrelation.

Figure 9 depicts the probabilistic forecasts corresponding to the lowest curve (l = 1) and the highest curve (l = 7) in Figs. 8d–f. A forecast is represented by three posterior quantiles (w0.25, w0.5, w0.75), given the particular values of γ, c, and l, and given “contradictory” predictor values X = 78°F, W0 = 46°F. The posterior median w0.5 approaches 46°F as c increases while γ is fixed, and approaches 78°F as γ increases while c is fixed. Obviously, the day 1 forecasts are more sensitive to changes in c than the day 7 forecasts.

Overall, the example in Fig. 9 demonstrates that the autocorrelation of the predictand time series does have an impact on the location of the posterior quantiles regardless of the c value (when l is small) and on the degree of posterior uncertainty when c is large enough.

c. Comprehensive evaluation

1) Methodology

There are four basic models for producing probabilistic forecasts that differ in their use of climatic data and deterministic forecasts:

  1. The climatic model—The forecast of Wk is simply the marginal prior (climatic) distribution function Gk.
  2. The Markov climatic model—The forecast of Wk is the Markov prior (climatic) distribution function Hk(·|wk−1), conditional on the antecedent observation Wk−1 = wk−1.
  3. The BPF—The forecast of Wk is the posterior distribution function Φk(·|xk), conditional on the deterministic forecast Xk = xk, and derived from the marginal prior distribution function Gk.
  4. The Markov BPF—The forecast of Wk is the posterior distribution function Φk(·|xk, wk−1), conditional on the deterministic forecast Xk = xk, and derived from the Markov prior distribution function Hk(·|wk−1), which is conditional on the antecedent observation Wk−1 = wk−1.

The objective is to evaluate how the Markov BPF improves upon the other three models, when the improvement is measured in terms of a reduction of uncertainty from the viewpoint of a decision maker. Formally, let t = w0.75 − w0.25 be the width of a 50% CCI, let m (m = 1, 2, 3, 4) be the index of the model, and let t(m) be the width of the 50% CCI under the distribution function output as the forecast by model m. Then the percent reduction in the width of the 50% CCI achieved by the Markov BPF (m = 4) relative to model m (m = 1, 2, 3) is

R_{4m} = \frac{t(m) - t(4)}{t(m)} \times 100\%.

For each m, the surface of R4m in the space of the informativeness score γ and the autocorrelation coefficient c is depicted by isoquants drawn at 10% increments. Of course, the values of R4m depend upon the values of X and W0; however, the pattern of isoquants is essentially invariant. For this reason, only the results for X = 60°F and W0 = 64°F are discussed.

2) Markov BPF versus climatic model

This comparison serves to evaluate gains from two simultaneous predictors, X and W0. The surface of R41 (Fig. 10) shows essentially a symmetric influence of γ and c: when γ = c, the predictors X and W0 are essentially exchangeable. For example, the Markov BPF with c = 0 and γ = 0.6 (effectively using only predictor X) gives R41 of about 20%, which is the same as would be given by the Markov BPF with c = 0.6 and γ = 0 (effectively using only predictor W0).

Finally, when either γ increases, or c increases, or both γ and c increase, the spacing of the R41 isoquants decreases, implying the increasing marginal gain in uncertainty reduction.

3) Markov BPF versus Markov climatic model

This comparison serves to evaluate gains from the deterministic forecast X as the second predictor, adjoining the antecedent observation W0. The surface of R42 (Fig. 11) shows (i) that, for a fixed c, an increase in γ reduces the uncertainty; (ii) that, to achieve a sizable reduction (say 10%), γ must be larger than a threshold (which increases with c); (iii) that the marginal gain in R42 increases as γ approaches 1; and (iv) that, as the autocorrelation rises, the informativeness must also be rising, at an increasing rate, to maintain a constant gain in R42.

4) Markov BPF versus BPF

This comparison serves to evaluate gains from the antecedent observation W0 as the second predictor, adjoining the deterministic forecast X. The surface of R43 (Fig. 12) shows (i) that, for a fixed γ, an increase in c reduces the uncertainty; (ii) that, to achieve a sizable reduction (say 10%), c must be larger than a threshold (which increases with γ); (iii) that the marginal gain in R43 increases as c approaches one; and (iv) that, as the informativeness rises, the autocorrelation must also be rising, at an increasing rate, to maintain a constant gain in R43.

Finally, Fig. 13 shows two surfaces of R43, which result from “contradictory” predictor values, giving an idea about the range of variability of R43, and confirming the essential invariance of the pattern of isoquants.

d. Conclusions

The Markov BPF offers several advantages over the BPF and, of course, over the two climatic models. When the informativeness of the deterministic forecast is high but the autocorrelation of the predictand series is low, the Markov BPF automatically gives more weight to the deterministic forecast. When the informativeness is low but the autocorrelation is high, the Markov BPF gives more weight to the antecedent observation. Thus in principle, the Markov BPF should always be preferred over the BPF because it can automatically account for any level of forecast informativeness and any degree of predictand autocorrelation. But as the development of the Markov BPF entails a greater cost than the development of the BPF, the following practical and preliminary (because of the limited scope of this study) guidelines may be offered. Depending upon the level of informativeness of the deterministic forecast, the Markov BPF should be considered a contender for operational implementation when the autocorrelation of the predictand time series is approximately between 0.3 and 0.6, and should be considered the preferred processor when the autocorrelation exceeds 0.6. (Nota bene: the above values of the autocorrelation coefficient pertain to variates which have been suitably transformed and conform to the multivariate Gaussian distribution.)

7. Closure

Three contributions to the field of probabilistic forecasting of nonstationary, discrete-time, continuous-state stochastic processes in meteorology have been presented. The first one is the meta-Gaussian Markov model; it characterizes the (nonstationary) autocorrelation of the process and provides, for each day (or some other suitable time step) a family of the (prior) l-step transition distribution functions, from which a climatic probabilistic forecast with the lead time of l days can be obtained, given an antecedent observation. The second one is the meta-Gaussian Markov BPF; it fuses the (prior) climatic probabilistic forecast with a deterministic forecast produced by any system (such as a numerical weather prediction model, a human forecaster, or a statistical postprocessor), and outputs a probabilistic forecast of the predictand; this forecast is in the form of a (posterior) l-step transition distribution function, which quantifies the uncertainty about the predictand that remains, given the antecedent observation and the deterministic forecast. The third one is the demonstration that the climatic autocorrelation of the predictand time series, when suitably exploited within the Bayesian forecast processor, can play a significant role in quantifying and in reducing the meteorological forecast uncertainty. Further research should, therefore, consider extending the meta-Gaussian Markov BPF so that a probabilistic forecast could be obtained from an ensemble of deterministic forecasts.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant ATM-0641572, “New Statistical Techniques for Probabilistic Weather Forecasting.” The Meteorological Development Laboratory of the National Weather Service provided the data.

REFERENCES

  • Brown, B. G., R. W. Katz, and A. H. Murphy, 1984: Time series models to simulate and forecast wind speed and wind power. J. Climate Appl. Meteor., 23, 1184–1195.

  • Campbell, S. D., and F. X. Diebold, 2005: Weather forecasting for weather derivatives. J. Amer. Stat. Assoc., 100, 6–16.

  • Gneiting, T., K. Larson, K. Westrick, M. G. Genton, and E. Aldrich, 2006: Calibrated probabilistic forecasting at the Stateline Wind Energy Center: The regime-switching space–time method. J. Amer. Stat. Assoc., 101, 968–979.

  • Kelly, K. S., and R. Krzysztofowicz, 1995: Bayesian revision of an arbitrary prior density. Proc. Section on Bayesian Statistical Science, Alexandria, VA, American Statistical Association, 50–53.

  • Kelly, K. S., and R. Krzysztofowicz, 1997: A bivariate meta-Gaussian density for use in hydrology. Stochastic Hydrol. Hydraul., 11, 17–31.

  • Krzysztofowicz, R., 1983: A Bayesian Markov model of the flood forecast process. Water Resour. Res., 19, 1455–1465.

  • Krzysztofowicz, R., 1985: Bayesian models of forecasted time series. Water Resour. Bull., 21, 805–814.

  • Krzysztofowicz, R., 1992: Bayesian correlation score: A utilitarian measure of forecast skill. Mon. Wea. Rev., 120, 208–219.

  • Krzysztofowicz, R., and A. A. Sigrest, 1999: Calibration of probabilistic quantitative precipitation forecasts. Wea. Forecasting, 14, 427–442.

  • Krzysztofowicz, R., and K. S. Kelly, 2000: Hydrologic uncertainty processor for probabilistic river stage forecasting. Water Resour. Res., 36, 3265–3277.

  • Krzysztofowicz, R., and H. D. Herr, 2001: Hydrologic uncertainty processor for probabilistic river stage forecasting: Precipitation-dependent model. J. Hydrol., 249, 46–68.

  • Krzysztofowicz, R., and W. B. Evans, 2008: Probabilistic forecasts from the National Digital Forecast Database. Wea. Forecasting, 23, 270–289.

  • Lund, R., Q. Shao, and I. Basawa, 2006: Parsimonious periodic time series modeling. Aust. N. Z. J. Stat., 48, 33–47.

  • Murphy, A. H., and R. W. Katz, 1985: Probability, Statistics, and Decision Making in the Atmospheric Sciences. Westview Press, 545 pp.

  • Peroutka, M. R., G. J. Zylstra, and J. L. Wagner, 2005: Assessing forecast uncertainty in the National Digital Forecast Database. Preprints, 21st Conf. on Weather Analysis and Forecasting, Washington, DC, Amer. Meteor. Soc., P2B.3. [Available online at http://ams.confex.com/ams/pdfpapers/94464.pdf]

  • Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences: An Introduction. Academic Press, 467 pp.

APPENDIX

Forecasting Experiment

Objective and design

The three contributions presented in this article are theoretic. How to implement them effectively in operational forecasting is a separate issue that cannot be treated thoroughly in the same article, if only because of its length. Yet a reviewer craved some verification results. Therefore, we performed a simple forecasting experiment using solely the data and the estimates already reported. The objective of this experiment is to illustrate the coherence and the robustness of our Bayesian theory on real data.

The four basic models compared in section 6c are employed to produce probabilistic forecasts of the daily maximum temperature in Savannah. The parameters of these models are set to the estimates reported in this article. To recall, the climatic model and the Markov climatic model have their parameters estimated (section 3) for each day of the year from a sample of size M = 595 recorded in 119 yr (1874–2001). The BPF and the Markov BPF have their likelihood parameters (Tables 2 and 3) estimated for a cool season and a warm season, and for each lead time, from a sample of size N, varying between 38 and 116, recorded within 1.5 yr: October 2004–January 2005 for the cool season and April–July 2005 for the warm season.

Forecasts with lead time of l days (l = 1, 4, 7) are next produced by each of the four models for every day for which the joint realization (xkl, wk, wk−l) is available but was not included in the joint sample for the likelihood parameter estimation. Thereby the verification sample comprises days from 9 months: (i) February–March 2005 and October 2005–February 2006 in cool season and (ii) August–September 2005 in warm season. Four of the months (February, March, August, September) were not represented at all in the joint sample for the likelihood parameter estimation. Thus, the test is unfavorable to the BPF and the Markov BPF because it presumes the stationarity of the likelihood function during two months beyond the sampling window—which is a gross approximation, especially for longer lead times (here l = 4, 7), as evidenced by the differences in the likelihood parameter estimates for the cool season and the warm season (Tables 2 and 3).
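For concreteness, a minimal sketch of this selection rule is given below in Python. The tabular layout, the column names, and the used_for_estimation flag are illustrative assumptions for a fixed lead time, not part of the data files described above.

```python
import pandas as pd

def verification_sample(records: pd.DataFrame) -> pd.DataFrame:
    """Select verification days for one fixed lead time l.

    Assumed layout: one row per target day k with columns
      'x'     - forecast issued l days earlier (x_kl),
      'w'     - verifying daily maximum temperature (w_k),
      'w_lag' - antecedent observation l days earlier (w_k-l),
      'used_for_estimation' - True if the day entered the joint sample
        used for the likelihood parameter estimation.
    """
    # Keep only days with a complete (forecast, observation, antecedent
    # observation) triple that were excluded from the estimation window.
    complete = records[["x", "w", "w_lag"]].notna().all(axis=1)
    return records.loc[complete & ~records["used_for_estimation"]]
```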

Verification measures

Parallel to the comparison in section 6, each model outputs the simplest probabilistic forecast, consisting of three p-probability quantiles of Wk (p = 0.25, 0.5, 0.75). The calibration of a p-probability quantile is evaluated in terms of rp, the relative frequency with which the quantile is not exceeded by the predictand realization. The calibration of the forecast is evaluated in terms of the calibration score CS (Krzysztofowicz and Sigrest 1999). The score is bounded, 0 ≤ CS ≤ 0.677, with CS = 0 being best; it represents the Euclidean distance between the empirical and the normative nonexceedance probabilities.
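A minimal sketch of these two calibration measures follows. The empirical nonexceedance frequencies rp are computed directly from the verification sample; the score is taken here as the plain Euclidean distance between the empirical and the nominal nonexceedance probabilities, as described above. The function name and this exact form are assumptions, and the normalization of CS in Krzysztofowicz and Sigrest (1999) may differ.

```python
import numpy as np

def calibration_measures(quantiles, observations, probs=(0.25, 0.5, 0.75)):
    """Empirical nonexceedance frequencies r_p and a Euclidean-distance
    calibration measure for p-probability quantile forecasts.

    quantiles    : array (n_days, len(probs)), forecast quantiles per day.
    observations : array (n_days,), verifying realizations of the predictand.
    """
    q = np.asarray(quantiles, dtype=float)
    w = np.asarray(observations, dtype=float)
    # r_p: fraction of days on which the realization did not exceed the quantile
    r = (w[:, None] <= q).mean(axis=0)
    # Euclidean distance between empirical and nominal probabilities
    # (assumed form; the published CS may be normalized differently).
    score = float(np.linalg.norm(r - np.asarray(probs, dtype=float)))
    return r, score
```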

The sharpness of the forecast (equivalently, the degree of uncertainty indicated by the forecast about the predictand) is evaluated in terms of AW, the average width of the 50% CCI, in degrees Fahrenheit. Of course, AW is meaningful provided the 50% CCI is well calibrated; that is, r0.75 − r0.25 = 0.5, at least approximately.
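A corresponding sketch of the sharpness measure, assuming the 0.25- and 0.75-probability quantiles are available for every verification day (the units follow the inputs, here degrees Fahrenheit):

```python
import numpy as np

def average_width(q25, q75):
    """Average width (AW) of the 50% central credible interval."""
    # Mean of the per-day interval widths q_0.75 - q_0.25.
    return float(np.mean(np.asarray(q75, dtype=float) - np.asarray(q25, dtype=float)))
```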

The informativeness of the forecast is evaluated in terms of IS—the informativeness score calculated from the joint realizations of the forecast median (which in each of the four models carries all predictive information there is) and the predictand (Krzysztofowicz 1992). The score is bounded, 0 ≤ IS ≤ 1, with IS = 0 for an uninformative forecast model and IS = 1 for a perfect forecast model; it ranks the forecast models consistently with their economic values (which would be output from a Bayesian decision procedure with any utility function and any prior distribution function). Because the time series of the predictand within the 9-month verification window is obviously nonstationary (Krzysztofowicz and Evans 2008, their Fig. 2), the IS is calculated from the joint sample of the forecast-predictand vector in the standardized sample space; and because the marginal distribution function of each standardized variate is non-Gaussian, the joint sample is processed through the empirical NQT (Krzysztofowicz 1992, his appendix D).

[Note that the standardization of the forecast-predictand vector (i) does not affect the values of rp and CS and (ii) does not alter the ranking of AW values across the forecast models, but only changes their scale from degrees Fahrenheit to dimensionless. Therefore, all three measures—CS, AW, and IS—are compatible.]
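The empirical NQT step can be sketched as follows. The mapping from the normal-space correlation to the informativeness score follows Krzysztofowicz (1992) and is not reproduced here, so this sketch stops at the correlation of the transformed forecast median and predictand.

```python
import numpy as np
from scipy.stats import norm, rankdata

def empirical_nqt(sample):
    """Empirical normal quantile transform: map a sample to standard
    normal scores via the plotting positions rank/(n + 1)."""
    x = np.asarray(sample, dtype=float)
    return norm.ppf(rankdata(x) / (x.size + 1))

def normal_space_correlation(forecast_median, observations):
    """Pearson correlation of the NQT-transformed forecast median and
    predictand realizations; the informativeness score IS is derived
    from this correlation in Krzysztofowicz (1992)."""
    z_x = empirical_nqt(forecast_median)
    z_w = empirical_nqt(observations)
    return float(np.corrcoef(z_x, z_w)[0, 1])
```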

Hypotheses

For the months covered in this forecasting experiment, the autocorrelation coefficients ck are in the range 0.52–0.72 (Fig. 4) and the informativeness scores γk1 are about 0.9 (Table 3). When these values are inserted into Figs. 12 and 13, they yield the hypothesis that the impact of the climatic autocorrelation on forecasts of the daily maximum temperature at Savannah is negligible.

The second hypothesis is that the theoretical coherence between the four models is preserved despite the pitifully small joint samples (Tables 2 and 3) for the likelihood parameter estimation of the BPF and the Markov BPF. The coherence means (i) a stable calibration of models for all lead times and (ii) a stable preference order of models (climatic, Markov climatic, BPF, Markov BPF) in terms of sharpness and informativeness for all lead times.

The third hypothesis is that the Markov BPF is robust in that it performs no worse than the BPF despite (i) the experimental conditions in which its advantage is negligible, (ii) the two extra parameters (ck, dkl), and (iii) the potentially large parameter uncertainty due to the small joint samples.

Results

The calibration of the three quantiles of the climatic model (Table A1) reveals that the 9-month verification period was somewhat warmer than the historical period: the prior (climatic) distribution would have to be shifted to the right (so that the climatic median becomes the 0.43-probability quantile) to match the relative nonexceedance frequencies during the verification period. (Whether this shift is caused by a random fluctuation, and therefore is unpredictable, or by a long-term trend, and therefore can be accounted for, is of no interest herein.)

What is of interest is the stability of the calibration of all three quantiles across the models and the lead times (Table A1). Taking the calibration of the climatic model as the standard, the other three models are calibrated nearly as well, with the exception of the 0.25-probability quantiles from the BPF and the Markov BPF at l = 7; but even in these cases the largest relative miscalibration is only 0.08, and it can be traced to the warm season for which the samples were the smallest (the estimation sample of size 38; the verification sample of size 44).

Despite the warmer-than-normal verification period, the 50% CCI is about equally well calibrated across the models and the lead times (except for the Markov BPF at l = 7, where the miscalibration of 0.08 is the largest, for the reason explained above).

For the calibration of the forecast (all three quantiles), the CS of the climatic model (Table A2) sets the standard: its values reflect the warmer-than-normal verification period, and its variability across the lead times reflects the small verification samples. Relative to this standard, the Markov climatic model is calibrated equally well, whereas the BPF and the Markov BPF are either slightly better or slightly worse calibrated, depending on the lead time. Based strictly on the CS, the best-calibrated model is the BPF for l = 1, the Markov BPF for l = 4, and the Markov climatic model for l = 7. This shows that the complexity of the Bayesian forecasting models does not degrade their calibration, unless the estimation sample is pitifully small (as for l = 7 in the warm season), but even then the maximum miscalibration relative to the standard remains small (0.099 − 0.062 = 0.037).

The ranking of the four models in terms of AW is coherent. Theoretically, the climatic model has a constant AW independent of the lead time l, but because the verification sample size varies slightly with l, so does AW. For each of the other three models, AW should not decrease with l, and it does not. For each lead time l, AW should not increase with model complexity, and it does not.

The ranking of the four models in terms of IS is also coherent. Theoretically and practically, the climatic model is uninformative (IS = 0). For each of the other three models, IS should not increase with l, and it does not. For each lead time l, IS should not decrease with model complexity, and it does not, except for l = 7 where the order of scores 0.509 and 0.508 is reversed. This is a numerical fluke, and with the verification sample size of 213, the difference of 0.0009 is insignificant: the BPF and the Markov BPF are equally informative.

Conclusions

The confirmation of the first hypothesis illustrates the advantage of the theoretical analysis from section 6: it can predict the potential impact of the climatic autocorrelation in the Markov BPF without the need for forecasting experiments.

The confirmation of the second and the third hypotheses illuminates two unique properties of our Bayesian forecasting theory: coherence and robustness. In particular, when the theory is properly implemented (using proper parametric models and proper estimation procedures), the four forecasting models (i) are about equally well calibrated and (ii) are progressively more informative: climatic model, Markov climatic model, BPF, Markov BPF. Thus, the seeming complexity of the BPF and of the Markov BPF is not a hindrance, despite the pitifully small joint samples for the likelihood parameter estimation, and despite the testing conditions being unfavorable to them (as explained in sections a and c of this appendix).

Overall, this forecasting experiment sheds some light on the empirical properties of our Bayesian forecasting models. But the full power of their coherence and robustness is yet to be demonstrated and appreciated.

Fig. 1. Dependence structures (prior, likelihood, and posterior) in the BPF for a Markov predictand process and the forecast lead time of 2 days.

Fig. 2. Dependence structure of the prior 1-step transition distribution function of the daily maximum temperature: the meta-Gaussian regression with the 80% central credible interval in the standardized sample space for days (a) k = 32 and (b) k = 214; the linear regression with the 80% central credible interval in the normal sample space for days (c) k = 32 and (d) k = 214 at Savannah.

Fig. 3. Validation of the meta-Gaussian dependence structure for the Markov model of the daily maximum temperature: homoscedasticity of dependence for days (a) k = 32 and (b) k = 214; normality of residuals for days (c) k = 32 and (d) k = 214 at Savannah. [A high quantile–quantile correlation suggests approximate normality of residuals even though the null hypothesis is rejected by the Shapiro–Francia test in each case at p < 0.003 because of the sensitivity of the test for large sample sizes, here 595. The pattern in (b) and the tails in (d) are the artifacts of the precision of measurement.]

Fig. 4. Time series of the sample autocorrelation coefficient (in the normal sample space) of the daily maximum temperature, the envelope, and the fitted fourth-order Fourier series expansion at Savannah.

Fig. 5. Prior marginal distribution function Gk and prior l-step transition distribution functions Hkl conditional on antecedent observations Wkl = wkl for two lead times: (a) 24 h (l = 1) for 1 Feb (k = 32) and (b) 96 h (l = 4) for 4 Feb (k = 35). Also shown are density functions corresponding to the distribution functions (c) from (a) and (d) from (b). This figure is for the cool season at Savannah.

Fig. 6. Example of the Markov prior distribution having insignificant effect: (a) two posterior distribution functions resulting from the marginal prior distribution function and (b) two posterior distribution functions resulting from the Markov prior distribution function. Also shown are density functions corresponding to the distribution functions (c) from (a) and (d) from (b). This figure is for 24-h lead time and cool season at Savannah.

Fig. 7. Example of the Markov prior distribution having significant effect: (a) two posterior distribution functions resulting from the marginal prior distribution function and (b) two posterior distribution functions resulting from the Markov prior distribution function. Also shown are density functions corresponding to the distribution functions (c) from (a) and (d) from (b). This figure is for 96-h lead time and warm season at Savannah.

Fig. 8. The width of the posterior 50% CCI, resulting from the Markov prior distribution, as a function of informativeness score γ, autocorrelation coefficient c, and lead time l, when the predictors (X, W0) take on (a)–(c) “confirmatory” values and (d)–(f) “contradictory” values.

Fig. 9. The posterior median, w0.5, and the 50% CCI, (w0.25, w0.75), resulting from the Markov prior distribution, as a function of informativeness score γ, autocorrelation coefficient c, and lead time (l = 1 day, solid line; l = 7 days, broken line), when the predictors take on “contradictory” values: X = 78°F, W0 = 46°F.

Fig. 10. The percent reduction in the width of the 50% CCI using the Markov BPF instead of the climatic model for forecasting with lead time of 1 day, given predictor values X = 60°F and W0 = 64°F.

Fig. 11. The percent reduction in the width of the 50% CCI using the Markov BPF instead of the Markov climatic model for forecasting with lead time of 1 day, given predictor values X = 60°F and W0 = 64°F.

Fig. 12. The percent reduction in the width of the 50% CCI using the Markov BPF instead of the BPF for forecasting with lead time of 1 day, given predictor values X = 60°F and W0 = 64°F.

Fig. 13. The percent reduction in the width of the 50% CCI using the Markov BPF instead of the BPF for forecasting with lead time of 1 day, given two diametrical vectors of “contradictory” predictor values.

Table 1. Estimates of the autocorrelation coefficients obtained from the climatic sample at Savannah.

Table 2. Estimates of the likelihood parameters for the Markov BPF obtained from the sampling windows ending before a cool day (31 Jan, k = 31) and a warm day (1 Aug, k = 213) at Savannah.

Table 3. Estimates of the likelihood parameters for the BPF obtained from the sampling windows ending before a cool day (31 Jan, k = 31) and a warm day (1 Aug, k = 213) at Savannah.

Table A1. Calibration of the three quantiles and the 50% CCI.


Table A2. The CS, average width of the 50% CCI (AW) in degrees Fahrenheit, and IS.
