## 1. Introduction

Correctly predicting forecast uncertainty can bring significant economic benefits to many decision makers (AMS 2008). Unlike a deterministic forecast, which supplies only the expected weather outcome, a probabilistic forecast gives the likelihood of occurrence of all outcomes. Decisions are based on combining the relative risks of various weather outcomes with the costs and losses corresponding to those outcomes. Thus, probabilistic forecasts are naturally preferred for economic decision making.

Let $f_t(x)$ be the forecasted probability density function (PDF) of a continuous meteorological variable $X$ (such as temperature) valid for time $t$. One can generate $f_t(x)$ from an ensemble of numerical weather prediction (NWP) models by using methods such as Bayesian model averaging (Raftery et al. 2005), the binned probability ensemble technique (Anderson 1996), the method of moments (Jewson et al. 2005), or local quantile regression (Bremnes 2004).

Let $F_t(x)$ denote the forecasted cumulative distribution function (CDF) given by

$$F_t(x) = \int_{-\infty}^{x} f_t(s)\,ds. \tag{1}$$

Let $x_t$ denote the observed state of $X$ at time $t$. Let $p_t$ denote the CDF value corresponding to the observed state:

$$p_t = F_t(x_t). \tag{2}$$

Here $p_t$ is called the probability integral transform value (PIT value) corresponding to the observation.

We will assume an operational ensemble forecasting system initialized at time $t = 0$ that gives hourly forecasts out to time $t = T$. At times $t$, where $0 \le t \le T$, hourly observations from observing stations are made available, but the models do not incorporate these observations until the next forecast cycle starts.
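As a minimal sketch of the PIT value defined above, the following Python snippet evaluates a forecast CDF at an observation. The Gaussian forecast, its parameters, the observation, and the helper name `pit_value` are all illustrative assumptions and are not taken from the study.

```python
from statistics import NormalDist


def pit_value(forecast: NormalDist, observation: float) -> float:
    """Return the PIT value: the forecast CDF evaluated at the observation."""
    return forecast.cdf(observation)


# Hypothetical Gaussian temperature forecast (deg C); values are illustrative only.
forecast = NormalDist(mu=12.0, sigma=2.0)
p_t = pit_value(forecast, observation=13.35)
print(f"PIT value: {p_t:.2f}")  # prints "PIT value: 0.75"
```

An observation equal to the forecast median gives a PIT value of 0.5, while observations in the upper or lower tail give values near 1 or 0.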

Figure 1a shows a sample temperature CDF forecast for a single location produced from an ensemble. At the time the figure was produced, observations up to 1000 UTC were available. The figure shows that the PIT value (the CDF value on which the observation verifies) is highly correlated in time (Fig. 1c). Given that the most recent PIT value (at 1000 UTC) is 0.75, the next PIT value (at 1100 UTC) will likely be near 0.75.

The probability distribution can therefore be refined to take into account this new information that was not available at the time the model was initialized. The effects of the most recent observation will diminish for longer lead times. The updated probability distribution will therefore be narrow near the time of the observation and widen back to the original distribution for times in the future (Fig. 1b).

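We therefore propose updating the original forecast CDF $F_t(x)$ by a function $\Phi$ as follows:

$$\hat{F}_t(x) = \Phi[F_t(x)], \tag{3}$$

where $\hat{F}_t(x)$ is the updated CDF.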

Postprocessing weather forecasts is commonly done to increase the correspondence between forecasts and observations. For deterministic forecasts, methods such as model output statistics (Glahn and Lowry 1972), Kalman filtering (Homleid 1995), and analog methods (Delle Monache et al. 2011) are commonly used to reduce forecast error. On the other hand, methods such as ensemble calibration (Hamill and Colucci 1998) and Bayesian model averaging (Raftery et al. 2005) can be used to improve probabilistic forecasts from an ensemble of deterministic forecasts. The method presented here also aims to improve probabilistic forecasts, but differs in that it is only invoked once observations are available after the raw forecasts are created. It is therefore of most use for operational short-term forecasts.

This paper is organized as follows: the method for updating probabilistic forecasts is presented in section 2, the dataset and verification metric used for testing the method are described in section 3, the performance of the method is evaluated in section 4, and conclusions are drawn in section 5.

## 2. Method

Assume that for a given forecast day, $T + 1$ hourly probabilistic forecasts $F_t(x)$ (where $0 \le t \le T$) are produced. Let $t_{\mathrm{obs}}$ denote the time at which the most recent observation was made. This observation is then used to update all hourly forecasts that are still in the future (i.e., where $t_{\mathrm{obs}} < t \le T$).

The forecast CDF valid $n$ hours after $t_{\mathrm{obs}}$, that is for time $t = t_{\mathrm{obs}} + n$, can be updated according to

$$\hat{F}_t(x) = \Phi_n[F_t(x)]. \tag{4}$$

The function $\Phi_n(p)$ will in general be different for each value of $n$ and can be constructed based on forecast and observation data prior to the time $t_{\mathrm{obs}}$. Here, $\Phi_n(p)$ is the probability that the verifying PIT value of the original forecast will be less than $p$.

Differentiating Eq. (4) with respect to $x$ gives the updated PDF:

$$\hat{f}_t(x) = \Psi_n[F_t(x)]\,f_t(x), \tag{5}$$

where $\Psi_n(p)$ is the derivative of $\Phi_n(p)$ and acts as an amplification factor for the original PDF. We note that $\Psi_n(p)$ increases probability density in regions where the PIT value is more likely to occur given the recent observation. That is, $\Psi_n(p)$ is also the probability density of $p$ being the verifying PIT value of the original forecast.
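The update described above (a function of the original CDF value, whose derivative amplifies the original PDF) can be sketched as follows. This is only an illustration: the Gaussian original forecast and the stand-in functions `phi_demo` and `psi_demo` (with Φ(p) = p² and Ψ(p) = 2p) are hypothetical choices made so that the result can be checked by hand. They are not the functions derived later in this section.

```python
from statistics import NormalDist


def updated_pdf(forecast: NormalDist, psi, x: float) -> float:
    """Updated density: the original PDF amplified by psi at the original CDF value."""
    return psi(forecast.cdf(x)) * forecast.pdf(x)


def updated_cdf(forecast: NormalDist, phi, x: float) -> float:
    """Updated CDF: phi applied to the original CDF value."""
    return phi(forecast.cdf(x))


# Hypothetical stand-ins (NOT the functions derived in the text): phi(p) = p^2, psi = phi'.
def phi_demo(p: float) -> float:
    return p * p


def psi_demo(p: float) -> float:
    return 2.0 * p


forecast = NormalDist(mu=10.0, sigma=2.0)  # illustrative original forecast

# Riemann-sum check that the updated PDF still integrates to about 1.
dx = 0.01
area = sum(updated_pdf(forecast, psi_demo, -10.0 + i * dx) for i in range(4001)) * dx
print(f"area under updated PDF: {area:.4f}")
print(f"updated CDF at the original median: {updated_cdf(forecast, phi_demo, 10.0):.2f}")
```

Because the amplification factor is itself a probability density on [0, 1], the updated PDF remains normalized regardless of the shape of the original forecast.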

### a. PIT values as a random walk in time

We model the time sequence of verifying PIT values within one forecast cycle as a random walk in time. Mirror barriers at 0 and 1 are used to handle the fact that PIT values are bounded on the interval [0, 1]. That is, any random steps across the boundaries are reflected back into the domain (Fig. 2). Mirror barriers are commonly used to describe stochastic processes in other areas of modeling [Karlin and Taylor (1981); see also Rose (1995) for applications in economics].

Fig. 2. (a) A hypothetical time series of verifying PIT values (solid line). Mirror barriers at 0 and 1 reflect any steps back into the domain. The dashed line shows the PIT time series without reflections. The transition from time 3 to 4 involves a reflection across 1 as shown by the arrows. (b) The PDF (thick solid line) of the PIT value for time 9, given that the PIT value at time 8 was 0.80. The dashed line shows the probability of the Gaussian distribution that has been reflected back into the domain.

Citation: Weather and Forecasting 26, 4; 10.1175/WAF-D-11-00022.1

Let $\Psi_n(p)$ be the probability density function of the verifying PIT value being $p$ at $n$ hours after $t_{\mathrm{obs}}$. When $n = 0$, the PIT value is fully known and can therefore be described by

$$\Psi_0(p) = \delta(p - p_{t_{\mathrm{obs}}}), \tag{6}$$

where $\delta$ is the Dirac delta function defined by

$$\delta(x) = \begin{cases} +\infty, & x = 0 \\ 0, & x \ne 0 \end{cases} \tag{7}$$

and

$$\int_{-\infty}^{\infty} \delta(x)\,dx = 1. \tag{8}$$

Let $S(p, q)$ represent the probability density of arriving at a PIT value of $p$, given that the previous PIT value was $q$. Since our stochastic model for PIT values is a first-order Markov model, the probability of a certain PIT at time $n$ can be found from all transitions to that PIT from time $n - 1$. The probability density after a transition can therefore be determined by the following recursive equation:

$$\Psi_n(p) = \int_0^1 S(p, q)\,\Psi_{n-1}(q)\,dq. \tag{9}$$

### b. Determining the transition function

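We assume that the random steps, before any reflection, are Gaussian distributed with zero mean and variance $\sigma^2$. That is, the transition function $S$ can be constructed as follows:

$$S(p, q) = \varphi(p; q; \sigma^2) + \varphi(-p; q; \sigma^2) + \varphi(2 - p; q; \sigma^2) + \cdots \tag{10}$$

$$S(p, q) = \sum_{i=-\infty}^{\infty} \left[\varphi(p + 2i; q; \sigma^2) + \varphi(-p + 2i; q; \sigma^2)\right], \tag{11}$$

where $\varphi(x; \mu; \sigma^2)$ is a Gaussian PDF with mean $\mu$ and variance $\sigma^2$. The first term in Eq. (10) comes from steps within the domain, the second comes from steps reflected across 0, and the third term comes from steps reflected across 1. Equation (11) includes all possible steps, including steps that cross both boundaries one or more times.

The transition function for $n$ number of steps can also be constructed and is denoted by $S_n$. The variance of multiple steps (under the assumed model) increases linearly with time, and $S_n$ can therefore be computed by

$$S_n(p, q) = \sum_{i} \left[\varphi(p + 2i; q; n\sigma^2) + \varphi(-p + 2i; q; n\sigma^2)\right]. \tag{12}$$

Since $\sigma$ is small in our study (around 0.15), and we use values of $n$ no larger than 24, we restrict the summation to $i \in [-10, 10]$. A wider range for $i$ may be required for large $\sigma$ and $n$ values.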
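The reflected-Gaussian transition function described above can be sketched in a few lines of Python. This is an illustrative reading of the text rather than the authors' code: the function names, the image-sum form (Gaussian images at p + 2i and −p + 2i), and the starting PIT value of 0.8 are assumptions. The check at the end confirms numerically that the density integrates to about 1 over [0, 1] when the sum is truncated to i ∈ [−10, 10].

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Gaussian density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def transition_density(p: float, q: float, sigma: float, n: int = 1,
                       max_images: int = 10) -> float:
    """Density of PIT value p, n steps after PIT value q, with mirrors at 0 and 1."""
    var = n * sigma ** 2  # variance grows linearly with the number of steps
    total = 0.0
    for i in range(-max_images, max_images + 1):
        total += gaussian_pdf(p + 2 * i, q, var)   # image of p shifted by 2i
        total += gaussian_pdf(-p + 2 * i, q, var)  # mirrored image of p
    return total


def integrate_over_unit_interval(func, steps: int = 1000) -> float:
    """Midpoint-rule integral of func over [0, 1]."""
    h = 1.0 / steps
    return sum(func((k + 0.5) * h) for k in range(steps)) * h


sigma = 0.15  # step size of the order reported in the text
for n in (1, 6, 24):
    area = integrate_over_unit_interval(lambda p: transition_density(p, 0.8, sigma, n))
    print(n, round(area, 6))
```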

Using $S_n$ allows us to simplify Eq. (9) to the following:

$$\Psi_n(p) = \int_0^1 S_n(p, q)\,\delta(q - p_{t_{\mathrm{obs}}})\,dq \tag{13}$$

$$\Psi_n(p) = S_n(p, p_{t_{\mathrm{obs}}}), \tag{14}$$

where $p_{t_{\mathrm{obs}}}$ is the verifying PIT value at time $t_{\mathrm{obs}}$. This simplification avoids the need to recursively compute $\Psi_n$ [as in Eq. (9)]. Note that for forecast variables that require a non-Gaussian transition function, it is possible that Eq. (12) cannot be constructed analytically, in which case the above simplification may not be possible.

Figure 3 shows an example sequence of $\Psi_n(p)$ for various values of $n$. The PIT value distribution clearly widens as time goes on, indicative of the disappearing effects of the last observed PIT value.
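The following sketch evaluates the PIT distribution directly from the n-step transition function and compares it with one numerical application of the recursive equation. The function names, the grid resolution, and the values σ = 0.15 and an observed PIT of 0.7 are assumptions chosen for illustration. The image-sum form of the transition function is likewise a reading of the text, not the authors' implementation.

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Gaussian density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def psi_n(p: float, p_obs: float, sigma: float, n: int, max_images: int = 10) -> float:
    """PIT density n hours after an observed PIT value p_obs (reflected Gaussian)."""
    var = n * sigma ** 2
    return sum(
        gaussian_pdf(p + 2 * i, p_obs, var) + gaussian_pdf(-p + 2 * i, p_obs, var)
        for i in range(-max_images, max_images + 1)
    )


def one_recursion_step(prev_density, p: float, sigma: float, steps: int = 400) -> float:
    """Apply the recursion once: integrate S(p, q) * prev_density(q) over q in [0, 1]."""
    h = 1.0 / steps
    midpoints = [(k + 0.5) * h for k in range(steps)]
    return sum(psi_n(p, q, sigma, 1) * prev_density(q) for q in midpoints) * h


sigma, p_obs = 0.15, 0.7  # illustrative values

# Direct two-step density versus one recursion step applied to the one-step density.
direct = psi_n(0.6, p_obs, sigma, n=2)
recursive = one_recursion_step(lambda q: psi_n(q, p_obs, sigma, n=1), 0.6, sigma)
print(f"direct: {direct:.6f}, recursive: {recursive:.6f}")

# Density at the observed PIT value flattens toward 1 (uniform) as n grows.
peaks = [psi_n(p_obs, p_obs, sigma, n) for n in (1, 3, 12, 24)]
print([round(v, 3) for v in peaks])
```

The agreement between the two values illustrates why the closed-form n-step function removes the need for recursion under the Gaussian step assumption.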

Fig. 3. An example sequence of PDFs of PIT values for different numbers of hours ($n$) after an observation has been made. In this case at $n = 0$, the PIT value is fully known to be 0.7.

### c. Parameter estimation

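An estimate of $\sigma^2$ is needed by Eq. (12). The variance of the step sizes of past PIT values can be measured over the $N$ hourly PIT transitions in the training data by

$$\sigma_0^2 = \frac{1}{N} \sum_{k=1}^{N} (p_k - p_{k-1})^2. \tag{15}$$

However, $\sigma_0^2$ underestimates $\sigma^2$ since some steps will appear to be short steps when in fact they are longer steps that have reflected across a boundary.

For a given $\sigma$, the expected value of $\sigma_0$ can be computed by the integral over all possible PIT transitions from $p$ to $q$ (here the starting PIT value $p$ is weighted uniformly on $[0, 1]$):

$$\sigma_0^2 = \int_0^1 \int_0^1 (q - p)^2\,S(q, p)\,dp\,dq. \tag{16}$$

Inverting this relationship to obtain $\sigma$ [as required by Eq. (12)] was not possible analytically. We found that the following is a good approximation for $\sigma$ in terms of $\sigma_0$:

$$\sigma = \frac{\tan(3.5\,\sigma_0)}{3.5}, \tag{17}$$

which holds for $\sigma_0$ values up to 0.3 (Fig. 4).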
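A minimal sketch of the parameter estimation step follows. The tangent approximation is the one quoted in the Fig. 4 caption. The root-mean-square form of the measured step size (which assumes zero-mean steps), the function names, and the sample PIT series are illustrative assumptions.

```python
import math


def measured_step_sd(pit_values) -> float:
    """sigma_0: root-mean-square of successive PIT differences (zero-mean steps assumed)."""
    steps = [later - earlier for earlier, later in zip(pit_values, pit_values[1:])]
    return math.sqrt(sum(step * step for step in steps) / len(steps))


def corrected_step_sd(sigma0: float) -> float:
    """Approximate sigma from sigma_0 via tan(3.5 * sigma_0) / 3.5 (stated for sigma_0 <= 0.3)."""
    if not 0.0 <= sigma0 <= 0.3:
        raise ValueError("approximation is only stated for sigma_0 values up to 0.3")
    return math.tan(3.5 * sigma0) / 3.5


# Hypothetical hourly PIT values from one past forecast cycle.
past_pits = [0.62, 0.70, 0.81, 0.77, 0.90, 0.84, 0.71]
sigma0 = measured_step_sd(past_pits)
sigma = corrected_step_sd(sigma0)
print(f"sigma_0 = {sigma0:.3f}, sigma = {sigma:.3f}")
```

The corrected value is always at least as large as the measured one, which is consistent with reflected steps appearing shorter than they are.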

Fig. 4. Standard deviation of PIT step sizes used in the transition function as a function of the measured standard deviation of step sizes of past PIT values (solid line) and the approximation $\sigma = \tan(3.5\sigma_0)/3.5$ (dashed line).

In summary, the process of updating a probabilistic forecast is as follows: the variance of past PIT transition distances ($\sigma_0^2$) is computed by Eq. (15) and is used to approximate $\sigma$ in Eq. (17); $\sigma$ is then used in Eq. (12) to compute the transition function ($S_n$); and the transition function, combined with the latest available verifying PIT value, is used to calculate the PIT distribution ($\Psi_n$) by Eq. (14), which is used to update the original probabilistic forecast through Eq. (5).

## 3. Operational test case

### a. Model data and configuration

Hourly surface temperature forecasts from the Mesoscale Compressible Community (MC2; Benoit et al. 1997) model, the fifth-generation Pennsylvania State University–National Center for Atmospheric Research (Penn State–NCAR) Mesoscale Model (MM5; Grell et al. 1994), and the Weather Research and Forecasting Model (WRF; Skamarock et al. 2005) were used for the case study period: 0000 UTC 1 September 2005–2300 UTC 1 February 2008. Two runs for the WRF model were used: one using Global Forecast System (GFS) initialization (WRFG) and the other using North American Mesoscale Modeling (NAM) model initialization (WRFN), while MC2 and MM5 both used NAM initialization. The MC2 and MM5 runs had outer domains with 108-km grid spacing, and inner 36-, 12-, and 4-km nested domains. The WRF runs were similar, but did not contain the 4-km nested domain. These domains composed our 14-member ensemble.

The models were initialized once per day at 0000 UTC, and hourly forecast output to 60 h was available. Probabilistic forecasts were generated for the same time period.

The model runs and probabilistic forecasts were generally completed by 0600 UTC, after which we used the latest observation to update the probabilistic forecasts valid for the subsequent 24 h. The update process was repeated each hour as a new observation became available. This was done until 0600 UTC the next day, when the probabilistic forecasts from the next forecast cycle were used. This means that for each forecast cycle twenty-four 24-h updated forecasts were produced, yielding 576 forecasts per day.

We tested the method on temperature probabilistic forecasts and observations for the following five airport stations in British Columbia, Canada: Vancouver International Airport station (CYVR), Abbotsford International Airport (CYXX), Victoria International Airport (CYYJ), Kamloops Airport (CYKA), and Kelowna Airport (CYLW). This group of stations provided a geographically diverse sample from within our smallest model domain.

### b. Original probabilistic forecasts

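The original probabilistic forecasts were modeled as Gaussian distributions centered on the bias-corrected ensemble mean:

$$f_t(x) = \varphi(x; \xi_t + \mu; s^2), \tag{18}$$

where $\varphi$ is a Gaussian PDF, $x$ is a temperature value, $\xi_t$ is the ensemble mean at time $t$, $\mu$ is a bias-correction term for the center of the distribution, and $s^2$ is the variance of the distribution.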

The parameters *μ* and *s* were computed separately for each station and separately for each of the 24 forecast hours. They were computed from a 40-day sliding window that ended the day before the forecast was initialized. A training period of 40 days is a compromise between the need to use statistics that adapt quickly to seasonal changes and the requirement to have enough data to robustly estimate the parameters. Similar training lengths have been used to produce probabilistic forecasts using Bayesian model averaging (Raftery et al. 2005; Sloughter et al. 2007).
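The text does not spell out how the two parameters were estimated from the training window, so the sketch below makes an explicit assumption: the bias term is taken as the mean of past errors (observation minus ensemble mean) and the spread as the standard deviation of those errors. The function name `fit_gaussian_forecast` and the sample values are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def fit_gaussian_forecast(past_ens_means, past_obs, ens_mean_today: float) -> NormalDist:
    """Build a Gaussian forecast from a training window for one station and forecast hour."""
    errors = [obs - mean for obs, mean in zip(past_obs, past_ens_means)]
    mu = fmean(errors)   # ASSUMPTION: bias correction = mean past error
    s = pstdev(errors)   # ASSUMPTION: spread = standard deviation of past errors
    return NormalDist(mu=ens_mean_today + mu, sigma=s)


# Hypothetical training window (deg C); the study used a 40-day sliding window.
past_ens_means = [10.0, 12.0, 14.0, 11.0]
past_obs = [11.0, 12.0, 16.0, 12.0]
forecast = fit_gaussian_forecast(past_ens_means, past_obs, ens_mean_today=13.0)
print(f"center: {forecast.mean:.2f}, spread: {forecast.stdev:.2f}")
```

In an operational setting the window would slide forward each day and end the day before the forecast was initialized, as described above.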

The spread parameter *σ*_{0} (and consequently *σ*) was also computed separately for each station using a 40-day sliding window; however, all 24 forecast offsets for a given station were pooled together to give a more robust estimate.

## 4. Analysis

### a. Ignorance score

The ignorance score (Good 1952; Roulston and Smith 2002) of a probabilistic forecast is the negative base-2 logarithm of the forecast density at the observed value:

$$\mathrm{IGN} = -\log_2 f_t(x_t). \tag{19}$$

Lower scores indicate better forecasts. The total ignorance scores of the original probabilistic forecasts were computed by averaging ignorance scores over all forecast cycles and forecast hours, but separately for each station and each value of $n$, in order to see how far into the future a recent observation can improve the ignorance score.

Figure 5a shows the improvement in the ignorance score provided by the updated probabilistic forecast as a function of time since the most recent observation. The updated forecast at 0 h after an observation has been made has an ignorance score of −∞ since the true state is fully known. However, this updated forecast is of no value since it is only available after the observation has been made. As the time since the most recent observation increases, the improvement in the ignorance score decreases toward 0.
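As a small illustration of the ignorance score (the negative base-2 logarithm of the forecast density at the observation), the sketch below scores an original forecast and an updated one. Because the update multiplies the original density by an amplification factor, the score improves by the base-2 logarithm of that factor at the observed PIT value. The forecast, the observation, and the factor of 2 are hypothetical values.

```python
import math
from statistics import NormalDist


def ignorance(density_at_observation: float) -> float:
    """Ignorance score in bits: negative base-2 log of the density at the observation."""
    return -math.log2(density_at_observation)


forecast = NormalDist(mu=10.0, sigma=2.0)  # illustrative original forecast
observation = 11.0
psi_at_obs = 2.0  # hypothetical amplification factor at the observed PIT value

original_score = ignorance(forecast.pdf(observation))
updated_score = ignorance(psi_at_obs * forecast.pdf(observation))
print(f"original: {original_score:.3f} bits, updated: {updated_score:.3f} bits")
```

An amplification factor below 1 at the observed PIT value would instead worsen the score, which is the price of sharpening the forecast in the wrong place.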

Fig. 5. Verification statistics for the probabilistic forecasts used in the study. (a) Reduction (improvement) of the ignorance score by the updated probabilistic forecast relative to the original probabilistic forecast. Each of the five lines represents the score for a different station. (b) Percentage improvement in the CRPS by the updated probabilistic forecast. (c) PIT histogram of the updated forecasts (black bars) and the original forecasts (white bars), indicating the reliability of the forecasts. (d) Percentage improvement in mean absolute error of the median of the updated probability distributions relative to the median of the original distribution.

### b. Continuous ranked probability score

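The continuous ranked probability score (CRPS) measures the squared difference between the forecast CDF and the step-function CDF of the observation:

$$\mathrm{CRPS} = \int_{-\infty}^{\infty} \left[F_t(x) - H(x - x_t)\right]^2 dx, \tag{20}$$

where $H(s)$ is the Heaviside function defined by

$$H(s) = \begin{cases} 0, & s < 0 \\ 1, & s \ge 0. \end{cases} \tag{21}$$

Results for CRPS show a pattern similar to that of the ignorance score, with the update method providing less improvement as the time since the most recent observation increases. The average CRPS of the five stations was 1.50°C and the update method brought the values down to 1.06° and 1.27°C at 3 and 6 h, respectively.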
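The CRPS can be evaluated numerically for any forecast CDF by integrating the squared difference between the CDF and a Heaviside step at the observation. The sketch below does this for a hypothetical Gaussian forecast and compares the result with the well-known closed-form CRPS for a Gaussian. The function names, integration range, and forecast values are assumptions for illustration.

```python
import math
from statistics import NormalDist


def crps_numeric(forecast: NormalDist, observation: float, steps: int = 20000) -> float:
    """Midpoint-rule integral of (F(x) - H(x - obs))^2 over mean +/- 10 standard deviations.

    Assumes the observation lies well inside that range.
    """
    lo = forecast.mean - 10.0 * forecast.stdev
    hi = forecast.mean + 10.0 * forecast.stdev
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        heaviside = 1.0 if x >= observation else 0.0
        total += (forecast.cdf(x) - heaviside) ** 2
    return total * dx


def crps_gaussian(mu: float, sigma: float, observation: float) -> float:
    """Closed-form CRPS of a Gaussian forecast."""
    z = (observation - mu) / sigma
    std = NormalDist()
    return sigma * (z * (2.0 * std.cdf(z) - 1.0) + 2.0 * std.pdf(z) - 1.0 / math.sqrt(math.pi))


forecast = NormalDist(mu=10.0, sigma=2.0)  # illustrative forecast (deg C)
numeric = crps_numeric(forecast, observation=11.0)
closed = crps_gaussian(10.0, 2.0, observation=11.0)
print(f"numeric: {numeric:.3f}, closed form: {closed:.3f}")
```

The numerical route is the one that applies to the updated forecasts, since their CDFs are generally no longer Gaussian.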

### c. Reliability

A probabilistic forecast is reliable (or calibrated) when the PIT values are uniformly distributed between 0 and 1 (Gneiting et al. 2007). This can be diagnosed by a simple histogram of verifying PIT values, as reliable forecasts will give a flat histogram.

Figure 5c shows the histogram of PIT values from all forecast hours, forecast cycles, stations, and values of *n*. The update method does not appear to degrade or improve the reliability of the original forecasts in any significant way.
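A PIT histogram of the kind used to diagnose reliability can be computed as sketched below. The synthetic, uniformly distributed PIT values stand in for a perfectly reliable forecast and are not data from the study. The bin count of 10 and the function name are also assumptions.

```python
import random


def pit_histogram(pit_values, bins: int = 10):
    """Relative frequency of PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pit_values:
        index = min(int(p * bins), bins - 1)  # a PIT value of exactly 1.0 goes in the last bin
        counts[index] += 1
    return [count / len(pit_values) for count in counts]


random.seed(0)  # synthetic PIT values standing in for a reliable forecast
reliable_pits = [random.random() for _ in range(10000)]
frequencies = pit_histogram(reliable_pits)
print([round(f, 3) for f in frequencies])
```

A U-shaped histogram would indicate an underdispersive forecast, and a dome-shaped histogram an overdispersive one.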

### d. Mean absolute error

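The median of each probability distribution was used as a deterministic forecast when computing the mean absolute error (MAE). The median is given by

$$\tilde{x}_t = F_t^{-1}(0.5), \tag{22}$$

with the inverse of $F_t$ giving the temperature value corresponding to a nominal proportion of 0.5.

The MAE of the deterministic forecast (Fig. 5d) showed a similar pattern to the ignorance score and CRPS, with the update method improving the MAEs from 2.07°C down to 1.42°C and 1.73°C at 3 and 6 h, respectively. Improvements in MAE suggest that the update method improves the central tendency of the probabilistic forecasts.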
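Extracting the median from a CDF that has no closed-form inverse can be done by bisection, as in the sketch below. The Gaussian original forecast and the stand-in updated CDF (the square of the original CDF) are hypothetical and serve only to give a case with a known answer. The function names are assumptions.

```python
import math
from statistics import NormalDist


def median_from_cdf(cdf, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Bisection for the value at which a nondecreasing CDF reaches 0.5."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_absolute_error(forecasts, observations) -> float:
    """Mean absolute difference between deterministic forecasts and observations."""
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(forecasts)


original = NormalDist(mu=10.0, sigma=2.0)  # illustrative original forecast (deg C)


def updated_cdf(x: float) -> float:
    """Hypothetical updated CDF: the original CDF squared (illustration only)."""
    return original.cdf(x) ** 2


median_original = median_from_cdf(original.cdf, -20.0, 40.0)
median_updated = median_from_cdf(updated_cdf, -20.0, 40.0)
print(f"original median: {median_original:.2f}, updated median: {median_updated:.2f}")
print(mean_absolute_error([median_original, median_updated], [10.5, 11.5]))
```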

## 5. Conclusions

We have presented a method to update probabilistic forecasts of continuous variables based on recent observations, which should prove useful for a variety of nowcasting purposes. An alternative to this is to use data assimilation after new observations are available in order to create new initializations for the ensemble, followed by a complete rerun of the ensemble. This is considerably more expensive from a computational point of view, and may be infeasible for many operational systems.

The method improves the ignorance score and CRPS of the probabilistic forecasts, and improves the MAE of the median of the distribution significantly for forecasts up to 6 h after a recent observation, while not affecting reliability negatively.

Future work includes investigating the benefits of using a higher-order Markov model for modeling PIT transitions. In addition to accounting for the hour-by-hour correlation of PIT values, a higher-order Markov model can also incorporate any diurnal correlation of PIT values that may exist, thereby allowing for the potential to improve forecasts for 24 h after a recent observation.

## Acknowledgments

This research was made possible by funding from the Canadian Natural Science and Engineering Research Council, the Canadian Foundation for Climate and Atmospheric Science, and the BC Hydro and Power Authority. We also thank Kristian Soltesz and two reviewers for providing helpful insight.

## REFERENCES

AMS, 2008: Enhancing weather information with probability forecasts. *Bull. Amer. Meteor. Soc.*, **89**, 1049–1053.

Anderson, J. L., 1996: A method for producing and evaluating probabilistic precipitation forecasts from ensemble model integrations. *J. Climate*, **9**, 1518–1530.

Benoit, R., M. Desgagne, P. Pellerin, S. Pellerin, Y. Chartier, and S. Desjardins, 1997: The Canadian MC2: A semi-Lagrangian, semi-implicit wideband atmospheric model suited for finescale process studies and simulation. *Mon. Wea. Rev.*, **125**, 2382–2415.

Bremnes, J. B., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. *Mon. Wea. Rev.*, **132**, 338–347.

Delle Monache, L., T. Nipen, Y. Liu, G. Roux, and R. Stull, 2011: Kalman filter and analog schemes to postprocess numerical weather predictions. *Mon. Wea. Rev.*, in press.

Glahn, H., and D. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. *J. Appl. Meteor.*, **11**, 1203–1211.

Gneiting, T., F. Balabdaoui, and A. E. Raftery, 2007: Probabilistic forecasts, calibration and sharpness. *J. Roy. Stat. Soc.*, **69B**, 243–268.

Good, I. J., 1952: Rational decisions. *J. Roy. Stat. Soc.*, **14B**, 107–114.

Grell, G. J., J. Dudhia, and D. R. Stauffer, 1994: A description of the fifth generation Penn State/NCAR Mesoscale Model (MM5). NCAR Tech. Rep. TN-398+STR, 122 pp.

Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. *Mon. Wea. Rev.*, **126**, 711–724.

Homleid, M., 1995: Diurnal correction of short-term surface temperature forecasts using the Kalman filter. *Wea. Forecasting*, **10**, 689–707.

Jewson, S., A. Brix, and C. Ziehmann, 2005: *Weather Derivative Valuation*. Cambridge University Press, 373 pp.

Karlin, S., and H. Taylor, 1981: *A Second Course in Stochastic Processes*. Academic Press, 582 pp.

Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. *Mon. Wea. Rev.*, **133**, 1155–1174.

Rose, C., 1995: A statistical identity linking folded and censored distributions. *J. Econ. Dyn. Control*, **19**, 1391–1403.

Roulston, M. S., and L. A. Smith, 2002: Evaluating probabilistic forecasts using information theory. *Mon. Wea. Rev.*, **130**, 1653–1660.

Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the Advanced Research WRF version 2. NCAR Tech. Rep. TN-468+STR, 88 pp.

Sloughter, J. M., A. E. Raftery, and T. Gneiting, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. *Mon. Wea. Rev.*, **135**, 3209–3220.