Improved Seasonal Probability Forecasts

Viatcheslav V. Kharin Canadian Centre for Climate Modelling and Analysis, Meteorological Service of Canada, Victoria, British Columbia, Canada

and
Francis W. Zwiers Canadian Centre for Climate Modelling and Analysis, Meteorological Service of Canada, Victoria, British Columbia, Canada


Abstract

A simple statistical model of seasonal variability is used to explore the properties of probability forecasts and their accuracy measures. Two methods of estimating probabilistic information from an ensemble of deterministic forecasts are discussed. The estimators considered are the straightforward nonparametric estimator defined as the relative number of the ensemble members in an event category, and a parametric Gaussian estimator derived from a fitted Gaussian distribution. The parametric Gaussian estimator is superior to the standard nonparametric estimator on seasonal timescales. A statistical skill improvement technique is proposed and applied to a collection of 24-member ensemble seasonal hindcasts of northern winter 700-hPa temperature (T700) and 500-hPa height (Z500). The improvement technique is moderately successful for T700 but fails to improve Brier skill scores of the already relatively reliable raw Z500 probability forecasts.

1. Introduction

The chaotic nature of the climate system implies that predictions of future states are inherently uncertain. This is evident from short-term forecasts that are sensitive to small differences in initial conditions and have a predictability limit of the order of two weeks. Nonetheless, experience has demonstrated that longer-term forecasts of, for example, seasonal statistics (such as the seasonal mean) are possible with moderate levels of skill. The main source of this extended predictability on seasonal and longer timescales is thought to be lower boundary forcing associated with the distribution of persistent sea surface temperature and sea ice anomalies.

The utility of a single forecast of an atmospheric quantity is limited because the population of forecasts that are consistent with uncertainty in the initial conditions has a large spread at long lead times. Thus, forecast uncertainty must be quantified and communicated in order to extract useful information from long-lead forecasts. A common approach used in seasonal forecasting is to estimate the likelihoods of a small number of mutually exclusive events. Typically three equiprobable categories “below normal,” “near normal,” and “above normal” are considered. Forecast uncertainty is characterized by the discrete probability distribution of the three outcomes. This forecast format is motivated by the simplicity of the forecast presentation and is used by many operational centers that make seasonal forecasts.

The present study examines some aspects of seasonal probability forecasts in a very simple framework that assumes that seasonal variations are made up of a potentially predictable signal and unpredictable noise. This approach is not new and has been used in numerous studies in the past (e.g., Leith 1973; Madden 1976; Zwiers 1996). More recently it was used by Kharin and Zwiers (2001) and Kharin et al. (2001) to investigate some properties of deterministic seasonal forecasts.

Two aspects of probability forecast estimation are considered here. First, a parametric approach to deriving probabilistic information from an ensemble of deterministic forecasts is compared to the straightforward nonparametric approach based on the relative number of the ensemble members in a category of interest. The parametric probability estimator explicitly exploits the properties of the assumed underlying distribution of seasonal variations. Second, a statistical skill improvement technique for biased probability forecasts is proposed. The statistical procedure adjusts the signal-to-noise ratio in model forecasts to minimize the mean-square error of probability forecasts. The probability forecast methods are tested on 24-member ensembles of northern winter seasonal hindcasts of 700-hPa temperature and 500-hPa height produced with the second-generation general circulation model of the Canadian Centre for Climate Modelling and Analysis (CCCma).

The paper is organized as follows. A simple statistical model of seasonal variability is introduced in section 2. This model is used to examine some probabilistic aspects of seasonal variations. Some of the properties of the Brier score, the mean-square error of probability forecasts, are discussed in section 3 in the context of the introduced simple model. Readers who are interested in more practical aspects may skip this section and go directly to section 4, where the probability forecast estimation problem is addressed and a statistical skill improvement technique is introduced. The results for CCCma seasonal hindcasts are reported in section 5. The paper concludes with a summary in section 6.

2. Probability in a simple seasonal variability model

In this section we use a very simple statistical model of atmospheric variations on seasonal timescales to examine some probabilistic aspects of observed seasonal variability.

We assume that seasonal variations of a scalar quantity X can be represented as a sum of a potentially predictable signal β and nonpredictable variability that will be treated as stochastic noise ϵ, that is,
X = β + ϵ.   (1)
The term β comprises all potentially predictable signals arising from external and internal sources. On seasonal and longer timescales, the primary predictability source is thought to be the atmospheric response to slowly varying lower boundary conditions such as sea surface temperature. The effect of the initial conditions is felt primarily in the first few weeks of model integrations. The noise term ϵ represents the unpredictable effects of day-to-day weather. It is assumed that all terms in (1) are stochastic processes with zero mean, that is, E(β) = E(ϵ) = 0, where the symbol E denotes expectation, or the mean of a random variable.

The ratio of the variance of the potentially predictable signal σ²_β to the total observed variance σ²_X is often referred to as the potential predictability (Madden 1976; Zwiers 1996; Rowell 1998). This ratio is equal to the squared correlation between the potentially predictable signal β and the predictand X. Therefore, we will use the notation ρ²_pot = σ²_β/σ²_X to denote the deterministic potential predictability of the system (1).
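As a quick numerical illustration of the model (1), the following Python sketch (with hypothetical parameter values, normalized so that the total variance is 1) simulates X = β + ϵ and recovers the deterministic potential predictability ρ²_pot = σ²_β/σ²_X from the samples:

```python
import random
from statistics import pvariance

# Monte Carlo sketch of the simple model X = beta + eps (eq. 1).
# The variances are hypothetical, chosen so that rho2_pot = 0.3.
random.seed(1)
rho2_pot = 0.3
sigma_beta = rho2_pot ** 0.5          # signal s.d.; total variance normalized to 1
sigma_eps = (1.0 - rho2_pot) ** 0.5   # noise s.d.

beta = [random.gauss(0.0, sigma_beta) for _ in range(200_000)]
x = [b + random.gauss(0.0, sigma_eps) for b in beta]

# The sample signal-to-total variance ratio recovers rho2_pot.
rho2_hat = pvariance(beta) / pvariance(x)
```

Since β and ϵ are independent, the variance of X is the sum of the signal and noise variances, and the ratio converges to the prescribed ρ²_pot as the sample grows.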

a. Probability of an “event”

We define an event Ω as an occurrence of X in an interval (x1, x2). Let F_X(x|β) be the (cumulative) distribution of the predictand X conditional on a given value of β. Then the probability that X lies in an interval (x1, x2) conditional on β is given by
P_X(Ω|β) = F_X(x2|β) − F_X(x1|β).   (2)
If the noise term ϵ is Gaussian, this conditional probability may be written as
P_X(Ω|β) = F_N((x2 − β)/σ_ϵ) − F_N((x1 − β)/σ_ϵ),
where F_N is the distribution function of the standard normal distribution. The probability depends both on the value of β and on the standard deviation of ϵ.
In seasonal forecasting it is common to define three equiprobable, mutually exclusive, and collectively exhaustive categories: below normal (B), near normal (N), and above normal (A). The boundaries x_a and x_b between the categories are usually defined in terms of the terciles of the normal distribution,
x_a = −x_{1/3} σ_X,  x_b = x_{1/3} σ_X,   (3)
where x_{1/3} = −F_N^{−1}(1/3) ≈ 0.43. The corresponding conditional probabilities of the below-normal, near-normal, and above-normal events are given by
P_X(B|β) = F_N((x_a − β)/σ_ϵ),
P_X(N|β) = F_N((x_b − β)/σ_ϵ) − F_N((x_a − β)/σ_ϵ),   (4)
P_X(A|β) = 1 − F_N((x_b − β)/σ_ϵ).
These probabilities depend on the standardized signal perturbation β/σ_β and on the potential predictability of the system. The probability of the near-normal category is symmetric with respect to the sign of the potentially predictable signal perturbation, that is, P_X(N|β; σ_ϵ) = P_X(N|−β; σ_ϵ).
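The conditional category probabilities (4) are straightforward to evaluate numerically. The sketch below (Python, illustrative values only) uses the standard normal distribution from the standard library:

```python
from statistics import NormalDist

Phi = NormalDist().cdf
x13 = -NormalDist().inv_cdf(1 / 3)   # tercile of the standard normal, about 0.43

def tercile_probs(beta, sigma_eps, sigma_x=1.0):
    """Conditional B/N/A probabilities of eq. (4) for Gaussian noise.
    Category boundaries are the terciles of eq. (3)."""
    xa, xb = -x13 * sigma_x, x13 * sigma_x
    p_b = Phi((xa - beta) / sigma_eps)
    p_a = 1.0 - Phi((xb - beta) / sigma_eps)
    return p_b, 1.0 - p_b - p_a, p_a

# Example with rho2_pot = 0.3: a signal of +1 signal standard deviation
# makes the above-normal category the most likely outcome.
s_beta, s_eps = 0.3 ** 0.5, 0.7 ** 0.5
p_b, p_n, p_a = tercile_probs(s_beta, s_eps)
```

Note that `tercile_probs(b, s)[1]` equals `tercile_probs(-b, s)[1]`, the symmetry of the near-normal category noted above.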

Figure 1 illustrates the probabilities of observing X in one of the three equiprobable categories conditional upon β = +σβ (right panel) and β = −0.5σβ (left panel). The potential predictability in these examples is set to ρ2pot = 0.3, which is within the range of typical values for the proportion of the boundary forced seasonal variability of 500-hPa geopotential height over North America in winter (Madden 1976; Zwiers 1996). The probabilities of the below-, near-, and above-normal categories are equal to the corresponding areas under the probability density curve of the noise as indicated by different shading intensities. Note that there is only weak dependence of PX(N|β) upon β in these examples, but rather strong dependence of PX(A|β) and PX(B|β) upon β.

The dependence of conditional probabilities PX(B|β), PX(N|β), and PX(A|β) in (4) on β is more fully illustrated in Fig. 2 for potential predictabilities ρ2pot = 0.15, 0.3, and 0.6. The latter value is typical for seasonal atmospheric variations in the Tropics, while the former two values are typical of northern midlatitudes. A similar diagram is presented in Leith (1973). The β values on the x axis are expressed in terms of the signal standard deviation σβ. The probability of the above-/below-normal event depends monotonically on the signal. The greater the potential predictability, the steeper the probability curves, which become, asymptotically, step functions in a fully deterministic system with probability values of either 0 or 1. The PX(N|β) curves are flat at the origin, and thus the conditional probability of the near-normal category is relatively insensitive to small signal perturbations. This is particularly evident in low or moderate potential predictability situations for which the near-normal probabilities are close to the climatological probability over a wide range of signal values. Thus knowing the predictable part of X generally does not help to greatly sharpen forecasts of the likelihood of the near-normal category beyond the climatology forecast. This is the main reason for the widely experienced seasonal forecasting phenomenon of near-normal forecasts that are not much more skillful than the climatology (e.g., Van den Dool and Toth 1991).

The insensitivity of the probability of the near-normal category to signal perturbations has led the U.S. National Weather Service (Epstein 1988) to always assign the climatological probability to the near-normal category. This restriction has allowed them to reduce the presentation of the three-category long-range probability outlooks to a single map that displays the forecast probabilities for the more likely of the two extreme categories.

b. Skill score of probability forecasts

Let P be a probability forecast, and let E be a random event variable, or predictand, that takes the value 1 when a forecast event occurs and 0 otherwise. A standard measure of the accuracy of probability forecasts is the mean-square error, which is normally referred to as the Brier score (B) in probabilistic forecasting. The Brier score is defined as
B = E[(P − E)²].   (5)
The range of B is the closed interval [0, 1]. The Brier score is a negatively oriented accuracy measure meaning that smaller B values indicate better forecasts, and vice versa. The corresponding Brier skill score (BSS) is defined as
BSS = 1 − B/σ²_E,   (6)
where σ²_E = P_C(1 − P_C) is the variance of E, and P_C = E(E) is the climatological frequency of the forecast event. Note that the variance σ²_E is equal to the Brier score of the probability forecast that always predicts the climatological probability P_C. The range of the BSS is the closed interval [1 − 1/[P_C(1 − P_C)], 1]. For example, in the case of the three-event equiprobable partition of the observed value space, the climatological frequency of every category is P_C = 1/3, so that B_C = 2/9, and the range of the Brier skill score is the interval [−7/2, 1]. The BSS equals 1 for a perfect probability forecast P = E, 0 for the climatological probability forecast P = E(E), and is negative for forecasts that have larger Brier scores than that of the climatological probability forecast. Further discussion of this and other skill measures of a probability forecast can be found, for example, in Wilks (1995) and Pan and Van den Dool (1998).
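In code, the Brier score (5) and skill score (6) reduce to a few lines; a minimal Python sketch with made-up outcomes:

```python
def brier(p, e):
    """Brier score, eq. (5): mean-square error of probability forecasts p
    against binary outcomes e."""
    return sum((pi - ei) ** 2 for pi, ei in zip(p, e)) / len(p)

def brier_skill_score(p, e, p_clim):
    """BSS, eq. (6): 1 - B / B_clim, where B_clim = p_clim * (1 - p_clim)
    is the Brier score of the constant climatological forecast."""
    return 1.0 - brier(p, e) / (p_clim * (1.0 - p_clim))

# Toy outcomes with climatological frequency P_C = 1/3: a perfect
# probability forecast scores 1, the climatological forecast scores 0.
e = [1, 0, 0, 1, 0, 0]
bss_perfect = brier_skill_score(e, e, 1 / 3)
bss_clim = brier_skill_score([1 / 3] * 6, e, 1 / 3)
```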

3. A probabilistic interpretation of potential predictability

In this section we describe a measure of probabilistic potential predictability in the context of the simple statistical model described in section 2. This measure is derived from the Brier score decomposition of Murphy (1973) and Murphy and Winkler (1987).

Formally, the Brier score in (5) can be rewritten in terms of the joint probability density function f(E, P) of the predictand E and its forecast P as follows:
B = Σ_{e=0,1} ∫ (p − e)² f(e, p) dp.
Based on the calibration-refinement factorization of the joint distribution f(E, P) = f(E|P)f(P) (Murphy and Winkler 1987), where f(E|P) is the conditional probability distribution of observations E for each possible forecast P and f(P) is the marginal probability density function of the forecasts, the Brier score can be decomposed into three nonnegative terms:
B = B_rel − B_res + B_unc,   (7)
where
B_rel = E_P{[P − E(E|P)]²},  B_res = E_P{[E(E|P) − P_C]²},  B_unc = P_C(1 − P_C)
(see appendix A for derivation details). The three terms are known as the reliability, the resolution, and the uncertainty. The reliability term B_rel summarizes the calibration, or the conditional bias, of the forecasts and is equal to the weighted average of squared differences between the forecast probabilities and the conditional observed frequencies. The resolution term B_res characterizes the amplitude of the deviations of the conditional observed frequency from the climatological frequency P_C. The uncertainty term B_unc is equal to the variance of E and as such characterizes the properties of the observed system only. The corresponding BSS can be written in terms of these three components as
BSS = (B_res − B_rel)/B_unc.   (8)
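For forecasts that take only a finite set of probability values, the decomposition (7) can be computed directly by conditioning on each forecast value. The sketch below (toy data; continuous forecasts would need binning) verifies the identity B = B_rel − B_res + B_unc:

```python
from collections import defaultdict

def brier_decomposition(p, e):
    """Murphy (1973) decomposition of the Brier score for discrete-valued
    probability forecasts.  Returns (B_rel, B_res, B_unc)."""
    n = len(p)
    p_clim = sum(e) / n
    groups = defaultdict(list)
    for pi, ei in zip(p, e):
        groups[pi].append(ei)
    rel = res = 0.0
    for pi, g in groups.items():
        obar = sum(g) / len(g)                 # conditional observed frequency
        rel += len(g) * (pi - obar) ** 2       # reliability contribution
        res += len(g) * (obar - p_clim) ** 2   # resolution contribution
    return rel / n, res / n, p_clim * (1.0 - p_clim)

# Toy forecasts taking two probability values
p = [0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
e = [1, 1, 0, 0, 0, 1, 0, 0, 0, 1]
b_rel, b_res, b_unc = brier_decomposition(p, e)
b = sum((pi - ei) ** 2 for pi, ei in zip(p, e)) / len(p)
```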
Assume that the event probability PX ≡ Prob(E = 1) and its probability forecast P are functions of the potentially predictable signal β only. This is certainly a somewhat idealized situation as in practice the probability forecast is at least subject to sampling variability. With this assumption and given the probability density function fβ(β) of the potentially predictable signal, the Brier score decomposition (7) can be rewritten as
B = ∫ [P(β) − P_X(β)]² f_β(β) dβ − ∫ [P_X(β) − P_C]² f_β(β) dβ + P_C(1 − P_C).   (9)
Note that under the assumptions made above the only term in (9) that depends explicitly on the probability forecast is the nonnegative reliability term Brel. The other two terms are functions only of the properties of the observed system.
The best probability forecast that minimizes the Brier score is given by P(β) = PX(β) ≡ Ppot. This forecast can be seen as the probabilistic analog of the best deterministic forecast in system (1), which is given by β. The corresponding BSSpot is given by
BSS_pot = (1/[P_C(1 − P_C)]) ∫ [P_pot(β) − P_C]² f_β(β) dβ.   (10)
Analogous to the deterministic potential predictability ρ2pot, the BSSpot (10) may be referred to as the potential predictability in the probabilistic formulation.
Figure 3 shows BSSpot for the above-/below-normal and near-normal categories as a function of the deterministic potential predictability ρ2pot for the system (1) in the Gaussian setting, that is, when both β and ϵ are normally distributed. The BSSpot in this case is given by
BSS_pot = (1/[P_C(1 − P_C)]) ∫ [P_X(β) − P_C]² f_β(β) dβ,   (11)
where PX(β) is defined in (4). The expression on the right-hand side of (11) was evaluated with the aid of an algebraic manipulator (Maple V, see Char et al. 1991). Initially, BSSpot increases more slowly than ρ2pot. The dependence on ρ2pot is particularly weak for the near-normal category for small to moderate levels of potential predictability. This occurs because the probability of the near-normal category depends only weakly on the amplitude of the potentially predictable signal. The diagram demonstrates that there is a one-to-one correspondence between the potential predictability in the deterministic sense and its counterpart in the probabilistic formulation. It also indicates that this relationship depends on the event definition. A given level of potential predictability in the deterministic sense corresponds to different levels of “probabilistic” potential predictability depending upon how the event of interest is defined.
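The right-hand side of (11) can also be evaluated by straightforward numerical quadrature instead of an algebraic manipulator. The sketch below (trapezoid rule, above-normal category, Gaussian signal) reproduces the qualitative behavior described here:

```python
from statistics import NormalDist

Phi = NormalDist().cdf
x13 = -NormalDist().inv_cdf(1 / 3)   # tercile boundary of the standard normal

def bss_pot_above(rho2, nstep=4000, halfwidth=8.0):
    """Trapezoid-rule evaluation of eq. (11) for the above-normal category,
    integrating [P_X(A|beta) - 1/3]^2 against the Gaussian signal density."""
    s_beta, s_eps = rho2 ** 0.5, (1.0 - rho2) ** 0.5
    pdf = NormalDist(0.0, s_beta).pdf
    h = 2.0 * halfwidth * s_beta / nstep
    total = 0.0
    for i in range(nstep + 1):
        b = -halfwidth * s_beta + i * h
        p_a = 1.0 - Phi((x13 - b) / s_eps)     # P_X(A|beta), eq. (4)
        w = 0.5 if i in (0, nstep) else 1.0    # trapezoid end weights
        total += w * (p_a - 1.0 / 3.0) ** 2 * pdf(b)
    return total * h / (1.0 / 3.0 * 2.0 / 3.0)
```

For ρ²_pot = 0.15, 0.3, and 0.6 this gives a BSS_pot that increases monotonically with ρ²_pot and, for small to moderate ρ²_pot, lies below it, consistent with Fig. 3.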

4. Probability forecasts and their improvement

In this section we describe two methods for producing a probability forecast from an ensemble of deterministic forecasts and evaluate the skill of these forecasts in the idealized setting described in section 2. We also describe a method for the statistical improvement of probability forecasts.

a. Probability estimators

We begin by assuming that {Fi, i = 1, … , N} is an ensemble of N forecasts produced with a perfect forecast system that behaves as (1).

In these circumstances a simple and straightforward estimator of the event probability PX in (2) is obtained as
P̂_B(Ω) = n/N,   (12)
where n is the number of ensemble members in the event category Ω = B, N, or A. The subscript B is used to indicate the fact that the number n is distributed according to the binomial distribution. For a given event probability PX, the mean and the variance of the binomial random variable n are given by
E(n) = N P_X,   var(n) = N P_X(1 − P_X),
from which it immediately follows that
E[P̂_B(Ω)] = P_X,   var[P̂_B(Ω)] = P_X(1 − P_X)/N.
Thus, P̂_B(Ω) is an unbiased and consistent estimator of the true event probability P_X.
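A small Monte Carlo sketch (uniform draws are used here purely for simplicity; the counting argument does not depend on the distribution) illustrates the unbiasedness of the count estimator (12):

```python
import random

def count_estimator(ensemble, lo, hi):
    """Nonparametric estimator, eq. (12): fraction of ensemble members
    falling in the event category (lo, hi]."""
    return sum(1 for f in ensemble if lo < f <= hi) / len(ensemble)

random.seed(2)
p_true = 0.6          # probability that a U(0,1) draw falls in (0, 0.6]
estimates = [count_estimator([random.random() for _ in range(24)], 0.0, 0.6)
             for _ in range(5000)]   # many 24-member ensembles
mean_est = sum(estimates) / len(estimates)
```

The average over many ensembles approaches the true event probability, while each individual estimate has the binomial sampling variance P_X(1 − P_X)/N quoted above.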
The nonparametric estimator (12) is very general as no specific assumptions are made about the underlying distribution of X. Another estimate of PX may be obtained by fitting a suitable distribution to the available data sample and then estimating the event probability by computing it from the fitted distribution. In particular, when ϵ in (1) behaves as a Gaussian random variable, the probabilities of the below-normal, near-normal, and above-normal categories are given by (4). Thus, given estimates of β and σϵ, the probabilities can be estimated as
P̂_G(B) = F_N((x_a − β̂)/σ̂_ϵ),
P̂_G(N) = F_N((x_b − β̂)/σ̂_ϵ) − F_N((x_a − β̂)/σ̂_ϵ),   (13)
P̂_G(A) = 1 − F_N((x_b − β̂)/σ̂_ϵ),
where the subscript G is used to indicate the Gaussian nature of the underlying statistical model. Given N independent deterministic forecasts {Fi, i = 1, … , N} produced by a perfect forecast system, unbiased estimators of β and σϵ are given by
β̂ = (1/N) Σ_{i=1}^N F_i,   (14)
σ̂²_ϵ = [1/(N − 1)] Σ_{i=1}^N (F_i − β̂)².   (15)
When more than one sample of ensemble forecasts is available, for example, for different realizations of the potentially predictable signal {βt, t = 1, … , T}, and assuming that the statistical properties of the noise term are independent of β, all samples can be utilized to estimate σ2ϵ by applying a standard analysis of variance (e.g., von Storch and Zwiers 1999). The unbiased estimator of σ2ϵ is then given by
σ̂²_ϵ = [1/(T(N − 1))] Σ_{t=1}^T Σ_{i=1}^N (F_{i,t} − β̂_t)²,   (16)
where {F_{i,t}, i = 1, … , N} is the tth ensemble forecast and β̂_t = (1/N) Σ_{i=1}^N F_{i,t}. Obviously, estimator (15) is obtained from (16) for T = 1.

Variance estimator (16) is useful when historical ensemble forecasts are available, and when the dependence of σ2ϵ on β may be neglected. Since the overall sample size is larger in this case, one expects a more accurate estimate of σ2ϵ. On the other hand, the assumptions required for estimator (15) are less stringent so it can be used even when σ2ϵ is thought to be dependent upon β.
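The pooled estimator (16) is a one-way analysis of variance; a Python sketch (with hypothetical signal and noise parameters) shows the within-ensemble deviations being pooled over T ensembles:

```python
import random

# Sketch of the pooled noise-variance estimator, eq. (16): with T ensembles
# of size N, deviations from each ensemble mean are pooled, giving
# T*(N-1) degrees of freedom.  Parameter values are illustrative.
random.seed(3)
N, T, sigma_eps = 24, 26, 0.9

forecasts = []
for t in range(T):
    beta_t = random.gauss(0.0, 0.5)    # a different signal realization each "year"
    forecasts.append([beta_t + random.gauss(0.0, sigma_eps) for _ in range(N)])

num = 0.0
for ens in forecasts:
    bhat = sum(ens) / N                           # eq. (14): ensemble mean
    num += sum((f - bhat) ** 2 for f in ens)
sigma2_hat = num / (T * (N - 1))                  # eq. (16)
```

Because the ensemble mean absorbs the (possibly different) signal in each year, the estimator recovers the noise variance alone, provided the noise distribution does not depend on β.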

In appendix B it is shown that, as expected, the Gaussian estimator is preferable to the nonparametric estimator when the data are consistent with the system (1) in the Gaussian setting. The Gaussian estimator has smaller rms errors in these circumstances. It is slightly biased for small ensemble sizes, but the bias is small relative to the corresponding rms errors.

b. Skill scores of biased probability forecasts

Real world forecasts are always subject to errors and biases that result from model errors and errors in specifying the initial and boundary conditions. In its simplest form, the effect of biases and errors can be represented by a forecast of the form
F = aβ + ϵ′,   (17)
where σ²_{ϵ′} = b²σ²_ϵ. That is, the potentially predictable signal that is contained in the forecast differs from the observed counterpart by the factor a, and the amplitude of the unpredictable variations in the forecast differs from that of the observed counterpart by the factor b. In the following we consider how such model biases affect three-category probability forecasts and their skill scores.
The first step in constructing probability forecasts is often to ensure that the climatological frequency of the forecast categories is equal to that of their observed counterparts. This can be achieved by adjusting the boundaries xa and xb between the categories for the model forecasts. In the Gaussian setting this is equivalent to standardizing, or equalizing, the observed and model total variances. This step is sometimes referred to as the variance inflation (von Storch 1999). Without this step, the climatological frequency of the near-normal category would be overestimated at the expense of the below- and above-normal categories when the model variance is undersimulated, and vice versa when it is overestimated. Thus in the following we require that
σ²_X = σ²_β + σ²_ϵ = a²σ²_β + b²σ²_ϵ = σ²_F.   (18)
With this assumption the noise bias factor b is related to the signal bias factor a as
b² = (1 − a²ρ²_pot)/(1 − ρ²_pot).   (19)
The parameter a, which is an indicator of the strength of the signal-to-noise ratio in forecasts relative to that in the observations, can take values in the range [−1/ρ_pot, 1/ρ_pot]. The lower and upper bounds correspond to forecasts with no unpredictable component, while the value a = 0 corresponds to no-skill forecasts. Negative values of a indicate that the sign of the forecast signal is opposite to that observed. The perfect forecast system is obtained for a = 1.
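The relation (19) follows directly from the variance-matching constraint (18); a short sketch (illustrative values) verifies it numerically:

```python
def noise_factor(a, rho2):
    """Eq. (19): the noise amplitude factor b implied by the variance-matching
    constraint (18), given the signal bias a and potential predictability rho2."""
    return ((1.0 - a * a * rho2) / (1.0 - rho2)) ** 0.5

# Check the constraint a^2*sigma_beta^2 + b^2*sigma_eps^2 = sigma_X^2
# with the total variance normalized to 1 (hypothetical values).
rho2, a = 0.3, 0.7
s2_beta, s2_eps = rho2, 1.0 - rho2
b = noise_factor(a, rho2)
```

In particular, a perfect forecast system (a = 1) requires no noise inflation at all (b = 1), while a damped signal (|a| < 1) must be compensated by inflated noise (b > 1) to preserve the total variance.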
The conditional probabilities of the three equiprobable categories obtained from the biased forecast system (17) in the Gaussian setting are given by
P_F(B|β) = F_N((x_a − aβ)/(bσ_ϵ)),
P_F(N|β) = F_N((x_b − aβ)/(bσ_ϵ)) − F_N((x_a − aβ)/(bσ_ϵ)),   (20)
P_F(A|β) = 1 − F_N((x_b − aβ)/(bσ_ϵ)).

Figure 4 illustrates the behavior of the Brier skill score of biased probability forecasts (20) as a function of the model bias a for ρ2pot = 0.3 and 0.6. The corresponding calculations were made by substituting PX(β) in (11) with the biased probability forecasts (20) and evaluating the resulting expressions with Maple V software. The Brier skill score for the near-normal category varies symmetrically about the origin and has two maxima at a = 1 and a = −1. This is because the probability of the near-normal category is symmetric with respect to the sign of the potentially predictable signal, and it therefore depends only on the absolute value of the signal but not on its sign. The skill of the near-normal forecast is rather insensitive to the bias and takes values that are only slightly above 0 for a wide range of positive and negative values of a. The Brier skill score for the above- and below-normal categories is more sensitive to the model errors. The maximum score is attained for a = 1 and the score is negative when a is negative.

Note that the above results are obtained for the case when β and σϵ and the corresponding conditional probabilities (20) are known exactly. In practice, however, all estimates are subject to sampling variability due to finite sample sizes. Therefore, the dependence of the Brier skill score on the forecast bias a obtained for sampled probability forecasts may differ somewhat from that in Fig. 4. In particular, since the uncertainty in estimating the potentially predictable signal effectively acts as an additional noise, the maximum Brier skill score is achieved for a > 1 for finite sample probability forecasts, that is, when the amplitude of the potentially predictable signal is oversimulated in a forecast system.

c. Improved probability forecasts

Pan and Van den Dool (1998) discuss a procedure that attempts to improve probability forecasts based on the count method (12) by combining ensemble probabilities with a single forecast probability that is derived from the model forecast history. Here we consider a somewhat different approach.

Figure 4 suggests that it may be possible to improve biased probability forecasts by using the Gaussian estimator and adjusting the estimates of the forecast signal β̂′ and standard deviation σ̂ϵ with factors â and b̂, respectively, to maximize the Brier skill score. The resulting improved probability estimators have the form
P̂_I(B) = F_N((x_a − âβ̂′)/(b̂σ̂_ϵ)),
P̂_I(N) = F_N((x_b − âβ̂′)/(b̂σ̂_ϵ)) − F_N((x_a − âβ̂′)/(b̂σ̂_ϵ)),   (21)
P̂_I(A) = 1 − F_N((x_b − âβ̂′)/(b̂σ̂_ϵ)).
The adjusting factors â and b̂ are chosen to satisfy relationship (18), that is, â²σ̂²_β + b̂²σ̂²_ϵ = σ̂²_X.

We use a leave-one-out cross-validation procedure (e.g., Barnston 1992) to validate the corrected seasonal probability hindcasts that are obtained with (21). The collection of hindcasts {{F_{i,t}, i = 1, … , N}, t = 1, … , T} and verifying observations {X_t, t = 1, … , T} for a given season and years t = 1, … , T are repeatedly divided into a (T − 1)-yr training dataset and a 1-yr validation dataset. The boundaries between the forecast categories are determined as in (3), where the observed standard deviation is estimated from the training period. The adjusting factors â and b̂ are selected to minimize the Brier score sum B[P̂_I(B)] + B[P̂_I(N)] + B[P̂_I(A)] in the training data. In principle, separate factors â and b̂ could be determined for each category. However, we expect that the optimal values of â and b̂ are similar for all categories. Our past experience dictates the use of very few parameters in statistical improvement schemes (Kharin and Zwiers 2002). We have therefore reduced the problem to one of estimating a single pair of parameters that minimizes the three-category Brier score sum in the training data. These factors are then used to adjust the values of β̂′ (14) and σ̂_ϵ (16) to make the probability hindcasts for the withheld period. The whole procedure is repeated for each year. Finally, skill scores are calculated for the resulting collection of corrected hindcasts.

A technical concern is that the Brier score does not have a parabolic dependence on the rescaling factor â. Consequently, the Brier score minimization problem cannot be reduced to that of solving a linear equation as in a standard regression analysis. In this study we applied a golden section search algorithm (Press et al. 1992) to the above minimization problem. Convergence problems were not encountered.
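A compact golden-section minimizer is only a few lines; the sketch below applies one to a toy version of the training objective, in which the model signal amplitude is deliberately overstated (all parameter values are hypothetical, and only the above-normal category is scored for brevity):

```python
import random
from statistics import NormalDist

Phi = NormalDist().cdf
x13 = -NormalDist().inv_cdf(1 / 3)

def golden_min(f, lo, hi, tol=1e-5):
    """Golden-section search for the minimum of a unimodal function
    (cf. Press et al. 1992)."""
    g = (5 ** 0.5 - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

# Toy training data: the "model" signal amplitude is overstated by 2.
random.seed(4)
rho2 = 0.3
s_beta, s_eps = rho2 ** 0.5, (1 - rho2) ** 0.5
years = []
for _ in range(200):
    beta = random.gauss(0.0, s_beta)
    x = beta + random.gauss(0.0, s_eps)
    years.append((2.0 * beta, 1.0 if x > x13 else 0.0))  # (biased signal, event)

s2_sig = 4.0 * rho2          # variance of the biased model signal

def brier_sum(a):
    # noise factor from the variance-matching constraint, cf. eqs. (18)-(19)
    b2 = (1.0 - a * a * s2_sig) / (1.0 - rho2)
    if b2 <= 0.0:
        return float("inf")
    b = b2 ** 0.5
    return sum((1.0 - Phi((x13 - a * sig) / (b * s_eps)) - e) ** 2
               for sig, e in years) / len(years)

a_hat = golden_min(brier_sum, 0.0, (1.0 / s2_sig) ** 0.5 - 1e-9)
```

The search brackets â inside the admissible range where b² in (19) remains positive; for this toy setup the minimizing â roughly undoes the factor-of-2 amplitude bias.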

Appendix C summarizes the results of a number of Monte Carlo experiments designed to explore the performance of the proposed statistical skill improvement technique for the forecast system (17) in the Gaussian setting. The statistical method was able to improve the Brier score when biases in the signal-to-noise ratio in the forecast system are sufficiently large. However, the quality of probability forecasts produced by a nearly perfect forecast system is degraded by sampling errors that are the consequence of finite ensemble sizes and sample lengths, a result that is not unexpected given our experience with deterministic statistical forecast improvement.

5. Example—HFP seasonal hindcasts

We used the methods discussed above to analyze 24-member ensemble hindcasts of 26 northern winters [December–January–February (DJF)] produced for the 1969–95 period with the second-generation general circulation model of the Canadian Centre for Climate Modelling and Analysis (CCC AGCM2; McFarlane et al. 1992). These integrations were performed as part of the Canadian Historical Forecast Project (HFP; Derome et al. 2001).

CCC AGCM2 is a spectral model with a T32 truncation and 10 vertical levels. It contains a comprehensive suite of physical parameterizations of subgrid-scale processes. The model has been used in a number of numerical integrations, several transient climate change simulations (Boer et al. 2000a,b), and the Atmospheric Model Intercomparison Project–like integrations with prescribed observed lower boundary conditions (Zwiers et al. 2000).

The HFP experimental design generally follows that suggested by Boer (1993) for monthly forecasts. The approach for the choice of initial conditions is known as lagged-average forecasting (Hoffman and Kalnay 1983). Each ensemble member is initialized from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalyzed fields (Kalnay et al. 1996) lagged at 6-h intervals prior to the forecast season. That is, the first member is initiated just 6 h before the forecast season, the second member is initialized 12 h before the forecast period, and so on. The last, 24th member is initialized 6 days before the forecast season. The monthly mean sea surface temperature anomalies observed in the month prior to the forecast period are persisted throughout the forecast season. These anomalies are obtained from the Global Sea Ice and Sea Surface Temperature dataset (GISST version 2.2; Rayner et al. 1996). Sea ice extent is specified from climatology. The initial snow line in the Northern Hemisphere is specified from NCEP–NCAR satellite observations for the week before the forecast period. The soil conditions are initialized from climatology.

a. Brier skill scores

We consider skill scores for two regions, the Tropics (30°S–30°N) and the North American sector (20°–80°N, 150°–45°W). The skill score for a given region is calculated by first area averaging the Brier score for the region and then substituting the averaged value into (6).
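The two-step regional calculation can be sketched as follows (cosine-latitude area weighting is a standard choice on a latitude–longitude grid, assumed here; the gridpoint Brier scores are hypothetical):

```python
import math

def regional_bss(brier_by_lat, p_clim=1 / 3):
    """Area-average the local Brier scores with cos(latitude) weights,
    then convert the regional mean to a skill score via eq. (6)."""
    w = [math.cos(math.radians(lat)) for lat, _ in brier_by_lat]
    b_bar = sum(wi * b for wi, (_, b) in zip(w, brier_by_lat)) / sum(w)
    return 1.0 - b_bar / (p_clim * (1.0 - p_clim))

# Hypothetical gridpoint Brier scores along a meridian
grid = [(0.0, 0.15), (20.0, 0.20), (40.0, 0.25), (60.0, 0.24)]
score = regional_bss(grid)
```

Averaging the Brier scores first and forming the skill score once, rather than averaging gridpoint skill scores, keeps the regional BSS within the range implied by (6).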

Figure 5 displays the regional Brier skill score obtained for probability hindcasts of DJF T700 and Z500 made with the four forecast variants described previously. The regional BSS of the nonparametric probability hindcast B is displayed by the bars with the darkest shading. The next two more lightly shaded bars indicate the regional BSS of the Gaussian probability hindcasts G for T = 1 and T = 26, respectively. The most lightly shaded bars indicate the cross-validated BSS of the improved probability hindcast I obtained as discussed in section 4c. The latter used all available hindcasts in the training period (T = 25) to estimate the unpredictable variance σ2ϵ.

There are several things to point out in Fig. 5.

  • The Gaussian hindcast G results in slightly but consistently better Brier scores than the nonparametric hindcast B. Apparently, the Gaussian assumption is an acceptable approximation for the unpredictable variability of T700 and Z500 on seasonal timescales.

  • The Gaussian hindcast G that uses all available training data to estimate the unpredictable variance (T = 26) generally results in slightly better regional Brier scores than the Gaussian hindcast that uses only the current ensemble (T = 1) to estimate the unpredictable variance. The additional information obtained by using all available training data apparently overcomes the possible effects of dependence of the noise distribution on the potentially predictable signal.

  • The statistical improvement technique results in better cross-validated regional Brier skill scores for DJF T700. However, slightly worse scores are obtained for the DJF Z500 probability hindcasts. We will show that the unadjusted probability hindcasts for T700 are less reliable than those for Z500. Consequently, the temperature hindcasts benefit from the improvement technique, while the more reliable geopotential height hindcasts are slightly degraded because of sampling errors.

  • The below-normal category for temperature and height is consistently more skillful than its above-normal counterpart. The near-normal category is less skillful than the other two categories.

The geographical distribution of the Brier skill scores of the nonparametric probability hindcasts B for DJF T700 and the corresponding improved probability hindcasts I are shown in Fig. 6. High skill values are confined mainly to the Tropics. There are extensive areas of negative skill for the nonparametric hindcasts in the extratropics. While the statistical improvement technique is generally successful in improving negative skill scores, it has either only modest success or fails to improve positive skill scores.

The probability hindcasts for the near-normal category outperform the climatological forecast in only a few tropical areas. Elsewhere, hindcast skill in this category is near or below zero. The below-normal category is more skillful than the above-normal category in both the Tropics and the subtropical regions. This asymmetry is apparently related to the asymmetry in the lower boundary forcing in these areas, mainly associated with differences in the frequency of El Niño–La Niña events during the HFP period, and to asymmetry in the atmospheric response to the lower boundary forcing. For example, Fig. 7 shows time series of standardized DJF T700 anomalies in the eastern tropical Pacific at 5.6°N, 86.25°W derived from the NCEP–NCAR reanalysis and the HFP hindcasts. Most of the observed below-normal events at this location in the 1970–95 record are relatively strong and are well simulated by all HFP ensemble members. In contrast, the anomalies in the above-normal category are less pronounced, except for the 1982/83 El Niño event. Correspondingly, the hindcast performance for the above-normal category is noticeably poorer than that for the below-normal category. The corresponding Brier skill scores for the three categories are indicated at the top of the graph.

The magnitude of the signal scaling factor â is found to be less than 1, while that of the corresponding noise scaling factor is found to be greater than 1 over most of the globe (not shown). This is not an unexpected result, and it does not necessarily imply that the signal-to-noise ratio is undersimulated in the model compared to that in the observations. The amplitude of the scaling factor â is also affected by the relationship between the model-simulated and observed potentially predictable signals. Indeed, consider a forecast

F = β′ + ϵ′

with σβ′ = σβ and σϵ′ = σϵ but correlation ρβ′,β < 1. That is, the typical amplitudes of the predictable and unpredictable components are correctly simulated, but the actual signal anomalies are not perfectly reproduced by the forecast model. In this case, only a fraction ρ²β′,β of the observed signal variability can be predicted linearly via the relationship

β = ρβ′,β β′ + ϵβ.

The term ϵβ includes the portion of the observed signal that cannot be predicted in this way. Thus, we can rewrite (1) as X = ρβ′,β β′ + ϵ1, where ϵ1 = ϵ + ϵβ. Therefore, the forecast signal β′ must be scaled by ρβ′,β, while σϵ must be scaled up correspondingly to obtain a forecast with an unbiased estimate of the predictable signal fraction and unpredictable variance. The lower the correlation between the observed and model-simulated signals, the smaller the scaling factor â becomes. As a result, the corrected forecasts have a reduced signal-to-noise ratio. Thus, the effect of the improvement technique is similar to the forecast "randomization" approach (adding additional noise to the forecast) advocated by von Storch (1999).
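
This scaling argument can be checked with a small Monte Carlo experiment. The sketch below (our own illustration with hypothetical numbers: unit signal variances and a correlation of 0.7 between the model and observed signals) shows that the least-squares regression slope of the observed signal on the model signal recovers the correlation, a factor less than 1:

```python
import math
import random

random.seed(1)
n = 20000
rho = 0.7                  # assumed correlation between model and observed signals
# model signal bp and observed signal beta with equal (unit) variances
bp = [random.gauss(0.0, 1.0) for _ in range(n)]
beta = [rho * b + math.sqrt(1.0 - rho * rho) * random.gauss(0.0, 1.0) for b in bp]
# the regression slope of beta on bp estimates rho < 1, mirroring why the
# fitted signal scaling factor falls below 1 even with correct amplitudes
mb = sum(bp) / n
mo = sum(beta) / n
cov = sum((x - mb) * (y - mo) for x, y in zip(bp, beta)) / n
var = sum((x - mb) ** 2 for x in bp) / n
slope = cov / var
```
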

Figure 8 displays the regional Brier skill scores of the HFP probability hindcasts in the Tropics and in the North American sector as a function of the ensemble size. The nonparametric hindcast (B, dotted lines) is substantially affected by sampling errors when ensembles are small. The dashed and solid line curves indicate the Brier skill scores of the Gaussian hindcasts G with T = 26 and improved hindcasts I, respectively. The performance gain is substantial for small ensemble sizes in all three categories and is particularly large for the near-normal category. The improved T700 hindcasts are more skillful than the other two unadjusted hindcasts for all ensemble sizes. Evidence of performance differences between G and I is less pronounced for Z500. The unadjusted Gaussian hindcast G for Z500 marginally outperforms the statistically improved hindcast I for ensembles larger than approximately N = 10.

b. Probability hindcast attributes

Attributes diagrams (Hsu and Murphy 1986; Wilks 1995) are another way to represent the quality of probabilistic forecasts. Essentially, an attributes diagram displays the dependence of the relative frequency of an observed event on the forecast probability. This is done by sorting all probability forecasts into a number of bins and estimating, for each bin, the relative frequency of forecasts and the corresponding observed event relative frequency.
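
The binning step behind an attributes diagram might be sketched as follows (the function name and default bin count are illustrative, not the paper's implementation):

```python
def attributes_points(probs, outcomes, nbins=10):
    """For each probability bin, tabulate the relative frequency of
    forecasts falling in the bin and the relative frequency with which
    the event was then observed (outcomes are 0 or 1)."""
    counts = [0] * nbins
    hits = [0] * nbins
    for p, e in zip(probs, outcomes):
        k = min(int(p * nbins), nbins - 1)   # bin index for p in [0, 1]
        counts[k] += 1
        hits[k] += e
    n = len(probs)
    return [(counts[k] / n,                               # forecast frequency
             hits[k] / counts[k] if counts[k] else None)  # observed frequency
            for k in range(nbins)]
```

Plotting the observed frequency of each bin against its central forecast probability, with the forecast frequencies as the accompanying histogram, yields the diagram.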

Attributes diagrams for the HFP T700 and Z500 probability hindcasts in the Tropics are shown in Figs. 9 and 10. The points on the diagrams are the area-averaged observed event relative frequencies plotted as a function of forecast probabilities. These points would lie on the diagonal connecting the points (0, 0) and (1, 1) for perfectly reliable forecasts. The attributes diagrams provide a geometrical interpretation of the Brier score decomposition (7). The reliability term Brel that characterizes conditional biases of probability forecasts is the weighted average of the squared vertical distances between the points and the diagonal. The corresponding “weights” are represented by the histogram that displays the relative frequency of probability forecasts in the corresponding probability bins. The horizontal “no-resolution” line is drawn at the climatological probability and indicates forecasts that are unable to resolve occasions where the event is more or less likely than the climatological frequency. The resolution term Bres is the weighted average of the squared vertical distances between the points and the no-resolution line. Points in the shaded areas contribute positively to the Brier skill score.

Figures 9 and 10 (top row) are attributes diagrams for the nonparametric probability hindcasts B. Figures 9 and 10 (middle row) are for the parametric probability hindcasts G obtained for T = 26. Figures 9 and 10 (bottom row) are for the corrected probability forecasts I. The findings in Figs. 9 and 10 are summarized below.

  • The nonparametric probability hindcasts B of DJF T700 suffer from a significant bias and are not very reliable.

  • The parametric probability hindcasts G improve skill slightly over that in the nonparametric probability hindcasts. This improvement is greater for small ensemble sizes (not shown), which is consistent with the results in section 5a (cf. Fig. 8).

  • The statistical improvement technique reduces the forecast bias and improves the reliability of T700 hindcasts.

  • The unadjusted Z500 probability hindcasts are already fairly reliable and the statistical improvement technique does not greatly increase reliability.

  • The improved probability hindcasts are more cautious in the sense that hindcasts close to the climatological probability become more frequent and extreme hindcasts become less frequent.

We draw the same conclusions from attributes diagrams for T700 and Z500 probability hindcasts in the North American sector (not shown).

6. Summary

A simple statistical model of seasonal variations is considered in order to explore the properties of seasonal probability forecasts. The Brier score, which is often used to evaluate the quality of the probability forecasts, is discussed in the context of this simple framework.

The three equiprobable category classification that is common in seasonal forecasting is considered. It is demonstrated that the probability of the near-normal category is relatively insensitive to signal perturbations in low to moderate potential predictability cases. As a result, the probability of the near-normal category is close to the climatological probability so that it is not easy to predict the outcome for the near-normal event with high confidence. Therefore, near-normal forecasts are not very skillful.
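
This insensitivity is easy to verify numerically. The sketch below (our own illustration, not the paper's code) evaluates the three category probabilities for a Gaussian variable X = β + ϵ with unit total variance, so that the potential predictability equals the signal variance:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def category_probs(beta, rho2_pot):
    """P(below), P(near), P(above) for X = beta + eps, eps ~ N(0, 1 - rho2_pot);
    the equiprobable terciles of the N(0, 1) climatology sit at -q and +q."""
    q = 0.43073                     # Phi(q) = 2/3
    se = sqrt(1.0 - rho2_pot)
    below = Phi((-q - beta) / se)
    above = 1.0 - Phi((q - beta) / se)
    return below, 1.0 - below - above, above
```

With a potential predictability of 0.3, for example, moving the signal from β = 0 to β = 0.3 shifts the above-normal probability by roughly 0.1 but the near-normal probability by only about 0.02, consistent with the low skill of near-normal forecasts.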

Several approaches for deriving probabilistic information from an ensemble of deterministic forecasts are discussed and their sampling properties compared. When the noise is approximately normally distributed, the probability forecasts obtained with the fitted Gaussian distribution have better sampling properties than those obtained by counting the number of the ensemble members in the event category. Using the Gaussian probability estimator, a statistical skill improvement technique is proposed to reduce biases in probability forecasts. The approach is based on the minimization of the Brier score (mean-square error of the probability forecasts) by adjusting the amplitude of the model-simulated potentially predictable signal and noise variance estimates.
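
The two probability estimators can be sketched as follows for the probability of an ensemble member falling below a category threshold; this is a minimal illustration, not the paper's implementation:

```python
from math import erf, sqrt

def prob_below_counting(ensemble, threshold):
    """Nonparametric estimate: the fraction of members below the threshold."""
    return sum(1 for x in ensemble if x < threshold) / len(ensemble)

def prob_below_gaussian(ensemble, threshold):
    """Parametric estimate: fit a Gaussian to the ensemble and evaluate
    its cumulative distribution function at the threshold."""
    n = len(ensemble)
    mean = sum(ensemble) / n
    std = sqrt(sum((x - mean) ** 2 for x in ensemble) / (n - 1))
    z = (threshold - mean) / std
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

The counting estimate can take only the values 0, 1/N, … , 1, while the Gaussian estimate varies smoothly, which is one reason it is less affected by sampling errors in small ensembles.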

The proposed probability estimators and statistical improvement technique are tested on the 24-member ensemble of northern winter seasonal hindcasts of 700-hPa temperature and 500-hPa height produced with the second generation CCCma general circulation model. The Gaussian hindcasts result in somewhat better Brier scores than the nonparametric hindcasts. Apparently, the assumption of Gaussian noise is an acceptable approximation for T700 and Z500 on seasonal timescales. Also, the Gaussian hindcasts that use all available data to estimate the noise variance have slightly better skill scores.

The improvement technique is moderately successful for T700 but generally fails to improve the Brier skill scores of unadjusted Z500 probability hindcasts. We found that the unadjusted probability hindcasts for T700 are less reliable than those for Z500. Consequently, the temperature hindcasts benefit from the improvement technique, while the more reliable geopotential height hindcasts are slightly degraded because of sampling errors.

Acknowledgments

We thank George Boer and two anonymous reviewers for careful reading of an earlier version of the manuscript and providing us with many constructive and helpful comments. We also thank Normand Gagnon for producing the 24-member hindcasts employed in this study.

REFERENCES

  • Barnston, A. G., 1992: Correspondence among the correlation, RMSE, and Heidke forecast verification measures; refinement of the Heidke score. Wea. Forecasting, 7, 699–709.

  • Boer, G. J., 1993: Systematic and random error in an extended-range forecasting experiment. Mon. Wea. Rev., 121, 173–188.

  • Boer, G. J., G. M. Flato, and D. Ramsden, 2000a: A transient climate change simulation with greenhouse gas and aerosol forcing: Projected climate for the twenty-first century. Climate Dyn., 16, 427–450.

  • Boer, G. J., G. M. Flato, M. C. Reader, and D. Ramsden, 2000b: A transient climate change simulation with greenhouse gas and aerosol forcing: Experimental design and comparison with the instrumental record for the twentieth century. Climate Dyn., 16, 405–425.

  • Char, B. W., K. O. Geddes, G. H. Gonnet, B. L. Leong, M. B. Monagan, and S. M. Watt, 1991: Maple V Library Reference Manual. Springer, 698 pp.

  • Derome, J., G. Brunet, A. Plante, N. Gagnon, G. J. Boer, F. W. Zwiers, S. Lambert, and H. Ritchie, 2001: Seasonal predictions based on two dynamical models. Atmos.–Ocean, 39, 485–501.

  • Epstein, E. S., 1988: Long-range prediction: Limits of predictability and beyond. Wea. Forecasting, 3, 69–75.

  • Hoffman, R. N., and E. Kalnay, 1983: Lagged average forecasting, an alternative to Monte Carlo forecasting. Tellus, 35A, 100–118.

  • Hsu, W-R., and A. H. Murphy, 1986: The attributes diagram: A geometrical framework for assessing the quality of probability forecasts. Int. J. Forecasting, 2, 285–293.

  • Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.

  • Kharin, V. V., and F. W. Zwiers, 2001: Skill as a function of time scale in ensembles of seasonal hindcasts. Climate Dyn., 17, 127–141.

  • Kharin, V. V., and F. W. Zwiers, 2002: Climate predictions with multimodel ensembles. J. Climate, 15, 793–799.

  • Kharin, V. V., F. W. Zwiers, and N. Gagnon, 2001: Skill of seasonal hindcasts as a function of the ensemble size. Climate Dyn., 17, 835–843.

  • Leith, C. E., 1973: The standard error of time-average estimates of climatic means. J. Appl. Meteor., 12, 1066–1068.

  • Madden, R. A., 1976: Estimates of the natural variability of time-averaged sea-level pressure. Mon. Wea. Rev., 104, 942–952.

  • McFarlane, N. A., G. J. Boer, J-P. Blanchet, and M. Lazare, 1992: The Canadian Climate Centre second-generation general circulation model and its equilibrium climate. J. Climate, 5, 1013–1044.

  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600.

  • Murphy, A. H., and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338.

  • Pan, J., and H. Van den Dool, 1998: Extended-range probability forecasts based on dynamical model output. Wea. Forecasting, 13, 983–996.

  • Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, 1992: Numerical Recipes in FORTRAN: The Art of Scientific Computing. 2d ed. Cambridge University Press, 963 pp.

  • Rayner, N. A., E. B. Horton, D. E. Parker, C. K. Folland, and R. B. Hackett, 1996: Version 2.2 of the Global Sea–Ice and Sea Surface Temperature data set, 1903–1994. Hadley Centre Climate Research Tech. Note CRTN 74, 21 pp.

  • Rowell, D. P., 1998: Assessing potential predictability with an ensemble of multidecadal GCM simulations. J. Climate, 11, 109–120.

  • Van den Dool, H. M., and Z. Toth, 1991: Why do forecasts for “near normal” often fail? Wea. Forecasting, 6, 76–85.

  • von Storch, H., 1999: On the use of “inflation” in statistical downscaling. J. Climate, 12, 3505–3506.

  • von Storch, H., and F. W. Zwiers, 1999: Statistical Analysis in Climate Research. Cambridge University Press, 494 pp.

  • Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, 467 pp.

  • Zwiers, F. W., 1996: Interannual variability and predictability in an ensemble of AMIP climate simulations conducted with the CCC GCM2. Climate Dyn., 12, 825–848.

  • Zwiers, F. W., X. Wang, and J. Sheng, 2000: The effects of specifying bottom boundary conditions in an ensemble of atmospheric GCM simulations. J. Geophys. Res., 105, 7295–7315.

APPENDIX A

Brier Score Decomposition

Given the joint probability density distribution f(E, P) of the predictand E and its forecast P, the Brier score can formally be written as

B = ∫∫ (P − E)² f(E, P) dE dP.
The joint density distribution may be decomposed as f(E, P) = f(E|P)fP(P), where f(E|P) is the conditional probability distribution of the event E for each possible forecast P and fP(P) is the marginal probability density function of the forecasts. Because E is a discrete random variable, the conditional probability density function f(E|P) must be expressed as a weighted sum of Dirac delta functions. In the case of a dichotomous event, E takes only values 1 or 0 depending upon whether the event occurred. In this case the conditional distribution f(E|P) is completely described by a single parameter, the conditional probability of the event occurrence E(E|P). Also notice that ∫p fP(P)E(E|P) dP = E(E) ≡ PC. Using these expressions we have
B = ∫ [P − E(E|P)]² fP(P) dP − ∫ [E(E|P) − PC]² fP(P) dP + PC(1 − PC),

where the first term is the reliability Brel, the second term is the resolution Bres, and the third term is the uncertainty.
It is readily shown that the uncertainty term above is equal to the variance of E.
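
For forecasts taking a finite set of probability values, the decomposition can be checked numerically. The sketch below (our own illustration) implements the Murphy (1973) partition and reproduces the Brier score as reliability minus resolution plus uncertainty:

```python
def brier_decomposition(probs, outcomes):
    """Reliability, resolution, and uncertainty terms for 0/1 outcomes and
    forecasts taking a finite number of distinct probability values."""
    n = len(probs)
    pc = sum(outcomes) / n                      # climatological probability P_C
    groups = {}
    for p, e in zip(probs, outcomes):           # condition on the forecast value
        groups.setdefault(p, []).append(e)
    rel = sum(len(es) * (p - sum(es) / len(es)) ** 2
              for p, es in groups.items()) / n
    res = sum(len(es) * (sum(es) / len(es) - pc) ** 2
              for es in groups.values()) / n
    unc = pc * (1.0 - pc)
    return rel, res, unc
```
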

APPENDIX B

Probability Forecast Sampling Properties

In this section we briefly discuss the sampling properties of the probability forecasts introduced in section 4a. We consider three estimators: the nonparametric forecast B (12), the Gaussian forecast G (13), where β̂ and σ̂ϵ are obtained from (14) and (15), and the Gaussian forecast G where σ̂ϵ is obtained from (16) for T = 25.

Figure B1 illustrates several sampling properties of these forecasts. In the case of B, the properties are derived directly from the corresponding binomial distribution, while those of G are obtained from Monte Carlo simulations that produced 10 000 realizations of G with T = 1, and 1000 realizations of G with T = 25.

As we noted earlier, the nonparametric forecast B is unbiased. The "sawtooth" curves that describe its 80% confidence intervals are a consequence of its discrete range, whose values are multiples of 1/N. The bias of G is generally small for ensemble sizes larger than 5. The forecast G with T = 25 has a noticeable bias for small ensembles, which is nonetheless small relative to the rms errors. As expected, G with T = 25 has the smallest sampling errors. Forecasts of the near-normal category made with G benefit substantially from larger ensembles. The rms errors of forecasts of the above-normal category made with G for T = 1 and T = 25 become nearly the same for ensembles with more than 10 members.

APPENDIX C

Sampling Properties of Statistically Improved Probability Forecasts

Monte Carlo experiments were also used to explore the performance of the statistical improvement technique proposed in section 4c. Some results of Monte Carlo simulations are summarized in Figs. C1 and C2. Here we discuss the results only for the below (above) normal category.

Figure C1 shows the Brier skill score of the statistically improved probability forecasts as a function of the ensemble size (horizontal axis) and the model bias (vertical axis) for three values of the potential predictability, ρ2pot = 0.15, 0.3, and 0.6, with training period length T = 25. The model bias is expressed in terms of the adjusting factor a. The extreme values of a on the plot indicate a fully deterministic model forecast system. Negative values of a indicate that the simulated potentially predictable signal is of the opposite sign. A no-skill forecast system is obtained for a = 0. A system with a perfect signal-to-noise ratio (a = 1) is indicated by the horizontal dashed lines. The shaded areas indicate the parameter space for which the statistically improved probability forecasts have better Brier skill scores than the unadjusted Gaussian forecasts.

As one would expect, the performance of the statistical technique improves with longer training periods (not shown) and larger ensembles. The quality of the "improved" forecasts is worse than that of the unadjusted forecasts in areas around the "perfect-model" line and near the "no-skill" line. This is understandable: when a forecast system is nearly perfect, that is, when the signal-to-noise ratio in the model is close to that in the observations, the sampling errors incurred by a statistical technique that attempts to adjust this ratio will only degrade the quality of already good forecasts. Sampling errors become particularly dominant in a forecast system with a small signal-to-noise ratio.

Figure C2 presents the results of the Monte Carlo simulations in a slightly different format. It shows the Brier skill score of the unadjusted and improved probability forecasts as a function of the model bias for the potential predictability ρ2pot = 0.3 and ensemble sizes N = 6, 24, and 256. The training length in all cases is T = 25. The main source of sampling errors in large ensembles such as N = 256 is associated with the finite length of the training periods. While it is unlikely that the training period can be substantially extended in the near future, there is no fundamental barrier to increasing the ensemble size, apart from the obvious cost and performance issues. Thus this exercise allows us to explore the gains in skill that might be achieved by the statistical technique in the best-case scenario of a very large ensemble.

The skill score of the unadjusted Gaussian forecasts is indicated by the solid line, while the dashed line indicates the skill score of the improved probability forecasts. The horizontal dotted line indicates the potential Brier skill score BSSpot that can be achieved for the specified potential predictability. Vertical dashed lines indicate a forecast system with a perfect signal-to-noise ratio. Negative bias a indicates that the sign of the simulated potentially predictable signal is reversed. The statistical method performs rather well when the amplitude of the signal in the model is oversimulated or of the opposite sign. However, the statistically improved forecast is often inferior to the raw forecast over a wide range of positive values of a between 0 and a point just above the perfect signal-to-noise ratio. Increasing the ensemble size leads not only to better skill scores for the unadjusted forecasts but also to greater improvements in skill from the statistical improvement scheme, especially when the signal amplitude is undersimulated in the model.

Fig. 1.

The probabilities PX(B|β), PX(N|β), and PX(A|β) of the below normal (blank), near normal (light gray), and above normal (dark gray) for potential predictability ρ2pot = 0.3 and signal values (left) β = −0.5σβ and (right) β = +σβ.

Citation: Journal of Climate 16, 11; 10.1175/1520-0442(2003)016<1684:ISPF>2.0.CO;2

Fig. 2.

Probabilities of the below-normal PX(B|β) (solid curve), near-normal PX(N|β) (dotted curve), and above-normal PX(A|β) (dashed curve) categories in (4) as functions of the signal β expressed in terms of its std dev for potential predictabilities ρ2pot = 0.15, 0.3, and 0.6.

Fig. 3.

The potential BSSpot as a function of the deterministic potential predictability ρ2pot. The solid line curve is for the below-/above-normal category and dashed line curve is for the near-normal category

Fig. 4.

The BSS as a function of the forecast signal bias factor a for the potential predictability (left) ρ2pot = 0.3 and (right) ρ2pot = 0.6. The solid line curves indicate the skill scores for the above-/below-normal category and the dashed line curves indicate the skill scores for the near-normal category.

Fig. 5.

The regional BSS of probability hindcasts of the below-normal (BN), near-normal (NN), and above-normal (AN) categories obtained from the 24-member ensemble of HFP (left) DJF T700 hindcasts and (right) DJF Z500 hindcasts in (top) the Tropics and (bottom) the North American sector. The shading intensity (from dark to light) indicates the regional BSS of probability hindcasts B, G for T = 1, G with T = 26, and cross-validated I with T = 25, respectively.

Fig. 6.

BSS for (left) nonparametric probability hindcasts B and (right) statistically improved hindcasts I of the (top) below-normal, (middle) near-normal, and (bottom) above-normal event. Contour interval is 0.1. Blue (red) shadings indicate areas where the skill score is less than −0.1 (greater than 0.1). The probability hindcasts are derived from 24-member HFP DJF T700 ensemble hindcasts.

Fig. 7.

Time series of standardized DJF T700 anomalies at 5.6°N, 86.25°W in 1970–95 from NCEP–NCAR reanalysis (thick solid line) and the 24-member ensemble of HFP seasonal hindcasts (dotted lines). Horizontal dashed lines indicate the boundaries between the below-, near-, and above-normal categories. The BSS of the three categories are indicated at the top.

Fig. 8.

The regional BSS of the nonparametric (B; dotted line curves), the Gaussian (G with T = 26; dashed line curves), and the statistically improved (I with T = 25; solid line curves) probability hindcasts displayed as a function of the ensemble size N for the HFP (left) DJF T700 and (right) Z500 seasonal ensemble hindcasts in (top) the Tropics and (bottom) the North American sector. Skill scores for the below-, near-, and above-normal categories are indicated by red, green, and blue curves, respectively.

Fig. 9.

Attributes diagrams for DJF T700 probability hindcasts in the Tropics produced from the 24-member ensemble of the HFP DJF seasonal hindcasts: (left) below-normal, (middle) near-normal, and (right) above-normal categories. (top row) The attributes diagrams for the nonparametric probability hindcasts B. (middle row) The attributes diagrams for the Gaussian probability hindcasts G for T = 26. (bottom row) The attributes diagrams for the statistically improved hindcasts I.

Fig. 10.

As in Fig. 9, except for Z500

Fig. B1. The mean (dash–dotted lines), 10th and 90th percentiles (solid curves), and rms errors (dashed curves) of the nonparametric probability forecast B (blue lines), the Gaussian probability forecast G with T = 1 (red lines), and with T = 25 (green lines) for the (top) above-normal event and (bottom) the near-normal event as a function of the ensemble size N = 3, … , 24 in a Gaussian noise setting with ρ2pot = 0.3 and (left) β = 0 and (right) β = σβ

Fig. C1. The BSS of statistically improved probability forecasts of the above- or below-normal categories as a function of the ensemble size N and the model bias a for the potential predictability (left) ρ2pot = 0.15 (middle) 0.3, and (right) 0.6. The training period is of length T = 25. The horizontal dashed lines indicate a perfect model forecast system (β′ = β and σ2ϵ = σ2ϵ). Shaded areas indicate regions in parameter space for which the statistically improved probability forecasts have better skill scores than those of the unadjusted Gaussian probability forecasts. The results are obtained from Monte Carlo simulations based on 5000 repetitions

Fig. C2. The BSS of unadjusted Gaussian probability forecasts (solid lines) and statistically improved probability forecasts (dashed lines) as a function of the model bias a for the potential predictability ρ2pot = 0.3 and ensemble sizes N = 6 (left), 24 (middle), and 256 (right). The training period is of length T = 25. The vertical dashed lines indicate a perfect model forecast system (β′ = β and σ2ϵ = σ2ϵ). The horizontal dotted lines indicate the maximal skill score BSSpot. Light gray areas indicate a skill gain and dark gray areas indicate a skill loss due to the statistical improvement technique. The results are obtained from Monte Carlo simulations based on 5000 repetitions

Corresponding author address: Dr. Viatcheslav V. Kharin, Canadian Centre for Climate Modelling and Analysis, Meteorological Service of Canada, Victoria, BC V8W 2YZ, Canada. slava.kharin@ec.gc.ca

Save
  • Barnston, A. G., 1992: Correspondence among the correlation, RMSE, and Heidke forecast verification measures; refinement of the Heidke score. Wea. Forecasting, 7 , 699709.

    • Search Google Scholar
    • Export Citation
  • Boer, G. J., 1993: Systematic and random error in an extended-range forecasting experiment. Mon. Wea. Rev., 121 , 173188.

  • Boer, G. J., G. M. Flato, and D. Ramsden, 2000a: A transient climate change simulation with greenhouse gas and aerosol forcing: Projected climate for the twenty-first century. Climate Dyn., 16 , 427450.

    • Search Google Scholar
    • Export Citation
  • Boer, G. J., G. M. Flato, M. C. Reader, and D. Ramsden, 2000b: A transient climate change simulation with greenhouse gas and aerosol forcing: Experimental design and comparison with the instrumental record for the twentieth century. Climate Dyn., 16 , 405425.

    • Search Google Scholar
    • Export Citation
  • Char, B. W., K. O. Geddes, G. H. Gonnet, B. L. Leong, M. B. Monagan, and S. M. Watt, 1991: Maple V library reference manual. Springer, 698 pp.

  • Derome, J., G. Brunet, A. Plante, N. Gagnon, G. J. Boer, F. W. Zwiers, S. Lambert, and H. Ritchie, 2001: Seasonal predictions based on two dynamical models. Atmos.–Ocean, 39 , 485501.

    • Search Google Scholar
    • Export Citation
  • Epstein, E. S., 1988: Long-range prediction: Limits of predictability and beyond. Wea. Forecasting, 3 , 6975.

  • Hoffman, R. N., and E. Kalnay, 1983: Lagged average forecasting, an alternative to Monte Carlo forecasting. Tellus, 35A , 100118.

  • Hsu, W-R., and A. H. Murphy, 1986: The attributes diagram: A geometrical framework for assessing the quality of probability forecasts. Int. J. Forecasting, 2 , 285293.

    • Search Google Scholar
    • Export Citation
  • Fig. 1.

    The probabilities PX(B|β), PX(N|β), and PX(A|β) of the below-normal (blank), near-normal (light gray), and above-normal (dark gray) categories for potential predictability ρ2pot = 0.3 and signal values (left) β = −0.5σβ and (right) β = +σβ.

  • Fig. 2.

    Probabilities of the below-normal PX(B|β) (solid curve), near-normal PX(N|β) (dotted curve), and above-normal PX(A|β) (dashed curve) categories in (4) as functions of the signal β expressed in terms of its std dev for potential predictabilities ρ2pot = 0.15, 0.3, and 0.6.

  • Fig. 3.

    The potential Brier skill score BSSpot as a function of the deterministic potential predictability ρ2pot. The solid curve is for the below-/above-normal category and the dashed curve is for the near-normal category.

  • Fig. 4.

    The BSS as a function of the forecast signal bias factor a for the potential predictability (left) ρ2pot = 0.3 and (right) ρ2pot = 0.6. The solid line curves indicate the skill scores for the above-/below-normal category and the dashed line curves indicate the skill scores for the near-normal category.

  • Fig. 5.

    The regional BSS of probability hindcasts of the below-normal (BN), near-normal (NN), and above-normal (AN) categories obtained from the 24-member ensemble of HFP (left) DJF T700 hindcasts and (right) DJF Z500 hindcasts in (top) the Tropics and (bottom) the North American sector. The shading intensity (from dark to light) indicates the regional BSS of probability hindcasts B, G with T = 1, G with T = 26, and cross-validated I with T = 25, respectively.

  • Fig. 6.

    BSS for (left) nonparametric probability hindcasts B and (right) statistically improved hindcasts I of the (top) below-normal, (middle) near-normal, and (bottom) above-normal event. Contour interval is 0.1. Blue (red) shadings indicate areas where the skill score is less than −0.1 (greater than 0.1). The probability hindcasts are derived from 24-member HFP DJF T700 ensemble hindcasts.

  • Fig. 7.

    Time series of standardized DJF T700 anomalies at 5.6°N, 86.25°W in 1970–95 from NCEP–NCAR reanalysis (thick solid line) and the 24-member ensemble of HFP seasonal hindcasts (dotted lines). Horizontal dashed lines indicate the boundaries between the below-, near-, and above-normal categories. The BSS of the three categories are indicated at the top.

  • Fig. 8.

    The regional BSS of the nonparametric (B; dotted line curves), the Gaussian (G with T = 26; dashed line curves), and the statistically improved (I with T = 25; solid line curves) probability hindcasts displayed as a function of the ensemble size N for the HFP (left) DJF T700 and (right) Z500 seasonal ensemble hindcasts in (top) the Tropics and (bottom) the North American sector. Skill scores for the below-, near-, and above-normal categories are indicated by red, green, and blue curves, respectively.

  • Fig. 9.

    Attributes diagrams for DJF T700 probability hindcasts in the Tropics produced from the 24-member ensemble of the HFP DJF seasonal hindcasts: (left) below-normal, (middle) near-normal, and (right) above-normal categories. (top row) The attributes diagrams for the nonparametric probability hindcasts B. (middle row) The attributes diagrams for the Gaussian probability hindcasts G for T = 26. (bottom row) The attributes diagrams for the statistically improved hindcasts I.

  • Fig. 10.

    As in Fig. 9, but for Z500.

  • Fig. B1. The mean (dash–dotted lines), 10th and 90th percentiles (solid curves), and rms errors (dashed curves) of the nonparametric probability forecast B (blue lines), the Gaussian probability forecast G with T = 1 (red lines), and with T = 25 (green lines) for (top) the above-normal event and (bottom) the near-normal event as a function of the ensemble size N = 3, … , 24 in a Gaussian noise setting with ρ2pot = 0.3 and (left) β = 0 and (right) β = σβ.

  • Fig. C1. The BSS of statistically improved probability forecasts of the above- or below-normal categories as a function of the ensemble size N and the model bias a for the potential predictability (left) ρ2pot = 0.15, (middle) 0.3, and (right) 0.6. The training period is of length T = 25. The horizontal dashed lines indicate a perfect model forecast system (β′ = β and σ′2ϵ = σ2ϵ). Shaded areas indicate regions in parameter space for which the statistically improved probability forecasts have better skill scores than those of the unadjusted Gaussian probability forecasts. The results are obtained from Monte Carlo simulations based on 5000 repetitions.

  • Fig. C2. The BSS of unadjusted Gaussian probability forecasts (solid lines) and statistically improved probability forecasts (dashed lines) as a function of the model bias a for the potential predictability ρ2pot = 0.3 and ensemble sizes (left) N = 6, (middle) 24, and (right) 256. The training period is of length T = 25. The vertical dashed lines indicate a perfect model forecast system (β′ = β and σ′2ϵ = σ2ϵ). The horizontal dotted lines indicate the maximal skill score BSSpot. Light gray areas indicate a skill gain and dark gray areas indicate a skill loss due to the statistical improvement technique. The results are obtained from Monte Carlo simulations based on 5000 repetitions.
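The category probabilities shown in Figs. 1 and 2 follow directly from the Gaussian signal-plus-noise model of seasonal variability: X = β + ε with ε ~ N(0, σ2ϵ) and ρ2pot = σ2β/(σ2β + σ2ϵ). A minimal sketch (Python; function name and the normalization of the climatological variance to one are illustrative assumptions, with category boundaries at the terciles of the climatological distribution):

```python
import numpy as np
from scipy.stats import norm

def category_probs(beta, rho2_pot, sigma_x=1.0):
    """P(below), P(near), P(above) given the signal beta,
    under X = beta + eps, eps ~ N(0, (1 - rho2_pot) * sigma_x**2)."""
    sigma_eps = np.sqrt(1.0 - rho2_pot) * sigma_x
    # tercile boundaries of the climatological N(0, sigma_x**2) distribution
    q = norm.ppf(2.0 / 3.0) * sigma_x          # upper boundary; lower is -q
    p_below = norm.cdf(-q, loc=beta, scale=sigma_eps)
    p_above = 1.0 - norm.cdf(q, loc=beta, scale=sigma_eps)
    p_near = 1.0 - p_below - p_above
    return p_below, p_near, p_above
```

For β = 0 the below- and above-normal probabilities are equal, and the near-normal probability slightly exceeds 1/3 because the noise spread is smaller than the climatological spread; a positive signal shifts probability toward the above-normal category, as in the right panel of Fig. 1.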
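The two probability estimators compared in Fig. B1 can likewise be sketched. The function names and the category boundaries lo and hi (in practice the climatological terciles) are illustrative assumptions: B is the relative number of ensemble members in the category, while G integrates a Gaussian fitted to the ensemble mean and variance.

```python
import numpy as np
from scipy.stats import norm

def prob_nonparametric(ens, lo, hi):
    """Estimator B: fraction of ensemble members falling in (lo, hi]."""
    ens = np.asarray(ens)
    return np.mean((ens > lo) & (ens <= hi))

def prob_gaussian(ens, lo, hi):
    """Estimator G: category probability from a Gaussian fitted
    to the ensemble mean and sample standard deviation."""
    m = np.mean(ens)
    s = np.std(ens, ddof=1)
    return norm.cdf(hi, loc=m, scale=s) - norm.cdf(lo, loc=m, scale=s)
```

For small ensembles the count-based estimator B takes only the discrete values 0, 1/N, … , 1, whereas G varies smoothly, which is the source of its smaller rms error in Fig. B1.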
