Estimation of Seasonal Precipitation Tercile-Based Categorical Probabilities from Ensembles

Michael K. Tippett International Research Institute for Climate and Society, Columbia University, Palisades, New York

Anthony G. Barnston International Research Institute for Climate and Society, Columbia University, Palisades, New York

Andrew W. Robertson International Research Institute for Climate and Society, Columbia University, Palisades, New York


Abstract

Ensemble simulations and forecasts provide probabilistic information about the inherently uncertain climate system. Counting the number of ensemble members in a category is a simple nonparametric method of using an ensemble to assign categorical probabilities. Parametric methods of assigning quantile-based categorical probabilities include distribution fitting and generalized linear regression. Here the accuracy of counting and parametric estimates of tercile category probabilities is compared. The methods are first compared in an idealized setting where analytical results show how ensemble size and level of predictability control the accuracy of both methods. The authors also show how categorical probability estimate errors degrade the rank probability skill score. The analytical results provide a good description of the behavior of the methods applied to seasonal precipitation from a 53-yr, 79-member ensemble of general circulation model simulations. Parametric estimates of seasonal precipitation tercile category probabilities are generally more accurate than the counting estimate. In addition to determining the relative accuracies of the different methods, the analysis quantifies the relative importance of the ensemble mean and variance in determining tercile probabilities. Ensemble variance is shown to be a weak factor in determining seasonal precipitation probabilities, meaning that differences between the tercile probabilities and the equal-odds probabilities are due mainly to shifts of the forecast mean away from its climatological value.

Corresponding author address: M. K. Tippett, International Research Institute for Climate and Society, The Earth Institute of Columbia University, Lamont Campus/61 Route 9W, Palisades, NY 10964. Email: tippett@iri.columbia.edu


1. Introduction

Seasonal climate forecasts are necessarily probabilistic, and forecast information is most completely characterized by a probability density function (pdf). Estimation of the forecast pdf is required to measure predictability and to issue accurate forecasts. For reliable forecasts, the difference between the climatological and forecast pdfs represents predictability, and several measures of this difference have been developed to quantify predictability (Kleeman 2002; DelSole 2004; Tippett et al. 2004; DelSole and Tippett 2007). Quantile probabilities are the probabilities assigned to quantile-delimited categories and provide a coarse-grained description of the forecast and climatological pdfs, which is appropriate for ensembles with relatively few members. The International Research Institute for Climate and Society (IRI) issues seasonal forecasts of precipitation and temperature in the form of tercile-based categorical probabilities (hereafter called tercile probabilities), that is, the probability of the below-normal, normal, and above-normal categories (Barnston et al. 2003). Forecasts that differ from equal-odds probabilities, to the extent that they are reliable, are indications of predictability in the climate system. Accurate estimation of quantile probabilities is important both for quantifying seasonal predictability and for making climate forecasts.

In single-tier seasonal climate forecasts, initial conditions of the ocean–land–atmosphere system are the source of predictability, and ensembles of coupled model forecasts provide samples of the model atmosphere–land–ocean system evolution consistent with the initial conditions, their uncertainty, and the internal variability of the coupled model. In two-tier seasonal forecasts, ensembles of atmospheric general circulation models (GCMs) provide samples of equally likely model atmospheric responses to a particular configuration of sea surface temperature (SST). Tercile probabilities must be estimated from finite ensembles in either system. A simple nonparametric estimate of the tercile probabilities is the fraction of ensemble members in each category. Alternatively, the entire forecast pdf including tercile probabilities can be estimated by modeling the ensemble as a sample from an analytical pdf with adjustable parameters for mean, spread, shape, etc. Here we use a Gaussian distribution described by its mean and variance. The counting method has the advantage of making no assumptions about the form of the forecast pdf. Both approaches are affected by sampling error due to finite ensemble size, though to different degrees. This paper is about the impact of sampling error on parametric and nonparametric estimates of simulated and forecast tercile probabilities for seasonal precipitation totals. We analyze precipitation because of its societal importance and because, even on seasonal time scales, its distribution is farther from being Gaussian, and hence more challenging to describe, than quantities like temperature and geopotential height, which have been previously examined.

In this paper we present analytical descriptions of the accuracy of the counting and Gaussian tercile probability estimators. These analytical results facilitate the comparison of the counting and Gaussian estimates and show how the accuracy of the estimators increases as ensemble size and predictability level increase. The analytical results support previous empirical results showing the advantage of the parametric estimators. Wilks (2002) found that modeling numerical weather prediction ensembles with Gaussian or Gaussian mixture distributions gave more accurate estimations of quantile values than counting, especially for quantiles near the extremes of the distribution. Kharin and Zwiers (2003) used Monte Carlo simulations to show that a Gaussian fit estimate was more accurate than counting for Gaussian distributed forecast variables.

We show how the accuracy of the tercile probability estimates affects the rank probability skill score (RPSS). The RPSS is a multicategory generalization of the two-category Brier skill score. Richardson (2001) found that finite ensemble size had an adverse effect on the Brier skill score with low-skill regions being more negatively affected by small ensemble size. Changes in ensemble size that cause only modest changes in Brier skill score can lead to large changes in economic value implied by a simple cost–loss decision model, particularly for extreme events (Richardson 2001).

Accurate estimation of tercile probabilities from GCM ensembles does not ensure a skillful simulation or forecast if there are systematic errors in the GCM pdf. Calibration of model probabilities is needed to account for model deficiencies and produce reliable climate forecasts (Robertson et al. 2004). We expect that forecast skill would be improved by reducing sampling error in the GCM probabilities that are inputs to both the calibration system and the procedure to estimate calibration parameters. We investigate the roles of sampling and model error using a 79-member ensemble of GCM simulations of seasonal precipitation made with observed SST; we examine the impact of reducing sampling error on the skill of the simulations with and without calibration. Additionally, we use the GCM data to assess the importance of some simplifying assumptions used in the calculation of the analytical results by comparing the analytical results with empirical ones obtained by subsampling from the large ensemble of GCM simulations.

An important predictability issue relevant to parametric estimation of tercile probabilities is the relative roles of the forecast mean and variance in determining predictability (Kleeman 2002). Since predictability is a measure of the difference between forecast and climatological distributions, identifying the parameters associated with predictability also identifies the parameters that are useful for estimating tercile probabilities. For instance, if the predictability of a system is due to only the changes in the forecast mean, then the forecast mean should also be useful for estimating tercile probabilities. One approach to this question is to identify the parameters that give the most skillful forecast probabilities (Buizza and Palmer 1998; Atger 1999). Kharin and Zwiers (2003) showed that the Brier skill score of hindcasts of 700-mb temperature and 500-mb height was improved when probabilities were estimated from a Gaussian distribution with constant variance as compared with counting; fitting a Gaussian distribution with time-varying variance gave inferior results. Hamill et al. (2004) used a generalized linear model (GLM; logistic regression) to estimate forecast tercile probabilities of 6–10 day and week-2 surface temperature and precipitation and found that the ensemble variance was not a useful predictor of tercile probabilities. In addition to looking at skill, we examine the relative importance of the forecast mean and variance for predictability in the perfect model setting by asking whether including ensemble variance in the Gaussian estimate and the GLM estimate reduces sampling error.

The paper is organized as follows. The GCM and observation data are described in section 2. In section 3, we derive some theoretical results about the relative size of the error of the counting and fitting estimates and about the effect of sampling error on the ranked probability skill score. The GLM is also introduced and related to Gaussian fitting. In section 4, we compare the analytical results with empirical GCM-based ones and include effects of model error. A summary and conclusions are given in section 5.

2. Data

Model-simulated precipitation data come from a 79-member ensemble of T42 ECHAM4.5 GCM (Roeckner et al. 1996) simulations forced with observed SST for the period December 1950 to February 2003. We use seasonal averages of the 3-month period December through February (DJF), a period when ENSO is a significant source of predictability. We consider all land points between 55°S and 70°N, including regions whose dry season occurs in DJF and where forecasts are not usually made. While the results here use unprocessed model-simulated precipitation, many of the calculations were repeated using Box–Cox transformed data. The Box–Cox transformation
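$$x_{\mathrm{BC}} = \begin{cases} (x^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\ \log x, & \lambda = 0 \end{cases} \tag{1}$$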
makes the data approximately Gaussian and depends on the parameter λ. Positive skewness is the usual non-Gaussian aspect of precipitation and requires a choice of λ < 1. The value of λ is found by maximizing the log-likelihood function. Figure 1 shows the geographical distribution of the values of λ, which is an indication of the deviation of the data from Gaussianity; we only allow a few values of λ, namely, 0, 1/4, 1/3, 1/2, and 1. The log function and small values of the exponent tend to be selected in dry regions. This is consistent with Sardeshmukh et al. (2000) who found that monthly precipitation in reanalysis and in a GCM was significantly non-Gaussian mainly in regions of mean tropospheric descent.
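The selection of λ from a small candidate set can be sketched as follows. This is a minimal illustration, assuming the standard Box–Cox form and the usual profile log-likelihood, strictly positive data, and hypothetical function names; it is not the authors' code.

```python
import math
import random

# Candidate exponents named in the text; 0 corresponds to the log transform.
CANDIDATE_LAMBDAS = (0.0, 0.25, 1 / 3, 0.5, 1.0)

def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def profile_log_likelihood(data, lam):
    """Gaussian log-likelihood of the transformed data, up to a constant,
    including the Jacobian term of the transformation."""
    transformed = [box_cox(x, lam) for x in data]
    n = len(data)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    jacobian = (lam - 1.0) * sum(math.log(x) for x in data)
    return -0.5 * n * math.log(variance) + jacobian

def select_lambda(data):
    """Return the candidate exponent with the largest log-likelihood."""
    return max(CANDIDATE_LAMBDAS, key=lambda lam: profile_log_likelihood(data, lam))

# Synthetic, positively skewed "seasonal totals" (lognormal), for illustration only.
rng = random.Random(0)
synthetic_totals = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
print(select_lambda(synthetic_totals))
```

Real precipitation series can contain zeros, which this sketch does not handle; some offset or censoring convention would be needed in that case.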

The precipitation observations used to evaluate model skill and to calibrate model output come from the extended New et al. (2000) gridded dataset of monthly precipitation for the period 1950 to 1998, interpolated to the T42 model grid.

3. Theoretical considerations

a. Variance of the counting estimate

The counting estimate pN of a tercile probability is the fraction n/N, where N is the ensemble size and n is the number of ensemble members in the tercile category. The binomial distribution Pp(n|N), where p is the tercile probability, gives the probability of there being exactly n members in the category. The expected number of members in the tercile category is
$$\langle n \rangle = \sum_{n=0}^{N} n\,P_p(n|N) = Np \tag{2}$$
where the notation 〈·〉 denotes expectation. Consequently, the expected value of the counting estimate pN is the probability p, and the counting estimate is unbiased. However, having a limited ensemble size generally causes any single realization of pN to differ from p. The variance of the counting estimate pN is
$$\langle (p_N - p)^2 \rangle = \frac{1}{N^2}\,\langle (n - Np)^2 \rangle = \frac{(1 - p)\,p}{N} \tag{3}$$
where we have used the fact that the variance of the binomial distribution is N(1 − p)p. The relation in (3) shows that the variance of the counting estimate is inversely proportional to the ensemble size, so that its standard deviation decreases as 1/√N.

Since the counting estimate pN is not normally distributed or even symmetric for p ≠ 0.5 (for instance, the distribution of sampling error necessarily has a positive skew when the true probability p is close to zero), it is not immediately apparent whether its variance is a useful measure. However, the binomial distribution becomes approximately normal for large N. Figure 2 shows that the standard deviation gives a good estimate of the 16th and 84th percentiles of pN for p = 1/3 and modest values of N. In this case, the counting estimate variance is 2/(9N). The percentiles are obtained by inverting the cumulative distribution function of the sample error. Since the binomial cumulative distribution is discrete, we show the smallest value at which it exceeds 0.16 and 0.84. Figure 2 also shows that for modest-sized ensembles (N > 20) the standard deviation is fairly insensitive to incremental changes in ensemble size; increasing the ensemble size by a factor of 4 is necessary to reduce the standard deviation by a factor of 2.
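The binomial result that the counting estimate has variance p(1 − p)/N can be checked with a small Monte Carlo experiment. The sketch below is illustrative only; the function name, trial count, and seed are arbitrary choices and not part of the original study.

```python
import random

def counting_estimate_variance(p, n_members, n_trials=20000, seed=0):
    """Monte Carlo variance of the counting estimate n/N about the true probability p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Each member falls in the category independently with probability p.
        n_in_category = sum(rng.random() < p for _ in range(n_members))
        total += (n_in_category / n_members - p) ** 2
    return total / n_trials

for n_members in (10, 20, 40):
    simulated = counting_estimate_variance(1 / 3, n_members)
    analytical = (1 / 3) * (2 / 3) / n_members  # p(1 - p)/N = 2/(9N) for p = 1/3
    print(n_members, round(simulated, 5), round(analytical, 5))
```

Quadrupling N from 10 to 40 should cut the variance by about four and hence the standard deviation by about two, which is the behavior described for Fig. 2.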

The average variance of the counting estimate for a number of forecasts is found by averaging (3) over the values of the probability p. The extent to which the forecast probability differs from the climatological value of 1/3 is an indication of predictability, with larger deviations indicating more predictability. Intuitively, we expect regions and seasons with more predictability to suffer less from sampling error on average since enhanced predictability implies more reproducibility among ensemble members. In fact, when the forecast distribution is Gaussian with mean μf and variance σf², the variance of the counting estimate of the below-normal category probability is (see the appendix for details)
$$\langle (p_N - p)^2 \rangle = \frac{1}{4N}\left[1 - \operatorname{erf}^2\!\left(\frac{x_b - \mu_f}{\sqrt{2}\,\sigma_f}\right)\right] \tag{4}$$
where xb is the left tercile boundary and erf denotes the error function. Since the absolute value of the error function approaches unity when the absolute value of its argument is large, the counting estimate variance is small when the ensemble mean is large or the ensemble variance is small. Assuming that the forecast variance σf² is constant and averaging (4) over forecasts gives that the average variance is approximately (see the appendix for details)
$$\langle (p_N - p)^2 \rangle \approx \frac{1}{N}\left(-0.0421868 + \frac{0.264409}{\sqrt{1 + S^2}}\right) \tag{5}$$
where S² is the usual signal-to-noise ratio [see (A4); Kleeman and Moore (1999); Sardeshmukh et al. (2000)]. When there is no skill (S = 0), p = 1/3 and the average variance is 2/(9N). The signal-to-noise ratio is related to correlation skill, with r = S/√(1 + S²) being the expected correlation of the ensemble mean with an ensemble member. The relation in (5) has the practical value of providing a simple estimate of the ensemble size needed to achieve a given level of accuracy for the counting estimate of the tercile probability. This value, like the signal-to-noise ratio, depends on the model, season, and region.
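As a rough illustration of how (5) could be used to size an ensemble, the sketch below assumes the convergence factor has the form −0.0421868 + 0.264409/√(1 + S²), which reproduces the no-skill value 2/9 at S = 0. The function names and the target accuracy are hypothetical.

```python
import math

def convergence_factor(snr_squared):
    """Single-member variance factor in (5); assumes the square-root form."""
    return -0.0421868 + 0.264409 / math.sqrt(1.0 + snr_squared)

def average_counting_variance(snr_squared, n_members):
    """Approximate forecast-averaged variance of the counting estimate."""
    return convergence_factor(snr_squared) / n_members

def members_needed(snr_squared, target_std):
    """Smallest ensemble size whose counting-estimate standard deviation
    does not exceed target_std, according to the approximation above."""
    return math.ceil(convergence_factor(snr_squared) / target_std ** 2)

for snr_squared in (0.0, 0.25, 1.0):
    r = math.sqrt(snr_squared / (1.0 + snr_squared))  # r = S / sqrt(1 + S^2)
    print(snr_squared, round(r, 3), members_needed(snr_squared, target_std=0.05))
```

Because the factor decreases with S², more predictable regions need fewer members to reach the same accuracy, in line with the intuition given in the text.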

b. Variance of the Gaussian fit estimate

Fitting a distribution with a few adjustable parameters to the ensemble values is an alternative method of estimating a quantile probability. Here we use a Gaussian distribution with two parameters, mean and variance, for simplicity and because it can be generalized to more dimensions (Wilks 2002). The Gaussian fit estimate gN of the tercile probabilities is found by fitting the N-member ensemble with a Gaussian distribution and integrating the distribution between the climatological tercile boundaries (Kharin and Zwiers 2003). The Gaussian fit estimate has two sources of error: (i) the non-Gaussianity of the forecast distribution from which the ensemble is sampled and (ii) sampling error in the estimates of mean and variance due to limited ensemble size. The first source of error is problem dependent, and we will quantify its impact empirically for the case of GCM-simulated seasonal precipitation. The variance of the Gaussian fit estimate can be quantified analytically for Gaussian distributed variables. When the forecast distribution is Gaussian with mean μf and known variance σf², the variance of the Gaussian fit estimate of the below-normal category probability is approximately (see the appendix for details)
$$\langle (g_N - p)^2 \rangle \approx \frac{1}{2\pi N}\exp\!\left[-\frac{(x_b - \mu_f)^2}{\sigma_f^2}\right] \tag{6}$$
where xb is the left tercile boundary. The average (over forecasts) variance of the Gaussian fit tercile probability is approximately (see appendix for details)
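$$\langle (g_N - p)^2 \rangle \approx \frac{1}{2\pi N\sqrt{1 + 2S^2}}\exp\!\left[-\frac{x_0^2\,(1 + S^2)}{1 + 2S^2}\right] \tag{7}$$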
where x0 = Φ⁻¹(1/3) ≈ −0.4307 and Φ is the normal cumulative distribution function. Comparing this value with the counting estimate variance in (5) shows that the Gaussian fit estimate has smaller variance for all values of S², with its advantage over the counting estimate increasing slightly as the signal-to-noise ratio increases to levels exceeding unity.
When there is no predictability (S = 0), the average variance of the Gaussian fit is
$$\langle (g_N - p)^2 \rangle \approx \frac{e^{-x_0^2}}{2\pi N} \tag{8}$$
and depends only on ensemble size. Comparing (3) and (8), we see that the variance of the Gaussian estimated tercile probability is about 40% smaller than that of the counting estimate if the ensemble distribution is indeed Gaussian with known variance and no signal (S = 0). The inverse dependence of the variances on ensemble size means that modest decreases in variance are equivalent to substantial increases in ensemble size. For instance, the variance of a Gaussian fit estimate with ensemble size 24, the simulation ensemble size used for IRI forecast calibration (Robertson et al. 2004), is equivalent to that of a counting estimate with ensemble size 40. The results in (3) and (8) also allow us to compare the variances of counting and Gaussian fit estimates of other quantile probabilities for the case S = 0 by appropriately modifying the definition of the category boundary x0. For instance, for the median, x0 = 0, and the variance of the Gaussian estimate is about 36% smaller than that of the counting estimate; in the case of the 10th and 90th percentiles, x0 = Φ⁻¹(1/10) ≈ −1.2816 and the variance of the Gaussian estimated probability is about 66% smaller than that of the counting estimate. The accuracy of the approximation in (8) for higher quantiles depends on the ensemble size being sufficiently large.
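The percentages quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that, for S = 0, the Gaussian fit variance is exp(−x0²)/(2πN) and the counting variance is q(1 − q)/N for a category with probability q; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def variance_ratio(q):
    """Ratio of Gaussian-fit to counting variance for a category of probability q (S = 0)."""
    x0 = NormalDist().inv_cdf(q)                   # category boundary, standardized units
    gaussian = math.exp(-x0 ** 2) / (2 * math.pi)  # N times the Gaussian-fit variance
    counting = q * (1 - q)                         # N times the counting variance
    return gaussian / counting

for q in (1 / 3, 1 / 2, 1 / 10):
    reduction = 100 * (1 - variance_ratio(q))
    print(f"q = {q:.3f}: Gaussian-fit variance about {reduction:.0f}% smaller")

# Counting-ensemble size with the same variance as a 24-member Gaussian fit.
print(24 / variance_ratio(1 / 3))
```

Since both variances scale as 1/N, the ratio alone fixes the equivalent ensemble size: dividing 24 by the tercile ratio gives roughly 40 members.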

c. Estimates from generalized linear models

Generalized linear models offer a parametric estimate of quantile probabilities without the explicit assumption that the ensemble has a Gaussian distribution. GLMs arise in the statistical analysis of the relationship between a response probability p, here the tercile probability, and some set of explanatory variables yi, for instance the GCM ensemble mean and variance (McCullagh and Nelder 1989). Suppose the probability p depends on the response R, which is the linear combination
$$R = \sum_i a_i y_i + b \tag{9}$$
of the explanatory variables for some coefficients ai and a constant term b. The response R generally takes on all numerical values while the probability p is bounded between zero and one. The GLM approach introduces a function g(p) that maps the unit interval onto the entire real line and studies the model
$$g(p) = \sum_i a_i y_i + b \tag{10}$$
The parameters ai and the constant b are found by maximum likelihood estimation. Here, the GLMs are developed with the ensemble mean (standardized) and ensemble standard deviation as explanatory variables and p given by the counting estimate. This procedure is different from that used by Hamill et al. (2004) where the GLM was developed using observations. The procedure here has the potential to reduce sampling error, not systematic model error.
There are a number of commonly used choices for the function g(p), including the logit function, which leads to logistic regression (McCullagh and Nelder 1989; Hamill et al. 2004). Here we use the probit function, which is the inverse of the normal cumulative distribution function Φ; that is, we define
$$g(p) \equiv \Phi^{-1}(p) \tag{11}$$
Results using the logit function (not shown) are similar since the logit and probit functions are very similar over the interval 0.1 ≤ p ≤ 0.9 (McCullagh and Nelder 1989). The assumption of the GLM method is that g(p) is linearly related to the explanatory variables, here the ensemble mean and standard deviation. When the forecast distribution is Gaussian with constant variance, g(p) is indeed linearly related to the ensemble mean and this assumption is exactly satisfied. To see this, suppose that the forecast ensemble has mean μf and variance σf². Then the probability p of the below-normal category is
$$p = \Phi\!\left(\frac{x_b - \mu_f}{\sigma_f}\right) \tag{12}$$
where xb is the left tercile of the climatological distribution, and
$$g(p) = \Phi^{-1}(p) = \frac{x_b}{\sigma_f} - \frac{\mu_f}{\sigma_f} \tag{13}$$
Therefore, we expect the Gaussian fit and GLM estimates to have similar behavior for Gaussian ensembles with constant variance.
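The linearity of the probit in the forecast mean can be verified numerically. The sketch below assumes a standardized climatology, so that the left tercile boundary is Φ⁻¹(1/3), and uses an arbitrary forecast standard deviation of 0.9; both are illustrative choices.

```python
from statistics import NormalDist

std_normal = NormalDist()

def below_normal_probability(mean, sd, x_b):
    """Probability that a Gaussian forecast with the given mean and sd falls below x_b."""
    return NormalDist(mean, sd).cdf(x_b)

x_b = std_normal.inv_cdf(1 / 3)  # left tercile boundary of a standard normal climatology
forecast_sd = 0.9                # assumed constant forecast standard deviation

for forecast_mean in (-0.5, 0.0, 0.5):
    p = below_normal_probability(forecast_mean, forecast_sd, x_b)
    probit = std_normal.inv_cdf(p)             # g(p) for the probit link
    linear = (x_b - forecast_mean) / forecast_sd
    print(forecast_mean, round(p, 4), round(probit, 4), round(linear, 4))
```

The last two columns agree, so a probit GLM with the ensemble mean as its only explanatory variable has, in this idealized case, slope −1/σf and intercept xb/σf.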

We show an example with synthetic data to give some indication of the robustness of the GLM estimate when the population that the ensemble represents does not have a Gaussian distribution. We take the forecast pdf to be a gamma distribution with shape and scale parameters (2, 1). The pdf is asymmetric and has a positive skew (see Fig. 3a). Samples are taken from this distribution and the probability of the below-normal category is estimated by counting, Gaussian fit, and GLM; the Gaussian fit assumes constant known variance, and the GLM uses the ensemble mean as an explanatory variable. Interestingly the rms error of both the GLM and Gaussian fit estimates is smaller than that of counting for modest ensemble size (Fig. 3b). As the ensemble size increases further, counting becomes a better estimate than the Gaussian fit. For all ensemble sizes, the performance of the GLM estimate is better than the Gaussian fit.
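A simplified version of this synthetic experiment, covering only the counting and Gaussian fit estimates, might look as follows. The sketch assumes that the category boundary is the lower tercile of the gamma(2, 1) distribution itself and that the known variance is the gamma variance of 2; the GLM estimate is omitted, and the trial counts and seed are arbitrary. It is not a reproduction of Fig. 3.

```python
import math
import random
from statistics import NormalDist

def gamma21_cdf(x):
    """CDF of the gamma distribution with shape 2 and scale 1."""
    return 1.0 - (1.0 + x) * math.exp(-x)

def lower_tercile(lo=0.0, hi=20.0):
    """Solve gamma21_cdf(x) = 1/3 by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma21_cdf(mid) < 1 / 3:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rms_errors(n_members, n_trials=2000, seed=0):
    """Return (counting rms error, Gaussian-fit rms error) for one ensemble size."""
    rng = random.Random(seed)
    x_b = lower_tercile()
    true_p = gamma21_cdf(x_b)   # equals 1/3 by construction
    known_sd = math.sqrt(2.0)   # gamma(2, 1) has variance 2
    sq_count = sq_gauss = 0.0
    for _ in range(n_trials):
        sample = [rng.gammavariate(2.0, 1.0) for _ in range(n_members)]
        p_count = sum(x < x_b for x in sample) / n_members
        p_gauss = NormalDist(sum(sample) / n_members, known_sd).cdf(x_b)
        sq_count += (p_count - true_p) ** 2
        sq_gauss += (p_gauss - true_p) ** 2
    return math.sqrt(sq_count / n_trials), math.sqrt(sq_gauss / n_trials)

for n_members in (10, 40, 160):
    print(n_members, rms_errors(n_members))
```

Under these assumptions the Gaussian fit carries a fixed bias from the skewed population, so its smaller sampling variance wins for small ensembles while counting, being unbiased, wins once the ensemble is large enough.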

Other experiments (not shown) compare the counting, Gaussian fit, and GLM estimates when the ensemble is Gaussian with nonconstant variance. The GLM estimate with ensemble mean and variance as explanatory variables and the two-parameter Gaussian fit have smaller error than counting and the one-parameter models (for large enough ensemble size) as expected.

d. Ranked probability skill score

The ranked probability skill score (RPSS; Epstein 1969), a commonly used skill measure for probabilistic forecasts, is also affected by sampling error. The ranked probability score (RPS) is the average integrated squared difference between the forecast and observed cumulative distribution functions and is defined for tercile probabilities to be
$$\mathrm{RPS} = \frac{1}{M}\sum_{i=1}^{M}\sum_{j=1}^{3}\left(F_{i,j} - O_{i,j}\right)^2 \tag{14}$$
where M is the number of forecasts, Fi,j (Oi,j) is the cumulative distribution function of the ith forecast (observation) of the jth category. The observation “distribution” is defined to be one for the observed category and zero otherwise. This definition means that Fi,1 = Pi,B, Fi,2 = Pi,B + Pi,N, where Pi,B (Pi,N) is the probability of the below normal (near normal) category for the ith forecast. The terms containing above-normal probabilities (j = 3) vanish.
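A direct implementation of this unnormalized three-category RPS is sketched below. The function name and the encoding of the observed category as 0, 1, or 2 (below, near, above normal) are illustrative assumptions.

```python
def ranked_probability_score(forecasts, observed_categories):
    """Average squared difference between cumulative forecast and observation distributions.

    forecasts: sequence of (P_B, P_N, P_A) tuples.
    observed_categories: sequence of 0 (below), 1 (near), or 2 (above normal).
    """
    total = 0.0
    for probabilities, observed in zip(forecasts, observed_categories):
        cumulative_forecast = 0.0
        for j in range(3):
            cumulative_forecast += probabilities[j]
            # The cumulative observation is 1 at and above the observed category.
            cumulative_observed = 1.0 if observed <= j else 0.0
            total += (cumulative_forecast - cumulative_observed) ** 2
    return total / len(forecasts)

climatology = (1 / 3, 1 / 3, 1 / 3)
print(ranked_probability_score([climatology] * 3, [0, 1, 2]))
```

For the equal-odds forecast verified once against each category, the score averages to 4/9, which is the no-skill value referred to later in the text.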
Suppose we consider the expected (with respect to realizations of the observations) RPS for a single forecast and drop the forecast number i subscript. Let OB, ON, and OA be the probabilities that the verifying observation falls into the below-, near-, and above-normal categories, respectively. That is,
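$$O_B = \langle O_1 \rangle, \qquad O_N = \langle O_2 \rangle - \langle O_1 \rangle, \qquad O_A = 1 - \langle O_2 \rangle \tag{15}$$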
where the expectation is with respect to realizations of the observations. Note that OB, ON, and OA collectively represent the uncertainty of the climate state, not due to instrument error but due to the limited predictability of the climate system. In the case of equal odds, OB = ON = OA = 1/3, there is no predictability, while a shift away from equal odds represents predictability. These probabilities are not directly measurable since only a single realization of nature is available. The expected (with respect to the observations) RPS of a particular forecast is the sum of the RPS for each possible category of observation multiplied by its likelihood:
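$$\langle \mathrm{RPS} \rangle = O_B\left[(P_B - 1)^2 + (P_B + P_N - 1)^2\right] + O_N\left[P_B^2 + (P_B + P_N - 1)^2\right] + O_A\left[P_B^2 + (P_B + P_N)^2\right] \tag{16}$$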
If we make the perfect model assumption, observations and forecasts are assumed to be drawn from the same distribution and the forecast and expected observation probabilities are equal. Using (16), the perfect model expected RPS (denoted RPSperfect) is
$$\mathrm{RPS}_{\mathrm{perfect}} = P_B(1 - P_B) + P_A(1 - P_A) \tag{17}$$
Note that the expected RPS of a perfect model differs from zero unless the probability of a category is one, or zero, that is, unless the forecast is deterministic; RPSperfect is small for large probability shifts. The quantity RPSperfect is a perfect model measure of potential probabilistic skill analogous to the signal-to-noise ratio, which determines the correlation skill of a model to predict itself. The quantity RPSperfect has the same form as the uncertainty term in the decomposition by Murphy (1973) of the Brier score. However, the uncertainty term in the decomposition by Murphy (1973) is the score of the climatological forecast averaged over forecasts, while RPSperfect is the expected score of a correct probability forecast averaged over realizations of the observations. Both quantities measure the variability of the observations with respect to their expected frequency. When the forecast distribution is Gaussian, RPSperfect is simply related to the forecast mean μf and variance σf² by
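$$\mathrm{RPS}_{\mathrm{perfect}} = \frac{1}{2} - \frac{1}{4}\left[\operatorname{erf}^2\!\left(\frac{x_b - \mu_f}{\sqrt{2}\,\sigma_f}\right) + \operatorname{erf}^2\!\left(\frac{x_a - \mu_f}{\sqrt{2}\,\sigma_f}\right)\right] \tag{18}$$
where xb and xa are the left and right tercile boundaries. This formula shows that RPSperfect = 0 in the limit of σf = 0 (deterministic forecast) and elucidates the empirical relation between probability skill and mean forecast found by Kumar et al. (2001).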
Figure 4a shows the time-averaged value of RPSperfect for the 79-member ECHAM4.5 GCM-simulated precipitation data. This is a perfect model measure of potential probabilistic skill with small values of RPSperfect showing that the GCM has skill in the sense of reproducibility with respect to itself. Skills are highest at low latitudes, consistent with our knowledge that tropical precipitation is most influenced by SST. Perfect model RPS values are close to the no-skill limit of 4/9 in much of the extratropics. The RPSS is defined using the RPS and a reference forecast defined to have zero skill, here climatology:
$$\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}}{\mathrm{RPS}_{\mathrm{clim}}} \tag{19}$$
where RPSclim is the RPS of the climatological forecast. The expected RPS of a climatological forecast is found by substituting PB = PN = PA = 1/3 into (16), which gives
$$\mathrm{RPS}_{\mathrm{clim}} = \frac{2}{9} + \frac{1}{3}\left(O_B + O_A\right) \tag{20}$$
Figure 4b shows the time-averaged value of RPSSperfect ≡ 1 − RPSperfect/RPSclim for the GCM-simulated precipitation data. Even under the perfect model assumption, the RPSS exceeds 0.1 in few regions.
The ensemble-estimated and observation probabilities are different even in the perfect model setting due to finite ensemble size. Suppose that PB = OB + ϵB and PA = OA + ϵA where ϵB and ϵA represent error due to finite ensemble size. If each of the forecast probabilities is unbiased, so that 〈ϵB〉 = 〈ϵA〉 = 0, then substituting into (16) and averaging over realizations of the ensemble gives
$$\langle \mathrm{RPS} \rangle = \mathrm{RPS}_{\mathrm{perfect}} + \langle \epsilon_B^2 \rangle + \langle \epsilon_A^2 \rangle \tag{21}$$
This means that the perfect model expected RPS is increased by an amount that depends on the variance of the probability estimate. In particular, if the sampling error is associated with the counting estimate whose variance is given by (3), then
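$$\langle \epsilon_B^2 \rangle = \frac{O_B(1 - O_B)}{N}, \qquad \langle \epsilon_A^2 \rangle = \frac{O_A(1 - O_A)}{N} \tag{22}$$
and
$$\langle \mathrm{RPS} \rangle = \left(1 + \frac{1}{N}\right)\mathrm{RPS}_{\mathrm{perfect}} \tag{23}$$
It follows that
$$\mathrm{RPSS} = 1 - \left(1 + \frac{1}{N}\right)\frac{\mathrm{RPS}_{\mathrm{perfect}}}{\mathrm{RPS}_{\mathrm{clim}}} = 1 - \left(1 + \frac{1}{N}\right)\left(1 - \mathrm{RPSS}_{\mathrm{perfect}}\right) \tag{24}$$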
The relation between RPSS and ensemble size is the same as that for the Brier skill score (Richardson 2001). The relation in (24) quantifies the degradation of RPSS due to sampling error and, combined with (18), provides an analytical expression for the empirical relation between ensemble size, RPSS, and mean forecast found in Kumar et al. (2001).
If the tercile probability estimate has variance that differs from that of the counting estimate by some factor α, as does, for example, the Gaussian fit estimate, then
$$\mathrm{RPSS} = 1 - \left(1 + \frac{\alpha}{N}\right)\left(1 - \mathrm{RPSS}_{\mathrm{perfect}}\right) \tag{25}$$
where degradation of the RPSS is reduced for α < 1.
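The size of this degradation is easy to tabulate. The sketch below assumes that sampling error inflates the expected RPS by the factor (1 + α/N), which follows when each category probability has α times the counting variance in (3); the function name and the example values are hypothetical.

```python
def expected_rpss(rpss_perfect, n_members, alpha=1.0):
    """Expected RPSS of an N-member estimate whose variance is alpha times
    that of the counting estimate (alpha = 1 is counting itself)."""
    return 1.0 - (1.0 + alpha / n_members) * (1.0 - rpss_perfect)

for n_members in (5, 10, 24, 79):
    counting = expected_rpss(0.1, n_members)
    gaussian = expected_rpss(0.1, n_members, alpha=0.6)  # roughly the S = 0 Gaussian-fit factor
    print(n_members, round(counting, 4), round(gaussian, 4))
```

With a perfect-model RPSS of 0.1, a 10-member counting estimate retains only about a tenth of the potential skill, which illustrates why low-skill regions are the most sensitive to ensemble size.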

4. Estimates of GCM-simulated seasonal precipitation tercile probability

a. Variance of the counting estimate

The average variance of the counting estimate in (5) was derived assuming Gaussian distributions. To see how well this approximation describes the behavior of GCM-simulated December–February precipitation totals, we compare the average counting estimate variance in (5) to that computed by subsampling from the 79-member ensemble of GCM simulations. We use the fact that the average squared difference of two independent counting estimates is twice the variance. More specifically, we select two independent samples of size N (without replacement) from the ensemble of GCM simulations and compute two counting estimate probabilities denoted pN and p′N; the ensemble size of 79 and the independence requirement limit the maximum value of N to 39. The expected value of the square of the difference between the two counting estimates pN and p′N is twice the variance of the counting estimate since
$$\langle (p_N - p'_N)^2 \rangle = \langle (p_N - p)^2 \rangle + \langle (p - p'_N)^2 \rangle = 2\,\langle (p_N - p)^2 \rangle \tag{26}$$
where we use the fact that the sampling errors (pN − p) and (p − p′N) are uncorrelated. The averages in (26) are with respect to time and realizations (1000) of the two independent samples.
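The subsampling procedure can be sketched for a single forecast as follows. The synthetic Gaussian ensemble, the boundary value, and the function name are assumptions made for illustration; the study itself averages over years and grid points of GCM output.

```python
import random

def subsampled_counting_variance(ensemble, boundary, n_members, n_draws=1000, seed=0):
    """Estimate the counting-estimate variance as half the mean squared difference
    between counting estimates from two disjoint N-member subsamples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        picked = rng.sample(ensemble, 2 * n_members)  # drawn without replacement
        first, second = picked[:n_members], picked[n_members:]
        p_first = sum(x < boundary for x in first) / n_members
        p_second = sum(x < boundary for x in second) / n_members
        total += (p_first - p_second) ** 2
    return 0.5 * total / n_draws

# Synthetic 79-member ensemble with no signal, for illustration only.
rng = random.Random(1)
synthetic_ensemble = [rng.gauss(0.0, 1.0) for _ in range(79)]
for n_members in (5, 10, 20, 39):
    estimate = subsampled_counting_variance(synthetic_ensemble, -0.4307, n_members)
    print(n_members, round(estimate, 5), round(2 / (9 * n_members), 5))
```

Two disjoint samples of size N require 2N ≤ 79, which is why N cannot exceed 39. For a single finite ensemble the estimate tracks the ensemble's own category fraction rather than the true probability, so agreement with 2/(9N) is only expected after averaging over many forecasts.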

We expect especially close agreement between the subsampling calculations and the analytical results of (5) in regions where there is little predictability and the signal-to-noise ratio S² is small, since, for S² = 0, the analytical result is exact. In regions where the signal-to-noise ratio is not zero, though generally fairly small, we expect that the average counting variance still decreases as 1/N. However, there is no guarantee that the Gaussian approximation will provide an adequate description of the actual behavior of the GCM data.

Figure 5 shows that in the land gridpoint average the variance of the counting estimate is very well described by the analytical result in (5), with the difference from the analytical result being on the order of a few percent for the below-normal category probability and less than one percent for the above-normal category probability. The accuracy difference between the below- and above-normal categories may be due to the below-normal category being more affected by non-Gaussian behavior. Figure 6a shows the spatial variation of the convergence factor $-0.0421868 + 0.264409/\sqrt{1 + S^2}$ appearing in (5). This factor is the variance of the counting estimate based on a single-member ensemble; the counting estimate variance for an ensemble of size N is obtained by dividing by N, and the standard deviation is the square root of the result. This convergence factor can also be obtained empirically from subsamples of varying size. The difference between the theoretical factor and the empirical estimate is mostly on the order of a few percent (see Figs. 6b,c).

b. Error of counting, Gaussian fit, and GLM estimators

We now use subsampling of the GCM-simulated precipitation data to compare the three estimation methods (counting, Gaussian fit, and GLM) discussed in the previous section. Since the Gaussian fit and GLM estimators may be biased, it is not sufficient to compute their variance. The error variance of the estimators must be computed. The error is not known because the true probability is not known exactly. Therefore each method is compared to a common baseline as follows. Each method is applied to an ensemble of size N (N = 5, 10, 20, 30, 39) to produce an estimate $q_N$. This estimate is then compared to the counting estimate $p_{40}$ computed from an independent set of 40 ensemble members. This counting estimate $p_{40}$ serves as a common unbiased baseline. The variance of the difference of these two estimates has contributions from the N-member estimate $q_N$ and the 40-member counting estimate. The variance of the difference can be decomposed into error variance contributions from $q_N$ and $p_{40}$:
$$\langle (q_N - p_{40})^2 \rangle = \langle (q_N - p)^2 \rangle + \langle (p_{40} - p)^2 \rangle \approx \langle (q_N - p)^2 \rangle + \frac{1}{40}\left(-0.0421868 + \frac{0.264409}{\sqrt{1 + S^2}}\right), \tag{27}$$
where the theoretical estimate of the variance of $p_{40}$ is used. Therefore the error variance of the estimate $q_N$ is
$$\langle (q_N - p)^2 \rangle \approx \langle (q_N - p_{40})^2 \rangle - \frac{1}{40}\left(-0.0421868 + \frac{0.264409}{\sqrt{1 + S^2}}\right). \tag{28}$$
All results for the estimate error variance are presented in terms of $\langle (q_N - p)^2 \rangle$ rather than $\langle (q_N - p_{40})^2 \rangle$ so as to give a sense of the magnitude of the sampling error rather than the difference with the baseline estimate. Results are averaged over time and realizations (100) of the N-member estimate and the 40-member counting estimate.
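A minimal Python sketch of the baseline procedure follows. It assumes synthetic standard normal ensembles with no signal, so the theoretical variance of the 40-member counting baseline is simply (1/3)(2/3)/40. The function names and the trial count are hypothetical, and the Gaussian fit shown is the constant-variance version with the variance taken as known.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()
X0 = STD_NORMAL.inv_cdf(1 / 3)               # left tercile boundary of N(0, 1)
BASELINE_VARIANCE = (1 / 3) * (2 / 3) / 40   # theoretical variance of p_40 when S^2 = 0


def counting(sample):
    """Counting estimate of the below-normal probability."""
    return sum(x < X0 for x in sample) / len(sample)


def gaussian_fit(sample):
    """Constant-variance Gaussian fit: only the mean is estimated (sigma = 1 assumed known)."""
    return STD_NORMAL.cdf(X0 - fmean(sample))


def error_variance(estimator, n, n_trials, rng):
    """Mean squared difference from the baseline, minus the baseline variance."""
    total = 0.0
    for _ in range(n_trials):
        members = [rng.gauss(0.0, 1.0) for _ in range(n + 40)]
        q_n = estimator(members[:n])   # N-member estimate
        p_40 = counting(members[n:])   # independent 40-member counting baseline
        total += (q_n - p_40) ** 2
    return total / n_trials - BASELINE_VARIANCE


rng = random.Random(1)
err_count = error_variance(counting, 20, 10000, rng)
err_gauss = error_variance(gaussian_fit, 20, 10000, rng)
print(f"counting: {err_count:.5f}  Gaussian fit: {err_gauss:.5f}")
```

Because the true probability is known to be 1/3 in this synthetic setting, the baseline-corrected values can be checked directly against the squared error about 1/3, which is not possible with the GCM data.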

We begin by examining the land gridpoint average of the sampling error of the three methods. Figure 7a shows the gridpoint-averaged rms error of the tercile probability estimates as a function of ensemble size. The variance of the counting estimate is well described by theory (Fig. 7a) and is larger than that of the parametric estimates. The one-parameter GLM and constant variance Gaussian fit have similar rms error for larger ensemble sizes; the GLM estimate is slightly better for very small ensemble sizes. While the magnitude of the error reduction due to using the parametric estimates is modest, the savings in computational cost relative to running the equivalent larger ensemble are significant.

The single parameter estimates, that is, the constant variance Gaussian fit and the GLM based on the ensemble mean, have smaller rms error than the estimates based on ensemble mean and variance (Fig. 7b). The advantage of the single parameter estimates is greatest for smaller ensemble sizes. This result is important because it shows that attempting to account for changes in variance, even in the perfect model setting where ensemble size is the only source of error, does not improve estimates of the tercile probabilities for the range of ensemble sizes considered here (Kharin and Zwiers 2003). The sensitivity of the tercile probabilities to changes in variance is, of course, problem specific.

Figure 8 shows the spatial features of the rms error of the below-normal tercile probability estimates for ensemble size 20. Estimates from a Gaussian with constant variance or from a GLM based on the ensemble mean have, on average, smaller error than the counting estimate; the average performances of the Gaussian fit and the GLM are similar. In a few dry regions, especially in Africa, the error from the parametric estimates is larger than that of the counting estimate. This problem with the parametric estimates in the dry regions is reduced when a Box–Cox transformation is applied to the data (not shown), and overall error levels are slightly reduced as well. The spatial features of rms error when the variance of the Gaussian is estimated and when the mean and standard deviation are used in the GLM are similar to those in Fig. 8, but the overall error levels are slightly higher.

c. RPSS

In the previous section we evaluated the three probability estimation methods in the perfect model setting, applying the estimators to small ensembles and asking how well they reproduce the probabilities from the large ensemble. We now compare the three probability estimation methods in an imperfect model setting by computing their RPSS using observations. We expect the reduction in sampling error to result in improved RPSS, but we cannot know beforehand the extent to which model error confounds or offsets the reduction in sampling error. Figure 9 shows maps of RPSS for ensemble size 20 for the counting, Gaussian fit, and GLM estimates. The results are averaged over 100 random selections of the 20-member ensemble from the full 79-member ensemble. The overall skill of the Gaussian fit and GLM estimate is similar and both are generally larger than that of the counting estimate.
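For readers who want to reproduce the skill measure, the following Python sketch computes a ranked probability skill score for tercile forecasts against the equal-odds climatology. It uses the common cumulative form of the ranked probability score (Epstein 1969); the normalization and the function names `rps` and `rpss` are assumptions here, since the definition used in the paper appears in an earlier section. A constant normalization factor would cancel in the skill score.

```python
def rps(probs, obs_category):
    """Ranked probability score of one forecast.

    probs: (below, near, above) probabilities; obs_category: 0, 1, or 2.
    Sum of squared differences between cumulative forecast and cumulative
    observation (no extra normalization applied).
    """
    cum_forecast = 0.0
    cum_observed = 0.0
    score = 0.0
    for k, p in enumerate(probs):
        cum_forecast += p
        cum_observed += 1.0 if k == obs_category else 0.0
        score += (cum_forecast - cum_observed) ** 2
    return score


def rpss(forecasts, observations):
    """Skill relative to the climatological forecast (1/3, 1/3, 1/3)."""
    climatology = (1 / 3, 1 / 3, 1 / 3)
    rps_forecast = sum(rps(f, o) for f, o in zip(forecasts, observations))
    rps_climatology = sum(rps(climatology, o) for o in observations)
    return 1.0 - rps_forecast / rps_climatology


# Illustrative (made-up) forecasts for two seasons and their observed categories.
example_forecasts = [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5)]
example_observed = [0, 2]
print(f"RPSS = {rpss(example_forecasts, example_observed):.3f}")
```

With this convention the climatological forecast scores zero and a forecast that always assigns probability one to the observed category scores one.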

Figure 10 shows the fraction of points with positive RPSS as a function of ensemble size. Again results are averaged over 100 random draws of each ensemble size except for N = 79 when the entire ensemble is used. The parametrically estimated probabilities lead to more grid points with positive RPSS. The Gaussian fit and GLM have similar skill levels with the GLM estimate having larger RPSS for the smallest ensemble sizes and the Gaussian fit being slightly better for larger ensemble sizes. It is useful to interpret the increases in RPSS statistics in terms of effective ensemble size. For instance, applying the Gaussian fit estimator to a 24-member ensemble gives RPSS statistics that are on average comparable to those of the counting estimator applied to an ensemble size of about 39. Although all methods show improvement as ensemble size increases, it is interesting to ask to what extent the improvement in RPSS due to increasing ensemble size predicted by (24) is impacted by the presence of model error. For a realistic approximation of the RPSS in the limit of infinite ensemble size, we compute the RPSS for N = 1 and solve (24) for $\mathrm{RPSS}_{\mathrm{perfect}}$; we expect that in this case sampling error dominates model error and the relation in (24) holds approximately. Then we use (24) to compute the gridpoint-averaged RPSS for other values of N; the theory curve in Fig. 10 shows these values. In the absence of model error, the count and theory curves of RPSS in Fig. 10 would be the same. However, we see that the effect of model error is such that the curves are close for N = 5 and N = 10 and diverge for larger ensemble sizes with the actual increase in RPSS being lower than that predicted by (24).

The presence of model error means that some calibration of the model output with observations is needed. The GCM ensemble tends to be overconfident and calibration tempers this. To see if reducing sampling error still has a noticeable impact after calibration, we use a simple version of Bayesian weighting (Rajagopalan et al. 2002; Robertson et al. 2004). In the method, the calibrated probability is a weighted average of the GCM probability and the climatology probability (1/3). The weights are chosen to maximize the likelihood of the observations. There is cross-validation in the sense that the weights are computed with a particular ensemble of size N, and the RPSS is computed by applying those weights to a different ensemble of the same size and then comparing the result with observations. The calibrated counting-estimated probabilities still have slightly negative RPSS in some areas (Fig. 11a), but the overall amount of positive RPSS is increased compared to the uncalibrated simulations (cf. Fig. 9a); the ensemble size is 20 and results are averaged over 100 realizations. The calibrated Gaussian and GLM probabilities have modestly higher overall RPSS than the calibrated counting estimates with noticeable improvement in skillful areas like southern Africa (Figs. 11b,c). We note that a simpler calibration method based on a Gaussian fit with the variance determined by the correlation between ensemble mean and observations, as in Tippett et al. (2005), rather than ensemble spread, performs nearly as well as the Gaussian fit with Bayesian calibration.
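The weighting step can be sketched in a few lines of Python. This is only an illustration of the idea described above (a single weight between the model probability and the climatological 1/3, chosen to maximize the likelihood of the observed categories); the grid search, the function names, and the toy data are assumptions, not the implementation of Rajagopalan et al. (2002) or Robertson et al. (2004).

```python
import math


def calibrate(p_model, w):
    """Weighted average of model tercile probabilities and climatology (1/3)."""
    return tuple(w * p + (1.0 - w) / 3.0 for p in p_model)


def log_likelihood(w, forecasts, observed):
    """Log-likelihood of the observed categories under the calibrated forecasts."""
    total = 0.0
    for probs, obs in zip(forecasts, observed):
        p_obs = calibrate(probs, w)[obs]
        if p_obs <= 0.0:  # a zero probability for what occurred rules out this weight
            return float("-inf")
        total += math.log(p_obs)
    return total


def fit_weight(forecasts, observed, n_grid=101):
    """Grid search over w in [0, 1] for the maximum-likelihood weight."""
    candidates = [i / (n_grid - 1) for i in range(n_grid)]
    return max(candidates, key=lambda w: log_likelihood(w, forecasts, observed))


# Toy overconfident model: 80% on the below-normal category every year,
# but that category is observed in only 4 of 6 years.
toy_forecasts = [(0.8, 0.1, 0.1)] * 6
toy_observed = [0, 0, 0, 1, 2, 0]
w_hat = fit_weight(toy_forecasts, toy_observed)
print(f"weight = {w_hat:.2f}, calibrated = {calibrate(toy_forecasts[0], w_hat)}")
```

In this toy case the fitted weight pulls the 80% forecast back toward the observed hit rate of about two-thirds, which is the tempering of overconfidence described in the text. The cross-validation step, in which weights fitted on one ensemble are applied to a different ensemble of the same size, is omitted for brevity.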

It is interesting to look at examples of the probabilities given by the counting and Gaussian fit estimate to see how the spatial distributions of probabilities may differ in appearance. Figure 12 shows uncalibrated tercile probabilities from DJF 1996 (ENSO neutral) and 1998 (strong El Niño). Counting and Gaussian probabilities appear similar, with Gaussian probabilities appearing spatially smoother.

5. Summary and conclusions

Here we have explored how the accuracy of tercile category probability estimates is related to ensemble size and the chosen probability estimation technique. The counting estimate, which uses the fraction of ensemble members that fall in the tercile category, is attractive because it is simple and places no restrictions on the form of the ensemble distribution. The error variance of the counting estimate is a function of the ensemble size and tercile category probability. For Gaussian variables, the tercile category probability is a function of the ensemble mean and variance. Therefore, for Gaussian variables, the counting estimate variance for an individual forecast depends on ensemble size, mean, and variance; the average (over forecasts) counting estimate variance depends on ensemble size and the signal-to-noise ratio. An alternative to the counting estimate is the Gaussian fit estimate, which computes tercile probabilities from a Gaussian distribution with parameters estimated from the forecast ensemble. Like the counting estimate, the variance of the Gaussian fit tercile probabilities is also shown to be a function of the ensemble size and the ensemble mean and variance, and the average variance depends on ensemble size and the signal-to-noise ratio. When the variables are indeed Gaussian, the error variance of the Gaussian fit estimate is smaller than that of the counting estimate by approximately 40% in the limit of small signal. The advantage of the Gaussian fit over the counting estimate is equivalent to fairly substantial increases in ensemble size. However, this advantage depends on the forecast distribution being well described by a Gaussian distribution. Generalized linear models (GLMs) provide a parametric estimate of the tercile probabilities using a nonlinear regression with the ensemble mean and possibly the ensemble variance as predictors. The GLM estimator does not explicitly assume a distribution but, as implemented here, is equivalent to the Gaussian fit estimate in some circumstances.

The accuracy of the tercile probability estimates affects probability forecast skill measures such as the commonly used ranked probability skill score (RPSS). Reducing the variance of the tercile probability estimate is shown to increase the RPSS. We examined this connection in the perfect model setting used extensively in predictability studies in which the “observations” are assumed to be indistinguishable from an arbitrary ensemble member. We find the expected RPSS in terms of the above- and below-normal tercile probabilities and, for Gaussian variables, in terms of the ensemble mean and variance. Finite ensemble size degrades the expected RPSS, conceptually similar to the way that finite ensemble size reduces the expected correlation (Sardeshmukh et al. 2000; Richardson 2001).

Many of the analytical results are obtained assuming that the ensemble variables have a Gaussian distribution. We test the robustness of these findings using simulated seasonal precipitation from an ensemble of GCM integrations forced by observed SST, subsampling from the full ensemble to estimate sampling error. We find that the theoretical results give a good description of the average variance of the counting estimate, particularly in a spatially averaged sense. This means that the theoretical scalings can be used in practice to understand how sampling error depends on ensemble size and level of predictability. Although the GCM-simulated precipitation departs somewhat from being Gaussian, the Gaussian fit estimate had smaller error than the counting estimate. The behavior of the GLM estimate is similar to that of the Gaussian fit estimate. The parametric estimators based on ensemble mean had the best performance; adding ensemble variance as a parameter did not reduce error. This means that with the moderate ensemble sizes typically used, differences between the forecast tercile probabilities and the equal-odds probabilities are due essentially to shifts of the forecast mean away from its climatological value rather than to changes in variance. Since differences between the forecast tercile probabilities and the equal-odds probabilities are a measure of predictability, this result means that predictability in the GCM is due to changes in ensemble mean rather than changes in spread. This result is consistent with Tippett et al. (2004), who found that differences between forecast and climatological GCM seasonal precipitation distributions as measured by relative entropy were basically due to changes in the mean rather than changes in the variance.

The reduced sampling error of the Gaussian fit and GLM is shown to translate into better simulation skill when the tercile probabilities are compared to actual observations. Examining the dependence of the RPSS on ensemble size shows that, although RPSS increases with ensemble size, model error limits the rate of improvement compared to the ideal case. Calibration improves RPSS, regardless of the probability estimator used. However, estimators with larger sampling error retain their disadvantage in RPSS even after calibration. The application of the Gaussian fit estimator to specific years shows that the parametric fit achieves its advantages while also producing probabilities that are spatially smoother than those estimated by counting.

In summary, our main conclusion is that carefully applied parametric estimators provide noticeably more accurate tercile probabilities than do counting estimates. This conclusion is completely rigorous for variables with Gaussian statistics. We find that for variables that deviate modestly from Gaussianity, such as seasonal precipitation totals, the error of the Gaussian fit tercile probabilities is smaller than that of the counting estimates. More substantial deviation from Gaussianity may be treated by transforming the data or using the related GLM approach.

Acknowledgments

We thank Lisa Goddard and Simon Mason for stimulating discussions and Benno Blumenthal for the IRI Data Library. GCM integrations were performed by David DeWitt, Shuhua Li, and Lisa Goddard with computer resources provided in part by the NCAR CSL. Comments from two anonymous reviewers greatly improved the clarity of this paper. IRI is supported by its sponsors and NOAA Office of Global Programs Grant NA07GP0213. The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies.

REFERENCES

  • Atger, F., 1999: The skill of ensemble prediction systems. Mon. Wea. Rev., 127, 1941–1953.

  • Barnston, A. G., S. J. Mason, L. Goddard, D. G. Dewitt, and S. E. Zebiak, 2003: Multimodel ensembling in seasonal climate forecasting at IRI. Bull. Amer. Meteor. Soc., 84, 1783–1796.

  • Buizza, R., and T. N. Palmer, 1998: Impact of ensemble size on ensemble prediction. Mon. Wea. Rev., 126, 2503–2518.

  • DelSole, T., 2004: Predictability and information theory. Part I: Measures of predictability. J. Atmos. Sci., 61, 2425–2440.

  • DelSole, T., and M. K. Tippett, 2007: Predictability, information theory, and stochastic models. Rev. Geophys., in press.

  • Epstein, E. S., 1969: A scoring system for probability forecasts of ranked categories. J. Appl. Meteor., 8, 985–987.

  • Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447.

  • Kharin, V. V., and F. W. Zwiers, 2003: Improved seasonal probability forecasts. J. Climate, 16, 1684–1701.

  • Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. J. Atmos. Sci., 59, 2057–2072.

  • Kleeman, R., and A. M. Moore, 1999: A new method for determining the reliability of dynamical ENSO predictions. Mon. Wea. Rev., 127, 694–705.

  • Kumar, A., A. G. Barnston, and M. P. Hoerling, 2001: Seasonal predictions, probabilistic verifications, and ensemble size. J. Climate, 14, 1671–1676.

  • McCullagh, P., and J. A. Nelder, 1989: Generalized Linear Models. Chapman and Hall, 387 pp.

  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600.

  • New, M., M. Hulme, and P. Jones, 2000: Representing twentieth-century space–time climate variability. Part II: Development of 1901–96 monthly grids of terrestrial surface climate. J. Climate, 13, 2217–2238.

  • Rajagopalan, B., U. Lall, and S. E. Zebiak, 2002: Categorical climate forecasts through regularization and optimal combination of multiple GCM ensembles. Mon. Wea. Rev., 130, 1792–1811.

  • Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quart. J. Roy. Meteor. Soc., 127, 2473–2489.

  • Robertson, A. W., U. Lall, S. E. Zebiak, and L. Goddard, 2004: Improved combination of multiple atmospheric GCM ensembles for seasonal prediction. Mon. Wea. Rev., 132, 2732–2744.

  • Roeckner, E., and Coauthors, 1996: The atmospheric general circulation model ECHAM-4: Model description and simulation of present-day climate. Max Planck Institute for Meteorology Tech. Rep. 218, 90 pp.

  • Sardeshmukh, P. D., G. P. Compo, and C. Penland, 2000: Changes of probability associated with El Niño. J. Climate, 13, 4268–4286.

  • Tippett, M. K., R. Kleeman, and Y. Tang, 2004: Measuring the potential utility of seasonal climate predictions. Geophys. Res. Lett., 31, L22201, doi:10.1029/2004GL021575.

  • Tippett, M. K., L. Goddard, and A. G. Barnston, 2005: Statistical–dynamical seasonal forecasts of central-southwest Asian winter precipitation. J. Climate, 18, 1831–1843.

  • Wilks, D. S., 2002: Smoothing forecast ensembles with fitted probability distributions. Quart. J. Roy. Meteor. Soc., 128, 2821–2836.

APPENDIX

Error in Estimating Tercile Probabilities

Variance of the counting estimate

As shown in (3), the variance of the counting estimate $p_N$ is $(p - p^2)/N$. When the forecast precipitation anomaly f has a Gaussian distribution with mean $\mu_f$ and variance $\sigma_f^2$, the probability p of the below-normal category is
$$p = \Phi\left(\frac{x_b - \mu_f}{\sigma_f}\right) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x_b - \mu_f}{\sqrt{2}\,\sigma_f}\right)\right], \tag{A1}$$
where Φ is the normal cumulative distribution function, erf denotes the error function, and $x_b$ is the left tercile boundary of the climatological distribution. In this case, the counting estimate variance depends on the forecast mean and variance through
$$\frac{p - p^2}{N} = \frac{1}{4N}\left[1 - \mathrm{erf}^2\left(\frac{x_b - \mu_f}{\sqrt{2}\,\sigma_f}\right)\right]. \tag{A2}$$
Similar relations hold for the above-normal category.
Suppose that the precipitation anomaly x is jointly normally distributed with mean zero and variance $\sigma_x^2$. In this paper, the forecast f is the precipitation anomaly x conditioned on the SST. In this case, the left tercile boundary $x_b$ of the climatological pdf is $\sigma_x x_0$, where $x_0 = \Phi^{-1}(1/3) \approx -0.4307$ is the left tercile boundary of a mean zero normal distribution with unit variance. Averaging $x^2$ over all forecasts gives
$$\sigma_x^2 = \langle \mu_f^2 \rangle + \sigma_f^2, \tag{A3}$$
which decomposes the climatological variance $\sigma_x^2$ into signal and noise contributions. We denote the signal variance $\langle \mu_f^2 \rangle$ by $\sigma_s^2$ and define the signal-to-noise ratio by
$$S^2 \equiv \frac{\sigma_s^2}{\sigma_f^2}. \tag{A4}$$
Equation (A2) gives the counting estimate variance for a particular forecast. The average counting estimate variance is found by taking the average of (A2) with respect to forecasts. Since the forecast mean is Gaussian with mean zero and variance $\sigma_s^2 = \sigma_f^2 S^2$,
$$\langle p - p^2 \rangle = \frac{1}{\sqrt{2\pi}\,\sigma_s} \int_{-\infty}^{\infty} \left(p - p^2\right) \exp\left(-\frac{\mu_f^2}{2\sigma_s^2}\right) d\mu_f. \tag{A5}$$
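To show that the average counting estimate variance is a function of only the signal-to-noise ratio $S^2$, we introduce the variable $\mu = \mu_f/\sigma_f$ and use the fact that $x_b/\sigma_f = x_0\sqrt{1 + S^2}$, to obtain
$$\langle p - p^2 \rangle = \frac{1}{\sqrt{2\pi}\,S} \int_{-\infty}^{\infty} \left[\Phi\left(x_0\sqrt{1 + S^2} - \mu\right) - \Phi^2\left(x_0\sqrt{1 + S^2} - \mu\right)\right] \exp\left(-\frac{\mu^2}{2S^2}\right) d\mu. \tag{A6}$$
Equation (A6) gives $\langle p - p^2 \rangle$ as a function of the signal-to-noise ratio $S^2$. Numerical evaluation of the integral in (A6) suggests that we express this dependence using a new parameter $g \equiv (1 + S^2)^{-1/2}$: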
i1520-0442-20-10-2210-ea7
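To approximate the dependence of the average variance on the signal-to-noise ratio, we perform a series expansion about g = 1 corresponding to the signal-to-noise ratio $S^2$ being zero. The first term is found from
$$\langle p - p^2 \rangle \Big|_{g=1} = \frac{1}{3} - \frac{1}{9} = \frac{2}{9}, \tag{A8}$$
and then numerical computation gives that
$$\frac{d}{dg} \langle p - p^2 \rangle \Big|_{g=1} \approx 0.264409. \tag{A9}$$
An approximation of $\langle p - p^2 \rangle$ is
$$\langle p - p^2 \rangle \approx \frac{2}{9} + 0.264409\,(g - 1), \tag{A10}$$
or in terms of the signal-to-noise ratio
$$\langle p - p^2 \rangle \approx -0.0421868 + \frac{0.264409}{\sqrt{1 + S^2}}. \tag{A11}$$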
This approximation is valid for small values of $S^2$. A second-order [in powers of (g − 1)] approximation is more accurate for larger values of S and is given by
i1520-0442-20-10-2210-ea12
Since $S^2$ is fairly small for seasonal forecasts, we will use the approximation in (A11).
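The quality of the linear approximation can be checked numerically. The Python sketch below averages $p - p^2$ over Gaussian forecast means by simple trapezoidal quadrature and compares the result with the linear-in-g expression whose coefficients, −0.0421868 and 0.264409, are quoted in the text. The function names, the integration range of eight standard deviations, and the grid size are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
X0 = STD_NORMAL.inv_cdf(1 / 3)  # left tercile boundary of N(0, 1), about -0.4307


def average_counting_factor(s2, n_points=4001):
    """Average of p - p^2 over forecast means mu ~ N(0, S^2), in units of sigma_f."""
    boundary = X0 * math.sqrt(1.0 + s2)  # x_b / sigma_f
    if s2 == 0.0:
        p = STD_NORMAL.cdf(boundary)
        return p - p * p
    s = math.sqrt(s2)
    lower, upper = -8.0 * s, 8.0 * s
    step = (upper - lower) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        mu = lower + i * step
        p = STD_NORMAL.cdf(boundary - mu)
        density = math.exp(-mu * mu / (2.0 * s2)) / (s * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end points
        total += weight * (p - p * p) * density
    return total * step


def linear_approximation(s2):
    """Linear-in-g approximation with the coefficients quoted in the text."""
    return -0.0421868 + 0.264409 / math.sqrt(1.0 + s2)


for s2 in (0.0, 0.1, 0.2, 0.5, 1.0):
    print(f"S^2 = {s2:.1f}: quadrature = {average_counting_factor(s2):.5f}, "
          f"approximation = {linear_approximation(s2):.5f}")
```

The two agree closely for the small signal-to-noise ratios typical of seasonal forecasts and drift apart as $S^2$ grows, which is where the second-order approximation becomes useful.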

Error of the Gaussian fit estimate

Suppose the distributions are indeed Gaussian. We fit the N-member forecast ensemble with a Gaussian distribution, using its sample mean $m_f$ and sample variance $s_f^2$ defined by
$$m_f = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad s_f^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - m_f)^2, \tag{A13}$$
where $x_i$ denotes the value of the ith member of the ensemble. Based on this information and using (A1), the Gaussian fit estimate $g_N$ of the probability of the below-normal category is
$$g_N = \Phi\left(\frac{x_b - m_f}{s_f}\right). \tag{A14}$$
The squared error of the Gaussian fit probability estimate is
$$(g_N - p)^2 = \left[\Phi\left(\frac{x_b - m_f}{s_f}\right) - \Phi\left(\frac{x_b - \mu_f}{\sigma_f}\right)\right]^2. \tag{A15}$$
The error of the Gaussian fit probability estimate is due to the difference between the population values and the sample estimates of the forecast mean and variance.
If there is no predictability and the signal-to-noise ratio is zero, then the forecast mean $\mu_f$ is zero and the true tercile probability is 1/3 for all forecasts. Also, the forecast variance $\sigma_f^2$ is equal to the climatological variance $\sigma_x^2$ and does not have to be estimated from the ensemble. In this case, the squared error of the Gaussian fit estimate is
$$\left(g_N - \frac{1}{3}\right)^2 = \left[\Phi\left(\frac{x_b - m_f}{\sigma_x}\right) - \frac{1}{3}\right]^2 = \frac{e^{-x_0^2}}{2\pi\sigma_x^2}\, m_f^2 + O(m_f^3), \tag{A16}$$
where we have made a Maclaurin expansion in $m_f$ and used the fact that $x_b = \sigma_x x_0$. The term $O(m_f^3)$ is small and can be neglected for sufficiently large ensemble size N; neglecting the higher-order terms leads to an underestimate in the final result of about 3.6% for N = 10. Since $\langle m_f^2 \rangle = \sigma_x^2/N$, the average (over forecasts) variance of the Gaussian fit tercile probability is
$$\left\langle \left(g_N - \frac{1}{3}\right)^2 \right\rangle \approx \frac{e^{-x_0^2}}{2\pi N}. \tag{A17}$$
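As a worked check of the no-signal case, the short Python sketch below compares the leading-order Gaussian fit variance, which follows from the Maclaurin expansion and $\langle m_f^2 \rangle = \sigma_x^2/N$, with the counting variance $(p - p^2)/N$ at p = 1/3. The variable names are illustrative; the 24-member example is the one quoted in the RPSS discussion.

```python
import math
from statistics import NormalDist

x0 = NormalDist().inv_cdf(1 / 3)  # left tercile boundary of N(0, 1), about -0.4307

# Both variances scale as 1/N, so compare the factors multiplying 1/N.
counting_factor = (1 / 3) * (1 - 1 / 3)               # p - p^2 at p = 1/3
gaussian_factor = math.exp(-x0 ** 2) / (2 * math.pi)  # leading-order Gaussian fit term

ratio = gaussian_factor / counting_factor

# Counting ensemble size with the same variance as a 24-member Gaussian fit.
n_gauss = 24
n_equivalent = n_gauss / ratio

print(f"variance ratio (Gaussian fit / counting) = {ratio:.3f}")
print(f"a {n_gauss}-member Gaussian fit matches counting with about "
      f"{n_equivalent:.0f} members")
```

The ratio comes out close to 0.6, consistent with the roughly 40% reduction quoted in the summary, and the implied equivalent ensemble size of about 40 is in line with the 24-versus-39 comparison reported for the RPSS statistics.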
On the other hand, suppose that the forecast mean is not identically zero, but the forecast variance $\sigma_f^2$ is constant and known. This means that there is predictability due to changes in forecast mean but not due to changes in forecast variance. The squared error of the Gaussian fit probability estimate is
$$(g_N - p)^2 = \left[\Phi\left(\frac{x_b - m_f}{\sigma_f}\right) - \Phi\left(\frac{x_b - \mu_f}{\sigma_f}\right)\right]^2. \tag{A18}$$
The error of the Gaussian fit probability is due entirely to the error in estimating the mean. Expanding this expression in a Taylor series in powers of $(m_f - \mu_f)$ about $m_f = \mu_f$ gives that the squared error is
$$(g_N - p)^2 = \frac{1}{2\pi\sigma_f^2} \exp\left[-\frac{(x_b - \mu_f)^2}{\sigma_f^2}\right] (m_f - \mu_f)^2 + O\left((m_f - \mu_f)^3\right). \tag{A19}$$
We now take the expectation of the leading order term in (A19) with respect to realizations of the ensemble. Since the variance of the sample mean is $\langle (\mu_f - m_f)^2 \rangle = \sigma_f^2/N$, the average (over realizations of the ensemble) of the squared error of the Gaussian fit is
$$\langle (g_N - p)^2 \rangle \approx \frac{1}{2\pi N} \exp\left[-\frac{(x_b - \mu_f)^2}{\sigma_f^2}\right]. \tag{A20}$$
Equation (A20) gives the squared error of the Gaussian fit tercile probability for a particular forecast. Averaging (A20) over mean forecasts $\mu_f$ with mean zero and variance $\sigma_s^2$ gives that the average variance of the Gaussian fit tercile probability is
$$\langle (g_N - p)^2 \rangle \approx \frac{1}{2\pi N \sqrt{1 + 2S^2}} \exp\left[-\frac{1 + S^2}{1 + 2S^2}\, x_0^2\right], \tag{A21}$$
where we use the relation $x_b^2/\sigma_f^2 = x_0^2(1 + S^2)$.
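The averaging step over forecast means can be verified numerically. The Python sketch below is a reconstruction under stated assumptions: it takes the leading-order squared error for a known, constant forecast variance to be $\exp[-(x_b - \mu_f)^2/\sigma_f^2]/(2\pi N)$, averages it over $\mu_f \sim N(0, \sigma_s^2)$ by trapezoidal quadrature, and compares the result with the closed form obtained by carrying out the Gaussian integral analytically. The function names and grid settings are hypothetical, and the closed form is derived here from the steps described in the text rather than copied from the typeset equation.

```python
import math
from statistics import NormalDist

X0 = NormalDist().inv_cdf(1 / 3)  # left tercile boundary of N(0, 1)


def closed_form(s2, n):
    """Average Gaussian fit variance after integrating over forecast means."""
    return (math.exp(-X0 ** 2 * (1.0 + s2) / (1.0 + 2.0 * s2))
            / (2.0 * math.pi * n * math.sqrt(1.0 + 2.0 * s2)))


def quadrature_average(s2, n, n_points=4001):
    """Trapezoidal average of the per-forecast leading term (units of sigma_f)."""
    boundary = X0 * math.sqrt(1.0 + s2)  # x_b / sigma_f
    s = math.sqrt(s2)
    lower, upper = -8.0 * s, 8.0 * s
    step = (upper - lower) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        mu = lower + i * step
        per_forecast = math.exp(-(boundary - mu) ** 2) / (2.0 * math.pi * n)
        density = math.exp(-mu * mu / (2.0 * s2)) / (s * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end points
        total += weight * per_forecast * density
    return total * step


n = 20
for s2 in (0.1, 0.3, 1.0):
    print(f"S^2 = {s2:.1f}: closed form = {closed_form(s2, n):.6f}, "
          f"quadrature = {quadrature_average(s2, n):.6f}")
```

Setting $S^2 = 0$ in the closed form recovers the no-predictability result $e^{-x_0^2}/(2\pi N)$, which is a useful consistency check between the two cases treated in this appendix.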

Fig. 1.

Spatial distribution of λ appearing in the Box–Cox transformation of Eq. (1).

Citation: Journal of Climate 20, 10; 10.1175/JCLI4108.1

Fig. 2.

The 16th and 84th percentiles (see text for details) of the counting estimate $p_N$ (solid lines) and p plus and minus the standard deviation of the estimate $p_N$ (dashed lines) for p = 1/3 (dotted line).

Fig. 3.

The (a) gamma distribution with shape and scale parameters (2, 1), respectively, and (b) rms error as a function of ensemble size N for the counting, Gaussian fit, and GLM tercile probability estimates.

Fig. 4.

Perfect model measures of potential probability forecast skill (a) $\mathrm{RPS}_{\mathrm{perfect}}$ and (b) $\mathrm{RPSS}_{\mathrm{perfect}}$ for DJF precipitation.

Fig. 5.

Percent difference between the gridpoint average of the theoretical and empirically estimated variance of the tercile probability estimate for the below-normal and above-normal categories.

Fig. 6.

(a) Spatial variation of the convergence factor $-0.0421868 + 0.264409/\sqrt{1 + S^2}$. Difference of the theoretical convergence factor with the subsampled estimates from the (b) below-normal and (c) above-normal categories.

Fig. 7.

The rms error of the below-normal probability as a function of ensemble size N for the (a) one-parameter and (b) two-parameter estimates. The gray curves in (a) are the theoretical error levels for the counting and Gaussian fit methods. Fit-2 (GLM-2) denotes the two-parameter Gaussian (GLM) method.

Fig. 8.

(a) The rms error of the counting estimate of the below-normal tercile probability with ensemble size 20. The rms error of the counting estimate minus that of the (b) Gaussian fit and (c) the GLM based on the ensemble mean. The gridpoint averages are shown in the titles.

Fig. 9.

RPSS of (a) the counting-based probabilities and its difference with that of the (b) Gaussian and (c) GLM-estimated probabilities. Positive values in (b) and (c) correspond to increased RPSS compared to counting. The gridpoint averages are shown in the titles.

Fig. 10.

The fraction of land points with RPSS > 0.

Fig. 11.

As in Fig. 9 but for the Bayesian calibrated probabilities.

Fig. 12.

Probability of above-normal precipitation for DJF 1996 estimated by (a) counting and (b) Gaussian fit and DJF 1998 using (c) counting and (d) Gaussian fit.

Citation: Journal of Climate 20, 10; 10.1175/JCLI4108.1

Save
  • Atger, F., 1999: The skill of ensemble prediction systems. Mon. Wea. Rev., 127 , 19411953.

  • Barnston, A. G., S. J. Mason, L. Goddard, D. G. Dewitt, and S. E. Zebiak, 2003: Multimodel ensembling in seasonal climate forecasting at IRI. Bull. Amer. Meteor. Soc., 84 , 17831796.

    • Search Google Scholar
    • Export Citation
  • Buizza, R., and T. N. Palmer, 1998: Impact of ensemble size on ensemble prediction. Mon. Wea. Rev., 126 , 25032518.

  • DelSole, T., 2004: Predictability and information theory. Part I: Measures of predictability. J. Atmos. Sci., 61 , 24252440.

  • DelSole, T., and M. K. Tippett, 2007: Predictability, information theory, and stochastic models. Rev. Geophys., in press.

  • Epstein, E. S., 1969: A scoring system for probability forecasts of ranked categories. J. Appl. Meteor., 8 , 985987.

  • Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132 , 14341447.

    • Search Google Scholar
    • Export Citation
  • Kharin, V. V., and F. W. Zwiers, 2003: Improved seasonal probability forecasts. J. Climate, 16 , 16841701.

  • Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. J. Atmos. Sci., 59 , 20572072.

  • Kleeman, R., and A. M. Moore, 1999: A new method for determining the reliability of dynamical ENSO predictions. Mon. Wea. Rev., 127 , 694705.

    • Search Google Scholar
    • Export Citation
  • Kumar, A., A. G. Barnston, and M. P. Hoerling, 2001: Seasonal predictions, probabilistic verifications, and ensemble size. J. Climate, 14 , 16711676.

    • Search Google Scholar
    • Export Citation
  • McCullagh, P., and J. A. Nelder, 1989: Generalized Linear Models. Chapman and Hall, 387 pp.

  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12 , 595600.

  • New, M., M. Hulme, and P. Jones, 2000: Representing twentieth-century space–time climate variability. Part II: Development of 1901–96 monthly grids of terrestrial surface climate. J. Climate, 13 , 22172238.

    • Search Google Scholar
    • Export Citation
  • Rajagopalan, B., U. Lall, and S. E. Zebiak, 2002: Categorical climate forecasts through regularization and optimal combination of multiple GCM ensembles. Mon. Wea. Rev., 130 , 17921811.

    • Search Google Scholar
    • Export Citation
  • Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quart. J. Roy. Meteor. Soc., 127 , 24732489.

    • Search Google Scholar
    • Export Citation
  • Robertson, A. W., U. Lall, S. E. Zebiak, and L. Goddard, 2004: Improved combination of multiple atmospheric GCM ensembles for seasonal prediction. Mon. Wea. Rev., 132 , 27322744.

    • Search Google Scholar
    • Export Citation
  • Roeckner, E., and Coauthors, 1996: The atmospheric general circulation model ECHAM-4: Model description and simulation of present-day climate. Max Planck Institute for Meteorology Tech. Rep. 218, 90 pp.

  • Sardeshmukh, P. D., G. P. Compo, and C. Penland, 2000: Changes of probability associated with El Niño. J. Climate, 13 , 42684286.

  • Tippett, M. K., R. Kleeman, and Y. Tang, 2004: Measuring the potential utility of seasonal climate predictions. Geophys. Res. Lett., 31 .L22201, doi:10.1029/2004GL021575.

    • Search Google Scholar
    • Export Citation
  • Tippett, M. K., L. Goddard, and A. G. Barnston, 2005: Statistical–dynamical seasonal forecasts of central-southwest Asian winter precipitation. J. Climate, 18 , 18311843.

    • Search Google Scholar
    • Export Citation
  • Wilks, D. S., 2002: Smoothing forecast ensembles with fitted probability distributions. Quart. J. Roy. Meteor. Soc., 128 , 28212836.

  • Fig. 1.

    Spatial distribution of λ appearing in the Box–Cox transformation of Eq. (1).

  • Fig. 2.

    The 16th and 84th percentiles (see text for details) of the counting estimate p_N (solid lines) and p plus and minus the standard deviation of the estimate p_N (dashed lines) for p = 1/3 (dotted line).

  • Fig. 3.

    The (a) gamma distribution with shape and scale parameters (2, 1), respectively, and (b) rms error as a function of ensemble size N for the counting, Gaussian fit, and GLM tercile probability estimates.

  • Fig. 4.

    Perfect model measures of potential probability forecast skill (a) RPS_perfect and (b) RPSS_perfect for DJF precipitation.

  • Fig. 5.

    Percent difference between the gridpoint average of the theoretical and empirically estimated variance of the tercile probability estimate for the below-normal and above-normal categories.

  • Fig. 6.

    (a) Spatial variation of the convergence factor −0.0421868 + 0.264409/(1 + S²). Difference of the theoretical convergence factor with the subsampled estimates from the (b) below-normal and (c) above-normal categories.

  • Fig. 7.

    The rms error of the below-normal probability as a function of ensemble size N for the (a) one-parameter and (b) two-parameter estimates. The gray curves in (a) are the theoretical error levels for the counting and Gaussian fit methods. Fit-2 (GLM-2) denotes the two-parameter Gaussian (GLM) method.

  • Fig. 8.

    (a) The rms error of the counting estimate of the below-normal tercile probability with ensemble size 20. The rms error of the counting estimate minus that of (b) the Gaussian fit and (c) the GLM based on the ensemble mean. The gridpoint averages are shown in the titles.

  • Fig. 9.

    RPSS of (a) the counting-based probabilities and its difference with that of the (b) Gaussian and (c) GLM-estimated probabilities. Positive values in (b) and (c) correspond to increased RPSS compared to counting. The gridpoint averages are shown in the titles.

  • Fig. 10.

    The fraction of land points with RPSS > 0.

  • Fig. 11.

    As in Fig. 9 but for the Bayesian calibrated probabilities.

  • Fig. 12.

    Probability of above-normal precipitation for DJF 1996 estimated by (a) counting and (b) Gaussian fit and DJF 1998 using (c) counting and (d) Gaussian fit.
