Comments on “Finite Samples and Uncertainty Estimates for Skill Measures for Seasonal Prediction”

Michael K. Tippett International Research Institute for Climate and Society, Palisades, New York

Anthony G. Barnston International Research Institute for Climate and Society, Palisades, New York

Timothy DelSole George Mason University, Fairfax, Virginia, and Center for Ocean–Land–Atmosphere Studies, Calverton, Maryland


Corresponding author address: M. K. Tippett, International Research Institute for Climate and Society, The Earth Institute of Columbia University, Lamont Campus, 61 Route 9W, Palisades, NY 10964. Email: tippett@iri.columbia.edu


1. Introduction

Different skill measures depend differently on the intrinsic level of predictability and on sample size. Kumar (2009) computed the expected skill of idealized forecasts with varying levels of predictability using the anomaly correlation (AC), the Heidke skill score (HSS), and the ranked probability skill score (RPSS) as skill measures. An important consideration for seasonal climate forecasts, as demonstrated by Kumar (2009), is that these skill measures may vary substantially from their expected values when they are computed from relatively short forecast histories. In carrying out his skill measure characterizations, Kumar (2009) assumed that the forecast variance was equal to the climatological variance. Additionally, when computing average skill scores, the forecast distributions were assumed to have identical means (signals) for a given predictability level. Here we examine some implications of those assumptions.

This comment is organized as follows. In section 2 we present a perfect model framework for examining predictability. A natural requirement for predictability is that the forecast and climatological distributions differ. In the case of joint normal distributions, the existence of predictability (positive signal variance) implies that the forecast variance is less than the climatological variance. In this case, inflating the forecast variance to equal the climatological variance results in underconfident forecasts and lower probabilistic skill scores. In section 3 we compute the three skill scores for a given forecast signal (single initial condition) and for a set of forecasts with a specified signal-to-noise variance ratio (multiple initial conditions). The expected skill of a single forecast depends on both the signal level and the forecast variance. The AC depends only on the ratio of the squared signal to the forecast variance, while the dependence of the HSS and RPSS on signal level and forecast variance is more complex. A consequence of this functional dependence is that the AC of a set of forecasts is equal to that of a single forecast whose signal equals the signal standard deviation. However, there is no similar relation for the HSS and RPSS, and assuming a fixed signal as in Kumar (2009) generally overestimates the HSS and always overestimates the RPSS. We also present a useful approximation relating the RPSS and AC of a set of forecasts.

2. Predictability framework

A basic question of predictability studies is whether the future (verification) state υ of some quantity is predictable given the current (initial) state i, and if so, with what level of skill. Potential predictability and climate change studies examine the extent to which the climate system is determined by the specification of a boundary condition (e.g., sea surface temperature or land surface properties) or some other forcing (e.g., solar irradiance, aerosols or greenhouse gases). The predictability framework presented here can be applied to questions of potential predictability by interpreting i as a forcing or boundary condition and υ as the associated climate response.

The relation between the initial condition i and the verification υ is completely described by the joint probability distribution p(υ, i); we use the convention that the argument of p determines the probability distribution function in question. The most complete description of υ given i is the conditional distribution p(υ|i). We consider the conditional distribution p(υ|i) to be the “perfect” forecast distribution and base our discussion on it. If υ and i are independent, then υ is not predictable from i, and the conditional distribution p(υ|i) is equal to the unconditional or climatological distribution p(υ). This characterization is consistent with that of Lorenz, who described the absence of predictability as the situation in which a forecast is no better than a random draw from the climatological distribution (Lorenz 1969; DelSole and Tippett 2007). Therefore, predictability exists only when the conditional distribution p(υ|i) differs from the climatological distribution p(υ). In practice, the climatological distribution is often estimated from recent observations, for instance, a recent 30-yr period in the case of seasonal climate prediction.

The best forecast of υ given the initial state i (in the sense of minimizing squared error) is the conditional mean μυ|i, given by
\mu_{\upsilon|i} = E[\upsilon \,|\, i], \qquad (1)
where we use the notation E[·] to denote expectation. The conditional mean μυ|i is the mean of the forecast distribution p(υ|i). The conditional mean μυ|i, generally a nonlinear function of i, is referred to as the signal, assuming, without loss of generality, that the climatological mean E[υ] is zero. The forecast (conditional) variance is defined by
\sigma_{\upsilon|i}^{2} = E\left[\left(\upsilon - \mu_{\upsilon|i}\right)^{2} \,\middle|\, i\right] \qquad (2)
and is a measure of the uncertainty of the forecast. The forecast variance depends in general on the initial condition i. The climatological (unconditional) variance συ2 can be decomposed as the sum of signal and mean noise variance:
\sigma_{\upsilon}^{2} = E\left[\mu_{\upsilon|i}^{2}\right] + E\left[\sigma_{\upsilon|i}^{2}\right], \qquad (3)
where we have used the property of the conditional mean that (υ − μυ|i) is uncorrelated with any function of i. This decomposition of the climatological variance is valid for arbitrary probability distribution functions. A sufficient condition for predictability is that the signal variance be nonzero, since in that case the forecast and climatology variances are different for at least some forecasts. This condition is not necessary since differences between the climatology and forecast probability distribution functions can also result from changes in the forecast variance, or from differences in higher-order moments.

In the context of predictability studies based on ensemble integrations, μυ|i is the ensemble mean, and σ²υ|i is the variance of the ensemble members about the ensemble mean. The earliest diagnoses of seasonal climate predictability in dynamical models consisted of finding statistically significant differences between forecast and climatological variances (Shukla 1981; Rowell 1998).

More specific conclusions can be made when i and υ have a joint normal distribution, an assumption that we shall make from this point on. A standard result in this case is that the conditional mean is given by
\mu_{\upsilon|i} = \rho\,\frac{\sigma_{\upsilon}}{\sigma_{i}}\, i \qquad (4)
and that the conditional variance is independent of i and given by
\sigma_{\upsilon|i}^{2} = \left(1 - \rho^{2}\right)\sigma_{\upsilon}^{2}, \qquad (5)
where ρ is the correlation between initial and verification states and σ²i is the variance of the initial state. Since the conditional mean is a linear function of the initial condition, ρ is the correlation between the conditional mean and verification, as well. The conditional mean coincides with the estimate arising from the linear regression between i and υ, and the signal variance is the explained variance of the linear regression.
Since normal distributions are determined by their first two moments, from (5) a necessary and sufficient condition for predictability is that the forecast and climatological variances differ, or more precisely that the forecast variance be less than the climatological variance; this occurs when the initial and verification states have nonzero correlation. Using (3) and (5), the correlation can be expressed in terms of the signal and noise variances as
\rho = \sqrt{\frac{S}{1+S}}, \qquad (6)
where S = E[μ²υ|i]/σ²υ|i is the signal-to-noise ratio. The correlation ρ and the signal-to-noise ratio S are equivalent measures of the level of predictability of the system.
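The framework above lends itself to a quick numerical check. The following Python sketch (ours, not part of the original comment; it assumes numpy is available) simulates a joint-normal initial and verification state and confirms the variance decomposition in (3) and the relation (6):

```python
# Illustrative sketch, not from the original paper: simulate a joint-normal (i, v)
# and check the variance decomposition of Eq. (3) and the relation of Eq. (6).
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.7, 1_000_000

i = rng.standard_normal(n)                                    # initial state, unit variance
v = rho * i + np.sqrt(1 - rho**2) * rng.standard_normal(n)    # verification, unit variance

mu = rho * i                                                  # conditional mean (signal), Eq. (4)
signal_var = mu.var()                                         # ~ rho^2
noise_var = (v - mu).var()                                    # ~ 1 - rho^2, Eq. (5)
S = signal_var / noise_var                                    # signal-to-noise ratio

print(signal_var + noise_var)                                 # ~ 1 = climatological variance, Eq. (3)
print(np.sqrt(S / (1 + S)), np.corrcoef(i, v)[0, 1])          # both ~ rho, Eq. (6)
```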

3. Predictability and skill scores

The above framework allows us to examine the dependence of various skill scores on the level of predictability, and in the case when i and υ have a joint normal distribution, to obtain fairly explicit formulas. Following Kumar (2009) we examine three skill scores: AC, HSS, and RPSS. First, similar to Kumar et al. (2001) we examine the expected skill of a forecast with a specified signal level. Here, however, we do not assume that the noise variance is equal to the climatological variance συ2, because this assumption is inconsistent with there being predictability, as shown in (5). Second, we show the expected skill of a set of forecasts with specified signal-to-noise ratio. By prescribing the signal-to-noise ratio, we fix the value of the correlation [see Eq. (6)], but allow varying signal sizes, consistent with the signal-to-noise ratio. This differs from Kumar (2009) who examined the expected skill of a set of forecasts with the assumptions that (i) the noise variance is equal to the climatological variance and (ii) the signal is equal to its standard deviation for all the forecasts. We obtain results in fairly explicit form for these first two calculations. Finally, we examine the variability of the three skill measures when computed from finite samples.

Without loss of generality, we take unit climatological variance συ = 1 so that the forecast variance is given by σ²υ|i = 1 − ρ² and the signal variance is ρ².

a. Expected skill for a given signal

We begin by computing the expected skill of a forecast with a specified signal level. For a forecast distribution with mean μυ|i and variance σ²υ|i, the expected value of the square of the anomaly correlation for a given initial condition i is
\mathrm{AC}^{2} = \frac{\mu_{\upsilon|i}^{2}}{\mu_{\upsilon|i}^{2} + \sigma_{\upsilon|i}^{2}}, \qquad (7)
where we have used a conditional form of the signal-to-noise decomposition in (3); this result does not require the assumption of a joint normal distribution (Sardeshmukh et al. 2000). Figure 1a shows AC as a function of |μυ|i| for four different values of the correlation ρ. Since the conditional mean is 0 for all forecasts when ρ is identically 0 [see (6)], we interpret the case ρ = 0 as the limit of ρ approaching 0, noting that as ρ approaches 0, so does the likelihood of a signal of order 1. Since AC is a function of the ratio μυ|i/συ|i alone, all the curves collapse onto a single curve if AC is plotted as a function of μυ|i/συ|i.
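As a minimal illustration of this collapse, the following sketch (ours, based on the form of (7) reconstructed above; the helper name is hypothetical) evaluates the single-forecast AC for several correlations at the same value of μυ|i/συ|i:

```python
# Illustrative sketch: the single-forecast AC of Eq. (7), as reconstructed above,
# depends only on mu/sigma, so curves for different rho collapse onto one another.
import numpy as np

def ac_single(mu, rho):
    sigma2 = 1.0 - rho**2                      # forecast variance, Eq. (5), unit climatology
    return np.sqrt(mu**2 / (mu**2 + sigma2))   # square root of Eq. (7)

for rho in (0.5, 0.7, 0.9):
    sigma = np.sqrt(1.0 - rho**2)
    print(rho, ac_single(sigma, rho))          # same value (~0.707) whenever mu/sigma = 1
```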
We base the HSS on forecasts of tercile category probabilities. The expected hit rate (HR) is the frequency with which the observation falls into the tercile category with the largest forecast probability. Therefore, for a reliable forecast, HR is given by
\mathrm{HR} = \max\left(b,\; 1 - a - b,\; a\right), \qquad (8)
where the probabilities b and a of the below- and above-normal terciles are, respectively,
b = \Phi\!\left(\frac{c - \mu_{\upsilon|i}}{\sigma_{\upsilon|i}}\right), \qquad a = 1 - \Phi\!\left(\frac{-c - \mu_{\upsilon|i}}{\sigma_{\upsilon|i}}\right). \qquad (9)
Here Φ is the cumulative distribution function of the mean-zero normal distribution with unit variance, and c = Φ⁻¹(⅓) ≈ −0.431. This expression for the HR differs from that used in Kumar (2009), which does not allow the possibility that the near-normal category has the largest forecast probability. The expected HSS for a single forecast is
\mathrm{HSS} = \frac{\mathrm{HR} - 1/3}{1 - 1/3}. \qquad (10)
The HSS is a function of both the forecast mean μυ|i and variance σ²υ|i and cannot be written as a function of the ratio μυ|i/συ|i alone. Equivalently, using (5), HSS is a function of both μυ|i and the correlation ρ. Figure 1b shows HSS as a function of the shift size |μυ|i| for various values of the correlation ρ. The HSS is positive for a conditional mean of zero when the forecast has positive skill, but reaches useful positive levels only for highly skillful forecasts (Van den Dool and Toth 1991). The reason for this behavior is that the variance of a skillful forecast is less than the climatological variance, and therefore a conditional mean of zero indicates an enhanced likelihood of the near-normal category. As the value of the conditional mean of a skillful forecast increases from zero, the HSS decreases because the likelihood of the observation falling into the near-normal category decreases. When the value of the conditional mean of a highly skillful forecast passes through the value of the tercile boundary, the HSS takes on a relative minimum and then begins to increase again. For instance, when ρ = 0.9, Fig. 1b shows HSS having its minimum value near |μυ|i| = |c| ≈ 0.431. This transition occurs for smaller values of |μυ|i| for forecasts with larger variance (and lower ρ value).
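A small Python sketch (ours; it assumes scipy and the HSS scaling written in (10), and the function name is hypothetical) makes this behavior concrete by evaluating the hit rate and HSS of a single forecast:

```python
# Illustrative sketch (our own, with the HSS scaling assumed in Eq. (10)):
# hit rate and HSS of a single reliable forecast with signal mu and correlation rho.
import numpy as np
from scipy.stats import norm

def hss_single(mu, rho):
    sigma = np.sqrt(1.0 - rho**2)              # forecast std dev, Eq. (5), unit climatology
    c = norm.ppf(1.0 / 3.0)                    # lower tercile boundary, ~ -0.431
    b = norm.cdf((c - mu) / sigma)             # below-normal probability, Eq. (9)
    a = 1.0 - norm.cdf((-c - mu) / sigma)      # above-normal probability
    hr = max(b, 1.0 - a - b, a)                # hit rate, Eq. (8)
    return (hr - 1.0 / 3.0) / (2.0 / 3.0)      # Eq. (10)

# A zero signal gives positive HSS for a skillful (low-variance) forecast, and the HSS
# dips as the signal approaches the tercile boundary, as described above.
print(hss_single(0.0, 0.9), hss_single(0.45, 0.9))
```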
The expected ranked probability score (RPS) of a reliable forecast can be written in terms of the tercile category probabilities as (Tippett et al. 2007):
\mathrm{RPS} = b\,(1-b) + a\,(1-a). \qquad (11)
Like the HSS, the RPS is a function of both the forecast mean μυ|i and variance σ²υ|i and cannot be written as a function of the ratio μυ|i/συ|i alone. The RPS of the climatological forecast is
\mathrm{RPS}_{\mathrm{clim}} = b\,(1-b) + \left(b - \tfrac{1}{3}\right)^{2} + a\,(1-a) + \left(a - \tfrac{1}{3}\right)^{2}. \qquad (12)
Therefore, when the climatological forecast is used as the reference forecast,
\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}}{\mathrm{RPS}_{\mathrm{clim}}}. \qquad (13)
Figure 1c shows RPSS as a function of the signal size |μυ|i| for various values of the correlation ρ. As was seen for the HSS, there is skill as measured by RPSS associated with mean zero forecasts for predictable systems.
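The corresponding calculation for the probabilistic score can be sketched as follows (ours, using (11)–(13) as reconstructed above; the function name is hypothetical):

```python
# Illustrative sketch of Eqs. (11)-(13) as reconstructed above: RPS, climatological
# RPS, and RPSS of a single reliable tercile forecast.
import numpy as np
from scipy.stats import norm

def rpss_single(mu, rho):
    sigma = np.sqrt(1.0 - rho**2)
    c = norm.ppf(1.0 / 3.0)
    b = norm.cdf((c - mu) / sigma)                           # Eq. (9)
    a = 1.0 - norm.cdf((-c - mu) / sigma)
    rps = b * (1 - b) + a * (1 - a)                          # Eq. (11)
    rps_clim = rps + (b - 1 / 3) ** 2 + (a - 1 / 3) ** 2     # Eq. (12)
    return 1.0 - rps / rps_clim                              # Eq. (13)

# Mean-zero forecasts of a predictable system still beat climatology; with no
# predictability (rho = 0) the RPSS is zero.
print(rpss_single(0.0, 0.9), rpss_single(1.0, 0.9), rpss_single(0.0, 0.0))
```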

b. Expected skill for a given signal-to-noise ratio

We now examine the expected skill of a set of forecasts where the intrinsic predictability level is set by the signal-to-noise ratio S. In the limit of large sample size, the AC of a set of forecasts is found by replacing the conditional expectations in (7) by unconditional expectations:
\overline{\mathrm{AC}}^{2} = \frac{E\left[\mu_{\upsilon|i}^{2}\right]}{E\left[\mu_{\upsilon|i}^{2}\right] + E\left[\sigma_{\upsilon|i}^{2}\right]} = \frac{S}{1+S} = \rho^{2}, \qquad (14)
where the overbar denotes the AC of a set of forecasts. We emphasize that the AC of the set is not an average of (7) but rather the result of replacing the conditional expectations in (7) with unconditional expectations. A consequence of this procedure is that the AC in (7) has the same functional dependence on μ²υ|i/σ²υ|i as the AC of the set has on S. This relation between conditional and unconditional AC was noted by Kumar (2009).
The distribution of the AC for finite samples was first studied by Fisher (1915). The expected anomaly correlation 〈AC〉n of a set of n forecasts is (Hotelling 1953)
\langle \mathrm{AC} \rangle_{n} = \rho\left(1 - \frac{1-\rho^{2}}{2n}\right) + O\!\left(n^{-2}\right) \qquad (15)
and is related to the signal-to-noise ratio by (6); we use the notation 〈·〉n to denote the average over realizations of n forecasts. For modest values of n, the first term is an adequate approximation and indicates a slight negative bias for values of ρ other than 0 and 1. Figure 1d shows AC as a function of the square root of the signal-to-noise ratio using (6), plotted together with the results of a Monte Carlo simulation. The Monte Carlo simulation uses 5000 simulations of sets of 100 forecasts for each value of the signal-to-noise ratio (i.e., 500 000 random numbers are used).
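The Monte Carlo calculation can be sketched as follows (an assumed setup in Python, not the authors' code); it draws sets of n = 100 forecasts for a given S and compares the mean sample AC with the leading terms of (15) as reconstructed above:

```python
# Illustrative sketch (assumed setup): Monte Carlo estimate of the AC of sets of
# n forecasts for a given signal-to-noise ratio S, compared with Eq. (15).
import numpy as np

rng = np.random.default_rng(1)

def ac_samples(S, n=100, n_sets=5000):
    rho = np.sqrt(S / (1.0 + S))                                        # Eq. (6)
    mu = rho * rng.standard_normal((n_sets, n))                         # signals, variance rho^2
    v = mu + np.sqrt(1.0 - rho**2) * rng.standard_normal((n_sets, n))   # verifications
    mu_a = mu - mu.mean(axis=1, keepdims=True)                          # anomalies within each set
    v_a = v - v.mean(axis=1, keepdims=True)
    return (mu_a * v_a).sum(axis=1) / np.sqrt((mu_a**2).sum(axis=1) * (v_a**2).sum(axis=1))

S = 1.0
rho = np.sqrt(S / (1.0 + S))
ac = ac_samples(S)
print(ac.mean(), rho * (1.0 - (1.0 - rho**2) / (2 * 100)))              # slight negative bias, Eq. (15)
```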
In section 3a, we developed explicit expressions for RPSS and HSS as functions of μυ|i and σ²υ|i. Because the conditional variance is related to the signal-to-noise ratio by
\sigma_{\upsilon|i}^{2} = 1 - \rho^{2} = \frac{1}{1+S}, \qquad (16)
the forecast variance is constant for a given signal-to-noise ratio S. The conditional mean μυ|i is normally distributed with mean zero and variance ρ2. Therefore the expected value of the HSS for a given signal to noise ratio is
\overline{\mathrm{HSS}} = \frac{1}{\rho\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathrm{HSS}\!\left(\mu_{\upsilon|i}\right)\, e^{-\mu_{\upsilon|i}^{2}/(2\rho^{2})}\, d\mu_{\upsilon|i}. \qquad (17)
This integral can be evaluated numerically using, for instance, Gauss–Hermite quadrature; relatively many abscissa points (here 24) are required for accurate results when ρ is close to unity. Figure 1e shows HSS as a function of the square root of the signal-to-noise ratio using the above expression and using the Monte Carlo simulation of forecasts and observations. The expected value of HSS for a given signal-to-noise ratio is less overall than that obtained when a single value of the conditional mean is used (cf. Fig. 2 of Kumar 2009).
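A sketch of the quadrature (ours; the change of variables and the use of numpy's Gauss–Hermite nodes are assumed implementation details) is:

```python
# Illustrative sketch (assumed quadrature details): expected HSS over signals,
# Eq. (17), evaluated with 24-point Gauss-Hermite quadrature.
import numpy as np
from scipy.stats import norm

def hss_single(mu, rho):
    sigma = np.sqrt(1.0 - rho**2)
    c = norm.ppf(1.0 / 3.0)
    b = norm.cdf((c - mu) / sigma)
    a = 1.0 - norm.cdf((-c - mu) / sigma)
    hr = np.maximum(b, np.maximum(1.0 - a - b, a))            # Eq. (8)
    return (hr - 1.0 / 3.0) / (2.0 / 3.0)                     # Eq. (10), assumed scaling

def hss_set(rho, npts=24):
    """Expected HSS for a set of forecasts with correlation rho, Eq. (17)."""
    x, w = np.polynomial.hermite.hermgauss(npts)              # nodes/weights for exp(-x^2)
    mu = np.sqrt(2.0) * rho * x                               # signal ~ N(0, rho^2)
    return np.sum(w * hss_single(mu, rho)) / np.sqrt(np.pi)

for rho in (0.3, 0.5, 0.7, 0.9):
    print(rho, hss_set(rho))
```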
Likewise, the expected values of the RPS and RPSclim for a given signal-to-noise value are
\overline{\mathrm{RPS}} = \frac{1}{\rho\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathrm{RPS}\!\left(\mu_{\upsilon|i}\right)\, e^{-\mu_{\upsilon|i}^{2}/(2\rho^{2})}\, d\mu_{\upsilon|i} \qquad (18)
and
\overline{\mathrm{RPS}}_{\mathrm{clim}} = \frac{4}{9}. \qquad (19)
The result in (19) is obtained either from averaging (12) over forecasts and noting that the average forecast probabilities of the above- and below-normal categories are ⅓, or by direct integration. We derive three approximations for RPS in terms of the predictability level in the appendix. A convenient one is
\overline{\mathrm{RPS}} \approx \frac{4}{9}\sqrt{1-\rho^{2}} = \frac{4}{9}\,(1+S)^{-1/2}. \qquad (20)
The expected RPSS of the set of forecasts, denoted by an overbar, is
\overline{\mathrm{RPSS}} = 1 - \frac{\overline{\mathrm{RPS}}}{\overline{\mathrm{RPS}}_{\mathrm{clim}}} \approx 1 - \sqrt{1-\rho^{2}} = 1 - (1+S)^{-1/2}. \qquad (21)
Figure 1f shows RPSS as a function of the square root of the signal-to-noise ratio S using the above integral expression, the approximation in (21), and a Monte Carlo simulation of forecasts and observations. The maximum error of the approximation in (21) is about 0.032. The expected value of RPSS for a given signal-to-noise ratio is again less than that obtained when a single value of the conditional mean is used (cf. Fig. 2 of Kumar 2009). This result is explained by Jensen’s inequality, which states that the average of a concave function is no greater than the function evaluated at the average. Therefore, since RPSS is a concave function of μ²υ|i (RPSS is not a concave function of |μυ|i|), it follows that the RPSS averaged over forecasts is no greater than the RPSS of a single forecast whose signal magnitude equals the signal standard deviation ρ.
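The following sketch (ours, with assumed quadrature details and using the equations as reconstructed above) evaluates the expected RPSS of a set of forecasts and compares it with the approximation in (21):

```python
# Illustrative sketch: expected RPSS of a set of forecasts, Eqs. (18)-(21) as
# reconstructed above, compared with the approximation 1 - sqrt(1 - rho^2).
import numpy as np
from scipy.stats import norm

def rps_single(mu, rho):
    sigma = np.sqrt(1.0 - rho**2)
    c = norm.ppf(1.0 / 3.0)
    b = norm.cdf((c - mu) / sigma)
    a = 1.0 - norm.cdf((-c - mu) / sigma)
    return b * (1 - b) + a * (1 - a)                             # Eq. (11)

def rpss_set(rho, npts=48):
    x, w = np.polynomial.hermite.hermgauss(npts)
    mu = np.sqrt(2.0) * rho * x                                  # signal ~ N(0, rho^2)
    rps_bar = np.sum(w * rps_single(mu, rho)) / np.sqrt(np.pi)   # Eq. (18)
    return 1.0 - rps_bar / (4.0 / 9.0)                           # Eqs. (19) and (21)

for rho in (0.3, 0.5, 0.7, 0.9):
    print(rho, rpss_set(rho), 1.0 - np.sqrt(1.0 - rho**2))       # differences are a few hundredths
```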

We note that computing the HSS and RPS from finite samples gives unbiased estimates. That is, 〈HSS〉n and 〈RPS〉n are equal to their expected values in (17) and (18), and there are no expressions for the HSS and RPSS corresponding to (15) for the AC.

c. Variance of the skill for a given signal-to-noise ratio

We next examine the variability of the skill measures when their values are computed using a finite set of forecasts. Hotelling (1953) gave the following series approximation for the variance of the AC computed from a sample of size n:
\mathrm{var}\left(\langle\mathrm{AC}\rangle_{n}\right) = \frac{\left(1-\rho^{2}\right)^{2}}{n} + O\!\left(n^{-2}\right). \qquad (22)
The standard deviation of the AC for n = 30 is shown in Fig. 1g along with the results of a Monte Carlo simulation of forecasts and observations. The Monte Carlo simulation uses 5000 simulations of sets of 30 forecasts.
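A Monte Carlo sketch of this calculation (ours; the setup is assumed) compares the sampling standard deviation of the AC for n = 30 with the leading term of (22) as reconstructed above:

```python
# Illustrative sketch (assumed setup): sampling standard deviation of the AC for sets
# of n = 30 forecasts, compared with the leading term of Eq. (22).
import numpy as np

rng = np.random.default_rng(2)

def ac_std(rho, n=30, n_sets=5000):
    mu = rho * rng.standard_normal((n_sets, n))
    v = mu + np.sqrt(1.0 - rho**2) * rng.standard_normal((n_sets, n))
    mu_a = mu - mu.mean(axis=1, keepdims=True)
    v_a = v - v.mean(axis=1, keepdims=True)
    ac = (mu_a * v_a).sum(axis=1) / np.sqrt((mu_a**2).sum(axis=1) * (v_a**2).sum(axis=1))
    return ac.std()

for rho in (0.3, 0.5, 0.7, 0.9):
    print(rho, ac_std(rho), (1.0 - rho**2) / np.sqrt(30))
```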
Because the number of observations that fall into the dominant category has a binomial distribution, for a single forecast with dominant forecast category probability HR, the variance of the hit rate is HR(1 − HR). Averaging this with respect to different values of the signal leads to the following expression for the variance of the HSS because of the observation variability in a set of n forecasts:
\mathrm{var}\left(\langle\mathrm{HSS}\rangle_{n}\right) \approx \frac{9}{4n}\,\frac{1}{\rho\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathrm{HR}\!\left(\mu_{\upsilon|i}\right)\left[1 - \mathrm{HR}\!\left(\mu_{\upsilon|i}\right)\right] e^{-\mu_{\upsilon|i}^{2}/(2\rho^{2})}\, d\mu_{\upsilon|i}. \qquad (23)
However, this calculation does not take into account the variability of the HSS due to the finite sample of signals present in the forecast set and therefore underestimates the variance. When the signal level is 0, there is no approximation, and the variance of the HSS is 1/(2n). Results based on the above integral expression as well as ones based on a Monte Carlo simulation of forecasts and observations for n = 30 are shown in Fig. 1h. We see that for small values of HSS, the expression in (23) is a reasonable approximation.
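The comparison can be sketched as follows (ours; the HSS scaling of (10) and the quadrature and simulation details are assumptions):

```python
# Illustrative sketch: the binomial approximation to the HSS variance, Eq. (23) as
# reconstructed above, checked against a Monte Carlo simulation (assumed setup).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
rho, n, n_sets = 0.5, 30, 5000
sigma = np.sqrt(1.0 - rho**2)
c = norm.ppf(1.0 / 3.0)

def tercile_probs(mu):
    b = norm.cdf((c - mu) / sigma)
    a = 1.0 - norm.cdf((-c - mu) / sigma)
    return b, 1.0 - a - b, a

# Eq. (23): average HR(1 - HR) over the signal distribution by quadrature.
x, w = np.polynomial.hermite.hermgauss(48)
b, m, a = tercile_probs(np.sqrt(2.0) * rho * x)
hr = np.maximum(b, np.maximum(m, a))
std_hss_approx = np.sqrt((9.0 / 4.0) * np.sum(w * hr * (1 - hr)) / np.sqrt(np.pi) / n)

# Monte Carlo: score sets of n forecasts, including the variability of the signals.
mu = rho * rng.standard_normal((n_sets, n))
v = mu + sigma * rng.standard_normal((n_sets, n))
b, m, a = tercile_probs(mu)
fcst_cat = np.argmax(np.stack([b, m, a]), axis=0)             # most likely category
obs_cat = np.digitize(v, [c, -c])                             # observed tercile (0, 1, 2)
hss = ((fcst_cat == obs_cat).mean(axis=1) - 1.0 / 3.0) / (2.0 / 3.0)
print(std_hss_approx, hss.std())                              # Eq. (23) slightly underestimates
```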
The variance of the RPS can be computed as
\mathrm{var}\left(\langle\mathrm{RPS}\rangle_{n}\right) = \frac{1}{n}\left\{ E\left[\sigma_{\mathrm{RPS}}^{2}\right] + \mathrm{var}\left[\mathrm{RPS}\right] \right\}, \qquad (24)
where
\sigma_{\mathrm{RPS}}^{2} = b\left[(1-b)^{2} + a^{2}\right]^{2} + (1 - a - b)\left[b^{2} + a^{2}\right]^{2} + a\left[b^{2} + (1-a)^{2}\right]^{2} - \mathrm{RPS}^{2} \qquad (25)
is the variance of the RPS for a single forecast, and the outer expectation and variance in (24) are taken over the distribution of the signal. However, computing the variance of the RPSS in closed form is less straightforward. Results based on a Monte Carlo simulation of forecasts and observations for n = 30 are shown in Fig. 1i.

4. Summary and conclusions

The difference between the climatological and forecast probability distribution functions is an indication of predictability. In the case of variables with a joint normal distribution, a necessary and sufficient condition for predictability is that the climatological and forecast variances are different. Making the forecast variance equal to the climatological variance results in underconfident forecasts and lower probabilistic skill scores. The expected skill as measured by the AC, HSS, and RPSS of a forecast with a specified signal level depends on both the signal level and the forecast variance. However, for forecast variances consistent with modest skill levels (ρ ≤ 0.5), the dependence on forecast variance is weak.

We compute the AC, HSS, and RPSS for a set of forecasts with specified signal-to-noise ratio; the forecast variance is constant and the signal is allowed to vary from one forecast to another, consistent with the signal-to-noise ratio. The HSS and RPSS values obtained in this manner are lower than those found in Kumar (2009), which for a given level of predictability used a set of forecasts in which all the forecasts had identical means equal to the signal standard deviation. Assuming a fixed signal results in overestimates of HSS and RPSS. We also provide a useful approximation that expresses expected RPSS values in terms of correlation values.

The variability of the three skill scores is computed when the sample size is finite. The variance of the AC for a finite set of forecasts with constant variance and varying signal can be expressed as a function of the signal-to-noise ratio, or correlation. The variances of the HSS and RPSS were found by Monte Carlo simulation.

Acknowledgments

The author is supported by a grant/cooperative agreement from the National Oceanic and Atmospheric Administration (NA05OAR4311004). The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies. The authors gratefully acknowledge Maria R. D’Orsogna for her generous help with the Taylor series approximations.

REFERENCES

  • DelSole, T., and M. K. Tippett, 2007: Predictability: Recent insights from information theory. Rev. Geophys., 45, RG4002, doi:10.1029/2006RG000202.

  • Fisher, R. A., 1915: Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population. Biometrika, 10, 507–521.

  • Hotelling, H., 1953: New light on the correlation coefficient and its transforms. J. Roy. Stat. Soc. B, 15, 193–225.

  • Kumar, A., 2009: Finite samples and uncertainty estimates for skill measures for seasonal prediction. Mon. Wea. Rev., 137, 2622–2631.

  • Kumar, A., A. G. Barnston, and M. P. Hoerling, 2001: Seasonal predictions, probabilistic verifications, and ensemble size. J. Climate, 14, 1671–1676.

  • Lorenz, E. N., 1969: The predictability of a flow which possesses many scales of motion. Tellus, 21, 289–307.

  • Rowell, D. P., 1998: Assessing potential seasonal predictability with an ensemble of multidecadal GCM simulations. J. Climate, 11, 109–120.

  • Sardeshmukh, P. D., G. P. Compo, and C. Penland, 2000: Changes of probability associated with El Niño. J. Climate, 13, 4268–4286.

  • Shukla, J., 1981: Dynamical predictability of monthly means. J. Atmos. Sci., 38, 2547–2572.

  • Tippett, M. K., A. G. Barnston, and A. W. Robertson, 2007: Estimation of seasonal precipitation tercile-based categorical probabilities from ensembles. J. Climate, 20, 2210–2228.

  • Van den Dool, H. M., and Z. Toth, 1991: Why do forecasts for near normal often fail? Wea. Forecasting, 6, 76–85.

APPENDIX

Approximations of the RPSS

The integral for the expected RPS of a set of tercile category forecasts as a function of the correlation ρ is
\overline{\mathrm{RPS}}(\rho) = \frac{1}{\rho\sqrt{2\pi}} \int_{-\infty}^{\infty} \left[\, b\,(1-b) + a\,(1-a)\,\right] e^{-\mu_{\upsilon|i}^{2}/(2\rho^{2})}\, d\mu_{\upsilon|i}, \qquad (A1)

where b and a are given by (9) with συ|i = (1 − ρ²)^1/2.
For arbitrary values of ρ there is no closed-form expression for RPS(ρ). For ρ = 0 and ρ = 1, the integral can be evaluated to obtain RPS(ρ = 0) = 4/9 and RPS(ρ = 1) = 0.
We can expand RPS(ρ) in a Taylor series in ρ to obtain (M. R. D’Orsogna 2009, personal communication)
i1520-0493-138-4-1487-ea2
Numerical computations indicate that RPS(ρ) is roughly linear in (1 − ρ²)^1/2. Therefore, we can either expand RPS(ρ) in a Taylor series in (1 − ρ²)^1/2 − 1 to obtain another approximation:
i1520-0493-138-4-1487-ea3
or use the secant approximation:
\overline{\mathrm{RPS}}(\rho) \approx \frac{4}{9}\sqrt{1-\rho^{2}}. \qquad (A4)
The Taylor expansions are quite accurate for small values of ρ. However, the secant approximation has the advantage of being exact at both ρ = 0 and ρ = 1. The maximum errors of the approximations in (A2), (A3), and (A4) are 0.1378, 0.0727, and 0.0141, respectively, indicating that the secant method is a uniformly good approximation over the entire range of values of ρ. The Taylor series approximations have their maximum error at ρ = 1.
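A short numerical check (ours; the integration grid is an assumed detail) reproduces the quoted maximum error of the secant approximation:

```python
# Illustrative sketch: numerically integrated RPS(rho) of (A1) versus the secant
# approximation (4/9) * sqrt(1 - rho^2) of (A4) as reconstructed above.
import numpy as np
from scipy.stats import norm

def rps_exact(rho, npts=20001):
    sigma = np.sqrt(1.0 - rho**2)
    c = norm.ppf(1.0 / 3.0)
    mu = np.linspace(-6.0, 6.0, npts)                         # signal grid (signal std <= 1)
    b = norm.cdf((c - mu) / sigma)
    a = 1.0 - norm.cdf((-c - mu) / sigma)
    f = b * (1 - b) + a * (1 - a)                             # integrand of (A1)
    w = norm.pdf(mu, scale=rho)                               # N(0, rho^2) weight
    return np.sum(f * w) * (mu[1] - mu[0])

rhos = np.linspace(0.01, 0.99, 99)
errors = [abs(rps_exact(r) - (4.0 / 9.0) * np.sqrt(1.0 - r**2)) for r in rhos]
print(max(errors))                                            # ~0.014, as stated in the text
```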

Fig. 1. (a) AC, (b) HSS, and (c) RPSS as a function of the conditional mean μυ|i for ρ = 0.0, 0.5, 0.7, and 0.9; the skill is an increasing function of correlation. (d) AC, (e) HSS, and (f) RPSS for a set of forecasts with signal-to-noise ratio S; the thick lines are based on analytical expressions and the thin lines on 5000 Monte Carlo simulations. In (f), the dotted line is based on the approximation in (21). (g)–(i) The standard deviation of the quantities in (d)–(f) as a function of their expected values when a sample size of n = 30 is used; the thick and thin lines are as in (d)–(f).

Citation: Monthly Weather Review 138, 4; 10.1175/2009MWR3214.1
