## 1. Introduction

Appropriate verification tools are essential in understanding the abilities and weaknesses of (probabilistic) forecast systems.

Verification is often focused on specific (weather) events. Such a binary event either occurs or does not occur, and is forecast to occur or not to occur with probabilities *p* and 1 − *p*, respectively. Examples of such events are more than 10-mm precipitation in 24 h or an anomaly (from a climatological mean) of more than 50 m of the geopotential at 500 hPa. Several well-established tools exist that test how accurately the forecast system is able to describe the occurrence and nonoccurrence of the event under consideration, that is, how good the agreement is between the forecast probabilities and the observed states. Examples of scores commonly used by operational centers such as the European Centre for Medium-Range Weather Forecasts (ECMWF) and the National Centers for Environmental Prediction are Brier scores (Brier 1950), the relative operating characteristic (ROC) curves (Mason 1982; Stanski et al. 1989), and economic cost–loss analyses (see, e.g., Katz and Murphy 1997; or Richardson 1998, 2000).

The (half) Brier score is one of the oldest verification tools in use. From its numerical value alone the quality of a forecast system is difficult to assess. An attractive property of the Brier score, however, is that it can be decomposed into a reliability, a resolution, and an uncertainty part (Murphy 1973). The reliability tests whether the forecast system has the correct statistical properties. It can be presented in a graphical way by the so-called reliability diagram. The uncertainty is the Brier score one would obtain when only the climatological frequency for the occurrence of the event is available. The resolution shows the impact obtained by issuing case-dependent probability forecasts (which do not always equal the probability based on climatology). Therefore, the decomposition of the Brier score gives a detailed insight into the performance of the forecast system with respect to the event under consideration.

Binary events highlight only one aspect of the forecast. Such a single aspect may be quite relevant. For instance, certain extreme events can lead to economic losses that could be avoided with the help of an accurate forecast system. This kind of issue is addressed by the ROC curve and economic cost–loss analyses. However, it may be desirable to obtain a broader overall view of performance. Several tools in this direction exist. It should be mentioned, however, that the term overall is often still restricted to the behavior of one forecast parameter only, such as precipitation or the geopotential at 500 hPa.

An example is the Talagrand diagram (Talagrand and Vautard 1997), also known as the rank histogram (Hamill and Colucci 1997) or the binned probability ensemble (Anderson 1996). This tool is tailor-made for an ensemble system, that is, for the case in which the probability density function (PDF) is represented by an ensemble of forecasts. Given such an ensemble, its *N* members divide the permissible range of the parameter of interest into *N* + 1 bins. The verifying analysis will be found to be in one of these bins. If all members are assumed to be equally weighted and representative, it is expected that, on average, each bin should be equally populated by the verifying analyses. Deviations from such a flat rank histogram indicate a violation of the above-made assumptions. For instance, a too high frequency of outliers is an indication that the average spread within the ensemble system is too low.
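As a concrete illustration of how such a diagram is built, the following Python sketch (the function name and the synthetic data are illustrative, not from the paper) bins each verifying analysis by the number of ensemble members falling below it:

```python
import numpy as np

def rank_histogram(ensembles, analyses):
    """Rank histogram (Talagrand diagram): for each case, the bin index
    is the number of ensemble members below the verifying analysis."""
    ens = np.sort(np.asarray(ensembles, dtype=float), axis=1)
    ranks = np.sum(ens < np.asarray(analyses, dtype=float)[:, None], axis=1)
    # N members define N + 1 bins, indexed 0..N
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

# synthetic example: members and analyses drawn from the same distribution,
# so all N + 1 bins should be roughly equally populated
rng = np.random.default_rng(0)
counts = rank_histogram(rng.normal(size=(5000, 9)), rng.normal(size=5000))
print(counts)
```

A strong excess in the two outermost bins of this histogram would signal the too-low ensemble spread discussed above.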

Another example is the ranked probability score (RPS) (see Epstein 1969; Murphy 1969, 1971). It is a generalization of the (half) Brier score. Instead of two options (event occurs or does not occur), the range of the parameter of interest is divided into more classes. In addition, the RPS contains a sense of distance, measuring how far the forecast was from reality. For a deterministic forecast, for instance, the RPS is proportional to the number of classes by which the forecast missed the verifying analysis. Although the choice and number of classes may be prescribed by the specific application, the exact value of the RPS will depend on this choice. It is possible to take the limit of an infinite number of classes, each with zero width. This leads to the concept of the continuous ranked probability score (CRPS) (Brown 1974; Matheson and Winkler 1976; Unger 1985; Bouttier 1994). This CRPS has several appealing properties. First of all, it is sensitive to the entire permissible range of the parameter of interest. Second, its definition does not require, as the RPS does, the introduction of a number of predefined classes, on which results may depend. In addition, it can be interpreted as an integral over all possible Brier scores. Finally, for a deterministic forecast, the CRPS is equal to the mean absolute error (MAE) and, therefore, has a clear interpretation.

Despite these advantages, the CRPS is a single quantity, from which it is difficult to disentangle the detailed behavior of a forecast system. It would be desirable to decompose the CRPS in the same way as the Brier score. In this paper it is shown how, for an ensemble prediction system, this indeed can be achieved. In a similar way to the Brier score, the CRPS is shown to be decomposable into a reliability part, an uncertainty part, and a resolution part. The reliability part tests whether, for each bin *i*, the verifying analysis was found, on average, with a fraction *i*/*N* of the cases below this bin. It has a close relation to the rank histogram. The uncertainty part is equal to the CRPS one would obtain if only a PDF based on climatology were available. The resolution, finally, expresses the improvement gained by issuing case-dependent probability forecasts (which do not always equal the probability based on climatology). It is shown that the resolution is sensitive to the average ensemble spread and to the frequency and magnitude of the outliers. Finally, it is illustrated how the various contributions to the CRPS can be presented in a graphical way, like the reliability diagram of the Brier score.

The paper is organized as follows. In section 2 the CRPS is defined, and some characteristics are mentioned. The uncertainty part of the CRPS is highlighted in section 3. In section 4, the full decomposition for an ensemble system is derived. As an example, the decomposition of the CRPS for total precipitation in the ensemble prediction system (EPS) running at ECMWF is presented in section 5. A summary and some concluding remarks are made in section 6.

## 2. The continuous ranked probability score

Let *x* denote the parameter of interest. For instance, *x* could be the 2-m temperature or 10-m wind speed. Suppose that the PDF forecast by an ensemble system is given by *ρ*(*x*) and that *x*_{a} is the value that actually occurred. Then the continuous ranked probability score (Brown 1974; Matheson and Winkler 1976; Unger 1985; Bouttier 1994), expressing some kind of distance between the probabilistic forecast *ρ* and truth *x*_{a}, is defined as

CRPS = CRPS(*P*, *x*_{a}) = ∫_{−∞}^{∞} [*P*(*x*) − *P*_{a}(*x*)]² *dx*,   (1)

where *P* and *P*_{a} are cumulative distributions:

*P*(*x*) = ∫_{−∞}^{*x*} *ρ*(*y*) *dy*   (2)

and

*P*_{a}(*x*) = *H*(*x* − *x*_{a}),   (3)

in which *H* is the Heaviside function,

*H*(*x*) = 0 for *x* < 0, and 1 for *x* ⩾ 0.   (4)

So *P*(*x*) is the forecasted probability that *x*_{a} will be smaller than *x*. Obviously, for any cumulative distribution, *P*(*x*) ∈ [0, 1], *P*(−∞) = 0, and *P*(∞) = 1. This is also true for parameters that are only defined on a subdomain of ℜ. In that case *ρ*(*x*) = 0 and *P* is constant outside the domain of definition. The CRPS measures the difference between the predicted and occurred cumulative distributions. Its minimal value of zero is only achieved for *P* = *P*_{a}, that is, in the case of a perfect deterministic forecast. Note that the CRPS has the dimension of the parameter *x* (which enters via the integration over *dx*).
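The definition can be made concrete with a short numerical sketch. The following Python fragment (function name, grid, and ensemble values are illustrative assumptions, not from the paper) approximates the integral in Eq. (1) for the empirical cumulative distribution of a small ensemble:

```python
import numpy as np

def crps_numeric(members, x_a, lo=-50.0, hi=50.0, n=200000):
    """Approximate Eq. (1): integral of [P(x) - H(x - x_a)]^2 dx,
    where P is the ensemble's empirical cumulative distribution."""
    members = np.sort(np.asarray(members, dtype=float))
    dx = (hi - lo) / n
    x = lo + (np.arange(n) + 0.5) * dx            # midpoint rule
    P = np.searchsorted(members, x, side="right") / members.size
    Pa = (x >= x_a).astype(float)                 # Heaviside step at x_a
    return float(np.sum((P - Pa) ** 2) * dx)

ens = [1.0, 2.0, 2.5, 3.0, 4.0]                   # a five-member ensemble
print(crps_numeric(ens, 2.2))
```

Because both cumulative distributions are bounded by 0 and 1, the integrand vanishes far from the ensemble and the analysis, so a finite grid suffices.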

In practice, an average over an area and a number of cases is considered:

⟨CRPS⟩ = Σ_{k} *w*_{k} CRPS(*P*^{k}, *x*^{k}_{a}),   (5)

where *k* labels the considered grid points and cases. The weights *w*_{k} may depend on *k* (for instance proportional to the cosine of latitude) and are normalized such that Σ_{k} *w*_{k} = 1.

The CRPS can be seen as the limit of a ranked probability score with an infinite number of classes, each with zero width.

In addition, the CRPS is closely connected to the Brier score. Consider the binary event defined by a threshold *x*_{t}. The event is said to have happened (*O* = 1) if *x*_{a} ⩽ *x*_{t} and not happened (*O* = 0) if *x*_{a} > *x*_{t}. If *p* is the forecast probability that the event will occur, the Brier score is defined as

BS(*x*_{t}) = Σ_{k} *w*_{k} (*p*^{k} − *O*^{k})².   (6)

Note that *p*^{k} = *P*^{k}(*x*_{t}) and *O*^{k} = *P*^{k}_{a}(*x*_{t}), and therefore

⟨CRPS⟩ = ∫_{−∞}^{∞} BS(*x*_{t}) *dx*_{t};   (7)

that is, the CRPS is the integral of the Brier score over all possible threshold values *x*_{t}.

For a deterministic forecast, that is, *x* = *x*_{d} without any specified uncertainty, *P*(*x*) = *H*(*x* − *x*_{d}). In that case, the integrand of Eq. (1) is either zero or one. The nonzero contributions are found in the region where *P*(*x*) and *P*_{a}(*x*) differ, which is the interval between *x*_{d} and *x*_{a}. As a result,

CRPS = |*x*_{d} − *x*_{a}|,   (8)

which is exactly the mean absolute error for this single case.
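This reduction to the mean absolute error is easy to confirm numerically. A minimal Python check (all names and numbers are illustrative):

```python
import numpy as np

def crps_deterministic(x_d, x_a, lo=-50.0, hi=50.0, n=200000):
    """Eq. (1) with P(x) = H(x - x_d): the integrand [H(x - x_d) - H(x - x_a)]^2
    is 1 between x_d and x_a and 0 elsewhere, so the integral is |x_d - x_a|."""
    dx = (hi - lo) / n
    x = lo + (np.arange(n) + 0.5) * dx      # midpoint grid
    P = (x >= x_d).astype(float)            # deterministic forecast step
    Pa = (x >= x_a).astype(float)           # observed step
    return float(np.sum((P - Pa) ** 2) * dx)

print(crps_deterministic(1.5, 4.0))         # close to |1.5 - 4.0| = 2.5
```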

## 3. The uncertainty of the CRPS

When only climatological information on the parameter *x* is available, the same probability forecast *P*^{k} = *P*_{cli} will be made for each situation. In that case, the average score (5) becomes

⟨CRPS⟩_{cli} = Σ_{k} *w*_{k} ∫_{−∞}^{∞} [*P*_{cli}(*x*) − *H*(*x* − *x*^{k}_{a})]² *dx*.   (10)

Using Σ_{k} *w*_{k} = 1 by definition, and *H*² = *H*, and defining the sample distribution

*P*_{sam}(*x*) = Σ_{k} *w*_{k} *H*(*x* − *x*^{k}_{a}),   (9)

Eq. (10) can be written as the sum of

∫_{−∞}^{∞} [*P*_{cli}(*x*) − *P*_{sam}(*x*)]² *dx*   (11)

and

*U* = ∫_{−∞}^{∞} *P*_{sam}(*x*)[1 − *P*_{sam}(*x*)] *dx*,   (12)

where *P*_{sam} is the cumulative distribution based on the sample used in the verification. If, for instance, all *M* weights would be equal, so *w*_{k} = 1/*M*, then *P*_{sam}(*x*) is just the fraction of cases in which the verifying analysis was found to be smaller than *x*. The value of *P*_{sam}(*x*) also equals the sample frequency of occurrence *o*(*x*_{t}) for the Brier score with threshold *x*_{t} = *x*.

From Eqs. (10)–(12) it is seen that the CRPS based on climatology is minimal when *P*_{cli} is equal to *P*_{sam}. The impact on the CRPS due to a deviation from the sample statistics is expressed by Eq. (11).

The quantity (12) is the integral of the uncertainty *U* (Murphy 1973; or see, e.g., Wilks 1995) of the Brier score over all possible thresholds: the sample frequency *o*(*x*_{t}) = *P*_{sam}(*x*_{t}) is the fraction of cases in which *x* < *x*_{t} occurred, and the Brier-score uncertainty at this threshold is *o*(*x*_{t})[1 − *o*(*x*_{t})]. Therefore, it is very natural to define Eq. (12) as the uncertainty *U* of the CRPS.

The uncertainty *U* is related to the average spread of the sample distribution *ρ*_{sam} = *dP*_{sam}/*dx*, because the main contribution to the integral in Eq. (12) comes from the region in *x* where *P*_{sam} is significantly different from 0 and 1. An illustration is given in Fig. 1. To be more exact, the sample distribution *ρ*_{sam} can always be written as

*ρ*_{sam}(*x*) = (1/*σ*) *ρ*_{0}[(*x* − *x̄*)/*σ*],

where *ρ*_{0} is a distribution with *σ* = 1 (for instance similar to a standardized Gaussian) and *P*_{0} [see Eq. (2)] its cumulative distribution. From Eq. (12) it then follows that the uncertainty is proportional to the spread *σ*:

*U* = *σ* ∫_{−∞}^{∞} *P*_{0}(*x*)[1 − *P*_{0}(*x*)] *dx*.

It should be noted that the term climatology depends on the degree of desired sophistication. The most crude level would be to assume the same climatological distribution at all grid points and cases. The mean climatological value of *x*, however, may be quite location and season dependent. The mean 2-m temperature of Norway in January, for instance, is much lower than that of Spain in March. This would result in a very broad sample distribution and, therefore, in a large uncertainty. In order to correct for this, as a first step, the variable *x* can be redefined as the anomaly with respect to the local climatology. The definition of the CRPS is invariant under such a shift in the variable *x*, as is easily seen from Eq. (1). As a consequence, the distribution *P*_{sam} will change, because for each *k* in Eq. (9) a different shift may have been applied. This should result in a distribution that is much sharper, so the uncertainty *U* will be smaller.

Finally, the entire climatological distribution (so not just its mean) could be chosen to depend on the location and/or season, so *P*^{k} = *P*_{cli,location,season}. For this, the best achievable distribution would be a location/seasonal-dependent sample distribution, also given by Eq. (9) but in which the sum (and the normalization of the weights) is restricted to all points *k* that belong to the same location and/or season. Again, the resulting uncertainty is expected to become lower. For parameters like precipitation this will also lead to a lower uncertainty.

The uncertainty can be evaluated directly from Eqs. (9) and (12). The integrand then involves products of the form *H*(*x* − *x*^{k}_{a})[1 − *H*(*x* − *x*^{l}_{a})], whose integral over *x* equals the distance between *x*^{k}_{a} and *x*^{l}_{a} when *x*^{l}_{a} > *x*^{k}_{a}, and zero when *x*^{l}_{a} < *x*^{k}_{a}. As a result,

*U* = Σ_{k} Σ_{l} *w*_{k}*w*_{l} max(*x*^{l}_{a} − *x*^{k}_{a}, 0) = ½ Σ_{k} Σ_{l} *w*_{k}*w*_{l} |*x*^{k}_{a} − *x*^{l}_{a}|.

Alternatively, use can be made of the fact that *P*_{sam} is piecewise constant (see, e.g., Fig. 1). It is zero for *x* = −∞, and each time an *x*^{k}_{a} is passed, it increases by *w*_{k}. Beyond the largest verifying analysis in the set, *P*_{sam} = 1. Now if the *x*^{k}_{a} are sorted in ascending order, Eq. (12) reduces to

*U* = Σ_{k=1}^{M−1} *p*_{k}(1 − *p*_{k})(*x*^{sort(k+1)}_{a} − *x*^{sort(k)}_{a}),  with  *p*_{k} = *p*_{k−1} + *w*_{sort(k)},  *p*_{0} = 0,   (20)

where sort(*k*) labels the *k*th smallest verifying analysis in the set. The direct evaluation of the double sum given above involves a number of operations of order *M*², where *M* is the size of the sample set. If *M* becomes on the order of a few thousand, this evaluation becomes time consuming. In addition, roundoff errors are expected to become nonnegligible. Method (20) only involves a sum of order *M*. The price to be paid is that the *x*^{k}_{a} have to be sorted, which is an operation of order *M* log(*M*) (see, e.g., Press et al. 1989). Therefore, this latter method is still quite feasible and accurate for very large samples.
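Both evaluation strategies can be sketched in a few lines. The Python fragment below (function names and the synthetic sample are illustrative) implements the order-*M*² double sum and the sorted method (20) and checks that they agree:

```python
import numpy as np

def uncertainty_pairwise(x_a, w):
    """Double-sum evaluation: U = 0.5 * sum_{k,l} w_k w_l |x_a^k - x_a^l|,
    an O(M^2) operation."""
    x = np.asarray(x_a, dtype=float)
    w = np.asarray(w, dtype=float)
    return 0.5 * float(w @ np.abs(x[:, None] - x[None, :]) @ w)

def uncertainty_sorted(x_a, w):
    """Sorted evaluation [cf. Eq. (20)]: P_sam is piecewise constant, so
    U = sum_k p_k (1 - p_k) (x_(k+1) - x_(k)), with p_k the accumulated
    weight below the kth sorted analysis; O(M log M) due to the sort."""
    order = np.argsort(x_a)
    x = np.asarray(x_a, dtype=float)[order]
    p = np.cumsum(np.asarray(w, dtype=float)[order])[:-1]
    return float(np.sum(p * (1.0 - p) * np.diff(x)))

rng = np.random.default_rng(0)
xs = rng.normal(size=500)
ws = np.full(500, 1.0 / 500)
print(uncertainty_pairwise(xs, ws), uncertainty_sorted(xs, ws))
```

For equal weights the double sum is half the mean absolute difference of the sample, which makes the connection between *U* and the sample spread explicit.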

## 4. The CRPS for an ensemble system

### a. The cumulative distribution of an ensemble

Consider an ensemble system of *N* members. For each parameter *x* this means that the cumulative distribution forecasted by the ensemble system is given by

*P*(*x*) = (1/*N*) Σ_{i=1}^{N} *H*(*x* − *x*_{i}),

where *x*_{1}, . . . , *x*_{N} are the outcomes of the *N* ensemble members. From now on it is assumed that the members are ordered, that is, *x*_{i} ⩽ *x*_{j} for *i* < *j*. So *P* is a piecewise constant function. Transitions occur at the values *x*_{i}:

*P*(*x*) = *p*_{i} = *i*/*N*  for  *x*_{i} < *x* < *x*_{i+1},

where *x*_{0} = −∞ and *x*_{N+1} = ∞ are introduced for convenience. An example of the cumulative distribution for an ensemble of five members is given (thick solid curve) in Fig. 2.

### b. Decomposition for a single case

For a given verifying analysis *x*_{a}, the integral (1) can be split into a sum of contributions *c*_{i} from the intervals [*x*_{i}, *x*_{i+1}], on each of which *P*(*x*) = *p*_{i} is constant, while *H*(*x* − *x*_{a}) will be either 0, or 1, or partly 0, partly 1, in the interval [*x*_{i}, *x*_{i+1}]. For each of these three possible situations, *c*_{i} can be written as

*c*_{i} = *α*_{i} *p*²_{i} + *β*_{i}(1 − *p*_{i})²,   (25)

where

*α*_{i} = *x*_{i+1} − *x*_{i},  *β*_{i} = 0,  if *x*_{a} > *x*_{i+1};
*α*_{i} = *x*_{a} − *x*_{i},  *β*_{i} = *x*_{i+1} − *x*_{a},  if *x*_{i+1} > *x*_{a} > *x*_{i};
*α*_{i} = 0,  *β*_{i} = *x*_{i+1} − *x*_{i},  if *x*_{a} < *x*_{i}.   (26)

Both *α*_{i} and *β*_{i} have the dimension of the parameter *x*.

For the example given in Fig. 2, the verifying analysis lies between *x*_{3} and *x*_{4}. Therefore, for this case, *β*_{i} = 0 for *i* = 1 and 2, and *α*_{i} = 0 for *i* = 4. Only for *i* = 3 are both *α*_{i} and *β*_{i} nonzero.

Special attention should be given to the extreme bins *i* = 0 and *i* = *N*. These concern the intervals (−∞, *x*_{1}] and [*x*_{N}, ∞), respectively, for which *p*_{i} = 0 and *p*_{i} = 1, respectively. These two intervals will only contribute to the CRPS in cases when the verifying analysis is an outlier, that is, when it is outside the range of the ensemble. In this situation Eq. (25) can also be used. In the first case, only *β*_{0} is nonzero, being the difference between *x*_{a} and the smallest ensemble member. In the second case, *α*_{N} is nonzero and equal to the distance of *x*_{a} from the largest ensemble member. Outliers can contribute significantly to the CRPS, because nonzero values of *β*_{0} and *α*_{N} are weighted stronger than the other *α*'s and *β*'s [their weights in Eq. (25) are (1 − *p*_{0})² = *p*²_{N} = 1; see, e.g., the shaded areas in Fig. 3].
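The bookkeeping of Eqs. (25) and (26), including the two outlier bins, is compact in code. A Python sketch (the helper name and the ensemble values are illustrative):

```python
import numpy as np

def crps_alpha_beta(members, x_a):
    """Single-case CRPS via Eq. (25): sum of alpha_i p_i^2 + beta_i (1 - p_i)^2
    over all bins, with alpha_i and beta_i assigned according to Eq. (26)
    and the outlier bins i = 0 and i = N entering with unit weight."""
    x = np.sort(np.asarray(members, dtype=float))
    N = x.size
    crps = 0.0
    for i in range(1, N):                 # interior bins [x_i, x_{i+1}]
        p = i / N
        lo, hi = x[i - 1], x[i]
        if x_a >= hi:                     # analysis above the bin
            alpha, beta = hi - lo, 0.0
        elif x_a <= lo:                   # analysis below the bin
            alpha, beta = 0.0, hi - lo
        else:                             # analysis inside the bin
            alpha, beta = x_a - lo, hi - x_a
        crps += alpha * p**2 + beta * (1.0 - p) ** 2
    if x_a < x[0]:                        # low outlier: beta_0, weight (1 - p_0)^2 = 1
        crps += x[0] - x_a
    if x_a > x[-1]:                       # high outlier: alpha_N, weight p_N^2 = 1
        crps += x_a - x[-1]
    return crps

ens = [1.0, 2.0, 2.5, 3.0, 4.0]
for xa in (2.2, 0.2, 5.5):                # inside the range, low and high outlier
    print(crps_alpha_beta(ens, xa))
```

Because Eq. (25) is exact for a piecewise-constant *P*, the same numbers are obtained from a direct numerical integration of Eq. (1).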

### c. The average over a set of cases

When an average is taken over a set of *M* cases and/or grid points, each with a weight *w*_{k}, the average CRPS [Eq. (5)] can be found as

⟨CRPS⟩ = Σ_{i=0}^{N} [*ᾱ*_{i}*p*²_{i} + *β̄*_{i}(1 − *p*_{i})²],

where the bars denote the (weighted) averages over the set of the individual *α*_{i} and *β*_{i}.

The averaged *ᾱ*_{i} and *β̄*_{i} can be expressed in two quantities *g*_{i} and *o*_{i}, which both have a physical interpretation. First the case 0 < *i* < *N* is considered. Let

*g*_{i} = *ᾱ*_{i} + *β̄*_{i},   (30)

*o*_{i} = *β̄*_{i}/(*ᾱ*_{i} + *β̄*_{i}),   (31)

or, equivalently,

*ᾱ*_{i} = *g*_{i}(1 − *o*_{i}),  *β̄*_{i} = *g*_{i}*o*_{i}.   (32)

Here *g*_{i} is the average width of bin number *i*. This width will usually be small compared to the range over which the verifying analysis varies from case to case. Then, for most cases, the verifying analysis will not lie in the interval [*x*_{i}, *x*_{i+1}]. Therefore, usually, either *α*_{i} will be zero and *β*_{i} is equal to the width of bin number *i*, or vice versa. The first case applies to the situation in which the verifying analysis was found to be smaller than the ensemble member *i*, as can be seen from Eq. (26); the second case to that in which it was found to be larger than member *i* + 1. Taking this in mind, *o*_{i} can be seen to be closely related to the average frequency that the verifying analysis was found to be below ½(*x*_{i} + *x*_{i+1}). Ideally these observed frequencies should match the forecasted probability *p*_{i} = *i*/*N* that the verifying analysis is to be found below the *i*th interval. Such a consistency is closely related to the flatness of the rank histogram [also known as Talagrand diagram or binned probability ensemble; see, e.g., Anderson (1996), Talagrand and Vautard (1997), or Hamill and Colucci (1997)].

For the outlier bins, *o*_{0} and *o*_{N} are defined as the (weighted) frequencies that *x*_{a} was found to be smaller than *x*_{1} and *x*_{N}, respectively, and *g*_{0} and *g*_{N} are defined as the average length of the outlier, given that it occurred:

*g*_{0} = *β̄*_{0}/*o*_{0},  *g*_{N} = *ᾱ*_{N}/(1 − *o*_{N}).   (33)

With these definitions for *i* = 0, . . . , *N*, so including the outliers, the average CRPS can be decomposed as

⟨CRPS⟩ = Reli + CRPS_{pot},

with

Reli = Σ_{i=0}^{N} *g*_{i}(*o*_{i} − *p*_{i})²,   (36)

CRPS_{pot} = Σ_{i=0}^{N} *g*_{i}*o*_{i}(1 − *o*_{i}).   (37)

The reliability (36) is the analog of the reliability of the Brier score. There it is tested whether, in the subset of cases in which a probability *p* was forecast, on average, the event occurred with that fraction *p*. Here, it is tested whether, on average, the frequency *o*_{i} that the verifying analysis was found to be below the middle of interval number *i* is proportional to *i*/*N*. Therefore, it is tested here whether the ensemble is capable of generating cumulative distributions that have, on average, this desired statistical property. The reliability (36) is closely connected to the rank histogram, which shows whether the frequency that the verifying analysis was found in bin number *i* is equal for all bins. The rank histogram, however, does not take account of the width of the ensemble. It only counts how often the verifying analysis was located in a bin, regardless of the width of the bins. The reliability (36) does account for these widths, via the definitions of *α*_{i} and *β*_{i} and therefore *o*_{i}. Note that Reli has the dimension of the parameter *x*, while the reliability of the Brier score is dimensionless. The term CRPS_{pot} given in Eq. (37) is called the potential CRPS (in analogy with Murphy and Epstein 1989), because it is the CRPS one would obtain after the probabilities *p*_{i} would have been retuned such that the system would become perfectly reliable, that is, for which *o*_{i} = *p*_{i}. The sharper the ensemble, the smaller the average bin widths *g*_{i} and the smaller Eq. (37). The potential CRPS is also sensitive to outliers. Too many and too large outliers will result in large values of *g*_{0}*o*_{0} and *g*_{N}(1 − *o*_{N}) and therefore affect CRPS_{pot} considerably. Although the small average bin widths *g*_{1}, . . . , *g*_{N} of an ensemble system with a too small spread may have a positive impact on the potential CRPS, the too high frequency of outliers and the large magnitudes of such outliers will have a clear negative impact. Given a certain degree of unpredictability, the optimal value for CRPS_{pot} will be achieved for an ensemble system in which the spread and the statistics of outliers are in balance.
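The full decomposition can be assembled directly from these definitions. The following sketch (synthetic data; all names are illustrative assumptions) computes *g*_{i} and *o*_{i} from the averaged *α*_{i} and *β*_{i} and verifies that Reli + CRPS_{pot} reproduces ⟨CRPS⟩ for an equal-weight set of cases:

```python
import numpy as np

def alpha_beta(ens, x_a):
    """alpha_i and beta_i of Eqs. (25)-(26) for one case; ens must be sorted.
    Bins 0 and N are the outlier intervals below/above the ensemble range."""
    N = ens.size
    a = np.zeros(N + 1)
    b = np.zeros(N + 1)
    for i in range(1, N):                    # interior bins [x_i, x_{i+1}]
        lo, hi = ens[i - 1], ens[i]
        a[i] = max(0.0, min(x_a, hi) - lo)   # part of the bin below x_a
        b[i] = max(0.0, hi - max(x_a, lo))   # part of the bin above x_a
    b[0] = max(0.0, ens[0] - x_a)            # low-outlier length
    a[N] = max(0.0, x_a - ens[-1])           # high-outlier length
    return a, b

def decompose(ensembles, analyses):
    """Equal-weight decomposition <CRPS> = Reli + CRPS_pot."""
    ensembles = np.sort(np.asarray(ensembles, dtype=float), axis=1)
    analyses = np.asarray(analyses, dtype=float)
    N = ensembles.shape[1]
    ab = np.array([alpha_beta(e, x) for e, x in zip(ensembles, analyses)])
    abar, bbar = ab[:, 0].mean(axis=0), ab[:, 1].mean(axis=0)
    p = np.arange(N + 1) / N
    g = np.zeros(N + 1)
    o = np.zeros(N + 1)
    g[1:N] = abar[1:N] + bbar[1:N]              # average bin widths
    o[1:N] = bbar[1:N] / g[1:N]                 # observed frequencies
    o[0] = np.mean(analyses < ensembles[:, 0])  # outlier frequencies
    o[N] = np.mean(analyses < ensembles[:, -1])
    if o[0] > 0:
        g[0] = bbar[0] / o[0]                   # mean outlier length
    if o[N] < 1:
        g[N] = abar[N] / (1.0 - o[N])
    reli = float(np.sum(g * (o - p) ** 2))
    pot = float(np.sum(g * o * (1.0 - o)))
    crps = float(np.sum(abar * p**2 + bbar * (1.0 - p) ** 2))
    return crps, reli, pot

# synthetic, statistically consistent forecasts: members and truth share
# the same distribution around a case-dependent mean mu
rng = np.random.default_rng(1)
M, N = 2000, 10
mu = rng.normal(size=M)
ens = mu[:, None] + rng.normal(size=(M, N))
xa = mu + rng.normal(size=M)
crps, reli, pot = decompose(ens, xa)
print(crps, reli, pot)    # reli + pot equals crps up to roundoff
```

For such a statistically consistent synthetic system the reliability term comes out small compared with the potential CRPS, in line with the discussion above.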

The potential CRPS is related to the uncertainty *U*. Suppose that the integral over *x* in Eq. (12) is approximated by a sum over intervals Δ*x*_{i}, each representing an equal part 1/*N* of integrated probability. The Δ*x*_{i} may be identified with the widths *g*_{i} and the *P*_{sam}(*x*_{i}) with the observed frequencies *o*_{i}. As a result, these approximations lead to Eq. (37), so the uncertainty can be regarded as the potential CRPS of a forecast system based on the climatology of the sample. It may be clear that it is desirable for an ensemble system that CRPS_{pot} is smaller than this potential CRPS based on climatology. Therefore, the potential CRPS may, although perhaps somewhat artificially, be further decomposed into

Resol = *U* − CRPS_{pot},   (38)

which, combined with ⟨CRPS⟩ = Reli + CRPS_{pot}, yields the complete decomposition

⟨CRPS⟩ = Reli − Resol + *U*.   (39)

### d. Relation to the decomposition of the Brier score

In section 2 it was shown that the ⟨CRPS⟩ is the integral of the Brier score over all possible thresholds [Eq. (7)]. It is therefore interesting to compare the decomposition (39) with the integral of the Murphy (1973) decomposition of the Brier score. For an *N*-member ensemble system, the Brier score (6) for a threshold *x* may be stratified with respect to the set of allowable probabilities *p*_{i} = 0, 1/*N*, . . . , 1:

BS(*x*) = Σ_{i=0}^{N} *g*_{i}(*x*)[*p*_{i} − *o*_{i}(*x*)]² − Σ_{i=0}^{N} *g*_{i}(*x*)[*o*_{i}(*x*) − *o*(*x*)]² + *o*(*x*)[1 − *o*(*x*)],   (40)

where *g*_{i}(*x*) is the (weighted) fraction of cases in which a probability *p* = *p*_{i} was issued, while *o*_{i}(*x*) is the fraction of such cases in which indeed the event was observed. Note that both quantities depend on the value of the threshold *x*.

It is shown in the appendix that the integrals over *x* of *g*_{i}(*x*) and of *g*_{i}(*x*)*o*_{i}(*x*) are equal to the quantities *g*_{i} and *g*_{i}*o*_{i}, respectively, defined by Eqs. (30)–(33). When integral (7) is performed, the Brier decomposition (40) integrates to

⟨CRPS⟩ = 〈Reli〉 − 〈Resol〉 + 〈*U*〉,   (41)

and the relation between decompositions (39) and (41) can be established:

〈Reli〉 = Reli + *D*,  〈Resol〉 = Resol + *D*,  〈*U*〉 = *U*,

where *D* collects, for each *i*, the integrated, *g*_{i}(*x*)-weighted squared deviations of *o*_{i}(*x*) from the constants *z*_{i} = *o*_{0}, *o*_{1}, . . . , (1 − *o*_{N}). In general, *D* will be nonzero. Therefore, the integration of the resolution and reliability of the Brier score over all possible thresholds, in general, differs from the resolution and reliability, respectively, of the ⟨CRPS⟩. The integral of the uncertainty *U*(*x*) over all thresholds, on the other hand, is exactly equal to the uncertainty of the ⟨CRPS⟩. For 0 < *i* < *N*, the integrands contributing to *D* are positive definite. Only when *o*_{i}(*x*) does not depend on *x* are they zero. Therefore this part of 〈Reli〉 is stricter than the corresponding part of Reli. For the outlier bins the average width *g*_{i}(*x*) is infinite, and therefore Eq. (49) is not valid for *i* = 0, *N*.

The quantities 〈Reli〉 and 〈Resol〉, as well as *D*, involve integrals over *x* of *g*_{i}(*x*)*o*²_{i}(*x*), *g*_{i}(*x*), and *g*_{i}(*x*)*o*_{i}(*x*) (see the appendix), which are difficult to perform analytically. Therefore, in practice, it is a tedious procedure to evaluate 〈Reli〉 and 〈Resol〉. Besides, 〈Reli〉 does not have the same clear relation to the rank histogram as the reliability (36) does.

## 5. Decomposition for the EPS at ECMWF

The ideas developed in the previous sections will be illustrated by the performance of the ensemble prediction system running at ECMWF. This ensemble forecasting system (see Molteni et al. 1996; Buizza and Palmer 1998; Buizza et al. 1999) consists of 50 perturbed forecasts plus a control forecast integrated with the ECMWF T_{L}159L31 primitive equation (PE) model up to day 10. For seven cases in the summer of 1999, the decomposition of the ⟨CRPS⟩ for total precipitation was evaluated; the weights *w*_{k} [see Eq. (5)] were chosen to be proportional to the cosine of latitude. As verifying analysis the precipitation accumulated within the first 24 h of the ECMWF operational T_{L}319L50 PE model forecasts was taken [for a discussion on this choice, see the appendix of Buizza et al. (1999)].

Table 1 shows the ⟨CRPS⟩ and its decomposition into reliability, resolution, and uncertainty [Eq. (39)] as a function of forecast day. The ⟨CRPS⟩ increases with forecast time, reaching 1.31 mm (24 h)^{−1} at day 10, expressing a decreasing predictability as a function of forecast time. The reliability only forms a small part of the ⟨CRPS⟩. The resolution decreases with forecast time, to 0.086 mm (24 h)^{−1} at day 10. Therefore, during the first days, the EPS significantly outperforms a forecast based on climatology, while for longer forecast periods there is an onset of convergence to climatology.

In order to understand these trends in more detail, Figs. 4, 5, and 6 give a graphical representation of the reliability, uncertainty, and resolution for forecast days 3, 6, and 9, respectively. In the top panels the observed frequencies *o*_{i} as defined in Eqs. (31) and (33) are plotted as a function of the fraction of members *p*_{i}. Any deviation from the diagonal will contribute to the reliability Reli defined in Eq. (36). The lower panels of Figs. 4–6 show (staircase curve) the accumulation of the average bin widths *g*_{i}, as defined in Eq. (30). The leftmost and rightmost bins show the average magnitude *g*_{0} and *g*_{N}, respectively, of the outliers [see Eq. (33)]. The width of this curve determines the potential CRPS, because CRPS_{pot} can be seen as the integral over this curve with the weight function *o*_{i}(1 − *o*_{i}). The narrower the staircase curve, the smaller the region for which the weight function is significantly different from zero and, as a result, the smaller CRPS_{pot} is. In addition, the lower panels show the cumulative distribution ("smooth" curve) of the sample climatology, as defined in Eq. (9). As is illustrated by Fig. 1, for example, the uncertainty *U* is determined by the width of *P*_{sam}. In addition (see the discussion at the end of section 4c), *U* can be seen as the expected CRPS of a forecast system based on the climatology of the sample. The difference in widths between the staircase curve and the cumulative distribution, therefore, is a measure for the resolution (38).

The discrepancy from perfect reliability for the first forecast days is mainly due to the lower bins of the ensembles, as can be seen in Fig. 4 for day 3. The frequency with which the verifying analysis is found below these bins is too high. It occurs too often that all members predict at least some precipitation, while it remained dry (based on climatology, as can be seen from *P*_{sam} in the lower panel of Fig. 4, the probability that it remains dry is about 50%). However, for these cases, the amount of precipitation of the member with the smallest amount of rain is on average quite small (around 0.3 mm; see *g*_{0} in the bottom panel of Fig. 4). Therefore this mild overestimation of precipitation will not contribute very strongly to Reli. Such a detailed analysis would not be possible from the rank histogram alone; it would only show a too high frequency of outliers.

The high resolution of the EPS at day 3 can clearly be seen from the bottom panel of Fig. 4. The average bin widths of the ensemble, including the outliers, are small compared to the width of *P*_{sam}. The climatological distribution has a large tail toward high amounts of precipitation. Apparently, for such cases, the EPS was capable of generating sharp ensembles with fair amounts of precipitation. This is the reason why the size of the outlier *g*_{N} is reasonably small. The reduction of resolution with increasing forecast time is well illustrated by comparing the lower panels of Figs. 4–6. At day 3, the ensemble is much sharper than *P*_{sam}, while at day 9, it is quite similar to the sample distribution, leaving only a low value of the resolution.

## 6. Concluding remarks

In this paper it was shown how for an ensemble prediction system the continuous ranked probability score can be decomposed into three parts. This decomposition is very similar to that of the Brier score. The first part, reliability, is closely related to the rank histogram. An important difference, however, is that the reliability of the CRPS is sensitive to the width of the ensemble bins, while the rank histogram gives each forecast the same weight. The reliability should be zero for an ensemble system with the correct statistical properties. The second part, uncertainty, is the best achievable value of the continuous ranked probability score when only climatological information is available. It was discussed that, in contrast to the uncertainty of the Brier score, its value depends on the degree of sophistication of the climatology used. The third term, the resolution, expresses the superiority of a forecast system with respect to a forecast system based on climatology. The resolution was found to be sensitive both to the average spread within the ensemble and to the behavior of the outliers. It was shown that the proposed decomposition is not equal to the integral over the decomposition of the Brier score.

It was illustrated how the reliability part could be presented in a graphical way. In addition, it was shown how the resolution part of the CRPS can be visualized by looking at the difference between the sample climate distribution and the accumulated average bin widths of the ensemble system. As an example the decomposition for total precipitation for seven summer cases in 1999 of the ECMWF ensemble prediction system was considered.

In this paper attention was focused on ensemble forecasts, for which the allowable set of forecasted probabilities is finite. However, in general, a forecast system could issue any probability between 0 and 1. Such systems could be regarded as the limit of *N* → ∞, of an *N*-member ensemble, in which the *i*th member is positioned at the location where the cumulative distribution has the value *P*(*x*_{i}) = *p*_{i} = *i*/*N.* Therefore, the decomposition of the CRPS, given in section 4, can be extended to any continuous forecast system. As a result, the summations over probabilities *p*_{i} in the definitions of reliability, resolution, and uncertainty will transform into integrals (from 0 to 1) over probabilities. In order to evaluate such integrals for continuous systems, it is more sensible to discretize the allowable set of probabilities, than to discretize the variable *x.* Therefore, in practice, the evaluation of the CRPS and its decomposition for continuous forecast systems exactly reduces to the method proposed in section 4.

The continuous ranked probability score is a verification tool that is sensitive to the overall (with respect to a certain parameter) performance of a forecast system. By using the decomposition proposed in this paper, it was argued how for an ensemble prediction system, a detailed picture of this overall behavior can be obtained.

## Acknowledgments

The author would like to thank François Lalaurette at ECMWF and Kees Kok at KNMI for stimulating discussions.

## REFERENCES

Anderson, J. L., 1996: A method for producing and evaluating probabilistic forecasts from ensemble model integrations. *J. Climate,* **9,** 1518–1530.

Bouttier, F., 1994: Sur la prévision de la qualité des prévisions météorologiques. Ph.D. thesis, Université Paul Sabatier, Toulouse, France, 240 pp. [Available from Library, Université Paul Sabatier, route de Narbonne, Toulouse, France.]

Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. *Mon. Wea. Rev.,* **78,** 1–3.

Brown, T. A., 1974: Admissible scoring systems for continuous distributions. Manuscript P-5235, The Rand Corporation, Santa Monica, CA, 22 pp. [Available from The Rand Corporation, 1700 Main St., Santa Monica, CA 90407-2138.]

Buizza, R., and T. N. Palmer, 1998: Impact of ensemble size on ensemble prediction. *Mon. Wea. Rev.,* **126,** 2503–2518.

——, A. Hollingsworth, F. Lalaurette, and A. Ghelli, 1999: Probabilistic predictions of precipitation using the ECMWF Ensemble Prediction System. *Wea. Forecasting,* **14,** 168–189.

Epstein, E. S., 1969: A scoring system for probability forecasts of ranked categories. *J. Appl. Meteor.,* **8,** 985–987.

Hamill, T., and S. J. Colucci, 1997: Verification of Eta–RSM short-range ensemble forecasts. *Mon. Wea. Rev.,* **125,** 1312–1327.

Katz, R. W., and A. H. Murphy, 1997: *Economic Value of Weather and Climate Forecasts.* Cambridge University Press, 222 pp.

Mason, I., 1982: A model for assessment of weather forecasts. *Aust. Meteor. Mag.,* **30,** 291–303.

Matheson, J. E., and R. L. Winkler, 1976: Scoring rules for continuous probability distributions. *Manage. Sci.,* **22,** 1087–1095.

Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. *Quart. J. Roy. Meteor. Soc.,* **122,** 73–119.

Murphy, A. H., 1969: On the "ranked probability score." *J. Appl. Meteor.,* **8,** 988–989.

——, 1971: A note on the ranked probability score. *J. Appl. Meteor.,* **10,** 155–156.

——, 1973: A new vector partition of the probability score. *J. Appl. Meteor.,* **12,** 595–600.

——, and E. S. Epstein, 1989: Skill scores and correlation coefficients in model verification. *Mon. Wea. Rev.,* **117,** 572–581.

Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, 1989: *Numerical Recipes: The Art of Scientific Computing.* Cambridge University Press, 818 pp.

Richardson, D., 1998: Obtaining economic value from the EPS. *ECMWF Newsletter,* Vol. 80, 8–12.

——, 2000: Skill and relative economic value of the ECMWF Ensemble Prediction System. *Quart. J. Roy. Meteor. Soc.,* **126,** 649–668.

Stanski, H. R., L. J. Wilson, and W. R. Burrows, 1989: Survey of common verification methods in meteorology. Atmospheric Environment Service Research Rep. 89-5, 114 pp. [Available from Forecast Research Division, 4905 Dufferin St., Downsview, ON M3H 5T4, Canada.]

Talagrand, O., and R. Vautard, 1997: Evaluation of probabilistic prediction systems. *Proc. ECMWF Workshop on Predictability,* Reading, United Kingdom, ECMWF, 1–25.

Unger, D. A., 1985: A method to estimate the continuous ranked probability score. Preprints, *Ninth Conf. on Probability and Statistics in Atmospheric Sciences,* Virginia Beach, VA, Amer. Meteor. Soc., 206–213.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences.* Academic Press, 467 pp.

## APPENDIX

### Some Technical Details

In this appendix the relation between the various terms of the Brier score defined in Eq. (40) and the terms of the continuous ranked probability score given in Eq. (39) will be determined.

Let the function *I*(*x*, *a*, *b*) be defined by *I*(*x*, *a*, *b*) = 1 for *a* ⩽ *x* < *b*, and 0 otherwise. In terms of this function, the average widths *g*_{i}(*x*) and frequencies *o*_{i}(*x*) introduced in Eq. (40) can be written as

*g*_{i}(*x*) = Σ_{k} *w*_{k} *I*(*x*, *x*^{k}_{i}, *x*^{k}_{i+1}),

*g*_{i}(*x*)*o*_{i}(*x*) = Σ_{k} *w*_{k} *I*(*x*, *x*^{k}_{i}, *x*^{k}_{i+1}) *H*(*x* − *x*^{k}_{a}),

where *x*^{k}_{i} denotes ensemble member *i* of case *k*. The *g*_{i} are normalized, Σ_{i} *g*_{i}(*x*) = 1, and *o*(*x*) = Σ_{i} *g*_{i}(*x*)*o*_{i}(*x*) is related to the cumulative distribution of the sample, *o*(*x*) = *P*_{sam}(*x*). From the expressions above it follows, for 0 < *i* < *N*, that

∫_{−∞}^{∞} *g*_{i}(*x*) *dx* = *ᾱ*_{i} + *β̄*_{i}  and  ∫_{−∞}^{∞} *g*_{i}(*x*)*o*_{i}(*x*) *dx* = Σ_{k} *w*_{k} *β*^{k}_{i} = *β̄*_{i},

where *β*^{k}_{i} is the value of *β*_{i} for case *k*. These integrals are equal to *g*_{i} and *g*_{i}*o*_{i}, respectively, as defined in Eq. (32).

Table 1. Continuous ranked probability score and its decomposition into reliability, resolution, and uncertainty [see Eq. (39)] of total precipitation accumulated in the 24 h prior to the displayed forecast day, for seven cases in the summer of 1999 of the ECMWF ensemble prediction system. The dimension of these quantities is mm (24 h)^{−1}.