A Note On the Maximum Peirce Skill Score

Agostino Manzato Osservatorio Meteorologico Regionale dell’ARPA (OSMER), Visco, Udine, Friuli Venezia Giulia, Italy


Abstract

Binary classifiers are obtained from a continuous predictor using a threshold to dichotomize the predictor value into event occurrence and nonoccurrence classes. A contingency table is associated with each threshold, and from this table many statistical indices (like skill scores) can be computed. This work shows that the threshold that maximizes one of these indices [the Peirce skill score (PSS)] has some important properties. In particular, at that threshold the ratio of the two likelihood distributions is always 1 and the event posterior probability is equal to the event prior probability. These properties, together with the consideration that the maximum PSS is the point with the “most skill” on the relative operating characteristic curve and the point that maximizes the forecast value, suggest the use of the maximum PSS as a good scalar measure of the classifier skill. To show that this most skilled point is not always the best one for all the users, a simple economic cost model is presented.

Corresponding author address: Agostino Manzato, Osservatorio Meteorologico Regionale dell’ARPA, Via Oberdan 18, I-33040 Visco (UD), Friuli Venezia Giulia, Italy. Email: agostino.manzato@osmer.fvg.it

Publisher's Note: This article was revised on 7 October 2016 to correct an error in Eq. 12 that was present when originally published.


1. Introduction

Signal detection theory (Green and Swets 1966) is a technique for verifying the performance of a classifier and is widespread in many disciplines, such as signal processing, pattern recognition, medical diagnosis, psychological testing, and also weather forecasting. In forecast verification a contingency table (e.g., Wilks 2006) is built for assessing the quality of a binary classifier, that is, a tool that forecasts the occurrence of an event. Let us introduce these concepts in more detail.

Given N samples, x1, . . . , xN, of a continuous predictor X (which can be a simple observed variable or the output of a complex model that takes many observed variables as inputs), one can draw the histogram, normalized by N, which estimates the probability density function, hereafter called p(x). For a particular event, the joint distribution of the predictor X and of the event observations (a binary variable) can be used to split p(x) into two conditional components: the function f_Y(x) for the cases of event occurrence (e.g., the N_YES cases of rain occurrence) and its complement f_N(x) (i.e., the N_NO = N − N_YES cases without rain). These two functions are not used in this form, because they are not probability densities. In fact, integrating them over the X domain does not give 1, but gives the sample estimates of the event (P_YES) and nonevent (P_NO) prior probabilities:
$$\int_{-\infty}^{+\infty} f_Y(x)\,dx = \frac{N_{YES}}{N} = P_{YES}, \qquad \int_{-\infty}^{+\infty} f_N(x)\,dx = \frac{N_{NO}}{N} = P_{NO}. \quad (1)$$

Instead, what is usually done is to consider the two conditional probability densities, obtained by normalizing f_Y and f_N by the prior probabilities. The first conditional density, also called the event likelihood, is that of having a value x given that there is event occurrence: p(x | YES) = f_Y(x)/P_YES. The second conditional density, namely the nonevent likelihood, is that of having a value x given that there is nonoccurrence: p(x | NO) = f_N(x)/P_NO.
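The construction above can be sketched numerically. The following Python fragment is a minimal illustration with synthetic, hypothetical data (the distributions and the event frequency are invented for the example): it estimates the components f_Y and f_N from a histogram normalized by the total N, and then the priors and the likelihoods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, hypothetical sample: a continuous predictor X and binary event flags.
N = 10000
event = rng.random(N) < 0.2                       # event prior P_YES ~ 0.2
x = np.where(event, rng.normal(70, 10, N), rng.normal(50, 12, N))

bins = np.linspace(x.min(), x.max(), 41)
width = np.diff(bins)

# Conditional components f_Y and f_N: histograms normalized by the TOTAL N,
# so each integrates to the corresponding prior probability, as in (1).
fY = np.histogram(x[event], bins=bins)[0] / (N * width)
fN = np.histogram(x[~event], bins=bins)[0] / (N * width)

P_yes = event.mean()                              # sample prior P_YES
P_no = 1.0 - P_yes

# Likelihoods p(x | YES) and p(x | NO): the components divided by the priors.
lik_yes = fY / P_yes
lik_no = fN / P_no

print((fY * width).sum(), (lik_yes * width).sum())   # P_YES and 1
```

By construction, f_Y integrates to P_YES while each likelihood integrates to 1, which is exactly the distinction drawn around (1).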

If a threshold, let us say x, is used to dichotomize the domain of the values of X, what is obtained is a binary event classifier based on the X predictor. Without any loss of generality, let us suppose that this classifier forecasts event occurrence when the predictor value exceeds x and nonoccurrence otherwise. For example, there will be a false alarm in all the N_NO · P(X > x | NO) = b cases. On the other hand, the classifier will miss N_YES · P(X ≤ x | YES) = c events.

In this way, it is finally possible to build the contingency table and derive many statistical measures of the classifier skill, as is done for the example in Table 1. In this case the event studied is the occurrence of at least 20 mm of rain during a 6-h period in the Friuli Venezia Giulia region (northeast Italy, hereafter FVG), using as a simple predictor the mid-to-low-level relative humidity (MRH) derived from the lowest 500 hPa of the Udine radiosounding, released at the beginning of the 6-h period. These data will be used in section 3.

2. The maximum Peirce skill score

The Peirce skill score has been rediscovered many times since its first formulation by Peirce (1884) and, for that reason, has been renamed over the years; examples include the Kuipers skill score and the true skill score. Following the suggestion of Mason (2003), it will be abbreviated here as PSS, even if in previous works of the same author it was called KSS. As defined in Table 1, its value is computed by
$$\mathrm{PSS}(x) = \mathrm{POD}(x) - \mathrm{POFD}(x), \quad (2)$$
where POD is the probability of detection, POFD is the probability of false detection, and the dependence on the threshold x used to build the contingency table from the continuous X variable has been emphasized.
The relative operating characteristic (ROC) curve (Swets 1973) is the line connecting all the points [POD(x), POFD(x)] obtained as x varies over the entire X domain. In Manzato (2005b, hereafter ORP), it is shown that PSS(x) is the length of the vertical segment connecting a point on the ROC curve to the bisector line POD = POFD (called the no-skill line) [1]. In addition, by the symmetry of the bisector, PSS is also the length of the horizontal segment connecting the same ROC point to the bisector. Then, using only PSS, it is possible to compute the distance H (perpendicular to the bisector) of a ROC point from the bisector, which is simply given by
$$H(x) = \frac{\mathrm{PSS}(x)}{\sqrt{2}}. \quad (3)$$

In ORP it has also been shown that the threshold that maximizes PSS, let us say xP, corresponds to a ROC point with a 45° tangent, because (dPOD/dPOFD)|x=xP = 1. Here, it is added that maximizing PSS(x) also maximizes the distance H between the ROC point and the diagonal bisector, which is considered the zero-skill level of a binary classifier. So, the ROC point [POD(xP), POFD(xP)] is the point farthest from the no-skill line and, in that sense, it can be stated that it is the point with the maximum skill of the classifier [2].
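These quantities are straightforward to compute by brute force. The sketch below uses synthetic, hypothetical Gaussian data (all numbers invented for illustration): it scans thresholds, evaluates POD, POFD, and PSS = POD − POFD as in (2), and locates the maximizing threshold x_P together with the distance H of (3).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, hypothetical sample: unit-variance Gaussian likelihoods.
N = 20000
event = rng.random(N) < 0.1
x = np.where(event, rng.normal(1.0, 1.0, N), rng.normal(-1.0, 1.0, N))

thresholds = np.linspace(-4.0, 4.0, 401)
pod = np.array([(x[event] > t).mean() for t in thresholds])    # POD(x)
pofd = np.array([(x[~event] > t).mean() for t in thresholds])  # POFD(x)
pss = pod - pofd                                               # PSS, as in (2)

i = int(np.argmax(pss))
x_p = thresholds[i]
H = pss[i] / np.sqrt(2)                                        # distance, as in (3)
print(f"x_P = {x_p:.2f}, max PSS = {pss[i]:.3f}, H = {H:.3f}")
```

For these equal-variance Gaussian likelihoods the maximum is expected near x = 0, where the two likelihoods intersect (Λ = 1), with a maximum PSS around 0.68.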

Richardson (2000) has already shown that the classifier forecast relative value (relative to a climatological forecast) is equal to its PSS and is therefore maximized at xP. Moreover, Woodcock (1976) has shown that PSS does not vary for unequal trials (as explained in the next section) and that it is the slope of the least squares linear regression line between the (binary) observed and forecast occurrences.

Finally, in ORP it was shown that PSS is also equitable, in the sense introduced by Gandin and Murphy (1992), for an asymmetrical scoring matrix. So, it was suggested that all of these properties make the maximum Peirce skill score, PSS(xP), a good scalar measure of the whole classifier quality. In this section it is shown how other properties relate the threshold that maximizes PSS to the prior probabilities.

a. The xP and the likelihood ratio

First, it is noted how the POD and POFD can be written in an integral form, similar to (1):
$$\mathrm{POD}(x) = \int_{x}^{+\infty} p(t \mid \mathrm{YES})\,dt = \frac{1}{P_{YES}} \int_{x}^{+\infty} f_Y(t)\,dt \quad (4)$$
and
$$\mathrm{POFD}(x) = \int_{x}^{+\infty} p(t \mid \mathrm{NO})\,dt = \frac{1}{P_{NO}} \int_{x}^{+\infty} f_N(t)\,dt. \quad (5)$$
Then, it is possible to rewrite PSS(x) as follows:
$$\mathrm{PSS}(x) = \int_{x}^{+\infty} \left[ p(t \mid \mathrm{YES}) - p(t \mid \mathrm{NO}) \right] dt. \quad (6)$$
PSS(x) depends on the shape of the two conditional distributions only above the threshold x.
The likelihood ratio, Λ(x), is usually defined as follows (e.g., Jolliffe and Stephenson 2003):
$$\Lambda(x) = \frac{p(x \mid \mathrm{YES})}{p(x \mid \mathrm{NO})}. \quad (7)$$
It is well known that Λ(x) is the ROC slope at the point corresponding to the threshold x (e.g., Green and Swets 1966; Choi 1998):
$$\Lambda(x) = \left. \frac{d\,\mathrm{POD}}{d\,\mathrm{POFD}} \right|_{x}. \quad (8)$$
So, what was stated in ORP about the slope at xP means that the threshold that maximizes PSS has a likelihood ratio equal to 1. In fact, if one is interested in maximizing PSS, then one has to differentiate PSS(x) in (6) with respect to x and set the derivative to zero:
$$\frac{d\,\mathrm{PSS}}{dx} = p(x \mid \mathrm{NO}) - p(x \mid \mathrm{YES}) = 0 \;\Rightarrow\; p(x_P \mid \mathrm{YES}) = p(x_P \mid \mathrm{NO}), \;\text{i.e.,}\; \Lambda(x_P) = 1. \quad (9)$$

If the two conditional probabilities overlap in part of their domain and are unimodal, then there is only one threshold that maximizes PSS: the point where they intersect. At this threshold the ratio of the components, f_N(x_P)/f_Y(x_P), is equal to the prior probability ratio, α = P_NO/P_YES, so x_P corresponds to the intersection of the conditional components only when the event has a prior probability of P_YES = 0.5, that is, α = 1 [3].

Finally, note that even if the threshold that maximizes PSS varies as the event climatology changes, as stated before, the maximum PSS value itself theoretically does not. Woodcock (1976) has shown that PSS does not change if one uses a database with N_YES ≠ N_NO or a random subsection of it that is "equalized" with respect to the event and nonevent frequency, so that the subsample has as many events as nonevents. In that case, the underlying likelihood distributions, that is, the shapes of p(x | YES) and p(x | NO), are the same for the original and the equalized datasets.
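Woodcock's invariance is easy to check numerically. In the sketch below (synthetic, hypothetical data), the maximum PSS of the full, climatologically unbalanced sample is compared with the one obtained on a random "equalized" subsample containing as many nonevents as events; the two agree up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(2)

def max_pss(x, event, thresholds):
    # Scan the thresholds and return the largest POD - POFD.
    pod = np.array([(x[event] > t).mean() for t in thresholds])
    pofd = np.array([(x[~event] > t).mean() for t in thresholds])
    return float((pod - pofd).max())

# Synthetic, hypothetical rare event: alpha = P_NO / P_YES ~ 19.
N = 50000
event = rng.random(N) < 0.05
x = np.where(event, rng.normal(1.5, 1.0, N), rng.normal(0.0, 1.0, N))
thresholds = np.linspace(-4.0, 5.0, 181)

full = max_pss(x, event, thresholds)

# Random "equalized" subsample: as many nonevents as events.
yes_idx = np.flatnonzero(event)
no_idx = rng.choice(np.flatnonzero(~event), size=yes_idx.size, replace=False)
sub = np.concatenate([yes_idx, no_idx])
equalized = max_pss(x[sub], event[sub], thresholds)

print(f"max PSS: full sample = {full:.3f}, equalized subsample = {equalized:.3f}")
```

Both estimates target the same theoretical value because the subsampling leaves the two likelihood shapes unchanged; only the mixing weights (the priors) differ.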

Also in ORP it was shown, for Gaussian likelihoods, how the maximum PSS does not change with varying α if the likelihood means and standard deviations stay the same. In any case, it seems more likely to obtain a higher maximum PSS for rare events than for near-equiprobable events [4], because the likelihood shapes are more likely to change. Thus, the event climatology must be shown when reporting the classifier skill, because the maximum PSS can vary for events with different frequencies.

b. The xP and the event posterior probability

It is also possible to use the property in (9) to determine the value of the event posterior probability associated with the predictor X in xP. First, Bayes’s theorem (Bayes 1763) states that
$$p(\mathrm{YES} \mid x) = \frac{p(x \mid \mathrm{YES})\,P_{YES}}{p(x \mid \mathrm{YES})\,P_{YES} + p(x \mid \mathrm{NO})\,P_{NO}}. \quad (10)$$
If we put (9) in this equation, we obtain the event posterior probability value associated with the threshold xP, given by
$$p(\mathrm{YES} \mid x_P) = \frac{P_{YES}}{P_{YES} + P_{NO}} = P_{YES} = \frac{1}{1+\alpha}. \quad (11)$$
So, even the event posterior probability at x_P is known a priori, because it is equal to the event prior probability. In the rare-event case (α ≫ 1), the event posterior probability at the threshold that maximizes PSS will tend to zero (approximately α⁻¹).
The result in (11) is consistent with that originally found by Mason (1979) for probabilistic forecasts. In that case, he found that the threshold on the event probability that maximizes PSS, for forecasts taking N + 1 discrete probability values, is given by
$$p^{*} = \frac{N_{YES} + 1}{N + 2}, \quad (12)$$
which approximates P_YES for large N [5].

Equation (11) can be particularly useful when X is the output of a complex model that correctly estimates the event posterior probability. For example, if an artificial neural network is used to predict the event posterior probability from the value of many different predictors (inputs), then one can simply choose as an output threshold the event prior, PYES, to dichotomize the forecast in yes–no classes maximizing the PSS of the model.
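A minimal sketch of this recipe follows, with an analytically computed posterior standing in for the output of a calibrated model (e.g., a neural network); the Gaussian likelihoods and the 5% prior are invented for illustration. Dichotomizing the posterior at the event prior P_YES recovers, up to sampling noise, the best PSS found by scanning all posterior thresholds.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic, hypothetical rare event with Gaussian likelihoods N(2,1) and N(0,1).
N = 30000
event = rng.random(N) < 0.05
x = np.where(event, rng.normal(2.0, 1.0, N), rng.normal(0.0, 1.0, N))

P_yes = event.mean()

def npdf(v, mu, sd):
    # Gaussian density; here a stand-in for a calibrated model's likelihoods.
    return np.exp(-0.5 * ((v - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Event posterior probability via Bayes's theorem, as in (10).
post = npdf(x, 2.0, 1.0) * P_yes / (
    npdf(x, 2.0, 1.0) * P_yes + npdf(x, 0.0, 1.0) * (1.0 - P_yes))

# Dichotomize the posterior at the event prior, as suggested by (11) ...
forecast = post > P_yes
pss_prior = forecast[event].mean() - forecast[~event].mean()

# ... and compare with the best PSS found by scanning posterior thresholds.
ts = np.quantile(post, np.linspace(0.01, 0.99, 199))
best = max((post[event] > t).mean() - (post[~event] > t).mean() for t in ts)
print(f"PSS at prior threshold = {pss_prior:.3f}, scanned maximum = {best:.3f}")
```

The single evaluation at the prior threshold replaces the whole threshold scan, which is the practical appeal of (11).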

In Manzato (2005a) a method for transforming a predictor X into its event posterior probability was shown. It is interesting to note that (10) shows how the event posterior probability is a monotonic transformation of the likelihood ratio Λ (Kupinski et al. 2001). It is straightforward to show that the ROC curve is invariant for any monotonic transformation of the thresholded variable, because it is just a relabeling of the threshold (e.g., Green and Swets 1966). This means that converting the original X variable into its posterior probability or into its likelihood ratio will produce the same ROC curve for the new variable.

The fact that the same ROC is obtained from either the posterior probability or from the likelihood ratio transformation is important because the ROC curve obtained using the likelihood ratio Λ(x) as a mapping function is the optimal ROC, that is, a curve that will always lie on or above the ROC curve made with the original X values, or with any other transformation of X (Green and Swets 1966; Zhang and Mueller 2005). This is a consequence of the Neyman–Pearson criterion (Neyman and Pearson 1928). Then, it can be said that transforming the predictor X into its p(YES | x), as was done in Manzato (2005a), is an optimal preprocessing choice for classification problems.
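The relabeling argument can be demonstrated directly: the empirical ROC points computed from a predictor and from any strictly increasing transformation of it coincide exactly (synthetic, hypothetical data; the transformation exp(x) + 3 is an arbitrary monotonic choice).

```python
import numpy as np

rng = np.random.default_rng(5)

N = 5000
event = rng.random(N) < 0.3
x = np.where(event, rng.normal(1.0, 1.0, N), rng.normal(-1.0, 1.0, N))

def roc_points(score, event):
    # Threshold at every observed score; return the (POFD, POD) points.
    order = np.argsort(-score)
    e = event[order]
    pod = np.cumsum(e) / e.sum()
    pofd = np.cumsum(~e) / (~e).sum()
    return np.column_stack([pofd, pod])

# A strictly monotonic transformation only relabels the thresholds,
# so the ROC points coincide exactly.
roc_x = roc_points(x, event)
roc_t = roc_points(np.exp(x) + 3.0, event)
print(np.allclose(roc_x, roc_t))   # True
```

The ordering of the samples, which is all the ROC depends on, is unchanged by any strictly increasing map.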

3. A practical example

Let us show how the previous properties can be applied in a concrete example. Figure 1a shows the two estimated conditional components (normalized by N) of the sounding-derived mean relative humidity in the lowest 500 hPa. These two histograms are built by splitting the normalized histogram of all the MRH values into the cases when there was an occurrence of rainfall greater than 20 mm (in the 6 h after the sounding release) in the FVG plain [f_Y(MRH)] and the cases when there was no occurrence [f_N(MRH)]. The conditional probability densities, or likelihoods, could be obtained by dividing these components by the prior probabilities estimated from the data sample with (1).

Varying the threshold over the whole MRH domain, one can compute many contingency tables and their derived PSS values. The vertical dashed line in Fig. 1a indicates the threshold (MRH_P ≅ 71%) that maximizes PSS in this empirical way and sets the four coefficients of the contingency table shown in Table 1. The corresponding ratio of the two conditional components is 23, while the sample estimate of α is 21, which is quite close. The ratio of the two conditional probability densities (figure not shown) is 1.1, very close to the theoretical value (Λ = 1). These small discrepancies arise because continuous density functions have been approximated with discrete histograms.

Figure 1b shows the sample estimate posterior probability fit derived for the same dataset. The small circles are obtained by splitting the MRH domain into 21 equal bins and then, for each single bin, dividing the number of event occurrences by the number of total cases populating the bin. The continuous black line is a two-parameter exponential fit of these points, weighted with the population of each bin. Other details on how to interpret this kind of figure and the fitting line can be found in Manzato (2005a).

The gray dashed horizontal line shows the event prior probability, PYES = 1/(1 + α) ≅ 0.045, while the gray dashed vertical line shows the threshold MRHP, which empirically maximizes PSS. These two lines intersect very close to the fitted posterior probability line. So, instead of empirically computing the threshold that maximizes PSS, one can transform the original values into their posterior probability (which means issuing a calibrated event probability forecast) and then use the event prior probability as the threshold that maximizes PSS. Because of the optimality of the posterior probability-derived ROC, this method has the advantage that the PSS will be greater than the original one when the posterior probability fit is not monotonic.

Figure 2 shows the ROC curve and the point corresponding to PSS(MRHP). The value of the Peirce skill score is given by the vertical and the horizontal gray segments, while the third steep segment shows the maximum distance H from the no-skill bisector. In this particular case, one obtains the same ROC curve if the sample estimate event posterior probability of MRH is used as the thresholded variable, because p(YES | MRH) is monotonic.

For an example of a nonmonotonic transformation, if we consider the vertical component of the water vapor flux in the lowest 3 km (VFlux), then we obtain a fY (VFlux) function with a minimum around the mean VFlux value and two maxima around the range extremes. This leads to a u-shaped posterior probability fit. The threshold that maximizes PSS on VFlux leads to a PSS of 0.56, while using the posterior probability transformed data leads to a higher ROC, with a maximum PSS of 0.57.

4. Conclusions

The threshold that maximizes the Peirce skill score identifies the point on the ROC curve that has the maximum distance H from the no-skill line. In this sense, it is the ROC point that maximizes the skill of the classifier. As shown by Richardson (2000), this threshold maximizes the forecast relative value. It has been shown how the likelihood ratio and the event posterior probability are known a priori for that particular threshold. In particular, at the threshold that maximizes PSS, the two likelihoods have the same value (Λ = 1) and the event posterior probability is equal to the event prior probability.

These results support the conclusion made in ORP that it seems reasonable to use the maximum PSS as a scalar measure of the absolute classifier skill, together with an estimate of the event climatology (like PYES or α). This can be particularly useful when comparing different classifiers, especially when applied to different datasets. Of course, this does not mean that the threshold that maximizes PSS is always the best one for all end users, who can be differently sensitive to false alarms or missed events. This is because of the complex relationship between forecast skill and forecast value (e.g., Roebber and Bosart 1996; Semazzi and Mera 2006).

To clarify this point, let us consider a simple example of customer economic cost. Suppose that the user suffers a loss L when there is an event occurrence, but can completely avoid that loss by taking a protective action, which costs C (with a "loss–cost" ratio λ = L/C > 1). If this user decides whether to take the protective action based on the binary classifier obtained from the threshold x and its associated contingency table, then the total expense E(x) will be given by (Richardson 2003)
$$E(x) = C\,[a(x) + b(x)] + L\,c(x). \quad (13)$$
To minimize this cost function, we can write the contingency coefficients in integral form, similarly to (4) and (5), and then set the derivative of E(x) to zero, as was done previously for PSS. It is straightforward to show how this leads to the following properties for the threshold, xE, which minimizes this cost problem:
$$\Lambda(x_E) = \alpha\,\frac{C}{L - C}, \qquad \text{i.e.,} \qquad p(\mathrm{YES} \mid x_E) = \frac{C}{L}. \quad (14)$$
So, the threshold that minimizes the user expense is found on the posterior probability curve at p(YES | xE) = C/L. Hence, xE will be the same value that maximizes PSS only if the loss–cost ratio λ is equal to P_YES⁻¹ = α + 1 (its maximum value). Similar properties can be found in the same way for a generic cost function, F(x) = s11 a(x) + s12 b(x) + s21 c(x) + s22 d(x), minimized at xm such that
$$\Lambda(x_m) = \alpha\,\frac{s_{12} - s_{22}}{s_{21} - s_{11}}, \qquad p(\mathrm{YES} \mid x_m) = \frac{s_{12} - s_{22}}{(s_{21} - s_{11}) + (s_{12} - s_{22})}, \quad (15)$$
where the last equation is the same result found by Mason (1979), while the previous one was already shown in Eq. (1.20) of Green and Swets (1966).
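The economic argument can be sketched as follows, again with synthetic, hypothetical data (the cost C, loss L, and Gaussian parameters are invented for illustration): brute-force minimization of the total expense of (13) lands at a threshold x_E where the empirical posterior probability is close to C/L, as stated by (14), and not at the PSS-maximizing threshold.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic, hypothetical decision problem: protection cost C, avoidable loss L.
C, L = 1.0, 4.0                                   # loss-cost ratio lambda = 4

N = 100000
event = rng.random(N) < 0.1
x = np.where(event, rng.normal(1.5, 1.0, N), rng.normal(0.0, 1.0, N))

# Brute-force minimization of the total expense E(x) of (13).
ts = np.linspace(-2.0, 4.0, 301)
expense = []
for t in ts:
    protect = x > t
    a_b = protect.sum()                           # a + b cases: protect, pay C each
    c = np.sum(event & ~protect)                  # missed events: pay L each
    expense.append(C * a_b + L * c)
x_E = float(ts[int(np.argmin(expense))])

# Per (14), the empirical posterior probability near x_E should be close to C/L.
band = np.abs(x - x_E) < 0.1
p_at_xE = event[band].mean()
print(f"x_E = {x_E:.2f}, p(YES | x_E) ~ {p_at_xE:.2f}, C/L = {C / L}")
```

With these numbers the analytic optimum is near x ≈ 1.5, where the posterior equals C/L = 0.25, well above the likelihood-crossing point that maximizes PSS, since here λ = 4 differs from α + 1 = 10.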

Acknowledgments

The author thanks his friend Luciano Sbaiz (EPFL, Lausanne, Switzerland) and Prof. Matthew Kupinski (Optical Sciences, The University of Arizona, Tucson, Arizona) for their support via e-mail. Three anonymous reviewers provided very useful suggestions to improve an earlier version of this note. This work was done using only open-source software, in particular the R statistical software package, the Python scripting language, the Emacs editor, and LaTeX, under the Ubuntu Linux distribution.

REFERENCES

  • Bayes, T., 1763: An essay towards solving a problem in the doctrine of chances. Philos. Trans. Roy. Soc. London, 53, 370–418.

  • Choi, B. C. K., 1998: Slopes of a receiver operating characteristic curve and likelihood ratios for a diagnostic test. Amer. J. Epidemiol., 148, 1127–1132.

  • Gandin, L. S., and A. H. Murphy, 1992: Equitable skill scores for categorical forecasts. Mon. Wea. Rev., 120, 361–370.

  • Green, D. M., and J. A. Swets, 1966: Signal Detection Theory and Psychophysics. J. Wiley and Sons, 455 pp. (Reprinted by R. E. Krieger Publishing, 1974.)

  • Jolliffe, I. T., and D. B. Stephenson, 2003: Forecast Verification: A Practitioner's Guide in Atmospheric Science. J. Wiley and Sons, 240 pp.

  • Katz, R. W., and M. Ehrendorfer, 2006: Bayesian approach to decision making using ensemble weather forecasts. Wea. Forecasting, 21, 220–231.

  • Kupinski, M., D. C. Edwards, M. L. Giger, and C. Metz, 2001: Ideal observer approximation using Bayesian classification neural network. IEEE Trans. Med. Imaging, 20, 886–899.

  • Manzato, A., 2005a: The use of sounding-derived indices for a neural network short-term thunderstorm forecast. Wea. Forecasting, 20, 896–917.

  • Manzato, A., 2005b: An odds ratio parameterization for ROC diagram and skill score indices. Wea. Forecasting, 20, 918–930.

  • Manzato, A., 2007: Sounding-derived indices for neural network based short-term thunderstorm and rainfall forecasts. Atmos. Res., 83, 349–365.

  • Mason, I. B., 1979: On reducing probability forecasts to yes/no forecasts. Mon. Wea. Rev., 107, 207–211.

  • Mason, I. B., 2003: Binary events. Forecast Verification: A Practitioner's Guide in Atmospheric Science, I. T. Jolliffe and D. B. Stephenson, Eds., J. Wiley and Sons, 37–76.

  • Neyman, J., and E. S. Pearson, 1928: On the problem of the most efficient test of statistical hypotheses. Philos. Trans. Roy. Soc. London, 231, 289–337.

  • Peirce, C. S., 1884: The numerical measure of the success of predictions. Science, 4, 453–454.

  • Richardson, D. S., 2000: Skill and relative economic value of the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 126, 649–667.

  • Richardson, D. S., 2003: Economic value and skill. Forecast Verification: A Practitioner's Guide in Atmospheric Science, I. T. Jolliffe and D. B. Stephenson, Eds., J. Wiley and Sons, 165–188.

  • Roebber, P. J., and L. F. Bosart, 1996: The complex relationship between forecast skill and forecast value: A real-world analysis. Wea. Forecasting, 11, 544–558.

  • Semazzi, F. H. M., and R. J. Mera, 2006: An extended procedure for implementing the relative operating characteristic graphical method. J. Appl. Meteor. Climatol., 45, 1215–1223.

  • Swets, J. A., 1973: The relative operating characteristic in psychology. Science, 182, 990–1000.

  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2d ed. Academic Press, 648 pp.

  • Woodcock, F., 1976: The evaluation of yes/no forecasts for scientific and administrative purposes. Mon. Wea. Rev., 104, 1209–1214.

  • Zhang, J., and S. T. Mueller, 2005: A note on ROC analysis and non-parametric estimate of sensitivity. Psychometrika, 70, 145–154.

Fig. 1.

(a) The two conditional component histograms of the mean relative humidity, fN(MRH) and fY(MRH), together with the threshold that maximizes the PSS (about 71%). The four areas produced by the threshold line correspond to the contingency table coefficients, normalized by N. (b) The sample estimate event posterior probability derived from the previous histograms and its theoretical value PYES ≅ 0.045 for the threshold that maximizes PSS. The vertical bars indicate how populated each bin interval is (see Manzato 2005a for more details). The continuous and dashed tick marks identify the mean values of the nonoccurrence and occurrence samples, respectively, while the numbers in between are their differences divided by the (95%–5%) quantiles interval.

Citation: Weather and Forecasting 22, 5; 10.1175/WAF1041.1

Fig. 2.

The ROC curve obtained by this binary classifier and the segments that show the maximum PSS and the maximum distance H from the no-skill bisector line.


Table 1.

The contingency table and the derived scores obtained for the rainfall > 20 mm classifier built using the mean relative humidity in the lowest 500 hPa of the Udine sounding and the threshold that optimizes the PSS. A total of 18 555 soundings (from 1992 to 2005, 4 times per day) without missing MRH have been used. The rain occurrence was measured in the 6 h after the sounding release.


[1] It is interesting to note that Eq. (14) of Semazzi and Mera (2006) extends the PSS definition to a new skill score computed as the vertical distance between the ROC point and a generic "baseline," which can differ from the bisector line because it takes into account the user-defined loss–cost ratio.

[2] In general, this optimal point is not necessarily the nearest to the top-left corner of the ROC diagram. It surely is for symmetric ROCs, like those obtained for Gaussian likelihoods with the same standard deviation.

[3] As shown in ORP, if α = 1, then PSS = HSS. In other cases, it is not possible to find similar properties for HSS.

[4] An example of this behavior can be found in Fig. 11 of Manzato (2007), where the maximum PSS of a regression neural network increases almost linearly as the event prior probability decreases.

[5] Katz and Ehrendorfer (2006) have shown how (12) is the Bayesian estimator of the event probability in the case of a uniform prior distribution, while the "face value," PYES, is obtained for a prior that is a limiting case of a beta distribution.
