The ROC Curve and the Area under It as Performance Measures

Caren Marzban Applied Physics Laboratory and Department of Statistics, University of Washington, Seattle, Washington, and Center for Analysis and Prediction of Storms, University of Oklahoma, Norman, Oklahoma

Abstract

The receiver operating characteristic (ROC) curve is a two-dimensional measure of classification performance. The area under the ROC curve (AUC) is a scalar measure gauging one facet of performance. In this short article, five idealized models are utilized to relate the shape of the ROC curve, and the area under it, to features of the underlying distribution of forecasts. This allows for an interpretation of the former in terms of the latter. The analysis is pedagogical in that many of the findings are already known in more general (and more realistic) settings; however, the simplicity of the models considered here allows for a clear exposition of the relation. For example, although in general there are many reasons for an asymmetric ROC curve, the models considered here clearly illustrate that an asymmetry in the ROC curve can be attributed to unequal widths of the distributions. Furthermore, it is shown that AUC discriminates well between “good” and “bad” models, but not between good models.

Corresponding author address: Dr. Caren Marzban, Dept. of Statistics, University of Washington, Box 354323, Seattle, WA 98195-4323. Email: marzban@caps.ou.edu

1. Introduction

Consider the problem of assessing the quality of forecasts produced for binary observations (here labeled 0 and 1). The forecast quantity may be a continuous quantity ranging from −∞ to +∞, or it may be a probability, ranging from 0 to 1. Murphy and Winkler (1987, 1992) showed that this problem is best cast in a framework based on the joint probability distribution of the forecasts and observations. Figure 1 depicts the general situation, where L0 and L1 are the likelihoods for the two classes. In other words, Li(x) is the probability (density) of the forecast x, given that the observation is from the ith class.¹ This figure is an example of what Murphy and Winkler call a discrimination diagram. They showed that the quality of forecasts can be assessed with complete generality in terms of several such diagrams; other diagrams gauge different facets of that quality, for example, refinement, resolution, and reliability.

Meteorologists (Harvey et al. 1992; Mason 1982; Mason and Graham 1999; Stephenson 2000; Wilks 2001; Atger 2004) have also become interested in a procedure heavily utilized in medical circles (Dorfman and Alf 1969; Dorfman et al. 1997; Metz et al. 1998; Shapiro 1999; Zhou et al. 2002; Zou 2003; Coffin and Sukhatme 1997). The procedure is based on the receiver operating characteristic (ROC) curve, sometimes referred to as relative operating characteristic. In its simplest form it is a parametric plot of the hit rate (or probability of detection) versus the false alarm rate, as a decision threshold is varied across the full range of a continuous forecast quantity. The diagonal line corresponds to random forecasts, and the amount of concavity is taken to be a measure of performance. The area under the ROC curve (AUC) is often taken as a scalar measure (Hanley and McNeil 1982). An AUC of 0.5 reflects random forecasts, while AUC = 1 implies perfect forecasts. It has also been shown by Mylne (1999) and Richardson (2000, 2001) that AUC is closely related to the economic value of a forecast system.

The hit rate and the false alarm rate can be computed from the following likelihoods:
$$F(t) = \int_t^{\infty} L_0(x)\,dx, \qquad H(t) = \int_t^{\infty} L_1(x)\,dx, \qquad (1)$$
where t is the decision threshold. The upper limit of the integral corresponds to the maximum allowed value of x. For probabilistic forecasts, that limit is 1.
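In practice L0 and L1 are unknown, and the integrals in (1) are replaced by sample proportions: F(t) and H(t) become the fractions of class-0 and class-1 forecasts exceeding t. The following Python sketch (the function names and the eight forecasts are made up for illustration) traces the empirical ROC curve over every observed forecast value and integrates it with the trapezoidal rule. For comparison it also computes the fraction of (class 1, class 0) pairs in which the class-1 forecast is larger, counting ties as one-half; the trapezoidal area is known to equal this pairwise fraction.

```python
import numpy as np


def empirical_roc(x0, x1):
    """Return arrays (F, H) traced out as the threshold t sweeps downward.

    F(t) and H(t) are the fractions of class-0 and class-1 forecasts
    strictly above t, the sample analogs of the integrals in (1).
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    # Every observed value is a candidate threshold; +inf gives (0, 0), -inf gives (1, 1).
    values = np.unique(np.concatenate((x0, x1)))[::-1]
    thresholds = np.concatenate(([np.inf], values, [-np.inf]))
    F = np.array([np.mean(x0 > t) for t in thresholds])
    H = np.array([np.mean(x1 > t) for t in thresholds])
    return F, H


def trapezoid_auc(F, H):
    """Area under the piecewise-linear ROC curve (F must be nondecreasing)."""
    return float(np.sum(np.diff(F) * (H[1:] + H[:-1]) / 2.0))


def pairwise_auc(x0, x1):
    """Fraction of (class-1, class-0) pairs with x1 > x0, ties counted as 1/2."""
    a0 = np.asarray(x0, dtype=float)[None, :]
    a1 = np.asarray(x1, dtype=float)[:, None]
    return float(np.mean((a1 > a0) + 0.5 * (a1 == a0)))


x0 = [0.1, 0.4, 0.35, 0.8]   # forecasts for cases observed as class 0
x1 = [0.9, 0.6, 0.65, 0.3]   # forecasts for cases observed as class 1
F, H = empirical_roc(x0, x1)
print(trapezoid_auc(F, H), pairwise_auc(x0, x1))  # both 0.6875 for these data
```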

The ROC framework is somewhat different from the Murphy–Winkler framework. For example, for probabilistic forecasts the Murphy–Winkler framework does not require, and indeed discourages, the reduction of the forecasts into categorical classes. The ROC analysis, by contrast, is based on the contingency table and, therefore, requires the introduction of a decision threshold for the purpose of reducing the continuous forecasts into binary forecasts. Of course, the introduction of a threshold does not imply that ROC analysis is in any way inferior to the Murphy–Winkler framework; it is simply another method of assessing performance, with an emphasis on different facets of performance. The Murphy–Winkler framework is more suitable for comparing different sets of forecasts (e.g., from two forecasters), while the explicit presence of a decision threshold in ROC analysis lends itself to the situation where a decision must be made, or action must be taken, in response to forecasts.

In this paper, a number of questions are addressed regarding the shape of ROC curves. A few examples are provided to motivate the questions, and five toy models are utilized to answer them. The toy models, although somewhat unrealistic, are designed to be progressively better approximations to the general problem depicted in Fig. 1. The primary aim of this study is to draw attention to the connections between the Murphy–Winkler framework and ROC analysis. As such, the results reported here are specific to the toy models considered and are not expected to hold in general. Although one model, based on Gaussians, is likely to be broadly valid, all of the examples considered are flexible enough to reproduce a number of ROC behaviors observed in realistic situations. The simplicity of the models offers a transparent environment in which observed ROC behaviors can be explained in terms of more basic quantities, namely the parameters of the class-conditional distributions of forecasts (i.e., the likelihoods).

Figure 2a displays 16 ROC curves representing different levels of performance. These curves gauge the performance of a Markov chain model for forecasting tornadic activity in four different regions of the United States during four seasons (Drton et al. 2003). The behavior of these curves is canonical in that they do what they are expected to do: they all begin at the point (0, 0) and end at (1, 1). Note, however, the high degree of symmetry about the diagonal(s). Figure 2b displays another set of 16 ROC curves, this time from a statistical model for predicting hail size (Marzban and Witt 2001). Although these curves are not pathological in any sense, they do display a few features that are common to many ROC curves. The lowest-performing models have symmetric ROC curves, but the midrange models begin to lose that symmetry. A natural question is whether this asymmetry can be explained in terms of the underlying distributions.

Another feature that often emerges is the extensive overlap of the ROC curve with one (or two) of the axes of the diagram. In Fig. 2b, this can be seen in the most concave curves (i.e., corresponding to the best-performing models). These yield ROC curves that overlap the top axis for all false alarm rates higher than 0.4. What is the explanation for this type of overlap? And what about an overlap with the y axis?

Another type of asymmetry (not shown here) arises when the ROC curve crosses the diagonal at some (usually one) point. What causes this type of crossover?

Many users of ROC curves observe that, across a wide range of forecasts in different situations, most forecasts appear to lead to highly concave ROC curves or, equivalently, high AUC values. AUC values of, say, 0.9995 are not uncommon. Figure 2c displays eight sets of ROC curves with extreme concavity. These are related to a neural network developed for the prediction of ceiling and visibility (Marzban et al. 2003). The forecasts underlying the curves have different forecast characteristics (in terms of the various attributes of probabilistic forecasts computed within the Murphy–Winkler framework), yet they all lead to very concave ROC curves. The AUC values for these curves vary from 0.990 to 0.996. Why are these AUC values so close to 1? Is it because the forecasts are of extraordinary quality? Or is it an artifact of the AUC itself? If the former is true, then a histogram of all AUC values would be right-peaked (or show a heavy tail to the left). This is difficult to test, because the necessary data would be difficult to compile. On the other hand, if the culprit is the measure itself, then testing that hypothesis would be unnecessary, for an explanation would then be at hand. And what sort of artifact would lead to near-one AUC values?

As mentioned above, although the two approaches have different emphases, they are related. After all, the quantities from which an ROC curve is derived (the hit rate and the false alarm rate) are areas under the conditional distributions above some decision threshold. Moreover, although the computation of ROC curves does not require knowledge of these distributions, an assessment of the statistical significance of ROC curves does (Dorfman and Alf 1969; Hanley and McNeil 1982; Stephenson 2000; Dorfman et al. 1997). For example, in order to compute standard errors for ROC or AUC in a parametric approach, one makes some assumptions regarding these underlying distributions. It is natural, then, to utilize the connection between the ROC curve and the underlying distributions to answer the above questions. The answers, then, offer a means of interpreting ROC curves at a more fundamental level.

In summary, several toy models are used here to relate some characteristic features of ROC curves to features of the underlying distributions. In this way, the shape of the ROC curve can be interpreted or “explained.” Knowledge of the underlying distributions can guide the development of better forecasts. AUC is also examined within the toy models. It is important to emphasize that the distributions examined here are toy models and mostly of pedagogical value. The five distributions considered are shown in Figs. 3a–7a. They are referred to as 1) uniform, 2) triangular with unconstrained support, 3) Gaussian, 4) triangular with constrained support, and 5) beta distributions. The first three are appropriate for cases where the forecast quantity varies over the real line from −∞ to +∞, while the last two apply to probabilistic forecasts.

2. Uniform distribution

A generic situation involving forecasts with uniform distributions is shown in Fig. 3a. There are four parameters involved: two means, c0 and c1, and two half-widths, w0 and w1.² Without loss of generality, it is assumed that c1 ≥ c0. It is then straightforward to show that the false alarm rate and the hit rate are given by
$$F = \frac{c_0 + w_0 - t}{2 w_0}, \qquad H = \frac{c_1 + w_1 - t}{2 w_1}, \qquad (2)$$
where t is the threshold above (below) which a case is classified into class 1 (0).³
The equation for the ROC curve follows immediately from (2):
$$H = \frac{w_0}{w_1}\,F + \frac{\delta c + \delta w}{2 w_1}, \qquad (3)$$
where δc = c1 − c0 and δw = w1 − w0. Figure 3b displays the situation. It can be seen that the ROC curve consists of three line segments, with the equation for the middle segment given by (3).
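The middle segment of the uniform-case ROC curve is the straight line H = (w0/w1)F + (δc + δw)/(2w1). A minimal numerical check, with illustrative parameter values of our own choosing and a hypothetical helper `uniform_tail`, computes F and H directly as tail areas of the two uniform likelihoods and compares H with that line.

```python
def uniform_tail(t, c, w):
    """P(X > t) for X uniform on [c - w, c + w], clipped to [0, 1]."""
    return min(1.0, max(0.0, (c + w - t) / (2.0 * w)))


# Illustrative parameters (our choice): c1 > c0 with overlapping supports.
c0, w0 = 0.0, 0.4
c1, w1 = 0.5, 0.6
dc, dw = c1 - c0, w1 - w0

# The middle segment is traced by thresholds c1 - w1 <= t <= c0 + w0 (here -0.1 to 0.4).
residuals = []
for k in range(11):
    t = -0.1 + 0.05 * k
    F = uniform_tail(t, c0, w0)
    H = uniform_tail(t, c1, w1)
    line = (w0 / w1) * F + (dc + dw) / (2.0 * w1)
    residuals.append(abs(H - line))
print(max(residuals))  # only floating-point round-off remains
```

Thresholds above c0 + w0 give F = 0 (the vertical segment on the y axis), and thresholds below c1 − w1 give H = 1 (the segment along the top of the diagram).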

Several observations can be made. First, (3) implies that two models with different means and widths can yield the same ROC curve if they have the same slope and intercept (see Fig. 3b). As such, the ROC curve does not uniquely specify the underlying parameters. In other words, there is a family of underlying distributions that give rise to the same ROC curve. This is a known fact even for more general distributions (Zhou et al. 2002).

Second, the length of the vertical segment overlapping the y axis, (δc + δw)/(2w1), is determined by two quantities, δc/w1 and w0/w1. This is sensible since the “goodness” of the underlying model is determined by both quantities. By contrast, the slope of the middle segment depends only on the ratio of the half-widths (and not on δc). As such, unequal w0 and w1 manifest themselves as an asymmetric ROC curve.

Given the analytic expression for the ROC curve (3), it is then possible to compute the area under the curve⁴:
$$\mathrm{AUC} = 1 - \frac{\Delta^2}{8\,w_0 w_1}, \qquad (4)$$
where
$$\Delta = \delta c - w_0 - w_1. \qquad (5)$$
Since δc ≤ w0 + w1 for the arrangement displayed in Fig. 3a, it can be seen that increasing δc leads to better performance. Furthermore, decreasing w0 or w1 can also yield better performance. In short, model selection based on AUC selects for sharp (i.e., narrow) and well-separated class-conditional distributions. Note that in terms of the underlying distributions, each of the quantities δc, w0, and w1 can be interpreted as a performance measure.
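The AUC expression for uniform likelihoods can be checked directly from the three segments of the ROC curve. A short derivation, assuming the arrangement of Fig. 3a: a vertical segment of height H0 on the F = 0 axis, then the middle segment (3) of slope w0/w1 reaching H = 1 at F = F1, then the H = 1 line. The labels H0 and F1 are introduced here only for this derivation.

```latex
\begin{aligned}
H_0 &= \frac{\delta c + \delta w}{2 w_1}, \qquad
F_1 = \frac{w_1}{w_0}\,(1 - H_0) = \frac{w_0 + w_1 - \delta c}{2 w_0},\\[4pt]
\mathrm{AUC} &= \underbrace{\tfrac{1}{2}\,F_1\,(1 + H_0)}_{\text{middle segment}}
 + \underbrace{(1 - F_1)}_{H = 1\ \text{segment}}
 = 1 - \tfrac{1}{2}\,F_1\,(1 - H_0)\\[4pt]
&= 1 - \frac{1}{2}\cdot\frac{w_0 + w_1 - \delta c}{2 w_0}\cdot\frac{w_0 + w_1 - \delta c}{2 w_1}
 = 1 - \frac{(\delta c - w_0 - w_1)^2}{8\, w_0 w_1}.
\end{aligned}
```

As a sanity check, the result equals 1/2 for δc = 0 and w0 = w1 (identical likelihoods), and 1 when δc = w0 + w1, where the two supports just stop overlapping.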

As a function of the measure δc, AUC is a parabola. Figure 3c shows an instance for w0 = w1 = 0.4 and for w0 = 0.4, w1 = 0.6. The AUC curve rises rapidly and then flattens. It is this nonlinear behavior that explains the appearance of near-one AUC values in practice. For example, in Fig. 3c, as a model improves in terms of δc, its AUC value increases quickly to 0.99 at around δc = 0.8. The infinity of better models with δc ≥ 0.8 will then result only in comparable AUC values, still around 0.99. In other words, the frequent appearance of high AUC values in practice suggests that the corresponding models are all in the “good” range of the AUC curve. One can say that AUC discriminates well between “good” and “bad” models, but not between good models, where those adjectives are gauged in terms of the underlying distributions.⁵ Similar arguments apply to the performance measures w0 and w1; AUC flattens off for sharper distributions.
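The flattening can be made concrete with a few numbers. The sketch below evaluates the uniform-case AUC, 1 − Δ²/(8w0w1) with Δ = δc − w0 − w1, for w0 = w1 = 0.4 at equally spaced δc. The validity check |w1 − w0| ≤ δc ≤ w0 + w1 is our reading of the arrangement in Fig. 3a, and the function name is illustrative.

```python
def auc_uniform(dc, w0, w1):
    """AUC for uniform likelihoods, 1 - Delta**2 / (8 w0 w1).

    Valid only for the arrangement of Fig. 3a, taken here to mean
    |w1 - w0| <= dc <= w0 + w1 (our reading of that arrangement).
    """
    if not (abs(w1 - w0) <= dc <= w0 + w1):
        raise ValueError("dc outside the range covered by the formula")
    delta = dc - w0 - w1
    return 1.0 - delta ** 2 / (8.0 * w0 * w1)


previous = None
for dc in (0.0, 0.2, 0.4, 0.6, 0.8):
    auc = auc_uniform(dc, 0.4, 0.4)
    gain = "" if previous is None else f"  gain {auc - previous:.5f}"
    print(f"dc = {dc:.1f}  AUC = {auc:.5f}{gain}")
    previous = auc
```

Equal steps of 0.2 in δc raise AUC by 0.21875, 0.15625, 0.09375, and finally 0.03125: the step that removes the last of the overlap between the two distributions changes AUC the least.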

3. Triangular distribution with unconstrained support

A better, but still crude, approximation is shown in Fig. 4a. For this case one has
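$$F(t) = G\!\left(\frac{t - c_0}{w_0}\right), \qquad H(t) = G\!\left(\frac{t - c_1}{w_1}\right), \qquad G(u) = \begin{cases} 1, & u \le -1,\\ 1 - \tfrac{1}{2}(1 + u)^2, & -1 \le u \le 0,\\ \tfrac{1}{2}(1 - u)^2, & 0 \le u \le 1,\\ 0, & u \ge 1, \end{cases} \qquad (6)$$
where G is the tail area of the symmetric triangular density of unit half-width centered at 0. Writing G⁻¹ for the inverse of G on [−1, 1], the ROC curve is given by
$$H = G\!\left(\frac{w_0\,G^{-1}(F) - \delta c}{w_1}\right), \qquad 0 < F < 1, \qquad (7)$$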
and is shown in Fig. 4b. Evidently, this ROC curve is more realistic than that of the previous section. A common feature, however, is the overlap with the axes.

From the endpoints of the middle segment (Fig. 4b), it follows that the ROC curve is asymmetric if and only if w0 ≠ w1. Specifically, if the concavity is mostly to the left, then w0 < w1; bowing to the right suggests w0 > w1. Note that the asymmetry is independent of c0 and c1.

The two extremes of the curve, F = 0 and H = 1, also convey useful information. The curve overlaps the H = 1 line whenever the lower end of L1, c1 − w1, lies above the lower end of L0, c0 − w0, that is, whenever δc > δw. Hence, if δw = δc, then the right extreme of the curve meets the (1, 1) point without overlapping the H = 1 line. Similarly, δw = −δc implies that the left extreme of the curve meets the (0, 0) point without overlapping the F = 0 axis. Therefore, the amount of overlap of the curve with the two axes is a measure of the distance between the two means relative to the difference between the half-widths. AUC can be computed to be
$$\mathrm{AUC} = 1 - \frac{\Delta^4}{24\,w_0^2 w_1^2}, \qquad 0 \le -\Delta \le \min(w_0, w_1). \qquad (8)$$
Like the expression for AUC in the previous case [Eq. (4)], this expression is governed by the quantity Δ. Furthermore, because Δ now enters to the fourth power rather than the second, as in (4), this AUC is more nonlinear: it rises faster and has a broader plateau. Figure 4c displays this quartic dependence. This further flattening of the AUC curve exacerbates the inability of AUC to discriminate between good models.
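The quartic behavior can be checked by simulation. When the two triangular supports overlap by no more than the smaller half-width (0 ≤ −Δ ≤ min(w0, w1), with Δ = δc − w0 − w1), only the linear tails of the two densities meet, and integrating them gives 1 − AUC = Δ⁴/(24w0²w1²). That restriction and closed form are our own derivation for this sketch; the helper names and parameters are illustrative. NumPy's `Generator.triangular(left, mode, right, size)` supplies the draws.

```python
import numpy as np


def auc_triangular_tail(dc, w0, w1):
    """1 - Delta**4 / (24 w0**2 w1**2), with Delta = dc - w0 - w1.

    Assumes the supports overlap by at most the smaller half-width,
    i.e. 0 <= -Delta <= min(w0, w1) (our derivation, not a general formula).
    """
    delta = dc - w0 - w1
    if not (0.0 <= -delta <= min(w0, w1)):
        raise ValueError("overlap outside the range of the quartic tail formula")
    return 1.0 - delta ** 4 / (24.0 * w0 ** 2 * w1 ** 2)


def auc_monte_carlo(c0, w0, c1, w1, n=1_000_000, seed=0):
    """Estimate P(X1 > X0) for independent symmetric triangular forecasts."""
    rng = np.random.default_rng(seed)
    x0 = rng.triangular(c0 - w0, c0, c0 + w0, size=n)
    x1 = rng.triangular(c1 - w1, c1, c1 + w1, size=n)
    return float(np.mean(x1 > x0))


c0, w0, c1, w1 = 0.0, 1.0, 1.0, 1.0   # illustrative: Delta = -1, the largest allowed overlap
print(auc_triangular_tail(c1 - c0, w0, w1), auc_monte_carlo(c0, w0, c1, w1))
```

For comparison, the same Δ = −1 with unit half-widths gives 1 − 1/8 = 0.875 from the uniform-case formula but 1 − 1/24 ≈ 0.958 here, consistent with the faster rise described above.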

4. Gaussian distribution

Among the three distributions dealing with unbounded forecast quantities, the Gaussian offers the most realistic approximation. However, the expressions for ROC and AUC are not as transparent because of the appearance of certain integrals. The likelihood for the forecasts in the ith class is written as (see Fig. 5a)
$$L_i(x) = \frac{1}{\sqrt{2\pi}\,w_i}\exp\!\left[-\frac{(x - c_i)^2}{2 w_i^2}\right]. \qquad (9)$$
Then, (1) implies
$$F = \Phi\!\left(\frac{c_0 - t}{w_0}\right), \qquad H = \Phi\!\left(\frac{c_1 - t}{w_1}\right), \qquad (10)$$
where Φ(x) is the standard normal cumulative distribution:
$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du. \qquad (11)$$
Eliminating the threshold t from these equations leads to a formal expression for the ROC curve:
$$H = \Phi\!\left[\frac{\delta c + w_0\,\Phi^{-1}(F)}{w_1}\right], \qquad (12)$$
where Φ⁻¹ is the inverse of Φ, that is, Φ⁻¹[Φ(x)] = x. This expression is not very illuminating, but it does allow one to compute some useful quantities. For example, it implies that if plotted on double-probability paper, the ROC curve will be a straight line with slope w0/w1 and intercept δc/w1. Note the similarity to (3) for the case of uniform distributions. It also allows one to compute the slope of the ROC curve, dH/dF = L1(t)/L0(t).⁶ Substituting (9) into this expression yields a formula (not shown) which implies that the slope of the ROC curve at its end points is either 0 or ∞. In other words, at its end points the ROC curve is always tangent to the sides of the unit square.
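The straight-line property on probit axes can be checked numerically. The sketch below evaluates F and H from Gaussian likelihoods at a few thresholds and confirms that Φ⁻¹(H) = (w0/w1)Φ⁻¹(F) + δc/w1. It uses Python's standard-library `statistics.NormalDist` for Φ and Φ⁻¹; the parameter values and the helper name are arbitrary illustrations.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: Phi is STD.cdf, its inverse is STD.inv_cdf


def gaussian_rates(t, c0, w0, c1, w1):
    """False alarm rate F and hit rate H at threshold t for Gaussian likelihoods."""
    return STD.cdf((c0 - t) / w0), STD.cdf((c1 - t) / w1)


c0, w0, c1, w1 = 0.0, 1.0, 1.5, 2.0   # illustrative parameters
residuals = []
for t in (-1.0, 0.0, 0.5, 1.0, 2.0):
    F, H = gaussian_rates(t, c0, w0, c1, w1)
    # On probit axes the ROC is the line z_H = (w0/w1) z_F + (c1 - c0)/w1.
    z_line = (w0 / w1) * STD.inv_cdf(F) + (c1 - c0) / w1
    residuals.append(abs(STD.inv_cdf(H) - z_line))
print(max(residuals))  # only floating-point round-off remains
```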

A common error is to assume that a theoretical ROC curve based on Gaussian distributions is constrained to obey the canonical ROC behavior, that is, to be concave and to lie entirely above or entirely below the diagonal. Although this is true for the symmetric case where w0 = w1, in general the ROC curve is not strictly concave. It is easy to show that if w0 ≠ w1, then the ROC curve crosses the diagonal at precisely one point (other than the end points). Proof: The ROC curve crosses the diagonal where Φ[(c0 − t)/w0] = Φ[(c1 − t)/w1], that is, where c1/w1 − c0/w0 = (1/w1 − 1/w0)t. This linear equation in t has exactly one solution when w0 ≠ w1. The value of F at this crossing point is Φ(δc/δw). Figure 5b illustrates this crossover.

This result must be interpreted cautiously. Specifically, it does not imply that an apparently concave empirical ROC curve suggests w0 = w1. Even if w0 ≠ w1, the ROC curve can still appear to be mostly concave (i.e., without a visible crossover). This is because Φ(x) is a rapidly increasing function of x: it is nearly 0 when x is below about −2, and nearly 1 when x is above about +2. Consequently, when |δc/δw| exceeds about 2, the crossing point F = Φ(δc/δw) lies so close to one of the end points that it cannot be seen. Therefore, a concave empirical ROC curve suggests one of two possibilities: either w0 = w1, or w0 ≠ w1 but with |δc/δw| ≳ 2.
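The location of the crossover can be computed directly. Solving (c0 − t)/w0 = (c1 − t)/w1 gives the crossing threshold t* = (c1w0 − c0w1)/(w0 − w1), where F = H = Φ(δc/δw). The sketch below (hypothetical helper name, illustrative parameters c0 = 0, w0 = 1, w1 = 2) shows how the crossing slides toward the (1, 1) corner as δc/δw grows.

```python
from statistics import NormalDist

PHI = NormalDist().cdf


def crossover(c0, w0, c1, w1):
    """Threshold t* and false alarm rate F* where the Gaussian ROC meets H = F.

    Solves (c0 - t)/w0 = (c1 - t)/w1, which requires w0 != w1.
    """
    if w0 == w1:
        raise ValueError("equal widths: the ROC curve does not cross the diagonal")
    t_star = (c1 * w0 - c0 * w1) / (w0 - w1)
    return t_star, PHI((c0 - t_star) / w0)


for c1 in (1.0, 2.0, 3.0):           # c0 = 0, w0 = 1, w1 = 2, so dc/dw = c1
    t_star, F_star = crossover(0.0, 1.0, c1, 2.0)
    H_star = PHI((c1 - t_star) / 2.0)
    print(f"dc/dw = {c1:.0f}:  F* = {F_star:.4f}  H* = {H_star:.4f}")
```

As δc/δw grows from 1 to 3, the crossing moves from F ≈ 0.84 to F ≈ 0.999, where it is practically invisible on a plot.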

The AUC can be computed to be
$$\mathrm{AUC} = \Phi\!\left(\frac{\delta c}{\sqrt{w_0^2 + w_1^2}}\right). \qquad (13)$$
Again, the AUC is a nonlinear function of all the underlying parameters that assess performance: δc, w0, and w1. The dependence on δc is shown in Fig. 5c. Clearly, the nonlinearity of the curve is present even in this realistic example. Again, two good models, one distinctly superior to the other (e.g., with different values of δc), can have comparable and high AUC values. Equation (13) also explains why empirical AUC values in practice are often 0.9 or higher. The reason can again be traced to the behavior of Φ(x): as mentioned previously, modestly large values of x correspond to near-1 values of Φ (for example, Φ(1.28) ≈ 0.90 and Φ(2) ≈ 0.98).
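Under the Gaussian model, the difference X1 − X0 of independent class-1 and class-0 forecasts is itself Gaussian with mean δc and variance w0² + w1², so AUC = P(X1 > X0) = Φ(δc/√(w0² + w1²)). The sketch below, with illustrative parameters and helper names of our own, evaluates this closed form, cross-checks it by simulation with the standard-library `random` module, and tabulates AUC for equally spaced δc.

```python
import math
import random
from statistics import NormalDist


def auc_binormal(c0, w0, c1, w1):
    """P(X1 > X0) = Phi(dc / sqrt(w0**2 + w1**2)) for Gaussian likelihoods."""
    return NormalDist().cdf((c1 - c0) / math.hypot(w0, w1))


def auc_monte_carlo(c0, w0, c1, w1, n=200_000, seed=1):
    """Fraction of independent draws with X1 > X0."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(c1, w1) > rng.gauss(c0, w0) for _ in range(n))
    return wins / n


print(auc_binormal(0.0, 1.0, 1.0, 1.0), auc_monte_carlo(0.0, 1.0, 1.0, 1.0))
for dc in (1.0, 2.0, 3.0, 4.0):       # w0 = w1 = 1 (illustrative)
    print(f"dc = {dc:.0f}  AUC = {auc_binormal(0.0, 1.0, dc, 1.0):.4f}")
```

Each unit increase in δc adds less to AUC than the one before (roughly 0.16, then 0.06, then 0.015 in this table), which is the flattening seen in Fig. 5c.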

5. Triangular distribution with constrained support

In some situations the forecast quantity is a probability, calling for distributions that are restricted to the interval [0, 1]. The first of the two such distributions considered here is shown in Fig. 6a. This model assumes that the forecasts span the full range of possibilities (i.e., 0 to 1). In the language of Murphy and Winkler (1987, 1992), the forecasts are assumed to be well refined. Also note that in this approximation, the only parameters are the two modes, c0 and c1.⁷

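Three different regions must be considered: t ≤ c0, c0 ≤ t ≤ c1, and t ≥ c1. Unlike the previous examples, here there is no region that overlaps with the axes; this is a consequence of the aforementioned assumption about the refinement of the forecasts. The respective ROC curves are
$$H = \begin{cases} 1 - \dfrac{c_0}{c_1}\,(1 - F), & t \le c_0,\\[6pt] 1 - \dfrac{\bigl[1 - \sqrt{(1 - c_0)\,F}\,\bigr]^2}{c_1}, & c_0 \le t \le c_1,\\[6pt] \dfrac{1 - c_0}{1 - c_1}\,F, & t \ge c_1. \end{cases} \qquad (14)$$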
Note that the ROC curves for the first and third regions are linear, while that of the middle section is not. Figure 6b displays the ROC curve.
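For a triangular density on [0, 1] with mode c, the tail area above t is 1 − t²/c for t ≤ c and (1 − t)²/(1 − c) for t ≥ c. Eliminating t in each threshold region gives two linear pieces, H = 1 − (c0/c1)(1 − F) and H = [(1 − c0)/(1 − c1)]F, joined by the nonlinear piece H = 1 − [1 − √((1 − c0)F)]²/c1. The sketch below, whose helper names and modes are illustrative, checks these pieces against the tail areas computed directly.

```python
import math


def tri01_tail(t, c):
    """P(X > t) for a triangular density on [0, 1] with mode c (0 < c < 1)."""
    if t <= 0.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    if t <= c:
        return 1.0 - t * t / c
    return (1.0 - t) ** 2 / (1.0 - c)


def roc_three_regions(F, c0, c1):
    """Hit rate as a function of false alarm rate, assuming 0 < c0 <= c1 < 1."""
    if F >= 1.0 - c0:                               # threshold t <= c0: linear
        return 1.0 - (c0 / c1) * (1.0 - F)
    if F >= (1.0 - c1) ** 2 / (1.0 - c0):           # c0 <= t <= c1: nonlinear
        return 1.0 - (1.0 - math.sqrt((1.0 - c0) * F)) ** 2 / c1
    return (1.0 - c0) / (1.0 - c1) * F              # t >= c1: linear


c0, c1 = 0.3, 0.7   # illustrative modes
errors = [abs(roc_three_regions(tri01_tail(t, c0), c0, c1) - tri01_tail(t, c1))
          for t in (i / 200 for i in range(201))]
print(max(errors))  # only floating-point round-off remains
```

With c0 + c1 = 1, as here, the slopes of the two linear pieces, c0/c1 and (1 − c0)/(1 − c1), are reciprocals of each other.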

From the slopes of the two linear pieces of (14), c0/c1 and (1 − c0)/(1 − c1), it follows that a symmetric ROC curve implies c0 + c1 = 1: symmetry requires these slopes to be reciprocals, that is, c0(1 − c0) = c1(1 − c1), whose nontrivial solution is c0 + c1 = 1. Any other combination of c0 and c1 results in an asymmetric curve. It is also easy to show that there is no crossover; a nontrivial curve lies either entirely above or entirely below the diagonal. It also follows that the ROC curve will bow to the left if c0 ≈ 0.5, and to the right if c1 ≈ 0.5.

Finally, the AUC can be computed to be
$$\mathrm{AUC} = \frac{1}{2} + \frac{c_1 - c_0}{2} - \frac{(c_1 - c_0)^3}{6\,c_1(1 - c_0)}. \qquad (15)$$
First, note that AUC depends on two independent quantities: c1 − c0 and c1(1 − c0). For small values of the former, that is, for poor performance, the first two terms in (15) dominate the expression, leading to a linear dependence on c1 − c0. However, for better performance, the last term begins to penalize AUC (because of its negative sign) in a nonlinear fashion. This nonlinear penalty again leads to a flattening of the AUC curve for better models. Figure 6c displays the AUC as a function of the measure c1 − c0. The flattening is not evident in this figure because the simplicity of the model does not allow high values of AUC. In fact, according to (15), the highest allowed value of AUC is only 5/6 ≈ 0.83, attained for c0 = 0 and c1 = 1.
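Integrating the piecewise triangular densities gives the closed form AUC = 1/2 + (c1 − c0)/2 − (c1 − c0)³/[6c1(1 − c0)] for modes c0 ≤ c1. The sketch below checks that form against simulation, using the standard-library `random.triangular(low, high, mode)`, and confirms the ceiling of 5/6. The helper names and the modes 0.3 and 0.7 are illustrative choices.

```python
import random


def auc_tri01(c0, c1):
    """AUC for triangular likelihoods on [0, 1] with modes c0 <= c1 (c1 > 0, c0 < 1)."""
    d = c1 - c0
    return 0.5 + d / 2.0 - d ** 3 / (6.0 * c1 * (1.0 - c0))


def auc_monte_carlo(c0, c1, n=200_000, seed=2):
    """Fraction of independent draws with X1 > X0."""
    rng = random.Random(seed)
    wins = sum(rng.triangular(0.0, 1.0, c1) > rng.triangular(0.0, 1.0, c0) for _ in range(n))
    return wins / n


print(auc_tri01(0.3, 0.7), auc_monte_carlo(0.3, 0.7))
print(auc_tri01(0.0, 1.0))   # the extreme modes give the largest value, 5/6
```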

6. Beta distribution

A more realistic likelihood for probabilistic forecasts is the beta distribution:
$$L_i(x) = \frac{x^{a_i - 1}(1 - x)^{b_i - 1}}{B(a_i, b_i)}, \qquad 0 \le x \le 1, \qquad (16)$$
where $B(a_i, b_i) = \int_0^1 x^{a_i - 1}(1 - x)^{b_i - 1}\,dx$. An instance is shown in Fig. 7a. Note that, in this example, the distributions themselves may be asymmetric (skewed). If ai and bi are integers, then B(ai, bi) = [(ai − 1)!(bi − 1)!]/(ai + bi − 1)!. The mean, mode, and variance are given by
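$$c_i = \frac{a_i}{a_i + b_i}, \qquad \text{mode}_i = \frac{a_i - 1}{a_i + b_i - 2}, \qquad w_i^2 = \frac{a_i b_i}{(a_i + b_i)^2\,(a_i + b_i + 1)}. \qquad (17)$$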
In this case, given that the likelihoods are written in terms of ai and bi, it is natural to ask what combination of these quantities constitutes a measure of performance. From a decision theoretic point of view, the natural quantity is L1(x)/L0(x), and this ratio is a function of (a1 − a0) and (b1 − b0).⁸ Therefore, these two differences are natural measures of performance. Note that each of these measures depends on both c and w. For example,
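$$a_1 - a_0 = c_1\!\left[\frac{c_1(1 - c_1)}{w_1^2} - 1\right] - c_0\!\left[\frac{c_0(1 - c_0)}{w_0^2} - 1\right]. \qquad (18)$$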
The corresponding ROC curve is shown in Fig. 7b. The analytic expressions for F and H are not illuminating, but the slope of the ROC curve is
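$$\frac{dH}{dF} = \frac{L_1(t)}{L_0(t)} = \frac{B(a_0, b_0)}{B(a_1, b_1)}\; t^{\,a_1 - a_0}(1 - t)^{\,b_1 - b_0}. \qquad (19)$$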
A symmetric ROC curve requires the slopes at the two end points of the curve to be reciprocals of each other, and for that to occur one must have (a1 + b1) = (a0 + b0). In terms of the means and variances, this symmetry condition translates to
$$\frac{c_0(1 - c_0)}{w_0^2} = \frac{c_1(1 - c_1)}{w_1^2}. \qquad (20)$$
An apparent asymmetry in an empirical ROC curve, then, implies that this equation is violated. Note that in the symmetric ROC case, the two performance measures, a1 − a0 and b1 − b0, differ only in sign.
It also follows that a crossover occurs when a1 > a0 and b1 > b0, because the slopes at the two extremes are then both less than 1. These two inequalities together imply
$$\frac{c_1(1 - c_1)}{w_1^2} > \frac{c_0(1 - c_0)}{w_0^2}. \qquad (21)$$
Compare this with (20), which is the condition for a symmetric ROC curve. The quantity c(1 − c)/w2 determines both the symmetry and the crossover of the ROC curve. The crossover is displayed in Fig. 7b.
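Conditions (20) and (21) compare the quantity c(1 − c)/w² across the two classes. For a beta density with mean c and variance w², this quantity equals a + b + 1, so inverting the moment formulas gives a + b = c(1 − c)/w² − 1, a = c(a + b), and b = (1 − c)(a + b). The sketch below (helper names are ours) converts between (a, b) and (c, w²) and evaluates the index for a beta(2, 2) class 0 against two illustrative class-1 choices: one with a1 + b1 = a0 + b0, the symmetric case identified above, and one with a1 > a0 and b1 > b0, the crossover case.

```python
def beta_moments(a, b):
    """Mean c and variance w**2 of a beta(a, b) likelihood."""
    s = a + b
    return a / s, a * b / (s * s * (s + 1.0))


def beta_params(c, var):
    """Invert beta_moments using a + b = c(1 - c)/w**2 - 1."""
    s = c * (1.0 - c) / var - 1.0
    return c * s, (1.0 - c) * s


def shape_index(c, var):
    """The quantity c(1 - c)/w**2, which equals a + b + 1."""
    return c * (1.0 - c) / var


# Class 0 is beta(2, 2); the class-1 parameters are illustrative.
c0, v0 = beta_moments(2.0, 2.0)
for a1, b1 in ((3.0, 1.0), (4.0, 3.0)):
    c1, v1 = beta_moments(a1, b1)
    q0, q1 = shape_index(c0, v0), shape_index(c1, v1)
    print((a1, b1), "equal indices:", abs(q0 - q1) < 1e-9,
          " q1 > q0:", q1 > q0, " recovered (a, b):", beta_params(c1, v1))
```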
The expression for AUC is somewhat tedious to derive, but for the case of integer parameters can be computed as
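$$\mathrm{AUC} = \frac{1}{B(a_1, b_1)} \sum_{j = a_0}^{a_0 + b_0 - 1} \binom{a_0 + b_0 - 1}{j}\, B\bigl(a_1 + j,\; a_0 + b_0 + b_1 - 1 - j\bigr). \qquad (22)$$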
Figure 7c displays a plot of the AUC as a function of the measure (a1 − a0) for a few different values of the parameters. The nonlinearity is now evident when the AUC reaches near-1 values.
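For integer parameters, one way to obtain AUC in closed form uses the identity P(X0 ≤ x) = P[Bin(a0 + b0 − 1, x) ≥ a0] for a beta(a0, b0) variable. Integrating each binomial term against the class-1 beta density turns AUC into a finite sum of beta functions. The sketch below evaluates that sum exactly with `fractions.Fraction` and cross-checks it with the standard-library `random.betavariate`. Helper names are ours; the parameters follow the Fig. 7 caption.

```python
import random
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Exact beta function B(a, b) = (a-1)!(b-1)!/(a+b-1)! for positive integers."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def auc_beta_int(a0, b0, a1, b1):
    """Exact P(X1 > X0) for beta(a0, b0) and beta(a1, b1) with integer parameters.

    Uses P(X0 <= x) = P[Binomial(n0, x) >= a0] with n0 = a0 + b0 - 1, then
    integrates each binomial term against the class-1 beta density.
    """
    n0 = a0 + b0 - 1
    total = sum(comb(n0, j) * beta_fn(a1 + j, b1 + n0 - j) for j in range(a0, n0 + 1))
    return total / beta_fn(a1, b1)


def auc_monte_carlo(a0, b0, a1, b1, n=50_000, seed=3):
    """Fraction of independent draws with X1 > X0."""
    rng = random.Random(seed)
    wins = sum(rng.betavariate(a1, b1) > rng.betavariate(a0, b0) for _ in range(n))
    return wins / n


# Parameters of Fig. 7: a0 = b0 = 2, b1 = 3, and a1 = 2, 3, 4, 5.
for a1 in range(2, 6):
    exact = auc_beta_int(2, 2, a1, 3)
    mc = auc_monte_carlo(2, 2, a1, 3)
    print(f"a1 - a0 = {a1 - 2}:  AUC = {float(exact):.4f}  (Monte Carlo {mc:.4f})")
```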

7. Summary and conclusions

Several models are examined for the purpose of explicitly illustrating some features of ROC curves and the area under the curve (AUC). The findings aid in interpreting the shape of the ROC curve in terms of the parameters defining the class-conditional distributions of the forecast quantity. In addition to providing a pedagogical exposition of the ROC analysis, the work also offers some guidance for interpreting ROC curves and the AUC. The guidance is based on only the models examined here. As such, the generality of the results is not assured by any means. Nevertheless, all of the examples shown in Fig. 2 are found to be completely consistent with the findings here. The following statements should be interpreted only as qualitative guidance. More quantitative statements are found in the text.

For unbounded forecasts, an asymmetric ROC curve suggests unequal widths for the underlying distributions. If the class with the larger mean is labeled as 1, then a concavity to the top suggests w0 > w1, and concavity to the bottom suggests w0 < w1. In other words, in attempting to explain any asymmetry in an empirical ROC curve, it is advisable to examine the widths of the underlying distributions. The amount of overlap with the axes is also a measure of the difference in the widths. The crossing of the diagonal by a ROC curve suggests that the quantity |δc/δw| is smaller than some critical value. For example, if the distributions are Gaussian, then that critical value is approximately 2.

For bounded forecasts, the distributions examined here do not generate an overlap with the axes. The existence of a significant overlap in an empirical ROC plot suggests that the underlying distributions are different from the ones examined here in some significant way. The symmetry and crossover of the ROC curve are determined by a combination of means and variances, for example, (20).

For both bounded and unbounded forecasts, the AUC increases nonlinearly with respect to natural measures of forecast quality derived from parameters of the underlying distributions. Moreover, in the examples considered here, the more realistic models display more of this nonlinearity. The nonlinearity is such as to reduce the effectiveness of the AUC in assessing performance, as performance increases. As such, the frequent occurrence of near-1 AUC values observed empirically is an indication that many forecasts are of “reasonable” quality.

Acknowledgments

The author is grateful to Rich Caruana for invaluable discussions and a reading of an early version of this article.

REFERENCES

  • Atger, F., 2004: Estimation of the expected reliability of ensemble based probabilistic forecasts. Quart. J. Roy. Meteor. Soc., 130, 627–646.

  • Coffin, M., and S. Sukhatme, 1997: Receiver operating characteristic studies and measurement errors. Biometrics, 53, 823–837.

  • Dorfman, D. D., and E. Alf Jr., 1969: Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals. J. Math. Psychol., 6, 487–496.

  • Dorfman, D. D., K. S. Berbaum, C. E. Metz, R. V. Lenth, J. A. Hanley, and H. Abu Dagga, 1997: Proper receiver operating characteristic analysis: The bigamma model. Acad. Radiol., 4, 138–149.

  • Drton, M., C. Marzban, P. Guttorp, and J. T. Schaefer, 2003: A Markov chain model of tornadic activity. Mon. Wea. Rev., 131, 2941–2953.

  • Hanley, J. A., and B. J. McNeil, 1982: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36.

  • Hanley, J. A., and B. J. McNeil, 1983: A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 148, 839–843.

  • Harvey, L. O., Jr., K. R. Hammond, C. M. Lusk, and E. F. Mross, 1992: Application of signal detection theory to weather forecasting behavior. Mon. Wea. Rev., 120, 863–883.

  • Marzban, C., and A. Witt, 2001: A Bayesian neural network for hail size prediction. Wea. Forecasting, 16, 600–610.

  • Marzban, C., S. Leyton, and B. Colman, cited 2003: Nonlinear post-processing of model output: Ceiling and visibility. NWS/COMET Rep. [Available online at http://www.nhn.ou.edu/marzban/comet1.pdf.]

  • Mason, I., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag., 30, 291–303.

  • Mason, S. J., and N. E. Graham, 1999: Conditional probabilities, relative operating characteristics, and relative operating levels. Wea. Forecasting, 14, 713–725.

  • Metz, C. E., B. A. Herman, and J. H. Shen, 1998: Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat. Med., 17, 1033–1053.

  • Murphy, A. H., and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338.

  • Murphy, A. H., and R. L. Winkler, 1992: Diagnostic verification of probability forecasts. Int. J. Forecasting, 7, 435–455.

  • Mylne, K. R., 1999: The use of forecast value calculations for optimal decision making using probability forecasts. Preprints, 17th Conf. on Weather Analysis and Forecasting, Denver, CO, Amer. Meteor. Soc., 235–239.

  • Richardson, D. S., 2000: Applications of cost loss models. Proc. Seventh Workshop on Meteorological Operational Systems, Reading, United Kingdom, ECMWF, 209–213.

  • Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quart. J. Roy. Meteor. Soc., 127, 2473–2489.

  • Shapiro, D. E., 1999: The interpretation of diagnostic tests. Stat. Methods Med. Res., 8, 113–134.

  • Stephenson, D. B., 2000: Use of the “odds ratio” for diagnosing forecast skill. Wea. Forecasting, 15, 221–232.

  • Wilks, D. S., 2001: A skill score based on economic value for probability forecasts. Meteor. Appl., 8, 209–219.

  • Zhou, X.-H., D. K. McClish, and N. A. Obuchowski, 2002: Statistical Methods in Diagnostic Medicine. John Wiley and Sons, 464 pp.

  • Zou, K. H., cited 2003: Receiver operating characteristic (ROC) literature research. [Available online at http://splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html.]

Fig. 1.

A generic situation involving a forecast of two classes

Citation: Weather and Forecasting 19, 6; 10.1175/825.1

Fig. 2.

Examples of ROC curves representing different levels of performance quality. The diagonal line corresponds to random forecasts (i.e., poor performance), while the curves away from the diagonal represent higher levels of performance. The following features are noted: (a) symmetric ROC curves, (b) symmetric and asymmetric curves, also overlapping one axis, and (c) extremely concave curves

Fig. 3.

Schematics of (top) uniform class-conditional distributions, (middle) the corresponding ROC curve, and (bottom) the AUC curve as a function of δc = c1 − c0

Fig. 4.

Same as in Fig. 3 but for triangular distributions over unbounded forecasts

Fig. 5.
Fig. 5.

Same as in Fig. 3 but for Gaussian distributions
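
For independent Gaussian class-conditional forecasts with means c0, c1 and standard deviations w0, w1, the difference between a class-1 and a class-0 forecast is Gaussian with mean δc and variance w0² + w1². It follows that AUC = Φ(δc / √(w0² + w1²)), where Φ is the standard normal CDF. The sketch below evaluates that expression and traces the ROC curve as well. It measures how far the curve departs from symmetry about the line hit rate = 1 − false alarm rate: the departure is zero for equal widths and nonzero when w0 ≠ w1, in line with the asymmetry discussed in the abstract. Function names and the probe point far = 0.2 are illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_auc(c0: float, w0: float, c1: float, w1: float) -> float:
    """AUC = Phi((c1 - c0) / sqrt(w0**2 + w1**2)) for independent Gaussian forecasts."""
    return STD_NORMAL.cdf((c1 - c0) / (w0 ** 2 + w1 ** 2) ** 0.5)


def gaussian_hit_rate(far: float, c0: float, w0: float, c1: float, w1: float) -> float:
    """Hit rate at the threshold whose false alarm rate is `far` (0 < far < 1)."""
    t = NormalDist(c0, w0).inv_cdf(1.0 - far)  # threshold t with P(x0 >= t) = far
    return 1.0 - NormalDist(c1, w1).cdf(t)  # P(x1 >= t)


def asymmetry(c0: float, w0: float, c1: float, w1: float, far: float = 0.2) -> float:
    """Zero when the ROC curve is symmetric about the line hit rate = 1 - false alarm rate."""
    h = gaussian_hit_rate(far, c0, w0, c1, w1)
    # reflecting (far, h) about that line gives (1 - h, 1 - far); is it on the curve?
    return gaussian_hit_rate(1.0 - h, c0, w0, c1, w1) - (1.0 - far)


print(gaussian_auc(0.0, 1.0, 1.0, 1.0))
print(asymmetry(0.0, 1.0, 1.0, 1.0), asymmetry(0.0, 1.0, 1.0, 2.0))
```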

Fig. 6.

Same as in Fig. 3 but for bounded (e.g., probabilistic) forecasts

Fig. 7.

Same as in Fig. 3 but for beta distributions. The corresponding parameters are b0 = 2, b1 = 3, a0 = 2, with a1 taking values 2, 3, 4, and 5 (from top to bottom)

1

For a given dataset, a normalized histogram of x, computed separately for each class, is the best way of visualizing the likelihood.
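
A normalized histogram of this kind can be computed per class as sketched below. The counts in equal-width bins are divided by (class size × bin width), so that each histogram integrates to 1 and estimates the class-conditional density of the forecast. The bin count, the [0, 1] range (suited to probabilistic forecasts), and the function name are illustrative assumptions, and both classes are assumed to be present in the data.

```python
from typing import Dict, List, Sequence


def class_histograms(forecasts: Sequence[float], labels: Sequence[int],
                     bins: int = 10, lo: float = 0.0, hi: float = 1.0) -> Dict[int, List[float]]:
    """Density-normalized histogram of the forecasts for class 0 and class 1.

    Assumes labels are 0 or 1 and that each class has at least one case.
    """
    width = (hi - lo) / bins
    counts = {0: [0] * bins, 1: [0] * bins}
    for x, y in zip(forecasts, labels):
        if not lo <= x <= hi:
            raise ValueError(f"forecast {x} outside [{lo}, {hi}]")
        k = min(int((x - lo) / width), bins - 1)  # x == hi goes in the last bin
        counts[y][k] += 1
    # divide by (class size * bin width) so that sum(density * width) == 1 per class
    return {y: [c / (sum(cs) * width) for c in cs] for y, cs in counts.items()}


h = class_histograms([0.05, 0.15, 0.15, 0.95, 0.85, 0.55], [0, 0, 0, 1, 1, 1])
print(h[0])
print(h[1])
```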

2

Throughout this paper, the symbols c and w denote a measure of central tendency and the half-width, respectively, of each distribution. For the Gaussian, they coincide with the mean and the standard deviation.

3

The expressions in (2) are specific to Fig. 3a; changing the relative position of c0 and c1, or the magnitudes of the widths, yields different expressions.

4

Again, this equation is specific to the arrangement considered in Fig. 3a.

5

This is not a problem in model selection, because the standard error of the AUC converges to 0 as the AUC approaches 1 (Hanley and McNeil 1983).
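
The behavior described in this footnote can be checked numerically. The sketch below uses the approximate standard error given by Hanley and McNeil (1982), SE² = [A(1 − A) + (n1 − 1)(Q1 − A²) + (n0 − 1)(Q2 − A²)]/(n1 n0), with Q1 = A/(2 − A) and Q2 = 2A²/(1 + A), where n1 and n0 are the numbers of class-1 and class-0 cases. Every term vanishes at A = 1. The sample sizes n1 = n0 = 50 and the function name are illustrative choices.

```python
from math import sqrt


def auc_standard_error(a: float, n1: int, n0: int) -> float:
    """Hanley-McNeil (1982) approximate standard error of an AUC value a.

    n1, n0: numbers of class-1 (event) and class-0 (non-event) cases.
    """
    q1 = a / (2.0 - a)  # approx. P(two class-1 cases both outrank one class-0 case)
    q2 = 2.0 * a * a / (1.0 + a)  # approx. P(one class-1 case outranks two class-0 cases)
    var = (a * (1.0 - a) + (n1 - 1) * (q1 - a * a) + (n0 - 1) * (q2 - a * a)) / (n1 * n0)
    return sqrt(max(var, 0.0))  # guard against tiny negative round-off near a = 1


for a in (0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 1.0):
    print(f"AUC = {a:.2f}  SE = {auc_standard_error(a, n1=50, n0=50):.4f}")
```

With these sizes, the printed standard error shrinks steadily as AUC increases and is exactly 0 at AUC = 1.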

6

In decision-theoretic applications where one seeks an “optimal” decision threshold, this expression is often invoked to argue for the threshold at which the slope equals 1. However, that choice assumes that the two classes have equal prior probabilities, p0 = p1 (p1 is sometimes called the base rate, and p0 its complement). With equal misclassification costs, the optimal threshold is the one at which the slope equals p0/p1.
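
Here is a small numerical check of this footnote. It assumes Gaussian class-conditional forecasts with a common standard deviation w and equal misclassification costs; both assumptions are made for illustration only. The slope of the ROC curve at threshold t equals the likelihood ratio L1(t)/L0(t). Setting this ratio to p0/p1 gives t* = (c0 + c1)/2 + w² ln(p0/p1)/(c1 − c0). A brute-force search for the threshold that minimizes the expected error rate, p0 × (false alarm rate) + p1 × (miss rate), should land on the same value. Function names and grid settings are hypothetical.

```python
from math import log
from statistics import NormalDist


def lr_threshold(c0: float, c1: float, w: float, p1: float) -> float:
    """Threshold where L1(t)/L0(t) = p0/p1 for N(c0, w) and N(c1, w), with c1 > c0."""
    p0 = 1.0 - p1
    return 0.5 * (c0 + c1) + w * w * log(p0 / p1) / (c1 - c0)


def expected_error(t: float, c0: float, c1: float, w: float, p1: float) -> float:
    """p0 * false alarm rate + p1 * miss rate when class 1 is forecast for x >= t."""
    far = 1.0 - NormalDist(c0, w).cdf(t)
    miss = NormalDist(c1, w).cdf(t)
    return (1.0 - p1) * far + p1 * miss


def grid_threshold(c0: float, c1: float, w: float, p1: float, step: float = 1e-3) -> float:
    """Brute-force minimizer of expected_error over [c0 - 5w, c1 + 5w]."""
    n = int(round((c1 - c0 + 10.0 * w) / step))
    grid = [c0 - 5.0 * w + i * step for i in range(n + 1)]
    return min(grid, key=lambda t: expected_error(t, c0, c1, w, p1))


for p1 in (0.5, 0.2, 0.05):  # base rates
    print(p1, lr_threshold(0.0, 1.0, 1.0, p1), grid_threshold(0.0, 1.0, 1.0, p1))
```

As the base rate p1 falls, the optimal threshold moves toward larger forecast values. In ROC terms, it moves to a steeper part of the curve.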

7

Note that in this section c denotes the mode (not the mean) of the distribution. Also, the widths of the distributions are not independent quantities but are fixed by c: the mean is (1 + c)/3, and the variance is (1 − c + c²)/18.
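
The mean and variance quoted in this footnote are those of a triangular density on [0, 1] with mode c. This identification is inferred from the formulas and is assumed here to be the bounded model in question. The sketch below checks both moments by midpoint-rule integration. It also makes the point of the footnote concrete: once c is chosen, the spread is determined, and it is not a free parameter. Function names and the number of integration cells are illustrative.

```python
def triangular_pdf(x: float, c: float) -> float:
    """Density on [0, 1] rising linearly to a peak at the mode c (0 < c < 1), then falling."""
    if 0.0 <= x <= c:
        return 2.0 * x / c
    if c < x <= 1.0:
        return 2.0 * (1.0 - x) / (1.0 - c)
    return 0.0


def numeric_moments(c: float, n: int = 20000):
    """Mean and variance of triangular_pdf(., c) by the midpoint rule on n cells."""
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]
    weights = [triangular_pdf(x, c) * h for x in xs]
    mean = sum(x * wt for x, wt in zip(xs, weights))
    var = sum((x - mean) ** 2 * wt for x, wt in zip(xs, weights))
    return mean, var


for c in (0.1, 0.3, 0.5, 0.9):
    mean, var = numeric_moments(c)
    # numerical vs. closed-form mean, then numerical vs. closed-form variance
    print(c, mean, (1 + c) / 3, var, (1 - c + c * c) / 18)
```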

8

Strictly speaking, this expression should also be multiplied by the ratio of the respective prior probabilities. The priors are neglected here because they are not functions of x.

Save
  • Atger, F., 2004: Estimation of the expected reliability of ensemble based probabilistic forecasts. Quart. J. Roy. Meteor. Soc, 130 , 627646.

  • Coffin, M., and Sukhatme S. , 1997: Receiver operating characteristic studies and measurement errors. Biometrics, 53 , 823837.

  • Dorfman, D. D., and Alf E. Jr., 1969: Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals. J. Math. Psychol, 6 , 487496.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Dorfman, D. D., Berbaum K. S. , Metz C. E. , Lenth R. V. , Hanley J. A. , and Abu Dagga H. , 1997: Proper receiver operating characteristic analysis: The bigamma model. Acad. Radiol, 4 , 138149.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Drton, M., Marzban C. , Guttorp P. , and Schaefer J. T. , 2003: A Markov chain model of tornadic activity. Mon. Wea. Rev, 131 , 29412953.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hanley, J. A., and McNeil B. J. , 1982: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143 , 2936.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hanley, J. A., and McNeil B. J. , 1983: A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 148 , 839843.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Harvey L. O. Jr., , Hammond K. R. , Lusk C. M. , and Mross E. F. , 1992: Application of signal detection theory to weather forecasting behavior. Mon. Wea. Rev, 120 , 863883.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Marzban, C., and Witt A. , 2001: A Bayesian neural network for hail size prediction. Wea. Forecasting, 16 , 600610.

  • Marzban, C., Leyton S. , and Colman B. , cited 2003: Nonlinear post-processing of model output: Ceiling and visibility. NWS/COMET Rep. [Available online at http://www.nhn.ou.edu/marzban/comet1.pdf.].

    • Search Google Scholar
    • Export Citation
  • Mason, I., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag, 30 , 291303.

  • Mason, S. J., and Graham N. E. , 1999: Conditional probabilities, relative operating characteristics, and relative operating levels. Wea. Forecasting, 14 , 713725.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Metz, C. E., Herman B. A. , and Shen J. H. , 1998: Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat. Med, 17 , 10331053.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Murphy, A. H., and Winkler R. L. , 1987: A general framework for forecast verification. Mon. Wea. Rev, 115 , 13301338.

  • Murphy, A. H., and Winkler R. L. , 1992: Diagnostic verification of probability forecasts. Int. J. Forecasting, 7 , 435455.

  • Mylne, K. R., 1999: The use of forecast value calculations for optimal decision making using probability forecasts. Preprints, 17th Conf. on Weather Analysis and Forecasting, Denver, CO, Amer. Meteor. Soc., 235–239.

    • Search Google Scholar
    • Export Citation
  • Richardson, D. S., 2000: Applications of cost loss models. Proc. Seventh Workshop on Meteorological Operational Systems, Reading, United Kingdom, ECMWF, 209–213.

    • Search Google Scholar
    • Export Citation
  • Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Quart. J. Roy. Meteor. Soc, 127 , 24732489.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Shapiro, D. E., 1999: The interpretation of diagnostic tests. Stat. Methods Med. Res, 8 , 113134.

  • Stephenson, D. B., 2000: Use of the “odds ratio” for diagnosing forecast skill. Wea. Forecasting, 15 , 221232.

  • Wilks, D. S., 2001: A skill score based on economic value for probability forecasts. Meteor. Appl, 8 , 209219.

  • Zhou, X-H., McClish D. K. , and Obuchowski N. A. , 2002: Statistical Methods in Diagnostic Medicine. John Wiley and Sons, 464 pp.

  • Zou, K. H., cited 2003: Receiver operating characteristic (ROC) literature research. [Available online at http://splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html.].

    • Search Google Scholar
    • Export Citation