## 1. Introduction

In statistical model building one is often faced with the task of reducing the number of predictors. The reason is usually to preclude an “information overload,” either for an a posteriori statistical analysis or for the benefit of the prospective user. As an example of the former situation, consider a dataset in which the number of predictors exceeds the sample size. Such a dataset is inadequate for statistical model building because the model^{1} is apt to overfit it. Overfitting generally refers to the situation in which a model (e.g., regression, discriminant analysis) performs well on the dataset employed for parameter estimation but performs poorly on an independent dataset. In fact, overfitting can occur even when the number of predictors is less than the number of cases; it occurs because the model has more parameters than can be uniquely determined from the data. One way of reducing the number of parameters in a model is to reduce the number of predictors without excessive loss of information. This situation is exemplified by numerous algorithms (Stumpf et al. 1998; Mitchell et al. 1998) that offer the user an unwieldy number of variables for predicting weather phenomena such as tornadoes or severe wind. A reduction in the number of variables may aid in better utilizing the algorithms for predictive purposes by avoiding the technical problem of overfitting and by precluding any information overload.

Such a reduction can occur in at least two ways: One method is to retain only linear (or nonlinear) combinations of the predictors that account for most of the variance in the data. A well-known example is principal component analysis. Such methods make no reference to the dependent variable, and so are not appropriate for selecting the best predictors. A second approach is to take linear (or nonlinear) combinations that actually constitute a set of best predictors of the event at hand. This is equivalent to building a model (regression, or neural network, etc.) for predicting the events. Although the existence of a model simplifies the task of identifying the best predictors, a model is not always readily available. If a model does exist, then it is possible to rank the predictors according to some measure of their predictive strength and retain only the best predictors. Some methods that accomplish this task are stepwise regression and stepwise discriminant analysis. However, stepwise methods usually do not yield an unambiguous ranking of the variables. The reason is as follows: Stepwise methods are based on the improvement of performance upon the inclusion of some variable in the model (i.e., “forward stepwise”), or the loss of performance brought about by the exclusion of some variable from the model (i.e., “backward stepwise”). It is possible that the forward and backward procedures will lead to the same ordering of the variables, but that outcome is neither guaranteed nor likely. Furthermore, the selection criterion and the predictive power are quantities that must be specified. As a result, the list of the best predictors arrived at in this way is not necessarily unique.

There are other methods for ranking variables according to their predictive strength, but most (if not all) invoke certain assumptions whose violation may be detrimental to the goal of finding the best predictors. The purpose of this article is threefold: first, to review some of the contingencies and difficulties in any attempt at ordering variables according to their predictive strength; second, to identify the conditions under which predictive strengths *can* be assigned; and, finally, to illustrate these considerations with an example dealing with tornado prediction.

## 2. Contingencies and difficulties

In this section, a number of situations are considered that expose some of the contingencies and difficulties associated with the question of best predictors.

Suppose the task is to relate *n* predictors, *x*_{i} (*i* = 1, ..., *n*), to a single dependent variable, *y,* and suppose that the model is developed correctly. A well-known situation is when the model is linear in both the parameters and the variables:

*y* = *α*_{1}*x*_{1} + *α*_{2}*x*_{2} + · · · ,

where *α*_{i} (*i* = 1, ..., *n*) are the parameters of the model. It is often said that the best predictor is the one with the largest (in magnitude) *α,* or equivalently, that the variables can be ordered according to the magnitude of the *α*’s. However, that conclusion is contingent on at least two assumptions: that the variables all vary over the same range, and that they are uncorrelated. The first assumption may be fulfilled by simply scaling all the predictors so as to vary over the same range. The second contingency, however, is difficult to deal with. It is easy to show that if there exists any collinearity between two (or more) variables, then the corresponding coefficients are ambiguous in that a linear combination of them will produce the same value of *y*; this can lead to best estimates for the *α*’s that are excessively large positive (or negative) numbers (Draper and Smith 1981). As such, the *α*’s become meaningless. Furthermore, it can be shown that the standard errors of the *α*’s increase with the amount of collinearity, and as a result, their estimates become less precise (Tacq 1997).

Another possibility is a model that is nonlinear in the variables (though still linear in the parameters), such as

*y* = *α*_{1}*x*_{1} + *α*_{2}*x*^{2}_{1} + *α*_{3}*x*^{3}_{1} + *β*_{1}*x*_{2} + *β*_{2}*x*^{2}_{2} + *β*_{3}*x*^{3}_{2} + · · · ,

where *α*_{i} and *β*_{i} are all parameters. If this is the model that best fits reality, then each variable is no longer associated with a single coefficient, and so it is impossible to assign a single measure of strength to the variables.

Matters become worse if the model involves interactions between the variables, e.g.,

*y* = *α*_{1}*x*_{1} + *α*_{2}*x*_{2} + *βx*_{1}*x*_{2},

for then the contribution of either variable depends on the value of the other, and the coefficients alone cannot rank the two variables.

It is entirely possible and even likely that the underlying model of a real-world problem is nonlinear in the parameters and includes interactions. An example of such a model is described in Marzban and Stumpf (1996, 1998) and Marzban et al. (1997) wherein a neural network for tornado prediction is outlined. For such nonlinear and interacting problems, the question of best predictors is then entirely unaddressable, at least uniquely, based on the parameters of the model.

It is possible to approach that question from a point of view that does not involve a direct examination of the parameters. One set of such approaches, namely, the stepwise set, was mentioned in the introduction. As mentioned there, although it is possible to order the predictors according to the gain in performance upon their inclusion in the model, the reverse exercise (i.e., ordering the variables according to the loss in performance upon their systematic exclusion) can yield a different order. Consequently, any stepwise ordering of the variables according to their predictive strength is ambiguous and cannot lead to a unique set of best predictors.

All of the above mentioned methods presume the existence of a statistical model, be it linear regression or nonlinear regression methods such as neural networks. There exist situations, however, where even a statistical model does not exist. Examples include numerous meteorological algorithms that simply produce attributes of radar signatures that are believed to be associated with the phenomenon at hand (e.g., tornado, damaging wind) though without a model to relate the attributes directly to the phenomenon. The utility of such algorithms is not only in providing guidance, but also in providing the user with an arena wherein experimentation along with trial and error can induce a “mental model” that may in turn be employed for predictive purposes. Even when a statistical model exists, one cannot satisfactorily answer the question of best predictors. The issue becomes almost impossible to resolve when a model does not even exist.

It is interesting that the absence of a statistical model suggests an approach wherein the question of best predictors may actually be answered without being affected by the above mentioned difficulties. Regardless of the presence or absence of a statistical model, a reliable method for ordering the variables is a bivariate one (i.e., one predictor at a time). Such a bivariate analysis is model independent in that it does not presume the existence of a multiple regression model, a neural network, etc. As a result, it is unaffected by multicollinearity, interactions, and the other problems that infect multiple (independent) variable models. In this sense, a bivariate approach to ordering is the only meaningful approach, and it offers the additional flexibility of benefiting the users of an algorithm who have only a mental model to guide them in utilizing the various variables for predictive purposes.

## 3. Bivariate approaches

Identifying the best predictors in a bivariate analysis is still not free of ambiguities, but this time they are due to ambiguities in the definition of “best.” In this section, several bivariate approaches will be proposed to address the question of the best predictors of a dependent variable. In particular, the predictors are assumed to be continuous (e.g., temperature, height), and the dependent variable is assumed to be binary (e.g., tornado or no tornado, rain or no rain, generally referred to as event or nonevent, and labeled as 1 or 0, respectively).

The first approach is based on the (Pearson) correlation coefficient, *r.* When both the predictor and the dependent variable are continuous, *r* is a measure of linear correlation between the two. In the current case the dependent variable is binary, but *r* does still offer a measure of correlation, although a better description may be association (see the appendix). The correlation coefficient between two variables *x* and *y* is computed as

*r* = Σ_{i}(*x*_{i} − *x̄*)(*y*_{i} − *ȳ*)/[Σ_{i}(*x*_{i} − *x̄*)^{2} Σ_{i}(*y*_{i} − *ȳ*)^{2}]^{1/2},

where *x̄* and *ȳ* are the respective sample means; *r* varies between −1 and +1. A nonlinear generalization of *r* will be discussed later in this section.
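As an illustration (not from the paper; the sample data are made up), the computation of *r* between a continuous predictor and a binary dependent variable can be sketched in Python as:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between a continuous predictor x and a
    (possibly binary) dependent variable y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum()))

# Tiny sample: higher predictor values accompany the event (y = 1).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(round(pearson_r(x, y), 3))  # → 0.878
```

Even with a binary *y,* this is the ordinary product-moment formula; it reduces to the point-biserial correlation discussed in the appendix.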

An alternative approach is suggested by considering the way in which forecasters employ the variables at their disposal. They may be interested in issuing forecasts that maximize some measure of performance. Assuming that either higher or lower values of the predictor are associated with events, an important quantity is the value of the variable above (or below) which a “warning” is to be issued. Furthermore, it is important to identify the threshold at which the measure of performance is maximized. This suggests the following method for assigning a predictive strength to the various predictors: for each variable, dichotomize it by introducing a decision threshold, form a 2 × 2 contingency table for the forecasts and observations, compute some measure of performance based on the contingency table, and then order the variables according to the maximum obtainable value of that measure.

Numerous measures of performance can be computed from the resulting 2 × 2 contingency table.^{2} A relatively unbiased measure is the Heidke skill score (HSS), and two relatively biased measures are the critical success index (CSI) and the likelihood ratio chi-square (LRC). In terms of the elements of the contingency table, **C**, and of the corresponding expected matrix, **E**, they are defined as

CSI = *C*_{4}/(*C*_{2} + *C*_{3} + *C*_{4}),

HSS = 2(*C*_{1}*C*_{4} − *C*_{2}*C*_{3})/[(*C*_{1} + *C*_{2})(*C*_{2} + *C*_{4}) + (*C*_{1} + *C*_{3})(*C*_{3} + *C*_{4})],

LRC = 2 Σ_{i} *C*_{i} ln(*C*_{i}/*E*_{i}),

where *C*_{1} and *C*_{4} are the number of correctly classified nonevents and events, respectively, and *C*_{2} and *C*_{3} are the number of incorrectly classified nonevents and events, respectively. The expected matrix, **E**, is computed from the marginals of **C** and represents the contingency table expected from random forecasts.

In spite of its inequitability (Gandin and Murphy 1992; Marzban 1998a; Marzban and Lakshmanan 1999), CSI is a popular measure in meteorology because it can be computed without any knowledge of *C*_{1}; in practice, forecasters do not keep track of the number of nonevents when warnings are not issued. Whereas CSI is a measure of accuracy, HSS is a measure of skill, and so it takes into account random forecasts; said differently, if **C** = **E**, then HSS = 0.
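A minimal sketch of the three measures, assuming the cell labeling above (*C*_{1}/*C*_{4} correct nonevents/events, *C*_{2}/*C*_{3} incorrect ones) and an expected matrix built from the marginals of **C**:

```python
import numpy as np

def scores(C1, C2, C3, C4):
    """CSI, HSS, and LRC from a 2x2 contingency table.
    C1, C4: correctly classified nonevents, events;
    C2, C3: incorrectly classified nonevents, events."""
    N = C1 + C2 + C3 + C4
    csi = C4 / (C2 + C3 + C4)
    hss = 2.0 * (C1 * C4 - C2 * C3) / (
        (C1 + C2) * (C2 + C4) + (C1 + C3) * (C3 + C4))
    # Expected matrix E from the marginals of C (random forecasts).
    C = np.array([[C1, C2], [C3, C4]], dtype=float)
    E = np.outer(C.sum(axis=1), C.sum(axis=0)) / N
    lrc = 2.0 * float(np.sum(C * np.log(C / E)))  # all cells assumed nonzero
    return csi, hss, lrc

csi, hss, lrc = scores(50, 20, 10, 20)
print(round(csi, 3), round(hss, 3), round(lrc, 3))  # → 0.4 0.348 12.654
```

Note that a table satisfying *C*_{1}*C*_{4} = *C*_{2}*C*_{3} (i.e., **C** = **E**) yields HSS = 0 and LRC = 0, as the text requires of skill measures.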

A third approach is based on the posterior probability of an event, *P*_{1}(*x*), given the value of a predictor, *x.* This probability can be calculated from the conditional frequency distribution, *N*_{i}(*x*), at a given value of *x,* where *i* = 0, 1 refers to nonevents and events, respectively. Specifically, it can be shown (Marzban 1998b) that Bayes’ theorem implies

*P*_{1}(*x*) = *N*_{1}(*x*)/[*N*_{0}(*x*) + *N*_{1}(*x*)].

A good predictor is then one for which *P*_{1}(*x*) changes significantly as a function of *x.* Although one may order the predictors according to the change in *P*_{1}(*x*) over the range of *x,* it is more instructive to examine the plot of *P*_{1}(*x*) as a function of *x* *for each variable,* because such plots display a multifaceted view of the importance of a predictor.
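A possible sketch (illustrative names; the binning choices are arbitrary) of estimating *P*_{1}(*x*) from the conditional frequencies *N*_{0}(*x*) and *N*_{1}(*x*):

```python
import numpy as np

def posterior_curve(x, y, n_bins=10):
    """Estimate P1(x) = N1(x) / (N0(x) + N1(x)) by binning the
    predictor x; y is 0 (nonevent) or 1 (event)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    n1, _ = np.histogram(x[y == 1], bins=edges)
    n0, _ = np.histogram(x[y == 0], bins=edges)
    total = n0 + n1
    # NaN where a bin is empty (no information about P1 there).
    p1 = np.where(total > 0, n1 / np.maximum(total, 1), np.nan)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, p1, n0, n1

# Synthetic rare-event sample: events sit at higher predictor values.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(2.0, 1.0, 100)])
y = np.concatenate([np.zeros(1000, int), np.ones(100, int)])
centers, p1, n0, n1 = posterior_curve(x, y, n_bins=8)
```

Plotting `p1` against `centers` gives the analog of the *P*_{1}(*x*) curves discussed below; the bin populations `n0 + n1` set the sampling error bars.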

The first two methods have a limitation that the last method does not; they are linear. This causes a nonlinear variable (e.g., *x*_{2} in Fig. 5) to be assigned an incorrect (and possibly low) predictive strength. The probabilistic method exposes the nonlinearity of such variables and will, therefore, assign a faithful predictive strength. The disadvantage of the probabilistic method is that it does not offer a means of quantitatively ordering the variables according to a single (scalar) measure. In other words, the multifaceted nature of the plot of *P*_{1}(*x*) as a function of *x* allows only for a coarse classification of the variables into a few classes of predictive strength (e.g., poor, marginal, good) and not a continuous ordering of the variables.

It is possible, however, to distill the plot of *P*_{1}(*x*) as a function of *x* into a single, one-dimensional (scalar) quantity. In fact, this quantity is a nonlinear generalization of the linear correlation coefficient and is called the correlation ratio, *η* (Croxton and Crowden 1955).^{3} Its exact definition (and its relation to *r*) is given in the appendix. For a binary dependent variable, its square can be written as

*η*^{2} = Σ_{x} *N*_{x}[*P*_{1}(*x*) − *p*_{1}]^{2}/[(*N*_{0} + *N*_{1})*p*_{1}(1 − *p*_{1})],

where *N*_{x} is the number of cases with predictor value *x,* *N*_{0} and *N*_{1} are the sample sizes for nonevents and events, respectively, and *p*_{1} = *N*_{1}/(*N*_{0} + *N*_{1}) is the a priori (or climatological) probability of the event. Finally, it must be pointed out that some information is lost any time a multifaceted quantity is reduced to a scalar. Therefore, although it is possible to order the variables according to their *η,* the plot of *P*_{1}(*x*) as a function of *x* carries more information regarding the predictive strength of the variables (see the next section).
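With the same binned estimate of *P*_{1}(*x*), the correlation ratio can be computed as follows (a sketch, not the author's code; names are illustrative):

```python
import numpy as np

def correlation_ratio(x, y, n_bins=10):
    """Correlation ratio (eta) between a binned predictor x and a
    binary dependent variable y (0 = nonevent, 1 = event)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = len(y)
    p1 = y.mean()  # a priori (climatological) probability of the event
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    num = 0.0
    for b in range(n_bins):
        in_bin = idx == b
        n_x = in_bin.sum()
        if n_x:  # skip empty bins
            num += n_x * (y[in_bin].mean() - p1) ** 2  # N_x [P1(x) - p1]^2
    return float(np.sqrt(num / (N * p1 * (1.0 - p1))))

# A perfectly separable toy case gives eta = 1.
x = np.arange(10.0)
y = (x >= 5).astype(float)
print(correlation_ratio(x, y, n_bins=2))  # → 1.0
```

Because the between-bin variation can never exceed the total variance, *η* lies between 0 and 1; a variable with no predictive strength has *P*_{1}(*x*) ≈ *p*_{1} everywhere and hence *η* ≈ 0.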

## 4. Application to tornado prediction

The National Severe Storms Laboratory’s Tornado Detection Algorithm (TDA) has recently been added to the Weather Surveillance Radar 1988-Doppler (WSR-88D) system. A descriptive outline of the TDA functionality, performance capability, strengths, and weaknesses can be found in Mitchell et al. (1998). The function of the TDA is to identify regions of strong azimuthal shear in Doppler velocity data that are often, but not always, associated with tornadoes. A strong azimuthal shear implies that a circulation is associated with a vortex. The TDA has replaced the original WSR-88D tornadic vortex signature algorithm, and as such, it is important to offer the prospective users of TDA some guidance so as to allow for a better utilization of the algorithm for predictive purposes. In particular, it is useful to know which of the many attributes of a vortex detected by TDA are most strongly associated with the occurrence of tornadoes. The answer will be considered within the context of the previously mentioned, bivariate methods.

The examined dataset consists of 43 cases (or 275 h) of WSR-88D data containing 207 tornado reports and over 173 severe wind reports from a variety of storm types from across the United States. This constitutes *N*_{0} = 7224 nontornadic circulations detected by the TDA and *N*_{1} = 730 TDA-detected tornadic circulations.^{4} Note that *N*_{0} ≫ *N*_{1}.

The predictors computed by TDA are listed in Table 1 (in no particular order); throughout this article, however, they will be referred to by the numerical labels appearing in that table. Most of the variables have a self-explanatory meaning. However, it is worth elaborating on the gate-to-gate velocity difference and on shear. The former is the velocity difference between two gates that are adjacent in azimuth and at a constant range. In contrast, shear is the velocity difference divided by the distance between the adjacent gates.

## 5. Results of the application

It is instructive to identify the variables that are correlated with one another, not only for gaining some substantive understanding of the data, but also as a check of the various methods; for example, if the predictive strengths of two highly collinear predictors are found to be significantly different, then one may suspect an error in the (bivariate) analysis. Pearson’s correlation coefficient, *r,* can again be utilized to identify the mutually correlated predictors. However, the rare-event nature of the dataset under study can cause *r* to be excessively large. For this reason, the correlation coefficients must be computed for the two classes separately: *r*^{(0)} for nontornadoes and *r*^{(1)} for tornadoes. Pairs of variables that are highly correlated for both classes may be considered statistically equivalent (or redundant). The pairs with *r*^{(0)} > 0.8 and *r*^{(1)} > 0.8 are variables 5 and 8, and 6 and 7. The statistical equivalence of these variables is evident in their scatterplots (Fig. 1). The correlation coefficient between variables 5 and 8 is *r*^{(0)} = 0.97 for the nontornadic circulations and *r*^{(1)} = 0.96 for the tornadic circulations; the correlation coefficients between variables 6 and 7 are *r*^{(0)} = 0.88 and *r*^{(1)} = 0.86. The probability that values as large as these values of *r* could be obtained by chance was computed to be zero (to 12 decimal places).^{5} The standard errors [(1 − *r*^{2})/√*N*] associated with these four values of *r* are 0.001, 0.003, 0.003, and 0.01, respectively. Therefore, to a high level of significance, the corresponding pairs of variables are highly correlated.

The linear correlations between the predictors and the dependent variable (ground truth) are given in Fig. 2. The height of each bar is a measure of the predictive strength of the corresponding variable; a positive (negative) value for *r* implies that tornadoes are associated with larger (smaller) values of the corresponding variable. The standard error for these values of *r* is approximately 0.01. It is evident that according to this measure of predictive strength, variables *x*_{3}, *x*_{4}, *x*_{1}, and *x*_{9} are the best predictors, respectively, in descending order. Also note that, as expected, the collinear variables have equal predictive strengths (within the standard error).

As described in section 3, a predictor may be dichotomized by the introduction of a decision threshold, after which some categorical measure of performance may be computed. For example, Fig. 3a shows the dependence of the three measures on the value of the decision threshold placed on variable *x*_{2} (i.e., depth). It can be seen that a CSI of 0.15 can be reached if depths larger than approximately 6100 m are forecast as tornadic. Approximately the same threshold maximizes LRC, while to obtain a maximum HSS, the threshold must be placed at *x*_{2} = 7000 m. Figure 3b shows the corresponding values of POD, FAR, and bias. For example, it can be seen that a set of forecasts that maximize HSS lead to a POD of 45%, FAR of 82%, and are nearly unbiased (bias ∼ 1). By contrast, forecasts that maximize CSI or LRC are heavily biased (bias ≫ 1). Table 2 shows the analogous quantities for all the predictors. Note that the use of CSI or LRC leads to generally higher bias values than that of HSS.
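The threshold sweep described above can be sketched as follows (hypothetical data; here HSS is the measure being maximized, and events are forecast for values at or above the threshold):

```python
import numpy as np

def hss(C1, C2, C3, C4):
    """Heidke skill score (zero for random forecasts)."""
    den = (C1 + C2) * (C2 + C4) + (C1 + C3) * (C3 + C4)
    return 2.0 * (C1 * C4 - C2 * C3) / den if den else 0.0

def best_threshold(x, y, score):
    """Sweep decision thresholds over the observed values of x,
    forecasting an event wherever x >= threshold, and return the
    (threshold, score) pair maximizing score(C1, C2, C3, C4).
    For predictors negatively correlated with the event, the
    inequality would be reversed."""
    best_t, best_s = None, -np.inf
    for t in np.unique(x):
        f = x >= t  # dichotomized forecasts
        C1 = int(np.sum((y == 0) & ~f))  # correct nonevents
        C2 = int(np.sum((y == 0) & f))   # false alarms
        C3 = int(np.sum((y == 1) & ~f))  # misses
        C4 = int(np.sum((y == 1) & f))   # hits
        s = score(C1, C2, C3, C4)
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
t, s = best_threshold(x, y, hss)
print(t, s)  # → 4.0 1.0
```

Substituting CSI or LRC for `hss` reproduces the other two columns of Table 2; the maximizing threshold generally differs from measure to measure, as Fig. 3 shows for depth.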

The maximum score reached by placing a threshold on each of the predictors is displayed in Fig. 4 for the three different measures. The height of a bar is a measure of the predictive strength of the corresponding variable. Recall that since *x*_{1}, *x*_{5}, and *x*_{8} (and to a statistically insignificant level, *x*_{10}) are negatively correlated with tornadoes (Fig. 2), subthreshold values should be forecast as tornadic.

Evidently, if maximizing CSI is the goal (Fig. 4a), then variables *x*_{3}, *x*_{4}, *x*_{1}, and *x*_{9} are the best predictors in descending order. The high (linear) correlation between the variables *x*_{5} and *x*_{8}, and *x*_{6} and *x*_{7}, is manifest in Fig. 4a by their equal predictive strength. Note that employing *x*_{10} (i.e., range) appears to yield a nonzero CSI, in spite of the lack of any theoretical or physical reason for range to be a good predictor. This can be traced to the fact that CSI is not a measure of skill in that it does not take into account random forecasting. As advocated previously, the use of CSI may lead to false conclusions regarding the predictive strength of the various predictors.

A manifestation of the aforementioned inequitability of CSI is evident in Fig. 3; note that if one places a decision threshold at zero and proceeds to declare all detected circulations as tornadic, then a nonzero CSI is obtained. This may induce a forecaster to overforecast. In fact, in a rare-event situation (i.e., *N*_{0} ≫ *N*_{1}), CSI can reach its maximum at the lowest value of the predictor (Marzban 1998a), causing severe overforecasting on the part of a forecaster who employs CSI to gauge performance. Indeed, CSI would not have been included in this analysis were it not for its popularity (due to its independence of the *C*_{1} element of the contingency table).

Coincidentally, the best predictors according to CSI are the same set of predictors that maximize HSS (Fig. 4b). The noticeable and welcome difference is that *x*_{10} is assigned a much lower predictive strength according to HSS.

If LRC is to be maximized, then the best predictors are *x*_{1}, *x*_{3}, *x*_{9}, and *x*_{4}, in descending order (Fig. 4c). The ability of LRC to better differentiate between the predictors is apparent in the erratic nature of the vertical bars in Fig. 4c. Also note that *x*_{10} emerges with an almost nonexistent predictive strength, and correctly so.

As for the probabilistic approach, the curves for *P*_{1}(*x*) are presented in Fig. 5 for all the predictors. The curve with the error bars is *P*_{1}(*x*) as a function of *x,* and the curves marked with 0 and 1 are the normalized probability densities [*N*_{0}(*x*)/*N*_{0} and *N*_{1}(*x*)/*N*_{1}]. The error bars on the *P*_{1}(*x*) curve reflect the sampling error. The range of the probabilities obtained in these plots is more meaningful if one realizes that the a priori probability of a TDA-detected circulation being tornadic, as estimated by *N*_{1}/(*N*_{1} + *N*_{0}), is about 0.09. It can be seen that variable *x*_{3} is an example of a good predictor, while a poor predictor is variable *x*_{10}.

These probability plots are multidimensional entities and, as such, do not directly allow for a quantitative ordering of the variables. Therefore, they are coarsely divided into three classes of predictive strength—poor, marginal, and good—corresponding to variables whose posterior probabilities generally vary in the 10%, 20%, and 50% ranges, respectively. The results are tabulated in Table 3.

A finer classification is possible if one allows for some loss of information. As shown in section 3, the correlation ratio can be computed from *P*_{1}(*x*). As such, *η* allows for further distillation of the information contained in *P*_{1}(*x*). The predictive strengths of the variables according to *η* are given in Fig. 6. This figure is very similar to Fig. 2; in fact, *η* is almost equal to *r* for all of the variables. The only exceptions are *x*_{1}, *x*_{2}, and *x*_{9}, which have *η* > *r*; this is consistent with Fig. 5, where it can be seen that only these variables are nonlinear.

As mentioned previously, any distillation of the probability plots leads to loss of information. For example, as seen from Fig. 5, variable *x*_{2} has little or no predictive strength for *x*_{2} < 5000 m; only for *x*_{2} > 5000 m does it begin to have any predictive strength. Even a measure like *η,* which is a measure of nonlinear correlation, leads to a single number that does not capture such nonlinearity. Said differently, a scalar measure has no diagnostic capability, though it can still determine the predictive strength of the variables.

## 6. Summary

It is argued that the task of assigning predictive strengths to a number of predictors is difficult at best. Some of the assumptions/contingencies underlying that task are discussed, and it is shown that they are avoided in a bivariate analysis, that is, one independent variable at a time. Several such methods are offered, after which they are illustrated in an application to tornado prediction. It is found that the various tornado predictors in the National Severe Storms Laboratory’s Tornado Detection Algorithm display a wide range of predictive strengths depending on the measure and the method of obtaining the predictive strength. Among the various methods and measures, however, a consensus does exist regarding the choice of the best predictors.

The analysis suggests that variables *x*_{3}, *x*_{4}, *x*_{1}, and *x*_{9}, in descending order, have the highest linear correlation and correlation ratio with tornadoes. They also produce the highest performance as gauged by CSI and HSS. Maximizing LRC, on the other hand, leads to a different order for the same variables, namely, *x*_{1}, *x*_{3}, *x*_{9}, and *x*_{4}. As for the probabilistic method, the outstanding predictors for tornadoes are *x*_{2}, *x*_{3}, *x*_{4}, and *x*_{9} (in no particular order). Variables *x*_{3} (low-level gate-to-gate velocity difference), *x*_{4} (maximum gate-to-gate velocity difference), and *x*_{9} (tornado strength index) can be considered to meet the consensus of the best predictors.^{6}

## Acknowledgments

V. Lakshmanan and A. Witt are acknowledged for valuable discussion and a thorough reading of an early version of this manuscript. Support was provided by the NOAA/Operational Support Facility and the Federal Aviation Administration.

## REFERENCES

Bevington, P. R., and D. K. Robinson, 1992: *Data Reduction and Error Analysis for the Physical Sciences.* McGraw-Hill, 328 pp.

Croxton, F. E., and D. J. Crowden, 1955: *Applied General Statistics.* Prentice-Hall, 843 pp.

Draper, N. R., and H. Smith, 1981: *Applied Regression Analysis.* John Wiley and Sons, 709 pp.

Fienberg, S. E., 1977: *The Analysis of Cross-Classified Categorical Data.* The MIT Press, 190 pp.

Gandin, L. S., and A. Murphy, 1992: Equitable skill scores for categorical forecasts. *Mon. Wea. Rev.,* **120,** 361–370.

Marzban, C., 1998a: Scalar measures of performance in rare-event situations. *Wea. Forecasting,* **13,** 753–763.

——, 1998b: Bayesian probability and scalar performance measures in Gaussian models. *J. Appl. Meteor.,* **37,** 72–82.

——, and G. Stumpf, 1996: A neural network for tornado prediction based on Doppler radar–derived attributes. *J. Appl. Meteor.,* **35,** 617–626.

——, and ——, 1998: A neural network for damaging wind prediction. *Wea. Forecasting,* **13,** 151–163.

——, and V. Lakshmanan, 1999: On the uniqueness of Gandin and Murphy’s equitable performance measures. *Mon. Wea. Rev.,* **127,** 1134–1136.

——, H. Paik, and G. Stumpf, 1997: Neural networks vs. Gaussian discriminant analysis. *AI Appl.,* **11,** 49–58.

Mitchell, E. D., S. V. Vasiloff, A. Witt, M. D. Eilts, G. J. Stumpf, J. T. Johnson, and K. W. Thomas, 1998: The National Severe Storms Laboratory Tornado Detection Algorithm. *Wea. Forecasting,* **13,** 352–366.

Panofsky, H. J., and G. E. Brier, 1968: *Some Applications of Statistics to Meteorology.* The Pennsylvania State University, 224 pp.

Stumpf, G. J., A. Witt, E. D. Mitchell, P. L. Spencer, J. T. Johnson, M. D. Eilts, K. W. Thomas, and D. W. Burgess, 1998: The National Severe Storms Laboratory Mesocyclone Detection Algorithm for the WSR-88D. *Wea. Forecasting,* **13,** 304–326.

Tacq, J., 1997: *Multivariate Analysis Techniques in Social Science Research.* Sage Publications, 411 pp.

## APPENDIX

### Correlation Coefficients

In this appendix, the formulas for the linear correlation coefficient, *r,* and the correlation ratio, *η,* are given and specialized to the case wherein the dependent variable is binary (0 or 1).

The linear correlation coefficient gauges the proportion of the total variance of the dependent variable that is explained by the linear fit *y*(*x*_{i}) = *ax*_{i} + *b.* That quantity is called the coefficient of determination and can be written as

*r*^{2} = 1 − Σ_{i}[*y*_{i} − *y*(*x*_{i})]^{2}/Σ_{i}(*y*_{i} − *ȳ*)^{2},

where *y*_{i} is the *i*th observation of the dependent variable and *ȳ* is its mean over all *N* observations. When the *y*_{i} take 0 or 1 values, then *ȳ* = *N*_{1}/(*N*_{0} + *N*_{1}), where *N*_{0} and *N*_{1} are the sample sizes for the two classes. Note that this ratio is nothing but the a priori or climatological probability of tornado, *p*_{1}. Similarly, the variance of *y* can be written as

Σ_{i}(*y*_{i} − *ȳ*)^{2} = (*N*_{0} + *N*_{1})*p*_{1}(1 − *p*_{1}).

It then follows that *r*^{2} can be written as

*r*^{2} = [(*x̄*_{1} − *x̄*_{0})/*s*_{x}]^{2} *p*_{1}(1 − *p*_{1}),

where *x̄*_{0} and *x̄*_{1} are the means of the independent variable in the two classes and *s*_{x} is its standard deviation over the full sample; that is, *r* is proportional to the distance between the means of the independent variables in the two classes. As such, it is better described as a measure of discrimination or association.

A nonlinear generalization of *r* is the correlation ratio, *η.* As in *r,* it is defined as the proportion of the total variance that is explained by the fit, but in contrast with *r,* the fit is not assumed to be linear. However, since the form of the nonlinear curve is not specified, *η* is instead defined in terms of the average of the dependent variable for specific values of the independent variable. Specifically, its square can be written as

*η*^{2} = Σ_{x} *N*_{x}(*ȳ*_{x} − *ȳ*)^{2}/Σ_{i}(*y*_{i} − *ȳ*)^{2},

where *ȳ*_{x} is the average of the dependent variable corresponding to some specified value of the independent variable *x* and the summation in the numerator is over the full range of *x.* Here, *N*_{x} is the sample size for that value of *x,* and it should be sufficiently large as to assure the smooth variation of *ȳ*_{x} with *x.* If *y*_{i} = 0, 1, then

*ȳ*_{x} = *N*_{1}(*x*)/[*N*_{0}(*x*) + *N*_{1}(*x*)] = *P*_{1}(*x*),

where *N*_{1}(*x*) and *N*_{0}(*x*) are the sample sizes for the two classes but for a specific value of *x.* Combining the above equations results in

*η*^{2} = Σ_{x} *N*_{x}[*P*_{1}(*x*) − *p*_{1}]^{2}/[(*N*_{0} + *N*_{1})*p*_{1}(1 − *p*_{1})].

In other words, *η*^{2} is a measure of the amount by which the posterior probability of tornado differs from the a priori probability of tornado, averaged over the full range of *x.*

Fig. 2. The (linear) correlation coefficient, *r,* between the dependent variable (ground truth) and each of the predictors (see Table 1). Standard error = 0.01.

Citation: Weather and Forecasting 14, 6; 10.1175/1520-0434(1999)014<1007:TNOBPA>2.0.CO;2

Fig. 3. (a) Performance measures, CSI (solid curve), HSS (dashed curve), and LRC (dashed–dotted curve), and (b) POD, FAR, and bias, as a function of the value of the decision threshold placed on the predictor *x*_{2} (i.e., depth). The horizontal (dotted) line has been drawn to point out the threshold at which bias = 1.

Fig. 4. The maximum value of three performance measures obtained by dichotomizing the predictors.

Fig. 5. The posterior probability of tornado, given the value of the variable (the curve with error bars), and the probability densities for nontornadoes (labeled with 0) and tornadoes (labeled with 1).

Fig. 6. The correlation ratio, *η,* between the dependent variable (ground truth) and each of the predictors. Standard error = 0.01.

Table 1. The list of the variables and their corresponding labels. Consult Mitchell et al. (1998) for a precise definition of the variables.

Table 2. The decision thresholds yielding the maximum obtainable scores CSI, HSS, and LRC, and the corresponding values of POD, FAR, and bias. No thresholds are given for variable *x*_{10}, since it has no true predictive strength.

Table 3. The predictive strength—good, marginal, poor—of each variable according to the probabilistic approach.

^{1}

Throughout this article, unless otherwise stated, a *model* shall refer to a statistical model.

^{2}

A rare-event situation refers to when an event is far less likely than the nonevent.

^{3}

The authors are indebted to one of the reviewers for pointing out the existence of this measure.

^{4}

Tornadic circulations are those that can be associated in space and time with a reported tornado (ground truth). A time window is applied such that associated circulations present within 20 min before the starting time of the tornado, and 6 min after the ending time, are also deemed tornadic.

^{5}

The probability of obtaining, from two uncorrelated variables, a value of *r* as large as the observed value is found from the probability density of *r* (Bevington and Robinson 1992)

*p*(*r*) = (1/√π)[Γ((*ν* + 1)/2)/Γ(*ν*/2)](1 − *r*^{2})^{(*ν*−2)/2},

where *ν* = *N* − 2 is the number of degrees of freedom for an experimental sample of size *N.*

^{6}

It must be emphasized that, given the bivariate nature of the analysis, the top two predictors would not necessarily constitute the best pair of predictors.