## Abstract

In previous work the authors demonstrated an empirical relation, in the form of an index, between U.S. monthly tornado activity and monthly averaged environmental parameters. Here a detailed comparison is made between the index and reported tornado activity. The index is a function of two environmental parameters taken from the North American Regional Reanalysis: convective precipitation (cPrcp) and storm relative helicity (SRH). Additional environmental parameters are considered for inclusion in the index, among them convective available potential energy, but their inclusion does not significantly improve the overall climatological performance of the index. The aggregate climatological dependence of reported monthly U.S. tornado numbers on cPrcp and SRH is well described by the index, although it fails to capture nonsupercell and cool season tornadoes. The contributions of the two environmental parameters to the index annual cycle and spatial distribution are examined with the seasonality of cPrcp (maximum during summer) relative to SRH (maximum in winter) accounting for the index peak value in May. The spatial distribution of SRH establishes the central U.S. “tornado alley” of the index, while the spatial distribution of cPrcp enhances index values in the South and Southeast and suppresses them west of the Rockies and over elevation. At the scale of the NOAA climate regions, the largest deficiency of the index climatology occurs over the central region where the index peak in spring is too low and where the late summer drop-off in the reported number of tornadoes is poorly captured. This index deficiency is related to its sensitivity to SRH, and increasing the index sensitivity to SRH improves the representation of the annual cycle in this region. The ability of the index to represent the interannual variability of the monthly number of U.S. tornadoes can be ascribed during most times of the year to interannual variations of cPrcp rather than of SRH. 
However, both factors are important during the peak spring period. The index shows some skill in representing the interannual variability of monthly tornado numbers at the scale of NOAA climate regions.

## 1. Introduction

The question of how climate signals such as the Madden–Julian oscillation (MJO), the El Niño–Southern Oscillation (ENSO), and changes in radiative forcing influence tornado activity is an important one and has been the subject of a number of recent studies (Trapp et al. 2007; Cook and Schaefer 2008; Trapp et al. 2009; Lee et al. 2013; Weaver et al. 2012; Thompson and Roundy 2013; Barrett and Gensini 2013; Diffenbaugh et al. 2013). Direct treatment of this question theoretically, statistically, or numerically is highly challenging for the following reasons:

1. the dynamics of tornadogenesis is highly complex and incompletely understood;
2. a long-term, high-quality homogeneous tornado report record is unavailable; and
3. numerical models that resolve climate signals do not currently resolve tornadoes.

On weather time scales, information about the environmental “ingredients” associated with severe weather and tornadic storms has proved useful to forecasters in interpreting observed soundings and short-range forecasts, and many studies have examined the question of which local environmental quantities are most informative regarding the likelihood of tornado formation (Maddox 1976; Brooks et al. 1994; Rasmussen and Blanchard 1998; Brooks et al. 2003; Grams et al. 2012). These studies have used environmental quantities from soundings in the proximity of severe thunderstorms and subdaily reanalysis data. Overall, measures of vertical wind shear and potential updraft strength have been found to be effective in characterizing environments that are conducive to tornado occurrence. However, tornadogenesis depends on multiple small-scale processes in addition to the ambient environment, and even when the environment is favorable and a thunderstorm has formed, the occurrence or nonoccurrence of a tornado remains highly uncertain (Wurman et al. 2012).

A similar ingredient-based approach has been used to study the connection between climate and tropical cyclones (TCs), beginning with the development by Gray (1979) of an empirical TC genesis “index” that characterizes the suitability of the local environment for TC genesis. Prior to any TC genesis index, Gray (1968) used climatological values of key environmental parameters such as vertical wind shear and relative vorticity to explain much of the global distribution and seasonal cycle of observed TC occurrence. Extensions and generalizations of Gray’s TC genesis index (e.g., Emanuel and Nolan 2004; Tippett et al. 2011) have been used to study the modulation of TC genesis frequency in observations and models by climate signals including the MJO, ENSO, and climate change (e.g., Camargo et al. 2007a,b; Vecchi and Soden 2007; Nolan et al. 2007; Camargo et al. 2009; Lyon and Camargo 2009; Yokoi et al. 2009; Yokoi and Takayabu 2009). Prediction of the environments favorable to TC formation is one method for producing seasonal hurricane activity forecasts (Vecchi et al. 2011).

A key distinction between the development and application of environmental indices in the tornado and TC studies listed above, in addition to the disparate phenomena under consideration, is the time and space scales of interest. Prediction of severe weather events is a prominent goal in the tornado context, and the use of high-spatial resolution and subdaily data is key in order to characterize as accurately as possible where and when tornadoes will occur. On the other hand, the aim of TC ingredient indices is often to describe the basin-scale modulation of TC activity by large-scale climate variability on time scales of weeks to decades. Consequently, TC environment indices are often based on data with temporal and spatial scales that are large compared to those of any single TC genesis event. The aim of the present work is to develop and assess the utility of tornado environment indices for capturing the variability of tornado activity related to climate variability, rather than for prediction of specific events, and this aim is the reason for our use of monthly averaged environmental quantities on a fairly coarse spatial grid.

An empirical relationship, expressed in the form of an environmental index, has recently been demonstrated between monthly averaged environmental quantities and tornado activity over the contiguous United States (CONUS; Tippett et al. 2012). As previously mentioned, the use of monthly averages is a significant distinction from previous work (e.g., Brooks et al. 2003, and others cited above), which has used environmental quantities on shorter (typically 6-hourly) time scales. The degree to which the monthly index covaries with reported tornado activity provides evidence for a connection between quantities varying on climate time scales and tornado activity. Such a connection is noteworthy since the lifetime of a tornadic event is no more than a few hours and often only a few minutes. Changes in the frequency of extreme subdaily environments associated with tornado occurrence correspond to changes in the tail of the distribution of environments occurring in the course of a month. At least conceptually, such changes can be caused by either changes in the mean or by changes in the spread of the distribution of environments. This idea is illustrated in Fig. 1, which shows two distributions with enhanced probability of exceeding the 90th percentile—in one case due to increased spread and in the other due to a shift in mean. The success of the monthly index suggests that changes in the frequency of extreme environments are to some extent accompanied by changes in the monthly average of those environments.
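The two mechanisms described above can be illustrated numerically. The following is a minimal sketch that assumes normally distributed environments; the shift of 0.5 and the spread factor of 1.5 are hypothetical values chosen for illustration and are not taken from Fig. 1.

```python
from statistics import NormalDist


def exceedance_probability(mean, std, threshold):
    """Probability that a normal variable exceeds the given threshold."""
    return 1.0 - NormalDist(mu=mean, sigma=std).cdf(threshold)


# Baseline distribution of environments (standard normal, an assumption).
baseline = NormalDist(mu=0.0, sigma=1.0)
threshold = baseline.inv_cdf(0.90)  # 90th percentile of the baseline

p_base = exceedance_probability(0.0, 1.0, threshold)    # 0.10 by construction
p_shift = exceedance_probability(0.5, 1.0, threshold)   # mean shifted upward
p_spread = exceedance_probability(0.0, 1.5, threshold)  # spread increased
```

In both modified cases the probability of exceeding the baseline 90th percentile is larger than 0.10, but only the shifted case would also be visible in the monthly mean.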

Tippett et al. (2012) investigated some general properties of the monthly tornado index including the climatological number of CONUS tornadoes per month predicted by the index, the annually averaged spatial pattern of the index, and the interannual variability of the number of CONUS tornadoes predicted by the index. However, more detailed analysis is required if the index is to be used with any confidence as a tool to diagnose the impact of climate signals on tornadic activity. Here we examine the properties of the index in more depth, including aspects of the environmental parameter selection, systematic deficiencies, and regional behavior. The paper is organized as follows. Tornado and environmental data are described in section 2. Index construction and parameter selection are discussed in section 3. The annual cycle of the index is described in section 4, and its interannual variability is described in section 5. A summary and future prospects are given in section 6.

## 2. Data

### a. Tornado data

U.S. tornado data covering the period 1950–2010 is provided by the National Centers for Environmental Prediction (NCEP) Storm Prediction Center (SPC) tornado, hail, and wind database in the form of reports (Schaefer and Edwards 1999). As has been discussed extensively by other authors, substantial variability in the tornado report record is unrelated to tornado activity and is due to changes in reporting practices, introduction of Doppler radar, and other changes in technology (Verbout et al. 2006; Doswell et al. 2009). The annual number of reported weak (F0) tornadoes has increased dramatically, roughly doubling over the last 60 years (Fig. 2a), consistent with the findings of Brooks and Doswell (2001). The annual numbers of reported F1 (Fig. 2b) and F2–F5 (Fig. 2c) tornadoes do not show such strong trends, but there are some indications of changes occurring in the late 1970s and 1980s, especially in the F2–F5 reports. Reported annual totals from the last 20 years seem relatively homogeneous across each of the intensity levels. Ideally, the nonphysical variations in the observational record could be removed from the tornado record, and the SPC does compute an “inflation adjusted” annual number of U.S. tornadoes using a trend line for the period 1954–2007 and taking 2007 as a baseline. [The inflation-adjusted tornado count was developed by Harold Brooks of the National Severe Storms Laboratory (NSSL) and Greg Carbin of the SPC and is described at http://www.spc.noaa.gov/wcm/adj.html.] However, there is no rigorous justification for the use of a linear correction to tornado frequency. Spatially varying features of the observational records are even more difficult to quantify. Limiting our attention to reports of more intense tornadoes or to more recent periods (last two decades) has the disadvantage of substantially reducing the sample size, which may be a problem, especially for characterizing the spatial dependence.

The temporal inhomogeneities associated with the tornado record are a primary reason for taking the spatially varying, monthly averaged tornado report *climatology* as the target of the index fitting procedure (Tippett et al. 2012). Doing so avoids the possibility of the statistical analysis spuriously associating nonphysical changes (trends and shifts) in the tornado record with coincidental physical variations. This choice also leaves the interannually varying record as an independent dataset for further verification. Disadvantages of this approach are that the range of covariability of tornado occurrence with environmental parameters in the climatology data is greatly reduced, the joint distribution of parameters is only climatological, and there is no association of particular tornadic events with the physical environment. The monthly tornado data (F0 and greater) are put onto a 1° × 1° latitude–longitude grid (25°–50°N, 130°–60°W) for the 32-yr period 1979–2010. The upward trend in the number of reported tornadoes results in the gridded monthly tornado climatology being negatively biased with respect to the most recent period.
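The gridding step can be sketched as follows. This is an illustrative outline only: the function name and the input arrays (report latitude, longitude, and calendar month) are hypothetical, and the handling of reports near box edges in the actual analysis is not specified in the text.

```python
import numpy as np


def grid_monthly_climatology(lat, lon, month, n_years,
                             lat_range=(25, 50), lon_range=(-130, -60),
                             step=1.0):
    """Average number of reports per year in each (month, lat, lon) box.

    Longitudes are in degrees east (negative over the United States).
    Reports outside the domain are ignored.
    """
    lat_edges = np.arange(lat_range[0], lat_range[1] + step, step)
    lon_edges = np.arange(lon_range[0], lon_range[1] + step, step)
    month_edges = np.arange(0.5, 13.5, 1.0)  # bins centered on months 1..12
    sample = np.column_stack([month, lat, lon])
    counts, _ = np.histogramdd(sample,
                               bins=[month_edges, lat_edges, lon_edges])
    return counts / n_years
```

For the domain and period described above, the result would have shape (12, 25, 70) and `n_years` would be 32.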

### b. North American Regional Reanalysis

Monthly averaged environmental parameters are taken from the North American Regional Reanalysis (NARR; Mesinger et al. 2006). NARR is produced through the assimilation of observations into the 32-km NCEP Eta Model (Black 1994). A distinguishing feature of the NARR is its assimilation of precipitation observations as latent heating profiles, which may account for NARR precipitation products being superior to those of a number of global reanalyses (Bukovsky and Karoly 2007). Becker et al. (2009) found that NARR seasonal precipitation totals throughout the year were very close to those observed, although in the context of daily precipitation there is a tendency to underestimate extremes and overestimate lighter events, especially during summer in the eastern half of the United States.

The Betts–Miller–Janjić convective parameterization scheme (Betts 1986; Janjić 1994) found in NARR uses a convective adjustment following activation, determining appropriate temperature and moisture mean reference profiles, which it then nudges the model toward at each grid point. Activation is dependent on the stability of the parcel with the highest equivalent potential temperature in the lowest 200 hPa of the atmosphere. Based on the lifting of this parcel, parameterized cloud depth is determined, and depending on whether it is greater or less than 200 hPa, the deep precipitating (or shallow nonprecipitating) convection activates [for further details, see Baldwin et al. (2002)]. The scheme does not have an explicit triggering condition, and thus convective inhibition is handled implicitly by the profile. The adjustments following activation of the scheme are based on mean thermodynamic reference postconvective profiles from a number of global locations that are applied in such a way as to satisfy enthalpy conservation through cloud depth. This has a net result of lower-tropospheric drying and warming in the mid to upper levels. However, the activation of the precipitating scheme is very sensitive to the subcloud moisture. The presence of deep convection activation within the scheme is identified based on the presence or absence of convective precipitation in model output. The shallow part of the scheme can result in anomalous drying of the 820–920-hPa layer, potentially leading to unrealistic distortion of convective inhibition (and thereby activation of the deep convective scheme) or to either positive or negative modification of the thermodynamic environment. This occurs because both shallow and deep convective adjustments modify the profile to be monotonic convectively mixed, which can obscure small-scale vertical structures.

NARR data are provided on a 32-km Lambert conformal grid, which we interpolate to a 1° × 1° latitude–longitude grid over the CONUS (25°–50°N, 130°–60°W). Only data over land points are used. We consider monthly averages of the following NARR variables: surface convective available potential energy (CAPE), surface convective inhibition (CIN), best (four layer) lifted index (4LFTX), the difference in temperature at the 700- and 500-hPa levels divided by the corresponding difference in geopotential height (lapse rate), the average specific humidity between 1000 and 900 hPa (mixing ratio), 3000–0-m storm relative helicity (SRH), the magnitude of the vector difference of the 500- and 1000-hPa winds (vertical shear), precipitation, convective precipitation (cPrcp), and elevation. Lapse rate and vertical shear are computed using monthly averages of the constituent variables. We take the natural logarithm of CAPE, SRH, vertical shear, precipitation, and cPrcp, consistent with previous analysis of environmental factors impacting severe weather on synoptic time scales (e.g., Brooks et al. 2003).

## 3. Poisson regression and parameter selection

Tippett et al. (2012) related the climatological monthly number of U.S. tornadoes to climatological monthly averages of collocated NARR atmospheric parameters using Poisson regression. The monthly number of tornadoes summed over *T* years in a grid box is assumed to be a Poisson distributed random variable with expected value *μ*. The expected value *μ* is the *monthly tornado activity index* and is assumed to have a log-linear dependence on the environmental parameters modeled by

$$\mu = \exp\left[\mathbf{b}^{\mathrm{T}}\mathbf{x} + c + \log\left(\Delta x\,\Delta y\,T\cos\phi\right)\right], \tag{1}$$

where **x** is a vector of environmental parameters, **b** is a vector of regression coefficients, *c* is an intercept term, *ϕ* is the latitude, Δ*x* and Δ*y* are the longitude and latitude spacings in degrees, respectively, and *T* is the number of years. The last term accounts for the differing area of each grid box and the number of years used in the climatology and removes the dependence of the coefficients on grid resolution and climatology length. The regression model (1) with the same coefficients is used at all locations and all times of the year. In addition to relating tornado activity with environmental parameters, the regression can correct spatially and seasonally uniform systematic errors in the NARR environmental parameters. The regression coefficients are estimated by maximum likelihood, and a commonly used goodness of fit measure, deviance, is also determined (McCullagh and Nelder 1989).
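A minimal sketch of evaluating a log-linear index of this kind is given below. It assumes that the final term of the regression model is log(Δx Δy T cos ϕ), as the description of the area and record-length correction suggests; the function name and all coefficient values are hypothetical.

```python
import numpy as np


def tornado_index(x, b, c, lat_deg, dx_deg=1.0, dy_deg=1.0, n_years=1):
    """Expected number of tornadoes for a log-linear (Poisson) index.

    x        : environmental parameters, shape (..., p), already log-transformed
               where appropriate
    b        : regression coefficients, shape (p,)
    c        : intercept
    lat_deg  : grid-box latitude in degrees (scalar or array matching x[..., 0])
    The offset term scales the expected count by grid-box area and by the
    number of years summed.
    """
    x = np.asarray(x, dtype=float)
    offset = np.log(dx_deg * dy_deg * n_years * np.cos(np.deg2rad(lat_deg)))
    return np.exp(x @ np.asarray(b, dtype=float) + c + offset)
```

Because the offset enters inside the exponential, doubling the number of years doubles the expected count while leaving `b` and `c` unchanged, which is the sense in which the coefficients do not depend on grid resolution or climatology length.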

A key issue is the choice of environmental parameters included in the index. Including too few environmental parameters gives a model that poorly fits the data, while including too many leads to overfitting and poor performance on independent data. Tippett et al. (2012) took the previously listed set of 10 monthly averaged parameters associated with tornado occurrence and used a forward selection procedure to find the best set of parameters for a given number of parameters. This approach reduces the parameter selection problem to one of selecting the number of parameters. Increasing the number of parameters always improves the (in-sample) fit of the index to the data. However, evaluation of the fit on out-of-sample data using cross-validation showed that including more than two parameters did not produce a significant increase in the overall fit. This finding does not rule out that additional parameters might result in significant improvements in fit for particular regions or months, nor does it say anything about the utility of additional parameters outside the climatological setting.

In the simplest sense, potential updraft strength and vertical wind shear are the two basic environmental factors considered favorable for tornado activity. However, there are many related parameters that measure these conditions. The deviance-based *R*-squared (Cameron and Windmeijer 1996) values of the six best (in the sense of minimizing mean cross-validated deviance) two-parameter models are shown in Fig. 3 and range from 0.53 to 0.67. The best one-parameter model uses cPrcp and has a deviance-based *R*-squared value of 0.46, giving an indication of the benefit of including an additional predictor. The uncertainty of the estimates, shown as plus and minus one standard deviation error bars, is computed from 10 repetitions of tenfold cross-validation.^{1} These six statistical models include one parameter associated with convective instability (CAPE or cPrcp) and one associated with vertical shear (SRH, mixing ratio, or vertical shear). The model with smallest deviance uses cPrcp and SRH, and replacing SRH with vertical shear does not give a significantly different fit. This result indicates that, as in the subdaily setting, there are multiple possibilities for combinations of environmental ingredients with useful information related to tornado frequency on monthly time scales.
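For reference, a deviance-based *R*-squared for count data can be computed as one minus the ratio of the model deviance to the deviance of a constant-mean model. The sketch below follows that general definition; it is not the authors' code, and the function names are hypothetical.

```python
import numpy as np


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y log(y / mu) - (y - mu)], with 0 log 0 = 0."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.where(y > 0, y / mu, 1.0)  # log(1) = 0 handles y == 0
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))


def deviance_r2(y, mu):
    """Deviance-based R-squared relative to a constant-mean (null) model."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    null_mu = np.full_like(mu, y.mean())
    return 1.0 - poisson_deviance(y, mu) / poisson_deviance(y, null_mu)
```

A value of 0 corresponds to a model no better than the overall mean, and a value of 1 to a model that reproduces the counts exactly. In a cross-validated setting, `mu` would be the prediction for data withheld from the fit.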

The energy–helicity index (EHI), the product of SRH and CAPE, is often used on synoptic time scales as a forecast parameter (Davies-Jones 1993; Rasmussen and Blanchard 1998). A related quantity is the significant severe parameter, which is the product of CAPE and vertical shear (Davies and Johns 1993). Both of these quantities are included in the Poisson regression model framework and correspond to choosing the entries of **b** to be unity for the appropriate choice of parameters. Interestingly, the fit of the Poisson regression model with CAPE and SRH as parameters is significantly worse than that of the best two-parameter model, and the model with CAPE and shear is close to being significantly worse as well. Given the widespread use of the EHI and other CAPE-based measures on synoptic time scales, the poor performance of its constituent parameters in a monthly index deserves further investigation.

First, using either CAPE or inhibition on monthly time scales may be inappropriate because of their high-frequency fluctuations and tight coupling to convection. High CAPE often is present before major convective weather events, but CAPE is typically sharply reduced by the occurrence of deep convection. It is not clear that the time average of CAPE and the time average of deep convective activity need be related over land, though there is some relation over the tropical ocean (Bhat et al. 1996). Monthly averaged CAPE may simply fail to capture the relation with tornado activity that is observed in high-frequency data. Another possibility is that the relation between monthly averaged CAPE and tornado activity is not well fit by the functional form of the Poisson regression model. The coefficients of the Poisson regression model can be interpreted as the sensitivity of the expected monthly number of tornadoes to changes in the environmental parameters. Specifically, for a small change *δ***x** in the environmental variables, the change *δμ* in the expected number of tornadoes is given by

$$\frac{\delta\mu}{\mu} \approx \mathbf{b}^{\mathrm{T}}\,\delta\mathbf{x}.$$

That is, for a 0.01 unit change in one of the environmental parameters, the value of its coefficient is the corresponding percent change in the expected value *μ*. Equivalently, the Poisson regression coefficients are the partial logarithmic derivatives of the expected monthly number of tornadoes with respect to the environmental variables since

$$b_i = \frac{\partial \log \mu}{\partial x_i}.$$

As the coefficients are constant, the Poisson regression model assumes that the sensitivity of the number of tornadoes to the environmental parameters is constant, and, in particular, does not depend on the values of the environmental parameters.
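The percent-change interpretation can be checked numerically. In this short sketch the coefficient value is hypothetical and serves only to show that the linearized and exact changes agree for a small perturbation.

```python
import math


def percent_change(b, dx):
    """Exact percent change in mu = exp(b * x + ...) when x changes by dx."""
    return 100.0 * (math.exp(b * dx) - 1.0)


b_example = 1.5  # hypothetical coefficient, for illustration only
exact_small = percent_change(b_example, 0.01)  # close to b_example percent
exact_large = percent_change(b_example, 1.0)   # linearization no longer holds
```

For the 0.01 unit change the exact result is within about 0.01 percentage points of the coefficient itself, whereas for a unit change the exact response is far larger than the linear estimate of 150%.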

The extent to which the tornado data and environmental parameters satisfy the Poisson regression functional form was investigated using the approach of Tippett et al. (2011). For each parameter, we compute its Poisson regression coefficient for different ranges of that parameter while allowing the other parameters to vary. Essentially we are computing the partial logarithmic derivative for different values of the parameters and checking if that derivative is constant. Note that the above procedure is different from computing the average number of tornadoes as a function of one of the variables and checking for a linear relation, which would be equivalent to taking the ordinary derivative and would give rather different results in the case of correlated quantities. Specifically, here we compute the Poisson regression coefficient of each parameter over four ranges defined by the 10th–30th, 30th–50th, 50th–70th, and 70th–90th percentiles of the parameter. Error bars for the coefficient estimate are defined as twice the standard deviation of 100 bootstrap estimates of the coefficients. Figure 4a shows clearly that the coefficient of log(CAPE) is not constant. There is enhanced sensitivity of climatological tornado occurrence to CAPE in the 10th–30th percentile range that decreases until log(CAPE) ≈ 4, at which point the coefficient is roughly the same as that obtained when the complete data are used. In the subdaily data setting, Brooks (2009) noted variations in the gradient of the probability of occurrence as a function of the CAPE/shear product.
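The range-by-range coefficient estimate can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the Poisson fit is a plain Newton iteration without an offset term, the bootstrap resamples individual data points within each range, and all function names are hypothetical.

```python
import numpy as np


def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood Poisson regression (log link) by Newton's method.

    Returns an array [intercept, coefficient_0, coefficient_1, ...].
    """
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    beta[0] = np.log(y.mean())  # start from the constant-rate model
    for _ in range(n_iter):
        mu = np.exp(A @ beta)
        step = np.linalg.solve(A.T @ (A * mu[:, None]), A.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def coefficient_by_range(X, y, col, edges=(10, 30, 50, 70, 90),
                         n_boot=100, seed=0):
    """Coefficient of column `col`, fitted within percentile ranges of it.

    The other columns are free to vary within each range. Returns a list of
    (coefficient, error_bar) pairs, where error_bar is twice the standard
    deviation of the bootstrap estimates.
    """
    rng = np.random.default_rng(seed)
    cuts = np.percentile(X[:, col], edges)
    results = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        idx = np.flatnonzero((X[:, col] >= lo) & (X[:, col] < hi))
        coef = fit_poisson(X[idx], y[idx])[1 + col]
        boot = []
        for _ in range(n_boot):
            s = rng.choice(idx, size=idx.size)  # resample with replacement
            boot.append(fit_poisson(X[s], y[s])[1 + col])
        results.append((coef, 2.0 * np.std(boot)))
    return results
```

If the data follow the log-linear form exactly, the four estimates agree with the full-data coefficient to within their error bars; a systematic drift across ranges, as reported for log(CAPE), indicates that the sensitivity depends on the parameter value.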

We hypothesize that this mismatch between the observed sensitivity to monthly CAPE and that imposed by the Poisson regression functional form is the reason for the relatively poor performance of the CAPE-based models. More sophisticated models may be better able to accommodate the variable sensitivity of climatological monthly tornado activity to monthly averaged CAPE (Mestre and Hallegatte 2009; Villarini et al. 2010), or the behavior might be ameliorated with the inclusion of additional parameters. The choice of which strategy to pursue, non-log-linear dependence or additional parameters, would essentially depend on whether the behavior in Fig. 4a reflects a physical property or is an artifact of the analysis. On the other hand, Fig. 5 shows that the coefficient of cPrcp is approximately constant over the range of values and consistent with the value estimated from the complete data.

The sensitivity of the expected number of monthly tornadoes to SRH is similar whether SRH is used in conjunction with cPrcp or with CAPE. In both cases, the SRH coefficient confidence intervals over the 30th–50th percentile range fail to include the value estimated from the complete data, and there is some indication of greater sensitivity to SRH, especially in combination with CAPE (Fig. 4b). We return to this finding in later sections.

## 4. Climatological features

### a. Dependence on cPrcp and SRH

We first compare the index dependence on cPrcp and SRH with that of the observations. The index *μ*(cPrcp, SRH) expresses the expected number of tornadoes for given values of cPrcp and SRH. The corresponding observed quantity is the average number of tornadoes at all locations and months of the year with the given values of cPrcp and SRH. The observed climatological numbers of tornadoes are binned according to the corresponding values of cPrcp and SRH. Bin boundaries of cPrcp and SRH are chosen to correspond to percentiles and range from the 5th to the 95th percentile with a width of 5%. Figure 6 shows the average number of observed tornadoes and the index as functions of cPrcp and SRH. The log-linear form of the index means that its isolines as a function of log(cPrcp) and log(SRH) are straight lines. The index isolines are overlaid on the observed distribution to aid in comparison, and, for the most part, the observations and the index appear to have similar functional dependence on the parameters, especially for the parameter ranges associated with the largest number of tornadoes. The isolines of the observed distribution are not precisely straight and indicate greater sensitivity to larger values of SRH, consistent with the results of the previous section (Figs. 4b and 5b). The difference of the observations and index shows little indication of systematic bias over the parameter ranges associated with the majority of tornadoes. The largest discrepancies between the observations and the index are seen for simultaneously low values of cPrcp and SRH (the gray box marked B1 in Fig. 6c), corresponding to parameter values for which there are more observed tornadoes than predicted by the index. Conversely, for intermediate values of SRH and low values of cPrcp (the gray box marked B2 in Fig. 6c) there are no observed tornadoes while the index predicts small numbers.
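The percentile binning described above can be sketched as follows. The function name and inputs are hypothetical, and the treatment of values that fall exactly on a bin boundary is an assumption.

```python
import numpy as np


def bin_mean_by_percentile(x1, x2, y, pct=tuple(range(5, 100, 5))):
    """Mean of y in 2D bins whose edges are percentiles of x1 and x2.

    With edges at the 5th, 10th, ..., 95th percentiles there are 18 bins per
    parameter. Points outside the 5th-95th percentile range are discarded and
    empty bins are returned as NaN.
    """
    e1 = np.percentile(x1, pct)
    e2 = np.percentile(x2, pct)
    i1 = np.digitize(x1, e1) - 1  # -1 for values below the first edge
    i2 = np.digitize(x2, e2) - 1
    n = len(pct) - 1
    inside = (i1 >= 0) & (i1 < n) & (i2 >= 0) & (i2 < n)
    total = np.zeros((n, n))
    count = np.zeros((n, n))
    np.add.at(total, (i1[inside], i2[inside]), y[inside])
    np.add.at(count, (i1[inside], i2[inside]), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)
```

Applying the same function to the observed climatological counts and to the index values gives two arrays that can be differenced bin by bin, as in the comparison discussed here.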

The index biases associated with the parameter ranges in B1 and B2 correspond to fairly well-defined geographical regions and calendar months. Figure 7 shows the spatial distributions and annual cycles of the data with parameters in boxes B1 and B2. The negative bias in box B1 is seen to be due to the failure of the index to produce observed April–November tornadoes occurring west of the Rockies, concentrated in Southern California and corresponding to about 2.4 tornadoes per year. These tornadoes are likely associated with different environmental conditions than the index is able to detect and mainly comprise low CAPE and high shear environments (Hanstrum et al. 2002; Monteverdi et al. 2003; Kounkou et al. 2009). The positive bias in box B2 is due to the index indicating tornado activity mainly west of 100°W during October through April and corresponds to about 7.4 tornadoes per year. Both observations and index (by construction) have 999 tornadoes per year; Poisson regression, like linear regression, matches the mean of the data to which it is fit.

### b. Contribution of cPrcp and SRH to annual cycle and spatial pattern

The annual cycle of the total number of reported tornadoes and the annual cycle of the index are shown in Fig. 8a. The index captures the general phasing with maximum values in May and minimum values in winter. Overall the index shows less variability through the seasonal cycle than do the observations. The index shows a positive bias in August and September, a feature that we will examine in more detail later. The simplicity of the tornado index makes it possible to diagnose the contribution of the two environmental factors to the annual cycle. We compute the index with the annual cycle of SRH suppressed and with the annual cycle of cPrcp suppressed. In the first case only cPrcp contributes to the annual cycle and in the second case only SRH. The annual cycles of these single-factor indices are shown in Fig. 8b. The contribution of SRH to the annual cycle has maximum values in winter and minimum values in late summer. Nearly out of phase, the contribution of cPrcp to the annual cycle has maximum values in June and July when the contribution from SRH is nearly minimum.

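The index can be written as the normalized product of the two single-factor indices

$$\mu(x_1, x_2) = \frac{\mu(x_1, \overline{x}_2)\,\mu(\overline{x}_1, x_2)}{\mu(\overline{x}_1, \overline{x}_2)},$$

where the overbar denotes the annual average. At each location, the annual cycle is exactly the product of the two single-factor indices normalized by $\mu(\overline{x}_1, \overline{x}_2)$, which has no annual cycle. The normalized product of the spatially summed single-factor indices,

$$\frac{\langle \mu(x_1, \overline{x}_2) \rangle\,\langle \mu(\overline{x}_1, x_2) \rangle}{\langle \mu(\overline{x}_1, \overline{x}_2) \rangle},$$
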
may differ from the index annual cycle 〈*μ*(*x*_{1}, *x*_{2})〉; the notation 〈·〉 denotes the spatial sum. However, Fig. 8b shows that this product does have its maximum in May like the complete index and the observations; the two factors are nearly but not quite out of phase. The minimum of the SRH factor is in August, while the maximum of the cPrcp factor is in June. This difference in phasing is the reason that the product of the annual cycles of the two factors has its maximum in late spring when the contribution from cPrcp is already large and that from SRH is still fairly large. This result indicates that the May maximum of the index can be explained by the phasing of the annual cycles of the cPrcp and SRH contributions.
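The distinction between the pointwise and the spatially summed products can be verified with synthetic data. The sketch below assumes that each single-factor index is formed by replacing the other parameter with its annual mean at each location; the coefficients and the random monthly values are hypothetical.

```python
import numpy as np


def index(x1, x2, b1, b2, c):
    """Log-linear two-parameter index."""
    return np.exp(c + b1 * x1 + b2 * x2)


rng = np.random.default_rng(0)
x1 = rng.normal(size=(12, 5))  # 12 months at 5 locations (synthetic)
x2 = rng.normal(size=(12, 5))
b1, b2, c = 1.4, 1.9, -5.0     # hypothetical coefficients

x1_bar = x1.mean(axis=0)       # annual average at each location
x2_bar = x2.mean(axis=0)

only_x1 = index(x1, x2_bar, b1, b2, c)   # annual cycle from x1 alone
only_x2 = index(x1_bar, x2, b1, b2, c)   # annual cycle from x2 alone
norm = index(x1_bar, x2_bar, b1, b2, c)  # no annual cycle
full = index(x1, x2, b1, b2, c)

pointwise_product = only_x1 * only_x2 / norm
summed_product = only_x1.sum(axis=1) * only_x2.sum(axis=1) / norm.sum()
summed_full = full.sum(axis=1)
```

The pointwise product reproduces the full index at every location and month, while the product of the spatial sums generally differs from the spatially summed index because the sum of products is not the product of sums.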

A similar approach can be used to determine how the two factors contribute to the annually summed spatial distribution of tornado occurrence. The annual distribution of reported (3 × 3 box-averaged smoothing) tornadoes and index values shown in Figs. 9a and 9b, respectively, have similar overall patterns. The index is missing the observed maximum in the northeast corner of Colorado where nonsupercell tornadoes are common and local effects contribute to the low-layer shear in this area (Wakimoto and Wilson 1989). The index values do not extend far enough into the northern high plains and extend too far south into Texas. To quantify the impact of the two environmental parameters on the spatial distribution, we compute the index with the spatial variability of SRH suppressed and the index with the spatial variability of cPrcp suppressed. In the first case only cPrcp contributes to the spatial variability and in the second case only SRH. cPrcp enhances tornado index activity in the South and Southeast, and limits it elsewhere (Fig. 9c). The SRH factor enhances the index in the “tornado alley” region and suppresses activity in the Southeast (Fig. 9d).

### c. Regional features of the annual cycle

We compute the annual cycle of the index and the tornado reports in the nine National Oceanic and Atmospheric Administration (NOAA) climate regions (Karl and Koss 1984; Fig. 10); boundary grid boxes are weighted according to the fraction of area within the region. The Pearson (rank) correlation between the observation and index regional annual cycles exceeds 0.85 (0.83) in all regions except for the Northwest and West, where the correlation is 0.38 and 0.68 (0.24 and 0.58), respectively. Positive biases are seen for the months of August through October in the South, Central, upper Midwest, and Plains regions, a feature we will examine in some detail. The index shows a substantial negative bias in the Southeast during September that may be related to tornadoes associated with tropical cyclones, which are observed to have a different relation with environmental parameters on synoptic time scales (Schultz and Cecil 2009; Edwards et al. 2012). The index has substantially fewer tornadoes than reported in the Southwest during the period May–July and indicates too many tornadoes in the Northwest especially during the months November–June. The index has roughly the correct phasing in the West but with positive biases in winter and early spring.

An overall measure of the similarity between the observed and index climatological spatial patterns is given by their monthly pattern correlation, shown in Fig. 11. The lowest pattern correlation values occur in late summer and early fall, with the minimum occurring in September irrespective of whether the pattern correlation is centered (map average is removed) or uncentered (map average is not removed). The reason for the low pattern correlation values is seen in the spatial distributions of the July–September monthly index and tornado report climatologies (Fig. 12). Both the index and report climatologies show the northward shift of values in July. In August and September, the index weakens somewhat and shifts slightly southward. The behavior of the report climatology is rather different, showing substantially less tornado activity than does the index over the central United States. This discrepancy is especially striking in September, when the index has maximum values in the upper Midwest while the maximum report values are in the eastern and southern seaboard states. This behavior of the index is precisely what accounts for the positive bias in the annual cycle noted earlier.
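The distinction between centered and uncentered pattern correlation can be illustrated with a short sketch. The function name is hypothetical, and the version below weights all grid boxes equally, which is an assumption; the paper may apply area weighting.

```python
import numpy as np

def pattern_correlation(a, b, centered=True):
    """Correlation between two maps, treating each grid box as one sample.

    centered=True removes each map's average first; centered=False does not.
    Grid boxes that are non-finite in either map are ignored.
    """
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    ok = np.isfinite(a) & np.isfinite(b)
    a, b = a[ok], b[ok]
    if centered:
        a = a - a.mean()
        b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Two toy maps with identical structure but different map averages.
obs = np.array([[1.0, 2.0], [3.0, 4.0]])
idx = obs + 10.0
r_centered = pattern_correlation(obs, idx, centered=True)
r_uncentered = pattern_correlation(obs, idx, centered=False)
```

For these toy maps the centered value is 1 because the spatial structure is identical, while the uncentered value is below 1 because a uniform offset changes the pattern relative to zero.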

The erroneous spatial structure of the index in August and September, concentrated in the northern central United States, reflects that of the SRH, suggesting that the index response to SRH may be responsible. To understand better the positive bias of the index in this region during the late summer and early fall, we fit the index using data restricted to the box 33°–42°N and 100°–90°W. Figure 13a shows the annual cycle of tornado reports and the annual cycles of two indices: the index using coefficients estimated from all the data (“US coef.”) and an index using coefficients estimated from the box data (“box coef.”). The report annual cycle shows a much sharper decline in tornado activity in August than does the U.S. index. On the other hand, the behavior of the box index is more similar to that of the report data. The box index coefficients of cPrcp and SRH are 1.41 and 4.36, respectively, indicating that while the regional sensitivity to cPrcp is similar to its all-U.S. value, the regional sensitivity to SRH is more than double its all-U.S. value. Figure 13b shows the seasonal cycle of box-averaged cPrcp and SRH. Solid lines show the isolines of the all-U.S. index and dashed lines those of the box index. The isolines show that the index value in July using the all-U.S. coefficients is between that of April and May. Increasing the sensitivity of the index to SRH has the effect of increasing the slope of the isolines. The isolines of the index with box coefficients show that the value of the index in July is close to that of March, which is a more realistic result. Roughly speaking, the increased sensitivity to SRH results in a more vigorous annual cycle with enhanced maximum spring values and a more abrupt decline in late summer. This differing sensitivity to SRH may be due to time averaging, neglected factors, or deficiencies in NARR products. We do not believe that the sensitivity of tornado occurrence to subdaily values of SRH varies by location, all other factors being the same; in other words, we do not believe that the physics of the atmosphere varies by location (Brooks 2009).
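The connection between the SRH coefficient and the slope of the isolines can be made explicit under one assumption: that the index $I$ is the exponential of a linear combination of the logarithms of the two parameters, consistent with the log-linear Poisson regression form. The symbols $c_0$, $c_P$, and $c_S$ are notation introduced here for illustration only.

$$
\log I = c_0 + c_P \log(\mathrm{cPrcp}) + c_S \log(\mathrm{SRH})
$$

Along an isoline the index is constant, so

$$
c_P \, d\log(\mathrm{cPrcp}) + c_S \, d\log(\mathrm{SRH}) = 0
\quad\Longrightarrow\quad
\frac{d\log(\mathrm{cPrcp})}{d\log(\mathrm{SRH})} = -\frac{c_S}{c_P}.
$$

With the box coefficients $c_P = 1.41$ and $c_S = 4.36$, the ratio $c_S / c_P$ is about 3.1, so a small fractional decrease in SRH must be offset by a fractional increase in cPrcp roughly three times as large for the index to stay unchanged. Because the box value of $c_S$ is more than double the all-U.S. value while $c_P$ is similar, this ratio is correspondingly larger for the box index. Whether this appears as a steeper or shallower line depends on which parameter is plotted on which axis.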

## 5. Interannual variability

The CONUS-summed index values computed with interannually varying NARR data were shown to correlate well with total numbers of CONUS reported tornadoes on a monthly as well as an annual basis (Tippett et al. 2012). We assess the relative importance of the two environmental parameters for characterizing interannual variability by computing the index using climatological values of one of the parameters and interannually varying values of the other parameter, and then computing the correlation between the resulting single-factor index and reported tornado numbers. Table 1 shows the Pearson and rank correlations between CONUS sums of index values and reported numbers of tornadoes by calendar month. During most months of the year, the index computed with climatological SRH and interannually varying cPrcp has nearly the same correlation with CONUS totals as does the full index. On the other hand, when only interannually varying SRH is included in the index, the resulting correlation is insignificant in the majority of months. Similar to the climatological setting, where the best one-parameter model was the one based on cPrcp, the interannual variability of cPrcp alone explains much of the interannual variability of reported tornado numbers. Only in May and June does the inclusion of interannually varying SRH lead to a marked increase in the correlation. This finding suggests that during the peak activity period both factors contribute to interannual variability, a result with important implications for prediction. First, Tippett et al. (2012) showed that, on average, monthly predictions of cPrcp had lower skill than those of SRH in initialized coupled GCM forecasts. Second, accurate prediction of peak season variability requires accurate forecasts of both cPrcp and SRH. Table 1 also shows the corresponding correlations when the index is constructed using CAPE rather than cPrcp; the values are somewhat lower, especially in April.
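The single-factor experiment can be sketched for one calendar month as follows. This is a hedged illustration on synthetic data rather than the authors' procedure: the log-linear form of the index, the use of the multi-year mean of the logged field as the "climatological" value, the function names, and the coefficient values are all assumptions.

```python
import numpy as np

def conus_sum(log_cprcp, log_srh, c0, c1, c2):
    """Sum the log-linear index over the grid.

    Inputs broadcast to shape (n_years, ny, nx); output has shape (n_years,).
    """
    return np.exp(c0 + c1 * log_cprcp + c2 * log_srh).sum(axis=(1, 2))

def factor_series(log_cprcp, log_srh, c0, c1, c2):
    """Yearly index totals with both, or only one, parameter varying in time."""
    clim_cprcp = log_cprcp.mean(axis=0, keepdims=True)  # climatology over years
    clim_srh = log_srh.mean(axis=0, keepdims=True)
    return {
        "full": conus_sum(log_cprcp, log_srh, c0, c1, c2),
        "cprcp_only": conus_sum(log_cprcp, clim_srh, c0, c1, c2),
        "srh_only": conus_sum(clim_cprcp, log_srh, c0, c1, c2),
    }

# Synthetic fields for one calendar month over 30 years (placeholders).
rng = np.random.default_rng(0)
n_years, ny, nx = 30, 4, 5
log_cprcp = rng.normal(0.0, 0.5, size=(n_years, ny, nx))
log_srh = rng.normal(3.0, 0.2, size=(n_years, ny, nx))
series = factor_series(log_cprcp, log_srh, c0=-5.0, c1=1.0, c2=1.0)

# Synthetic "reports" drawn from a Poisson distribution around the full index.
reports = rng.poisson(series["full"])
for name, values in series.items():
    r = np.corrcoef(values, reports)[0, 1]
    print(f"{name}: r = {r:.2f}")
```

Comparing the three correlations for each calendar month is the kind of summary reported in Table 1; with real data the reported tornado counts would replace the synthetic `reports` array.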

To assess the ability of the index to represent regional tornado activity, we computed the monthly and annual number of tornadoes for each of the nine NOAA climate regions and compared the resulting time series with the corresponding index values. Regional Pearson and rank correlations on a monthly and annual basis are given in Tables 2 and 3, respectively. Regions and months averaging fewer than one tornado per year are omitted. The South, Southeast, and Central regions average more than one tornado per month throughout the year, and significant skill is seen in most months, with August–October tending to have poor skill depending on region and skill measure. Deficiencies in explaining the annual cycle are apparently reflected in the representation of interannual variability. Regional correlations are generally lower than CONUS ones, reflecting increased noise due to reduced averaging. Correlations of annual values are generally lower than those of monthly values, since the correlation of the annual total is negatively impacted by temporally varying biases in mean and amplitude. Even in the Central and upper Midwest regions, where there is a mean bias, the correlation is still fairly good. The correlation values for the index computed with observed parameters are presumably an upper bound for the skill of forecasts based on this index, since forecast skill is limited by the imperfect relation between index and tornado reports, as well as by the ability to predict the parameters.

## 6. Summary and conclusions

We have examined the properties of a recently developed empirical index (Tippett et al. 2012) designed to represent the expected monthly number of U.S. tornadoes as a function of monthly averaged convective precipitation (cPrcp) and storm relative helicity (SRH) taken from the North American Regional Reanalysis. Here we have examined its construction and characteristics in more detail, including aspects of the environmental parameter selection, systematic deficiencies, and regional behavior. While the convective available potential energy (CAPE) appears as a factor in many tornado indices, we find here that CAPE does not fit the log-linear functional form of the Poisson regression, and cPrcp takes its place as an indicator of potential updraft strength. Model cPrcp has been previously used to account for thunderstorm initiation in conjunction with CAPE and vertical shear (Trapp et al. 2009) but introduces the complication that the detailed features of cPrcp are expected to be sensitive to model convective parameterization schemes.

Pooling all locations and months of the calendar year, we find that the index favorably represents the climatological dependence of monthly tornado numbers on cPrcp and SRH. The index does fail to account for a significant number of presumably nonsupercell tornadoes in Colorado and Florida. The index also does not represent modest numbers of cool-season tornadoes reported in Southern California that occur when the values of cPrcp, and implicitly instability, are too low for tornado occurrence to be likely according to the index. The index also indicates that SRH values are adequate for small numbers of tornadoes to occur west of the Rockies when few or none are reported.

The contributions of the two environmental parameters to the index are mostly independent, both with respect to annual cycle and spatial distribution. The annual cycles of the index and of the reported tornado numbers show similar phasing, although the index fails to capture the peak magnitude in May. The May peak of the index can be inferred from the relative phases of the annual cycles of SRH and cPrcp considered separately. In May, cPrcp is increasing and already fairly large, and SRH, although declining from its winter peak, is still large. In terms of the climatological spatial distribution, cPrcp serves to favor the southern part of the United States and suppresses the index west of the Rockies and over elevation. SRH strongly enhances the index over the central United States and counteracts the role of cPrcp in the Southeast. These findings apply only to the monthly climatology and may be less relevant for day-to-day variability.

The largest deficiency in the annual cycle of the index occurs in late summer over the central United States, where it indicates a greater number of tornadoes than are reported. We found that this behavior can be explained in terms of the sensitivity of the index to SRH. When the index was fit using only data from this region, the sensitivity to SRH more than doubled. Increasing the sensitivity of the index to SRH resulted in the index having a more vigorous annual cycle with a larger spring peak value and a more rapid decline in late summer.

The index demonstrates some ability to represent the interannual variability of the number of U.S. tornadoes per month. During most months, cPrcp explains more of this variability than does SRH. However, both factors are important during the peak spring period. The regional variability of the index at the scale of the NOAA climate regions captures aspects of both annual cycle and interannual variability.

## Acknowledgments

MKT and JTA are supported by grants from the National Oceanic and Atmospheric Administration (NA05OAR4311004 and NA08OAR4320912), the Office of Naval Research (N00014-12-1-0911), and a Columbia University Research Initiatives for Science and Engineering (RISE) award. AHS and SJC acknowledge support from NOAA Grant NA08OAR4320912. The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies.

## REFERENCES

*The Tornado: Its Structure, Dynamics, Prediction, and Hazards, Geophys. Monogr.,* Vol. 79, Amer. Geophys. Union, 573–582.

*17th Conf. on Severe Local Storms,* St. Louis, MO, Amer. Meteor. Soc., 107–111.

**110,** 16 361–16 366, doi:10.1073/pnas.1307758110.

*26th Conf. on Hurricanes and Tropical Meteorology,* Miami, FL, Amer. Meteor. Soc.,

*Meteorology over the Tropical Oceans,* D. B. Shaw, Ed., Royal Meteorological Society, 155–218.

*Generalized Linear Models.* 2nd ed. Chapman and Hall, 532 pp.

*11th Conf. on Applied Climatology,* Dallas, TX, Amer. Meteor. Soc., 215–220.

**141,** 2087–2095, doi:10.1175/MWR-D-12-00173.1.

*Geophys. Res. Lett.,* **34,** L08702,

## Footnotes

^{1} Tenfold cross-validation consists of splitting the data into 10 randomly selected sets, estimating the coefficients from 9 of those sets, and validating on the tenth.