Destruction and fatalities from recent tornado outbreaks in North America have raised considerable concerns regarding their climatic and geographic variability. However, regional characterization of tornado activity in relation to large-scale climatic processes remains highly uncertain. Here, a novel Bayesian hierarchical framework is developed for elucidating the spatiotemporal variability of the factors underlying tornado occurrence in North America. It is demonstrated that regional variability of tornado activity can be characterized using a hierarchical parameterization of convective available potential energy, storm relative helicity, and vertical wind shear quantities. It is shown that the spatial variability of tornado occurrence during the warm summer season can be explained by convective available potential energy and storm relative helicity alone, while vertical wind shear is clearly better at capturing the spatial variability of the cool season tornado activity. The results suggest that the Bayesian hierarchical modeling approach is effective for understanding the regional tornadic environment and in forming the basis for establishing tornado prognostic tools in North America.
Tornadoes are one of nature’s most hazardous phenomena, capable of causing significant destruction and numerous injuries and fatalities. In April 2011, the United States experienced 758 tornadoes that resulted in 363 deaths (Doswell et al. 2012). In May 2011, one enhanced Fujita scale category 5 (EF5) (Edwards et al. 2013) tornado caused the single deadliest tornadic event in history, with 158 deaths in Joplin, Missouri (Doswell et al. 2012). Over that year, tornadoes in the United States caused a total of over $28 billion in property damages (Doswell et al. 2012). Canada has also experienced significant losses caused by tornado activity, though to a lesser degree than the United States. For example, on 31 May 1985, a U.S.–Canadian tornado outbreak spawned over 40 events, with 14 in Ontario, including 2 with Fujita scale category 4 (F4) intensity (Fujita 1971). The latter outbreak resulted in 12 deaths in Ontario and destroyed hundreds of buildings with an estimated cost of over $200 million (Witten 1985). On June 2007, Canada experienced its first F5 tornado in southern Manitoba (McCarthy et al. 2008). Significant tornadic events have continued to occur in Canada in recent years (e.g., 17 tornadoes on 2 August 2006; 19 tornadoes on 20 August 2009; and an F3 tornado in Goderich, Ontario, on 21 August 2011) (Cheng et al. 2013). These events have brought the study of tornado activity to the forefront of climatological research, raising questions regarding the relative influence of large-scale climatic signals, the importance of changes in global radiative forcing, and the need to delineate the role of the underlying atmospheric processes at the appropriate spatiotemporal scale (Trapp et al. 2007; Diffenbaugh et al. 2013; Tippett et al. 2014).
Addressing these questions can be quite challenging because our understanding of the complex set of processes leading to tornadogenesis is arguably incomplete. Numerical weather prediction and global climate models resolve large-scale atmospheric conditions, which are then incorporated in severe weather forecasting tools to predict the occurrence of a regional tornadic event from hours to days (outbreak) in advance. Interestingly, most of the predictive attempts in the literature aimed at associating parameters that characterize the nearly immediate (hourly) atmospheric conditions with the occurrence of tornadic events (Brooks et al. 1994; Rasmussen and Blanchard 1998; Thompson et al. 2003). One of the important insights gained by subdaily observations of tornadic supercells is that supercell tornadogenesis is favored in environments characterized by distinctly higher levels of relative humidity and wind shear (Rasmussen and Blanchard 1998; Brooks et al. 2003; Craven et al. 2004). Other studies further suggested tornadogenesis as a “Goldilocks” problem requiring an optimal balance of storm-generated baroclinic vorticity, typically enhanced by cold outflows that strengthen the mesocyclones but weakened by low-level moisture that offsets the large negative buoyancy (Markowski et al. 2003; Markowski and Richardson 2009, 2014). Nonetheless, despite the improvements in tornado forecasting obtained through subdaily study resolutions, regional-scale tornado activity might not be predicted exclusively by the most immediate factors of tornadogenesis. Instead, larger-scale climate patterns may affect the sequence of atmospheric processes that ultimately determine the spatiotemporal development of tornadic storms (Cook and Schaefer 2008; Barrett and Gensini 2013).
Founded upon the idea that large-scale climate variability cascades down and shapes the prevailing conditions of tornadic environments, atmospheric parameters in longer time scales can conceivably be used as proxies of tornado activity. Counter to the modeling practices in other fields (e.g., limnology), studies have only recently adopted a coarser resolution by examining the capacity of atmospheric parameters expressed on monthly or seasonal time scales to explain the spatial distribution and seasonal cycle of tornado activity (Tippett et al. 2012, 2014; Cheng et al. 2015). In subdaily time scales, attempts to delineate the conditions preceding tornadic events have recognized that tornadic environments (the atmospheric parameters) may vary both regionally and seasonally (Hagemeyer and Schmocker 1991; Hanstrum et al. 2002; Monteverdi et al. 2003; Brooks 2009; Grams et al. 2012; Thompson et al. 2012). Importantly, the lifting mechanisms (Doswell et al. 1996), one of the key elements to producing the convective initiation processes, are not explicitly considered and unlikely to be extracted by current reanalysis datasets (Brooks 2009). Further, Brooks (2009, 546–547) argued the following:
Although it seems reasonable to believe that the physics of the atmosphere does not change from location to location, the fact that some important ingredients are not measured by soundings may limit the global applicability. Also, there is no reason to believe that all of the conditions that could be supportive of a particular kind of thunderstorm are necessarily sampled in the United States. It is conceptually plausible that the atmosphere may be capable of producing tornadoes, for instance, from a variety of environmental conditions but that some subset of those conditions do not occur often enough in the United States to be sampled with great frequency.
Moreover, tornadoes in different regions may tend to originate and evolve from very different spatiotemporal atmospheric trajectories [e.g., supercellular storms, quasi-linear convective systems (QLCS), and “landspout” tornadogenesis (Lemon and Doswell 1979; Klemp and Rotunno 1983; Wakimoto and Wilson 1989; Lee and Wilhelmson 1997; Trapp and Weisman 2003; Trapp et al. 2005; Atkins and St. Laurent 2009)] and so would have different relative tornadic environments (Grams et al. 2012; Smith et al. 2012; Thompson et al. 2012). Even the same type of tornadic storms under the same convective type shows substantial spatial variability (Thompson et al. 2014).
Although the relationship between atmospheric parameters and tornado occurrence in different time scales has partly been documented (Tippett et al. 2012, 2014; Cheng et al. 2015), the geographic variability of their relationship with tornado occurrences has yet to be factored in predictive frameworks. In this regard, it may be advantageous to consider a flexible hierarchical strategy that relaxes the assumption of a globally common parameterization in order to accommodate the region-specific tornado formation mechanisms during different periods of the year. Furthermore, because our knowledge of the tornado climatology is hampered by our monitoring capacity, the observational error should be explicitly considered so that the interplay between atmospheric processes and tornado activity is not obfuscated by nonmeteorological factors. The latter source of uncertainty is highly relevant to the monitoring practices of large regions for which complete tornado observations could never be practically gathered. In this study, we use a Bayesian hierarchical framework to depict the causal linkages between the annual or seasonal tornado occurrence and relevant atmospheric predictors that characterize most tornadic environments over the contiguous United States and Canada. Specifically, convective available potential energy (CAPE), a quantity of vertically integrated buoyant energy available to the storm, proportional to storm updraft strength, and 0–3-km storm-relative environmental helicity (SREH), a quantity of the streamwise vorticity within the inflow environment of a storm, therefore the potential strength of the updraft rotation (Davies-Jones et al. 1990); magnitude of the wind vector difference between the surface and 6 km (SHR0-6), a quantity of the deep-layer vertical wind shear that promotes storm-scale rotation about a vertical axis and enhances storm organization, intensity, and longevity (Klemp 1987); and vertical wind shear from surface to tropopause (SHR0-T), a quantity that considers change in wind speed with height from the surface to the tropopause, are considered. Under a hierarchical configuration, the problem of tornado prediction is dissected into levels (hierarchies) that explicitly account for the role of regional variability (Gelman and Hill 2006; Cheng et al. 2013, 2015). The two overarching features of the proposed framework, the explicit consideration of observation error and the relaxation of the constant characterization of atmospheric environment on tornado activity over the spatial domain modeled, could improve our ability to tease out the significant drivers in time and space and ultimately our predictive capacity.
a. Tornado data
Canadian tornado data covering the period 1980–2009 were obtained from the updated Environment and Climate Change Canada national tornado database (Sills et al. 2012; Cheng et al. 2013), whereas the U.S. tornado data for the same period were obtained from the NOAA Storm Prediction Center (Schaefer and Edwards 1999). The annual number of tornadoes in Canada does not demonstrate any systematic long-term patterns, but an increase in tornado observations is noted in the United States. We computed a gridded climatology (30-yr total) of annual and seasonal tornado observations by counting the number of reported tornadoes for the entire period, for the warm (March–August) and cool periods (September–February), and for each calendar season (March–May, June–August, September–November, and December–February) using their starting location in a geodesic 50 km × 50 km grid. Working with annual or seasonal average tornado climatology minimizes the likelihood of a spurious association between tornado records characterized by nonmeteorological temporal trends and physical drivers (Tippett et al. 2012, 2014; Cheng et al. 2013, 2015). This process was done separately for all tornado intensities (F0–F5) (Cheng et al. 2013, 2015).
b. Atmospheric and population data
The climatological and atmospheric variables used as predictors in the model were obtained from the North American Regional Reanalysis (NARR; Mesinger et al. 2006) and their annual and seasonal averaged values from 1980 to 2009 were calculated for our modeling framework (Fig. 1). NARR data were provided on a 32-km Lambert conformal grid and were subsequently interpolated to match our model grid configuration. CAPE, a widely used measure to quantify the subdaily atmospheric instability contributing to tornadic events, is chosen as the main predictor representing updraft strength (McCaul 1991; Hagemeyer and Schmocker 1991; Brooks et al. 1994, 2003; Rasmussen and Blanchard 1998; Hanstrum et al. 2002; Monteverdi et al. 2003; Thompson et al. 2003, 2012, 2014; Davies 2006; Trapp et al. 2007; Brooks 2009; Smith et al. 2012; Tippett et al. 2012, 2014; Edwards et al. 2012; Grams et al. 2012; Diffenbaugh et al. 2013; Cheng et al. 2015). Second, as vertical wind shear is known to be critical to storm organization, intensity, and possibly tornadogenesis, and different wind shear variables may be more intricately linked with different types of tornadoes, we explore in detail three shear-related parameters to elucidate their relative importance: SREH, SHR0-6, and SHR0-T.
As previously mentioned, the temporal trends of tornado records can be influenced by nonmeteorological factors. Specifically, nonmeteorological factors such as inconsistent reporting, verification and classification practices, public awareness, topographic factors influencing visibility, and the willingness to report tornadic events can profoundly influence the credibility of existing reports (Changnon 1982; Schaefer and Galway 1982; Tecson et al. 1983; Grazulis and Abbey 1983; King 1997; Ray et al. 2003). Of all the nonmeteorological factors investigated (e.g., obstruction of sight due to density of trees and hills, absence of roads, and buildings), population density is documented as the key nonmeteorological factor in determining the observation bias of tornadoes and nontornadic severe thunderstorms in many North American studies (Anderson et al. 2007; Ray et al. 2003; King 1997; Etkin and Leduc 1994; Paruk and Blackwell 1994; Snider 1977; Tecson et al. 1983; Schaefer and Galway 1982). Since population-sampling bias creeps into tornado reports and may cause spurious geographical variability (Doswell and Burgess 1988), we adopted the correction method in Cheng et al. (2013, 2015), which used population density data to quantify the observation error.
c. Hierarchical Bayes approach
The Bayesian hierarchical approach is used to relax the assumption of globally common model parameters and therefore obtain parameter values that can effectively accommodate regional variability. Under the hierarchical configuration, the problem of parameter estimation across the study area is viewed as a hierarchy. At the lower level, spatial heterogeneity is accommodated by introducing region-specific parameter distributions—that is, depending on the significance of potential sources of variability (e.g., predominant convective storm mode, climatic regime, geographical location, and orographic patterns). Model parameters can be drawn from one of those regional populations. In the upper level, the moments of the regional population parameter distributions are specified probabilistically in terms of global population parameters or hyperparameters (Gelman and Hill 2006). The observed data are used to estimate the region-specific model parameters and the hyperparameters. Thus, the hierarchical model dissects the problem into levels and allows regional parameter differences to accommodate the spatial variability over the North American model domain.
The present study proposes that the patterns of the local model error with spatially constant parameter specification can be used to delineate local populations (homogeneous regions) that may provide the foundation of hierarchical frameworks. Specifically, Cheng et al. (2015) introduced a model configuration in which gridcell-specific conditional autoregression (CAR) terms, drawn from a multivariate normal specification, were used to capture the residual variability unaccounted for by the globally common characterization of the explanatory processes considered. Thus, the gridcell-specific CAR values represent the model residuals stemming from a variety of factors, such as spatiotemporal variability of the physical mechanisms leading to tornado formation (e.g., supercell, quasi-linear convective systems, or landspout-type tornadoes); mismatch between the scale at which the actual processes occur and the resolution studied; local climatic and/or geographic processes that are not captured by the explanatory or predictor variables in the model, influencing the convection initiation process; and regional and/or seasonal climate regimes that may violate the assumption of a spatial homogeneous parameterization. For example, orographically influenced thunderstorms and/or convective processes influenced by sea- or lake-breeze convergence can shape the realization of CAPE and the convective mode of tornadic thunderstorms (Sills et al. 2004), thereby introducing spatial heterogeneity to the nature of the causal relationships between tornadogenesis and various predictors.
The introduction of CAR though is specifically formulated based on the grid configuration and temporal resolution, and thus any changes in the monitoring network or the time scale of the study will make the CAR estimates not useful for forecasting purposes. Because of this constraint, our approach limits its usage to the delineation of the spatial homogeneous regions considered under the hierarchical structure. In this study, seven CAR spatial distributions were extracted, based on the following temporal resolutions: annual, warm period (March–August), cool period (September–February), spring (March–May), summer (June–August), autumn (September–November), and winter (December–February). We subsequently selected critical percentiles of the populations of the gridcell-specific CAR terms to identify three homogeneous regions for the annual, warm, spring, and summer periods and two regions for the cool, fall, and winter periods.
In our empirical tornado occurrence model (Cheng et al. 2013, 2015), we compartmentalize the problem of predicting tornado observations into a series of conditional models that simultaneously consider (i) the nonmeteorological factors affecting the fidelity of tornado counts in our dataset (observation error model), (ii) the meteorological component causally linked with the tornado formation and evolution (explanatory model), and (iii) the uncertainty in parameter values (parameter model).
1) Observation error model
Our model postulates that the population density is the primary factor that determines the accuracy of tornado observations in our dataset. In particular, we assume that there is a threshold population density above which all tornadoes are expected to be observed, whereas below this threshold level the probability of tornado detection is proportional to the population density (King 1997; Anderson et al. 2007). We specify a binomial model in which the observed tornado counts Tobsij, for each grid cell i and region j, are conditioned upon the actual (but unobserved) tornado occurrences Tlatentij and the probability of detection pij(β):
The probability of detection pij(β) represents the likelihood of observing a tornado and is associated with the population density popdij by the following exponential expression:
where β is the population effect parameter, and exp(popdij) is the exponential transformation of the original population density data. The actual occurrence of tornadoes Tlatentij in the model domain is specified as a Poisson process, conditional on the average or expected tornado occurrence rate per grid cell λij, provided by the explanatory model
2) Explanatory model
The explanatory model expresses the logarithm of the expected tornado rate λij in a given grid cell i and region j as a linear function of a number of explanatory variables x:
where xijk is the value of the explanatory variable k (i.e., CAPE, SREH, SHR0-6, or SHR0-T) in grid cell i and region j, and αjk are regression coefficients corresponding to the kth predictor variable. For each period, we experimented with different combinations of the wind-related explanatory variables (SREH, SHR0-6, and SHR0-T) with CAPE.
3) Parameter model
In a Bayesian context, model parameters are treated as random variables rather than fixed quantities. As such, prior distributions can be formulated to depict our knowledge of the relative plausibility of their values before the consideration of the observed data (Gelman et al. 2004). In this study, though, we opted for “noninformative” or “flat” priors reflecting no prior knowledge of the model parameters. In particular, the prior distributions for the population effect parameter β, the region-specific parameters αj0,…,k, τj2, and the hyperparameters μk and σk2 were specified as follows:
where N and IG denote the normal and inverse gamma distributions, respectively. Sequences of realizations from the posterior distribution of the seven models were obtained using Markov chain Monte Carlo (MCMC) simulations (Lunn et al. 2000). Convergence of the MCMC chains was checked using the Brooks–Gelman–Rubin (BGR) scale-reduction factor (Brooks and Gelman 1998). The BGR factor is the ratio of between-chain variability to within-chain variability. The chains have converged when the upper limits of the BGR factor are close to one. This process is undertaken independently for each month or season to assess the intra-annual variability of the models.
We also conducted two additional exercises related to the predictive and structural confirmation of the present modeling framework. The first skill assessment test was based on splitting the 30-yr dataset into two subsets; namely, the calibration (1980–94) and predictive validation (1995–2009) datasets. The former one was used to obtain parameter estimates through Bayesian updating, and the derived model predictive posteriors were then tested independently against the latter dataset. The second skill assessment test aimed to examine the robustness of the inference drawn by the binomial–Poisson model, given that the analyzed tornado data have many zeros. The alternative statistical formulation was the zero-inflated Poisson model, based on a zero-inflated probability distribution that allows for frequent zero-valued observations (Lambert 1992). This model is a statistical description of a random event, containing excess zero-count data per unit of time or space or within a fixed interval of a relevant covariate. The model dissects the studied event (tornado occurrence) into two components that correspond to two zero-generating processes. The first process reflects the sampling and/or observation error and is governed by a binary distribution that generates structural zeros, while the second mechanism represents the tornado occurrence rate and is governed by a Poisson distribution that generates counts, some of which may be zero. The two model components can be described as follows:
In a similar manner, Tobsij denotes the observed tornado counts in grid cell i and region j; pij represents the likelihood to observe a tornado in grid cell i and region j of our model domain, which again is modeled as an exponential function of the corresponding population density data popdij; β is the population effect parameter; and λij is the average or expected tornado occurrence rate per grid cell i and region j, which is similarly expressed as a log-linear function of the standardized values of atmospheric predictors in grid cell i, using region-specific parameters derived by our hierarchical configuration. Similar to the binomial–Poisson model, we opted for noninformative (or flat) prior distributions, reflecting no prior knowledge of the model parameters.
a. Delineating hierarchical structures
CAR profiles were used to recreate the spatial residual variability in North America when using atmospheric predictor variables averaged over the following seven time scales: annual (12-month averages), warm and cool periods, and spring, summer, autumn, and winter seasons. Predictors included in the models are surrogates of the processes most relevant to tornado activity, as derived by both subdaily and longer time-averaged modeling experiments (Brooks et al. 2003; Edwards et al. 2012; Tippett et al. 2012; Cheng et al. 2015)—namely, CAPE, SREH, SHR0-6, and SHR0-T. The relative performances of seven combinations of the predictor variables (different permutations of SREH, SHR0-6, and SHR0-T paired with CAPE) were then evaluated for each time period. Residual (CAR) variability of the annual (CAPE–SREH), warm (CAPE–SREH), and cool periods (CAPE–SHR0-6–SHR0-T) models are shown in the left panels of Fig. 2. Spatial distributions of the CAR terms for the four calendar-based seasons are also shown in Fig. S1A of the supplemental information.
Hierarchical regions were then delineated based on the CAR term values in the different grid cells. The regionalization of the annual, warm, and cool seasons is shown in the right panels of Fig. 2, which are derived by the CAR spatial distributions in the corresponding left panels. For the models that include the peak tornado seasons (spring and summer)—that is, annual, warm, spring, and summer models—the entire domain is partitioned into three regions. The first or “low” region contains 50% of the areas or grid cells, where the CAR values were negative; the second or “intermediate” region contains the next 30% of the grid cells with CAR values within the 50th and 80th percentile; and the third or “high” region contains areas with the highest 20% of CAR term values. Interestingly, the three groups were characterized by distinctly different levels of tornado occurrence (top and middle panels of Fig. 3 and Figs. S2A and S3A in the supplemental information), suggesting that the inclusion of the CAR term counterbalances the tendency of the explanatory model with spatially constant parameterization to understate the actual frequency in tornado-prone areas and to overpredict the number of events in relatively low-risk regions. For the rest of the models (cool, fall, and winter seasons), the grid cells with the top 20% CAR values formed one “intermediate” region, while the rest of the domain was treated as a second functionally homogeneous “low” group. The delineation of only two regions with the latter models stems from the fact that tornado occurrences are not as frequent and widespread across North America during the cooler seasons of the year (bottom panel of Fig. 3 and Figs. S4A and S5A in the supplemental information).
In both annual and warm seasonal models, the low region (blue) predominantly reflected northern Canada and the eastern and western seaboard provinces. The intermediate (light gray) region comprised areas to the west of the U.S. Rockies, significant portions of the Prairies to southern Ontario and southern Quebec, and the Appalachian Mountain region. The high (gray) region grouped together small portions of the Prairies, the Great Plains, the Midwest, the Ohio River valley, and the northeastern U.S. seaboard (Figs. 4 and 5). Most of the Gulf Coast states are included in this region when using annual data but not with the warm season model. In the cool season, the more active intermediate (light gray) region generally lies south of Canada and covers most of the southeastern U.S. states, the southern Great Plains, and a small portion of the California coast (Fig. 6).
b. Bayesian hierarchical modeling
The performance of the hierarchical models with different permutations of the atmospheric variables was evaluated with the Pearson correlation coefficient r and root-mean-square error (RMSE) (Table 1 and Table S1A in the supplemental information). The fit of most models examined was fairly similar, with the main exception being the combination CAPE–SHR0-T, which demonstrated the worst performance during the annual and warm periods. The combination of CAPE and SREH resulted in the best fit, followed closely by the combination of CAPE, SREH, and SHR0-T. SREH appears to be a more reliable predictor of the annual and warm season tornado activity than SHR0-6 and/or SHR0-T. In the cool period, the combination of CAPE, SHR0-6, and SHR0-T provided the best fit, followed closely by the combination of CAPE–SREH–SHR0-6–SHR0-T. Our analysis provides evidence that SHR0-6 is more reliable in predicting cool season tornadoes than SREH and SHR0-T. Generally, the performance of the cool season models is not as high as those for the warm period or those developed to predict the annual tornado activity. It is also interesting to note that the top two performing models during the annual and warm periods are among the weakest during the cool period.
The models aiming to describe the spring tornado activity generally outperformed those developed for the rest of the calendar-based seasons. The best spring model was built with all four predictor variables (Table S1A), and the second (third) best model comprised SREH and SHR0-6 (SHR0-T). Similar to the models developed to describe the annual tornado activity (or tornado occurrences during the warm season), our analysis highlights the importance of SREH as a reliable covariate. Notably, the inclusion of SHR0-6 and/or SHR0-T (along with SREH) slightly improved the model performance with the spring models. Similar inferences could be drawn by the models examined to depict tornado spatial variability during the summer season (Table S1A), although the importance of SHR0-6 becomes more evident relative to what our models suggest for the annual and warm season. The best model of tornado occurrence during the autumn included all four predictors, while the second and third best models were identical to those developed for the spring season (Table S1A). Unlike the spring models though, SHR0-6 is identified as a more important predictor variable, given that the performances of the models that considered CAPE–SREH and CAPE–SHR0-6 are nearly identical. Overall, our predictive capacity appears to be better when using coarser (annual) resolution relative to shorter time scales, while the warm season models outperform those developed for the cool period.
The mean and standard deviation values of the parameter posteriors for the top three (from fourth best to worst) models are shown in Tables 2–4 and Tables S2A–S5A (Tables S6A–S12A) in the supplemental information and can be used to infer the relative importance of the corresponding predictors during the different periods examined. Because of the standardization implemented prior to the analysis, the posterior estimates of the intercepts reflect the expected tornado occurrence rates when the predictor variables tend to their mean values. According to our model predictions, the average total number of tornadoes over the course of the 30-yr period studied was 0.08 per grid cell in the low region, which was associated with the lower CAR values. Likewise, we can infer that the intermediate and high regions were characterized by an average total annual number of 2.75 and 15.03 tornadoes per grid cell, respectively. With a similar interpretation, the corresponding regional estimates for the warm period were 0.11, 2.14, and 11.13 tornadoes per grid cell. By contrast, the two regions delineated for the cool period were characterized by an average total number of tornadoes of 0.06 and 0.77 per grid cell, respectively. Lending credibility to the hierarchical strategy adopted, the signature of the parameters differs significantly among the regions delineated and generally tends to be stronger in the low region and lower in the intermediate/high region. CAPE exerts clear control over all three regions, and it is generally more influential in the low and intermediate geographical regions, correspondingly associated with the low and intermediate CAR values, than in the high region. Comparing the influence of SREH, SHR0-6, and SHR0-T among the different permutations of predictors examined, we infer that the effects of SREH are well identified and significant during the annual and warm periods for all three regions, including the spring and summer seasons, but not during the cool period of the year. The temporal effects of SHR0-6 are consistent and similar to SREH, but SHR0-6 shows higher influence during the cool periods. The signature of SHR0-T is generally stronger during the cool period, and this result is consistent when using a globally common characterization of the tornadic processes (Cheng et al. 2015).
Examination of the posterior patterns of the top three annual and warm period models suggests that the addition of SHR0-T and/or SHR0-6 changes the strength of the signature of SREH in the low and intermediate region. The relative strength of the causal association between tornado activity and SHR0-T or SHR0-6 is generally higher than with SREH in the low and intermediate region. By contrast, SREH is generally more influential and well identified in the high region, where the Great Plains, the Prairies, and the Midwest are located. This may reflect the predominance of strong, discrete storms associated with SREH in these areas. Discrete storms have persistent rotating updraft cores or mesocyclones, which have a direct correlation with strong SREH (Davies-Jones et al. 1990). Interestingly, our analysis also suggests that no multicollinearity issues arise when SREH, SHR0-T, and/or SHR0-6 are combined in the summer seasonal models. The latter result may suggest an independent control of the associated atmospheric processes over the summer tornado activity, which in turn generally lies in the east–west direction from just east of the Rocky Mountains to the northeastern United States.
The effects of SHR0-6 are distinct and well identified in both regions during the cool season. While SHR0-6 is more influential in the region associated with the low CAR values, the opposite holds true for SHR0-T. The strengthening of the SHR0-T signature in the intermdiate region (southeastern United States) is on par with existing evidence from the literature that the influence of the subtropical jet to tornado activity is strongest in the winter and can be a major factor for tornado formation and tornado outbreaks under extratropical storms (Hagemeyer and Schmocker 1991; Cook and Schaefer 2008). The effect of SREH is limited and generally confounded by the SHR0-6 and SHR0-T effects, as the sign of SREH changes depending on whether the latter predictors are included. The effect of SREH on the autumn tornado activity is mostly nonidentifiable in the low region (low or intermediate CAR values) and has a negative influence over the intermediate region (high CAR values). The projected weak and/or negative signature of SREH likely reflects the misalignment between the areas where the higher values of storm-relative helicity occur (i.e., Great Plains) and the region where the larger portion of tornadoes resides (i.e., southeastern seaboard states) (Tippett et al. 2014; Cheng et al. 2015). Some of the tornado activity in the latter region during the end-of-summer–early autumn season is typically associated with tropical cyclones, and therefore the relative importance of the underlying mechanisms is different (Edwards et al. 2012; see their Fig. 4). Last, SHR0-6, and to a lesser extent SHR0-T, are both important predictors of tornado activity during the winter period. While both predictors exert more influence on the low region, SHR0-T tends to be more strongly associated with the tornado occurrences in the low region, which may be related to the aforementioned strength of the subtropical jet. CAPE exerts clear influence over both regions but is generally more influential in the low region than in the intermediate region.
The observed and predicted tornado occurrences during the annual, warm, and cool periods are shown in Fig. 7 left and center panels, respectively (see also the same comparison in Fig. S6A of the supplemental information for the calendar-based seasonal models). It should be noted that these posterior predictions are based entirely on the hierarchical configuration of the explanatory model and do not consider the adjustments due to the potential observation bias introduced by the population density. The spatial patterns of the observed and predicted tornado occurrences are remarkably similar for all three studied periods. The tornado patterns for the annual and warm periods are very similar northward from the northern Great Plains and southern Ontario, indicating that tornadic events in those regions are largely a warm season phenomenon. The right panels of Fig. 7 illustrate the actual tornado occurrence (Tlatent) after accounting for the likelihood of a population bias. It can be seen that tornado occurrences are severely underestimated in areas of sparse population, such as the Prairies and to a lesser extent the northern Great Plains during the warm period. During the cool season, tornado observations are most frequent in the southeastern United States, followed by the Central Plains and the Atlantic seaboard states. The hierarchical configuration also remarkably captures areas with relatively lower frequency of tornado occurrence, like Idaho, Utah, Arizona, and the California coast region.
Seasonal variability characterized the posterior values of the population effect parameter β (Tables 2–4 and Tables S2A–S12A). The highest value was derived for winter, the lowest value during the summer (i.e., a higher population density is needed to observe tornadoes during the former period rather than the latter), and the value when using annual resolution is closer to the warm season’s values. Substituting β back into the exponential function to identify the population density thresholds where nearly all tornadoes can be observed [p(β) > 0.995], the threshold levels ranged from 5.9 to 7.2 people per square kilometer. The thresholds in annual and seasonal time scales are consistent with previous findings (Cheng et al. 2013, 2015; King 1997). The higher winter threshold levels may be related to the higher proportion of nocturnal tornadoes occurring in the winter and shorter day length, both of which affect the ability to observe tornadoes, particular when no damage results (Ashley et al. 2008). Despite the improved observational capacity, the number of unobserved tornadoes is predicted to be higher in the warm season because of higher activity in comparison to the cool season and the shifting of tornado activity northward to the northern plains and the Prairies (Fig. 7), where population densities are sparser. It is also possible that this increase in the warm season is a result of the stronger capping inversions and weaker synoptic-scale forcing experienced during that period of the year, which is not explicitly accounted for in the model as there are no predictor variables of convective inhibition (CIN) owing to its high collinearity with CAPE. Based on the current model configuration, the models indicate that tornado observations are typically underestimated from the western United States northward to the central Prairies, and toward eastern Ontario and southwestern Quebec, as well as in parts of the Central Plains region. With the exception of the Central Plains, tornado activity is primarily a warm season phenomenon.
As previously mentioned, the predictive confirmation of the present modeling framework was based on splitting the 30-yr dataset into two subsets; the calibration (1980–94) and predictive validation (1995–2009) datasets. The former one was used to obtain parameter estimates through Bayesian updating, and the resulting parameter posteriors along with the associated covariance were then tested independently against the latter dataset. This skill assessment test provided evidence that the posterior mean predictions per grid cell derived by the annual, warm, cool, or even the monthly model predictions were fairly consistent between the calibration and predictive confirmation temporal domains (Table 5 and Fig. S7A in the supplemental information). The same conclusion is supported by the second skill assessment test aiming to compare the inference drawn by the binomial–Poisson and zero-inflated Poisson models. The comparison of the performance of the two strategies, using CAPE–SREH (annual, warm, spring, and summer seasons) and CAPE–SHR0-6 (cool, autumn, and winter seasons) as predictors, indicated similar values of the root-mean-square error and Pearson correlation coefficient values between the observed number of tornadoes and posterior mean predictions per grid cell (Table 6 and Fig. S8A in the supplemental information). The parameter posterior distributions (central tendency and spread) were similarly robust between the two models across all the periods examined (Fig. 9A in the supplemental information).
We have developed a novel Bayesian hierarchical modeling approach to accommodate the regional variability in tornadic environments and more effectively predict tornado occurrence in different seasonal periods in North America. The delineation of the spatially homogeneous regions that formed the foundation of our hierarchical configuration was based on the spatial patterns of the CAR term used to capture the residual variability of the explanatory model. The basic premise of the proposed strategy was that the imperfections in the model structure could be partly ameliorated by relaxing the assumption of a constant characterization of the underlying mechanisms over the entire spatial domain modeled. Our Bayesian hierarchical framework illustrates that the tornado occurrence and the associated causal linkages exhibit clear regional and seasonal patterns, as reflected in the spatiotemporal variability of the relative importance of the explanatory variables examined.
Our analysis underscored the role of CAPE as a key explanatory variable of tornado occurrence. CAPE is a key criterion for deep moist convection, as it accounts for both lower-tropospheric moisture and lapse rate. For most tornadoes, it provides the buoyant energy needed for developing updrafts, which is one of the ingredients required for tornado formation (Rasmussen 2003). SREH appears to be critical in predicting tornado occurrence in the high-activity regions (Great Plains, Midwest and Ohio River valley, northeastern U.S. seaboard, and Gulf Coast states) and during the peak (warm) season, suggesting that mesocyclonic, supercell-type storms are the primary tornado producers in these areas. SHR0-6 and to a lesser extent SHR0-T are better predictors during the cool period of the year. This finding is indicative of more frequent QLCS tornadoes during the cool season (Smith et al. 2012), when the prevailing conditions are characterized by relatively weak thermodynamic instability but strong deep-layer wind shear; that is, wind increases significantly with height up to the upper troposphere, where the subtropical jet can generate regions of upper-level divergence (Trapp et al. 2005; Cook and Schaefer 2008; Edwards et al. 2012).
The distribution of tornadoes associated with different convective modes is not as well known west of the Rockies as to the east (Smith et al. 2012). Convective modes for tornadoes in Canada are even less certain. It is hypothesized that a nontrivial fraction of tornadoes in the eastern and western seaboard provinces, from the Prairies to southern Ontario and Quebec, are likely to be of the landspout type. They are typically associated with meso- and local-scale circulations in the boundary layer (e.g., sea- and lake-breeze fronts are thought to play a role in the development of these tornadoes) (Hagemeyer and Schmocker 1991; King et al. 2003). Although the necessary conditions for these formations (e.g., low-level tropospheric boundary with significant horizontal wind shear and rapidly deepening moist convection along the boundary) are difficult to quantify in reanalysis datasets, such tornadoes prefer an atmosphere with only weak vertical wind shear at relatively low levels (as opposed to supercells), which may be reflected in the relatively weaker and less identifiable influence of SREH and SHR0-6 (compared to SHR0-T) in the aforementioned regions. However, little research has been done to advance our understanding of the regional variability of convective modes in Canada. Further, the mechanisms underlying the convective mode in southwestern Nova Scotia are unlikely to resemble those in Manitoba, and therefore drawing general conclusions about the tornado activity over the entire Canadian region could be misleading. Tornadoes in autumn are among the most poorly predicted. Although this finding may be related to TC tornadoes requiring explanatory variables that were not considered in this study (e.g., TC radial distance and specific humidity gradient) (Belanger et al. 2009), we note that TC tornadoes mostly occur in September and they are not a large fraction for most years (Belanger et al. 2009). In a similar manner, the low-topped supercell tornadogenesis during the cold season in California are not representative of the majority of autumn tornadoes (Hanstrum et al. 2002). In any event, model performance in autumn is admittedly problematic.
Recent ensembles of global climate model projections suggest that a significant increase in the frequency of severe thunderstorm environments could be expected in the future (Diffenbaugh et al. 2013). Even though vertical wind shear is projected to decrease as the equator-to-pole temperature gradient decreases, future climate simulations show those days with both high CAPE (primarily because of an increase in low-level moisture and temperature) and strong low-level wind shear will likely increase, suggesting an increasing likelihood of atmospheric conditions favorable for tornadoes. While existing studies addressing the causal connection between tornadoes and climate change have focused mainly on environments that are conducive to severe thunderstorms capable of producing EF2 and greater intensity tornadoes (primarily supercell thunderstorms), rarely have studies examined projected changes in tornado activity for all intensities and presumably different types of development. Climate change could conceivably shape tornado development of all types, and consideration of region-specific tornadic environments may better characterize the atmospheric conditions favorable for tornadoes across different areas throughout North America. As a result, the present hierarchical modeling configuration could be adapted to climate change impact assessment to more effectively project the overall changes of tornado activity in high-risk areas.
Notwithstanding the encouraging results from the predictive and structural confirmation, our intent with the presented modeling framework is not to develop an early warning system but rather to offer a modeling tool that can be used to characterize the frequency of tornado occurrence in a particular location in North America (e.g., exceedance probability of more than 1 tornado within a selected period of the year) (Fig. 8). Following this line of reasoning, our approach effectively postulates that atmospheric parameters varying on climate time scales can be used to draw inferences about tornadic events with a lifetime of no more than a few hours or even only a few minutes. While this adoption of a coarser temporal resolution (seasonal or monthly averages) differs significantly from the typically used predictor variables on shorter (subdaily) time scales, we note that this modeling practice is popular in other disciplines (e.g., limnology), whereas statistical models are used to reproduce the average prevailing conditions in a certain period of the year (growing season) and subsequently to predict the likelihood of occurrence of episodic events, such as end-of-summer hypoxia, exceedance of certain threshold values of phytoplankton abundance, or cyanobacteria outbreaks (Arhonditsis and Brett 2005). In the context of ecosystem dynamics, the causal linkage is more evident as episodic events are typically the escalation of a complex interplay between physical, chemical, and biological processes that occur over a longer time period. Regarding the underlying mechanisms of tornadogenesis though, the credibility of the present probabilistic mapping relies on the strength of the association between “the changes in the moments (central tendency, spread) of the distribution of the environments occurring in the course of a month/season to faithfully capture the changes in the frequency of extreme subdaily environments (tails of the same distribution) associated with tornado occurrence” (Cheng et al. 2015, p. 9).
In conclusion, we introduced a Bayesian framework to obtain a regional characterization of tornado activity in North America in relation to large-scale climatic processes. In the context of statistical modeling, the hierarchical configuration represents a clear advancement over the technical limitations of the CAR term as well as the conceptual simplifications of the globally common parameter specification. Our analysis provides evidence that regional variability of tornado activity can be described using as explanatory variables the convective available potential energy, storm-relative helicity, and vertical wind shear data. The spatial variability of tornado occurrence during the warm season can be well explained by convective available potential energy and storm-relative helicity alone, while vertical wind shear is clearly better at capturing the spatial variability of the cool season tornado activity. A plausible next step would be the implementation of the present hierarchical framework to accommodate year-to-year variability rather than reproducing the spatial patterns of long-term average tornado occurrences. Two interesting facets of that exercise would be the identification of the optimal explanatory or predictor variables [including outputs of global climate models (Taylor et al. 2012)] and the most efficient time series modeling strategy to achieve year-specific predictions of tornado activity [e.g., dynamic linear modeling (Sadraddini et al. 2011)]. By the same token, the influence of large-scale climate oscillations, such as El Niño–Southern Oscillation and Madden–Julian oscillation, on the seasonal and regional tornado activity can be examined.
This project has received funding support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant awarded to George Arhonditsis.
Supplemental information related to this paper is available at the Journals Online website: http://dx.doi.org/10.1175/JCLI-D-15-0404.s1.