Abstract

Trends in severe thunderstorms and the associated phenomena of tornadoes, hail, and damaging winds have been difficult to determine because of the many uncertainties in the historical eyewitness-report-based record. The authors demonstrate how a synthetic record that is based on high-resolution numerical modeling may be immune to these uncertainties. Specifically, a synthetic record is produced through dynamical downscaling of global reanalysis data over the period of 1990–2009 for the months of April–June using the Weather Research and Forecasting model. An artificial neural network (ANN) is trained and then utilized to identify occurrences of severe thunderstorms in the model output. The model-downscaled precipitation was determined to have a high degree of correlation with precipitation observations. However, the model significantly overpredicted the amount of rainfall in many locations. The downscaling methodology and ANN generated a realistic temporal evolution of the geospatial severe-thunderstorm activity, with a geographical shift of the activity to the north and east as the warm season progresses. Regional time series of modeled severe-thunderstorm occurrences showed no significant trends over the 20-yr period of consideration, in contrast to trends seen in the observational record. Consistently, no significant trend was found over the same 20-yr period in the environmental conditions that support the development of severe thunderstorms.

1. Introduction

In 2011, tornadoes claimed over 550 lives across 15 different states, the second-highest toll on record (NOAA 2011a). This toll highlights the vulnerability of a large portion of the U.S. population to severe convective weather. Changes in the frequency and location of severe weather could have a large impact on food prices, community infrastructure, and public safety (Lemons 1942; Rogers 1996; Rosenzweig et al. 2001). As urbanization of the United States continues, urban sprawl may put even more of the population at risk from severe-weather outbreaks (e.g., Hall and Ashley 2008; Paulikas 2010; Richardson et al. 2012). This makes it important for society and decision makers to understand the current distribution of severe convective events and how it may be changing.

The Storm Prediction Center has amassed a nearly 60-yr record of severe thunderstorms in the United States (hail > 20 mm in diameter, wind > 25 m s⁻¹, and any tornado).1 This record would be a logical resource for examining how severe convective weather in the United States has changed over the last six decades. For example, the tornado-report data in Fig. 1 might be interpreted to indicate that the United States has experienced a marked increase in tornadic events. The problem with this interpretation, as noted by Diffenbaugh et al. (2008) and others, is that the severe-storm record is inherently biased by nonmeteorological factors. In particular, it is unclear how to separate the effects of improvements in public education, changes in reporting procedures, and changes in population density from changes in the record associated with climate variability and change (Diffenbaugh et al. 2008).

Fig. 1.

Annual normalized counts of tornado reports, tornado days, and population as adapted from Diffenbaugh et al. (2008). The similar increasing trend seen in the population and both tornado-activity measures highlights the possible nonmeteorological bias in the severe-storm record.

Previous research

One approach to eliminating the nonmeteorological factors from the record has been to employ numerical modeling or other datasets. Unfortunately, tornadoes and the other severe phenomena occur on scales that are unresolved by most climatic datasets and models. This has forced previous work to be mostly centered on evaluating the larger-scale environment, with the assumption that environmental changes will be manifested in changes to unresolved, convective-scale phenomena (Brooks et al. 1994; Marsh et al. 2009; Trapp et al. 2007). This assumption is violated, for example, when the environmental conditions favor severe convective storms, but the storms fail to develop. To examine the trend in the actual number of severe events, storm initiation must also be considered.

High-resolution, “convection permitting” numerical simulations allow for explicit examination of storms that initiate and possibly realize the severe potential provided by the large-scale environment. Trapp et al. (2011) used this explicit approach. They generated a series of daily, convection-permitting reforecasts using the Weather Research and Forecasting (WRF) model to dynamically downscale data from the National Centers for Environmental Prediction–National Center for Atmospheric Research Reanalysis Program (NNRP) (Kalnay et al. 1996), and then classified the simulated storms as severe or nonsevere. Convective hazards such as tornadoes are still not directly resolved in these simulations. Storms were therefore considered severe if two gridpoint values simultaneously exceeded their thresholds: updraft helicity (UH) calculated between 2 and 5 km AGL had to exceed 40 m² s⁻², and model-derived radar reflectivity (RF) at 1 km AGL had to exceed 50 dBZ. Though relatively simple, this approach reproduced much of the climatological distribution of severe convective events, although not without limitations.
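The two-variable proxy of Trapp et al. (2011) reduces to a joint threshold test at each grid point. The following is a minimal Python sketch. It assumes UH and RF are supplied as same-shaped 2D lists for a single output time. The function name `severe_proxy` and the sample values are hypothetical.

```python
# Thresholds quoted in the text for the Trapp et al. (2011) proxy.
UH_THRESHOLD = 40.0  # m^2 s^-2, updraft helicity computed over 2-5 km AGL
RF_THRESHOLD = 50.0  # dBZ, model-derived reflectivity at 1 km AGL


def severe_proxy(uh, rf, uh_thresh=UH_THRESHOLD, rf_thresh=RF_THRESHOLD):
    """Flag grid points where UH and RF both exceed their thresholds.

    uh, rf: 2D lists (rows of grid points) with identical shapes.
    Returns a 2D list of booleans with the same shape.
    """
    return [
        [u > uh_thresh and r > rf_thresh for u, r in zip(uh_row, rf_row)]
        for uh_row, rf_row in zip(uh, rf)
    ]


# Hypothetical 2 x 2 example: only the bottom row satisfies both criteria.
uh = [[10.0, 55.0], [80.0, 45.0]]
rf = [[52.0, 48.0], [51.0, 60.0]]
print(severe_proxy(uh, rf))  # [[False, False], [True, True]]
```

Because the test requires both exceedances at the same point, a rotating but weak storm (high UH, low RF) is not flagged. Neither is an intense but nonrotating one.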

This previous work provided some useful insight into using high-resolution numerical modeling to reconstruct a severe-storm climatology. However, its short evaluation period (10 yr) did not provide enough data to evaluate trends on what are normally considered climatic time scales (Collins and Allen 2002). Also, only two variables were used in the previous study, with no guarantee that they constituted the ideal method for identifying severe convection in the model. It is reasonable to assume that additional variables could improve the identification of severe convective storms. This raises a new problem: choosing which variables to employ, and at what thresholds. The model output offers a myriad of variables, and there is a whole range of derived variables one could also use (CAPE, shear, lifted index, etc.), so this is a daunting task. A brute-force approach would be impractical given the thousands of different combinations one could choose. The machine learning community may provide a more efficient method of choosing variables through the use of artificial neural networks (ANNs). The results presented here explore the use of ANNs and dynamically downscaled data to approximate the actual climatology of severe convective events from coarse reanalysis. Furthermore, the methodology is designed to evaluate whether such an approach could be used in the absence of observational data (i.e., in coarse-resolution global climate model simulations). The method is then applied over the most recent two decades. The aim is to examine whether, without the nonmeteorological factors mentioned above, the model supports the observed positive trend in severe-thunderstorm occurrences.

2. Methods

a. General approach

The WRF model is used to dynamically downscale 20 yr (1990–2009) of warm-season (April–June: AMJ) data from the NNRP. Details on the WRF model setup, including the choice of parameterization schemes, are found in Table 1. Of particular relevance is the horizontal gridpoint spacing of 4.25 km and the corresponding omission of a cumulus parameterization. Whether a cumulus parameterization can be omitted at such a grid spacing is still a matter of academic debate. However, recent studies of convection-allowing configurations of the WRF model in an operational setting indicate that the approach can provide skillful forecasts of convective storms, which supports its applicability here (see Done et al. 2004; Kain et al. 2006, 2008, 2011; Schwartz et al. 2009; Weisman et al. 2008; Clark et al. 2012; and others).

Table 1.

Parameterizations, parameters, and initial/boundary conditions for the WRF model.

Initial and boundary conditions are generated from the NNRP dataset. These data are provided at 6-h intervals on a roughly 210-km horizontal grid with 17 vertical levels, and include temperature, moisture, pressure, and wind. Land surface temperature and moisture variables are updated every 6 h at three soil depths (Kalnay et al. 1996; https://rda.ucar.edu/datasets/ds090.0/). Simulations are performed for every day during AMJ over 1990–2009, each initialized at 1200 UTC and terminated at 1200 UTC the following day. This is done to minimize error growth. However, it prevents the simulation of particularly long-lived nocturnal mesoscale convective systems and inhibits longer-term feedbacks, such as those between precipitating convection and the land surface. In general, the approach maintains the proper diurnal cycle and is able to spin up preexisting convection within the first six simulation hours (Skamarock 2004; Trapp et al. 2011). Over the United States, these hours fall during the local morning, when convective activity is typically low.
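The daily run schedule described above can be enumerated directly. The sketch below assumes that naive `datetime` objects represent UTC, and the helper name is hypothetical. It confirms that 20 seasons of 91 AMJ days (30 + 31 + 30) give 1820 separate 24-h simulations.

```python
import calendar
from datetime import datetime, timedelta


def simulation_windows(first_year=1990, last_year=2009, months=(4, 5, 6), init_hour=12):
    """Return (start, end) pairs for daily 24-h runs.

    Each run starts at `init_hour` UTC and ends 24 h later; naive datetimes
    are interpreted as UTC.
    """
    windows = []
    for year in range(first_year, last_year + 1):
        for month in months:
            n_days = calendar.monthrange(year, month)[1]  # days in this month
            for day in range(1, n_days + 1):
                start = datetime(year, month, day, init_hour)
                windows.append((start, start + timedelta(hours=24)))
    return windows


runs = simulation_windows()
print(len(runs))  # 1820 (91 AMJ days x 20 yr)
print(runs[0][0], "->", runs[0][1])  # 1990-04-01 12:00:00 -> 1990-04-02 12:00:00
```

Each run is independent, so the last run of a season ends at 1200 UTC on 1 July rather than continuing into the next day.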

The choice of a relatively coarse dataset for use in initial and boundary conditions is motivated by the idea that this methodology may also be applied to simulations of future climate using general circulation models (GCMs), which tend to be correspondingly coarse. Hence, the NNRP dataset is an ideal candidate for testing the methodology with “perfect” boundary conditions [although the NNRP dataset has biases of its own: see Kistler et al. (2001)], at a GCM-equivalent grid resolution. This consideration is also why data assimilation is not employed.

b. ANN design and training

The focus of this study is on the spatial climatology of severe convective events over the United States, as well as on the interannual variability of such events. As mentioned above, an ANN is designed and employed to indirectly determine model-simulated severe convective weather occurrences. An ANN is a set of self-modifying computer algorithms designed to accomplish a specific task (Hagan et al. 1996). ANNs have been theoretically shown to be able to fit functions of arbitrary complexity and are particularly well suited to work in the atmospheric sciences because of their ability to analyze nonlinear systems with little or no assumption, a priori, about the distribution of the data used (normality, stationarity, etc.) (see Manzato 2005; Yuan et al. 2007; Marzban 2000; Marzban and Stumpf 1996, 1998; Marzban et al. 1997; Marzban and Witt 2001; and others).

The task here is to classify model-generated storms as either severe or nonsevere. In order for the network to “learn” the characteristics of a severe convective storm, examples of severe and nonsevere storms are provided as inputs to the ANN for training. These training data come from simulated days from our dataset that correspond to days of observed convective storms.

A set of characteristic meteorological variables is taken from each candidate modeled storm. These include convective available potential energy (CAPE) and convective inhibition (CIN) (both calculated using a surface parcel), low-level specific humidity, low-level temperature, 0–1- and 0–3-km storm-relative environmental helicity (SREH), low-level wind magnitude, hourly rain accumulation, UH, and 1-km model RF.2 Also included are the temperature, specific humidity, vertical velocity, and wind speed at 850, 500, and 250 hPa. CAPE, CIN, specific humidity, and temperature characterize the storm potential of the environment in which the storm has initiated. UH and RF indicate storm rotation and intensity, respectively. Low-level wind speeds indicate both storm inflow/outflow strength and the potential for straight-line wind damage. Many of the above parameters, slightly modified, are included in indices commonly used for severe-weather forecasting (e.g., Brooks et al. 2003b; Thompson et al. 2003). Model output is examined hourly. The observations are preprocessed to limit the reports to one report per hour per grid box, so that the comparison is one-to-one.
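Limiting the observations to one report per hour per grid box can be sketched as de-duplication on an (hour, box) key. The sketch makes several assumptions. Reports are taken to have already been projected to kilometer coordinates on the model grid. The box size is a free parameter, set to 40 km here to match the coarsened grid described in this section. The function name and report tuples are hypothetical.

```python
import math


def thin_reports(reports, box_km=40.0):
    """Keep at most one severe report per (hour, grid box).

    reports: iterable of (time_h, x_km, y_km) tuples, where time_h is hours
    since the simulation start and x_km, y_km are projected coordinates.
    The earliest report in each (hour, box) cell is kept.
    """
    seen = set()
    kept = []
    for t, x, y in sorted(reports):
        key = (math.floor(t), math.floor(x / box_km), math.floor(y / box_km))
        if key not in seen:
            seen.add(key)
            kept.append((t, x, y))
    return kept


reports = [
    (13.2, 5.0, 5.0),    # hour 13, box (0, 0)
    (13.7, 30.0, 12.0),  # same hour and box: dropped
    (13.9, 45.0, 5.0),   # hour 13, box (1, 0)
    (14.1, 5.0, 5.0),    # hour 14, box (0, 0)
]
print(thin_reports(reports))
```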

Before sampling, the data are coarsened to a roughly 40-km grid. This allows some flexibility in the location and timing of the modeled storm relative to the observations. Within each coarsened grid box, values of RF, hourly rain accumulation, vertical velocity, and UH are taken from the original grid point with the highest UH value. Values of CAPE, temperature, and the other environmental parameters are taken from the grid point with the highest CAPE value (see Fig. 2 for an example). Sampling the environment at the point of maximum CAPE is intended to avoid environmental values that have been altered by ongoing convection.
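The two-point sampling rule can be sketched as follows. The sketch assumes that each field is a 2D list on the 4.25-km grid and that the coarsening factor is a free parameter. The exact factor is not given here; a factor of 9 would give boxes of about 38 km. Field names and values are hypothetical. They are chosen so that, as in Fig. 2, the box is assigned UH = 200 m² s⁻² and CAPE = 1400 J kg⁻¹.

```python
STORM_VARS = {"uh", "rf", "rain", "w"}  # sampled at the max-UH point


def coarsen_sample(fields, factor):
    """Sample fine-grid fields onto coarse boxes of factor x factor points.

    fields: dict mapping variable name -> 2D list; must contain "uh" and "cape".
    Storm variables (STORM_VARS) come from the fine point with maximum UH;
    all other (environmental) variables come from the point with maximum CAPE.
    Incomplete boxes at the domain edges are skipped.
    """
    uh, cape = fields["uh"], fields["cape"]
    ny, nx = len(uh), len(uh[0])
    coarse = []
    for j0 in range(0, ny - ny % factor, factor):
        row = []
        for i0 in range(0, nx - nx % factor, factor):
            points = [(j, i) for j in range(j0, j0 + factor)
                      for i in range(i0, i0 + factor)]
            storm_pt = max(points, key=lambda p: uh[p[0]][p[1]])
            env_pt = max(points, key=lambda p: cape[p[0]][p[1]])
            box = {}
            for name, grid in fields.items():
                j, i = storm_pt if name in STORM_VARS else env_pt
                box[name] = grid[j][i]
            row.append(box)
        coarse.append(row)
    return coarse


fields = {
    "uh": [[50.0, 200.0], [120.0, 10.0]],
    "rf": [[40.0, 58.0], [52.0, 30.0]],
    "cape": [[900.0, 300.0], [600.0, 1400.0]],
    "temp": [[295.0, 293.0], [294.0, 297.0]],
}
print(coarsen_sample(fields, 2))
```

Note that the storm variables and the environmental variables can come from different fine-grid points within the same box. This is what keeps the environmental values away from the storm core.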

Fig. 2.

Locations for data sampling submitted to the neural network. The solid black line represents the boundaries of the new, coarser grid box. The dotted lines represent the boundaries of the older, higher-resolution grid. Values for UH and CAPE from the original grid are given for each box. In this example, the coarse box would be assigned a UH value of 200 m² s⁻² and a CAPE value of 1400 J kg⁻¹.

The use of these simulated data along with the severe-storm observational record presents a unique, twofold problem. First, the record of observed severe convective events is biased by the nonmeteorological factors as mentioned above, and, second, there is no guarantee that the model will generate a severe storm at the same time/location at which it actually occurred.

To minimize the effects of these two uncertainties, preprocessing checks are applied before cases are used for network training. First, a preprocessing threshold on RF is applied. The assumption is that storms exhibiting relatively high RF values are more likely to be severe than those with relatively low RF values. Any storm with a maximum RF below the threshold is excluded from training and from the eventual analysis. It is also assumed that a model-generated storm with a derived RF above the chosen threshold, located at the same time and place as an observed storm report, is an example of a severe storm. Two separate RF thresholds are tested: 45 and 50 dBZ. Thresholds lower than 45 dBZ or higher than 50 dBZ were considered but resulted in poorer network performance and dramatically increased processing time. In terms of the overall model RF distribution (excluding values lower than 5 dBZ), the 45- and 50-dBZ thresholds correspond to approximately the 90th and 97.5th percentiles, respectively. All other instances with RF greater than the threshold but without a corresponding severe report are considered candidates for strong, nonsevere storms. These nonsevere events are filtered to ensure that an actual storm existed in the observations at the same time and location. The check is whether an echo exceeding 40 dBZ existed in a historical National Operational Weather Radar (NOWRAD) archive (Davis et al. 2003). The procedure shows only a small sensitivity to changes in this 40-dBZ echo threshold.
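The preprocessing checks amount to a small decision rule for each candidate storm. The sketch below assumes that three values have already been extracted for the storm's box and hour: the maximum model RF, a flag for a colocated severe report, and the maximum NOWRAD echo. All names are hypothetical.

```python
def label_candidate(max_rf, has_report, nowrad_max_dbz,
                    rf_threshold=50.0, echo_threshold=40.0):
    """Return "severe", "nonsevere", or None (discarded) for one candidate storm."""
    if max_rf < rf_threshold:
        return None          # below the RF preprocessing threshold: excluded
    if has_report:
        return "severe"      # strong modeled storm colocated with a severe report
    if nowrad_max_dbz > echo_threshold:
        return "nonsevere"   # strong observed echo but no severe report
    return None              # no confirming observed storm: discarded


print(label_candidate(53.0, True, 55.0))   # severe
print(label_candidate(53.0, False, 45.0))  # nonsevere
print(label_candidate(47.0, True, 55.0))   # None (below the 50-dBZ threshold)
```

Passing `rf_threshold=45.0` gives the second training configuration described above.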

Admittedly, these preprocessing steps do not guarantee that the training set is free from errors. In fact, given the difficulties mentioned above, it is almost certain that the network will be provided with at least one incorrect classification. The hope is that the preprocessing steps will limit these errors to a point where the network can still achieve accurate results. Inevitably, the final skill of the network in classifying severe storms will be limited by the quality of the training set.

During training, 10 yr (1996–2005) of examples are iteratively presented to the network for classification. These years are chosen based on the availability of the aforementioned NOWRAD data. The data are also separated into training and validation sets: approximately 70% of the data over 1996–2005 are used for training, and the remaining 30% are used to test for overfitting. The network determines a storm's classification by multiplying each variable by its respective weight and combining the results through a series of transfer functions, or neurons. Each neuron has its own weight for each of its inputs (Fig. 3). The network used here consists of 22 inputs, one intermediate neuron, and one output neuron, and was designed with the MATLAB Neural Network Toolbox (Demuth and Beale 1993). More complex networks were also tested but produced results similar to those of the simpler ones. It should be noted that this simple network is essentially a multiple linear regression. Equal numbers of severe and nonsevere cases are presented to the network for classification (approximately 3700 total cases over all 10 yr). After each iteration, the network compares its output with the classifications determined in the preprocessing stage above. The network then differentiates the mean-square error (MSE) with respect to the weights assigned to each input variable (or neuron). It adjusts those weights in the direction that decreases the MSE most rapidly. Output from the ANN is a value between 0 and 1 for each candidate storm. This value is then converted into a severe (>0.5) or nonsevere (≤0.5) classification. This cutoff was increased to 0.7 for the networks trained with the 45-dBZ threshold, because the lower preprocessing threshold resulted in many more identified storms. The training is repeated until further weight adjustments no longer increase accuracy.
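A minimal sketch of a network with the stated shape follows: N inputs, one intermediate neuron, and one output neuron with a saturated linear transfer function. It is trained to minimize the MSE on a 70/30 train/validation split, stopping once the validation error no longer improves. Several details are assumptions. The intermediate neuron is taken to be linear, consistent with the remark that the network is essentially a multiple linear regression. Inputs are assumed to be standardized. Plain batch gradient descent is swapped in for the toolbox's own training algorithm. All names and the synthetic data are hypothetical.

```python
import random


def satlin(z):
    """Saturated linear transfer function: clamp z to [0, 1]."""
    return min(1.0, max(0.0, z))


def forward(params, x):
    """Return (hidden activation, output pre-activation, network output)."""
    w, b1, v, b2 = params
    h = sum(wi * xi for wi, xi in zip(w, x)) + b1  # linear intermediate neuron
    z = v * h + b2
    return h, z, satlin(z)


def mse(params, data):
    return sum((forward(params, x)[2] - t) ** 2 for x, t in data) / len(data)


def train(data, n_inputs, lr=0.05, max_epochs=300, patience=20, seed=0):
    """Batch gradient descent on MSE with early stopping on a 30% validation set."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)
    split = int(0.7 * len(data))
    train_set, val_set = data[:split], data[split:]
    # Random initial weights; results are sensitive to this choice.
    params = ([rng.uniform(-0.1, 0.1) for _ in range(n_inputs)], 0.0,
              rng.uniform(0.5, 1.0), 0.5)
    best, best_val, stale = params, mse(params, val_set), 0
    for _ in range(max_epochs):
        w, b1, v, b2 = params
        gw, gb1, gv, gb2 = [0.0] * n_inputs, 0.0, 0.0, 0.0
        for x, t in train_set:
            h, z, y = forward(params, x)
            # dMSE/dz; satlin passes a gradient only where it is not saturated.
            dz = 2.0 * (y - t) / len(train_set) if 0.0 < z < 1.0 else 0.0
            gv += dz * h
            gb2 += dz
            dh = dz * v
            gb1 += dh
            for i, xi in enumerate(x):
                gw[i] += dh * xi
        params = ([wi - lr * g for wi, g in zip(w, gw)],
                  b1 - lr * gb1, v - lr * gv, b2 - lr * gb2)
        val = mse(params, val_set)
        if val < best_val:
            best, best_val, stale = params, val, 0
        else:
            stale += 1
            if stale >= patience:  # stop once validation error stops improving
                break
    return best, best_val


def classify(params, x, cutoff=0.5):
    """Severe if the output exceeds the cutoff (0.7 for the 45-dBZ networks)."""
    return forward(params, x)[2] > cutoff


# Hypothetical, roughly balanced data: 22 standardized inputs; "severe" depends on input 0.
rng = random.Random(42)
data = []
for _ in range(200):
    x = [rng.gauss(0.0, 1.0) for _ in range(22)]
    data.append((x, 1.0 if x[0] + 0.1 * rng.gauss(0.0, 1.0) > 0 else 0.0))
params, val_mse = train(data, n_inputs=22)
accuracy = sum(classify(params, x) == (t == 1.0) for x, t in data) / len(data)
print(round(val_mse, 3), round(accuracy, 2))
```

With a linear intermediate neuron, the whole network collapses to a clamped linear function of the inputs. This illustrates why such a network behaves much like a multiple linear regression.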

Fig. 3.

An example of a multilayer, multineuron ANN. The final output neuron gives a number between 1 (severe) and 0 (nonsevere) according to a saturated linear equation. The simplest network that results in low RMSE values for severe convective days consists of 22 inputs, one intermediate neuron, and a single output neuron.

The entire training process was repeated multiple times because of the sensitivity of the results to the initial neuron weights. Each network was tested using 15 different sets of nonsevere-storm examples. Each set was used with 65 different sets of initial network weights, for a total of 975 training repetitions. This number was chosen because initial testing indicated that the rate of network improvement decreased significantly beyond it. The network with the lowest root-mean-square error (RMSE) when comparing model-derived events with the observations was chosen.
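The repetition strategy can be sketched as an exhaustive loop: 15 nonsevere subsets × 65 initial-weight seeds = 975 trainings, keeping the result with the lowest RMSE. Here `train_once` is a hypothetical callable. It stands in for one full training run plus the RMSE comparison against observations. The toy version below only returns an arbitrary deterministic score. Drawing each nonsevere subset to match the number of severe cases is an assumption based on the balanced training described above.

```python
import itertools
import random


def nonsevere_subsets(nonsevere, n_severe, n_sets=15, seed=0):
    """Draw n_sets random nonsevere subsets, each matching the severe-case count."""
    rng = random.Random(seed)
    return [rng.sample(nonsevere, n_severe) for _ in range(n_sets)]


def select_best_network(train_once, n_sets=15, n_inits=65):
    """Run train_once for every (subset, seed) pair; keep the lowest-RMSE result."""
    best = None
    for subset_id, seed in itertools.product(range(n_sets), range(n_inits)):
        network, rmse = train_once(subset_id, seed)
        if best is None or rmse < best[1]:
            best = (network, rmse, subset_id, seed)
    return best


calls = []


def toy_train_once(subset_id, seed):
    """Stand-in for one training run; the score is an arbitrary deterministic value."""
    calls.append((subset_id, seed))
    return f"net-{subset_id}-{seed}", 0.3 + 0.01 * ((7 * subset_id + 3 * seed) % 11)


best = select_best_network(toy_train_once)
print(len(calls), best)  # 975 ('net-0-0', 0.3, 0, 0)
```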

One concern that may be raised about the training procedure is the possible nonstationarity of the observational database, owing, for example, to the improved ability to detect severe storms over the last two decades. This improvement has the potential to bias our training procedure and result in a final climatology that is also biased. This would occur if the change in detection frequency led to training the network with storms whose variable distributions differ significantly from those of storms outside the training set. To check for this, the training set was split into two parts: one composed of severe and nonsevere training examples from 1996 to 2000, and the other of examples from 2001 to 2005. For each classification type, the distribution of each variable in the first period was compared with the distribution for that same type in the second period using a quantile–quantile (Q–Q) plot (see Fig. 4). Points falling along the line y = x indicate that the two distributions are the same. Here, we show results only for the UH parameter, but Q–Q plots for the other variables have similar shapes. Fig. 4 shows that, with the exception of the extreme tails (where data are sparse), the severe-storm distributions for the two periods are approximately the same. The same holds for the nonsevere storms. This means that the physical characteristics of these storms are not changing considerably with time. Moreover, the distribution of severe-storm examples is markedly different from that of the nonsevere examples. This gives us confidence that the nonmeteorological biases may have minimal effect on the variables describing the storms in the training set. It also suggests that the resulting network should not be biased by nonstationarity in the observations. Finally, it shows that our preprocessing method successfully separates out two distinct types of storms, as viewed from the distribution of the UH parameter.
The reader is directed to Hagan et al. (1996) for a more in-depth discussion of neural network design, training, and limitations.
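The Q–Q comparison can be sketched with the standard library. The idea is to pair the quantiles of the two samples and measure how far the pairs stray from y = x, optionally ignoring the sparse extreme tails. The sample values, the number of quantiles, and the helper names are hypothetical.

```python
import random
import statistics


def qq_pairs(sample_a, sample_b, n=20):
    """Paired quantiles (q_a, q_b) at the n - 1 interior cut points."""
    return list(zip(statistics.quantiles(sample_a, n=n),
                    statistics.quantiles(sample_b, n=n)))


def max_departure(pairs, trim=1):
    """Largest |q_b - q_a|, ignoring `trim` pairs at each tail where data are sparse."""
    core = pairs[trim:len(pairs) - trim]
    return max(abs(b - a) for a, b in core)


# Hypothetical UH samples (m^2 s^-2) for the two halves of a training set.
rng = random.Random(3)
first_half = [rng.expovariate(1 / 60.0) for _ in range(500)]
second_half = [rng.expovariate(1 / 60.0) for _ in range(500)]
pairs = qq_pairs(first_half, second_half)
print(round(max_departure(pairs), 1))
```

A pure shift between the two distributions shows up as points on a line parallel to y = x. A change in spread shows up as a change in slope.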

Fig. 4.

Quantile–quantile plots for the UH parameter for (top) severe and (middle) nonsevere-storm examples comparing the first and second halves of the training set. The line shown is the y = x line. The close fit between the line and the Q–Q plots implies that the distribution of this variable for both severe and nonsevere cases does not change considerably over the training set. (bottom) Additionally, comparing severe vs nonsevere training examples over the entire training set shows that, as expected, the two different classifications have different distributions for the UH parameter.

3. Results

a. Average monthly rainfall

Before examining the results from the neural network, it is useful to check whether the downscaling approach generates a realistic climatological distribution of a variable that is both directly predicted and directly observed. Here, we examine average monthly rainfall, using the model rainwater mixing ratio, and data from the Parameter–Elevation Regressions on Independent Slopes Model (PRISM). PRISM is inherently coarser in resolution than the model output. It combines point observations of precipitation, temperature, and other variables with elevation and other geographical data to create a gridded climatic dataset. It serves as the official climatological record for the U.S. Department of Agriculture (Daly et al. 2002).

As demonstrated in Fig. 5, the model-generated precipitation patterns are particularly well placed in the southern Great Plains in April and across the southern United States and Florida in June. The centered pattern correlation values between the standardized PRISM and standardized modeled precipitation fields indicate a high degree of correlation, with values of 0.85, 0.79, and 0.72 for April, May, and June, respectively. In terms of average precipitation magnitudes, however, the model significantly overpredicts the amount of rainfall in many locations. This is particularly evident later in the season in the southeastern United States and the Midwest, where some portions of the domain receive almost 33% more rainfall than was observed. Previous literature has also documented this positive precipitation bias in high-resolution, convection-permitting model simulations (Weisman et al. 2008). The focus here is on the severe convective hazards represented in the severe-storm-report database, so a more thorough analysis of the model-produced rain fields is left to a subsequent paper. It suffices to say that the model appears to generate precipitating storms in approximately the correct geographical locations, but perhaps with greater intensity.
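The standardization and centered pattern correlation can be sketched as follows. The sketch assumes flattened gridpoint fields in which masked (e.g., water) points are marked `None`, with no area weighting. The helper names and values are hypothetical.

```python
import math


def standardize(field):
    """Subtract the domain-wide mean and divide by the domain-wide standard deviation.

    field: flat list of gridpoint values; None marks masked points.
    """
    vals = [v for v in field if v is not None]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return [None if v is None else (v - mean) / std for v in field]


def centered_pattern_correlation(model, obs):
    """Pearson correlation over points valid in both fields (no area weighting)."""
    pairs = [(m, o) for m, o in zip(model, obs) if m is not None and o is not None]
    mm = sum(m for m, _ in pairs) / len(pairs)
    mo = sum(o for _, o in pairs) / len(pairs)
    cov = sum((m - mm) * (o - mo) for m, o in pairs)
    var_m = sum((m - mm) ** 2 for m, _ in pairs)
    var_o = sum((o - mo) ** 2 for _, o in pairs)
    return cov / math.sqrt(var_m * var_o)


# Hypothetical monthly means (mm): the "model" is 1.33x the observations plus an offset.
obs = [40.0, 75.0, None, 110.0, 60.0, 95.0]
model = [None if o is None else 1.33 * o + 5.0 for o in obs]
print(round(centered_pattern_correlation(standardize(model), standardize(obs)), 3))  # 1.0
```

Standardization only shifts and rescales each field, so it leaves the correlation unchanged. A model that overpredicts rainfall everywhere by a constant factor can therefore still score r = 1. This is why the pattern correlation and the magnitude bias carry separate information.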

Fig. 5.

(top) Average and (bottom) standardized mean monthly precipitation from both the WRF model simulations and the PRISM dataset. This standardized field was generated by taking gridpoint averages over the entire 20 yr for the respective months and then standardizing each by subtracting the domainwide average precipitation and dividing by the domainwide standard deviation. The numbers at the bottom of each month in the bottom row represent the centered pattern correlation value between the standardized model and observational data. While the WRF model generally overpredicts the mean precipitation, it does a better job at placing the local maxima/minima.

b. Occurrences of severe convective thunderstorms

Next, we consider the average annual frequency of severe-thunderstorm occurrences for April–June, as provided through the neural network. Following Brooks et al. (2003a) and Doswell et al. (2005), the occurrences are quantified in terms of a severe-thunderstorm day. A grid point experiences a severe day if, at any time during the 24-h simulation, a severe storm is observed (or, for the model results, simulated) within its coarsened grid box. The coarsened grid box again has a horizontal dimension of approximately 40 km, which accounts for the uncertainty in the observed reports. Fig. 6 shows results for two different networks, using either a 50- or a 45-dBZ preprocessing threshold. It also shows the average of these two networks, which provides the lowest RMSE when compared with observations. In general, the mean occurrence frequencies of severe convection increase and spread north- and eastward as the season progresses (Fig. 6). An area of known higher severe-thunderstorm activity over the Great Plains (Brooks et al. 2003b) is well represented throughout the season. Slightly higher numbers of severe days are seen just east of the Appalachian Mountains in April and May, as is higher activity in the central Texas–Texas Panhandle region in June. With the exception of June, the magnitudes of the model-generated severe days agree well with those observed. Most maxima in the model are shifted eastward relative to those in the observations.
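The severe-day definition and the monthly frequency maps can be sketched as follows. The sketch assumes that the network's hourly severe/nonsevere decisions have already been mapped onto the ~40-km boxes. The helper names are hypothetical. Averaging the two networks elementwise is our reading of how the combined product is formed.

```python
def severe_day_grid(hourly_flags):
    """Collapse one 24-h simulation to a severe-day grid.

    hourly_flags: list (over hours) of 2D boolean grids on the ~40-km boxes.
    A box scores 1 if any hour contains a severe storm, otherwise 0.
    """
    ny, nx = len(hourly_flags[0]), len(hourly_flags[0][0])
    return [[int(any(hour[j][i] for hour in hourly_flags)) for i in range(nx)]
            for j in range(ny)]


def mean_annual_count(day_grids_by_year):
    """Average, over years, of the number of severe days at each box.

    day_grids_by_year: {year: [severe-day grid for each day of the month]}.
    """
    years = list(day_grids_by_year)
    first = day_grids_by_year[years[0]][0]
    ny, nx = len(first), len(first[0])
    total = [[0.0] * nx for _ in range(ny)]
    for grids in day_grids_by_year.values():
        for g in grids:
            for j in range(ny):
                for i in range(nx):
                    total[j][i] += g[j][i]
    return [[v / len(years) for v in row] for row in total]


def average_networks(a, b):
    """Elementwise mean of two networks' maps (e.g., 45- and 50-dBZ)."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical 1 x 2 domain: two days in each of two years.
days = {2000: [[[1, 0]], [[1, 1]]], 2001: [[[0, 0]], [[1, 0]]]}
print(mean_annual_count(days))  # [[1.5, 0.5]]
```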

Fig. 6.

The annual frequency of occurrence of severe-weather days (left) in the observations and (remaining columns) in the model for (top) April, (middle) May, and (bottom) June utilizing networks trained with both a 50- and 45-dBZ preprocessing threshold. The result on the far right is the average of the 45- and 50-dBZ networks. Model results show a distinct northward and eastward shift as the season progresses from a synoptically dominated forcing to a more locally forced one. May and June show model maxima displaced to the north and east of what is seen in the observations.

It is instructive to examine regionally averaged model-simulated severe-thunderstorm days over the southern Great Plains (SGP), northern Great Plains (NGP), Southeast (SE), Northeast (NE), and Midwest (MW) (see Fig. 7). As shown in Table 2, RMSE values increase as the warm season progresses, indicating poorer network/model performance for the later part of the warm season. This may be due to a change in the scale of the convective forcing, from a well-resolved synoptically dominated pattern in April to more locally forced convection in June.
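Regional RMSE values of the kind listed in Table 2 can be sketched as a grouped root-mean-square difference. The pairing of modeled and observed values is an assumption, since the text does not specify whether samples are grid points, years, or both. Region labels follow Fig. 7, with `None` marking masked points. The values are hypothetical.

```python
import math
from collections import defaultdict


def regional_rmse(model, obs, regions):
    """RMSE between modeled and observed severe days, grouped by region label.

    model, obs: flat lists of paired values (e.g., seasonal severe-day counts);
    regions: region label per pair, or None for masked (water) points.
    """
    sq = defaultdict(list)
    for m, o, r in zip(model, obs, regions):
        if r is not None:
            sq[r].append((m - o) ** 2)
    return {r: math.sqrt(sum(v) / len(v)) for r, v in sq.items()}


model = [3.0, 5.0, 2.0, 4.0, 1.0]
obs = [2.0, 4.0, 2.0, 1.0, 9.0]
regions = ["SGP", "SGP", "NGP", "NGP", None]
print({r: round(v, 3) for r, v in regional_rmse(model, obs, regions).items()})
# {'SGP': 1.0, 'NGP': 2.121}
```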

Fig. 7.

Boundaries used for regional analysis and time series trends. Data over the oceans and other bodies of water are masked before analysis. The number of grid points within each region is listed in the table to the right.

Table 2.

Regional RMSE values for observed vs modeled severe days for 45- and 50-dBZ preprocessing thresholds over all 20 yr of WRF model simulations and for the average values using both thresholds. The average of the AMJ 50 and AMJ 45 results produces the lowest seasonal RMSE value for the United States as a whole.

Finally, we examine regional time series of seasonal modeled and observed severe days to determine if the increasing trend in the observed severe reports is also realized in the modeling. These trends are compared with trends in the local product of 0000 UTC CAPE and 0–6-km shear (S06). This “environmental control parameter” is calculated from the NNRP and then averaged over the respective regions, as well as over the respective months. This parameter has been shown to be a reasonably good indicator of when the environment is favorable for severe thunderstorms (Brooks et al. 1994; Doswell et al. 2005; Trapp et al. 2007).
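The environmental control parameter can be sketched as follows. We assume that S06 is the magnitude of the vector wind difference between the surface and 6 km. The product with 0000 UTC CAPE is formed at each grid point before averaging over the region and month. Names and values are hypothetical.

```python
import math


def bulk_shear_06(u_sfc, v_sfc, u_6km, v_6km):
    """0-6-km bulk shear: magnitude of the vector wind difference (m s^-1)."""
    return math.hypot(u_6km - u_sfc, v_6km - v_sfc)


def regional_monthly_control(points):
    """Mean of the local CAPE x S06 product over grid points and days.

    points: iterable of (cape, u_sfc, v_sfc, u_6km, v_6km) at 0000 UTC,
    for every grid point in the region and every day in the month.
    """
    products = [cape * bulk_shear_06(us, vs, u6, v6)
                for cape, us, vs, u6, v6 in points]
    return sum(products) / len(products)


# Hypothetical: two grid points with contrasting CAPE and shear.
points = [(2000.0, 0.0, 0.0, 3.0, 4.0),   # S06 = 5 m s^-1
          (500.0, 0.0, 0.0, 15.0, 20.0)]  # S06 = 25 m s^-1
print(regional_monthly_control(points))  # 11250.0 (mean of local products)
```

Multiplying the regional means instead would give a different value: 1250 J kg⁻¹ × 15 m s⁻¹ = 18 750. High CAPE and strong shear need not coincide at the same point, which is why the product is formed locally before averaging.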

All regions except the SGP show a positive, statistically significant trend in the number of observed severe thunderstorms (Fig. 8). Significance (p < 0.05) is determined by fitting a linear regression to each time series and performing a Student's t test on the slope parameter; a stricter statistical test of these trends may yield different significance levels. None of the regions shows a positive, statistically significant trend (p < 0.05) in either the model-generated severe thunderstorms or the environmental control parameter. We also examined a longer time series of the control parameter, spanning roughly the last 60 yr (1950–2008). It shows no statistically significant trend over four of the five regions examined, and a statistically significant negative trend over the Northeast (p < 0.05) (Fig. 9). This is in agreement with the recent study by Gensini and Ashley (2011), who did not find a positive trend in a time series of the severe-thunderstorm environment derived from the North American Regional Reanalysis (NARR) dataset. A reconciliation of these results with studies that indicate increasing severe-thunderstorm activity with anthropogenic climate change is offered in the next section.
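The trend test described above can be sketched with the standard library: an ordinary least-squares slope, its standard error, and a two-sided Student's t p-value with n - 2 degrees of freedom. The p-value here comes from Simpson integration of the t density, which keeps the block self-contained. `scipy.stats.linregress` reports the same two-sided p-value for a zero-slope null hypothesis. The series are hypothetical.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t, by Simpson integration of the t pdf."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(a) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


def trend_test(years, values):
    """OLS slope, its t statistic, and the two-sided p-value (H0: slope = 0)."""
    n = len(values)
    xm, ym = sum(years) / n, sum(values) / n
    sxx = sum((x - xm) ** 2 for x in years)
    slope = sum((x - xm) * (y - ym) for x, y in zip(years, values)) / sxx
    intercept = ym - slope * xm
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(years, values))
    se = math.sqrt(ssr / (n - 2) / sxx)
    t = slope / se
    return slope, t, t_two_sided_p(t, n - 2)


years = list(range(1990, 2010))
wiggle = [1.0 if k % 2 else -1.0 for k in range(20)]
trending = [5.0 + 0.5 * k + w for k, w in enumerate(wiggle)]  # hypothetical counts
flat = [10.0 + w for w in wiggle]
for name, series in (("trending", trending), ("flat", flat)):
    slope, t, p = trend_test(years, series)
    print(name, round(slope, 3), round(p, 3), "significant" if p < 0.05 else "not significant")
```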

Fig. 8.

Regional time series of severe days from the model proxy (blue), observations (red), and the average monthly product of CAPE × S06 (green). Notice that over most regions there is an increase in the number of reported severe events; however, this trend is not recreated in the modeled events.

Fig. 9.

A long-term time series of CAPE × S06 from 1950 to 2008. A solid line indicates that the slope is significant at the 95% confidence level. In contrast to the increase in reported severe events, there is no statistically significant trend in the environmental control for severe convection, except in the NE, where the trend is negative.

4. Discussion

a. Importance of specific variables to severe classification

It is important to understand which variables the network relies on most when classifying a storm as severe or nonsevere, because this indicates how sensitive the network may be to factors such as model grid spacing. For example, if the most important variables are known to scale with resolution, a network trained at one resolution could not be used to classify storms simulated at another. This information may also be useful to operational users of the model, since the resolution and model setup used here are frequently selected for day-to-day forecasting. One drawback of an ANN is its relative opacity compared with other methods. It is normally not immediately apparent which variables matter most to the network for classification, although the relative simplicity of the network used here makes interpretation more straightforward.

Following Jackson (2002), we use a randomized Monte Carlo approach to determine the weights that would result if the network had been trained on a random target set. To establish confidence levels for the network weights, networks are trained many times using the same meteorological data and the same initial conditions but with randomized target data, and the distribution of the final weights for each variable is examined. A positive weight from the network trained on the actual (nonrandomized) targets is considered significant if it falls at or above the 95th percentile of the weights generated with random targets. A negative weight is considered significant if it falls below the 5th-percentile value. From these results we find that high values of UH and CAPE increase the likelihood of a storm being classified by the network as severe. This is consistent with earlier approaches using simpler UH-based proxies, which identified severe storms with some success (Sobash et al. 2008; Trapp et al. 2011). Similarly, CIN, 500–250-hPa wind magnitude, vertical velocity, 0–3-km SREH, 850–250-hPa temperature, and upper-level humidity also increase the likelihood of a severe classification, but to a lesser extent. The importance of CAPE and UH also implies that the network would have to be retrained if the model data were at a finer grid spacing, as both variables are sensitive to this change.
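The randomization procedure described above can be illustrated with a deliberately simplified stand-in. The sketch below is not the network used in this study. It replaces the ANN with logistic regression, which is equivalent to a network with no hidden layer, so each input has exactly one weight. Every model is trained by batch gradient descent from the same zero initial weights, and the inputs are assumed to be standardized. The function names, learning rate, epoch count, and number of randomizations are hypothetical choices for illustration.

```python
import numpy as np


def train_logistic(X, y, n_epochs=300, learning_rate=0.5):
    """Fit logistic-regression weights by batch gradient descent.

    Every call starts from the same (zero) initial weights, mirroring the
    "same initial conditions" used for each randomized network.
    """
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(n_epochs):
        prob = 1.0 / (1.0 + np.exp(-(X @ weights + bias)))
        error = prob - y  # gradient of the mean log loss w.r.t. the logit
        weights -= learning_rate * (X.T @ error) / len(y)
        bias -= learning_rate * error.mean()
    return weights


def randomization_test(X, y, n_random=199, seed=0):
    """Compare trained weights with a null distribution from shuffled targets.

    A positive weight is significant if it is at or above the 95th percentile
    of the null weights; a negative weight if it is below the 5th percentile.
    """
    rng = np.random.default_rng(seed)
    observed = train_logistic(X, y)
    # Shuffling the targets breaks any real link between inputs and outcome.
    null = np.array([train_logistic(X, rng.permutation(y)) for _ in range(n_random)])
    upper = np.percentile(null, 95, axis=0)
    lower = np.percentile(null, 5, axis=0)
    significant = np.where(observed > 0, observed >= upper, observed < lower)
    return observed, significant
```

With a hidden layer, as in the network used here, an input's influence is spread across several connections. One common way to summarize it is to sum, over hidden units, the products of input–hidden and hidden–output weights, and then apply the same percentile comparison to that summary.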

b. June severe occurrences along the East Coast

Although the current modeling methodology represents the observed occurrences of severe thunderstorms well in certain months and in certain areas of the domain, it performs particularly poorly in June in the northern plains and along the East Coast. The exact cause of these discrepancies is still under investigation, but examining the severe occurrences by hazard type may shed some light on the issue. June is unique in that the maxima of tornado, hail, and wind events are spatially separated from one another (Fig. 10). Notably, the area where the downscaling method performs worst appears to be dominated by severe-wind events. One plausible explanation is that the ANN is less sensitive to these types of events. This would not be surprising, since the variable contributing most to a “severe” classification by the network (UH) is also a variable that indicates the possible development of supercell thunderstorms. Because straight-line winds frequently arise from nonsupercellular storms (e.g., in quasi-linear convective systems), the network may miss some of these events.

Fig. 10.

(top) Breakdown of frequency of occurrence for observed severe events from 1990 to 2009 for June by type, and (bottom) total observed and modeled events; all plots use the scale from Fig. 6. The model shows a large underprediction of events over the East Coast and southern plains. This comparison suggests that the underprediction may result from the network (model) failing to classify (generate) strong-wind events.

Another possibility is that our implementation of the WRF model has a regional bias in surface wind speeds along the East Coast. Such a bias would be consistent with the good agreement in the overall magnitude of total severe events outside this region. The physical surface properties used in the model (land use, land cover, vegetation fraction, surface roughness, etc.) contain boundaries that separate this area from regions west of the Appalachian Mountains. Given that low-level wind speed is sensitive to these parameters, errors in the surface and boundary layer physics cannot be ruled out as a possible reason for the poor downscaling performance in this region. Indeed, here and in other parts of the domain, a variety of physics-related errors may be at fault, including the role and prediction of the nocturnal low-level jet, over- or underprediction of CAPE and CIN, or the stabilization of the planetary boundary layer.

c. On climate change

The lack of significant trends in the modeled severe-thunderstorm days over 1990–2009 may seem at odds with previous studies (Marsh et al. 2007; Trapp et al. 2007, 2009), which project increasing activity with increasing anthropogenic climate change. However, the relatively flat 20-yr time series in Fig. 8 are consistent with the very gradual positive trends over the 150-yr time series analyzed by Trapp et al. (2009). Indeed, the Trapp et al. (2009) results suggest that it may take multiple decades before a response can be discerned. Most long-term climate experiments compare average conditions over climatic periods separated by many years, which allows the accumulated climate change signal to stand out from year-to-year variability. An experiment using this method with data from a fully coupled climate model would be more appropriate for assessing long-term changes in convective activity. Our results may also depend on the reanalysis data used to force the simulations: other reanalyses may have slightly different average conditions and could result in different climatologies (Thorne and Vose 2010).
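A rough calculation shows why a gradual trend can be hard to detect in a 20-yr record. This idealization assumes independent, identically distributed residuals with standard deviation σ about a linear trend; real seasonal counts may be autocorrelated. Under that assumption, the standard error of an ordinary least squares slope fitted to N consecutive annual values is

```latex
\mathrm{SE}(\hat{\beta})
  = \frac{\sigma}{\sqrt{\sum_{i=1}^{N}\left(t_i-\bar{t}\right)^2}}
  = \sigma\sqrt{\frac{12}{N\left(N^2-1\right)}},
\qquad
\mathrm{SE}_{N=20} \approx 0.039\,\sigma~\mathrm{yr}^{-1},
\qquad
\mathrm{SE}_{N=60} \approx 0.0075\,\sigma~\mathrm{yr}^{-1}.
```

With t_{0.975,18} ≈ 2.10, a 20-yr trend must exceed about 0.08σ yr⁻¹ to reach p < 0.05. That is roughly 1.5σ between the first and last years. Because the standard error falls roughly as N^{-3/2}, tripling the record length reduces it by a factor of about 5. This is consistent with the expectation that a slow, forced response may take multiple decades to emerge.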

Interestingly, the interannual variability of the model-derived reports is fairly consistent with that of the observed reports, as well as with that of the environmental control parameter; the latter gives us confidence that the model is internally consistent with the large-scale environment. In general, years with high (low) values of the environmental control parameter show high (low) values of derived reports. The regional correlations between the model-derived severe days and the environmental control parameter range from about 0.2 to 0.7, and a similar range is seen in the correlations between the observations and the parameter (Table 3). The low correlations in some regions, for both the model and the observations, highlight that the environmental control parameter cannot by itself be a direct measure of convective activity, since convection must actually initiate for the environmental potential to be realized. Indeed, this is precisely the motivation for this downscaling approach.
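The sketch below assumes that the correlations in Table 3 are Pearson coefficients between two annual series, one seasonal value of each quantity per year; the text does not name the coefficient. Under that assumption, squaring r gives the fraction of interannual variance that a linear relation with CAPE × S06 accounts for. An r of about 0.2 then corresponds to roughly 4% of the variance and an r of about 0.7 to roughly 49%. The helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def explained_variance(r):
    """Fraction of variance accounted for by a linear fit: r squared."""
    return r * r
```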

Table 3.

Correlation values between CAPE × S06 and severe occurrence days for both observations and the proxy.

As a final caveat, the accuracy of these results depends strongly on the accuracy of the model in simulating the convective conditions over the 20 yr examined. Although previous research has shown that this approach appears to adequately simulate the climate statistics of severe convective storm occurrences, and comparisons with observations are made here for validation, missed events will affect the results. A prime example is the tornado outbreak of 3 May 1999. On this day, a series of very strong supercells moved across Oklahoma and the central Great Plains, producing 33 tornadoes rated category 2 or higher on the Fujita scale (F2+), including one rated F5 (NOAA 2011b). Our WRF model reforecast for this day does not produce discrete cells but rather a well-developed squall line. This type of error may affect the total number of events identified in years containing such large-outbreak days, and it is particularly difficult to address. The best way to compensate may be an ensemble-based approach, with the idea that slight perturbations in the initial conditions or model physics may bring the simulation more in line with the observations. However, the additional simulations would dramatically increase the processing and storage required to produce a sample large enough to achieve statistical significance.

5. Future work

While the results here are promising, there is still room for improvement. As mentioned previously, most maxima in the model are shifted noticeably to the north and east, and the modeled activity on the East Coast during June is still much lower than observed. The breakdown of observed severe reports in this area (Fig. 10) suggests that the problem may lie with severe-wind events, but this has yet to be determined with certainty.

This apparent dependence of network performance on severe-event type highlights something missing from this analysis: an examination of trends in each type of severe event. Even if the total number of severe events has not changed over the last 20 yr, it is important to determine whether the relative distribution of tornadoes, large hail, and high winds has also remained the same. It is possible that one type is increasing while another is decreasing. A future with the same amount of overall severe weather but more tornadoes is substantially different from one in which the distribution across all hazards stays the same. Thus far, discriminating among these event types in the model output has been difficult. The model resolution used here may be only marginally sufficient to resolve many of the processes that distinguish these hazards. The ANN can be modified to output discrete classification categories, but only if the training set can be reliably stratified as well. This requires the model to be much more accurate and raises issues of data availability for rare events such as tornadoes. Development of this capability is ongoing.

Finally, as mentioned in section 2, the daily model reinitialization used in this approach prohibits some land surface feedbacks, such as the effect of mesoscale boundaries from one day on convection during the next day. Ongoing work is examining the sensitivity of the solution to this choice of reinitialization frequency and will determine if longer continuous model simulations produce better (or worse) results.

6. Conclusions

The objective herein was to explore the use of ANNs and dynamically downscaled reanalysis data to produce a realistic climatology of severe thunderstorms. Analysis of average monthly rainfall indicated that the downscaling methodology generated precipitation in approximately the correct geographical locations but overestimated its overall magnitude. The downscaling methodology and ANN reproduced the temporal evolution of the severe-thunderstorm season, with a geographical shift in storm activity to the north and east as the warm season (April–June) progresses. Time series of modeled severe-thunderstorm occurrences showed no significant trends over the 20-yr period of consideration, in contrast to trends seen in the observational record. Consistently, no significant trend was found over the same 20-yr period in the environmental conditions supportive of severe-thunderstorm occurrences. The relative success of the method in recreating the spatial and temporal evolution of the convective season, along with the interannual variability seen in the observational record, suggests that the method presented here may be a viable candidate for studying severe convective activity where observational data are absent, such as in general circulation model simulations.

Acknowledgments

The authors benefited greatly from the comments and criticisms of the three anonymous reviewers. The authors also thank the National Climatic Data Center for providing access to the NNRP dataset, the National Center for Atmospheric Research Accelerated Scientific Discovery Initiative, and the National Science Foundation for providing monetary and computational resources under Grant NSF ATM 0541491. This is Purdue Climate Change Research Center Paper 1332 and contributes to the Clouds, Climate, and Extreme Weather initiative at Purdue University.

REFERENCES

Brooks, H. E., C. A. Doswell III, and J. Cooper, 1994: On the environments of tornadic and nontornadic mesocyclones. Wea. Forecasting, 9, 606–618.
Brooks, H. E., C. A. Doswell III, and M. P. Kay, 2003a: Climatological estimates of local daily tornado probability for the United States. Wea. Forecasting, 18, 626–640.
Brooks, H. E., J. W. Lee, and J. P. Craven, 2003b: The spatial distribution of severe thunderstorm and tornado environments from global reanalysis data. Atmos. Res., 67–68, 73–94.
Chen, F., and J. Dudhia, 2001: Coupling an advanced land surface–hydrology model with the Penn State–NCAR MM5 modeling system. Part I: Model description and implementation. Mon. Wea. Rev., 129, 569–585.
Clark, A. J., J. S. Kain, P. T. Marsh, J. Correia, M. Xue, and F. Kong, 2012: Forecasting tornado pathlengths using a three-dimensional object identification algorithm applied to convective-allowing forecasts. Wea. Forecasting, 27, 1090–1113.
Collins, M., and M. R. Allen, 2002: Assessing the relative roles of initial and boundary conditions in interannual to decadal climate predictability. J. Climate, 15, 3104–3109.
Daly, C., W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, 2002: A knowledge-based approach to the statistical mapping of climate. Climate Res., 22, 99–113.
Davis, C. A., K. W. Manning, R. E. Carbone, S. B. Trier, and J. D. Tuttle, 2003: Coherence of warm-season continental rainfall in numerical weather prediction models. Mon. Wea. Rev., 131, 2667–2679.
Demuth, H. B., and M. H. Beale, 1993: Neural network toolbox for use with MATLAB user's guide. The MathWorks. [Available online at http://mathworks.com.]
Diffenbaugh, N. S., R. J. Trapp, and H. E. Brooks, 2008: Challenges in identifying influences of global warming on tornado activity. Eos, Trans. Amer. Geophys. Union, 89, 553–554.
Done, J., C. A. Davis, and M. Weisman, 2004: The next generation of NWP: Explicit forecasts of convection using the Weather Research and Forecasting (WRF) model. Atmos. Sci. Lett., 5, 110–117.
Doswell, C. A., III, H. E. Brooks, and M. P. Kay, 2005: Climatological estimates of daily local nontornadic severe thunderstorm probability for the United States. Wea. Forecasting, 20, 577–595.
Dudhia, J., 1989: Numerical study of convection observed during the Winter Monsoon Experiment using a mesoscale two-dimensional model. J. Atmos. Sci., 46, 3077–3107.
Gensini, V. A., and W. S. Ashley, 2011: Climatology of potentially severe convective environments from North American regional reanalysis. Electron. J. Severe Storms Meteor., 6 (8). [Available online at http://www.ejssm.org/ojs/index.php/ejssm/issue/view/33.]
Hagan, M. T., H. B. Demuth, and M. H. Beale, 1996: Neural Network Design. 1st ed. PWS Publishing, 730 pp.
Hall, S. G., and W. S. Ashley, 2008: Effects of urban sprawl on the vulnerability to a significant tornado impact in northeastern Illinois. Nat. Hazards Rev., 9, 209–219.
Hong, S.-Y., and J. J. Lim, 2006: The WRF single-moment 6-class microphysics scheme (WSM6). J. Korean Meteor. Soc., 42, 129–151.
Iacono, M. J., E. J. Mlawer, S. A. Clough, and J. J. Morcrette, 2000: Impact of an improved longwave radiation model, RRTM, on the energy budget and thermodynamic properties of the NCAR Community Climate Model, CCM3. J. Geophys. Res., 105, 14 873–14 890.
Jackson, D. A., 2002: Illuminating the “black box”: A randomization approach for understanding variable contributions in artificial neural networks. Ecol. Model., 154, 135–150.
Kain, J. S., S. J. Weiss, J. J. Levit, M. E. Baldwin, and D. R. Bright, 2006: Examination of convection-allowing configurations of the WRF model for the prediction of severe convective weather: The SPC/NSSL Spring Program 2004. Wea. Forecasting, 21, 167–181.
Kain, J. S., and Coauthors, 2008: Some practical considerations regarding horizontal resolution in the first generation of operational convection-allowing NWP. Wea. Forecasting, 23, 931–952.
Kain, J. S., D. R. Bright, A. R. Dean, M. C. Coniglio, and S. J. Weiss, 2011: Probabilistic forecast guidance for severe thunderstorms based on the identification of extreme phenomena in convection-allowing model forecasts. Wea. Forecasting, 26, 714–728.
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.
Kistler, R., and Coauthors, 2001: The NCEP–NCAR 50-Year Reanalysis: Monthly means CD-ROM and documentation. Bull. Amer. Meteor. Soc., 82, 247–267.
Lemons, H., 1942: Hail in American agriculture. Econ. Geogr., 18, 363–378.
Manzato, A., 2005: The use of sounding-derived indices for a neural network short-term thunderstorm forecast. Wea. Forecasting, 20, 896–917.
Marsh, P. T., H. E. Brooks, and D. J. Karoly, 2007: Assessment of the severe weather environment in North America simulated by a global climate model. Atmos. Sci. Lett., 8, 100–106.
Marsh, P. T., H. E. Brooks, and D. J. Karoly, 2009: Preliminary investigation into the severe thunderstorm environment of Europe simulated by the Community Climate System Model 3. Atmos. Res., 93, 607–618.
Marzban, C., 2000: A neural network for tornado diagnosis: Managing local minima. Neural Comput. Appl., 9, 133–141.
Marzban, C., and G. J. Stumpf, 1996: A neural network for tornado prediction based on Doppler radar-derived attributes. J. Appl. Meteor., 35, 617–626.
Marzban, C., and G. J. Stumpf, 1998: A neural network for damaging wind prediction. Wea. Forecasting, 13, 151–163.
Marzban, C., and A. Witt, 2001: A Bayesian neural network for severe-hail size prediction. Wea. Forecasting, 16, 600–610.
Marzban, C., H. Paik, and G. J. Stumpf, 1997: Neural networks vs. Gaussian discriminant analysis. AI Appl., 11, 49–58.
Mellor, G. L., and T. Yamada, 1982: Development of a turbulence closure model for geophysical fluid problems. Rev. Geophys. Space Phys., 20, 851–875.
Mlawer, E. J., S. J. Taubman, P. D. Brown, M. J. Iacono, and S. A. Clough, 1997: Radiative transfer for inhomogeneous atmospheres: RRTM, a validated correlated-k model for the longwave. J. Geophys. Res., 102, 16 663–16 682.
NOAA, cited 2011a: Killer tornadoes. [Available online at http://www.spc.noaa.gov/climo/torn/STATIJ11.txt.]
NOAA, cited 2011b: The Great Plains tornado outbreak of May 3–4, 1999. [Available online at http://www.srh.noaa.gov/oun/?n=events-19990503.]
NOAA, cited 2011c: Why one inch hail criterion? [Available online at http://www.nws.noaa.gov/oneinchhail/.]
Paulikas, M., 2010: Thunderstorm hazard risk for the Atlanta, GA metropolitan area. M.S. thesis, Dept. of Atmospheric Sciences, Northern Illinois University, 124 pp.
Richardson, Y., and Coauthors, 2012: The pretornadic phase of the Goshen County, Wyoming, supercell of 5 June 2009 intercepted by VORTEX2. Part II: Intensification of low-level rotation. Mon. Wea. Rev., 140, 2916–2938.
Rogers, S., 1996: Hail damage: Physical meteorology and crop losses. Proc. Fla. State Hortic. Soc., 109, 97–103.
Rosenzweig, C., A. Iglesias, X. B. Yang, P. R. Epstein, and E. Chivian, 2001: Climate change and extreme weather events: Implications for food production, plant diseases, and pests. Global Change and Human Health, Vol. 2, No. 2, NASA Publ., 90–104.
Schwartz, C. S., and Coauthors, 2009: Next-day convection-allowing WRF model guidance: A second look at 2-km versus 4-km grid spacing. Mon. Wea. Rev., 137, 3351–3372.
Skamarock, W. C., 2004: Evaluating mesoscale NWP models using kinetic energy spectra. Mon. Wea. Rev., 132, 3019–3032.
Sobash, R., D. R. Bright, A. R. Dean, J. S. Kain, M. Coniglio, S. J. Weiss, and J. J. Levit, 2008: Severe storm forecast guidance on explicit identification of convective phenomena in WRF-model forecasts. Preprints, 24th Conf. on Severe Local Storms, Savannah, GA, Amer. Meteor. Soc., 11.3. [Available online at https://ams.confex.com/ams/pdfpapers/142187.pdf.]
Thompson, R. L., R. Edwards, J. A. Hart, K. L. Elmore, and P. Markowski, 2003: Close proximity soundings within supercell environments obtained from the rapid update cycle. Wea. Forecasting, 18, 1243–1261.
Thorne, P. W., and R. S. Vose, 2010: Reanalyses suitable for characterizing long-term trends. Bull. Amer. Meteor. Soc., 91, 353–361.
Trapp, R. J., N. S. Diffenbaugh, H. E. Brooks, M. E. Baldwin, E. D. Robinson, and J. S. Pal, 2007: Changes in severe thunderstorm environment frequency during the 21st century caused by anthropogenically enhanced global radiative forcing. Proc. Natl. Acad. Sci. USA, 104, 19 719–19 723.
Trapp, R. J., N. S. Diffenbaugh, and A. Gluhovsky, 2009: Transient response of severe thunderstorm forcing to elevated greenhouse gas concentrations. Geophys. Res. Lett., 36, L01703, doi:10.1029/2008GL036203.
Trapp, R. J., E. Robinson, M. Baldwin, N. Diffenbaugh, and B. Schwedler, 2011: Regional climate of hazardous convective weather through high-resolution dynamical downscaling. Climate Dyn., 37, 677–688.
Weisman, M. L., C. Davis, W. Wang, K. W. Manning, and J. B. Klemp, 2008: Experiences with 0–36-h explicit convective forecasts with the WRF-ARW model. Wea. Forecasting, 23, 407–437.
Yuan, Y., S. L. Mullen, X. Gao, S. Sorooshian, J. Du, and H.-M. H. Juang, 2007: Calibration of probabilistic quantitative precipitation forecasts with an artificial neural network. Wea. Forecasting, 22, 1287–1303.

Footnotes

1

Prior to 2010, this is defined as mentioned above. On 5 January 2010, the minimum diameter threshold for severe hail events was changed to 25 mm (NOAA 2011c). No attempt is made to correct for this change.

2

The parameters used for “low level” wind, temperature, and humidity differ over parts of the 20-yr experiment. The 2-m temperature and humidity and 10-m wind speed are used in 1991–2000, whereas the lowest-model-level values are used in 1990 and 2001–09. However, quantile–quantile plots indicate that the distributions of these variables are similar across the two sets of years.