Proximity sounding studies typically seek to optimize several trade-offs that involve somewhat arbitrary definitions of a “proximity sounding.” More restrictive proximity criteria, which presumably produce results that are more characteristic of the near-storm environment, typically result in smaller sample sizes that can reduce the statistical significance of the results. Conversely, the use of broad proximity criteria will typically increase the sample size and the apparent robustness of the statistical analysis, but the sounding data may not necessarily be representative of near-storm environments, given the presence of mesoscale variability in the atmosphere. Previous investigations have used a wide range of spatial and temporal proximity criteria to analyze severe storm environments. However, the sensitivity of storm environment climatologies to the proximity definition has not yet been rigorously examined.
In this study, a very large set (∼1200) of proximity soundings associated with significant tornado reports is used to generate distributions of several parameters typically used to characterize severe weather environments. Statistical tests are used to assess the sensitivity of the parameter distributions to the proximity criteria. The results indicate that while soundings collected too far in space and time from significant tornadoes tend to be more representative of the larger-scale environment than of the storm environment, soundings collected too close to the tornado also tend to be less representative due to the convective feedback process. The storm environment itself is thus optimally sampled at an intermediate spatiotemporal range referred to here as the Goldilocks zone. Implications of these results for future proximity sounding studies are discussed.
Proximity sounding studies have been used for many decades to examine severe storm environments (e.g., Showalter and Fulks 1943; Fawbush and Miller 1952, 1954; Beebe 1958; Darkow 1969; Maddox 1976; Davies and Johns 1993; Brooks et al. 1994; Rasmussen and Blanchard 1998; Davies 2004; Thompson et al. 2007). In this approach, distributions of severe weather parameters are compiled from soundings that occur within spatial and temporal proximity to particular events (e.g., tornadoes, ≥2-in.-diameter hail). The primary objective is to identify criteria that can help forecasters anticipate when and where certain types of severe weather may occur. For example, the supercell composite parameter (SCP) is based on a study (Thompson et al. 2003) of model analysis soundings that identified several sounding parameters that discriminate well between supercell and nonsupercell environments.
Proximity sounding studies require the selection of a set of proximity criteria that presumably provide a representative sampling of the “storm environment,” that is, the region of the atmospheric parameter space supporting the storm during the time(s) when severe weather occurred. Unfortunately, the spatial and temporal scales of the typical storm environment are not known with any degree of precision. This uncertainty is reflected in the wide range of proximity criteria that have been employed in previous studies (Table 1). The more restrictive of these criteria implicitly acknowledge the large mesoscale variability that is often observed on severe weather (particularly tornado outbreak) days (Doswell 1982; Davies-Jones 1993; Markowski et al. 1998). It is tempting to presume that conditions nearer a severe weather event are generally more representative of the region of the atmosphere that fostered the development of the parent storm. Although this is undoubtedly true for some range of spatial and temporal scales, thunderstorms often substantially modify the atmosphere immediately around them, creating conditions that are uncharacteristic of the environment that gave rise to the severe weather. Most proximity sounding studies mitigate this effect by removing soundings that are obviously convectively contaminated; however, soundings taken in regions that have been more subtly affected by the nearby storm are unlikely to be identified and removed. Most studies have also excluded soundings objectively or subjectively determined to be valid outside the storm inflow region (e.g., Rasmussen and Blanchard 1998).
Another important consideration in the selection of proximity criteria is the inherent trade-off between collecting larger numbers of soundings (more inclusive criteria) and sampling the environment closer to the storm (more restrictive criteria). Except during field research programs when special serial soundings may be launched, the rarity of severe weather and the typically large separation in time and space between soundings can make it tempting to adopt less restrictive criteria in order to obtain statistically robust sample sizes. Whether such a step is justified depends on the typical scales of the storm environment, which, again, have not been well defined.
The preceding discussion makes clear that the selection of proximity criteria is a nontrivial matter. Nevertheless, until now, little attempt has been made to statistically assess the impacts of proximity criteria on the analyzed climatological storm environment. In this paper, we examine and compare the climatologies of significant tornado (SIGTOR; F2+) environments obtained using several sets of proximity criteria. We seek to answer two important questions: 1) Do different definitions of proximity result in (statistically) significantly different climatologies? 2) If so, can any of these climatologies be confidently identified as being most representative of the storm environment?
The methods we use to answer these two questions are described in section 2. The impacts of varying the spatial proximity criteria on the climatological SIGTOR environment are examined in section 3. A similar analysis of the temporal proximity criteria is presented in section 4. A summary and proposed future work follow in section 5.
This study makes use of a database of 1265 significant tornado soundings (valid 0000 UTC) collected by Craven and Brooks (2004) from the conterminous United States (CONUS) for the period 1957–96. In that study, proximity was defined as the event occurring within 185 km of the sounding release location between 2100 and 0300 UTC (within 3 h of the sounding). As stated in Craven and Brooks (2004), the soundings were not subjectively modified, such as by manually adjusting surface conditions, as it was anticipated that the effects of unrepresentative, contaminated, or erroneous data would be damped out in the statistical analysis. However, some basic quality control measures were employed. Soundings with most unstable (MU) CAPE < 150 J kg−1 were removed, as in Brooks et al. (1994), in order to exclude soundings that were unlikely to be representative of the inflow sector (e.g., those affected by convective outflow or located behind cold fronts or drylines). In addition, steps were taken to minimize the inclusion of physically unrealistic data in the sounding analyses. Lapse rates exceeding 11°C km−1 in the 0–3-km AGL layer were removed, as were those greater than 10.2°C km−1 in the 0–6-km AGL and 850–700- or 700–500-hPa layers. Shear values exceeding 50 m s−1 (100 m s−1) in the 0–1-km (0–6-km) layer were also excluded. Soundings with MUCAPE or 100-hPa mean-layer (ML) CAPE greater than 5000 J kg−1 were manually inspected, and suspect soundings were excluded (Craven and Brooks 2004). Since soundings close to the storm are more likely to be modified by nearby convection (e.g., Weisman et al. 1998), in the present study all soundings taken within 40 km of a tornado were subjectively examined for convective contamination. This resulted in six additional contaminated soundings being identified and removed.
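The automated thresholds listed above can be collected into a single screening function. The sketch below is illustrative only: the record layout and field names (`mucape`, `lapse_0_3km`, and so on) are hypothetical, and it assumes that a sounding failing any threshold is dropped in its entirety, which the text does not state explicitly. The subjective steps (manual inspection of large-CAPE soundings and of soundings within 40 km of a tornado) are reduced to a flag.

```python
# Hypothetical record layout: one dict per sounding, with values in the
# units quoted in the text (J/kg, degrees C per km, m/s, km).

def passes_automated_qc(s):
    """True if the sounding survives the automated thresholds."""
    if s["mucape"] < 150.0:        # likely outside the inflow sector
        return False
    if s["lapse_0_3km"] > 11.0:    # physically unrealistic lapse rate
        return False
    deep_layers = ("lapse_0_6km", "lapse_850_700", "lapse_700_500")
    if any(s[key] > 10.2 for key in deep_layers):
        return False
    if s["shear_0_1km"] > 50.0 or s["shear_0_6km"] > 100.0:
        return False
    return True


def needs_manual_inspection(s):
    """True if the sounding should be examined subjectively."""
    large_cape = s["mucape"] > 5000.0 or s["mlcape"] > 5000.0
    very_close = s["distance_km"] <= 40.0   # "within 40 km" read as <= 40
    return large_cape or very_close
```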
Finally, in order to prevent extreme outliers (many of which were caused by erroneous or unrepresentative sounding data) from contaminating the statistical analysis, all sounding parameter values occurring outside of the 2.5th–97.5th percentile range were omitted. Seventeen kinematic, thermodynamic, and composite sounding parameters identified in previous studies as being important for significant tornadogenesis were examined (Table 2).
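The percentile-based outlier removal can be sketched as follows. This is a minimal illustration that assumes the trimming is applied separately to each parameter's list of values and that values equal to the bounds are retained; the interpolation rule used to locate the 2.5th and 97.5th percentiles is also an assumption, since the text does not specify one.

```python
import statistics


def trim_to_central_95(values):
    """Keep only values inside the 2.5th-97.5th percentile range."""
    # n=40 yields cut points every 2.5%; the first and last are the
    # 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    return [v for v in values if lower <= v <= upper]
```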
Two nonparametric (no probability distribution assumed) statistical significance tests are used to identify differences between the parameter distributions obtained using different proximity criteria. A permutation test (Efron and Tibshirani 1993) is used to identify significant differences between the distribution means, and a Kolmogorov–Smirnov (K–S) test (Conover 1999) is used to assess differences between the empirical distribution functions of parameters at various temporal and spatial distances. Statistical comparison tests are crucial since subjective comparison of distributions (e.g., visual inspection of box-and-whisker plots) does not account for sample size. It is very important to note, however, that failure to reject the null hypothesis does not necessarily mean that either the null hypothesis is true or the difference between the populations is small [a discussion of this and other limitations of null hypothesis significance testing can be found in Nicholls (2001)]. Two samples may come from very different populations and yet be too small for significance tests to confidently establish that the two populations are indeed different. It should also be borne in mind that rejection of the null hypothesis merely indicates there is sufficient evidence to declare the two distributions to be different. It is then necessary to consider whether these differences are large enough to be practically meaningful. For example, a difference of 100 J kg−1 between the mean ML CAPE calculated using two different proximity criteria may mean little to a forecaster. This is because errors in vertical temperature and dewpoint measurements can easily result in uncertainty in CAPE of 100 J kg−1 or more. 
On the other hand, a difference in CAPE of 100 J kg−1 combined with similarly small differences in several other critical parameters may be important to the performance of a statistical prediction technique that distinguishes between, for example, SIGTOR and non-SIGTOR environments. Thus, in determining which proximity criteria to adopt, it is important to consider how the resulting analysis will be used. If a less restrictive proximity definition provides a much larger sample size without changing the results in a way that is meaningful to their application, then it may be advantageous to adopt the broader criteria.
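The two tests described above can be sketched with the Python standard library alone. This is an illustrative implementation under stated assumptions, not the code used in the study: the number of permutations (`n_perm`), the two-sided form of the test, and the function names are all choices made here. Only the K–S statistic is computed; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used to obtain the associated p value.

```python
import bisect
import random


def permutation_test_means(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference between two means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)   # random relabeling of the pooled data
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            count += 1
    # Count the observed labeling as one of the permutations.
    return (count + 1) / (n_perm + 1)


def ks_statistic(a, b):
    """Largest vertical distance between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        f_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d
```

Neither function assumes a parametric form for the data, which is why both are applicable to skewed parameters such as CAPE. As the text cautions, a large p value from either test is not evidence that the two populations are the same, particularly when the samples are small.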
3. Sensitivity to spatial criteria
The set of soundings valid within 1 h of a significant tornado report was subdivided into four categories based on the distance to the report: 0–40, 40–80, 80–121, and 121–185 km (Fig. 1). Sample sizes for all the proximity categories used in this study are given in Table 3. The K–S and permutation tests were performed on each pair of the distributions listed above. From Table 4, significant differences are readily apparent.
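The spatial stratification can be expressed as a simple binning step. In the sketch below the bin edges come from the text, while the labels, the function names, and the treatment of boundary values (each bin includes its lower edge, and 185 km falls in the outermost bin) are assumptions. The input is assumed to have already been restricted to soundings valid within 1 h of the report.

```python
# Bin edges (km) taken from the text; the labels are illustrative.
EDGES_KM = (0.0, 40.0, 80.0, 121.0, 185.0)
LABELS = ("0-40", "40-80", "80-121", "121-185")


def distance_category(distance_km):
    """Return the label of the distance bin, or None if out of range."""
    if not 0.0 <= distance_km <= EDGES_KM[-1]:
        return None
    for upper, label in zip(EDGES_KM[1:], LABELS):
        if distance_km < upper:
            return label
    return LABELS[-1]   # distance_km is exactly 185


def group_by_distance(records):
    """records: iterable of (distance_km, parameter_value) pairs."""
    groups = {label: [] for label in LABELS}
    for distance_km, value in records:
        label = distance_category(distance_km)
        if label is not None:
            groups[label].append(value)
    return groups
```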
Given that the analyzed storm environment is sensitive to the proximity of the sounding to the (significant) tornado, it is now necessary to determine whether any of our proximity categories provide a more representative sampling of the storm environment than do the other categories. Since significant tornadoes tend to occur in regions of larger instability, vertical wind shear and storm-relative helicity, and lower ML lifting condensation levels (LCLs) (e.g., Thompson et al. 2003), it is reasonable to expect measures of these properties to become more favorable as the proximity to the storm increases. However, very close to the storm, some of these properties may become less favorable due to convective feedbacks into the near-storm environment (e.g., anvil shadow, cold outflow, and precipitation). The most representative (and thus optimum) storm environment may therefore occur at some intermediate distance and/or time from the storm. We will call this hypothetical region the Goldilocks zone (GZ).
In the bootstrapping method (Efron and Tibshirani 1993), an original data sample is repeatedly resampled (10 000 times in our case) with replacement. These samples are then used to derive the empirical distribution function of the desired test statistic (in our case, the mean). Figures 2–5 show box-and-whisker plots of the bootstrapped means for selected parameters. The boxes in the plots indicate the interquartile range (IQR) of the bootstrapped means for each distribution. The hatched area in each IQR indicates the 95% confidence interval for the median. The “whiskers” extend to 1.5 times the IQR, and the lines above and below the box-and-whisker diagrams depict outliers. In general, the parameters either become more favorable or remain relatively constant as the distance to the tornado decreases from 121–185 to 40–80 km. Between the 40–80- and 0–40-km ranges, however, the parameters tend to either become less favorable or remain relatively constant. [There may be regions of the atmospheric parameter space where, once a “sufficiency” threshold of a particular parameter is met, exceeding this threshold does not make significant tornadogenesis more likely. Such a threshold appears to exist for deep-layer shear within the context of supercell development (Weisman and Klemp 1984; Thompson et al. 2007). However, this does not change the fact that, in general, more “favorable” values of any given parameter will make significant tornadogenesis more likely.] This pattern suggests that a GZ indeed exists within the climatological SIGTOR environment and that the 40–80-km spatial proximity criterion does a better job of sampling this atmospheric “sweet spot” than do the other three spatial criteria, at least for the 0–1-h time frame. In some cases, the 40–80-km annulus did not do a substantially better job of representing the most favorable climatological storm environment, but in no case did it do a substantially worse job.
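The bootstrap procedure described at the start of this paragraph can be sketched as follows. The resample count of 10 000 is from the text; the function names, the fixed random seed, and the quartile interpolation rule are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_means(sample, n_boot=10000, seed=0):
    """Means of n_boot resamples drawn with replacement from sample."""
    rng = random.Random(seed)
    n = len(sample)
    # random.choices draws with replacement, as the bootstrap requires.
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)]


def interquartile_range(values):
    """Return the (first quartile, third quartile) of values."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3
```

Applying `interquartile_range` to the output of `bootstrap_means` for each distance category gives the box limits of the kind plotted in the figures; narrower boxes indicate that the sample mean is better constrained, which generally reflects a larger sample.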
The finding that significant-tornadic storms can significantly modify their nearby environment is consistent with the supercell simulations of Weisman et al. (1998). Also consistent with that and other numerical sensitivity studies is our finding that the kinematic environment is more strongly modified by convective feedbacks than is the thermodynamic environment. Away from the strongest storm feedbacks (40–185 km), however, the climatological kinematic environment appears to be more spatially uniform than the thermodynamic environment.
Figures 2–5 also illustrate the dangers of misinterpreting the results of null hypothesis significance tests. The tendency toward less favorable parameter values very close to the tornado is evident in the plots of ML convective inhibition (CIN) and ML 3-km CAPE. However, the p values for the statistical comparisons of the 0–40- and 40–80-km distributions are larger than 0.20 for both of these cases. Based solely on the significance tests, we would not be justified in saying that these parameters become less favorable very close to the storm. However, this trend is apparent for a variety of the examined parameters and, in fact, is statistically quite significant (p < 0.05) for some of them. Thus, the failure to reject the null hypothesis in those cases where a trend appears in the data likely arises (in at least some of these cases) from insufficient sample size, and not necessarily because the two samples were collected from the same or very similar populations.
Clearly, different proximity criteria can produce parameter distributions that are significantly different, and the 40–80-km proximity criterion provides a more representative sampling of the climatological SIGTOR storm environment than do the other three considered criteria. Whether these differences have important operational implications must now be addressed. Table 5 lists the absolute differences between the sample means of each pair of distributions for each sounding parameter. Many of the differences in the sample means have magnitudes that are similar to or smaller than the typical measurement and model analysis and forecast errors associated with these parameters (Elmore et al. 2002). This is true even for those distributions whose means are distinguishable at the 95% confidence level. At first glance then, it seems that the choice of proximity criteria may not be a major concern to an operational forecaster. However, such differences affect objective event classification techniques that use some of these parameters as input. Thus, in deriving a climatology of SIGTOR environments, it may be prudent to restrict the sounding set to those valid within the GZ identified here (40–80 km). Of course, this is assuming a temporal criterion of 0–1 h; the impacts of sampling further in time from the storm are examined in the next section.
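The comparison between sample-mean differences and typical errors can be made explicit with a short sketch. The function names and the idea of a single scalar `typical_error` per parameter are assumptions for illustration; the appropriate error magnitude for each parameter would have to come from the literature cited above.

```python
from itertools import combinations
from statistics import fmean


def pairwise_mean_differences(groups):
    """Absolute difference in sample means for each pair of categories.

    groups: dict mapping a category label to a list of parameter values.
    """
    means = {label: fmean(values) for label, values in groups.items()}
    return {(a, b): abs(means[a] - means[b]) for a, b in combinations(means, 2)}


def exceeds_typical_error(differences, typical_error):
    """Flag the pairs whose mean difference is larger than typical_error."""
    return {pair: diff > typical_error for pair, diff in differences.items()}
```

A pair can be statistically distinguishable and still be flagged `False` here, which is the distinction between statistical and practical significance drawn in the text.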
4. Sensitivity to temporal criteria
The proximity sounding database is next stratified by spatial proximity (0–40, 40–80, 80–121, and 121–185 km) and then by temporal proximity (0–1, 1–2, and 2–3 h). This allows us to determine the sensitivity of the analyzed climatology to the temporal proximity criterion for different intervals of distance from the tornado. Results of the permutation and K–S tests as well as the differences between the sample means for different temporal intervals are shown in Tables 6–9.
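The two-way stratification can be sketched as a grouping keyed by a (distance interval, time interval) pair. The interval edges are from the text; the record layout, the use of the absolute time difference between report and sounding, and the boundary conventions are assumptions.

```python
from collections import defaultdict

DIST_EDGES_KM = (0, 40, 80, 121, 185)
LAG_EDGES_H = (0, 1, 2, 3)


def interval(value, edges):
    """Return the (lower, upper) interval containing value, or None."""
    if value < edges[0] or value > edges[-1]:
        return None
    for lower, upper in zip(edges, edges[1:]):
        if value < upper:
            return (lower, upper)
    return (edges[-2], edges[-1])   # value equals the outermost edge


def stratify(records):
    """records: iterable of (distance_km, lag_h, parameter_value).

    lag_h is the signed time difference (h) between the tornado report
    and the sounding; only its magnitude is used.
    """
    strata = defaultdict(list)
    for distance_km, lag_h, value in records:
        d_bin = interval(distance_km, DIST_EDGES_KM)
        t_bin = interval(abs(lag_h), LAG_EDGES_H)
        if d_bin is not None and t_bin is not None:
            strata[(d_bin, t_bin)].append(value)
    return dict(strata)
```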
The p values obtained in these tests indicate that, regardless of the spatial proximity criterion, there are many significant differences between the three examined temporal cross sections of the climatological SIGTOR environment. However, the sample mean differences, especially those supported by lower permutation test p values, permit an additional conclusion: the sensitivity of the parameter distributions to the temporal interval generally decreases as the range from the tornado increases. For example, the mean ML CAPE differs (depending on the temporal criterion) by up to 605, 620, 278, and 166 J kg−1 for the 0–40-, 40–80-, 80–121-, and 121–185-km categories, respectively. This pattern appears in the majority of the examined parameters and likely occurs because most of the variability due to convective feedback is confined very close to the storm. In all four datasets, the 2–3-h temporal criterion produced the least favorable environment when compared to the 40–80-km, 0–1-h distributions. For the 80–121-km dataset, the 0–1-h distributions generally appeared to better represent the storm environment, whereas for the 0–40-km dataset, the 1–2-h distributions appeared most representative. The latter result is most dramatically evidenced in the differences between the SCP and significant tornado parameter (STP) distributions (Fig. 6), which combine differences between the kinematic and thermodynamic environments. Thus, as the distance to the event decreases, it is necessary to look farther in time from the event in order to minimize the influence of the storm.
As mentioned previously, one goal of this type of analysis is to identify proximity criteria that maximize the sample size while remaining representative of the storm environment. Thus, it is useful to observe that no significant differences were identified between the 40–80-km, 0–1-h and 40–80-km, 1–2-h distributions. Additional permutation and K–S tests (not shown) also revealed no significant differences between the 40–80-km, 0–1-h and 0–40-km, 1–2-h distributions. These results indicate that the proximity definition identified in the previous section as best representing the GZ (40–80 km, 0–1 h) can safely be expanded to include all soundings taken within 40–80 km and 0–2 h or 0–40 km and 1–2 h of a significant tornado.
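The expanded proximity definition stated above reduces to a simple predicate. In this sketch the function name and the boundary conventions (which interval a sounding at exactly 40 km, 80 km, 1 h, or 2 h belongs to) are assumptions; `lag_h` is taken to be the absolute time difference between the tornado report and the sounding.

```python
def in_expanded_goldilocks_zone(distance_km, lag_h):
    """True if a sounding meets the expanded proximity definition.

    Accepts soundings 40-80 km from the tornado within 0-2 h, or
    0-40 km from the tornado within 1-2 h.
    """
    in_ring = 40.0 <= distance_km < 80.0 and 0.0 <= lag_h <= 2.0
    in_core_later = 0.0 <= distance_km < 40.0 and 1.0 <= lag_h <= 2.0
    return in_ring or in_core_later
```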
5. Summary and future work
Previous proximity sounding studies have employed a wide range of proximity criteria. Until now, little has been done to assess the sensitivity of the analyzed severe storm environment climatology to the definition of proximity. Using a large database of significant tornado soundings, it is found that varying the definition of proximity can produce statistically significant differences in the distributions of thermodynamic, kinematic, and composite sounding parameters. The importance of these differences depends on how the climatology is intended to be applied. It is recommended that future proximity sounding studies perform sensitivity analyses similar to this one in order to avoid potentially inappropriate proximity criteria.
This analysis revealed the existence of what we have termed a Goldilocks zone—a spatiotemporal distance from a thunderstorm-related event (in this case, significant tornadoes) that, climatologically, is close enough to the parent storm to be representative of its background environment, yet distant enough to minimize the effects of convective feedbacks. The more closely a proximity definition matches the location in space and time of the GZ, assuming one exists for a particular type of climatological storm environment, the more accurately the climatological storm environment can be assessed.
There are a number of potentially valuable extensions of the current work. The same procedure used to assess the dependence of SIGTOR climatology on the choice of proximity criteria could also be applied to storms producing significant (≥2-in. diameter) hail. The impacts of proximity definition on the analyzed differences between SIGTOR and significant hail environments could then be examined. By virtue of the relatively large sample sizes available and the care taken in selecting the proximity criteria, a potentially valuable by-product of these efforts would be highly reliable environmental climatologies of two very important kinds of severe weather.
The authors thank Jeff Craven for providing the sounding dataset used for this study and John Hart for calculating the sounding analysis parameters. We also thank Jonathan Davies and two anonymous reviewers for their helpful comments and suggestions. This material is based upon work supported by the National Science Foundation under Grant ATM-0097651. NSSL/NOAA also provided partial support for this work.
Corresponding author address: Corey K. Potvin, School of Meteorology, University of Oklahoma, 120 David L. Boren Blvd., Norman, OK 73072.