## 1. Introduction

Fundamentally, deep convection initiation (CI) requires that a volume of air be lifted to a level where it can realize considerable positive buoyancy over a significant depth. Positive area on a thermodynamic diagram for some lifted parcel is a necessary condition; however, the effects of dilution on the buoyancy that an actual updraft is able to realize cannot be neglected (Houston and Niyogi 2007, hereafter HN07). Also, the amount of lift that is needed depends on the amount of inhibition present below the level of free convection (LFC). These ideas are captured by the three “ingredients” of Johns and Doswell (1992): instability, moisture, and lift. Consideration of the processes that govern convection suggests a slight modification to that approach in which CI is examined in the context of two pairs of factors: buoyancy and dilution, and lift and inhibition. The “moist layer of sufficient depth” (Johns and Doswell 1992) has two roles. The first is to produce a parcel with sufficient equivalent potential temperature (*θ*_{e}) to achieve positive buoyancy given the temperature profile and the assumptions of parcel theory. The second role (and the primary reason that the depth of the moisture is important) is to limit the dilution of the parcel as it ascends. Therefore, we contend that it is better to consider buoyancy and dilution as the governing factors. Furthermore, HN07 showed that a positive feedback may exist between dilution and buoyancy. Lift and inhibition are paired because the amount of inhibition determines whether a given amount of lift is sufficient to initiate a thunderstorm.

Buoyancy and inhibition are frequently assessed using parameters based on parcel theory, particularly convective available potential energy (CAPE; Moncrieff and Miller 1976) and convective inhibition (CIN). However, the collocation of significant CAPE and minimal CIN does not guarantee that deep convection will develop, even when a lifting mechanism is present (Ziegler and Rasmussen 1998, hereafter ZR98).

Vertical motion is difficult to diagnose accurately in the atmosphere, largely because observational data are sparse and flawed. It has long been known that convergence lines are favored locations for CI (Purdom 1982; Wilson and Schreiber 1986). Accordingly, low-level convergence is often used as a measure of lift in forecasting CI, and airmass boundaries are favored locations for such convergence.

The explicit exclusion of parcel dilution is a major limitation of traditional parcel theory. Dilution occurs when a rising parcel entrains environmental air with lower *θ*_{e} and lower total water content, which act to reduce the amount of buoyancy the parcel can realize. Entrainment of dry environmental air with lower *θ*_{e} will reduce the parcel *θ*_{e} through mixing and, if the parcel is saturated, evaporative cooling. The theory of criticality proposed by HN07 is an effort to include the feedback between buoyancy and parcel dilution in the CI process. In their numerical experiments, deep convection only occurred if the rate at which parcels could gain buoyancy through ascent exceeded the rate at which buoyancy was lost through dilution. The presence or absence of deep convection was found to be related to the lapse rate of the active cloud-bearing layer (ACBL), which is the layer above the LFC where “active” convection is occurring (Stull 1985). Although dilution is a cumulus-scale process that cannot be directly measured or computed from the data available, environmental parameters relevant to dilution (relative humidity, ACBL lapse rate, and vertical wind shear) can be measured, and that is the intent here.

The purpose of this work is to determine how often each of the basic factors (buoyancy, dilution, lift, and inhibition) is the difference between thunderstorms initiating and thunderstorms not initiating. Even though the reasoning outlined above applies to deep convection in general, the data in this study are generated from a subset of deep convection that produced cloud-to-ground lightning. The most accurate description of this dataset is thunderstorms that produce cloud-to-ground lightning; however, in the interest of brevity, these will be described as “thunderstorms.” This determination requires quantifying the factors at locations where CI occurred as well as other locations where CI did not occur as a point of comparison. These locations need to be related enough to make meaningful pairwise comparisons. To quantify the factors, a number of parameters will be computed from hourly Rapid Update Cycle (RUC)-2 analyses (Benjamin et al. 2004; NCDC 2011a), with the intent that they be independent of geography and/or season as much as possible. To be clear, the use of parameters is not intended as a search for an as-yet-undiscovered “magic bullet” to forecast thunderstorm initiation. Rather, the relative importance of a parameter is used to indicate the importance of the factor it is measuring. Since multiple parameters may be used to measure the same basic factor, some insight can be gained on the most effective ways of quantifying each factor.

## 2. Description of parameters

Although an essentially infinite range of parameters could be computed from RUC-2 data, the parameters chosen for this study are intended to represent physical processes occurring in the environment that would affect the development of convection. The parameters to be computed from the RUC-2 analysis data are shown in Table 1. The descriptions and justifications for the parameters used in this work follow.

Table 1. Parameters to be computed from RUC-2 analysis data, listed by basic factors the parameters are designed to quantify.

### a. CIN

One of the most commonly used metrics to forecast initiation is CIN, as it quantifies how much lift must be provided for a parcel to reach its LFC. There are three main “parcels” that are commonly used to compute CIN (and CAPE)—the surface parcel, the mixed-layer parcel, and the most unstable parcel. For this work, all three will be used with the purpose of comparing the outputs to evaluate the assumptions made about the properties of the parcels responsible for initiating thunderstorms. The surface-based method assumes the parcels that initiate convection mainly originate near the surface, which is clearly inadequate for cases where convection is elevated above a low-level stable layer. The most unstable parcel method assumes that the parcels with the highest *θ*_{e} are most relevant. In many cases, the surface-based and most unstable methods are equivalent since the surface parcel has the highest *θ*_{e}. The mixed-layer method uses a parcel with the mean mixing ratio and potential temperature of the lowest 100 hPa or lowest 1 km of the atmosphere (here the lowest 100-hPa layer is used). This method is an attempt to account for mixing within the boundary layer, and it generally yields lower values of CAPE and larger values of CIN than the other methods. Parcel ascent is treated as a pseudoadiabatic process and the virtual temperature correction (Doswell and Rasmussen 1994) is used in all parcel-based computations. Significantly less CIN should be expected in cases with storms, although there should also be null cases with minimal CIN.

### b. Maximum omega *Ω* and height of maximum omega H_{Ω}

In the past, estimates of vertical motion were not available at the spatial and temporal resolution needed for forecasting thunderstorm initiation, so vertical motion was assumed or inferred from other fields. With the advent of the RUC and other similar models, estimates of vertical motion are available at 20-km horizontal grid spacing and hourly resolution. While this horizontal grid spacing is unable to resolve meso-*γ*-scale updrafts that directly initiate thunderstorms, it is worth testing the ability of this parameter to discriminate environments that do or do not initiate thunderstorms. Specifically, both the magnitude of maximum upward motion no higher than 100 hPa above the LFC (denoted Ω) and the height of that maximum value (denoted *H*_{Ω}) will be obtained from the RUC-2 data, as both the strength and depth of lift may be relevant to initiation. Both of these parameters should be higher where initiation occurred. Care was taken to avoid representing the signal of the RUC convective parameterization in these quantities, primarily by selecting points that were away from preexisting storms (see sections 3b and 3c).

### c. H_{LFC}

As described by ZR98, *H*_{LFC} is the ratio of the height of maximum upward motion (*H*_{Ω}, assumed to represent the top of the mesoscale updraft) to the height of the LFC. The layer for finding Ω and *H*_{Ω} (parcel level to 100 hPa above the LFC) is chosen to ensure that the identified maximum updraft is within the region relevant for CI while still allowing *H*_{LFC} values to be significantly above 1 if the updraft extends above the LFC. The ratio *R*_{LFC} defined by Ziegler et al. (2007) mitigates an additional limitation of parcel theory by taking into account the ratio of the horizontal wind and updraft width scales and is therefore a stronger CI condition; nevertheless, *H*_{LFC} values greater than 1 favor CI. Note that *R*_{LFC} is not employed here because the data available are not sufficient to determine the updraft width scale.
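The search for Ω and *H*_{Ω} and the resulting ratio can be sketched as follows. This is a minimal illustration, not the code used in this study: the function names, the profile layout (parallel lists of pressure, height, and omega), and the choice to measure both heights from the same reference level are assumptions. Because omega is negative for upward motion, the maximum upward motion is the most negative value in the layer.

```python
def max_upward_motion(pressure_hpa, height_m, omega_pa_s, p_parcel_hpa, p_lfc_hpa):
    """Return (omega_max, h_omega) within the layer from the parcel level
    up to 100 hPa above the LFC. Hypothetical helper for illustration."""
    p_top_hpa = p_lfc_hpa - 100.0  # 100 hPa above the LFC means lower pressure
    best = None
    for p, z, w in zip(pressure_hpa, height_m, omega_pa_s):
        if p_top_hpa <= p <= p_parcel_hpa:
            # Upward motion is negative omega, so keep the most negative value.
            if best is None or w < best[0]:
                best = (w, z)
    if best is None:
        raise ValueError("no model levels inside the search layer")
    return best


def h_lfc_ratio(h_omega_m, z_lfc_m):
    """Ratio of the height of maximum upward motion to the LFC height.
    Both heights are assumed to share the same reference level."""
    return h_omega_m / z_lfc_m
```

Values of the ratio above 1 then indicate that the strongest resolved ascent lies above the LFC.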

### d. Convergence

Many studies (e.g., Wilson et al. 1992; Xue and Martin 2006) have related CI to areas of enhanced convergence. The depth of convergence has also been shown to be important (Wilson et al. 1992; ZR98; Ziegler et al. 2007); thus, the convergence over a “deep” layer (such as parcel level to LFC) may be more useful than surface convergence alone. The computation of both surface and 0-LFC mean convergence will allow this hypothesis to be tested. The 0-LFC mean convergence is the integral of convergence from the parcel level to the LFC for the most unstable parcel, divided by the distance between the two levels. Although moisture flux convergence is frequently used in the forecasting of severe storms, Banacos and Schultz (2005) suggest that simple mass convergence provides essentially the same information and is more physically sound. However, as noted by Doswell and Schultz (2006), divergence can be a rather noisy and volatile field.
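As an illustration of the layer-mean definition above, the sketch below integrates a convergence profile between two heights with the trapezoidal rule and divides by the layer depth. The function names, the use of height (rather than pressure) as the vertical coordinate, and the linear interpolation to the layer boundaries are assumptions made for this example only.

```python
def interp(z, heights, values):
    """Linear interpolation of values at height z; heights must be strictly ascending."""
    if not heights[0] <= z <= heights[-1]:
        raise ValueError("height outside profile")
    for i in range(len(heights) - 1):
        z0, z1 = heights[i], heights[i + 1]
        if z0 <= z <= z1:
            frac = (z - z0) / (z1 - z0)
            return values[i] + frac * (values[i + 1] - values[i])


def layer_mean(heights, values, z_bottom, z_top):
    """Integral of values from z_bottom to z_top divided by the layer depth."""
    if z_top <= z_bottom:
        raise ValueError("layer top must be above layer bottom")
    # Layer boundaries plus every profile level strictly inside the layer.
    zs = [z_bottom] + [z for z in heights if z_bottom < z < z_top] + [z_top]
    vs = [interp(z, heights, values) for z in zs]
    integral = sum(
        0.5 * (vs[i] + vs[i + 1]) * (zs[i + 1] - zs[i]) for i in range(len(zs) - 1)
    )
    return integral / (z_top - z_bottom)
```

Passing the parcel height as `z_bottom` and the LFC height as `z_top` would give a quantity analogous to the 0-LFC mean convergence.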

### e. Subcloud wind shear

RKW theory (named for Rotunno, Klemp, and Weisman; Rotunno et al. 1988) states that when the horizontal vorticity associated with the cold pool is equal and opposite the environmental vorticity, strong vertical updrafts are created along the gust front. Since we are interested in first initiation, there should be no “cold pools,” although there could be airmass boundaries, which have horizontal vorticity associated with them. The application of RKW theory to CI was lent some credence by Lee et al. (1991), who evaluated a case of thunderstorm initiation along colliding boundaries and found that the removal of low-level vertical shear in a model simulation diminished the convection. Here, low-level shear is defined as shear between the parcel level and the LCL to represent subcloud shear and to be able to account for elevated parcels.

### f. Δz*

First introduced by HN07, Δ*z** is defined as LFC height minus initial parcel height. This is useful since it is related to the depth of lift needed to initiate a thunderstorm instead of just the strength of the lift. Smaller values of *Δz** should be found in cases with storms.

### g. CAPE

The existence of positive CAPE is a necessary, but not sufficient, condition for thunderstorms to occur. In theory, more CAPE would produce a stronger updraft given that the parcel is able to reach the LFC. Surface-based, mixed-layer, and most unstable parcels will be used to compute CAPE.

### h. ACBL lapse rate

This parameter was shown to be important in the success or failure of CI by HN07 (here the ACBL is defined as a 1.5-km-deep layer starting at the LFC). As concluded by HN07, larger lapse rates increase the vertical displacement of parcels caused by an airmass boundary due to reduced static stability. Also, steeper lapse rates above the LFC allow parcels ascending through the layer to gain buoyancy more rapidly. Parcels for which the gain of buoyancy through ascent exceeds the loss of buoyancy through entrainment are termed “supercritical” by HN07.

### i. LCL to LCL+2 km CAPE (LCLCAPE)

This parameter has been developed specifically for this work and is defined as the sum of the CAPE and CIN in a 2-km layer based at the LCL. This layer was chosen to represent the region immediately above the cloud base where the feedback between buoyancy and dilution is most important while being adaptable to deep boundary layers and elevated convection. As the sum of CAPE and CIN, this quantity can be either positive or negative. The purpose of this metric is to quantify how quickly a parcel can gain buoyancy. Larger values of this parameter should be found in environments that support CI since parcels are more able to overcome the negative effects of dilution.

### j. Surface to top of ACBL mixing ratio difference (MRD)

This is another parameter developed specifically for this work. It is designed to represent the cumulative potential entrainment a rising parcel might experience as it ascends to a level where it is significantly buoyant. By integrating the difference between parcel mixing ratio and environmental mixing ratio from the parcel level to the top of the ACBL, the overall dryness of the environment during the critical early stages of convective cloud development can be characterized. Ziegler et al. (1997) showed that in mesoscale updrafts along a dryline where thunderstorms develop, the change in mixing ratio from the surface to the LFC is minimal, in contrast to nearby areas where they do not develop. This deepening of the moist layer is a result of persistent convergence and upward motion. It is hypothesized that if rising parcels must pass through deep dry layers before significant buoyancy is achieved, then the likelihood of thunderstorm initiation will be reduced.

### k. ACBL wind shear

While vertical wind shear is known to help in storm organization and severity, a number of studies have suggested that vertical shear above the boundary layer has a negative effect on storm initiation. Weisman and Klemp (1982) and Lee et al. (1991) showed that increased vertical shear tended to decrease the maximum updraft speed of the convection and delay its onset. Possible mechanisms by which increased shear above the LFC can inhibit convection are increased entrainment and the advection of developing clouds away from the boundary layer updraft (ZR98; Peckham and Wicker 2000). It is hypothesized that there will be less wind shear in the ACBL in the cases of deep convection. The notion of wind shear having opposite effects depending on the layer of the shear (i.e., subcloud compared to surface to ACBL) is consistent with the results of Lee et al. (1991).

## 3. Methodology

### a. Radar-based thunderstorm identification

Accurately identifying locations of thunderstorm initiation requires first identifying and tracking individual thunderstorms. For a large spatial domain covered by multiple radars, this is best done by combining the radar data into a common grid and identifying and tracking thunderstorms within that grid. Level-II radar data were downloaded from the National Climatic Data Center (NCDC) archive (NCDC 2011b) for 2005–07 for 44 radars covering the Great Plains (Fig. 1). The Thunderstorm Observation by Radar (ThOR; Lahowetz et al. 2010) algorithm was used to identify thunderstorm tracks from these data. ThOR consists of the following key steps: 1) remove nonmeteorological echoes using a neural network quality control algorithm [the w2qcnn algorithm of Lakshmanan et al. (2007a)]; 2) merge the data from individual radars into a common three-dimensional grid [the w2merger algorithm of Lakshmanan et al. (2006)]; 3) attenuate stratiform precipitation using fuzzy logic; 4) identify candidate thunderstorms through image segmentation of radar reflectivity to form reflectivity clusters [the w2segmotionll algorithm described by Lakshmanan et al. (2009)]; 5) track these clusters over time; and 6) associate lightning to clusters along the tracks to classify tracks as thunderstorms. The w2qcnn, w2merger, and w2segmotionll algorithms are included in the Warning Decision Support Services–Integrated Information (WDSS-II; Lakshmanan et al. 2007b) package.

The horizontal extent of the grid used by w2merger is shown by the black box in Fig. 1. The grid spacing was 0.014° latitude × 0.011° longitude, approximately 1 km × 1 km. The w2segmotionll algorithm operated on the composite reflectivity fields (the maximum reflectivity within each vertical column) that were generated by w2merger at 5-min granularity and subsequently modified by stratiform filtering. The reflectivity clusters identified by the w2segmotionll algorithm are constrained to have a composite reflectivity value between 30 and 70 dB*Z* and a minimum area of 50 km^{2}.

The algorithm to create tracks from the reflectivity clusters starts by identifying a cluster centroid that has not been placed on a track. The 0–6-km mean wind from the North American Regional Reanalysis (NARR; Mesinger et al. 2006; NCDC 2010) is used as the initial motion estimate for the first 10 min of each candidate track. After 30 min, the motion estimate is derived from the position history of the track; between 10 and 30 min the motion estimate is the weighted average of the NARR and position history estimates. Examining the observed clusters at subsequent times, the tracking algorithm creates all unique candidate tracks that begin at the given cluster (Fig. 2). The candidate track with the lowest mean error (defined as the difference between actual position and projected position) over the duration of that candidate track is chosen as the correct track, provided that the candidate track contains at least two clusters (Fig. 2). Tracking has been verified against both human tracks (used to represent “best practices” in tracking) and tracks produced by a benchmark tracking algorithm.^{1} ThOR tracks matched the human tracks reasonably well, and outperformed the benchmark algorithm.
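The transition between the two motion estimates can be sketched as below. The text specifies only that a weighted average is used between 10 and 30 min; the linear ramp of the weight with track age, like the function and argument names, is an assumption made for illustration.

```python
def blended_motion(track_age_min, narr_uv, history_uv):
    """Return a (u, v) motion estimate for a candidate track of the given age.

    narr_uv: 0-6-km mean wind from the reanalysis, as (u, v).
    history_uv: motion derived from the track's position history, as (u, v).
    """
    if track_age_min <= 10.0:
        weight_history = 0.0  # reanalysis wind only
    elif track_age_min >= 30.0:
        weight_history = 1.0  # position history only
    else:
        # Assumed linear ramp; the source states only "weighted average".
        weight_history = (track_age_min - 10.0) / 20.0
    w = weight_history
    return tuple((1.0 - w) * n + w * h for n, h in zip(narr_uv, history_uv))
```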

Cloud-to-ground lightning data containing strike location, polarity, and multiplicity at one-minute granularity obtained from the National Lightning Detection Network (NLDN) are used to classify tracks as thunderstorms. Only those tracks that have at least one cluster at the same time and location as a strike are counted as thunderstorms (cluster positions and shapes are interpolated to account for the lightning data being at 1-min granularity and the clusters being at 5-min granularity). The use of only cloud-to-ground lightning data will omit some legitimate thunderstorms from the dataset.

### b. Identification of initiation points

From the final thunderstorm tracks output by ThOR, the times and locations of CI can be determined. Because an interval of time exists between the initiation of significant deep ascent and the existence of a radar echo large and intense enough to be identified as a cluster, initiation points used for this work are identified by extrapolating the storm tracks backward 15 min from the start of each track (Fig. 3a). These candidate initiation points are then checked to see if they are within a threshold distance, Δit, of established storms at the time of initiation (see Table 2 for the distance thresholds used for this work and Fig. 3a for a schematic of this procedure). “Established” storms are defined as thunderstorms that are at least 15 min old (30 min old when considering the backward extrapolation described above). Candidate initiation points beyond the threshold distance from established storms are retained. If a candidate initiation point is within the threshold distance, it is considered connected to the ongoing convection. As a result, the entire track is considered established, and the initiation point is no longer considered. The primary interest of this study is the first initiation within an area, rather than initiation of new convective cells within an area where convection was already present, such as a preexisting multicell system. The main reason for excluding initiations near existing storms is that established storms modify the environment at temporal and spatial scales that are not well resolved by the available data (20-km RUC-2).
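A minimal sketch of the backward extrapolation and the proximity test follows. It is not the ThOR implementation: the function names are hypothetical, the extrapolation uses a local flat-earth approximation with the track's initial motion vector, and distances are great-circle distances on a spherical earth.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extrapolate_back(lat, lon, u_ms, v_ms, minutes=15.0):
    """Move a track start point backward along its motion vector (u east, v north)."""
    dx_km = -u_ms * minutes * 60.0 / 1000.0
    dy_km = -v_ms * minutes * 60.0 / 1000.0
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    new_lat = lat + dy_km / km_per_deg
    new_lon = lon + dx_km / (km_per_deg * math.cos(math.radians(lat)))
    return new_lat, new_lon


def is_first_initiation(point, established_storms, threshold_km):
    """True if the candidate lies beyond threshold_km from every established storm."""
    return all(
        haversine_km(point[0], point[1], s[0], s[1]) > threshold_km
        for s in established_storms
    )
```

Here `threshold_km` plays the role of Δit; its value is not implied by this example.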

Table 2. Description of the distance thresholds and the values used in this study.

### c. Selection of points for parameter collection

#### 1) Initiation points

The initiation points remaining after the steps described in section 3b are grouped into hourly bins centered at the nominal RUC-2 analysis times, with the center of the bin defined as *t*_{0}. Within each bin, initiation points within threshold Δii (Table 2, Fig. 3b) of each other are clustered into a single representative point. The method for determining which points should be grouped together is an adaptation of connected component analysis from graph theory (two points are considered “adjacent” if they are within Δii of each other). The mean center of each group is defined as the mean latitude and mean longitude of all candidate initiation points in the group. The candidate initiation point nearest to this mean center is cataloged as the representative for this group (Fig. 3b). Candidate initiation points beyond Δii from all other initiation points (“isolated” initiations) are also cataloged.
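The grouping step can be illustrated with a short connected-component search. This is a sketch under stated assumptions rather than the code used here: the function names are hypothetical, points are (latitude, longitude) tuples, adjacency is taken as a great-circle distance no greater than Δii, and the representative is the member nearest the mean latitude and longitude.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def group_points(points, delta_ii_km):
    """Connected components of the graph in which two points are adjacent
    when they lie within delta_ii_km of each other. Returns lists of indices."""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if haversine_km(*points[i], *points[j]) <= delta_ii_km]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
            members.extend(near)
        groups.append(sorted(members))
    return sorted(groups)


def representative(points, members):
    """Index of the group member nearest the group's mean latitude/longitude."""
    mean_lat = sum(points[i][0] for i in members) / len(members)
    mean_lon = sum(points[i][1] for i in members) / len(members)
    return min(members,
               key=lambda i: haversine_km(points[i][0], points[i][1], mean_lat, mean_lon))
```

Note that two points farther apart than Δii still fall in the same group when a chain of intermediate points links them; single-member groups correspond to the “isolated” initiations.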

The motivation for this spatial grouping is to avoid biasing the final results by having many samples from the same location and time. This study is interested in whether a given environment produces deep convection, so whether one storm or five occur in that environment should not matter, and the sampling approach should reflect that. Similar reasoning was used by Thompson et al. (2003), who used time and space separation thresholds for their supercell climatology to avoid biasing their results to single events with a large number of supercells.

#### 2) Null points

To be most useful, the null cases chosen in this study need to represent an environment that is close to initiating convection and is perhaps missing just one ingredient. The strategy adopted in this study for selecting points to represent the null case environments takes points that are a threshold distance Δni from the cataloged initiation points (Fig. 3c). To ensure that null points are actually away from areas of convection, only the candidate null points beyond a threshold distance Δnt from all thunderstorm locations within the hourly bin (Fig. 3c) and Δni from all candidate initiation points within a 3-h bin centered at *t*_{0} (Fig. 3d) are cataloged. In Fig. 3, these criteria eliminate six of the candidate null points.

The value of Δni (the distance between an initiation point and null points) should be small enough that null points represent environments that are close to initiating deep convection and large enough that the initiation point and the null point do not use the same grid point. As a result of the 20-km horizontal grid spacing and the method of selecting the model grid point used to compute the parameters, Δni should be at least 60 km to ensure that the initiation point and the null point do not use the same grid point.

For isolated initiation points, candidate null points are identified at a distance Δni from the initiation point in the eight cardinal and intercardinal directions. For grouped initiation points, the shape of the group is approximated by a rectangle (Figs. 3b,c). The length of the rectangle is equal to the maximum distance between candidate initiation points in the group. The width of the rectangle is twice the maximum distance from a candidate initiation point to the line connecting the maximally separated points. The actual latitude and longitude differences between those maximally separated points give the “rotation” of the rectangle. Candidate null points are then identified Δni from the corners of the rectangle and Δni from the midpoints of its sides (Fig. 3c). As the aspect ratio (length/width) of the rectangle becomes larger, the diagonal search directions compress toward the long axis of the rectangle. This is desirable since a linear pattern of initiation points is likely indicative of a linear initiation mechanism (e.g., an airmass boundary), and the most useful null points are likely those along this linear feature.
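For the isolated case, the placement of candidate null points and the screening against thunderstorm locations can be sketched as follows. The function names are hypothetical, a spherical earth is assumed, and the rectangle construction for grouped points is not attempted here.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def destination(lat, lon, bearing_deg, distance_km):
    """Point reached from (lat, lon) along an initial bearing (degrees clockwise from north)."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)


def candidate_null_points(lat, lon, delta_ni_km):
    """Eight candidates at delta_ni_km: N, NE, E, SE, S, SW, W, NW."""
    return [destination(lat, lon, bearing, delta_ni_km) for bearing in range(0, 360, 45)]


def far_from_storms(candidate, storm_points, delta_nt_km):
    """True if the candidate is beyond delta_nt_km from every thunderstorm location."""
    return all(haversine_km(candidate[0], candidate[1], s[0], s[1]) > delta_nt_km
               for s in storm_points)
```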

This approach for selecting null points will likely reduce separation in distributions of parameter values between the two categories since the null environments are similar to the initiation environments. However, it should allow for the isolation of the “missing ingredients.” This approach also allows pairwise differences to be used to compare storm and no-storm cases, which is a way to eliminate event-to-event variability in the convective environments. This technique will be discussed further in section 4c.

### d. Attribution of parameter values to cataloged points

The atmospheric parameters described in section 2 will be calculated from hourly RUC-2 analyses. The RUC-2 has a horizontal grid spacing of 20 km and produces new analyses and forecasts every hour. As described by Benjamin et al. (2004), analysis fields are developed using the 1-h forecast from the previous run as the first guess. This first guess is then adjusted based on observations ingested from a variety of sources [for more details on the RUC-2 data assimilation methods, see Benjamin et al. (2004)]. This approach to developing analysis fields (best described as a “warm start” because precipitation features are not explicitly assimilated) means that vertical motion has been spun up at the analysis time. Thompson et al. (2003) showed that the RUC-2 proximity soundings are similar enough to observed soundings (temperature errors <0.5 K, mixing ratio errors <0.2 g kg^{−1}) to provide adequate representation of the near-convective environment. Although Coniglio (2012) showed that such errors could be reduced by incorporating surface observations into RUC fields, we concluded that the net effect of such a procedure on the results of this work was not sufficient to justify the significant complexity involved.

The RUC grid point nearest the cataloged initiation or null point with positive most unstable CAPE (MUCAPE) will be used as the data source for that initiation or null point (Fig. 3d). The reason for this criterion is that positive MUCAPE is a necessary condition for deep convection to occur, and if the RUC grid point does not satisfy this condition then it is not a representative profile for an initiation point. This criterion is used for null points since the values of the other parameters are trivial if the necessary condition is not met. If none of the four bounding grid points for a cataloged initiation or null point has positive MUCAPE, then that initiation or null point will not be used further (applies to one initiation point and one null point in Fig. 3d).
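The selection rule can be sketched as follows, assuming (for illustration only) that the four bounding grid points are supplied as dictionaries holding map coordinates in km and MUCAPE in J kg^{−1}; the function name and data layout are hypothetical.

```python
def select_grid_point(point_xy, bounding_points):
    """Return the nearest bounding grid point with positive MUCAPE, or None.

    point_xy: (x, y) of the cataloged initiation or null point, in km.
    bounding_points: iterable of dicts with keys 'x', 'y' (km) and 'mucape' (J/kg).
    """
    eligible = [g for g in bounding_points if g["mucape"] > 0.0]
    if not eligible:
        return None  # the cataloged point is not used further
    px, py = point_xy
    # Squared distance is sufficient for ranking by proximity.
    return min(eligible, key=lambda g: (g["x"] - px) ** 2 + (g["y"] - py) ** 2)
```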

## 4. Results

### a. Analysis of bulk statistics

Box-and-whisker plots are used to analyze the parameter values for initiation and null points. For all box-and-whisker plots shown hereafter, the box represents the middle 50% of the data, the black line is the median, and the whiskers extend to the maximum/minimum data value within 1.5 times the interquartile range. Outliers beyond this range are not plotted. Parameter values for null points at Δni values of 60, 120, and 180 km are shown along with the parameter values for initiation points. The dataset contains 55 103 initiation points that are retained after the procedures described in section 3, and 324 000 to 352 000 null points (depending on Δni). This means that each initiation point can be paired with an average of 6–7 null points at each range, out of the 8 that were originally considered.

The plots of parameter values are shown in Figs. 4 and 5. It is clear that there is significant overlap between initiation and null distributions at all three Δni values. This indicates that the differences in individual pairs are small and that robust thresholds that distinguish initiation from noninitiation for all cases do not exist. This is likely due to the inability of the RUC model to resolve the narrow, deep, and intense meso-*γ*-scale updrafts responsible for many CI events (its 20-km grid spacing gives a minimum resolvable wavelength of 40 km). Because the dataset includes both surface-based and elevated convection as well as a wide range of climate zones, the spread in background environments also contributes to the overlap. This result suggests that more sophisticated pairwise analysis is required to extract the most useful information from these data (section 4b).

Another characteristic of most variables is that the separation in medians increases as Δni increases, suggesting that the favorable CI environment can be more skillfully identified at a precision of 180 or 120 km than 60 km. This is expected, as the scale of features resolvable by a model with 20-km horizontal grid spacing is *O*(100 km).

To quantify the discriminatory ability of a particular parameter, the absolute value of the difference in medians divided by the interquartile range of the initiation point values is calculated. This quantity will hereafter be referred to as the “separation.” Applying this technique to the data (Table 3) shows that the parameters with the most discriminatory ability are Ω and convergence (both surface and 0-LFC). Shear values seem to make very little difference, as all four boxes are essentially identical for both 0-LCL and ACBL shear (Fig. 5).
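The separation statistic can be written compactly as below. The quartile convention is not stated in the text; the "inclusive" method of Python's `statistics.quantiles` is used here as an assumption, and the function name is hypothetical.

```python
import statistics


def separation(init_values, null_values):
    """|median(init) - median(null)| divided by the interquartile range of init."""
    q1, _, q3 = statistics.quantiles(init_values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range of initiation values is zero")
    median_gap = abs(statistics.median(init_values) - statistics.median(null_values))
    return median_gap / iqr
```

Because the difference in medians is scaled by the spread of the initiation values, the statistic is dimensionless and can be compared across parameters with different units.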

Table 3. Separation values for parameters using Δni = 120 km. Boldface values indicate the three largest separation values for each category (all, surface-based, elevated).

### b. Pairwise differences

^{−1} is much more significant when the values are −5 and −15 than when they are −90 and −100) and accounts for the possibility of one or both parameters being negative. For this transformation the normalized difference for initiation points is defined as

$$I = \frac{\mathrm{init} - \mathrm{null}}{\max\left(|\mathrm{init}|, |\mathrm{null}|\right)}$$

and for null points it is defined as

$$N = \frac{\mathrm{null} - \mathrm{init}}{\max\left(|\mathrm{init}|, |\mathrm{null}|\right)},$$

where init and null are the parameter values at initiation and null points, respectively. Even though *N* = −*I*, both are included in Figs. 6 and 7 to better illustrate the overlap in the distributions. Variables that are negative by convention (e.g., CIN and Ω) will have positive values of *I* when initiation values are less negative (smaller in magnitude) than null values. Variables that are always of one sign will have values of *I* and *N* that are always between −1 and 1. For variables that have meaningful values of either sign, *I* and *N* are bounded by −2 and 2. The above formulation is vulnerable to both init and null being zero. In these situations the normalized difference is set to zero.
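A small sketch of the normalized difference follows. It assumes that the difference is divided by the larger of the two magnitudes; this form is an assumption adopted here because it reproduces the stated bounds (±1 for single-signed variables, ±2 for variables of either sign) and the stated zero-division case. The function name is hypothetical.

```python
def normalized_difference(init, null):
    """Normalized difference I for one initiation/null pair; N is simply -I.

    Assumed form: (init - null) / max(|init|, |null|), set to zero when
    both values are zero.
    """
    scale = max(abs(init), abs(null))
    if scale == 0.0:
        return 0.0
    return (init - null) / scale
```

Under this assumption, a CIN pair of −5 and −15 gives *I* = 10/15, whereas a pair of −90 and −100 gives *I* = 10/100, so the same raw difference carries more weight when the values themselves are small.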

The box-and-whisker plots of *I* and *N* for each parameter are shown in Figs. 6 and 7. Like the bulk distributions, the discriminatory ability of each parameter increases with greater separation between initiation and null points. Also, the best-performing parameters still appear to be Ω, Δ*z**, convergence, mixed-layer CAPE (MLCAPE), and mixed-layer CIN (MLCIN). One attribute of the convergence distributions (Fig. 7) worth noting is that even though the medians are separated by a considerable amount, the area of overlap between the boxes is quite large. The histogram of *I* values for 0-LFC convergence at Δni = 120 km shows that the distribution is bimodal (Fig. 8). This bimodality is common to both convergence parameters at all Δni, which implies that convergence values at initiation and null points are not very well correlated and that the convergence field is noisy rather than smoothly varying. This enhances the possibility of getting unrepresentative raw or normalized difference values for convergence if the RUC places a convergent boundary incorrectly, even if the error is fairly slight.

The ultimate goal of this work is to provide insight into which of the relevant factors for convection is most often the one missing from cases of initiation failure. To see how many of the initiation/null pairs could be correctly identified by various combinations of parameters, the frequencies of normalized differences that are “significant” for each parameter are cataloged. Differences are deemed to be significant if they are outside the overlapping part of the boxes on the box-and-whisker plots. Mathematically, the significance threshold *T* is determined from *q*_{3,init} and *q*_{3,null}, the upper quartiles of the initiation and null distributions, respectively (Fig. 9). If the value of *I* for a particular pair is between −*T* and *T* (e.g., I1 in Fig. 9), then the value is considered to be insignificant. If the median *I* for all initiation/null pairs is positive, then the parameter value for a particular pair is considered significantly good if *I* > *T* (e.g., I3 in Fig. 9) or significantly bad if *I* < −*T* (e.g., I2 in Fig. 9). If the median *I* for all pairs is negative, then the definitions of “significantly good” and “significantly bad” are reversed. Given the above definition of *T*, 25% of the pairs will have significantly bad normalized differences, but parameters with better overall separation in the distributions will have a greater number of significantly good differences.
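A minimal sketch of this classification follows. It assumes that *T* is the smaller of the two upper quartiles, which is one reading of the “overlapping part of the boxes” description given that *N* = −*I*. The quartile method (the default of `statistics.quantiles`) and all function names are likewise assumptions rather than the procedure used in the study.

```python
import statistics


def upper_quartile(values):
    # statistics.quantiles with n=4 returns [q1, median, q3].
    return statistics.quantiles(values, n=4)[2]


def significance_threshold(i_values):
    """Assumed threshold T = min(q3 of I, q3 of N), with N = -I."""
    n_values = [-v for v in i_values]
    return min(upper_quartile(i_values), upper_quartile(n_values))


def classify(i_value, threshold, median_i):
    """Label one pair as 'good', 'bad', or 'insignificant'."""
    if -threshold <= i_value <= threshold:
        return "insignificant"
    above = i_value > threshold
    # If the median I is negative, the meaning of the two tails is reversed.
    if median_i >= 0:
        return "good" if above else "bad"
    return "bad" if above else "good"


# Hypothetical I values for one parameter across eight initiation/null pairs.
sample_i = [-0.4, -0.1, 0.1, 0.2, 0.3, 0.5, 0.6, 0.8]
t = significance_threshold(sample_i)
labels = [classify(v, t, statistics.median(sample_i)) for v in sample_i]
```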

A ranking of the parameters according to the percentage of significantly good differences using the entire dataset appears in Table 4. It is likely that considerable interdependences exist within groups of parameters (e.g., the three CAPE parameters, the three CIN parameters, Ω and convergence). To account for this, a procedure is implemented in which the significance of a parameter is evaluated while controlling for other significant parameters. The procedure begins by considering only those pairs with an insignificant difference for Ω, the most significant parameter according to Table 4. Among these pairs, the most significant parameter is identified as before using the percentage of pairs with significant differences. Next, only those pairs with an insignificant difference for this parameter and an insignificant difference for Ω are considered and the most significant parameter among these pairs is identified. This process is executed recursively until no more significant pairs are left. When a parameter is controlled for in this manner, other parameters strongly linked with it should be largely controlled for as well.
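The controlling procedure can be sketched as a loop over a shrinking set of pairs. The data layout (one list of booleans per parameter, true where a pair has a significant difference), the alphabetical tie-break, and the function name are assumptions made for illustration. The loop is an iterative equivalent of the recursion described above.

```python
def rank_independent(significant):
    """Order parameters by significance while controlling for earlier picks.

    `significant` maps a parameter name to a list of booleans, one per
    initiation/null pair, that is True where the pair's normalized
    difference is significant for that parameter (hypothetical layout).
    """
    n_pairs = len(next(iter(significant.values())))
    remaining = set(range(n_pairs))   # pairs not yet accounted for
    candidates = set(significant)
    order = []
    while remaining and candidates:
        # Count significant pairs per parameter among the remaining pairs.
        counts = {p: sum(significant[p][k] for k in remaining)
                  for p in candidates}
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        if counts[best] == 0:
            break  # no significant pairs are left
        order.append((best, counts[best]))
        # Keep only pairs that are insignificant for the chosen parameter.
        remaining = {k for k in remaining if not significant[best][k]}
        candidates.remove(best)
    return order


# Hypothetical flags: "convergence" is significant only where "omega" is.
flags = {
    "omega":       [True, True, True, False, False, False],
    "convergence": [True, True, False, False, False, False],
    "MLCAPE":      [False, False, False, True, True, False],
}
ranking = rank_independent(flags)
```

In this toy example the convergence flags are a subset of the omega flags, so convergence drops out once omega is controlled for, which mirrors the intent that strongly linked parameters are controlled for together.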

Table 4. Ranking of parameters by percentage of significantly good differences (number of significantly good differences/all pairs) for the entire dataset at the 120-km range.

Application of this procedure (Fig. 10) revealed that the three most significant and independent parameters are Ω, Δ*z**, and MLCAPE. The effect of selecting MUCAPE rather than Ω or Δ*z** in the first or second iteration of the procedure is simply to replace MLCAPE with MUCAPE in the final set. The fourth most significant parameter is 0-LFC convergence, which is dependent on Ω. A CIN parameter does not appear until seventh on the list (MLCIN). However, it is correlated^{2} with MLCAPE (0.42) and Δ*z** (−0.42), so when those parameters are controlled for, the apparent importance of MLCIN decreases. The fact that MLCAPE and Δ*z** appear before MLCIN suggests that CIN may not be the best way to quantify the relevant processes.

It is possible to evaluate how often initiation/null pairs are characterized by a significant difference for Ω, MLCAPE, or Δ*z** and insignificant differences for the other two parameters. In other words, how often is a given parameter the key parameter for CI? As illustrated in Table 5, when only one of the three parameters has a significant difference, it is most frequently Ω, followed by MLCAPE and Δ*z**.

Table 5. Conditional probabilities of significantly good normalized differences (count of such occurrences in parentheses) for each parameter while controlling for the other two parameters.
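The conditional frequencies can be computed as sketched below, assuming each parameter is stored as a list of booleans (one per pair) that is true where the difference is significantly good. The layout, the parameter keys, and the function name are hypothetical.

```python
def key_parameter_stats(significant, params):
    """Count and conditional frequency of each parameter being the key one.

    For each parameter, only pairs in which every other parameter is
    insignificant are considered (the "controlled" pairs).
    """
    n_pairs = len(significant[params[0]])
    stats = {}
    for p in params:
        others = [q for q in params if q != p]
        controlled = [k for k in range(n_pairs)
                      if not any(significant[q][k] for q in others)]
        hits = sum(significant[p][k] for k in controlled)
        prob = hits / len(controlled) if controlled else 0.0
        stats[p] = (hits, prob)
    return stats


# Hypothetical flags for six initiation/null pairs.
flags = {
    "omega":   [True, True, False, False, True, False],
    "MLCAPE":  [False, True, True, False, False, False],
    "dz_star": [False, False, False, True, False, False],
}
stats = key_parameter_stats(flags, ["omega", "MLCAPE", "dz_star"])
```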

### c. Surface-based versus elevated convection

Some of the parameters collected are unlikely to be relevant to elevated convection [e.g., surface-based CAPE (SBCAPE), surface-based CIN (SBCIN), MLCAPE, MLCIN, and surface convergence], so it makes sense to separately examine surface-based and elevated convection. Although the conceptual definition of elevated convection as convection that does not ingest near-surface air (Glickman 2000) is easy to grasp, practically differentiating between surface-based and elevated convection is difficult and rather uncertain (Thompson et al. 2007; Corfidi et al. 2008). As a result of this uncertainty, we omit the portion of the parameter space in which MUCAPE is greater than SBCAPE and both are nonzero [such as the sounding shown in Fig. 6 of Thompson et al. (2007)] because the extent to which surface-based parcels are “contributing” to the convection cannot be determined. We define elevated storm environments as those with zero SBCAPE, and surface-based environments as those in which SBCAPE and MUCAPE are equal. This filter is applied to both initiation and null points without regard for which initiation and null points are paired. In this dataset, approximately 60% of the points were classified as surface-based, approximately 15% were classified as elevated, and the remaining 25% were indeterminate.
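The filter described above amounts to a three-way classification, sketched here. The tolerance used to decide whether two floating-point CAPE values are “equal” and the function name are assumptions. The sketch also assumes every point has positive MUCAPE, with MUCAPE no smaller than SBCAPE.

```python
def classify_environment(sbcape, mucape, tol=1e-6):
    """Return 'elevated', 'surface-based', or 'indeterminate'.

    Assumes CAPE values in J/kg with mucape >= sbcape >= 0 and mucape > 0;
    `tol` is a hypothetical tolerance for floating-point equality.
    """
    if sbcape <= tol:
        return "elevated"          # zero SBCAPE
    if abs(mucape - sbcape) <= tol:
        return "surface-based"     # SBCAPE and MUCAPE are equal
    return "indeterminate"         # MUCAPE > SBCAPE > 0: omitted from analysis


# Hypothetical (SBCAPE, MUCAPE) values in J/kg.
points = [(0.0, 500.0), (1200.0, 1200.0), (300.0, 900.0)]
classes = [classify_environment(sb, mu) for sb, mu in points]
```

Because the filter is applied point by point, a pair is retained in a category only if both its initiation point and its null point receive the same label.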

For surface-based cases, mixed-layer CAPE/CIN offers more discrimination than most unstable CAPE/CIN (Fig. 11, Table 3). Compared to the plots for the entire dataset (shown in gray in Fig. 11), CAPE is larger and CIN is closer to zero in the surface-based cases. It is worth noting that 53%–55% of the null points had zero most unstable CIN (MUCIN), yet failed to initiate convection.

Other parameters that appear to offer some discrimination for surface-based cases are convergence (both surface and 0-LFC), Ω, and Δ*z** (Fig. 12). Separation values for the surface-based cases (Table 3) show that mixed-layer CAPE and CIN and both convergence parameters perform better for surface-based cases than for the entire dataset. The separations for Ω and Δ*z** are approximately the same as for the whole dataset. The correlation between the two convergence variables is not particularly strong (0.31), and both are negatively correlated with Ω (approximately −0.5 for both).

By definition, surface-based and mixed-layer CAPE and CIN should have little to no discriminatory ability for elevated cases, leaving the most unstable parcel as the only useful choice for any parcel-based properties. Boundary layer convergence is not especially important for elevated cases, but MUCIN and LCLCAPE are (Fig. 13, Table 3). Note that Ω and Δ*z** are relevant for both surface-based and elevated storms, which suggests that they have a robust relationship with the initiation of thunderstorms. The fact that Ω values are similar for elevated cases despite reduced mean convergence and reduced correlation between convergence and Ω (−0.17 for elevated cases compared to −0.5 for surface-based cases) suggests that the circulation could have an approximately vertical structure but be rooted above the surface near the CI location, or the circulation could be rooted near the surface but displaced laterally from the location of CI, assuming a sloped structure (Banacos and Schultz 2005).

Separating the pairs identified in the previous section into surface-based and elevated categories (only retaining the pair if both points meet the criteria) yields approximately 160 000 surface-based pairs and approximately 30 000 elevated pairs. Application of the techniques described in section 4b to find significant differences produces the results shown in Tables 6 and 7. As in Table 3, these results show a noticeable difference in the set of parameters that are most useful for surface-based convection and those most useful for elevated convection. When the recursive analysis of significant differences is applied to the data, the most important independent parameters for surface-based convection are 0-LFC convergence, MLCAPE, and surface convergence. For elevated convection, these parameters are LCLCAPE, Ω, and MUCAPE. For surface-based cases, controlling for two of the three to find which parameter is most often the key parameter identifies 0-LFC convergence as most important (conditional frequency of 34.69% at Δni = 120 km), consistent with the conclusions of Wilson et al. (1992) and ZR98. Applying the same procedure to the elevated cases identifies Ω as most important (conditional frequency of 35.10% at Δni = 120 km). These results support the conclusion that lift is most often the key factor in CI.

Table 6. Ranking of parameters by percentage of significantly good differences (number of significantly good differences/all pairs) for surface-based pairs at the 120-km range.

Table 7. Ranking of parameters by percentage of significantly good differences (number of significantly good differences/all pairs) for elevated pairs at the 120-km range (surface-based and mixed-layer CAPE and CIN not used).

## 5. Discussion

The parameters consistently identified in this work as the most robust indicators of thunderstorm initiation are Ω, Δ*z**, and CAPE. All three CAPE parameters are highly correlated, although SBCAPE is outperformed by both MLCAPE and MUCAPE. SBCAPE is of little value for the initiation of elevated thunderstorms and lacks the ability of MLCAPE to implicitly account for subcloud dilution. The value of MUCAPE compared to MLCAPE is principally a function of whether surface-based or elevated convection is anticipated (Table 3).

Relating these parameters back to the four basic factors indicates that lift is the most important single factor in determining where thunderstorms will initiate. In addition to its role as a trigger for thunderstorm initiation, upward motion also serves a role in preconditioning the atmosphere. Persistent updrafts along a boundary act to locally deepen the moist boundary layer while weakening the overlying capping inversion and lowering the LFC, making it more suitable for subsequent updrafts to reach their LFC (Ziegler et al. 1997). Since both lift and inhibition were considered in this work, it is apparent from these results that initiation failures due to “insufficient” lift are more commonly due to differences in lift rather than differences in inhibition. Also, buoyancy appears to be a more important factor than dilution, although both MLCAPE and Δ*z** have a relationship to dilution. The criterion that all points used in the analysis have positive MUCAPE already controlled for buoyancy to a limited degree.

Since deep convection is parameterized in the RUC model, the potential impact of the model’s convective parameterization scheme (CPS) on the vertical motion at initiation grid points must be considered. It is unlikely that the CPS has a significant influence on the results for several reasons. First, the vertical domain for finding Ω extends to at most 100 hPa above the LFC, so the highest midlevel updraft values should not be sampled. Also, if the CPS is triggered, omega values should increase significantly above the LFC and the largest value should occur at the upper bound of the domain used to find Ω. This should result in significant differences in *H*_{Ω} compared to null points where the CPS is not active. It should also result in *H*_{LFC} values significantly above 1. However, neither of these indicators is found in the dataset, as *H*_{Ω} shows essentially no difference between initiation and null points, and the median *H*_{LFC} value is very near 1. Additionally, comparing the data collected near initiation time to data at the same locations an hour earlier showed only a very slight increase in Ω, which is not consistent with a change in the status of the CPS. Although it is impossible to rule out the possibility of CPS involvement in a few cases, it seems reasonable to conclude that the differences in Ω are legitimate variations in the environment rather than artifacts of the CPS.

Neither subcloud shear nor ACBL shear shows any meaningful differences between initiation and null environments. However, the orientation of the shear with respect to a possible initiating boundary was not considered, and this is likely important for the shear in both layers. To account for this, one would need to identify if a boundary is present near the initiation or null point and define its local orientation. Within an automated scheme, this would likely be accomplished by identifying coherent, linear maxima in fields such as convergence and gradients of temperature or moisture. The orientation could be defined as the direction in which the gradient of the relevant field (e.g., convergence) is minimized, and then this orientation could be compared to the orientation of the shear vector.
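As a rough illustration of the comparison proposed above, the sketch below takes the boundary to run perpendicular to the local gradient of a field such as convergence, since that is the direction in which the field changes least, and returns the acute angle between the boundary and the shear vector. The function name, the inputs, and the choice to fold the angle into 0° to 90° are assumptions. Identifying the boundary itself is not addressed.

```python
import math


def boundary_shear_angle(dfdx, dfdy, shear_u, shear_v):
    """Acute angle (degrees) between a boundary and the shear vector.

    (dfdx, dfdy) is the local horizontal gradient of the chosen field;
    (shear_u, shear_v) are the shear components in the same coordinates.
    """
    # Rotate the gradient by 90 degrees to get the along-boundary direction.
    bx, by = -dfdy, dfdx
    b_mag = math.hypot(bx, by)
    s_mag = math.hypot(shear_u, shear_v)
    if b_mag == 0.0 or s_mag == 0.0:
        return None  # orientation is undefined without a gradient or shear
    # A boundary has no preferred sense, so use |cos| to fold into 0-90 deg.
    cos_angle = abs(bx * shear_u + by * shear_v) / (b_mag * s_mag)
    return math.degrees(math.acos(min(1.0, cos_angle)))


# Gradient pointing east implies a north-south boundary.
parallel = boundary_shear_angle(1.0, 0.0, 0.0, 10.0)   # shear along boundary
normal = boundary_shear_angle(1.0, 0.0, 10.0, 0.0)     # shear across boundary
```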

The goal of this study is to identify which parameters (and ultimately processes) are most important to the initiation of thunderstorms. It is beyond the scope of this work to develop a model (e.g., through multiple linear regression) for thunderstorm forecasting based on the “most important” parameters identified here. However, future work could translate the pairwise difference approach to something computable on a grid using differences from a neighborhood mean. These values could be used as “interest fields” (Mecikalski and Bedka 2006) or as input to a number of decision-making techniques, including decision trees and logistic regression. Additional possibilities for future research include comparing CI environment parameters between seasons and/or geographic regions.

## 6. Summary

The goal of this study was to determine which of the four basic factors (lift, inhibition, buoyancy, and dilution) that regulate convection is most often responsible for CI. To this end, a suite of environmental parameters was derived from 20-km RUC-2 analyses near a large sample of CI points and null (non-CI) points. Analysis of the parameter values showed that a wide range of environments are capable of initiating convection and, as a result, there are no “magic numbers” that effectively discriminate between initiation and null points. This analysis highlighted maximum omega and convergence as the most significant parameters.

Separately examining the environments associated with surface-based and elevated CI shows that maximum omega and Δ*z** are useful discriminators for both types. Convergence is useful for surface-based cases, but not for elevated cases. In surface-based cases the mixed-layer parcel provides the most useful CAPE and CIN values, but for elevated cases only the most unstable parcel is applicable.

Analysis of pairwise differences between initiation and null points shows that, in the presence of nonzero CAPE, lift is the most important single factor for thunderstorm initiation. Moreover, the maximum upward motion in the column (no higher than 100 hPa above the LFC) is the most effective way to quantify lift. Buoyancy is found to be the next most important factor (through MLCAPE) and inhibition is the third most important factor (through Δ*z**). Given the presumed importance of CIN to CI, it is somewhat surprising that Δ*z** is the most significant CI-inhibiting parameter. Even though the top four parameters do not directly relate to dilution, dilution is implicitly captured in both MLCAPE and Δ*z**: unlike MUCAPE and SBCAPE, MLCAPE attempts to include the effects of subcloud dilution, and the distance a parcel has to travel (Δ*z**) influences the amount of dilution a parcel can experience below the LFC. Given the role played by lift in both preconditioning and triggering, its dominance, as identified in this work, is not surprising. Nevertheless, this finding supports the conclusion that, ultimately, given nonzero CAPE, lift is the most important factor regulating thunderstorm initiation.

This work was funded by NSF Grant AGS-0757189 and utilized computational resources at the Holland Computing Center at the University of Nebraska. The authors also wish to thank Mark Anderson, Qi Hu, Conrad Ziegler, and two anonymous reviewers for their comments on the manuscript.

## REFERENCES

Banacos, P. C., and D. M. Schultz, 2005: The use of moisture flux convergence in forecasting convective initiation: Historical and operational perspectives. *Wea. Forecasting*, **20**, 351–366.

Benjamin, S. G., and Coauthors, 2004: An hourly assimilation–forecast cycle: The RUC. *Mon. Wea. Rev.*, **132**, 495–518.

Coniglio, M. C., 2012: Verification of RUC 0–1-h forecasts and SPC mesoscale analyses using VORTEX2 soundings. *Wea. Forecasting*, **27**, 667–683.

Corfidi, S. F., S. J. Corfidi, and D. M. Schultz, 2008: Elevated convection and castellanus: Ambiguities, significance, and questions. *Wea. Forecasting*, **23**, 1280–1303.

Doswell, C. A., III, and E. N. Rasmussen, 1994: The effect of neglecting the virtual temperature correction on CAPE calculations. *Wea. Forecasting*, **9**, 625–629.

Doswell, C. A., III, and D. M. Schultz, 2006: On the use of indices and parameters in forecasting severe storms. *Electronic J. Severe Storms Meteor.*, **1** (3). [Available online at http://www.ejssm.org/ojs/index.php/ejssm/article/viewArticle/11/12.]

Glickman, T. S., Ed., 2000: *Glossary of Meteorology.* 2nd ed. Amer. Meteor. Soc., 855 pp.

Houston, A. L., and D. Niyogi, 2007: The sensitivity of convective initiation to the lapse rate of the active cloud-bearing layer. *Mon. Wea. Rev.*, **135**, 3013–3032.

Johns, R. H., and C. Doswell, 1992: Severe local storms forecasting. *Wea. Forecasting*, **7**, 588–612.

Lahowetz, J., A. Houston, G. Limpert, A. Gibbs, and B. L. Barjenbruch, 2010: A technique for developing a U.S. climatology of thunderstorms: The ThOR algorithm. *Extended Abstracts, 25th Conf. on Severe Local Storms,* Denver, CO, Amer. Meteor. Soc., 16B.1. [Available online at https://ams.confex.com/ams/25SLS/techprogram/paper_176174.htm.]

Lakshmanan, V., T. Smith, K. Hondl, G. Stumpf, and A. Witt, 2006: A real-time, three-dimensional, rapidly updating, heterogeneous radar merger technique for reflectivity, velocity, and derived products. *Wea. Forecasting*, **21**, 802–823.

Lakshmanan, V., A. Fritz, T. Smith, K. Hondl, and G. Stumpf, 2007a: An automated technique to quality control radar reflectivity data. *J. Appl. Meteor. Climatol.*, **46**, 288–305.

Lakshmanan, V., T. Smith, G. Stumpf, and K. Hondl, 2007b: The Warning Decision Support System–Integrated Information. *Wea. Forecasting*, **22**, 596–612.

Lakshmanan, V., K. Hondl, and R. Rabin, 2009: An efficient, general-purpose technique for identifying storm cells in geospatial images. *J. Atmos. Oceanic Technol.*, **26**, 523–537.

Lee, B. D., R. D. Farley, and M. R. Hjelmfelt, 1991: A numerical case study of convection initiation along colliding convergence boundaries in northeast Colorado. *J. Atmos. Sci.*, **48**, 2350–2366.

Mecikalski, J., and K. Bedka, 2006: Forecasting convective initiation by monitoring the evolution of moving cumulus in daytime GOES imagery. *Mon. Wea. Rev.*, **134**, 49–78.

Mesinger, F., and Coauthors, 2006: North American Regional Reanalysis. *Bull. Amer. Meteor. Soc.*, **87**, 343–360.

Moncrieff, M. W., and M. J. Miller, 1976: The dynamics and simulation of tropical cumulonimbus and squall lines. *Quart. J. Roy. Meteor. Soc.*, **102**, 373–394.

NCDC, cited 2010: North American Regional Reanalysis (NARR) data for 2005–2007. [Available online at http://nomads.ncdc.noaa.gov/data.php?name=access#narr_datasets.]

NCDC, cited 2011a: Rapid Update Cycle (RUC) 20 km grid for 2005–2007. [Available online at http://ruc.noaa.gov/.]

NCDC, cited 2011b: Level-II NEXRAD data for 2005–2007. [Available online at http://has.ncdc.noaa.gov/pls/plhas/HAS.FileAppSelect?datasetname=6500.]

Peckham, S. E., and L. J. Wicker, 2000: The influence of topography and lower-tropospheric winds on dryline morphology. *Mon. Wea. Rev.*, **128**, 2165–2189.

Purdom, J. F. W., 1982: Subjective interpretations of geostationary satellite data for nowcasting. *Nowcasting,* K. Browning, Ed., Academic Press, 149–166.

Rotunno, R., J. B. Klemp, and M. L. Weisman, 1988: A theory for strong, long-lived squall lines. *J. Atmos. Sci.*, **45**, 463–485.

Stull, R. B., 1985: A fair-weather cumulus cloud classification scheme for mixed-layer studies. *J. Climate Appl. Meteor.*, **24**, 49–56.

Thompson, R. L., R. Edwards, J. Hart, K. Elmore, and P. Markowski, 2003: Close proximity soundings within supercell environments obtained from the Rapid Update Cycle. *Wea. Forecasting*, **18**, 1243–1261.

Thompson, R. L., C. M. Mead, and R. Edwards, 2007: Effective storm-relative helicity and bulk shear in supercell thunderstorm environments. *Wea. Forecasting*, **22**, 102–115.

Weisman, M. L., and J. B. Klemp, 1982: The dependence of numerically simulated convective storms on wind shear and buoyancy. *Mon. Wea. Rev.*, **110**, 504–520.

Wilson, J. W., and W. E. Schreiber, 1986: Initiation of convective storms at radar-observed boundary-layer convergence lines. *Mon. Wea. Rev.*, **114**, 2516–2536.

Wilson, J. W., G. B. Foote, N. A. Crook, J. C. Fankhauser, C. G. Wade, J. D. Tuttle, C. K. Mueller, and S. K. Kreuger, 1992: The role of boundary-layer convergence zones and horizontal rolls in the initiation of thunderstorms: A case study. *Mon. Wea. Rev.*, **120**, 1785–1815.

Xue, M., and W. J. Martin, 2006: A high-resolution modeling study of the 24 May 2002 dryline case during IHOP. Part II: Horizontal convective rolls and convective initiation. *Mon. Wea. Rev.*, **134**, 172–191.

Ziegler, C. L., and E. N. Rasmussen, 1998: The initiation of moist convection at the dryline: Forecasting issues from a case study perspective. *Wea. Forecasting*, **13**, 1106–1131.

Ziegler, C. L., T. J. Lee, and R. A. Pielke Sr., 1997: Convective initiation at the dryline: A modeling study. *Mon. Wea. Rev.*, **125**, 1001–1026.

Ziegler, C. L., E. N. Rasmussen, M. S. Buban, Y. P. Richardson, L. J. Miller, and R. M. Rabin, 2007: The “triple point” on 24 May 2002 during IHOP. Part II: Ground radar and in situ boundary layer analysis of cumulus development and convection initiation. *Mon. Wea. Rev.*, **135**, 2443–2472.

^{1} The benchmark algorithm made tracks by projecting the movement of clusters using only the NARR mean wind (rather than blending toward the observed motion) and extending the track by choosing the cluster nearest to the projected location (rather than evaluating the mean error of all possible candidate tracks).

^{2} All correlation coefficients presented are based on Spearman rank correlation.