This study quantified the utility of radar-derived rotation track data for verifying supercell thunderstorm forecasts. The forecasts were generated with a convection-permitting model ensemble, and supercell occurrence was diagnosed via updraft helicity and low-level vertical vorticity. Forecasts of four severe convective weather events were considered. Probability fields were computed from the model data, and forecast skill was quantified using rotation track data, storm report data, and a neighborhood-based verification approach. The ability to adjust the rotation track threshold for verification purposes was shown to be an advantage of the rotation track data over the storm reports, because the reports are inherently binary observations whereas the rotation tracks are based on continuous values of Doppler velocity shear. These results encourage the further incorporation of observed rotation track data in the forecasting and verification of severe weather events.
Convection-permitting modeling has become increasingly prevalent in research and operational numerical weather prediction (NWP) applications. With horizontal grid spacings on the order of a few kilometers, NWP models, like the Weather Research and Forecasting (WRF; Skamarock et al. 2008) Model, are capable of nominally resolving cloud-scale processes without using a cumulus parameterization (CP) scheme (e.g., Done et al. 2004; Weisman et al. 2008). Explicit simulation of mesoscale and convective processes provides qualitative information about the expected convective mode, which aids forecasters in anticipating predominant severe weather hazards (Kain et al. 2008; Weisman et al. 2008; Kain et al. 2010).
Even with horizontal grid spacings of 1–4 km, convection-permitting forecasts are still too coarse to fully resolve smaller-scale convective features and severe weather phenomena such as mesocyclones and tornadoes. In an effort to overcome this limitation of grid resolution, various surrogates of these features and phenomena have been developed using grid-resolved model output fields. For example, a widely used surrogate for mesocyclones in supercell thunderstorms is a diagnostic quantity known as updraft helicity (UH; e.g., Kain et al. 2008; Carley et al. 2011; Sobash et al. 2011; Clark et al. 2012, 2013). Considering that supercell thunderstorms account for a large portion of severe weather events, especially large hail and tornadoes (e.g., Duda and Gallus 2010; Kain et al. 2008), a means of identifying simulated supercells in NWP forecasts has proven to be especially valuable.
Objective verification of supercell forecasts in convection-permitting models has two particularly challenging issues. The first relates to the relatively small scale of predictands, like convective storms, and how even small errors in predicted location and timing severely degrade forecast skill in traditional measures of grid-scale model performance such as root-mean-square error. As discussed at length in Ebert (2008), fuzzy or neighborhood verification approaches have been developed to mitigate this verification dilemma. For example, such methods have been used to objectively verify quantitative precipitation forecasts produced from convection-permitting ensembles (e.g., Schwartz et al. 2010; Clark et al. 2010), as well as to verify storm surrogate forecasts against storm reports (Schwartz et al. 2015; Sobash et al. 2016). The second issue regards the lack of optimal observations. Severe thunderstorm occurrence is documented mostly through voluntarily reported observations of tornadoes, damaging winds, and large hail. These storm reports are known to possess numerous nonmeteorological biases (e.g., Brooks et al. 2003; Doswell et al. 2005). In addition to a known dependence on population density, reporting practices create uncertainty in estimating the intensity of these phenomena (Trapp et al. 2006; Verbout et al. 2006). Also, these discontinuous point observations fail to capture the full spatial extent of the severe weather hazards. Indeed, Sobash et al. (2011) conducted neighborhood-based verification of severe storm surrogates determined from UH forecasts using storm reports but noted "significant concerns" with using these observations.
A proposed alternative to storm report observations is the rotation tracks product developed by the National Severe Storms Laboratory (NSSL), which has recently been employed in the verification of low-level vertical vorticity forecasts by Skinner et al. (2016). Formally, the rotation track data used in this study are a measure of maximum azimuthal shear in the lowest 3 km of the atmosphere (Smith and Elmore 2004). These radar-derived data have been used to identify tracks of potentially severe storms (Miller et al. 2013). While these data are subject to the shortcomings of Doppler radar sampling (e.g., Wood and Brown 1997), the rotation tracks provide a dataset appropriate for verification that is not prone to the aforementioned nonmeteorological biases found in the storm reports database. Additionally, the rotation tracks are more continuous than storm reports and are more closely related to the severe storm surrogates for rotation in the model.
The increasing use of model-predicted severe storm surrogates in research and operational forecasting practices suggests the need to identify the most appropriate methods and datasets for surrogate verification. This study aims to show that rotation track data are comparable, and in many ways advantageous, to storm reports for objectively verifying supercell surrogates as predicted via model diagnostics such as UH and maximum 0–3-km vertical vorticity. Because of the inherent predictability challenges of convective-scale NWP, a probabilistic approach to forecasting and verification is taken. An ensemble forecasting system is employed to produce retrospective forecasts of the severe weather that occurred on 19, 20, 30, and 31 May 2013. Neighborhood-based verification methods are applied to quantify and compare forecast skill when rotation track data and storm report data are used as alternate sets of verifying observations. Quantifying the skill in probabilistic supercell surrogate forecasts provides a means of assessing the practical predictability of severe weather phenomena in convection-permitting ensembles. This type of exercise is especially valuable as NWP advances toward approaches like Warn-on-Forecast (Stensrud et al. 2009).
2. Model and data
a. Ensemble design
Ensembles of explicit convection forecasts were generated using the Advanced Research version of the WRF (WRF-ARW, version 3.5.1; Skamarock et al. 2008). For each event, a five-member ensemble was produced by varying the initial conditions (ICs) and lateral boundary conditions (LBCs). ICs for the five-member ensemble were retrospectively drawn at random from the 30-member National Center for Atmospheric Research (NCAR) real-time WRF Data Assimilation Research Testbed (WRF-DART; Anderson et al. 2009) mesoscale ensemble analyses (Schwartz et al. 2015; Weisman et al. 2015) at 1200 UTC on the date of each event. Unique LBCs were generated for each ensemble member forecast using the fixed covariance perturbation method of Torn et al. (2006) centered on forecasts from the operational GFS model. The forecasts were initialized at 1200 UTC and were integrated for 18 h. The outermost computational domain covered the continental United States (CONUS) in addition to parts of Canada and Mexico with 15-km horizontal grid spacing (Fig. 1). An inner, nested domain spanned roughly the eastern two-thirds of the CONUS and had a horizontal grid spacing of 3 km. Both domains used a vertical grid with 40 levels.
Physical parameterizations included the Morrison double-moment microphysics scheme (Morrison et al. 2009), the Mellor–Yamada–Janjić (MYJ; Janjić 1994) planetary boundary layer (PBL) scheme, and the Noah land surface model (Chen and Dudhia 2001). Longwave and shortwave radiation were parameterized with the Rapid Radiative Transfer Model for GCMs (RRTMG; Mlawer et al. 1997). Cumulus convection was parameterized on the outer domain using the Tiedtke scheme (Tiedtke 1989; Zhang et al. 2011); convective processes were treated explicitly on the inner domain.
b. Supercell surrogates
To combat the challenges of spatially and temporally resolving severe weather phenomena in convection-permitting NWP, Kain et al. (2010) recommended computing relevant model diagnostics at each time step and saving the maximum value in every grid column between model output times. These resultant surrogates are superior to those from traditional hourly data because the surrogates exhibit swaths revealing the storm evolution rather than instantaneous snapshots of storm characteristics. One such supercell surrogate is UH, which as noted above is widely used to identify mesocyclone surrogates in model output. Following the definition outlined by Kain et al. (2008), UH was computed in the model as

\[ \mathrm{UH} = \int_{2\,\mathrm{km}}^{5\,\mathrm{km}} w\zeta\,dz, \]

where w is the vertical velocity (m s−1), ζ is the vertical component of vorticity (s−1), and z is the height above ground level (AGL). Because UH incorporates updraft strength, which is not available directly from radar data, maximum vertical vorticity in the 0–3-km layer of the model (hereafter Z3) was also diagnosed to provide a surrogate more analogous to the rotation track data. Low-level maximum vertical vorticity has previously been used as a diagnostic in work focused on the explicit prediction of tornadic supercells (e.g., Yussouf et al. 2013; Yussouf et al. 2015; Skinner et al. 2016). In the present work, analysis of both diagnostics was conducted using the hourly maximum fields as opposed to the instantaneous fields at model output times.
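The 2–5-km layer integral defining UH can be sketched for a single model column; the profile values and level spacing below are idealized, and in practice the diagnostic is evaluated at every model time step rather than offline:

```python
import numpy as np

def updraft_helicity(w, zeta, z, z_bot=2000.0, z_top=5000.0):
    """Trapezoidal integral of w * zeta over the z_bot-z_top AGL layer.

    w (m/s), zeta (1/s), and z (m AGL) are 1D profiles on the model's
    vertical levels; returns UH in m^2 s^-2.
    """
    mask = (z >= z_bot) & (z <= z_top)
    f, zz = w[mask] * zeta[mask], z[mask]
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(zz)))

# Idealized column: 10 m/s updraft with 0.005 1/s vorticity at every level,
# so UH = 10 * 0.005 * 3000 m = 150 m^2 s^-2
z = np.arange(0.0, 10000.0, 500.0)
uh = updraft_helicity(np.full_like(z, 10.0), np.full_like(z, 0.005), z)
```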
Locations of supercell surrogates were determined when and where UH or Z3 forecasts exceeded specified thresholds within a 900 km × 900 km verification domain located over the central Great Plains (see Fig. 1). Two thresholds were chosen for each diagnostic, with one threshold corresponding to the 99.5th percentile and the other to the more restrictive 99.9th percentile. The 99.5th and 99.9th percentile values for UH and Z3 were determined using the distribution of all values within the verification domain at each forecast hour. For each of the four cases, a 12-h verification period was considered from 1800 to 0600 UTC (i.e., forecast hours 7–18). In total, 48 hourly percentile values were computed and averaged to determine the respective thresholds for UH and Z3. These percentile values corresponded to 40 and 100 m2 s−2, respectively, for UH (i.e., UHthresh) and 0.006 and 0.008 s−1, respectively, for Z3. The UH thresholds are within the range of values that have been used for various research and forecasting applications (e.g., Kain et al. 2008; Trapp et al. 2011; Carley et al. 2011; Sobash et al. 2011; Clark et al. 2012). The Z3 thresholds are slightly lower than the nominal value often used to define mesocyclones in observation-based and idealized model simulation studies of supercells [0.01 s−1; e.g., Brandes et al. (1988); Trapp (2013)].
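The hourly-percentile averaging used to set these thresholds can be sketched as follows; the gamma-distributed fields are synthetic stand-ins for one case's 12 hourly-maximum diagnostic fields (the study averages 48 values across all four cases):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the 12 hourly-maximum diagnostic fields of one
# case on a subset of the 3-km verification grid
hourly_fields = [rng.gamma(shape=0.5, scale=5.0, size=(300, 300))
                 for _ in range(12)]

def averaged_percentile_threshold(fields, pct):
    """Percentile of each hourly distribution, averaged over all hours."""
    return float(np.mean([np.percentile(f, pct) for f in fields]))

thr_995 = averaged_percentile_threshold(hourly_fields, 99.5)
thr_999 = averaged_percentile_threshold(hourly_fields, 99.9)
```

The more restrictive 99.9th-percentile threshold is necessarily at least as large as the 99.5th-percentile one, mirroring the 40 versus 100 m2 s−2 values reported for UH.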
c. Observational data
Two sets of observational data, storm reports and rotation tracks, were employed to verify the supercell surrogate forecasts. Rotation track data were produced using the Warning Decision Support System–Integrated Information (WDSS-II; Lakshmanan et al. 2007) software package. Level II data (Crum and Alberty 1993) for the entire 12-h verification period of each case were obtained from the National Climatic Data Center (now known as NCEI) for 19 Weather Surveillance Radar-1988 Doppler (WSR-88D) stations located within the verification domain: Amarillo, Texas (KAMA); Dodge City, Kansas (KDDC); Des Moines, Iowa (KDMX); Kansas City, Missouri (KEAX); Frederick, Oklahoma (KFDR); Dallas–Fort Worth, Texas (KFWS); Goodland, Kansas (KGLD); Wichita, Kansas (KICT); Tulsa, Oklahoma (KINX); North Platte, Nebraska (KLNX); St. Louis, Missouri (KLSX); Little Rock, Arkansas (KLZK); Omaha, Nebraska (KOAX); Springfield, Missouri (KSGF); Fort Smith, Arkansas (KSRX); Oklahoma City, Oklahoma (KTLX); Topeka, Kansas (KTWX); Hastings, Nebraska (KUEX); and Vance Air Force Base, Oklahoma (KVNX). The reflectivity data were quality controlled, and radial velocities were dealiased before calculating the azimuthal shear with the linear least squares derivative (LLSD) algorithm (Smith and Elmore 2004; Newman et al. 2013; Lakshmanan et al. 2014) for all available tilts and scans for each radar. The single-radar azimuthal shear radial data were mapped onto a three-dimensional latitude–longitude–height grid, and the vertical maximum was recorded in a two-dimensional latitude–longitude grid. The single-radar vertical maximum azimuthal shear values were merged using data from all 19 radars. When merging the data, the maximum 0–3-km azimuthal shear value was again recorded if two or more radars observed this layer at the same 2D grid box.
The merged rotation tracks were accumulated to create hourly rotation tracks for the entire verification domain and time period, and the hourly data were interpolated onto the model grid for verification.
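The merging step described above reduces to an elementwise maximum across the single-radar grids. A minimal sketch with synthetic values follows; treating unobserved boxes as 0.0 is a simplification of the real product, which also tracks which radars actually sampled the 0–3-km layer at each box:

```python
import numpy as np

# Hypothetical 0-3-km vertical-maximum azimuthal shear grids (s^-1) from
# three radars on a common latitude-longitude grid; 0.0 marks boxes a
# radar did not observe (a simplification of the real coverage masks)
shape = (4, 4)
radar_grids = [np.zeros(shape) for _ in range(3)]
radar_grids[0][1, 1] = 0.010
radar_grids[1][1, 1] = 0.014   # two radars sample the same box: keep the max
radar_grids[2][2, 3] = 0.007

# Merge by taking the maximum across radars at every 2D grid box
merged = np.stack(radar_grids).max(axis=0)
```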
The azimuthal shear recorded in these rotation track data (hereafter ROT) provides observations of shearing radial velocities that were potentially attributable to mesocyclonic circulations. To further constrain the data, thresholds were applied to the azimuthal shear magnitudes to discriminate severe storms in the ROT data. Following the approach used for the model diagnostics, the 99.5th and 99.9th percentile thresholds were determined by calculating the percentile values from hourly distributions of ROT within the verification domain and averaging the 48 hourly percentile values. For ROT, the 99.5th and 99.9th percentile thresholds corresponded to 0.009 and 0.013 s−1, respectively.
Local storm reports (hereafter LSRs) were obtained from the Storm Prediction Center website (http://www.spc.noaa.gov/wcm/). Because supercell thunderstorms can produce a broad spectrum of hazards and account for roughly half of LSRs (Duda and Gallus 2010), reports of all severe hazard types (i.e., tornado, wind, and hail) were included in this study. The LSR data were mapped to the 3-km model grid using a nearest-neighbor approach. This process was conducted in hourly intervals over the 12-h verification period. Each 3-km grid box containing one LSR was treated the same as a grid box containing more than one LSR; therefore, the number of LSRs in each grid box had no influence on the verification scores. Because the LSR data essentially were treated as binary data (yes/no of a report) in each grid box, thresholds could not be applied to these data in the same way as were applied to the model diagnostics and ROT data. Therefore, the probability of spatial occurrence (i.e., fractional spatial coverage) was used to ensure that the ROT and LSR data used for verification were consistent for the appropriate supercell surrogates. The probability of spatial occurrence of LSRs was computed for each hour of the verification period for each case, and the 48 values were averaged. The average spatial coverage of gridded LSRs within the verification domain was approximately 0.01%, which corresponded to the average spatial coverage of the ROT exceedances identified with the 99.9th percentile threshold. [Note that although a 99.9th percentile would suggest that the spatial coverage of exceedances should be 0.1%, the spatial pattern of exceedances (e.g., clustering) likely caused a lower average spatial coverage.] Therefore, the grid-scale LSR data were used (in contrast with the 99.9th percentile ROT exceedances) to verify the simulated supercell surrogates determined with the 99.9th percentile thresholds (Fig. 2a). 
To verify the 99.5th percentile supercell surrogates, the spatial coverage of LSRs was artificially increased to approximate that of the ROT exceedances identified with the 99.5th percentile threshold (Fig. 2b). This process was achieved by adding an LSR to a previously empty grid box if at least one LSR was recorded within 3 km from the center of the empty grid box (thus, expanding LSRs to include “hits” in adjacent grid boxes). All grid boxes that contained LSRs in the grid-scale data remained unchanged. After applying this procedure, the average spatial coverage of the expanded LSRs within the verification domain was approximately 0.075%, which roughly corresponded to the average spatial coverage (0.082%) of the ROT exceedances identified with the 99.5th percentile threshold.
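The expansion procedure can be sketched as a one-box binary dilation. Reading "within 3 km of the center of an empty grid box" literally on a 3-km grid reaches only the four orthogonal neighbors (diagonal box centers are ~4.2 km away); that footprint is our interpretation, not a detail stated in the text:

```python
import numpy as np

def expand_lsr(grid):
    """Mark each empty box whose center lies within 3 km of a reported
    LSR; on a 3-km grid this is the 4-orthogonal-neighbor footprint
    (our reading of the expansion described in the text)."""
    out = grid.copy()
    out[1:, :] |= grid[:-1, :]   # hit from the box to the north
    out[:-1, :] |= grid[1:, :]   # hit from the box to the south
    out[:, 1:] |= grid[:, :-1]   # hit from the box to the west
    out[:, :-1] |= grid[:, 1:]   # hit from the box to the east
    return out

lsr = np.zeros((5, 5), dtype=bool)
lsr[2, 2] = True          # one report in the center box
expanded = expand_lsr(lsr)
```

A single gridded report thus expands to a cross of five boxes, increasing the areal coverage while leaving every originally occupied box unchanged.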
3. Verification approach
a. Probabilistic guidance
Ensemble forecasting systems inherently possess probabilistic information that can aid in quantifying the uncertainty in a forecast. This implicit probabilistic information can be extracted from the ensemble system by way of the ensemble probability (EP). Using the specified threshold for each supercell surrogate, the forecasts from each ensemble member were converted into a dichotomous field of binary probabilities (BPs), such that

\[ \mathrm{BP}_i^k = \begin{cases} 1 & \text{if } \mathrm{UH}_i^k \geq \mathrm{UH_{thresh}} \\ 0 & \text{otherwise} \end{cases} \]

for the kth ensemble member at the ith grid point; an analogous procedure was followed to determine the BPs based on the Z3 threshold. Using BPs, EP was then computed at the ith grid point as

\[ \mathrm{EP}_i = \frac{1}{n} \sum_{k=1}^{n} \mathrm{BP}_i^k, \]
where n is the number of ensemble members. Within the framework of this study, the EP estimates the likelihood of midlevel or low-level rotation occurring within each grid column based on the number of ensemble members that predict UH or Z3 to exceed its specified threshold.
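The two steps above amount to thresholding each member and averaging the resulting binary fields. A sketch with synthetic five-member UH forecasts (the 40 m2 s−2 value is the study's 99.5th-percentile UH threshold):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic hourly-maximum UH forecasts from a 5-member ensemble on a
# small grid (values in m^2 s^-2)
n_members, ny, nx = 5, 60, 60
uh = rng.gamma(shape=0.5, scale=20.0, size=(n_members, ny, nx))

bp = (uh >= 40.0).astype(float)  # binary probability, one field per member
ep = bp.mean(axis=0)             # ensemble probability at each grid point
```

With five members, EP at any grid point can only take the values 0, 0.2, 0.4, 0.6, 0.8, or 1.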
In an effort to account for the previously mentioned challenge of verifying convective-scale predictands, neighborhood-based probabilistic guidance was incorporated. Following Roberts and Lean (2008), a neighborhood was established for each grid box by specifying a radius of influence (ROI). The neighborhood was then defined to include all of the grid boxes whose centers fell within the ROI. Using additional information from these surrounding grid boxes, neighborhood probabilities (NPs) were computed to allow for spatial uncertainty on the high-resolution model domain. NPs were computed as

\[ \mathrm{NP}_i = \frac{1}{N_b} \sum_{m=1}^{N_b} \mathrm{BP}_m, \]

where m is the grid point within the neighborhood and Nb is the number of grid boxes within the neighborhood. The NP can be thought of as the relative frequency of events forecasted or observed to occur within a given radius of a grid box. Figure 11 in Schwartz et al. (2010) illustrated how the NP for the center grid box would be equal for the forecast and observations, but the point-to-point comparison indicates an incorrect forecast.
In Schwartz et al. (2010), the ensemble and neighborhood approaches to forecast probabilities were combined to create the neighborhood ensemble probability (NEP). In this study, the NEP was computed as

\[ \mathrm{NEP}_i = \frac{1}{N_b} \sum_{m=1}^{N_b} \mathrm{EP}_m. \]
Schwartz et al. (2010) posited that merging the neighborhood and ensemble approaches to produce forecast probabilities serves to better represent the true probability density function of the atmospheric state. Moreover, for forecasts with fine grid spacing, guidance can still be of considerable value even if the grid-scale prediction is not exact. If the mean forecast probabilities within a specified ROI are equivalent to the areal coverage of the event exceeding the threshold over the same ROI, the forecast is considered perfect. Within this framework, the NP and NEP depend on the specified ROI, but the EP is independent of this specification.
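The neighborhood averaging behind NP and NEP can be sketched with a brute-force circular-footprint mean; production code would typically use a convolution instead, and the edge handling here (excluding out-of-domain boxes) is one of several reasonable choices:

```python
import numpy as np

def neighborhood_mean(field, roi_boxes):
    """Mean of `field` over a circular neighborhood of radius `roi_boxes`
    grid boxes centered on each point (a 50-km ROI is ~17 boxes at 3-km
    grid spacing); boxes outside the domain are excluded from the mean."""
    ny, nx = field.shape
    out = np.zeros((ny, nx))
    r2 = roi_boxes * roi_boxes
    for i in range(ny):
        for j in range(nx):
            vals = [field[i + di, j + dj]
                    for di in range(-roi_boxes, roi_boxes + 1)
                    for dj in range(-roi_boxes, roi_boxes + 1)
                    if di * di + dj * dj <= r2
                    and 0 <= i + di < ny and 0 <= j + dj < nx]
            out[i, j] = np.mean(vals)
    return out

# A single grid box with EP = 1 smoothed with a 2-box ROI: the 13-box
# circular footprint spreads the probability to 1/13 at the center
ep = np.zeros((21, 21))
ep[10, 10] = 1.0
nep = neighborhood_mean(ep, 2)
```

Applying the same operator to the thresholded observation field yields the observed NPs, so forecast and observations are smoothed identically before scoring.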
b. Quantification of forecast skill
To calculate forecast skill, the fractions skill score (FSS; Roberts 2005; Roberts and Lean 2008) and the area under the relative operating characteristic curve (ROC; Mason 1982) were computed for the entire verification domain, as these performance metrics are appropriate for a probabilistic forecasting and verification approach. FSS quantifies the skill of a forecast relative to a worst-case reference forecast and is defined as

\[ \mathrm{FSS} = 1 - \frac{\mathrm{FBS}}{\mathrm{FBS_{worst}}}, \]

where the fractions Brier score (FBS; Roberts 2005) of the model forecast and the worst-case reference forecast are defined as

\[ \mathrm{FBS} = \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \left( \mathrm{FP}_{ij} - \mathrm{OP}_{ij} \right)^2 \quad \text{and} \quad \mathrm{FBS_{worst}} = \frac{1}{N_x N_y} \left[ \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \mathrm{FP}_{ij}^2 + \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \mathrm{OP}_{ij}^2 \right], \]

where Nx (Ny) is the number of grid points in the east–west (north–south) direction and FPij and OPij are the respective forecast and observation probabilities at the ijth grid point. FSS ranges from 0 to 1. A value of 1 indicates a perfect forecast, and 0 indicates a forecast with no skill.
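The FSS definition above translates directly into a few lines of array code; the tiny 2 × 2 fields are purely illustrative:

```python
import numpy as np

def fss(fp, op):
    """Fractions skill score from forecast (fp) and observed (op)
    probability fields: FSS = 1 - FBS / FBS_worst."""
    fbs = np.mean((fp - op) ** 2)
    fbs_worst = np.mean(fp ** 2) + np.mean(op ** 2)
    return 1.0 - fbs / fbs_worst if fbs_worst > 0 else np.nan

match = fss(np.array([[0.2, 0.4], [0.0, 0.6]]),
            np.array([[0.2, 0.4], [0.0, 0.6]]))     # identical fields
disjoint = fss(np.array([[1.0, 0.0], [0.0, 0.0]]),
               np.array([[0.0, 0.0], [0.0, 1.0]]))  # no overlap at all
```

Identical forecast and observed fractions give the perfect score of 1, while completely non-overlapping fields give the worst-case score of 0.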
While FSS only grades the forecasts against the observations in a holistic sense, ROC curves assess how well the forecasts discriminate between observed events and nonevents (Mason 1982). ROC curves were developed by selecting a range of probability thresholds (0%–100%) to convert the continuous probabilistic forecast and observation fields into discrete dichotomous events. At each probability threshold, the elements of the 2 × 2 contingency table for dichotomous events (Table 1; Wilks 1995) were then used to compute the probability of detection [POD = hits/(hits + misses)] and probability of false detection [POFD = false alarms/(false alarms + correct negatives)]. POD and POFD were plotted against one another to produce the ROC curves.
In this study, the area under the curve (AUC) was computed using a trapezoidal approximation to quantify the degree of discrimination shown in the forecasts. AUC ranges from 0 to 1, with an area of 1.0 representing a perfect probabilistic forecast. An AUC of 0.5 or less indicates a forecast possesses no discriminatory skill (below 0.5, POFD exceeds POD), while Buizza et al. (1999) determined that AUC must be greater than 0.7 in order to classify a forecast as useful.
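The POD/POFD construction and trapezoidal AUC can be sketched as below; the 101-threshold sweep and the handling of empty contingency-table rows are our assumptions, not details specified in the text:

```python
import numpy as np

def roc_auc(prob, event, thresholds=None):
    """Trapezoidal area under the ROC curve: POD vs. POFD evaluated at a
    sweep of probability thresholds (event is a binary observation field)."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    obs = event.astype(bool).ravel()
    p = prob.ravel()
    pods, pofds = [], []
    for t in thresholds:
        yes = p >= t
        hits, misses = np.sum(yes & obs), np.sum(~yes & obs)
        fas, cns = np.sum(yes & ~obs), np.sum(~yes & ~obs)
        pods.append(hits / (hits + misses) if hits + misses else 0.0)
        pofds.append(fas / (fas + cns) if fas + cns else 0.0)
    # Order the points left to right along the POFD axis (ties broken by
    # POD) before applying the trapezoid rule
    x, y = np.array(pofds), np.array(pods)
    order = np.lexsort((y, x))
    x, y = x[order], y[order]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Perfectly discriminating probabilities place all events above all
# nonevents, giving an AUC of 1
auc = roc_auc(np.array([0.9, 0.8, 0.1, 0.2]), np.array([1, 1, 0, 0]))
```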
As previously noted, two thresholds (99.5th and 99.9th percentiles) were applied to UH and Z3 to determine supercell surrogates, which indicate simulated storms capable of producing storm damage. This process was conducted for each forecast hour using the hourly maximum fields, and the hourly surrogates were aggregated over a 12-h period (1800–0600 UTC, i.e., forecast hours 7–18) before computing NEP. This occurrence determination and aggregation over the same 12-h period was also conducted for the ROT and LSR data. For simplicity, the NEPs computed with the 99.5th and 99.9th percentile thresholds will be referred to as NEP5s and NEP9s. In a similar fashion, NPs were computed using exceedances determined from the gridded ROTs using the 99.5th and 99.9th percentile thresholds (hereafter, ROT5s and ROT9s, respectively). NPs were also computed using the grid-scale and artificially expanded areal coverage LSRs approximating the 99.9th and 99.5th percentiles (hereafter, LSR9s and LSR5s, respectively). Following this procedure, four forecast probabilities and four observation fractions fields (see Table 2) were computed for the entire verification domain and used for verifying the four cases of interest.
4. Results

a. Low- and high-threshold results
In all four cases, FSSs computed from the UH and Z3 NEP forecasts displayed a clear dependency on ROI (Figs. 3 and 4). Beginning with a 0-km ROI, little skill was found in the NEPs, regardless of whether LSRs or ROTs were used for verification. The NEPs calculated with this 0-km ROI were essentially EPs and, thus, served as probabilistic forecast information verified at the grid scale. Grid-scale forecasts showed a dependency on the specified threshold level. The FSS for all the NEP5s was approximately 0.1, while the NEP9 FSSs averaged 0.03. For each respective forecast–observation pair, the FSS for the NEP9s never exceeded the FSS calculated from the less restrictive NEP5s. When comparing forecast–observation pairs at the 99.9th or 99.5th percentile threshold levels, FSS did not show a consistent difference based on the model surrogate or observation type.
As the ROI was increased, the FSSs increased for both the NEP9s (Fig. 3) and NEP5s (Fig. 4). This was expected because both the forecast probabilities and observed fractions fields were smoothed (see Fig. 5) as the neighborhoods grew in size (Roberts and Lean 2008). At ROIs of 50 km and greater, FSS was found to approach or exceed 0.5 for some forecast–observation probability pairs. Again, no consistent distinction was found between the FSS determined with the LSRs and that determined with the ROTs. This was true for the forecasts and observations at both the 99.5th and 99.9th percentiles.
While these FSS trends indicated an apparent degree of skill in the NEP forecasts, examination of ROC curves showed that these forecasts possessed very little discriminatory skill. For the NEP5s, the AUC was generally between 0.5 and 0.7 for all forecast–observation pairs at all ROIs (Fig. 6). The AUC was always highest for the grid-scale NEPs. In fact, the AUC computed from the NEP5s only approached the 0.7 “useful” threshold determined by Buizza et al. (1999) for the grid-scale verification of the 31 May case (Fig. 6d). The lack of discriminatory skill displayed by the NEP forecasts was likely due to a large discrepancy between the forecast and observation probability distributions at low ROIs. With a 10-km ROI, the maximum NEP5s found within the verification domain were approximately 20% for UH and Z3 (Table 3). The corresponding maximum NPs computed from the ROT5s and LSR5s ranged from approximately 62% to 92%.
As the ROI was increased above 50 km, the AUC decreased and approached a constant 0.5 for all forecast–observation probability pairs. The NEP9s showed even less discriminatory skill, with AUC rarely exceeding 0.5 for any ROI (Fig. 7). This constant 0.5 AUC was attributable to the smoothing of the forecast probabilities and observation fractions fields toward zero as the neighborhood sizes were increased with larger ROI (Table 3). In this scenario, when POD and POFD are calculated at each probability threshold, POD and POFD equal 1 for the 0% probability threshold, and both equal 0 for all other probability thresholds. This returned the constant ROC AUC of 0.5 that was found at large ROIs.
b. Mixed threshold results
In the previously discussed results, the LSR and ROT data were applied such that their respective spatial coverage percentages were roughly equal. Within the context of the cases included in this study, the reporting of severe weather phenomena was assumed to be adequate given the high impact of these events and the proximity to a populated metropolitan area. Even if this assumption was invalid, the LSR expansion increased the neighborhood where storm damage was reported to have occurred and theoretically accounted for the potential underreporting of severe weather phenomena. These hazards occur over continuous swaths rather than discrete point locations along the paths of observed storms. Rotation tracks convey storm damage potential aligned with the storm motion, and these more continuous swaths provide verification data in areas where storm reports might have been difficult to obtain as a result of a lack of infrastructure. Achieving LSR5 areal coverage meant altering the local number of LSRs in an isotropic fashion, thus introducing an additional degree of freedom into the analysis. While we do not advocate this practice, it was demonstrated herein to maintain parity between the observation datasets for verifying the 99.5th percentile surrogates. To eliminate this procedure and demonstrate the advantage of the rotation track data relative to the storm report data, mixed threshold verification was conducted using the LSR9s and the 99.5th percentile forecast surrogates and rotation track exceedances.
FSS values for the NEP5s were always higher when ROT5s were employed as opposed to the LSR9s (Fig. 8). This was especially evident in the 19 and 20 May results. Using the LSR9s as observational truth yielded much lower FSSs because the NEP5s were biased forecasts within this framework (Mittermaier and Roberts 2010). This frequency bias effect was removed when the NEP5s were verified with the ROT5s. While the disparity in the spatial coverage of the observations was intentional, these results demonstrate an advantage for the rotation tracks over storm report data. The rotation track data are a continuous distribution akin to the model diagnostics, which enables specification of thresholds to prevent the frequency bias from affecting the FSS results. This characteristic arguably adds increased adaptability to the list of advantages the rotation track data possess relative to the storm report data.
c. Results within the context of previous studies
NEP forecasts of UH were found to have comparable degrees of forecast skill when verified against ROT and LSR data. Because the ROT data were shown to have a distinct advantage over the LSR data within this framework, the results of the previous section would encourage using ROT data to verify supercell surrogate forecasts. As previously mentioned, updraft strength is included in the UH calculation but is not directly available in the radar data. In addition, UH diagnoses rotation in the midlevels between 2 and 5 km AGL, while the ROT data used herein observed rotation in the lowest 3 km of the atmosphere. Therefore, Z3 was also diagnosed and verified as a surrogate that is more analogous to the ROT data. The FSS and ROC AUC calculated for the Z3 forecasts displayed similar trends to those found for the UH forecasts. In fact, the verification metrics were not consistently higher or lower for UH or Z3 (see Figs. 3, 4, and 6–8). This finding is reasonable as both surrogates diagnose mesocyclone rotation and overlap between 2 and 3 km. Given the similarity in the UH and Z3 results and the correspondence between the Z3 and the ROT data, this study suggests that diagnosing Z3 in addition to UH is reasonable for severe convective forecasting and for forecast verification against rotation track data.
Computing NEP as stated in section 3a led to an upper limit of about 20% for the chosen supercell surrogates considered for the four cases in this study. The NEP maxima were significantly lower than NEP values shown in work such as Yussouf et al. (2015). While the current study used thresholds of 0.006 and 0.008 s−1 for 0–3-km maximum vertical vorticity, Yussouf et al. (2015) used thresholds of 0.002 and 0.004 s−1 for 0–1-km maximum vertical vorticity. In addition, Yussouf et al. (2015) was a storm-scale data assimilation study in which reflectivity and Doppler velocities were assimilated into a storm-scale ensemble every 5 min before generating 0–1-h forecasts. It is reasonable to expect that such storm-scale data assimilation would dramatically improve the location of forecast objects, which would affect forecast probabilities in a strongly positive manner. The differences in thresholds, vertical vorticity layers, and forecast integration periods could all be contributing to the disparities in maximum NEPs between these two studies.
This work aimed at determining the best way to utilize rotation track data via a probabilistic approach using percentile thresholds of select severe storm surrogates against two types of verification data. In a similar study, Skinner et al. (2016) also conducted objective verification of low-level vorticity forecasts against radar-derived rotation tracks. This study differs in several important ways. First, Skinner et al. (2016) utilized an object-based method with distance and amplitude score to quantify skill in 0–1-h forecasts. This study verified 0–18-h forecasts using a neighborhood-based probabilistic verification approach. Additionally, both updraft helicity and low-level vertical vorticity were verified, and storm reports were evaluated as verification observations in addition to rotation tracks.
5. Summary and conclusions
Ensembles of convection-permitting forecasts were produced for four severe weather events during May 2013, including high-impact events on 20 and 31 May 2013. The five ensemble members were initialized by randomly drawing 1200 UTC mesoscale analyses from the 30-member NCAR real-time WRF-DART ensemble data assimilation system (Schwartz et al. 2015). Using WRF-ARW, 18-h convection-permitting forecasts were produced on a nested domain covering the central portions of the CONUS with 3-km horizontal grid spacing. Probabilistic guidance for model-simulated rotation was produced using forecasts of UH and maximum 0–3-km vertical vorticity. These probabilistic forecasts were then verified against storm reports and radar-derived rotation tracks to investigate the practicality of using these observations to objectively verify forecasts of supercell thunderstorm occurrences. Using the 99.5th and 99.9th percentiles as thresholds to identify supercell surrogates, the levels of model skill for the UH and Z3 forecasts were comparable for all four cases of interest. These positively correlated model diagnostics appear to be interchangeable in this exercise, thus encouraging the broader use of maximum low-level vorticity, in addition to UH, as a diagnostic in severe weather forecasting across a wider range of forecast scenarios.
Finally, the selection of ROTs or LSRs for verification data did not have a consistent impact on the level of skill determined from the NEP forecasts, given consistent spatial coverage between the ROT exceedances and the gridded LSRs. It was found in the cases considered in this study that radar-derived rotation tracks were largely equivalent to storm reports when the areal coverages were equal. Yet, greater skill was evident when a lower rotation track threshold was used for verification, an option not reasonably obtainable with LSRs. Thus, the rotation track data have a distinct advantage over the storm reports in this application, as it is trivial to adjust the rotation track verification threshold. Here, specified thresholds for the ROT, UH, and Z3 distributions were modified to change the frequency of threshold exceedances for both model and verification data. We also demonstrated that matching the observation coverage to lower forecast thresholds with LSRs requires an undesirable artificial expansion of LSR areal coverage. Identifying model-predicted surrogates and radar-observed exceedances using frequencies from continuous distributions allows for the straightforward adjustment of thresholds, which can be easily modified to fit forecasting applications for varying severe weather types, intensities, and locations. Because of this ease in fitting the rotation track data and the absence of the nonmeteorological biases present in the storm report database, the rotation track data warrant increased use in developing and calibrating severe storm surrogates used to identify thunderstorms capable of producing severe weather in convection-permitting forecasts.
We thank the National Science Foundation (NSF) for support of this work (AGS-1230085) and the National Severe Storms Laboratory for supplying the WDSS-II application that was used to generate the rotation tracks. This research began while the lead author participated in the Significant Opportunities in Atmospheric Research and Science (SOARS) program. Kiel Ortega, Craig Schwartz, and Sarah Tessendorf all provided helpful assistance and discussions during SOARS and/or thereafter. The authors would also like to thank the three anonymous reviewers for their feedback.