A Geospatial Verification Method for Severe Convective Weather Warnings: Implications for Current and Future Warning Methods

Gregory J. Stumpf
Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, Colorado, and Meteorological Development Laboratory, NOAA/NWS, Office of Science and Technology Integration, Silver Spring, Maryland

and
Sarah M. Stough
Cooperative Institute for Severe and High-Impact Weather Research and Operations, University of Oklahoma, Norman, Oklahoma, and NOAA/National Severe Storms Laboratory, Norman, Oklahoma

Open access

Abstract

Legacy National Weather Service verification techniques, when applied to current static severe convective warnings, exhibit limitations, particularly in accounting for the precise spatial and temporal aspects of warnings and severe convective events. Consequently, they are not particularly well suited for application to some proposed future National Weather Service warning delivery methods considered under the Forecasting a Continuum of Environmental Threats (FACETs) initiative. These methods include threats-in-motion (TIM), wherein warning polygons move nearly continuously with convective hazards, and probabilistic hazard information (PHI), a concept that involves augmenting warnings with rapidly updating probabilistic plumes. A new geospatial verification method was developed and evaluated, by which warnings and observations are placed on equivalent grids within a common reference frame, with each grid cell being represented as a hit, miss, false alarm, or correct null for each minute. New measures are computed, including false alarm area and location-specific lead time, departure time, and false alarm time. Using the 27 April 2011 tornado event, we applied the TIM and PHI warning techniques to demonstrate the benefits of rapidly updating warning areas, showcase the application of the geospatial verification method within this novel warning framework, and highlight the impact of varying probabilistic warning thresholds on warning performance. Additionally, the geospatial verification method was tested on a storm-based warning dataset (2008–22) to derive annual, monthly, and hourly statistics.

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Gregory J. Stumpf, greg.stumpf@noaa.gov


1. Introduction

National Weather Service (NWS) severe thunderstorm warnings (SVRs) and tornado warnings (TORs) are currently issued as deterministic polygons that cover a two-dimensional area of expected threats of severe convective weather (hail, wind, and tornadoes). These warnings have a policy-directed duration of 15–60 min (NWS 2020) and a typical spatial scale of about 100 km2. An initiative known as Forecasting a Continuum of Environmental Threats (FACETs; Rothfusz et al. 2018) is evolving this paradigm and studying innovative methods to convey convective weather hazard information at the warning scale.

FACETs is focusing on two new methods to improve the communication of severe thunderstorm and tornado hazards. Planned to be implemented first if approved, threats-in-motion (TIM; Stumpf and Gerard 2021, hereinafter SG21) is a warning dissemination approach that would upgrade NWS warnings from the current static polygon approach to rapidly updating polygons that move forward with a storm. Beyond TIM is probabilistic hazard information (PHI), which, if approved, would include a trend of probability that reflects the confidence a forecaster places upon the storm’s ability to produce an anticipated hazard over its duration (Rothfusz et al. 2018). The probability trend would be used to create PHI plumes, which represent the probability that locations downstream from a storm will be impacted by the hazard. Rapidly updating warnings could be derived from PHI plumes using a combination of different static or dynamic thresholds for probability, expected arrival times, and/or impacts. PHI could inform a wide range of custom severe weather products, as well as legacy SVRs and TORs. These could be used to aid decision-making for government partners (e.g., emergency managers and broadcasters). The commercial weather enterprise could take advantage of the data to develop applications for their customers. Both TIM and PHI have been under development and testing at the NOAA Hazardous Weather Testbed (HWT) since 2008 (Stumpf et al. 2008; Kuhlman et al. 2008; Karstens et al. 2015, 2018; Hansen et al. 2018).

The current techniques for warning verification are too limited for the anticipated advancements in hazard information delivery at the warning scale (NWS 2011c, 2022). First, the current verification method is not designed for rapidly updating warnings at 1-min intervals. Second, with the potential for custom warnings derived from PHI, the verification method must be flexible enough to handle multiple warning decision thresholds. Third, severe weather events cover two-dimensional areas that change with time; they are not the point (or line) observations as which they are currently recorded. For example, a report at a single point in space and time (hereinafter “report point”) that meets or exceeds severe weather criteria (25 m s⁻¹ wind, 25-mm hail diameter, and/or tornado) is used to verify a warning, regardless of the polygon’s size and duration. This can leave a portion of a verified polygon falsely warned. Finally, there is a desire to improve warning precision (e.g., by reducing false alarm area), yet there is currently no way to measure the falsely warned areas and durations of both verified and unverified warnings. Unverified large-area and/or long-duration warnings are scored the same as small-area, short-duration warnings, with equal penalties for false alarms. This motivates replacing the current NWS warning verification technique with a new geospatial verification method, by which warnings and observations are placed on equivalent grids that better account for the temporal aspects of warnings.

This paper will first describe traditional NWS techniques for verifying severe convective weather warnings and address their limitations. Following this, we will explain our geospatial verification technique. Various datasets will be utilized to highlight the verification technique, including NWS storm-based warning data, and we will demonstrate how it complements two innovative warning techniques currently under consideration for future operational implementation within the FACETs initiative (TIM and PHI).

2. Background

The NWS transitioned from a county-based system to a storm-based warning system on 1 October 2007 (NWS 2020). This transition was motivated, in part, by advances in digital dissemination systems such as GPS-enabled mobile devices, which alert if a device is within the warning polygon. With the NWS Advanced Weather Interactive Processing System—2nd Generation (AWIPS-2) workstation Warning Generation (WarnGen) software, NWS forecasters draw warnings as polygons representing the swath that the severe weather is expected to cover within the duration of the warning. The storm-based warning concept was also designed to limit the warning area to only the area expected to be impacted by the storm threat, with the intent of making warnings smaller and more precise. Under storm-based warning verification, only one storm report is required within the polygon to verify a single storm-based warning (NWS 2011c, 2022). However, this method has limitations.

Forecasts are typically verified for performance via the use of a 2 × 2 contingency table (Table 1; Wilks 2011). A verified forecast is defined as a hit (a) when a forecast is issued and an event is observed. An unverified forecast is defined as a false alarm (b) when a forecast is issued, but an event is not observed. An observed event that is not forecasted is defined as a miss (c). A correct null (d) is a correct forecast of no event.

Table 1. The “generic” 2 × 2 contingency table for forecasts and observations.

The NWS traditionally scores warning performance using several verification measures derived from the 2 × 2 table. The first is the probability of detection (POD), the ratio of verified forecasts or hits (a) to the number of all events (a + c):
POD=a/(a+c).
Another measure is the false alarm ratio (FAR), otherwise known as probability of false alarm, which is the ratio of false forecasts (b) to all forecasts of an event (a + b):
FAR=b/(a+b).
Both POD and FAR are combined into the critical success index (CSI), which is the ratio of hits to the sum of all hits, misses, and false alarms. CSI can be expressed as a function of a, b, and c [(3a)]:
CSI=a/(a+b+c),
and through algebraic manipulation, a function of POD and FAR [(3b)] is given as follows:
CSI = f(FAR, POD) = 1/[1/(1 − FAR) + (1/POD − 1)].
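As a quick check on these relationships, Eqs. (1)–(3) can be sketched in Python; the contingency counts below are hypothetical illustrations, not data from this study.

```python
# Sketch of the 2 x 2 contingency measures in Eqs. (1)-(3).

def pod(a, c):
    """Probability of detection: hits over all observed events, Eq. (1)."""
    return a / (a + c)

def far(a, b):
    """False alarm ratio: false forecasts over all forecasts, Eq. (2)."""
    return b / (a + b)

def csi(a, b, c):
    """Critical success index, Eq. (3a)."""
    return a / (a + b + c)

def csi_from_pod_far(p, f):
    """Eq. (3b): CSI recovered from POD and FAR alone."""
    return 1.0 / (1.0 / (1.0 - f) + (1.0 / p - 1.0))

a, b, c = 40, 10, 20  # hypothetical counts of hits, falses, misses
assert abs(csi(a, b, c) - csi_from_pod_far(pod(a, c), far(a, b))) < 1e-12
```

The final assertion confirms that (3a) and (3b) are algebraically identical when a single 2 × 2 table supplies all three counts, which is the key assumption that breaks in the mixed-table NWS computation described next.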
For the NWS warning verification method, storm report points are used to verify warning areas. If a report point falls outside all warning polygons, that report point is counted as one miss. If a warning area (the polygon) contains no report points, the warning area is considered one false alarm.

Hits are counted for both report points and warning areas. If a report point is inside a warning polygon, then that report point is counted as one hit. If a warning area (the polygon) contains at least one report point, that warning area is considered a hit. Both hit values—observation hits and forecast hits—are used for NWS verification. As mentioned in a footnote in Brooks (2004), this is problematic and is explained in more detail here.

Since warnings and reports are represented using two different reference frames, two 2 × 2 contingency tables for verifying storm-based warnings must be considered: one for the warning areas or polygons (Table 2) and one for the report points (Table 3). To compute warning FAR, NWS uses the 2 × 2 table for warning areas (Table 2). The performance measures from this 2 × 2 table are given a subscript of 1:
FAR1=b1/(a1+b1).
To compute warning POD, NWS uses the 2 × 2 table for report points (Table 3). The performance measures from this 2 × 2 table are given a subscript of 2:
POD2=a2/(a2+c2).
The values for false alarm b1 and miss c2 are from two different tables. Two values for a hit are used: 1) the number of verified warning areas a1 and 2) the number of verified report points a2. Because multiple reports can be contained within a single warning, a2 ≥ a1.
Table 2. The 2 × 2 contingency table for warning polygon areas. Note that c1 and d1 do not exist in this situation.
Table 3. The 2 × 2 contingency table for storm report points. Note that b2 and d2 do not exist in this situation.
The NWS formula for CSI (hereinafter CSInws) uses FAR1 from the first 2 × 2 table (Table 2) and POD2 from the second 2 × 2 table (Table 3):
CSInws = f(FAR1, POD2) = 1/[1/(1 − FAR1) + (1/POD2 − 1)].
Substituting for FAR1 and POD2 in the above equation gives
CSInws = a1a2/(a1a2 + a2b1 + a1c2).
CSInws [(7)] is not equivalent to CSI [(3a)] unless a1 = a2, which is only possible if there is exactly one hit report point for every hit warning in the dataset to be scored.
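The effect of the mixed-table formula can be illustrated numerically; the counts below are hypothetical, chosen only to show that multiple report points per verified warning (a2 > a1) inflate CSInws relative to the single-table case.

```python
# Illustration of Eq. (7): CSI_nws mixes a1, b1 (warning-area table)
# with a2, c2 (report-point table). Counts are hypothetical.

def csi_nws(a1, a2, b1, c2):
    """Eq. (7): the mixed-table NWS CSI."""
    return (a1 * a2) / (a1 * a2 + a2 * b1 + a1 * c2)

a1, b1 = 8, 2    # verified / unverified warning areas
a2, c2 = 20, 5   # warned / unwarned report points

mixed  = csi_nws(a1, a2, b1, c2)  # a2 > a1: several reports per warning
single = csi_nws(a1, a1, b1, c2)  # a2 = a1: one hit report per hit warning
assert mixed > single  # extra reports inside the same warnings raise the score
```

Because 1/CSInws = 1 + b1/a1 + c2/a2, any growth in a2 shrinks the miss term c2/a2 and raises the score without any change in warning quality.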

The NOAA National Centers for Environmental Information (NCEI) Storm Events Database and its companion publication, Storm Data, contain the official observational data used to verify areal warning polygons (NWS 2018). However, this database rarely includes the full areal scope of the events. For instance, hail and wind damage can occur over broad areas or “swaths” within a storm but are mostly recorded as single points in space and time (Trapp et al. 2006). Similarly, tornado paths are recorded as a single point or a series of points at 1-min intervals. Because these data are not recorded as two-dimensional areas, a2 > a1 in essentially every dataset.

Tornado paths are scored using percent event warned (PEW), which is the fraction of the total number of 1-min segments warned, regardless of track length. The average PEW (PEW¯) of all tornadoes in a dataset, shown in (8),
PEW¯ = (1/N) Σ_{i=1}^{N} PEW_i,
is substituted for POD2 in (6) to give (9):
CSInwsp = f(FAR1, PEW¯) = 1/[1/(1 − FAR1) + (1/PEW¯ − 1)].
The PEW¯ metric is problematic, as N is the total number of events and not the total number of tornado segments. A short-duration tornado is given the same weight as a long-duration tornado, even if the number of warned segments in the long-duration tornado is larger than the number of warned segments in the short-duration tornado.
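The weighting problem can be made concrete with two hypothetical tornadoes: a 2-min tornado that is fully warned and a 100-min tornado that is only half warned.

```python
# Sketch of the PEW-averaging pitfall in Eq. (8). Values are hypothetical.

segments_warned = [2, 50]     # warned 1-min segments per tornado
segments_total  = [2, 100]    # total 1-min segments per tornado

# Eq. (8): each tornado weighted equally, regardless of track length.
pew_bar = sum(w / t for w, t in zip(segments_warned, segments_total)) / 2

# A segment-weighted alternative: every 1-min segment weighted equally.
pew_seg = sum(segments_warned) / sum(segments_total)

assert pew_bar == 0.75                   # (1.0 + 0.5) / 2
assert abs(pew_seg - 52 / 102) < 1e-12   # roughly 0.51
```

The event-weighted mean (0.75) paints a much rosier picture than the segment-weighted fraction (about 0.51), even though 50 of the 102 tornado minutes went unwarned.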

Because only one report point is required to verify a warning polygon area, a single report point can verify a polygon of any size or duration. As shown in Fig. 1, both polygons, the small warning and the large warning, are scored as hits. A larger-in-area and longer-in-duration warning has a greater likelihood of capturing a storm report within the warning area and time. There is also a greater chance of having multiple report points within the warning, resulting in multiple hits a2. Because the NWS formula for POD is based on report points (POD2), a forecaster issuing larger and longer warnings is potentially rewarded with a larger average POD of all their reports.

Fig. 1. Examples of two warning polygons, with the one on the right larger than the one on the left.

Citation: Weather and Forecasting 39, 5; 10.1175/WAF-D-23-0153.1

The FAR is calculated using the 2 × 2 contingency table for polygon areas (FAR1). If a warning area is not verified (no report points within the area), that warning area is counted as a single false alarm, regardless of its size and duration. There is thus no false alarm penalty for making warnings larger and longer, and as shown earlier, doing so improves the chances of a higher POD by potentially capturing more report points. Furthermore, issuing a warning with a very long duration and well before anticipated severe weather increases the chances of a larger lead time. If that warning never verifies, there is no negative consequence for the average lead time of all warned events, because a storm report is required to compute lead time. “Casting a wide net” potentially improves POD and lead time without significantly affecting FAR. This runs counter to the stated benefits of storm-based warnings: smaller warnings and greater warning precision and specificity. Even verified large, long-duration warnings can contain a tremendous amount of falsely warned area for a substantial amount of time.

All of these pitfalls are addressed with an improved warning verification methodology that can better measure precision and provide a host of other benefits, including ways to measure the goodness of several proposed warning service techniques.

3. The geospatial verification method

a. The grids

To consolidate the verification measures into one 2 × 2 contingency table and reconcile the area versus point issues, the forecast and observation data are placed into the same coordinate system sharing a common reference frame. This is facilitated by using a gridded approach, with spatial and temporal resolutions fine enough to capture the detail of the observational data. For this study, a grid spacing of 1 km2 × 1 min was used because this represents the smallest scale at which severe convective processes can be reasonably resolved in operational data.

The forecast grid is created from warning polygons covering a period matching the warning duration. The warning polygons (defined as a series of latitude/longitude pairs) are digitized so that grid cells that are more than 50% inside (outside) the polygon are assigned a value of 1 (0). The grid has a 1-min interval so that warnings appear (disappear) on the grid the exact minute they are issued (canceled/expired). Warning polygons modified via severe weather statements (SVSs) are reflected as changes in the forecast grid.
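The 50%-coverage digitization rule can be sketched as follows. This is a minimal illustration, not the operational implementation: cell coverage is estimated by subsampling each cell with a 4 × 4 lattice of test points, and coordinates are treated as kilometers on a flat grid rather than latitude/longitude.

```python
# Digitizing a warning polygon onto a 1-km grid with the 50% rule.

def point_in_polygon(x, y, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def rasterize(poly, nx, ny, sub=4):
    """Return an ny x nx grid of 0/1 cells (1 = more than 50% inside poly)."""
    grid = [[0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            hits = sum(
                point_in_polygon(i + (p + 0.5) / sub, j + (q + 0.5) / sub, poly)
                for p in range(sub) for q in range(sub))
            grid[j][i] = 1 if hits > sub * sub / 2 else 0
    return grid

# A square polygon fully covering cells (1,1)-(2,2):
grid = rasterize([(1, 1), (3, 1), (3, 3), (1, 3)], 5, 5)
assert grid[1][1] == 1 and grid[0][0] == 0
```

In practice the same rasterization would be rerun each minute as SVSs trim the polygon, so the forecast grid tracks the warning's evolving footprint.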

The observation grid can be created using either 1) ground truth information (e.g., Storm Data1), 2) ground truth data augmented by human-determined locations using radar and other data as guidance, 3) radar proxies [e.g., gridded hail size from the Multi-Radar Multi-Sensor (MRMS) system (Smith et al. 2016), useful in data-sparse areas], or 4) a combination of any of these. For official NWS warning verification, the first option is used. Line data in the case of tornado tracks (and seldom, hail swaths) are divided into 1-min increments, with each increment treated as a report point. On this grid, the observations can be depicted as points (one grid cell), lines (a “path” of grid cells), or 2D areas (for high-resolution hail and wind observations). The latter cannot be adequately represented in the observation reference frame of the traditional NWS verification system unless the 2D areas are converted into numerous point observations, which would artificially inflate the POD.

Typically, severe storm hazards, especially tornadoes, are rare and only cover an extremely small area within a warning, which leads to a low base rate of occurrence and challenges for the scoring measures used by NWS. Therefore, when converting the point data into a truth grid, we increase the base rate of occurrence by applying a sphere of influence or “splat” around the report point. The observation splat is also used to 1) account for small uncertainties in the timing and position of the report locations (Witt et al. 1998) and 2) to account for a representative “safety margin” around events, a distance which could be informed by social science survey data describing how close individuals would want to be to a hazard to warrant a warning. The splat distance is set to the grid spacing (e.g., 1 km) if no splat is desired.
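A binary splat of this kind can be sketched as follows; the grid indices and 5-km radius are illustrative, matching the splat distances used later in this paper.

```python
# "Splatting" a report point onto the 1-km observation grid: every cell
# whose center lies within the splat radius of the report cell is marked 1.

def splat(grid, i0, j0, radius_km, cell_km=1.0):
    """Mark cells within radius_km of cell (i0, j0) in place; return grid."""
    r = int(radius_km / cell_km)
    for j in range(max(0, j0 - r), min(len(grid), j0 + r + 1)):
        for i in range(max(0, i0 - r), min(len(grid[0]), i0 + r + 1)):
            if (i - i0) ** 2 + (j - j0) ** 2 <= r * r:
                grid[j][i] = 1
    return grid

obs = [[0] * 11 for _ in range(11)]
splat(obs, 5, 5, radius_km=5.0)   # one report point, 5-km splat
assert obs[5][5] == 1 and obs[5][10] == 1 and obs[0][0] == 0
```

Setting `radius_km` equal to the grid spacing (1 km) reduces the splat to the single report cell, which is the no-splat configuration used for truth-event scoring.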

An additional optional grid can be used for computing the scores. The value of correct nulls d—every grid cell outside of each warning polygon and each observation splat—is much greater than all other grid values (a, b, and c) because severe weather observations and warnings are rare events (Marzban 1998). The value of d can be reduced by excluding those cells where it is trivially obvious that a warning should not be issued, namely, grid cells that are outside of storm areas (Brooks 2004). This can be accomplished by thresholding by radar reflectivity, lightning flash extent density or any other quantity (e.g., Mazur et al. 2004) that could be used to define storm areas. The exclusion grid is made up of the grid cells outside of all warning polygons, observation splats, and storm areas (e.g., defined by a radar reflectivity threshold) combined—these nonevent grid cells are not used to calculate the performance measures. All of these grid values are shown graphically for a hypothetical storm, warning, and report point splat area in Fig. 2.

Fig. 2. Hypothetical storm, warning, and report point splat area. The correct null area (blue) roughly outlines the “storm.” The warning polygon is comprised of the false alarm area (red) and the hit area (gray). The report point splat area is comprised of the miss area (white), and it shares the hit area (gray) with the warning polygon. All grid cells outside of these areas are considered nonevents.


Given the forecast grid, the observation grid, and the optional exclusion grid, this new verification scheme provides two methods for measuring warning performance.

b. Grid-scoring method

The above grids are used in a single 2 × 2 contingency table that represents a common reference frame (Table 4). At each 1-min forecast interval, the following grid cell values are created:

  • Hit (a): The grid cell is warned AND the grid cell is within the splat range of a report point.

  • False (b): The grid cell is warned AND the grid cell is outside the splat range of any report point.

  • Miss (c): The grid cell is not warned AND the grid cell is within the splat range of a report point.

  • Correct null (d): All other grid cells OR all other grid cells not within the exclusion grid, if used.

  • Nonevent: All grid cells within the exclusion grid, if used.

Table 4. The 2 × 2 contingency table for grid cells. The colors in the parentheses correspond with those shown in Fig. 2.
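The per-minute classification above can be sketched as follows; the forecast, observation, and exclusion grids are tiny hypothetical 0/1 arrays standing in for one 1-min time step.

```python
# Classify each grid cell for one 1-min step into the Table 4 categories.

def classify(fcst, obs, excl=None):
    """Return counts of a (hit), b (false), c (miss), d (correct null),
    and nonevent cells for one time step. excl marks nonevent cells with 1."""
    counts = {"a": 0, "b": 0, "c": 0, "d": 0, "nonevent": 0}
    for j in range(len(fcst)):
        for i in range(len(fcst[0])):
            if excl is not None and excl[j][i]:
                counts["nonevent"] += 1   # outside storm areas: excluded
            elif fcst[j][i] and obs[j][i]:
                counts["a"] += 1          # warned and within splat range
            elif fcst[j][i]:
                counts["b"] += 1          # warned, no observation
            elif obs[j][i]:
                counts["c"] += 1          # observed, not warned
            else:
                counts["d"] += 1          # correct null
    return counts

fcst = [[1, 1, 0], [0, 0, 0]]
obs  = [[1, 0, 1], [0, 0, 0]]
excl = [[0, 0, 0], [0, 1, 1]]
assert classify(fcst, obs, excl) == {"a": 1, "b": 1, "c": 1,
                                     "d": 1, "nonevent": 2}
```

Summing these counts over all 1-min steps yields the single contingency table from which POD, FAR, CSI, and HSS are computed.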

The splat can also be treated as a probabilistic observation, similar to those used for “practically perfect” forecast verification (Hitchens et al. 2013). This is also valuable when employing remotely sensed data for verification where hazard observations are reasonably known to exist but actual observations are sparse. This can be achieved by applying a distance-weighted scheme (e.g., Cressman 1959) from the center to the edge, where values of a and c range from 1 to 0, respectively, depending on whether the observation splat grid cell is inside (a) or outside (c) a warning area. In this context, b = 1 − a and d = 1 − c.
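One way to sketch this distance weighting is with the classic Cressman (1959) weight function; the choice of that particular weight form, and the fractional bookkeeping below, are illustrative assumptions consistent with the b = 1 − a and d = 1 − c convention described above.

```python
# Distance-weighted ("probabilistic") splat around a report point.

def cressman_weight(dist, R):
    """Cressman weight: (R^2 - d^2) / (R^2 + d^2), floored at 0 beyond R."""
    if dist >= R:
        return 0.0
    return (R * R - dist * dist) / (R * R + dist * dist)

def fractional_counts(dist, R, warned):
    """Fractional (a, b, c, d) contribution of one splat cell at distance
    dist from the report point, depending on whether the cell is warned."""
    w = cressman_weight(dist, R)
    if warned:
        return w, 1.0 - w, 0.0, 0.0   # a ranges 1 -> 0; b = 1 - a
    return 0.0, 0.0, w, 1.0 - w       # c ranges 1 -> 0; d = 1 - c

assert cressman_weight(0.0, 5.0) == 1.0   # full weight at the report point
assert cressman_weight(5.0, 5.0) == 0.0   # zero weight at the splat edge
```

Accumulating these fractional counts instead of integer ones softens the penalty for near-misses at the splat fringe, in the spirit of "practically perfect" verification.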

Performance measures from this single 2 × 2 table for each grid cell, at each time step, are computed. These include POD, FAR, and CSI, and measures that include d, such as the Heidke skill score (HSS), defined as (10),
HSS = 2(ad − bc)/[(a + b)(b + d) + (a + c)(c + d)],
and used later in this paper. Alternatively, other metrics mentioned in Marzban (1998) or a measure based on a cost–loss ratio with economic or societal benefits may be more suitable as an overall measure of performance.

c. Truth-event-scoring method

Additional scoring methodologies are needed to address the issue of lead time (LT) at specific grid cell locations. One must consider grid cells that are downstream of an observation event and eventually get impacted by that event. To determine LT, “truth events” are constructed for each grid cell. To build truth events, the timeline of the grid cell score at specific grid cell locations is analyzed. As storm events and warnings pass over specific grid cells, the score conditions for those grid cells will vary among nonevent, correct null, false, miss, and hit. A truth event is defined as a continuous time period during which a specific grid cell is under a warning and/or a storm observation, bounded on each side by at least 1 min of a correct null condition (or a nonevent condition if an exclusion grid is used). The following types of truth events are recorded:

  • False event: The grid cell remains in a “false condition” throughout the event (forecast grid value = 1; observation grid value = 0).

  • Miss event: The grid cell remains in a “miss condition” throughout the event (forecast grid value = 0; observation grid value = 1).

  • Hit event: The grid cell experiences a “hit condition” for at least 1 min during the event (forecast and observation grid value = 1).

  • Correct null event: The grid cell remains in a “correct null condition” throughout the event (forecast and observation grid value = 0).

Hit events can comprise several different scenarios. The most common scenario is as follows: 1) a warning is issued for a particular grid cell, 2) a hazard observation impacts that grid cell after warning issuance, 3) the hazard observation leaves the grid cell while the warning remains in effect, and 4) the warning ends for that grid cell. For these scenarios, the grid cells will be in a false condition both before and after the hazard passes over that location to establish the hit condition. For the truth-event method, the periods of false condition for grid cells that experience a hit condition do not contribute to the false alarm area or time but instead are used to compute lead time and departure time for those grid cells deemed hit events.

For this common scenario, the truth event is defined by the starting and ending time of the warning. Since the warning is issued before the observation affecting the grid cell, LT > 0. If, in a second scenario, an observation affects a grid cell before a warning, LT < 0. This differs from the NWS verification method, which assigns LT = 0 whenever the LT is negative (NWS 2011c, 2022). If, in a third scenario, an observation affects a grid cell that is never warned (a miss event), then no lead time—not zero (0) lead time—is recorded. This also differs from the current NWS verification method, which records LT = 0 for missed events. In essence, missed events (either whole or partial) are treated the same as warned events that were warned “just in time” under the NWS directives, which is undesirable (Erickson and Brooks 2006; Brotzge and Erickson 2009).

In addition, grid cells that are behind an event that has already passed that location are used to determine a new metric, departure time (DT). DT is the amount of time that a grid cell remains warned after the threat has already passed. DT > 0, as with the first scenario, is chosen to represent the condition when the warning remains in effect after the threat has passed. DT < 0 is chosen to represent the condition when the warning ends too early. When the warning is canceled or expires at the same time the threat ends, DT = 0. Ideally, DT should be minimized but remain ≥0. However, achieving a DT = 0 goal is likely impractical due to imprecision in the timing and location of observed hazards, as well as potential changes in hazard speed between warning updates. Allowing the warning to persist for a few extra minutes may be desirable to account for these factors.

A third metric, false alarm time (FAT), is also analyzed. This is the total time that a false event remains warned. These events remain in a false condition throughout their lifetime.

For each truth event, the following quantities are calculated:
LT = t_obsBegins − t_warnBegins,
DT = t_warnEnds − t_obsEnds, and
FAT = t_warnEnds − t_warnBegins,
where t_warnBegins is the time that the warning begins, t_warnEnds is the time that the warning ends, t_obsBegins is the time that the observed event begins, and t_obsEnds is the time that the observed event ends. LT and DT are only calculated for hit events. FAT is only calculated for false events.
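These timing metrics can be sketched for a single grid cell's truth event; the per-minute warned/observed flag lists below are hypothetical, and the event is assumed to have already been isolated between correct null periods.

```python
# LT, DT, and FAT (Eqs. 11-13) for one grid cell's truth event, given
# per-minute warned and observed 0/1 flags spanning the event.

def event_times(warned, observed):
    """Return ('hit', LT, DT), ('false', FAT), or ('miss', None)."""
    warn_on = [t for t, w in enumerate(warned) if w]
    obs_on  = [t for t, o in enumerate(observed) if o]
    if obs_on and warn_on:
        lt = obs_on[0] - warn_on[0]     # Eq. (11): obs onset minus warn onset
        dt = warn_on[-1] - obs_on[-1]   # Eq. (12): warn end minus obs end
        return ("hit", lt, dt)
    if warn_on and not obs_on:
        fat = warn_on[-1] + 1 - warn_on[0]  # Eq. (13): warned duration
        return ("false", fat)
    return ("miss", None)               # no lead time recorded, not LT = 0

# Warning in effect minutes 0-9; hazard over the cell minutes 4-6:
assert event_times([1] * 10, [0] * 4 + [1] * 3 + [0] * 3) == ("hit", 4, 3)
```

Note that a miss event deliberately returns no lead time at all, mirroring the departure from the NWS convention of recording LT = 0 for missed events.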

Hit events, miss events, false events, and correct null events can also be used within a 2 × 2 table to compute POD, FAR, CSI, HSS, and other measures based on the truth events.

The number of grid cells (1 km2) that are characterized as false events (regardless of their FAT) is used to calculate an additional metric, the false alarm area (FAA). Both FAT and FAA can be used to quantify warning precision.

4. Data

a. Official NWS storm-based warnings (1 January 2008–31 December 2022)

SVRs and TORs, along with their associated SVSs for this period, were used to create two sets of 1 km2 × 1 min forecast grids representing each warning type. Tornado, hail, and wind reports for this period were used to create the 1 km2 × 1 min observation grids. Each report point was splatted to a 5-km radius for grid scoring, a value we considered sensible to account for the reasons mentioned in section 3a. Each report point used a 1-km radius (i.e., no splat) for truth-event scoring so that LT and DT accurately depicted the actual onset and offset of the event and not locations displaced ahead or behind the event. One set of grids containing just the tornado observations was used to verify TORs. Another set of grids containing the tornado, wind, and hail observations was used to verify SVRs.

b. 27–28 April 2011 superoutbreak

1) Storm datasets

The verification technique was tested on the data collected during the 27–28 April 2011 superoutbreak of tornadoes (NWS 2011a). Two datasets were used, and both were scored using TORs as forecasts and tornado reports as observations.

The first dataset contained a single long-tracked tornadic supercell that affected Tuscaloosa (TCL) and Birmingham (BHM), Alabama, during the afternoon and evening of 27 April 2011 (hereinafter the “TCL storm”). This storm traversed the entire state of Alabama from southwest to northeast and produced two long-tracked violent tornadoes within Alabama (Fig. 3, top). The total duration of all tornadoes (187 min) comprised 71% of the total warned duration of all mesocyclones (262 min). The domain area for this first dataset comprised the Birmingham, Alabama, NWS Weather Forecast Office (BMX WFO) county warning area (CWA).

Fig. 3. Tornadic mesocyclone paths during the 27–28 Apr 2011 outbreak. The tornadic (nontornadic) portion of the path used for the analysis is red (blue). (top) Path for the TCL storm affecting the BMX CWA (thick gray). Radar reflectivity images are overlain. Times are annotated. (bottom) Paths for the afternoon–evening–overnight portion of the outbreak affecting the four CWAs (JAN, BMX, HUN, and FFC) in the analysis domain (thick gray). The inset shows the tornado warning polygons between 1200 UTC 27 and 1200 UTC 28 Apr 2011 (courtesy V. Gensini).


The second dataset comprised every tornadic and nontornadic supercell from the outbreak during the afternoon and overnight period from 1830 UTC 27 April 2011 to 0900 UTC 28 April 2011 (Fig. 3, bottom). The total duration of all tornadoes (1685 min) comprised 54% of the total warned duration of all mesocyclones (3129 min). The domain area comprised the CWAs from these four WFOs: BMX; Jackson, Mississippi (JAN); Huntsville, Alabama (HUN); and Peachtree City, Georgia (FFC).

2) Observations

For this dataset, precise tornado observation information was desired along entire tracks. The tornado observation dataset from NCEI does not have the requisite level of precision and only includes the start and end locations/times for each county segment. Therefore, the observation grid was created by manually determining the centroid locations of each tornado using radar data and damage survey information at the radar data update rate, roughly 5-min intervals, and then interpolated at 1-min intervals. As with the previous set of observations, each report point was splatted to a 5-km radius for grid scoring and a 1-km radius for truth-event scoring. While this technique is quite laborious and not intended as the standard approach for most historical datasets, our verification system is well suited for future, more precise datasets built using technologies such as the Damage Assessment Toolkit (Camp et al. 2017), high-resolution satellite data, uncrewed aircraft systems, and crowd-sourced severe weather report data (Elmore et al. 2014).

An exclusion grid for determining correct null information was created using a time-matched, median-filtered composite reflectivity (CREF) field from MRMS (Smith et al. 2016). Grid cells where CREF < 30 dBZ that fell outside both the observation splats and the forecast grids were excluded from processing.
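As a sketch (hypothetical names; the MRMS processing is more involved), the exclusion logic reduces to a Boolean mask:

```python
import numpy as np

def exclusion_mask(cref, obs_splat, fcst):
    """Cells excluded from processing: composite reflectivity below
    30 dBZ, outside the observation splats, and outside the forecast
    (warning) grid."""
    return (cref < 30.0) & ~obs_splat & ~fcst

cref = np.array([[10.0, 45.0], [25.0, 35.0]])    # dBZ
obs  = np.array([[False, False], [True, False]])  # observation splats
fcst = np.array([[False, True], [False, False]])  # warning grid
excl = exclusion_mask(cref, obs, fcst)            # only cell (0, 0) excluded
```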

3) Forecasts

For the 27–28 April 2011 data, three types of forecast grids were used to compare warning performance.

(i) Official NWS warnings

For the first dataset, the BMX WFO issued six TORs covering the TCL storm over 4 h and 22 min (2038–0100 UTC) within Alabama. Each warning was modified during its valid period using SVSs, which remove warning areas behind the threat. For the second dataset, the JAN, BMX, HUN, and FFC WFOs issued more than 200 TORs (Fig. 3, inset) throughout the dataset period (1830–0900 UTC).

(ii) TIM

With the proposed TIM paradigm, a warning polygon would be attached to the threat and move along with it (SG21). This could provide more equitable LT for all locations downstream of the event, which is beneficial when warning for storms expected to last longer than the average warning duration (30–60 min), such as long-tracked supercells producing significant tornadoes.

To create TIM polygons, the procedure described by SG21 was followed. The process began with creating mesocyclone tracks by determining the centroid locations of each mesocyclone from radar data at roughly 5-min intervals and then interpolating to 1-min intervals. To match the warning decision times of the actual events, only those portions of the mesocyclone paths between the start time of the first TOR and the end time of the final TOR were used. A motion vector was calculated for each centroid position as a time-weighted average of the past positions, with higher weight given to more recent positions.
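A minimal sketch of the motion-vector step follows (the exact SG21 weighting is not reproduced here; linear weights are an assumption for illustration):

```python
import numpy as np

def motion_vector(track, nback=5):
    """Time-weighted average motion (km/min) from the most recent
    positions along a 1-min centroid track. Linear weights, heaviest
    on the latest displacement -- the SG21 weighting may differ.

    track: sequence of (x_km, y_km) positions at 1-min intervals.
    """
    track = np.asarray(track, dtype=float)
    steps = np.diff(track[-(nback + 1):], axis=0)  # per-minute displacements
    w = np.arange(1, len(steps) + 1, dtype=float)  # older -> newer
    return (steps * w[:, None]).sum(axis=0) / w.sum()

# A storm accelerating eastward: recent minutes weigh more.
track = [(0, 0), (1, 0), (2, 0), (4, 0), (6, 0)]
u, v = motion_vector(track)
```

For this accelerating track, the weighted estimate (1.7 km/min) leans toward the recent, faster motion, whereas a plain average of the displacements would give 1.5 km/min.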

TIM warning polygons were created at each warning decision time (either a new warning or a warning updated with an SVS) using a “default” warning polygon that is similar to the one created by the AWIPS-2 WarnGen application (Fig. 4). This default polygon uses the mesocyclone centroid as its starting position. Projecting the starting position using the storm motion vector and duration gives the ending position. A 20-km box is drawn around the starting threat position. A 30-km box is drawn around the projected ending position and is larger to account for storm motion uncertainty. The far corners of each box are connected to create a trapezoidal polygon. Between the warning decision times, the polygon drifted forward at 1-min intervals with slight expansion as described by SG21. A fixed duration of 45 min was used, which is the approximate average duration of the TORs for the outbreak.
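The geometry can be sketched as follows (a simplified stand-in for the WarnGen shape: the 20- and 30-km widths are applied as the back and front edges of the trapezoid, and the drift-and-expand update between decision times is omitted):

```python
import numpy as np

def default_tim_polygon(start, motion, duration_min,
                        w_start=20.0, w_end=30.0):
    """Trapezoidal warning polygon in the spirit of the default
    WarnGen shape: a 20-km-wide back edge at the threat and a
    30-km-wide front edge at the projected ending position.

    start:  (x, y) threat centroid, km.
    motion: (u, v) storm motion, km/min.
    Returns four (x, y) corners, back edge first.
    """
    start = np.asarray(start, dtype=float)
    motion = np.asarray(motion, dtype=float)
    end = start + motion * duration_min          # projected ending position
    speed = np.hypot(*motion)
    perp = np.array([-motion[1], motion[0]]) / speed  # unit normal to motion
    return np.array([start - perp * w_start / 2,
                     start + perp * w_start / 2,
                     end + perp * w_end / 2,
                     end - perp * w_end / 2])

# Eastward mover at 1 km/min with a 45-min warning duration:
poly = default_tim_polygon((0.0, 0.0), (1.0, 0.0), 45.0)
```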

Fig. 4.

The default TIM warning polygon that was used in the analysis.


(iii) PHI

With the proposed PHI paradigm, a probability (p) plume would be attached to the threat and move forward along with it. The Thunderstorm Environment Strike Probability Algorithm (THESPA; Dance et al. 2010) was used to create probabilistic plumes for the mesocyclone locations in the 27–28 April 2011 dataset (Figs. 5 and 6). THESPA plumes are based on a statistical strike probability model developed from a database of storm detection centroids; the model estimates the probability that a storm will strike a given location within a specified time. Because the probabilities evolve with time, it makes little sense to reissue probabilistic plumes at long, regular rewarning intervals (e.g., every 30 min); doing so would produce dramatic changes in probabilities at downstream locations with each update. Thus, the strike probability plumes were updated every minute in the same fashion as TIM. Circular areas centered on the hazard with a 5-km radius were used to generate 180-min duration plumes. This duration was chosen because the p = 0.25 plume contour is similar in length and area to a typical traditional NWS tornado warning, representing a probability threshold at which a warning forecaster might decide to issue a warning.

Fig. 5.

Probabilistic plumes derived from THESPA for (a) a fast-moving storm and (b) a slow-moving storm. From Dance et al. (2010).


Fig. 6.

The THESPA-derived probabilistic grid at 2230 UTC during the 27–28 Apr 2011 outbreak.


5. Results using the geospatial verification method

a. NWS warnings

1) Official NWS storm-based warnings (2008–22)

Annual statistics were calculated, showing the variation in warning performance for 15 years of storm-based NWS warnings (Fig. 7). For TOR, LT remained steady for the first 4 years (2008–11), declined for 3 years (2012–14), and then remained steady over the remainder of the period; DT and FAT also decreased post-2011 (Fig. 7a). POD peaked in 2011, declined for 3 years (2012–14), and then slowly increased for the rest of the period (Fig. 7b). The number of warnings and the average number of grid cells per TOR showed a dramatic difference between the periods 2008–11 and 2012–22 (Fig. 7c). Not only did the number of TORs decrease, but the average size of TORs decreased as well. Beginning in 2012, the normalized number of false grid cells per warning (or FAA) decreased with time, while the number of hit grid cells remained mostly constant (Fig. 7d). These statistics are consistent with Brooks and Correia (2018), who concluded that an emphasis on reducing FAR led to fewer warnings and that a 2012 change in default warning durations led to smaller warnings.

Fig. 7.

Annual statistics from 2008 to 2022: (a) Average lead time (blue), average departure time (red), and average false alarm time (purple) using the truth-event-scoring method for TOR; (b) POD1 using the grid-scoring method for TOR; (c) the number of warning products (red; TOR and SVS) and the average number of 1-km2 grid cells per TOR (blue); (d) the normalized number of false (blue) and hit (red) grid cells for TOR. (e)–(h) As in (a)–(d), but for SVR. For SVR scoring in (e) and (f), hail, wind, and tornadoes are used as observations. Note that the ordinates in (c) and (g) have different scales [(g) is 4 times larger than (c)].


By comparison, for SVR, the values of LT, DT, FAT, and POD remained mostly flat throughout the period (Figs. 7e,f). The number of grid cells per SVR nearly doubled starting in 2012, becoming approximately 4 times larger than for TOR through 2016, and then decreased again thereafter (Fig. 7g). This is possibly due to changes in the average durations for SVR (the default is 45 min vs 30 min for TOR), and perhaps larger SVRs were being issued for squall lines between 2012 and 2016. The normalized numbers of false and hit grid cells per warning were nearly identical throughout the entire period (Fig. 7h). Apparently, the strategy to reduce FAR for TORs was not applied to SVRs. A deeper dive into the causes of these trends is beyond the scope of this paper.

2) The TCL storm

All six of the NWS warnings verified in the traditional sense: each contained a report of a tornado, and both tornadoes were warned. There were no misses and no false alarms. Thus, POD2 = 1.0, FAR1 = 0.0, and CSInws = 1.0. For both tornadoes, 184 of the 186 1-min tornado segments were warned, giving a PEW of 0.99 and CSInwsp = 0.99 (there was a 2-min unwarned gap southwest of Tuscaloosa). The average LT for all 1-min segments was 22.1 min. These numbers are considered very respectable and are well above the NWS goals defined by the mandate of the Government Performance and Results Act (GPRA; Ralph et al. 2013).

The grid-scoring method gave POD = 0.9885, FAR = 0.9594, and CSI = 0.0406. The POD was comparable to the traditional value of PEW. However, FAR was very large, and thus CSI was very small compared to the traditional values. Accumulated over the 1-min intervals of the 4 h and 22 min of warning for this event, roughly 96% of the grid cells within the warning polygons were not within 5 km of the tornado. Because the FAR here is an accumulated measure of the amount of area (by the number of grid cells) and time (by the number of 1-min intervals) that is falsely warned, this number is similar to FAT.
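These scores come straight from the accumulated 2 × 2 contingency counts. The sketch below uses hypothetical cell counts chosen only so that the rounded scores match the TCL values above; they are not the actual grid totals:

```python
def grid_scores(hits, misses, falses):
    """POD, FAR, and CSI from contingency counts accumulated over all
    grid cells and 1-min intervals (correct nulls are not needed for
    these three scores)."""
    pod = hits / (hits + misses)
    far = falses / (hits + falses)
    csi = hits / (hits + misses + falses)
    return pod, far, csi

# Hypothetical counts, not the actual TCL grid totals:
pod, far, csi = grid_scores(hits=9885, misses=115, falses=233588)
```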

What is an acceptable value of FAA? Given the uncertainties of weather forecasting/warning and the limitations of remote sensors (radar) in detecting tornadic circulations, the warnings should not be expected to match the paths of the hazards perfectly. The grid-scoring method can be used to determine whether warnings are too large and too long (i.e., casting a wide net). One way to analyze this is to vary the size of the splat radius of influence around the tornado observations. Figure 8 depicts the range of POD, FAR, CSI, and HSS (which uses d in Table 4) for splat sizes ranging from 1 to 100 km at 1-km intervals. CSI and HSS are maximized for a splat radius of 20 km, which is expected because this is the width of the back edge of the default warning polygons for this dataset (Fig. 4). Varying the warning width gives similar results (not shown). Therefore, if the aim is to provide warnings to all locations within a prescribed safety margin around events, the warning widths and the choice of splat radius should be similar. This balance is important, as warnings that are too large (small) will lead to higher FAA (more missed events).
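The radius sweep amounts to rebuilding the observation splats at each radius and recomputing the contingency table; the HSS itself, the only score here that uses the correct nulls d, is (Wilks 2011):

```python
def hss(a, b, c, d):
    """Heidke skill score for hits a, false alarms b, misses c, and
    correct nulls d. A perfect forecast scores 1; no skill over a
    chance forecast scores 0."""
    num = 2.0 * (a * d - b * c)
    den = (a + c) * (c + d) + (a + b) * (b + d)
    return num / den
```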

Fig. 8.

Variation of POD (blue), FAR (red), CSI (orange), and HSS (purple) as the observation splat size (km) varies.


b. Comparing NWS warnings with threats-in-motion

Threats-in-motion is hypothesized to improve average LT, reduce average DT, reduce average FAT, and improve LT equitability for all grid cells. Truth-event-scoring metrics were used to measure the differences between NWS and TIM warnings for the TCL storm and the entire 27–28 April 2011 outbreak. The reader should consult SG21 for a thorough analysis of TIM on these and other cases.

1) The TCL storm

For the TCL storm, all of the measures improved using TIM warnings (Table 5): the average LT more than doubled, the average DT was reduced to one-third of its value, and the average FAT was reduced.

Table 5.

Average lead time, average departure time, average false alarm time, mean absolute deviation of lead time, and interquartile range of lead time for all 1-min tornado segments for the central Alabama tornadic storm that affected Tuscaloosa and Birmingham (TCL–BHM storm) from 2038 to 0044 UTC 27–28 Apr 2011 and for all afternoon and overnight storms in the JAN–BMX–HUN–FFC domain for 27–28 Apr 2011. Units are minutes.

Figure 9 depicts the tornado LT timelines for each 1-min segment of the tornado track. For NWS warnings, as the storm moved through each warning, the warning LT for each segment increased by 1 min from the upstream to the downstream end of the warning. When the subsequent warning was issued, the LT for those segments of the tornado contained within the subsequent warning "reset" such that upstream (downstream) segments had a smaller (larger) LT. This "sawtooth" pattern indicates that the NWS warning LTs were not equitable: locations in the upstream portions of each new warning received much less LT than locations in the downstream portions. The timeline shows that there were several portions of the tornado paths with NWS warning LT < 10 min, including a few with LT < 0 across the unwarned gap. By comparison, with each 1-min TIM warning update as the warning persisted, the next 1-min segment of the tornado track was placed under a new warning. The tornado LTs for the TIM warnings were more equitable, meaning adjacent locations along the tornado path received roughly the same LT.
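The reset behavior can be reproduced with a toy calculation (hypothetical times; location-specific LT is simply the hazard arrival time minus the time the location first came under warning):

```python
import numpy as np

def segment_lead_times(arrival_min, warn_start_min):
    """Location-specific lead time for each 1-min tornado segment
    (negative values mean the location was warned after arrival)."""
    return np.asarray(arrival_min) - np.asarray(warn_start_min)

# A static warning issued at t=0 covers segments arriving t=5..8;
# the next warning (issued t=8) covers t=9..11. LT climbs by 1 min
# per downstream segment, then "resets": the sawtooth of Fig. 9.
lt = segment_lead_times([5, 6, 7, 8, 9, 10, 11],
                        [0, 0, 0, 0, 8, 8, 8])
```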

Fig. 9.

Timeline of 1-min tornado segment lead times (min) for the central Alabama tornadic storm that affected Tuscaloosa and Birmingham on 27–28 Apr 2011, for the portion of the storm within Alabama from 2143 to 0044 UTC. NWS warnings are in blue, and 45-min TIM warnings are in red. The red arrows indicate locations where new NWS warnings became effective for those portions of the tornado tracks. The gap indicates when there was no tornado. Times are UTC.


Figure 10 (top) illustrates the geospatial distribution of LT values (cf. Fig. 9). There were sharp discontinuities in LT at the downstream edges of NWS warnings (yellow arrows). These discontinuities were virtually eliminated with the TIM warnings.

Fig. 10.

Location-specific (top) LT, (middle) DT, and (bottom) FAT for the two tornadoes associated with the TCL storm. (left) NWS warnings and (right) 45-min TIM warnings. LT discontinuities for NWS warnings are indicated with yellow arrows at the top left. Cooler (warmer) colors indicate shorter (longer) times. Units are minutes.


Another way to look at LT equitability is to examine the distribution of LT values along the tornado paths for both NWS and TIM warnings [Fig. 11 (top)]. For TIM warnings, there were many more values of LT > 40 min. LT equitability can also be measured using both the mean absolute deviation (MAD; the average of the absolute deviations from the mean) and the interquartile range (IQR; the difference between the 75th and 25th percentiles) of the distribution. The values of MAD and IQR for the TIM warnings were much less than for NWS warnings (Table 5).
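Both spread measures are straightforward to compute from the 1-min segment lead times; the series below are hypothetical, contrasting a sawtooth NWS-like distribution with a flat TIM-like one:

```python
import numpy as np

def lt_equitability(lead_times):
    """Mean absolute deviation and interquartile range of the 1-min
    segment lead times; smaller values mean more equitable LT."""
    lt = np.asarray(lead_times, dtype=float)
    mad = np.abs(lt - lt.mean()).mean()
    iqr = np.percentile(lt, 75) - np.percentile(lt, 25)
    return mad, iqr

# A sawtooth NWS-like series vs. a flat TIM-like series:
mad_nws, iqr_nws = lt_equitability([5, 15, 25, 35, 45, 5, 15, 25, 35, 45])
mad_tim, iqr_tim = lt_equitability([24, 25, 25, 26, 25, 24, 26, 25, 25, 25])
```

Both series have the same mean LT (25 min), yet the sawtooth series has far larger MAD and IQR, which is exactly the inequity the TIM warnings remove.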

Fig. 11.

Frequency distribution histograms of values for each 1-min tornado segment for the central Alabama tornadic storm that affected Tuscaloosa and Birmingham on 27–28 Apr 2011, for the portion of the storm within Alabama from 2143 to 0044 UTC: (top left) lead time for NWS warnings, (middle left) departure time for NWS warnings, (bottom left) false alarm time for NWS warnings, (top right) lead time for 45-min TIM warnings, (middle right) departure time for 45-min TIM warnings, and (bottom right) false alarm time for 45-min TIM warnings. Units are minutes.


The average value of DT was lower for TIM warnings (Table 5). For TIM, DT was less than 10 min everywhere along the pathlength of both tornadoes [Fig. 10 (middle) and Fig. 11 (middle)]. In comparison, the NWS DTs were much greater; some locations along the tornado path remained under warning for more than 30 min after the threat had passed.

The average value of FAT was slightly lower for TIM warnings (Table 5). There were large areas within the NWS polygons that remained warned for over 50 min, even though these entire warnings would be considered hits using traditional scoring [Fig. 10 (bottom) and Fig. 11 (bottom)]. The total number of 1-km2 grid cells that were false events, or FAA, was 13 567 km2 for NWS warnings and 11 581 km2 for the TIM warnings. That is a 15% reduction in FAA with the TIM warnings. However, it should be noted that the default WarnGen polygons were used for TIM, and the average NWS polygon size was slightly larger than the default size for this event.

2) The afternoon–evening outbreak

Statistics for the entire afternoon and overnight tornado outbreak of 27–28 April 2011 over the four-CWA domain are also shown in Table 5. As with the TCL storm alone, these measures all improved using TIM warnings. Note, however, that the average DT for the TIM warnings was not as low as for the TCL storm. Because of training storms, especially in northern Alabama, some of the TIM warnings for upstream storms overlapped with other storms downstream. In essence, the tornado locations on the first storm remained warned by the second storm's warnings even after the tornadoes had moved away from those locations. Strategies to overcome this limitation include 1) removing the overlapping portions of the upstream warnings (by reducing their duration) despite the potential for reduced lead time for the upstream hazards and 2) implementing custom messages for overlapping warnings that would provide detailed information on arrival times for both hazards.

c. The effect of variable probability thresholds using PHI

With the proposed PHI paradigm, rapidly updating warnings could be derived from PHI plumes using probabilistic threshold values. For example, low (high) probability warnings could be used for users who are more (less) risk-averse or vulnerable to hazards. Middle-of-the-road probabilities could be used to derive “legacy” NWS warnings. Using the low (high) probability values as thresholds for warnings results in warning areas that are larger (smaller) than middle probability values (Fig. 12). Because the low (high) probability plumes will affect downstream locations earlier (later) than a middle probability plume, the result will be that impacted locations will be warned with potentially longer (shorter) LT than middle probability values.
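The size–threshold trade-off can be sketched with a toy one-dimensional plume (an exponential decay stands in for a THESPA plume; the `warned_area` helper is hypothetical):

```python
import numpy as np

def warned_area(p_grid, threshold, cell_km2=1.0):
    """Warning footprint (km^2) when the warning "polygon" is the
    p >= threshold contour of a probabilistic plume."""
    return float((p_grid >= threshold).sum()) * cell_km2

# A toy plume decaying downstream of the threat: lower thresholds
# warn a larger area (and reach downstream cells sooner).
x = np.arange(100.0)              # km along the storm-motion axis
p = np.exp(-x / 25.0)             # stand-in strike probability
areas = {t: warned_area(p, t) for t in (0.1, 0.25, 0.5)}
```

Lowering the threshold from 0.5 to 0.1 more than triples the warned footprint in this toy example, mirroring the LT versus FAR/FAA trade-off shown in Fig. 13.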

Fig. 12.

The THESPA-derived probabilistic grid at 2230 UTC during the TCL storm. The thin yellow contour is the outline of the damage path from the first tornado. The red contours are three arbitrary warning “polygons,” solid, dashed, and dotted, corresponding to low, middle, and high probability values, respectively.


For the TCL storm, truth-event-scoring metrics were calculated for warnings derived from variable probability contour thresholds applied to the THESPA-generated probability plumes. Figure 13 depicts the variation in LT, FAR, and FAA for all truth events within the TCL storm time and space domain. A trade-off when using lower probability thresholds to increase LT is an increase in FAR and FAA. Note that while the THESPA plume model was chosen to illustrate this trade-off, specific values for each threshold will vary depending on the probability plume used.

Fig. 13.

Variation of LT (red), FAR (blue), and FAA (purple) for different warning probability thresholds.


6. Discussion

This proposed geospatial warning verification method addresses the limitations of the current NWS warning verification method by consolidating the verification measures into one 2 × 2 contingency table with a common reference frame for both warnings and observations. Correct null forecasts can also be included in the measures if desired, as can other performance metrics beyond those used in this paper. The method provides a more robust way to determine location-specific LT and introduces the new metrics DT and FAT. The new method rewards spatial and temporal precision in warnings and allows users to better understand the impacts of the "casting a wide net" warning approach by measuring FAA and FAT even in verified warnings. This may also help in understanding whether the public truly perceives a high false alarm rate (e.g., NWS 2011b).

The same caveats noted by Brooks (2004) apply: making warnings smaller to reduce FAA might result in a larger number of missed events and a lower POD unless the quality of the warning system improves. On the other hand, because the cost of missing a tornado is likely greater than the cost of false alarms, it might be reasonable to accept a certain amount of FAA and FAT. This could be achieved by allowing larger splats around observations. To meet the demands of an acceptable warning service from a social science perspective, the geospatial verification method could be used to determine the optimal observation buffer (a representation of a "safe" distance from a tornado at which to be warned) and a complementary warning size to meet that requirement. Note that while this technique offers an alternative method to evaluate the quality of warnings, social science studies are still needed to determine how user decisions actually map to this new technique.

The new verification method was used to demonstrate the benefits of the threats-in-motion (TIM) warning concept, which include more equitable LT for all users downstream of a threat as well as better average LT, DT, and FAT. The TIM concept is essential for any future warning system based on PHI because the probabilities update rapidly: as threats approach locations downstream from a hazard, the probabilities should gradually increase. Therefore, it would be beneficial to employ TIM in any warning system that includes a geospatial probabilistic component such as PHI.

If fixed probability thresholds are used to draw warning contours, warnings will have variable sizes during the evolution of an event, especially at the start and end of an event where the warning areas will be minuscule. To avoid tiny warnings, the warning decision could instead be triggered by a probability threshold but then employ a minimum or a fixed duration to determine the warning length. One drawback to this latter approach is that warnings could contain varying minimum probabilities, which might be confusing for some users.

Increasing the duration of a traditional warning has the same effect as choosing a low probability threshold: greater LT but with the trade-off of higher FAA and FAT.2 The choice of probability threshold used to warn is dependent on specific users’ choice of LT, false alarm ratio, and their acceptable cost–loss ratio. As the science of predicting convective weather improves, the performance of warnings at greater lead times should improve. Probability plumes will likely be more focused (narrower) and extend higher probabilities farther downstream from the storm. The geospatial verification method will be beneficial to measure the performance of these improvements.

Goals for official NWS warning statistics fall under the mandate of the Government Performance and Results Act (GPRA; Ralph et al. 2013). These goals are based on the current verification method. If the NWS were to adopt this proposed new warning verification method, new metrics and policies for warning area and duration would need to be defined. Because longer LT leads to more false alarms, any LT goals must be coupled with complementary performance goals.

1 By its nature, Storm Data collection sometimes does not record multiple verifying events. That is, if a single report verifies a warning polygon, sometimes no others are sought.

2 Note that with traditional verification, increasing warning durations would not affect FAR.

Acknowledgments.

Sincere appreciation is given to the following individuals: Harold Brooks (whose original idea inspired this paper), Kim Elmore, Lans Rothfusz, Alan Gerard, Travis Smith, Brandon Smith, and Vicki Farmer (NOAA National Severe Storms Laboratory), Mike Magsig and Jim LaDue (NWS Warning Decision Training Division), Kevin Manross (NOAA Global Systems Laboratory), Sandy Dance (Bureau of Meteorology, Australia), Kevin Scharfenberg (NWS Forecast Decision Training Division), Chris Karstens (NWS Storm Prediction Center), and Stephan Smith and Judy Ghirardelli (NWS Meteorological Development Laboratory). The manuscript was substantially improved, thanks to the constructive comments of several anonymous reviewers. This research was supported in part by the Office of Weather and Air Quality through the U.S. Weather Research Program, by the Cooperative Institute for Research in the Atmosphere (CIRA) via NOAA Cooperative Agreement NA19OAR4320073 and by the Cooperative Institute for Mesoscale Meteorology Studies (CIMMS) via NOAA Cooperative Agreement NA11OAR4320072. The statements, findings, conclusions, and recommendations are those of the authors and do not necessarily reflect the views of their institutions.

Data availability statement.

The data and documentation described in this paper are available by contacting the corresponding author.

REFERENCES

  • Brooks, H. E., 2004: Tornado warning performance in the past and future: A perspective from signal detection theory. Bull. Amer. Meteor. Soc., 85, 837–844, https://doi.org/10.1175/BAMS-85-6-837.

  • Brooks, H. E., and J. Correia, 2018: Long-term performance metrics for National Weather Service tornado warnings. Wea. Forecasting, 33, 1501–1511, https://doi.org/10.1175/WAF-D-18-0120.1.

  • Brotzge, J., and S. Erickson, 2009: NWS tornado warnings with zero or negative lead times. Wea. Forecasting, 24, 140–154, https://doi.org/10.1175/2008WAF2007076.1.

  • Camp, J. P., P. Kirkwood, J. G. LaDue, L. A. Schultz, and N. Parikh, 2017: National Weather Service Damage Assessment Toolkit: Transitioning to operations. Fifth Symp. on Building a Weather-Ready Nation: Enhancing our Nation’s Readiness, Responsiveness, and Resilience to High Impact Weather Events, Seattle, WA, Amer. Meteor. Soc., 9.1, https://ams.confex.com/ams/97Annual/webprogram/Paper312451.html.

  • Cressman, G. P., 1959: An operational objective analysis system. Mon. Wea. Rev., 87, 367–374, https://doi.org/10.1175/1520-0493(1959)087<0367:AOOAS>2.0.CO;2.

  • Dance, S., E. Ebert, and D. Scurrah, 2010: Thunderstorm strike probability nowcasting. J. Atmos. Oceanic Technol., 27, 79–93, https://doi.org/10.1175/2009JTECHA1279.1.

  • Elmore, K. L., Z. L. Flamig, V. Lakshmanan, B. T. Kaney, V. Farmer, H. D. Reeves, and L. P. Rothfusz, 2014: mPING: Crowd-sourcing weather reports for research. Bull. Amer. Meteor. Soc., 95, 1335–1342, https://doi.org/10.1175/BAMS-D-13-00014.1.

  • Erickson, S., and H. Brooks, 2006: Lead time and time under tornado warnings: 1986–2004. Proc. 23rd Conf. on Severe Local Storms, St. Louis, MO, Amer. Meteor. Soc., 11.5, https://ams.confex.com/ams/23SLS/webprogram/Paper115194.html.

  • Hansen, T. L., and Coauthors, 2018: FACETs—The 2017/2018 Hazard Services–Probabilistic Hazard Information (HS-PHI) experiments at the NOAA Hazardous Weather Testbed. Eighth Conf. on Transition of Research to Operations, Austin, TX, Amer. Meteor. Soc., 6.3, https://ams.confex.com/ams/98Annual/webprogram/Paper328341.html.

  • Hitchens, N. M., H. E. Brooks, and M. P. Kay, 2013: Objective limits on forecasting skill of rare events. Wea. Forecasting, 28, 525–534, https://doi.org/10.1175/WAF-D-12-00113.1.

  • Karstens, C. D., and Coauthors, 2015: Evaluation of a probabilistic forecasting methodology for severe convective weather in the 2014 Hazardous Weather Testbed. Wea. Forecasting, 30, 1551–1570, https://doi.org/10.1175/WAF-D-14-00163.1.

  • Karstens, C. D., and Coauthors, 2018: Development of a human–machine mix for forecasting severe convective events. Wea. Forecasting, 33, 715–737, https://doi.org/10.1175/WAF-D-17-0188.1.

  • Kuhlman, K. M., T. M. Smith, G. J. Stumpf, K. L. Ortega, and K. L. Manross, 2008: Experimental probabilistic hazard information in practice: Results from the 2008 EWP Spring Program. 24th Conf. on Severe Local Storms, Savannah, GA, Amer. Meteor. Soc., 8A.2, https://ams.confex.com/ams/24SLS/techprogram/paper_142027.htm.

  • Marzban, C., 1998: Scalar measures of performance in rare-event situations. Wea. Forecasting, 13, 753–763, https://doi.org/10.1175/1520-0434(1998)013<0753:SMOPIR>2.0.CO;2.

  • Mazur, R. J., G. J. Stumpf, and V. Lakshmanan, 2004: Quality control of radar data to improve mesocyclone detection. 20th Int. Conf. on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Seattle, WA, Amer. Meteor. Soc., P1.2a, https://ams.confex.com/ams/84Annual/webprogram/Paper71377.html.

  • NWS, 2011a: Service assessment: The historic tornadoes of April 2011. NOAA National Disaster Survey Rep., 76 pp., http://www.weather.gov/media/publications/assessments/historic_tornadoes.pdf.

  • NWS, 2011b: NWS Central Region service assessment: Joplin, Missouri, tornado – May 22, 2011. NOAA National Disaster Survey Rep., 41 pp., https://www.weather.gov/media/publications/assessments/Joplin_tornado.pdf.

  • NWS, 2011c: Verification. National Weather Service Instruction 10-1601, September 28, 2011, 100 pp., https://www.nws.noaa.gov/directives/010/archive/pd01016001e.pdf.

  • NWS, 2018: Storm Data preparation. National Weather Service Instruction 10-1605, July 26, 2021, 110 pp., https://www.nws.noaa.gov/directives/sym/pd01016005curr.pdf.

  • NWS, 2020: WFO severe weather products specification. National Weather Service Instruction 10-511, April 15, 2020, 76 pp., http://www.nws.noaa.gov/directives/sym/pd01005011curr.pdf.

  • NWS, 2022: Verification. National Weather Service Instruction 10-1601, July 7, 2022, 7 pp., https://www.nws.noaa.gov/directives/sym/pd01016001curr.pdf.

  • Ralph, F. M., and Coauthors, 2013: The emergence of weather-related testbeds linking research and forecasting operations. Bull. Amer. Meteor. Soc., 94, 1187–1211, https://doi.org/10.1175/BAMS-D-12-00080.1.

  • Rothfusz, L. P., R. Schneider, D. Novak, K. Klockow-McClain, A. E. Gerard, C. Karstens, G. J. Stumpf, and T. M. Smith, 2018: FACETs: A proposed next-generation paradigm for high-impact weather forecasting. Bull. Amer. Meteor. Soc., 99, 2025–2043, https://doi.org/10.1175/BAMS-D-16-0100.1.

  • Smith, T. M., and Coauthors, 2016: Multi-Radar Multi-Sensor (MRMS) severe weather and aviation products: Initial operating capabilities. Bull. Amer. Meteor. Soc., 97, 1617–1630, https://doi.org/10.1175/BAMS-D-14-00173.1.

  • Stumpf, G. J., and A. E. Gerard, 2021: National Weather Service severe weather warnings as Threats-in-Motion (TIM). Wea. Forecasting, 36, 627–643, https://doi.org/10.1175/WAF-D-20-0159.1.

  • Stumpf, G. J., T. M. Smith, K. Manross, and D. L. Andra, 2008: The Experimental Warning Program 2008 spring experiment at the NOAA Hazardous Weather Testbed. 24th Conf. on Severe Local Storms, Savannah, GA, Amer. Meteor. Soc., 8A.1, https://ams.confex.com/ams/24SLS/techprogram/paper_141712.htm.

  • Trapp, R. J., D. M. Wheatley, N. T. Atkins, R. W. Przybylinski, and R. Wolf, 2006: Buyer beware: Some words of caution on the use of severe wind reports in post-event assessment and research. Wea. Forecasting, 21, 408–415, https://doi.org/10.1175/WAF925.1.

  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Elsevier, 676 pp.

  • Witt, A., M. D. Eilts, G. J. Stumpf, E. D. Mitchell, J. T. Johnson, and K. W. Thomas, 1998: Evaluating the performance of WSR-88D severe storm detection algorithms. Wea. Forecasting, 13, 513–518, https://doi.org/10.1175/1520-0434(1998)013<0513:ETPOWS>2.0.CO;2.
Save
  • Brooks, H. E., 2004: Tornado warning performance in the past and future: A perspective from signal detection theory. Bull. Amer. Meteor. Soc., 85, 837–844, https://doi.org/10.1175/BAMS-85-6-837.

  • Brooks, H. E., and J. Correia, 2018: Long-term performance metrics for National Weather Service tornado warnings. Wea. Forecasting, 33, 1501–1511, https://doi.org/10.1175/WAF-D-18-0120.1.

  • Brotzge, J., and S. Erickson, 2009: NWS tornado warnings with zero or negative lead times. Wea. Forecasting, 24, 140–154, https://doi.org/10.1175/2008WAF2007076.1.

  • Camp, J. P., P. Kirkwood, J. G. LaDue, L. A. Schultz, and N. Parikh, 2017: National Weather Service Damage Assessment Toolkit: Transitioning to operations. Fifth Symp. on Building a Weather-Ready Nation: Enhancing our Nation’s Readiness, Responsiveness, and Resilience to High Impact Weather Events, Seattle, WA, Amer. Meteor. Soc., 9.1, https://ams.confex.com/ams/97Annual/webprogram/Paper312451.html.

  • Cressman, G. P., 1959: An operational objective analysis system. Mon. Wea. Rev., 87, 367–374, https://doi.org/10.1175/1520-0493(1959)087<0367:AOOAS>2.0.CO;2.

  • Dance, S., E. Ebert, and D. Scurrah, 2010: Thunderstorm strike probability nowcasting. J. Atmos. Oceanic Technol., 27, 79–93, https://doi.org/10.1175/2009JTECHA1279.1.

  • Elmore, K. L., Z. L. Flamig, V. Lakshmanan, B. T. Kaney, V. Farmer, H. D. Reeves, and L. P. Rothfusz, 2014: mPING: Crowd-sourcing weather reports for research. Bull. Amer. Meteor. Soc., 95, 1335–1342, https://doi.org/10.1175/BAMS-D-13-00014.1.

  • Erickson, S., and H. Brooks, 2006: Lead time and time under tornado warnings: 1986–2004. Proc. 23rd Conf. on Severe Local Storms, St. Louis, MO, Amer. Meteor. Soc., 11.5, https://ams.confex.com/ams/23SLS/webprogram/Paper115194.html.

  • Hansen, T. L., and Coauthors, 2018: FACETs—The 2017/2018 Hazard Services–Probabilistic Hazard Information (HS-PHI) experiments at the NOAA Hazardous Weather Testbed. Eighth Conf. on Transition of Research to Operations, Austin, TX, Amer. Meteor. Soc., 6.3, https://ams.confex.com/ams/98Annual/webprogram/Paper328341.html.

  • Hitchens, N. M., H. E. Brooks, and M. P. Kay, 2013: Objective limits on forecasting skill of rare events. Wea. Forecasting, 28, 525–534, https://doi.org/10.1175/WAF-D-12-00113.1.

  • Karstens, C. D., and Coauthors, 2015: Evaluation of a probabilistic forecasting methodology for severe convective weather in the 2014 Hazardous Weather Testbed. Wea. Forecasting, 30, 1551–1570, https://doi.org/10.1175/WAF-D-14-00163.1.

  • Karstens, C. D., and Coauthors, 2018: Development of a human–machine mix for forecasting severe convective events. Wea. Forecasting, 33, 715–737, https://doi.org/10.1175/WAF-D-17-0188.1.

  • Kuhlman, K. M., T. M. Smith, G. J. Stumpf, K. L. Ortega, and K. L. Manross, 2008: Experimental probabilistic hazard information in practice: Results from the 2008 EWP Spring Program. 24th Conf. on Severe Local Storms, Savannah, GA, Amer. Meteor. Soc., 8A.2, https://ams.confex.com/ams/24SLS/techprogram/paper_142027.htm.

  • Marzban, C., 1998: Scalar measures of performance in rare-event situations. Wea. Forecasting, 13, 753–763, https://doi.org/10.1175/1520-0434(1998)013<0753:SMOPIR>2.0.CO;2.

  • Mazur, R. J., G. J. Stumpf, and V. Lakshmanan, 2004: Quality control of radar data to improve mesocyclone detection. 20th Int. Conf. on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Seattle, WA, Amer. Meteor. Soc., P1.2a, https://ams.confex.com/ams/84Annual/webprogram/Paper71377.html.

  • NWS, 2011a: Service assessment: The historic tornadoes of April 2011. NOAA National Disaster Survey Rep., 76 pp., http://www.weather.gov/media/publications/assessments/historic_tornadoes.pdf.

  • NWS, 2011b: NWS Central Region service assessment: Joplin, Missouri, Tornado – May 22, 2011. NOAA National Disaster Survey Rep., 41 pp., https://www.weather.gov/media/publications/assessments/Joplin_tornado.pdf.

  • NWS, 2011c: Verification. National Weather Service Instruction 10-1601, September 28, 2011, 100 pp., https://www.nws.noaa.gov/directives/010/archive/pd01016001e.pdf.

  • NWS, 2018: Storm Data preparation. National Weather Service Instruction 10-1605, July 26, 2021, 110 pp., https://www.nws.noaa.gov/directives/sym/pd01016005curr.pdf.

  • NWS, 2020: WFO severe weather products specification. National Weather Service Instruction 10-511, April 15, 2020, 76 pp., http://www.nws.noaa.gov/directives/sym/pd01005011curr.pdf.

  • NWS, 2022: Verification. National Weather Service Instruction 10-1601, July 7, 2022, 7 pp., https://www.nws.noaa.gov/directives/sym/pd01016001curr.pdf.

  • Ralph, F. M., and Coauthors, 2013: The emergence of weather-related testbeds linking research and forecasting operations. Bull. Amer. Meteor. Soc., 94, 1187–1211, https://doi.org/10.1175/BAMS-D-12-00080.1.

  • Rothfusz, L. P., R. Schneider, D. Novak, K. Klockow-McClain, A. E. Gerard, C. Karstens, G. J. Stumpf, and T. M. Smith, 2018: FACETs: A proposed next-generation paradigm for high-impact weather forecasting. Bull. Amer. Meteor. Soc., 99, 2025–2043, https://doi.org/10.1175/BAMS-D-16-0100.1.

  • Smith, T. M., and Coauthors, 2016: Multi-Radar Multi-Sensor (MRMS) severe weather and aviation products: Initial operating capabilities. Bull. Amer. Meteor. Soc., 97, 1617–1630, https://doi.org/10.1175/BAMS-D-14-00173.1.

  • Stumpf, G. J., and A. E. Gerard, 2021: National Weather Service severe weather warnings as Threats-in-Motion (TIM). Wea. Forecasting, 36, 627–643, https://doi.org/10.1175/WAF-D-20-0159.1.

  • Stumpf, G. J., T. M. Smith, K. Manross, and D. L. Andra, 2008: The Experimental Warning Program 2008 spring experiment at the NOAA Hazardous Weather Testbed. 24th Conf. on Severe Local Storms, Savannah, GA, Amer. Meteor. Soc., 8A.1, https://ams.confex.com/ams/24SLS/techprogram/paper_141712.htm.

  • Trapp, R. J., D. M. Wheatley, N. T. Atkins, R. W. Przybylinski, and R. Wolf, 2006: Buyer beware: Some words of caution on the use of severe wind reports in post-event assessment and research. Wea. Forecasting, 21, 408–415, https://doi.org/10.1175/WAF925.1.

  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Elsevier, 676 pp.

  • Witt, A., M. D. Eilts, G. J. Stumpf, E. D. Mitchell, J. T. Johnson, and K. W. Thomas, 1998: Evaluating the performance of WSR-88D severe storm detection algorithms. Wea. Forecasting, 13, 513–518, https://doi.org/10.1175/1520-0434(1998)013<0513:ETPOWS>2.0.CO;2.
  • Fig. 1.

    Examples of two warning polygons, with the one on the right larger than the one on the left.

  • Fig. 2.

    Hypothetical storm, warning, and report point splat area. The correct null area (blue) roughly outlines the “storm.” The warning polygon comprises the false alarm area (red) and the hit area (gray). The report point splat area comprises the miss area (white) and shares the hit area (gray) with the warning polygon. All grid cells outside of these areas are considered nonevents.

  • Fig. 3.

    Tornadic mesocyclone paths during the 27–28 Apr 2011 outbreak. The tornadic (nontornadic) portion of the path used for the analysis is red (blue). (top) Path for the TCL storm affecting the BMX CWA (thick gray). Radar reflectivity images are overlain. Times are annotated. (bottom) Paths for the afternoon–evening–overnight portion of the outbreak affecting the four CWAs (JAN, BMX, HUN, and FFC) in the analysis domain (thick gray). The inset shows the tornado warning polygons between 1200 UTC 27 and 1200 UTC 28 Apr 2011 (courtesy V. Gensini).

  • Fig. 4.

    The default TIM warning polygon that was used in the analysis.

  • Fig. 5.

    Probabilistic plumes derived from THESPA for (a) a fast-moving storm and (b) a slow-moving storm. From Dance et al. (2010).

  • Fig. 6.

    The THESPA-derived probabilistic grid at 2230 UTC during the 27–28 Apr 2011 outbreak.

  • Fig. 7.

    Annual statistics from 2008 to 2022: (a) Average lead time (blue), average departure time (red), and average false alarm time (purple) using the truth-event-scoring method for TOR; (b) POD1 using the grid-scoring method for TOR; (c) the number of warning products (red; TOR and SVS) and the average number of 1-km2 grid cells per TOR (blue); (d) the normalized number of false (blue) and hit (red) grid cells for TOR. (e)–(h) As in (a)–(d), but for SVR. For SVR scoring in (e) and (f), hail, wind, and tornadoes are used as observations. Note that the ordinates in (c) and (g) have different scales [(g) is 4 times larger than (c)].

  • Fig. 8.

    Variation of the POD (blue), FAR (red), CSI (orange), and HSS (purple) skill scores as the observation splat size (km) varies.

  • Fig. 9.

    Timeline of 1-min tornado segment lead times (min) for the central Alabama tornadic storm that affected Tuscaloosa and Birmingham on 27–28 Apr 2011, for the portion of the storm within Alabama from 2143 to 0044 UTC. NWS warnings are in blue, and 45-min TIM warnings are in red. The red arrows indicate locations where new NWS warnings became effective for those portions of the tornado tracks. The gap indicates when there was no tornado. Times are UTC.

  • Fig. 10.

    Location-specific (top) LT, (middle) DT, and (bottom) FAT for the two tornadoes associated with the TCL storm. (left) NWS warnings and (right) 45-min TIM warnings. LT discontinuities for NWS warnings are indicated with yellow arrows at the top left. Cooler (warmer) colors indicate shorter (longer) times. Units are minutes.

  • Fig. 11.

    Frequency distribution histograms of values for each 1-min tornado segment for the central Alabama tornadic storm that affected Tuscaloosa and Birmingham on 27–28 Apr 2011, for the portion of the storm within Alabama from 2143 to 0044 UTC: (top left) lead time for NWS warnings, (middle left) departure time for NWS warnings, (bottom left) false alarm time for NWS warnings, (top right) lead time for 45-min TIM warnings, (middle right) departure time for 45-min TIM warnings, and (bottom right) false alarm time for 45-min TIM warnings. Units are minutes.

  • Fig. 12.

    The THESPA-derived probabilistic grid at 2230 UTC during the TCL storm. The thin yellow contour is the outline of the damage path from the first tornado. The red contours are three arbitrary warning “polygons,” solid, dashed, and dotted, corresponding to low, middle, and high probability values, respectively.

  • Fig. 13.

    Variation of LT (red), FAR (blue), and FAA (purple) for different warning probability thresholds.
