Thunderstorms in central Florida frequently halt outdoor activities, requiring that one wait some prescribed time after an assumed last flash before safely resuming activities. The goal of this research is to develop a high-skill probabilistic method that can be used in high-pressure real-world operations to terminate lightning warnings more quickly while maintaining safety. Probabilistic guidance tools are created for isolated warm season storms in central Florida using dual-polarized radar data at 1-min intervals. The parameters examined are maximum reflectivity and graupel presence at the 0°, −5°, −10°, −15°, and −20°C levels, as well as composite reflectivity. Random samples of the radar data are used to train a generalized linear model (GLM) to make a probabilistic prediction of whether a given flash is the storm’s last flash. The most statistically significant predictors for lightning cessation are found to be the storm’s maximum composite reflectivity and its maximum reflectivity at the 0°C level, along with graupel presence or absence at the −5°, −10°, −15°, and −20°C levels. Statistical verification is used to analyze the performance of the two GLM versions at different probability thresholds (95.0%, 97.5%, and 99.0%). When the cessation guidance was applied as though storms were occurring in real time, results showed that ~99% of the storms produced no additional lightning after the GLM suggested that cessation had already occurred. Although these results are encouraging, the procedure must be tested on much larger datasets having different convective modes and different areal coverages to prove its value compared to operational forecasters.
Lightning remains a deadly weather phenomenon in the United States in spite of recent reductions in lightning-related injuries and fatalities (Holle 2016). During 2006–17, a total of 376 people were killed by lightning in the United States (Jensenius 2018), and lightning-related damage in the United States exceeds $5 billion annually (National Lightning Safety Institute 2008). Furthermore, the threat of lightning can lead to costly and inconvenient delays in outdoor activities such as airport ground operations and sporting events (Steiner et al. 2014). It would be valuable to know with confidence whether a thunderstorm has produced its last flash so that activities can resume safely.
The U.S. Air Force’s 45th Weather Squadron (45WS) provides lightning- and other weather-related forecasts for operations at Kennedy Space Center (KSC), Cape Canaveral Air Force Station, and Patrick Air Force Base. These facilities are located along the Atlantic coast of the Florida peninsula, where warm season thunderstorms frequently are induced by sea and river breezes (e.g., Laird et al. 1995). The storms produce the greatest mean annual cloud-to-ground (CG) flash densities in the United States (e.g., Hodanish et al. 1997; Huffines and Orville 1999; Rudlosky and Fuelberg 2010, 2011; Mazzetti and Fuelberg 2017).
The 45WS issues lightning watches and warnings for 10 different locations in the KSC area for personnel safety and resource protection (Roeder et al. 2017). The 45WS issues an average of over 2500 lightning watches/warnings each year. A lightning warning is issued when total lightning [cloud-to-ground (CG) + lightning aloft (in-cloud or IC) = total lightning] is imminent or occurring within 5 nautical miles (n mi; 1 n mi = 1.852 km) of the location(s). The lightning warning is terminated when the threat of total lightning is over. This generally is done after 15 min have elapsed since the last total lightning was detected, provided that new total lightning is not expected. Although providing enhanced safety, the lightning cessation criteria are a major cause of delays. The main goal of this research is to develop a high-skill probabilistic method that can be used in high-pressure real-world operations to terminate lightning warnings more quickly while maintaining safety. Probabilistic lightning cessation guidance also has applications in a far-ranging set of other outdoor scenarios, including aviation, sporting events (e.g., the Florida State football stadium seats ~84 000 persons), and concerts. Many of these outdoor interests use the well-known 30–30 rule for lightning safety (e.g., Holle et al. 1999), whose second component calls for waiting at least 30 min after the last lightning before resuming outdoor activities.
Forecasting lightning cessation is a major challenge, and relatively little research has explored this topic from the perspective of determining whether a given flash is the last flash of a particular storm. Several studies have explored relationships between cessation and reflectivity at specified isothermal levels (e.g., Hinson 1997; Holmes 2000; Wolf 2006; Melvin and Fuelberg 2010). Stano et al. (2010, hereafter ST10) found that flash interval (i.e., the time between successive flashes) was the best of several predictors they examined for lightning cessation.
Rather than investigating cessation using reflectivity alone or time between successive flashes, it is more appropriate to consider cessation from the perspective of the noninductive charging (NIC) mechanism, which is thought to be the dominant way by which clouds become electrically charged (e.g., Takahashi 1973; Saunders et al. 1991). NIC assumes that collisions between graupel and ice crystals in the presence of supercooled water droplets transfer electric charge between the hydrometeors, with the ice crystals acquiring a positive charge and the graupel carrying a negative charge. The lighter positively charged ice crystals then are lofted by the storm’s updraft, while the heavier negatively charged graupel particles remain primarily within the region from −10° to −20°C. Although the vertical electrical structure of thunderstorms varies (Rust and Marshall 1996), a tripole structure often is observed, with a shallow positive layer near 0°C, a strong negative layer between −10° and −20°C, and a positive layer above −20°C (Krehbiel 1986; Williams 1989). However, some storms exhibit an inverted electrical structure because graupel in the main midlevel charge region carries a positive charge (e.g., Lang et al. 2004; Wiens et al. 2005; Tessendorf et al. 2007; MacGorman et al. 2008; Weiss et al. 2008). If a sufficiently strong charge separation occurs, CG and/or IC lightning can occur, with the CG flashes requiring a charge source near the surface to connect with the charged flow from the cloud.
Radar-derived estimates of ice mass and the vertical flux of frozen hydrometeors have shown strong correlations to total lightning activity (e.g., Deierling et al. 2005, 2008; Wiens et al. 2005; Bruning et al. 2007; Lund et al. 2009). The lightning forecasting guidance that we develop here utilizes dual-polarized radar data to discern the presence of graupel and other ice particles with the hypothesis that it could be an important factor in guidance for lightning cessation.
Little research has examined the utility of using dual-polarized radar data as cessation guidance. Carey et al. (2009) and Schultz et al. (2013) examined radar differential phase signatures and the vertical alignment of ice crystals as cessation neared. Seroka et al. (2012) studied vertically integrated ice and the timing of IC and CG flashes in relation to lightning cessation. Preston and Fuelberg (2015, hereafter PF15) and Davey and Fuelberg (2017, hereafter DF17) investigated the use of dual-polarized data in the KSC area. PF15 constructed a dataset of conventional and dual-polarized radar products at the 0°, −10°, and −20°C isothermal levels from 50 isolated thunderstorms undergoing lightning cessation in Oklahoma and central Florida. These were environmental isothermal levels, whose heights may be thousands of feet lower than those of the same isotherms within the warmer updraft core where graupel develops. To be considered “isolated,” any reflectivity channels between adjacent clouds had to be less than 15 dBZ. PF15 found that when conventional horizontal reflectivity of the isolated storms decreased below 35 dBZ at the −10°C level, and there was no indication of graupel based on a hydrometeor classification algorithm (HCA), waiting an additional 10 min before terminating a lightning advisory would safely allow outdoor operations to resume. DF17 investigated 50 nonisolated warm season thunderstorms using radar products; however, no combination of dual-polarized or conventional radar-derived parameters could accurately, and most importantly safely, determine when lightning cessation had occurred in these more complex storms. Neither PF15 nor DF17 expressed cessation in a probabilistic manner, opting instead to derive deterministic forecasting techniques.
This study examines 184 nonsevere warm season storms over central Florida to determine the probability that a given storm at a specific time will produce no additional flashes. We are not attempting to forecast the time at which a storm’s last flash will occur, but rather whether no additional lightning is reasonably expected on an ongoing minute-by-minute basis. We also do not address how long to wait until it is safe after a storm passes but does not dissipate. At many outdoor events, for example, the main issue is when a storm will be safely past the venue (i.e., nondissipating). “Bolts from the blue,” first described by Rison et al. (2003) and later examined by Dowdy and Mills (2012), are an example of this dangerous phenomenon since they can extend more than 15 km from cloud edge in central Florida (Fig. 2 of Fuelberg et al. 2014).
The presence of graupel at different temperature levels in the mixed-phase (MP) region (i.e., 0°, −5°, −10°, −15°, and −20°C) is assumed to be a proxy for storm electrification. Values of radar reflectivity in the MP region also are considered. We examine these radar-derived properties of each ongoing isolated thunderstorm on a minute-by-minute basis through lightning cessation and, ultimately, storm dissipation. Random samples of the dataset are used to train two predictive models that output the probability that cessation has occurred. The performance of the models then is quantified by testing on a different set of storms. To our knowledge, this is the first study that has attempted to develop probabilistic guidance using dual-polarized radar data for determining whether total lightning cessation has occurred in isolated thunderstorms.
2. Data and methods
We only considered storms during the warm season months (May–September) of 2013–15 and within the KSC area. The predictive models that we develop produce a probability that cessation has occurred given a set of real-time inputs describing an ongoing thunderstorm. Our hypothesis is that if reflectivity values at specified isothermal levels decrease below a certain threshold, and if graupel is no longer present, then another lightning flash is unlikely. A schematic example of the hypothesis is in Fig. 1. Before cessation (top row), maximum reflectivities exceed 45 dBZ and graupel is present (pink shaded region, see Fig. 2). After cessation (bottom row) there is no graupel and maximum reflectivities have decreased.
a. Lightning data
Two systems were employed to detect lightning flashes. The first was the National Lightning Detection Network (NLDN) (e.g., Orville 2008) whose flashes have a location accuracy less than 500 m in the area of interest and a CG detection efficiency of 90%–95% (e.g., Cummins and Murphy 2009, Nag et al. 2011). The NLDN detects CG flashes better than their IC counterparts (e.g., Nag et al. 2014). Therefore, total lightning (IC + CG) was obtained from this and a second data source, the second generation Lightning Detection and Ranging (LDAR-II) network (Poehler and Lennon 1979, Boccippio et al. 2001, Roeder 2010) that encompasses KSC (Fig. 1 of PF15). The network consists of nine sensors that together capture in three dimensions the IC flash channels and upper portions of CG strike channels. LDAR-II senses electromagnetic pulses (sources) in the very high-frequency (VHF) band that are emitted as lightning channels propagate, providing a flash detection efficiency exceeding 90% within 100 km of the center of the network (Boccippio et al. 2001).
The Warning Decision Support System–Integrated Information (WDSS-II) software (Lakshmanan et al. 2007b) was used to ingest the LDAR-II source data and consolidate them into lightning flash channels based on spatial and temporal criteria. Details about WDSS-II are given in PF15, DF17, and the references therein. The algorithm was used with its default spatial settings, similar to PF15 and DF17. A 300-ms time constraint between individual sources greatly reduced the probability that two flashes would be combined into one. We required that both IC and CG flashes contain at least three VHF sources (Nelson 2002). Both the LDAR-II and NLDN flash data were displayed in WDSS-II at 1-min intervals.
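As a rough illustration of the temporal part of this grouping, the following Python sketch splits a time-sorted list of VHF source times into flashes wherever consecutive sources are more than 300 ms apart, and it discards groups with fewer than three sources. It is not the WDSS-II algorithm: the spatial criteria are omitted, and the names `group_sources`, `MAX_GAP_S`, and `MIN_SOURCES` are hypothetical.

```python
# Hypothetical constants mirroring the criteria described in the text.
MAX_GAP_S = 0.300   # maximum time (s) between consecutive sources within one flash
MIN_SOURCES = 3     # minimum number of VHF sources for a valid flash


def group_sources(times_s):
    """Group time-sorted VHF source times (s) into flashes.

    Only the temporal criterion is applied; the spatial criteria used by
    WDSS-II are omitted in this sketch.
    """
    flashes, current = [], []
    for t in times_s:
        # A gap longer than MAX_GAP_S closes the current group.
        if current and t - current[-1] > MAX_GAP_S:
            if len(current) >= MIN_SOURCES:
                flashes.append(current)
            current = []
        current.append(t)
    if len(current) >= MIN_SOURCES:
        flashes.append(current)
    return flashes


sources = [0.00, 0.05, 0.12, 0.20, 1.00, 1.10, 5.00, 5.10, 5.25, 5.50]
print(len(group_sources(sources)))
```

Here the two-source group starting at 1.00 s is discarded, leaving two valid flashes.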
Based on LDAR-II’s detection efficiency, our study domain was limited to a 100-km radius from the center of the network (Fig. 1 of PF15). The time of lightning cessation for each thunderstorm was defined as the minute at which the final flash occurred. That last flash could be either CG or IC; we did not distinguish between them in the model development.
b. Radar data
Dual-polarized Level-II WSR-88D S-band radar data (e.g., Kumjian 2013) from the National Weather Service in Melbourne, Florida (KMLB), were used. Many factors influence results based on radar data, including the volume coverage pattern (VCP) being used, details of beam refraction, and whether the radar is properly calibrated. An especially pertinent issue for this study is the radar's “cone of silence,” an inverted cone extending upward from the radar site that cannot be sampled by the available elevation angle scans. Specifically, the cone prevents sampling the upper portions of storms near the radar site. Although KMLB’s cone of silence extends into part of our domain, any potential storm candidate whose sampling was compromised by it was not included in the final dataset of 184 storms.
Several aspects of processing the radar data using WDSS-II should be noted. First, we employed the WDSS-II internal quality-control algorithm (w2qcnn) that accounts for beam blockages and removes ground clutter and other anomalous returns (Lakshmanan et al. 2007a). Second, the research required, and WDSS-II provided, radar data on isothermal surfaces (e.g., 0°, −5°C, etc.) within the MP region. The altitudes of these specified temperatures were obtained hourly from the National Centers for Environmental Prediction’s Rapid Refresh (RAP) model analyses at 13-km grid spacing (Earth System Research Laboratory 2017). As done by PF15 and DF17, WDSS-II merged these hourly heights with the radar scans using the w2merger algorithm (Lakshmanan et al. 2006). The radar products at 1-km horizontal and vertical resolutions are updated as data from individual radar elevation angles become available. This allows the radar data to be updated at ~1-min intervals instead of waiting for the completion of a full volume scan. The impact of using WDSS-II to obtain 1-min radar products instead of 4–5-min volume scans is not as great as one might think because our storms were fairly close (<80 km) to the KMLB WSR-88D. Thus, the 1-min interval equates only to the addition of 1–2 elevation scans, depending on the VCP being utilized. The “hybrid” 1-min data presently are not available to the National Weather Service or the 45WS for operational use. Thus, our study is meant to be a “proof of concept” of the methodology.
The presence of graupel in a storm is an important aspect of NIC theory and our lightning cessation hypothesis. Since the archived Level-II radar data did not contain hydrometeor types, we created them at 1-min intervals using the Hydrometeor Classification Algorithm (HCA) internal to the WDSS-II software (Schuur et al. 2003, Ryzhkov et al. 2005). The algorithm determines which type of hydrometeor (see Fig. 2) most likely is associated with each 1 km × 1 km grid box by using a fuzzy logic scheme based on inputs of dual-polarized and conventional radar products (Kumjian 2013).
Accurately tracking each storm also was an important part of the methodology. WDSS-II can track a storm based on user-selected parameters such as composite reflectivity, vertically integrated liquid (VIL), or HCA value by using a K-means storm clustering and tracking algorithm denoted w2segmotionll (Lakshmanan and Smith 2009, Lakshmanan et al. 2009, Lakshmanan 2012). We employed this algorithm to track each potential storm based on values of composite reflectivity. Specifically, the algorithm identified the individual 1 km × 1 km grid boxes having preselected values of reflectivity, defined a polygon around those grid points, and compiled a text database of storm parameters such as reflectivity and HCA within that polygon at different isothermal levels at 1-min intervals as the polygon (storm) moved in time. Since the algorithm sometimes had difficulty tracking smaller or decaying storms, we manually verified that each automated track was consistent with its associated radar and lightning data. The few cases of incorrect tracking were corrected manually or deleted. Fewer than 10% of the 184 storms in our dataset had any relevant data points that needed to be corrected manually for accuracy. In the majority of the cases in which corrections needed to be made, only one or two 1-min observations were not automatically tracked correctly.
Assuming that each 1-min WDSS-II-derived storm observation is independent of those before and after it would exaggerate the statistical significance of any differences in the sample distributions. Time series of atmospheric variables always have some degree of autocorrelation, also denoted serial correlation (von Storch and Zwiers 1999). Autocorrelation is defined as the correlation of a set of values with itself at a given time lag, thereby measuring the degree of serial dependence in a time series. Autocorrelation is relatively large when sampling at short time intervals, such as our consecutive 1-min storm samples. Rudlosky and Fuelberg (2013) calculated autocorrelation functions and effective sample sizes (Leith 1973, Wilks 2006) to evaluate the degree of serial correlation in time series of radar products obtained from the WDSS-II software, very similar to the radar data used here. Results showed that their 2-min data were effectively independent after only 6–12 min. We expect similar independence for our 1-min data. However, the important point is that our lightning cessation models (described later) did not employ time series of radar products in their development. Thus, serial correlation should have little or no effect on the models.
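The effect of serial correlation can be illustrated with the common lag-1 (first-order autoregressive) approximation to the effective sample size, n_eff ≈ n(1 − r1)/(1 + r1), where r1 is the lag-1 autocorrelation (Wilks 2006). The sketch below is a simplification of the full autocorrelation-function approach used by Rudlosky and Fuelberg (2013), and the reflectivity series is hypothetical.

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a time series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def effective_sample_size(x):
    """AR(1) approximation: n_eff = n (1 - r1) / (1 + r1)."""
    r1 = lag1_autocorrelation(x)
    return len(x) * (1.0 - r1) / (1.0 + r1)


# Hypothetical 1-min maximum reflectivities (dBZ) for one storm.
zmax = [30, 32, 35, 38, 40, 41, 40, 38, 35, 32]
print(round(lag1_autocorrelation(zmax), 2), round(effective_sample_size(zmax), 1))
```

A smoothly varying series such as this one has a large positive r1, so its ten 1-min samples carry the information of only a few independent ones.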
c. Storm dataset
Although several hundred potential storms were investigated using WDSS-II products, only 184 of them met the strict requirements to be included in the dataset. Selected storms had to remain within 100 km of the center of the LDAR-II network from their first flash to dissipation. Each storm had to produce at least three IC or CG flashes. The storms also had to be “isolated” at all times as defined in PF15. Specifically, none of the cells could have a composite reflectivity channel greater than 15 dBZ connecting the storm in question to a nearby storm. PF15 noted that, using this reflectivity threshold, no flashes originating from another convective cell extended to the storm of interest. We verified this required absence of flash interactions using the LDAR-II data. Finally, candidate storms had to remain at least ~20 km away from the radar at all times because of the cone of silence issue described previously. Any potential storm that met these requirements and had good WSR-88D and LDAR-II coverage was included in the final set of 184 storms. An example storm at the time of its last flash is shown in Fig. 3.
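The selection criteria above amount to a simple filter. The sketch assumes that each candidate already has been summarized into a few hypothetical fields (the `StormCandidate` class and its field names are illustrative, not WDSS-II output); the manual checks of flash interactions and of radar and LDAR-II data quality are not represented.

```python
from dataclasses import dataclass


@dataclass
class StormCandidate:
    # Hypothetical per-storm summaries over the storm's life (first flash to dissipation).
    max_range_from_ldar_center_km: float  # farthest distance from the LDAR-II network center
    min_range_from_radar_km: float        # closest approach to the KMLB radar
    n_flashes: int                        # total IC + CG flashes
    max_connecting_dbz: float             # strongest composite reflectivity bridging to a
                                          # neighbor (float("-inf") if no channel exists)


def is_eligible(s, ldar_radius_km=100.0, min_radar_km=20.0,
                min_flashes=3, isolation_dbz=15.0):
    """Apply the range, flash-count, and isolation criteria described in the text."""
    return (s.max_range_from_ldar_center_km <= ldar_radius_km
            and s.min_range_from_radar_km >= min_radar_km
            and s.n_flashes >= min_flashes
            and s.max_connecting_dbz <= isolation_dbz)


candidates = [
    StormCandidate(62.0, 35.0, 14, 10.0),  # meets all criteria
    StormCandidate(62.0, 12.0, 14, 10.0),  # too close to the radar (cone of silence)
    StormCandidate(62.0, 35.0, 14, 22.0),  # joined to a neighbor by a >15-dBZ channel
]
print([is_eligible(c) for c in candidates])
```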
Warm season thunderstorms in the study area frequently are forced by sea- and river-breeze circulations that favor storm development over land during the afternoon or early evening. The storms sometimes are isolated along the sea-breeze front or can occur in segments along it. The spatial distribution of the 184 isolated storms (Fig. 4) shows that ~85% of them underwent cessation over land. Except for the lack of storms in the cone of silence, no preferential storm locations by month or by year (not shown) are evident over land during the May–September 2013–15 study period. The monthly and yearly storm counts (Table 1) show relatively similar numbers each month except September, when the general strength of the sea- and river-breeze circulations begins to weaken. Of course, the strength of the circulations varies during all summer months, due especially to whether synoptic-scale offshore or onshore flow is occurring.
d. Model development
The first step in developing the cessation model was to select potential storm parameters that likely were related to the probability of lightning cessation. PF15 assessed numerous dual-polarized and conventional radar products and concluded that graupel presence and maximum horizontal reflectivity at three isothermal levels (0°, −10°, and −20°C) were most useful in predicting cessation in isolated storms. We added the −5° and −15°C levels to the list of potential parameters to determine whether the improved vertical resolution that they provided yielded better cessation results. Thus, we tracked and “data mined” at 1-min intervals the maximum horizontal reflectivity and the presence of graupel within the set of 1 km × 1 km WDSS-II-derived grid points defining each storm (i.e., inside the storm’s polygon) at five isothermal levels in the MP region (0°, −5°, −10°, −15°, and −20°C). These temperatures describe the storms’ environment, not their updraft core, since the RAP analyses at 13 km are too coarse to make this distinction. The 11 potential parameters and their levels of interest are listed in Table 2, with an example shown in Fig. 3. It is important to note that time during a storm’s lifetime was not considered a potential parameter because our goal was to develop guidance that only depended on storm characteristics.
Graupel presence was archived as a binary variable, with “one” indicating graupel presence at that minute and altitude, and “zero” denoting no graupel. If graupel occurred in at least one of the 1 km × 1 km grid boxes defining the storm at that level, then the value “one” was assigned at that time. One should note that the WDSS-II HCA product denotes the most likely hydrometeor category at each location. This approach could underestimate the presence of graupel if it were present but not the most likely hydrometeor at that location. The HCA provides no information about the quantity of graupel that is present in a particular grid box.
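The extraction of the 11 candidate predictors for a single storm at a single minute can be sketched as follows. The input layout (reflectivity and HCA class for each 1 km × 1 km box inside the storm polygon) and the class labels such as `"GR"` for graupel are assumptions for illustration; they do not reproduce the WDSS-II database format.

```python
LEVELS_C = (0, -5, -10, -15, -20)   # isothermal levels (deg C) in the mixed-phase region
GRAUPEL = "GR"                      # hypothetical HCA label for graupel


def extract_predictors(composite_dbz, boxes_by_level):
    """Return the 11 candidate predictors for one storm at one minute.

    composite_dbz: composite reflectivity (dBZ) of each grid box in the storm polygon.
    boxes_by_level: {temperature_C: [(reflectivity_dbz, hca_class), ...]} for the
        1 km x 1 km boxes inside the polygon at that level (assumed nonempty).
    """
    predictors = {"zmax_composite": max(composite_dbz)}
    for level in LEVELS_C:
        boxes = boxes_by_level[level]
        predictors[f"zmax_{level}C"] = max(z for z, _ in boxes)
        # 1 if graupel is the most likely HCA class in at least one box, else 0.
        predictors[f"graupel_{level}C"] = int(any(cls == GRAUPEL for _, cls in boxes))
    return predictors


# Hypothetical HCA labels: "RA" rain, "BD" big drops, "IC" ice crystals.
boxes = {level: [(40.0, "RA"), (35.0, "BD")] for level in LEVELS_C}
boxes[-10] = [(38.0, "GR"), (30.0, "IC")]
p = extract_predictors([48.0, 45.0], boxes)
print(len(p), p["zmax_-10C"], p["graupel_-10C"], p["graupel_0C"])
```

Because each graupel flag only records whether graupel is the most likely class somewhere at that level, it inherits the HCA limitations noted above.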
Values of the 11 parameters (Table 2) then were used to train a generalized linear model (GLM; Nelder and Wedderburn 1972; Agresti 2013), which then was used to calculate the probability that cessation had occurred. Each minute of radar data was classified as occurring either before lightning cessation (i.e., while lightning is still ongoing) or after cessation (i.e., when it is safe to terminate a lightning advisory). The GLM methodology is a flexible generalization of ordinary linear regression that allows response variables to have error distributions other than normal. It relates the linear predictor to the mean of the response variable via a link function, permits the variance of each measurement to be a function of its predicted value, and unifies other statistical models, including linear, logistic, and Poisson regression. When a binomial error distribution is assumed for the GLM, the probability P that cessation has occurred is calculated using the canonical logit link function (Agresti 2013):

$$P = \frac{1}{1 + \exp\left[-\left(c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + \cdots\right)\right]}, \qquad (1)$$
where x1, x2, x3, … represent statistically significant input parameters to the GLM such as maximum reflectivity (dBZ) at certain levels; c1, c2, c3, … are coefficients calculated by the GLM to be multiplied by their associated input parameter xi; and c0 is the intercept for the linear model. MATLAB software (The MathWorks Inc. 2016) was used to create the GLM cessation model. The GLM function within MATLAB outputs statistical testing information about each potential input variable, including its t statistic and the probability associated with that t statistic.
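For a binomial GLM with the canonical logit link, the linear predictor c0 + c1x1 + c2x2 + … is mapped to a probability with the logistic function 1/(1 + exp[−(c0 + c1x1 + …)]). The sketch below evaluates that mapping in a numerically stable way; the coefficients in the usage lines are invented for illustration and are not those in Tables 3 and 4.

```python
import math


def cessation_probability(x, coeffs, intercept):
    """Probability that cessation has occurred from a logit-link binomial GLM.

    x: predictor values (x1, x2, ...); coeffs: (c1, c2, ...); intercept: c0.
    """
    eta = intercept + sum(c * xi for c, xi in zip(coeffs, x))
    # Evaluate the logistic function without overflowing math.exp for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Hypothetical coefficients: composite reflectivity (dBZ) and a graupel flag at -10 C.
c0, coeffs = 8.0, (-0.2, -2.0)
print(round(cessation_probability((30.0, 0), coeffs, c0), 3))  # weaker storm, no graupel
print(round(cessation_probability((50.0, 1), coeffs, c0), 3))  # stronger storm with graupel
```

With negative coefficients, higher reflectivity and graupel presence both lower the probability that cessation has occurred, which is the behavior the hypothesis in section 2 anticipates.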
Two approaches were used for training the GLM, with each producing its own version of the final cessation model. The first approach divided the 184 storms into a training set (storms during 2013 and 2014, ~65% of the total) and a testing set (storms during 2015, ~35% of the total). Thus, the training data consisted of entries at 1-min intervals, each corresponding either to ongoing lightning or no further lightning (cessation had already occurred). It is important to note that the entries were not used in the form of a time series. This first approach produced a model that will be denoted the “independent version.”
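The labeling and year-based split can be sketched as below. The `Observation` record and helper names are hypothetical, and treating the minute that contains the last flash as "lightning ongoing" (label 0) is an assumption about how that boundary minute is handled.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    storm_id: str
    year: int
    minute: int                                 # minutes since the storm's first flash
    predictors: dict = field(default_factory=dict)


def label(obs, last_flash_minute):
    """1 = cessation has already occurred, 0 = lightning ongoing (assumed boundary rule)."""
    return int(obs.minute > last_flash_minute[obs.storm_id])


def split_by_year(observations, last_flash_minute, train_years=(2013, 2014)):
    """Independent version: train on 2013-14 storms, test on the remaining (2015) storms."""
    train, test = [], []
    for o in observations:
        target = train if o.year in train_years else test
        # Each entry is a standalone (predictors, label) pair, not part of a time series.
        target.append((o.predictors, label(o, last_flash_minute)))
    return train, test


last_flash = {"s1": 2, "s2": 1}
obs = ([Observation("s1", 2013, m) for m in range(5)]
       + [Observation("s2", 2015, m) for m in range(3)])
train, test = split_by_year(obs, last_flash)
print([y for _, y in train], [y for _, y in test])
```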
The second training approach (depicted in Fig. 5) took a 40% random sample of 1-min observations from the complete dataset of storms both before and after lightning cessation and regardless of year. Once again, these training data were not used as a time series. The procedure was repeated 1000 times, following a widely used statistical procedure called bootstrapping whose goal is to treat a finite data sample as similarly as possible to the unknown population from which it is drawn. The procedure is described in detail by Wilks (2006, p. 167), Efron and Tibshirani (1993), and the references therein. Use of this procedure yields the predictive model that will be denoted the “bootstrapped version.”
Both approaches to model training used forward stepwise selection with backward elimination of the original 11 predictors (Table 2) to pick those that provided the most statistical significance. Forward selection begins with no selected variables, then tests the addition of each variable using a chosen model fit criterion (p < 0.01), adding that variable if its inclusion gives a significant improvement of the fit. At each stage in the process, after a new variable is added, a test is made to check if any previously selected variables can be deleted (backward elimination) without appreciably increasing the residual sum of squares. This process is repeated until no additional parameters improve the model to a statistically significant extent.
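The selection logic can be sketched independently of the model fitting. In the sketch below, which is an illustration rather than MATLAB's implementation, `deviance` is a hypothetical callable that fits a GLM with a given set of predictors and returns its deviance (the GLM analog of the residual sum of squares), and each entry or removal is judged by a likelihood-ratio test with 1 degree of freedom. The entry threshold of 0.01 follows the text, whereas the removal threshold of 0.05 and the step limit are assumptions; a removal threshold larger than the entry threshold helps keep a variable from cycling in and out.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def stepwise_select(candidates, deviance, p_enter=0.01, p_remove=0.05, max_steps=100):
    """Forward selection with backward elimination using deviance (likelihood-ratio) tests."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: try each unused predictor and keep the most significant one.
        remaining = [v for v in candidates if v not in selected]
        if remaining:
            base = deviance(tuple(selected))
            pvals = {v: chi2_1df_pvalue(base - deviance(tuple(selected + [v])))
                     for v in remaining}
            best = min(pvals, key=pvals.get)
            if pvals[best] < p_enter:
                selected.append(best)
                changed = True
        # Backward step: drop the least significant selected predictor if it no longer helps.
        if selected:
            full = deviance(tuple(selected))
            pvals = {v: chi2_1df_pvalue(deviance(tuple(s for s in selected if s != v)) - full)
                     for v in selected}
            worst = max(pvals, key=pvals.get)
            if pvals[worst] > p_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected


# Hypothetical additive deviance: each predictor lowers a null deviance of 100 by a fixed amount.
drop = {"zmax_composite": 20.0, "graupel_-10C": 10.0, "zmax_-20C": 0.5}
print(stepwise_select(list(drop), lambda subset: 100.0 - sum(drop[v] for v in subset)))
```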
The independent version of the GLM selected the seven predictors listed in Table 3, while the bootstrapped version of the GLM chose the six predictors in Table 4. The only difference between the two approaches is that graupel presence at 0°C was selected only by the independent version. These radar-derived predictors correspond to x1, x2, x3, … in (1).
Maximum composite reflectivity and maximum reflectivity at specific isothermal levels often are used subjectively in operations to infer whether cessation likely has occurred. However, the regression described above also selects graupel presence or absence in the mixed-phase region as providing statistically significant information (at the p < 0.01 level) about cessation.
The physical reasoning behind the selection of the predictors can only be hypothesized. Perhaps maximum reflectivities at the lower levels, instead of higher altitudes, provide the best estimate of thunderstorm intensity and its propensity to form new graupel. In addition, graupel presence in the mid- and upper levels may best indicate charge separation in the ongoing thunderstorm. We found that graupel at 0°C tended to persist long after cessation had occurred (not shown), possibly explaining its failure to be selected for the bootstrapped version of the GLM. However, it was selected for the independent version. Upper-level graupel possibly was selected for both versions because its presence indicated that electrification was still ongoing and therefore was more indicative of a potential future lightning flash.
Coefficients for each selected input parameter are in the rightmost columns of Tables 3 and 4. Values for the independent approach were based on storms during 2013 and 2014. For the bootstrapping method, the coefficient of each predictor is the median value after training the GLM 1000 times, each time based on a different random sample of observations from all 184 storms.
a. Evaluation of the cessation models
We now evaluate results from the two versions of the lightning cessation guidance model. The independent version (Table 3) was tested in two ways on the 35% of storms not used in training. In the first evaluation, each storm was considered separately during its life cycle. In the second evaluation, observations were not related to specific storms but were treated as one large group of observations. Although this second approach does not represent how a model would be applied operationally, the results are informative. We performed both evaluations at each 1-min radar observation; however, if a model were being used operationally, it could be applied at any time or interval during a storm’s existence.
The bootstrapped version of the model (Table 4) had to be evaluated differently than its independent counterpart since median parameter coefficients for the bootstrapped version were derived from 1000 random selections of the 1-min data. Thus, there was no separate independent dataset. Therefore, the testing data were a random sample of 60% of all 184 storms (110 storms). Data comprising these storms were utilized in the same two ways described for the independent model (i.e., testing each storm at 1-min observations, or as one large dataset that was not related to any particular storm). In both versions, the probability that cessation had occurred was calculated at each 1-min observation.
We needed to establish a probability threshold that, when exceeded, would imply that lightning cessation had occurred. In other words, we wanted to shift from a probabilistic to a deterministic result for the purpose of evaluation. We chose the probability thresholds of 95.0%, 97.5%, and 99.0%. These high probabilities were selected because ensuring safety is the primary concern of the 45WS and other potential users. The results of each probability threshold for the two GLMs were used to prepare 2 × 2 contingency tables (e.g., Table 5). Every minute/observation being tested was categorized as either A, B, C, or D and then used to calculate the statistical metrics defined in (2)–(8) and enumerated in Tables 6–7. The statistical metrics employed (Wilks 2006) were the following:
Since we seek to forecast the end of an event instead of its onset (the usual goal of forecasting), it is important to understand the meanings of the metrics in (2)–(7). In the lightning cessation paradigm, a false alarm at an observation time (B in Table 5) means that the statistical model determines that cessation has occurred, whereas observations indicate that lightning is still occurring. This is the most dangerous outcome from a safety perspective. Conversely, hits (A) indicate an observation/minute when the predictive model correctly forecasts that lightning cessation has already occurred. Correct null events (D) occur when the model correctly indicates that lightning is still ongoing. Finally, misses mean that the predictive model is being too cautious, indicating that cessation has not yet occurred although observations reveal that it has (C). Misses are not dangerous, although they do not provide a time savings.
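Converting probabilities into the A/B/C/D categories of Table 5 and computing several of the scores named in this section (POD, FAR, CSI, TSS, and HSS) can be sketched as follows, using the standard 2 × 2 definitions in Wilks (2006). Treating a probability equal to the threshold as not exceeding it, and returning NaN for an undefined ratio, are choices made here; the example probabilities are invented.

```python
import math


def contingency(probs, ceased, threshold):
    """Count A (hits), B (false alarms), C (misses), and D (correct nulls).

    probs: GLM cessation probabilities; ceased: 1 if cessation had already occurred.
    """
    a = b = c = d = 0
    for p, obs in zip(probs, ceased):
        forecast_ceased = p > threshold
        if forecast_ceased and obs:
            a += 1
        elif forecast_ceased:
            b += 1   # cessation declared while lightning is still occurring (dangerous)
        elif obs:
            c += 1   # overly cautious: cessation already happened
        else:
            d += 1
    return a, b, c, d


def _ratio(num, den):
    return num / den if den else math.nan


def scores(a, b, c, d):
    return {
        "POD": _ratio(a, a + c),
        "FAR": _ratio(b, a + b),
        "CSI": _ratio(a, a + b + c),
        "TSS": _ratio(a, a + c) - _ratio(b, b + d),
        "HSS": _ratio(2 * (a * d - b * c), (a + c) * (c + d) + (a + b) * (b + d)),
    }


probs = [0.10, 0.50, 0.96, 0.20, 0.97, 0.99, 0.40, 0.98]
ceased = [0, 0, 0, 0, 1, 1, 1, 1]
counts = contingency(probs, ceased, threshold=0.95)
print(counts, scores(*counts))
```

Raising the threshold from 0.95 to 0.975 in this example converts the single false alarm into a correct null, illustrating the safety tradeoff among the three thresholds.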
It is encouraging that FARs for both models using the evaluation method where each 1-min observation is considered independently (Table 6) are less than 0.5% or even zero for each of the three probability thresholds. Thus, the “dreaded” forecasting of cessation too early is rare, at least for this dataset. The PODs in Table 6 range from ~0.54 to ~0.70. Although the PODs vary slightly between the probability thresholds, there is little difference in values between the independent and bootstrapped methods at a given threshold. The PODs are less than the ideal value of 1 because both versions of the GLM sometimes do not indicate cessation until some time after it has occurred, so the intervening post-cessation minutes are counted as misses. Finally, although the differences are small, the skill scores HSS, TSS, and CSI suggest that the 95.0% threshold performs best. The bootstrapped model performs slightly better than the independent GLM in most of the forecast metrics, with both favoring the 95.0% probability threshold. However, many more cases must be tested to confirm these tentative and rather subtle differences.
Reliability diagrams (e.g., Fig. 6; Bröcker and Smith 2007) indicate whether a predictive model tends to over- or underforecast an event over a range of probabilities. All points of a perfectly reliable model would fall along the 1:1 diagonal line on the diagram; points above the diagonal indicate that the model is underforecasting the probability of cessation (forecasting cessation too late), while points below the diagonal indicate that it is overforecasting the probability of cessation (forecasting cessation too early). The results in Fig. 6 indicate that both cessation models over- or underpredict cessation by only small amounts at the various probabilities. However, the bootstrapped version is somewhat more reliable than its independent counterpart, probably due to the former’s method of training.
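The points on a reliability diagram come from binning the forecast probabilities and comparing each bin's mean forecast with the observed frequency of the event. The sketch below assumes 10 equal-width bins and a 0/1 outcome that is 1 at minutes after the final flash; the binning actually used for Fig. 6 is not stated here, and `reliability_curve` is a hypothetical helper.

```python
import numpy as np


def reliability_curve(probs, ceased, n_bins=10):
    """Return (mean forecast probability, observed frequency, count) for each non-empty bin.

    probs  : forecast probabilities of cessation, one per 1-min observation
    ceased : 1 if cessation had actually occurred at that minute, else 0
    """
    probs = np.asarray(probs, dtype=float)
    ceased = np.asarray(ceased, dtype=float)
    # Equal-width bins on [0, 1] (assumed); digitizing against the interior
    # edges maps p = 1.0 into the last bin instead of an overflow bin.
    interior_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_idx = np.digitize(probs, interior_edges)

    mean_fcst, obs_freq, counts = [], [], []
    for k in range(n_bins):
        in_bin = bin_idx == k
        if not in_bin.any():
            continue  # empty bins contribute no point to the diagram
        mean_fcst.append(probs[in_bin].mean())
        obs_freq.append(ceased[in_bin].mean())
        counts.append(int(in_bin.sum()))
    # Points with obs_freq > mean_fcst lie above the 1:1 line (cessation underforecast).
    return np.array(mean_fcst), np.array(obs_freq), np.array(counts)
```

Plotting `obs_freq` against `mean_fcst` with a 1:1 reference line reproduces the layout described above; the counts are useful for judging how much weight each point deserves.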
b. Evaluation of individual storms
The second method of evaluation treats each storm as an event for which the models are trying to successfully forecast lightning cessation. The evaluation of a storm begins with its first observed lightning flash and is continued at 1-min intervals until the GLM-derived probability of cessation exceeds the prescribed threshold (i.e., 95.0%, etc.). This approach mimics how a model might be used operationally. Figure 7 is a conceptual timeline of a hypothetical isolated thunderstorm of 60-min duration that shows how the evaluation is performed and how the predictive model might be utilized operationally. Certainly the radar characteristics of a storm at t + 1 min will be similar to those at t = 0 min and other nearby times, but that serial correlation was not a factor in the model development described earlier. The evaluation merely monitors a storm at 1-min intervals. As a baseline for comparison, the 45WS generally waits 15 min after the last flash before ending an advisory (Roeder et al. 2017).
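The storm-by-storm procedure amounts to stepping through a storm's minute-by-minute probabilities from its first flash and stopping at the first one that exceeds the threshold. A minimal sketch, assuming minutes are expressed relative to observed cessation (final flash at t = 0); the function name is hypothetical.

```python
from typing import Optional, Sequence


def forecast_cessation_minute(
    minutes: Sequence[int], probs: Sequence[float], threshold: float
) -> Optional[int]:
    """Return the first minute whose cessation probability exceeds `threshold`.

    `minutes` start at the storm's first flash and are relative to observed
    cessation (t = 0 at the final flash), so a negative result means cessation
    was declared too early and a positive result is the wait after the final
    flash. Returns None if the threshold is never exceeded.
    """
    for t, p in zip(minutes, probs):
        if p > threshold:  # "exceeds", as in the text; ties do not count
            return t
    return None
```

Collecting this value for every storm yields the kind of quantity plotted in Figs. 8–10, and the median of the non-None values gives a median wait time.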
Figure 7 contains a vertical red line at observed lightning cessation (t = 0 min) and a dashed black line at 15 min after cessation. Locations to the right of the red line correspond to storm observations after cessation. The black, red, and blue arrows represent the three probability thresholds that will be examined below.
Figures 8–10 display results when applying the concepts of Fig. 7 to the individual storms. The figures mimic the results of a forecaster who was using one of the GLMs as their only guidance for cessation (i.e., not relying on personal experience or any other information). In each figure, “minutes relative to cessation” increases along the y axis, with positive values indicating that cessation was forecast after it actually occurred. The x axis represents an arbitrary identification number assigned to each storm. Each symbol represents the predicted cessation time relative to the observed cessation time (solid red line in Fig. 7). Based on the present 15-min wait time used by the 45WS, the goal of the cessation model is to predict cessation between the times of the solid red and dashed black lines. A forecast in this window means that a lightning advisory is safely ended after the final flash but before the presently used wait time expires, thereby providing a time savings.
Figure 8 shows results for the independent version of the GLM when applied to the 35% of storms not used in model development/training. No storm at any of the three probability thresholds is forecast to experience cessation before it actually occurs (i.e., no symbol is plotted below the red line). However, several storm symbols appear only slightly above the red line at 95% probability, indicating that the model waited only 1 or 2 min after the last observed lightning flash to indicate an end to the lightning advisory. In regard to safety, these could be considered “close calls.” Most, but not all, storms are correctly forecast to experience cessation before the 15-min wait time presently used by the 45WS. Median wait times range from 7.5 min at the 95.0% threshold to 10.0 min at the 99.0% threshold.
Although somewhat circular, because the full dataset includes the storms used for training, it is instructive to consider how the independent GLM performs on the entire dataset of 184 storms (Fig. 9), not just the test storms in Fig. 8. The median wait times increase slightly, now ranging from 8.0 min at the 95.0% threshold to 12.0 min at 99.0%. Although each of these medians is still an improvement on the 15-min guidance, cessation is forecast more than 15 min after the final flash for numerous individual storms. Even worse, when the model is applied to this larger dataset, there are three storms for which it would have forecast cessation too soon (points below the red line in Fig. 9) at the 95.0% probability threshold. Lightning cessation is still forecast too early for two of the three storms even at the 99.0% threshold. Thus, the model is clearly imperfect, and relying solely on the GLM for guidance might end a lightning advisory too early. We strongly discourage such exclusive use of the model. The outlier storm, for which cessation was forecast ~20 min too soon, is attributable to the storm developing very rapidly and not being sampled by a complete 4–5-min volume scan before its first flash occurred. Storms that develop sufficiently strong charge separation to induce lightning before they are adequately scanned by the radar (at least one complete volume scan) appear to be a problem when applying these GLMs in areas of strong low-level sea-breeze-induced convergence, which often occurs over the Florida peninsula during the warm season.
The bootstrap method of model development employed 1000 random selections of the storms, each containing 40% of the total, with no independent data withheld on which to test the derived GLM. Therefore, Fig. 10 was created using all 184 cases and the parameters and coefficients of the bootstrapped GLM (Table 4). Median wait times range from 9.0 to 14.0 min, slightly longer than those from the independent model (Fig. 9) but still shorter than 15 min. There are two storms for which cessation was forecast too soon (symbols below the red line). A careful examination of Fig. 10 shows that increasing the confidence threshold from 95% to 99% has little effect on reducing the number of unsafe outcomes: the number changes from two storms at 95% and 97.5% to one storm at 99%. However, the difference in time savings is greater; the bootstrap method at 99% confidence produces more storm forecasts that provide no time savings (82) than the same procedure at 95.0% (35). Future research will be required to determine whether both safety and improved time savings can be achieved simultaneously.
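The storm-level resampling behind the bootstrapped GLM can be sketched as drawing 1000 random subsets of whole storms, each holding 40% of them. This is a hypothetical helper, not the authors' code: we assume sampling without replacement within each draw, and the GLM fit on each subset, as well as how the 1000 fits are combined into the Table 4 coefficients, are not shown.

```python
import random
from typing import Iterable, Iterator, List


def storm_subsets(
    storm_ids: Iterable[int], n_draws: int = 1000, fraction: float = 0.40, seed: int = 0
) -> Iterator[List[int]]:
    """Yield `n_draws` random subsets of whole storms, each holding `fraction` of them.

    Sampling storms (not individual 1-min observations) keeps every minute of a
    storm in the same training subset. Sampling without replacement within a
    draw is our assumption.
    """
    ids = list(storm_ids)
    k = round(fraction * len(ids))  # 74 of 184 storms with these defaults
    rng = random.Random(seed)       # fixed seed for reproducibility
    for _ in range(n_draws):
        yield rng.sample(ids, k)
```

Sampling at the storm level matters because consecutive 1-min observations of one storm are strongly correlated; splitting a storm's minutes across subsets would let nearly identical observations appear in every draw.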
It is important to note that even the 99.0% probability threshold of the independent and bootstrap models yields at least one dangerous result (blue plus sign symbols below the red line, Figs. 9–10). Thus, even requiring a very high probability threshold does not rule out every dangerous case. Increasing the threshold beyond 99.0% would lengthen wait times in pursuit of absolute safety, which is unattainable. This result is consistent with Roeder and Glover (2005), ST10, and PF15, all of whom determined that the probability of another lightning flash never truly drops to zero because of the complexity of the charging mechanisms for lightning. Therefore, a useful lightning cessation tool is one that maximizes safety while making the best possible attempt at reducing wait times.
Drawing on the distribution of wait times discussed above, we calculate a final set of evaluation statistics using a storm-based approach. It still counts as a false alarm any storm that the model indicates to be electrically inactive before cessation actually occurs (i.e., cases below the red line in Figs. 8–10). Any storm with a wait time between 1 and 15 min after cessation (between the solid red and dashed black lines) is designated a hit, as in Table 5. Finally, any storm whose symbol is above the dashed black line is considered a miss. This approach still considers false alarms to be the least desirable result; misses remain undesirable but do not endanger safety. However, this framework contains no “correct nulls” (D events, Table 5) since all storms eventually undergo lightning cessation.
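The storm-based categories can be expressed as a simple rule on each storm's forecast cessation time relative to its final flash. A minimal sketch with hypothetical names; two edge cases are our assumptions, not stated in the text: an offset of 0 (cessation declared at the minute of the final flash) is a false alarm, and a storm whose threshold is never exceeded counts as a miss.

```python
from collections import Counter
from typing import Iterable, Optional


def classify_storm(offset_min: Optional[int], max_wait_min: int = 15) -> str:
    """Storm-based outcome from the forecast cessation time relative to the final flash.

    Assumptions (ours): an offset of 0 or less is a false alarm because the final
    flash occurs at t = 0; a storm whose threshold is never exceeded (None) is a miss.
    """
    if offset_min is None or offset_min > max_wait_min:
        return "miss"         # safe, but no time savings versus the 15-min rule
    if offset_min < 1:
        return "false_alarm"  # advisory ended while lightning was still occurring
    return "hit"              # advisory ended 1-15 min after the final flash


def tally_storms(offsets: Iterable[Optional[int]], max_wait_min: int = 15) -> Counter:
    """Count hits, false alarms, and misses over a set of storms."""
    return Counter(classify_storm(o, max_wait_min) for o in offsets)
```

Because every storm eventually stops producing lightning, the tally has only three categories; there is no correct-null count.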
The bias metric in (8) does not require information about null events. It compares the number of forecast cessation events with the number of observed cessation events,

$$\mathrm{Bias} = \frac{A + B}{A + C}, \tag{8}$$

that is, whether a forecast tends to overpredict (and produce excessive false alarms) or underpredict (and produce excessive misses). A bias of one is ideal; a perfect forecast, with all of the symbols in Figs. 8–10 falling between the red line and the black dashed line, yields a bias of one. Table 7 shows that both models at all three thresholds exhibit bias values < 1.0 (underprediction of cessation), with the bias of the independent model being slightly greater (closer to one, i.e., better) than that of its bootstrapped counterpart. This underprediction is a favorable outcome: both models are biased toward safety and do not put life and property at risk by being overzealous. Finally, compared to the original methodology (Table 6), the present approach (Table 7) produces greater (better) values of POD and CSI, but slightly greater (worse) values of FAR.
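Scores that need no correct nulls can be computed from the storm-based hits (A), false alarms (B), and misses (C) alone. The sketch below assumes the standard definitions, including frequency bias as (A + B)/(A + C); the helper name is hypothetical.

```python
def storm_scores(hits: int, false_alarms: int, misses: int) -> dict:
    """Storm-based scores that do not use correct nulls (D); standard definitions assumed."""
    a, b, c = hits, false_alarms, misses
    return {
        "bias": (a + b) / (a + c),  # < 1: cessation underforecast (fewer false alarms than misses)
        "pod": a / (a + c),
        "far": b / (a + b),         # false alarm ratio
        "csi": a / (a + b + c),
    }
```

With hypothetical counts of 8 hits, 1 false alarm, and 3 misses, the bias is 9/11 (below one, i.e., biased toward safety) and the CSI is 2/3.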
c. Example case
Either probability equation can be applied in real time during any part of the life cycle of an isolated storm. We consider the example storm in Fig. 3, whose characteristics at 0100 UTC 7 August 2013 are given in Table 8. This time corresponds to the storm’s final flash (i.e., the time of cessation). Values of the two reflectivities indicate that the storm is still rather “potent.” Graupel is present (value = 1) at the two warmer isotherm levels but not at the two colder isotherm levels (value = 0). Using the final bootstrapped version of the GLM outlined in Table 4 yields a calculated probability of cessation of 13.3%. This small probability is appropriate since the final flash actually occurred at this time. However, as the storm continues to decay, the reflectivities in Table 8 will decrease and the graupel will dissipate, causing the probability of cessation to increase and thereby providing increasing confidence that no additional flashes will occur.
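If the GLM is a binomial model with a logit link (our assumption; the fitted coefficients are listed in Table 4 and are not reproduced here), the probability for a given minute is the inverse logit of the linear predictor built from the Table 8 values. A minimal sketch; the function name and predictor keys such as `"zh_composite"` or `"graupel_m10c"` are hypothetical.

```python
import math
from typing import Mapping


def cessation_probability(
    intercept: float, coefs: Mapping[str, float], predictors: Mapping[str, float]
) -> float:
    """P(cessation has occurred) from a binomial GLM with a logit link (assumed).

    `coefs` and `predictors` share keys (hypothetical names, e.g., "zh_composite",
    "zh_0c", "graupel_m10c"); reflectivities are in dBZ and graupel flags are 0 or 1.
    """
    # Linear predictor: intercept plus the weighted sum of the predictors.
    eta = intercept + sum(b * predictors[name] for name, b in coefs.items())
    # Inverse logit maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))
```

Re-evaluating the function each minute with updated radar values, and comparing the result with the chosen threshold (95.0%, 97.5%, or 99.0%), is how the guidance would be monitored as the storm decays.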
4. Summary and conclusions
When lightning is occurring, it is desirable to know whether a specific flash is the storm’s last flash (i.e., is it safe to resume activities that were postponed?). This research has developed a high-skill probabilistic method that can be used in high-pressure real-world operations to terminate lightning warnings more quickly while maintaining safety. The research developed and tested two statistical guidance models for calculating the probability of lightning cessation in warm season isolated thunderstorms near Kennedy Space Center. Dual-polarized radar data at 1-min intervals from 184 isolated thunderstorms, both before and after lightning cessation, were used to train the models to assess the probability that cessation had already occurred. Statistical significance testing showed that maximum reflectivity at the 0°C level and composite reflectivity, as well as graupel presence at the −5°, −10°, −15°, and −20°C levels, were most useful in predicting cessation (Table 3).
Testing showed that, for ~99% of the storms, the models indicated cessation only after the last observed lightning flash (i.e., safe outcomes). The models usually, but not always, safely shortened the 15-min wait time currently employed by the 45WS. For the independent GLM applied to the withheld test storms, median wait times ranged from 7.5 min at the 95.0% threshold to 10.0 min at the 99.0% threshold.
Although present results are encouraging, we emphasize that they are very preliminary and must be tested on a much larger dataset. Future research in central Florida should employ a larger warm season dataset and also consider cool season storms. Also, the guidance technique developed here may not be effective in other geographic regions with different synoptic and mesoscale environments (e.g., non-sea-breeze environments). Most important, the great majority of thunderstorms in Florida and elsewhere are not isolated but consist of multiple cells or lines that interact in various complex ways. We suggest that these more commonly occurring storms be a major focus of future research. DF17 developed a statistical cessation tool that utilized radar data for one specific type of nonisolated storm; however, results were mixed at best. We recommend a renewed effort toward forecasting cessation in nonisolated storms using a probabilistic approach.
The statistical models described here were evaluated against the 15-min wait time presently used by the 45WS. An open question is whether an operational forecaster subjectively using fewer dual-polarized products would achieve cessation results that are better or worse than those of the regression-derived approach used here. Stated differently, is model use worth the trouble? That question should be addressed in future research.
Numerous issues complicate the prediction of lightning cessation in central Florida and elsewhere. For example, remnant charge in orphan anvils or other storm debris can influence the propensity for future lightning (e.g., Lund et al. 2009), something not considered in our predictive model. In addition, a lightning flash can propagate horizontally away from the visible cloud edge before reaching the surface, known as a “bolt from the blue” (Dowdy and Mills 2012; Fig. 2 of Fuelberg et al. 2014), and lightning can emanate from storms whose precipitation does not reach the surface (“dry lightning”; Rorig et al. 2007).
The exact location and timing of any future lightning flash cannot yet be determined. Thus, absolute safety is not possible with the available data and our present understanding of electrical charging and lightning. Working toward this goal will require much additional research.
The research was sponsored by NASA Contract NNX13AB95G through the Kennedy Space Center. We appreciate the expertise provided by William Roeder and the other scientists at the 45th Weather Squadron and at NASA KSC. The support personnel for the WDSS-II software answered numerous questions about the use of this valuable software. The NLDN lightning data were provided by Ron Holle of Vaisala, Inc. The LDAR-II data were obtained from the 45th Weather Squadron (http://kscwxarchive.ksc.nasa.gov/Reports/Lightning). Level II WSR-88D radar data were available on the website of the National Centers for Environmental Information (https://www.ncdc.noaa.gov/data-access/radar-data). The RAP reanalyses were obtained from the Atmospheric Radiation Measurement (ARM) research archive at http://www.archive.arm.gov/discovery/. We appreciate the valuable comments of the editor and reviewers, which greatly improved the manuscript.
Current affiliation: NOAA/National Weather Service Forecast Office, Goodland, Kansas.