Total lightning probability forecasts for 26 mostly springtime days and 27 summertime days were analyzed for their usefulness and economic value. The mostly springtime forecast days had a relatively high number of severe weather reports compared with the summertime forecast days. The lightning forecasts were made with a dynamic lightning forecast scheme (DLS), and each forecast dataset used lightning assimilation to hasten convective initiation and, in most cases, to improve short-term forecasts. A spatial smoothing parameter σ of 48 km yielded more skillful, reliable, and economically valuable hourly forecasts than other values of σ. Mostly springtime forecasts were more skillful and had more hours of useful skill than summertime forecasts, but the latter still demonstrated useful skill during the first two forecast hours. The DLS forecasts were compared to those obtained with the “McCaul” diagnostic scheme, which diagnoses lightning flash rates from simulated storm fields. The DLS had significantly higher fractions skill scores than the McCaul scheme for forecasts of at least one event/flash (10 min)−1. Bias values of the forecast lightning fields with both schemes were overall small. However, DLS forecasts initialized in the early summer evening with RAP data did have a positive bias, which was attributed to initial conditions within the RAP. Correlating fractions skill scores for lightning and precipitation indicated that more accurate forecasts of lightning were associated with more accurate precipitation forecasts for convection with a high, but not a low, number of severe weather reports.
Lightning is a hazard to life and property (American Meteorological Society 1924a,b; López and Holle 1996; López et al. 1995; Rorig and Ferguson 2002; Holle et al. 1996; Hodanish et al. 2004; Holle et al. 2005; Ashley and Gilson 2009; Wallmann et al. 2010; Schultz et al. 2009; Koshak et al. 2015), as deadly as tornadoes and hurricanes (in some years), but less so than flash floods (Curran et al. 2000; Holle et al. 2005; Ashley and Gilson 2009). Lightning can strike airplanes flying into areas of convective activity (Mazur et al. 1986; Mäkelä et al. 2013), and cause pilots to temporarily lose sight and/or hearing. While airplanes are built to withstand lightning strikes (Sweers et al. 2012), they can suffer structural damage from them (e.g., Rash 2010).
Lightning discharges are generally referred to as either cloud to ground (CG, with positive or negative polarity) or intracloud (IC) (MacGorman and Nielsen 1991; Branick and Doswell 1992; Shafer et al. 2000). High total lightning (CG + IC) rates are associated with severe storms (e.g., Goodman and MacGorman 1986; Reap and MacGorman 1989; Kane 1991; Perez et al. 1997; Steiger et al. 2007; Lynn et al. 2012; Rudlosky and Fuelberg 2013). Total lightning has also been used in lightning jump algorithms to monitor or predict the onset of storm severity (Schultz et al. 2009; Gatlin and Goodman 2010; Chronis et al. 2015) because storms with lightning jumps are more likely to produce longer-lasting storms (Chronis et al. 2015) with severe characteristics (Schultz et al. 2009).
Data from current lightning observation networks are technically advanced and are used to make nowcasts of lightning strikes (Stern et al. 1994; MacGorman and Rust 1998; Short et al. 2004; Price 2008; Saxen et al. 2008; Kohn et al. 2011; Sieglaff et al. 2011). Radar reflectivity has also been used as a predictor variable for short-term changes in lightning magnitude (see also Hondl and Eilts 1994; Gremillion and Orville 1999; Yang and King 2010; Mosier et al. 2011; Seroka et al. 2012). While typical nowcasts are between ½ and 1 h in length (Roberts and Rutledge 2003; Wilson et al. 2004; Sieglaff et al. 2011; Mecikalski et al. 2015; Rossi et al. 2015), average useful nowcast lead times are 24–33 min (Walker et al. 2012).
By assimilating lightning into weather prediction models, nowcasts can be extended to forecasts (>~30 min). Fierro et al. (2012), for instance, suggested one very useful approach, and Fierro et al. (2015) describe its extended and successful use in predicting precipitation amounts during warm-season forecast experiments in the United States. Lynn et al. (2015) used this approach to show that lightning assimilation leads to improved forecasts of CG.
While other lightning forecast approaches exist (e.g., Dahl et al. 2011; Yair et al. 2010; Fierro et al. 2013), lightning forecasts made using the Lynn et al. (2012) dynamic lightning scheme (DLS) will be compared with the McCaul et al. (2009) (statistical/diagnostic) scheme, since the latter is in operational use in the HRRR (Benjamin et al. 2016). Both of these schemes are designed to provide forecasts of lightning within convection-allowing models (CAMs; e.g., with grid spacing of 4 km, as used here).
The accuracy of forecasts of convective activity and its associated lightning threat or forecast occurrence in CAMs is limited by error growth on the small scales (e.g., Roberts and Lean 2008). For this reason, neighborhood approaches (e.g., Clark et al. 2010) are used to calculate statistical scores. Further, recognizing the limitation in raw or event-smoothed forecast data for forecasting purposes, others have detailed an approach to predict the probability of the occurrence of severe weather reports in CAMs (Ebert 2009; Brooks et al. 1998; Sobash et al. 2011). Sobash et al. (2011) wrote that “while CAMs lack the required resolution to simulate many severe phenomena associated with convection (e.g., large hail, downburst winds, and tornadoes), they can still provide unique guidance for the occurrence of these phenomena if ‘extreme’ patterns of behavior in simulated storms are strongly correlated with observed severe phenomena.” The authors created surrogate severe storm reports using a threshold approach: when updraft helicity exceeded certain values, they assumed that this would represent a severe weather report. Using a Gaussian smoother, surrogate severe probabilistic forecasts (SSPFs) were made from each field’s simulated surrogate severe reports. Relative operating characteristic curves (ROC) and reliability diagrams were used to show that their probabilistic forecasts of severe weather could be useful to forecasters. This approach can be adapted to evaluating total lightning forecasts as well.
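The SSPF construction described above can be sketched compactly: a binary field of surrogate events is convolved with a normalized Gaussian kernel to yield a smooth, probability-like field. The pure-Python sketch below is illustrative only; the grid spacing, σ value, and function names are our choices, not those of Sobash et al. (2011).

```python
import math

def gaussian_kernel(sigma_km, dx_km=4.0, cutoff=3.0):
    """Build a normalized 2D Gaussian kernel on a dx_km grid,
    truncated at `cutoff` standard deviations."""
    r = int(cutoff * sigma_km / dx_km)
    w = [[math.exp(-((i * dx_km) ** 2 + (j * dx_km) ** 2)
                   / (2.0 * sigma_km ** 2))
          for j in range(-r, r + 1)] for i in range(-r, r + 1)]
    s = sum(sum(row) for row in w)
    return [[v / s for v in row] for row in w]

def surrogate_probability(binary_field, sigma_km, dx_km=4.0):
    """Smooth a binary surrogate-event field into a probability-like
    field, in the spirit of Sobash et al. (2011): each event spreads
    a unit of probability mass according to the Gaussian kernel."""
    k = gaussian_kernel(sigma_km, dx_km)
    r = len(k) // 2
    ny, nx = len(binary_field), len(binary_field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            if binary_field[y][x]:
                for dy in range(-r, r + 1):
                    for dx_ in range(-r, r + 1):
                        yy, xx = y + dy, x + dx_
                        if 0 <= yy < ny and 0 <= xx < nx:
                            out[yy][xx] += k[dy + r][dx_ + r]
    return out
```

A single interior event produces a field that peaks at the event location and integrates to one; overlapping events accumulate, which is what allows the smoothed field to be read as a probability surface after calibration.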
Lazo et al. (2011) have described the U.S. economic sensitivity to weather, part of which is a weather sensitivity to lightning. Industries such as munitions, aeronautics, wind-turbine power production (Kithil 2011), and recreation, as well as tall towers (Warner et al. 2013), must be protected from lightning strikes. As such, there is a particular interest in being able to predict a binary field of lightning (i.e., yes/no) (C. Sloop 2015, personal communication; Ashley and Gilson 2009; Mäkelä et al. 2013).
Hence, this work aims to assess how well forecasts are able to predict the probability of at least one lightning event (either cloud to ground or IC) during mostly springtime (with many severe storm reports) and summertime (with fewer reports) convective storms over a region of the United States characterized by good lightning data coverage. This paper also calculates the savings over climatology (SOC) from forecasts versus a range of cost/loss ratios (Thompson and Brier 1955). Additionally, it examines the correlation between the skill of lightning and precipitation forecasts. As high probabilities of at least one lightning event are often associated with higher threshold values of total lightning, some attention will be paid to these matters as well. Section 2 presents the methods, and section 3 presents the results. The results section also addresses the question of whether accurate lightning forecasts imply accurate precipitation forecasts. The conclusions are provided in section 4. The appendix provides additional support in favor of extending the assimilation of lightning to just prior to the forecast. It also examines the sensitivity of precipitation bias to the choice of microphysical scheme.
a. Brief description of the WRF Model
A set of forecasts was made using the Weather Research and Forecasting Model (Skamarock et al. 2008). Version 3.4 was used for the set of forecasts with many severe storm reports. The forecasts made during the summer of 2015, with fewer reports (see below), were done later with version 3.6. The setup was similar to that used in the NCEP WRF-ARW (4-km grid spacing) and the NCEP HRRR (3-km grid spacing). The WRF single-moment 6-class scheme (referred to as WSM6; Hong and Lim 2006) was used to simulate cloud microphysical processes on a 4-km grid. The time step used in all simulations was 20 s. The WSM6 scheme predicts the mass mixing ratio of water vapor, cloud water and rainwater, ice, snow, and graupel. The shortwave radiation physics used the scheme of Dudhia (1989), while the longwave radiation used the parameterization of Mlawer et al. (1997). The Noah land surface model (LSM; Chen and Dudhia 2001a,b) simulated surface fluxes (see also Smirnova et al. 1997, 2000), and the boundary layer was parameterized using the scheme of Mellor–Yamada–Janjić (MYJ; Mellor and Yamada 1982; Janjić 2002).
b. Lightning data and lightning forecasts
As described in Lynn et al. (2012), Earth Networks provided the lightning pulse data, a comprehensive dataset for CG and IC lightning. The Earth Networks Total Lightning Network (ENTLN) uses adaptive digital filtering in an effort to reduce local noise, allowing the individual sensors to adapt to their local environment. The ENTLN sensors detect radio emissions (sferics) from lightning over a wide frequency range (1–12 MHz) using two analog-to-digital converters. The lowest frequencies are utilized for long-range detection, the middle frequencies are used for locating CG strokes, while the highest frequencies are useful in locating fast cloud pulses (such as IC flashes). Earth Networks claims that IC detection is greater than 60% over much of the eastern United States and as much as 80% in some areas of the country. The CG detection rate is claimed to be >95% in the denser network over the entire eastern United States, but less elsewhere in the country.
The pulse lightning data from the ENTLN were mapped to the 4-km WRF native grid in 10-min increments for assimilation purposes (following Fierro et al. 2012). The assimilation scheme of Fierro et al. (2012) uses observed, gridded total lightning to determine how much water vapor is to be added at constant temperature in a confined layer within the atmospheric column. The incremental increase in water vapor locally increases the moist static instability, which then leads to a convective response that has been shown to improve convective forecasts of severe storms, precipitation (Fierro et al. 2012, 2014, 2015), and lightning (Lynn et al. 2015).
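The assimilation step can be illustrated schematically. In the spirit of Fierro et al. (2012), water vapor is nudged upward toward a fraction of saturation wherever gridded lightning is observed, with the increment damped where graupel already exists so that mature cells are not over-moistened. The functional form and all coefficient values below are illustrative assumptions of ours, not the published constants.

```python
import math

# Illustrative coefficients only (assumptions, not the tuned values
# of Fierro et al. 2012).
A, B, C, D, ALPHA = 0.81, 0.2, 0.01, 0.25, 2.2

def nudged_qv(qv, qsat, flash_rate_x, qg):
    """Schematic lightning-based water vapor nudging: where gridded
    lightning is observed (flash_rate_x > 0), raise qv toward a
    fraction of saturation qsat; the graupel mixing ratio qg (g/kg)
    suppresses the increment. Only positive increments are applied,
    i.e., qv is never reduced."""
    if flash_rate_x <= 0:
        return qv
    target = qsat * (A + B * math.tanh(C * flash_rate_x)
                     * (1.0 - math.tanh(D * qg ** ALPHA)))
    return max(qv, target)
```

The saturating tanh terms capture the two qualitative behaviors described in the text: more observed lightning pushes the column closer to saturation, while existing graupel (a sign that convection is already established) shrinks the added increment.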
The DLS of Lynn et al. (2012) was used to forecast total lightning. The forecast lightning results from microphysical and dynamical processes occurring on the native forecast grid. As described in detail in Lynn et al. (2012), the development of the lightning parameterization scheme began with the utilization of the lightning potential index (LPI; Yair et al. 2010). The LPI is a measure of the ability of the atmosphere to charge within clouds when strong up-/downdrafts occur in the presence of ice and liquid water. In Yair et al. (2010), the LPI is calculated within the charge separation region of convective clouds within the atmospheric layer, defined by the 0° and −20°C isotherms, where the noninductive mechanism involving collisions of ice and graupel particles in the presence of supercooled water is most effective (Takahashi 1978; Saunders 2008; Mansell et al. 2010). The LPI has its largest values in the presence of strong vertical velocities (see Van den Broeke et al. 2005), when graupel exists in equal ratios relative to snow, ice, and water. An LPI equal to one indicates that the relative ratios of water, ice, snow, and graupel are such that the potential for charging is maximized in strong updrafts [in accordance with the description of Saunders (2008)].
Lynn et al. (2012) proposed that the LPI multiplied by the mass of ice and then divided by a unit coulomb of charge C represents the electrical potential of the cloud fields. The units of this electrical potential are volts. Lynn et al. (2012) calculated the LPI over different layers for negative cloud-to-ground, positive cloud-to-ground, and intracloud lightning. Negative cloud-to-ground lightning is assumed to originate from the lower portion of the cloud, while positive cloud-to-ground lightning is assumed to originate from the upper portion of the cloud, although more complex conceptual models of the charge structure within storms have been proposed (e.g., Stolzenburg et al. 1994). Intracloud lightning is assumed to initiate anywhere within the cloud volume. This assumption is based on the classical tripolar thunderstorm charge model of Williams (1989).
As the charge density builds within the clouds, so does the potential, whose source term was referred to in Lynn et al. (2012) as the power index. The equation for the power index includes a constant whose units are coulombs, such that the appropriate amount of energy builds up over several model time steps and then is discharged during the time interval t of a lightning event propagating from the height of discharge to the ground as a vertical leader. The time scale over which this occurs depends on the layer from which the energy begins to be discharged: it is largest for +CG and smallest for IC. A forecast lightning event can be, as noted above (and consistent with observed lightning), either a positive or negative CG stroke or an IC lightning discharge that occurs in milliseconds. The forecast total lightning is the sum of all CGs and ICs.
Moreover, through the formulation of a four-dimensional (x, y, z, t) derivative equation, the scheme implicitly accounts for energy buildup in convective and stratiform clouds as well as its possible advection from convective to stratiform clouds. For this reason, the scheme was referred to as dynamic.
The dynamic approach differs from diagnostic approaches in which the lightning that is forecast at one time step does not depend on the buildup of energy and its possible dissipation by cloud discharge processes at previous time steps. For instance, the McCaul et al. (2011) scheme is a statistical approach. It is very similar to the forecast lightning scheme in McCaul et al. (2009), but with some modifications to reduce forecast bias and to produce flash-rate density output (in km−2). It does not, as implied above, require the inclusion of additional state variables. Hence, while it uses forecast variables to predict lightning, it is really a diagnostic output.
The McCaul total lightning scheme is a blend of two diagnostic variables: the first diagnostic variable is calculated from the forecast model’s graupel flux at −15°C. The second is calculated from the column-integrated ice-phase mass. McCaul et al. (2009) note that there is established observational evidence linking aspects of the precipitating ice hydrometeor fields to total flash rates. Each diagnostic output was calibrated in McCaul et al. (2009) by comparing domain-wide statistics of the peak values of simulated flash-rate [2 × 2 km2 (5 min)−1] proxy fields against domain-wide peak total lightning flash-rate density data from observations. Tests show that the first variable (i.e., that uses the graupel flux) was able to capture most of the temporal variability of the lightning threat, whereas the second does a better job of depicting the areal coverage of the threat. As noted, they blend the diagnosed flash rates (in a proportion of 0.95 to 0.05, respectively).
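The blending step itself is simple to express. The 0.95/0.05 weights are those quoted above; the function name, and the assumption that the two proxies have already been calibrated to common flash-rate density units, are ours.

```python
def mccaul_blended_threat(f1_graupel_flux_proxy, f2_ice_mass_proxy,
                          w1=0.95, w2=0.05):
    """Blend the two calibrated McCaul et al. (2009) proxy fields:
    f1 (from graupel flux at -15C, which captures temporal
    variability) and f2 (from column-integrated ice mass, which
    captures areal coverage). Both inputs are 2D lists on the same
    grid, assumed pre-calibrated to flash-rate density units."""
    return [[w1 * a + w2 * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(f1_graupel_flux_proxy, f2_ice_mass_proxy)]
```

The heavy weighting toward the graupel-flux proxy reflects the calibration result quoted in the text: it carries most of the temporal signal, while the small ice-mass contribution fills in areal coverage.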
McCaul et al. (2009) calculated flash origin density [(5 min)−1] on a 2-km WRF grid and compared these forecast densities to observed densities on a grid of the same size. Since our grid uses 4-km grid spacing, the calculated values were multiplied by a factor of 4. Minimum values of graupel flux and column-integrated ice were set in order to reduce the positive bias (in predicting small values of lightning) previously obtained in operational runs using the McCaul scheme (McCaul et al. 2011; E. W. McCaul 2016, personal communication). When these variables were less than the specified values, the lightning threat forecasts were zero.
c. Domain setup and case study days
The size of the simulation domain for the mostly springtime forecasts was 700 × 600 grid points in the east–west and north–south directions, respectively (Fig. 1). The domain was centered at 34.50°N, 86.50°W. For the summertime forecasts, the domain was centered at the same position, but the north–south extent of the domain was reduced from 600 to 575 grid points. The slightly smaller domain was used for the summertime forecasts because computational resources were somewhat limited, as described below.
Two sets of data were analyzed: one with a relatively high number of severe weather reports and one with a relatively low number. The former were mostly in springtime, while the latter were in summertime. The summertime forecasts were added because summertime convection is typically a more difficult forecast challenge than the mostly springtime convection (especially organized convective systems with a high number of severe weather reports). The former will be referred to as “high,” while the latter as “low.”
There were 26 convective cases in the high dataset (Table 1). The Storm Prediction Center’s mapped records of severe storm reports were examined and days were chosen that showed a relatively high number of reports. The second set (low) consisted of hourly forecasts produced over 27 consecutive days during the summer of 2015 (19 July–14 August). The 27 low event days were selected by finding consecutive days during the midsummer where most days had relatively few severe weather reports. During this 27-day period, the greatest number of severe weather reports (of any type) in a 24-h period was 224. By comparison, the 26 high events all had at least 194 events in a 24-h period, with a maximum of 935. Moreover, Table 2 shows that there were in total almost 4 times as many severe weather reports in the high than the low dataset. In addition, there were almost 4½ times as many hail reports, 3¼ times as many wind reports, and more than 25 times as many tornadoes.
In contrast, Fig. 2 shows the number of grid elements (averaged) versus the number of observed strokes [4 × 4 km2 (10 min)−1; hereafter only (10 min)−1 for brevity]. Both datasets (high and low) have very similar distributions of lightning, indicating that lightning intensity cannot be used to distinguish between the two sets of case study days.
North American Mesoscale Forecast System (NAM) initial and lateral boundary conditions were used to initialize the set of high forecast experiments. To provide at least 3 h of model spinup and additional latency (model run) time before the start of the forecast evaluations, simulations were started at 1200 UTC. Total lightning was used to initiate convection in the appropriate locations and was assimilated from 1200 through 1700 UTC. At least 24 of the high case study days had at least one 10 × 10 km2 convective cloud with at least one lightning event per 10 min in each of the five assimilation hours. Total lightning amounts per hour were about 40% of the maximum amount of lightning per hour (which occurred at 2300 UTC).
Since the position of smaller (lower lightning intensity) clouds is very difficult to simulate accurately, a possible advantage of continuing data assimilation until 1700 UTC was that the model maintains these small-scale convective clouds until just prior to the development of more severe afternoon convection (generally after 1800 UTC, when lightning amounts during the mostly springtime storms increased by about 25% compared with 1700 UTC). The 5-h period was used consistently throughout each forecast, and the forecast lightning in these simulations was later used in a comparison against a similar set of forecasts, but without lightning assimilation.
Forecasts with lightning assimilation will be referred to as LDA, while those without will be referred to as CNTL. Forecasts were evaluated from 1800 UTC onward, or 1 h after the end of lightning assimilation. The nomenclature “F1” refers to simulation hours 5–6, or 0–1 h after lightning assimilation ended (1700–1800 UTC). The terms F2, F3, …, F12 refer to the forecast hour 2 (1800–1900 UTC), forecast hour 3 (1900–2000 UTC), etc., up to 12 h after lightning assimilation ended (0500–0600 UTC the next day).
For hourly forecasts during the summer of 2015, hourly RAP analyses and forecasts were used for initial and lateral boundary conditions. (Note that NAM data are not available hourly and could not be used for this set of forecasts. Moreover, historical RAP data do not have soil moisture, so it could not be used retrospectively for the forecasts with a high number of severe weather reports.) Lightning data were assimilated 3.5 h prior to the start of each 3-h forecast. The forecast hour “A4/F0” is the simulation hour 3–4, and includes 30 min of lightning assimilation. Each of these forecasts ran for 7 h, and the domain setup was such that the forecasts ran in real time on a high-performance computing platform. Hence F1, F2, and F3 refer to simulation hours 4–5, 5–6, and 6–7, respectively. Note that model statistics were calculated from forecast aggregates from hourly forecasts that began at 0000 UTC and ended at 2300 UTC.
Forecasts of lightning were output every 10 min for the DLS, but every 5 min for the McCaul scheme, and then integrated over the same 10-min intervals. As suggested by E. W. McCaul (2016, personal communication), this was done to more comprehensively represent shorter time-scale fluctuations in the hourly values of forecast lightning.
All forecasts with strong convection were repeated but without lightning assimilation, while only a subset of the summertime forecasts at 1700 UTC (beginning 1 August and ending 14 August 2015) were repeated, owing to limited available computer resources. Nevertheless, it should be noted that summertime forecasts beginning (i.e., the time of A4/F0) at 1700 UTC produced the largest values of lightning compared with other hours of the day.
d. Forecast evaluation
Forecast probabilities of hourly lightning were evaluated using the methods described in Sobash et al. (2011), where ROC curves, reliability diagrams (RDs), histograms of forecast numbers of points versus probability, and fractions skill scores (FSSs) were calculated for different values of the smoothing parameter σ [the distance from the grid point to the point marking the observation dn should be squared in their Eq. (2)]. Note that, with the exception of comparing individual case study FSS values versus those for precipitation forecasts, FSS was computed from the aggregate of all forecast data.
Forecast lightning strokes were based on 10-min model output, as this is a reasonable interval to resolve the movement of individual convective cells (Fierro et al. 2012). This means that forecast hits at model grid points were calculated—like the observations—from lightning data having units of per 10 min [(10 min)−1], with thresholds of 1, 10, 25, and 50.
Reliability diagrams show the probability of observed lightning versus the forecast probability of lightning. They demonstrate how predicted probabilities correspond to their observed frequencies and, hence, are a measure of reliability. A 1:1 matching represents perfect reliability. These diagrams also show a horizontal line depicting climatology and a no-skill line that is halfway between perfect reliability and climatology (no resolution). These lines are intended to aid in the visual evaluation of forecast accuracy or reliability. Reliability diagram curves that fall below the no-skill/climatology line indicate that the forecasts are not useful.
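The quantities plotted in a reliability diagram can be computed directly from paired forecast probabilities and binary observations. The sketch below is our own illustrative code; it also returns the no-skill reference described above, which lies halfway between perfect reliability and climatology.

```python
def reliability_curve(probs, obs, n_bins=10):
    """Bin forecast probabilities and compute the observed relative
    frequency in each occupied bin (the reliability-diagram curve).
    Returns the sample climatology and a list of
    (bin center, observed frequency, no-skill value) tuples, where
    the no-skill value is halfway between perfect reliability (the
    bin center) and climatology."""
    climo = sum(obs) / len(obs)
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, obs):
        i = min(int(p * n_bins), n_bins - 1)
        bins[i].append(o)
    curve = []
    for i, b in enumerate(bins):
        if b:
            center = (i + 0.5) / n_bins
            obs_freq = sum(b) / len(b)
            no_skill = 0.5 * (center + climo)
            curve.append((center, obs_freq, no_skill))
    return climo, curve
```

Points whose observed frequency falls between the no-skill line and the 1:1 diagonal contribute positively to skill; points below the no-skill/climatology line mark forecasts that are not useful, as stated above.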
The ROC curves assess the ability of the forecasts to discriminate between events and nonevents (here, between lightning and no lightning at a certain threshold). This is, hence, a matter of forecast resolution. The parameter of relevance is the area under the ROC curve (referred to as AUC), which was calculated with the trapezoidal approximation. AUC values range between 0 and 1, with 1 representing a perfect score. Sobash et al. (2011) mention that values of about 0.7 represent the lower limit of useful skill. Forecasts with AUC values of less than 0.5 were referred to as no skill, while forecasts with AUC values above 0.8 were referred to as high skill. Forecasts with AUC values between 0.7 and 0.8 are referred to as skillful, while those above 0.5 and less than 0.7 are considered to be low skill. Histograms of the log of the number of grid elements versus model forecast probability were used to further compare the effect of smoothing on forecast probabilities. Ideally, forecasts would provide a wide range of probabilities, that is, between 0% and 100%. If the forecasts do not produce high probabilities over a number of case study days, then the smoothing parameter is most likely too large.
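The AUC calculation just described, tracing probability of detection (POD) against probability of false detection (POFD) over a set of probability thresholds and integrating with the trapezoidal rule, can be sketched as follows; the threshold spacing and helper names are our choices.

```python
def roc_auc(probs, obs, thresholds=None):
    """Compute (POFD, POD) points over a set of probability
    thresholds and the area under the ROC curve by the trapezoidal
    rule. probs are forecast probabilities; obs are 0/1 events."""
    if thresholds is None:
        thresholds = [i / 20 for i in range(21)]
    n_ev = sum(obs)
    n_non = len(obs) - n_ev
    pts = []
    for t in sorted(thresholds, reverse=True):
        hits = sum(1 for p, o in zip(probs, obs) if p >= t and o)
        fas = sum(1 for p, o in zip(probs, obs) if p >= t and not o)
        pts.append((fas / n_non, hits / n_ev))  # (POFD, POD)
    pts = [(0.0, 0.0)] + pts + [(1.0, 1.0)]
    auc = sum(0.5 * (y0 + y1) * (x1 - x0)
              for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return auc, pts

def skill_category(auc):
    """Categories used in the text: <0.5 no skill, 0.5-0.7 low
    skill, 0.7-0.8 skillful, >0.8 high skill."""
    if auc < 0.5:
        return "no skill"
    if auc < 0.7:
        return "low skill"
    if auc <= 0.8:
        return "skillful"
    return "high skill"
```

A perfectly discriminating forecast yields an AUC of 1.0 regardless of calibration, which is why the ROC measures resolution and must be read alongside the reliability diagram.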
The FSS, like the AUC, allows for a direct comparison between the forecast and observed probabilities in a single metric. Yet, it accounts for missed forecasts as well as false alarms. This metric is a skill score based on the mean squared error of the forecasts, relative to uniform skill (Roberts and Lean 2008). Useful forecasts have an FSS greater than uniform skill, where uniform skill is halfway between random skill and perfect skill (i.e., half of climatological skill + perfect reliability).
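A minimal FSS computation over neighborhood probability (fraction) fields, following the Roberts and Lean (2008) definition, might look like this; flattened 1D lists are used for brevity.

```python
def fss(p_fcst, p_obs):
    """Fractions skill score: FSS = 1 - MSE / MSE_ref, where MSE is
    the mean squared difference between forecast and observed
    fraction fields and MSE_ref is the largest MSE obtainable with
    no spatial overlap between them."""
    n = len(p_fcst)
    mse = sum((f - o) ** 2 for f, o in zip(p_fcst, p_obs)) / n
    ref = (sum(f * f for f in p_fcst) + sum(o * o for o in p_obs)) / n
    return 1.0 - mse / ref if ref > 0 else 0.0

def uniform_skill_threshold(base_rate):
    """FSS above this value indicates a useful forecast: halfway
    between random skill (approximately the observed base rate)
    and perfect skill."""
    return 0.5 + base_rate / 2.0
```

Identical fields give FSS = 1, completely displaced fields give FSS = 0, and the usefulness cutoff grows with the climatological event frequency, as the uniform-skill definition above implies.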
Aggregate bias, probability of detection (POD), false alarm rate (FAR), and equitable threat scores (ETSs) were calculated following the neighborhood approach described in Clark et al. (2010). The forecasts were not corrected for bias because the biases (as shown below) were very small. These calculations were done to provide additional context to the analysis. The neighborhood calculations were made for about 3000 locations of interest (airports) within the simulation domains. Note that the skill scores are calculated from aggregate data. The aggregate data are generated by first computing the standard contingency tables for each forecast day individually and, then, by summing all these tables (matrices) over all forecast days, as in Clark et al. (2010) and Fierro et al. (2015).
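The aggregation-then-score order described above matters: 2 × 2 contingency tables are summed over all forecast days first, and the scores are computed once from the aggregate. An illustrative sketch (our own function names):

```python
def add_table(agg, hits, fas, misses, crs):
    """Sum a day's 2x2 contingency table (hits, false alarms,
    misses, correct rejections) into the running aggregate, as in
    Clark et al. (2010) and Fierro et al. (2015)."""
    return [a + b for a, b in zip(agg, (hits, fas, misses, crs))]

def scores(table):
    """Bias, POD, FAR, and ETS from an aggregate contingency table."""
    hits, fas, misses, crs = table
    n = hits + fas + misses + crs
    bias = (hits + fas) / (hits + misses)
    pod = hits / (hits + misses)
    far = fas / (hits + fas)
    hits_rand = (hits + fas) * (hits + misses) / n  # chance hits
    ets = (hits - hits_rand) / (hits + fas + misses - hits_rand)
    return {"bias": bias, "POD": pod, "FAR": far, "ETS": ets}
```

Computing daily scores and averaging them instead would weight light-lightning days the same as active days; summing the tables first weights each grid-element verification pair equally.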
LDA forecasts of total lightning were also compared with CNTL forecasts. These were examined for significant differences using the resampling technique described in Clark et al. (2010) to determine whether differences in the ETS were statistically significant (for α = 0.05; bootstrapped resampling repeated 10 000 times).
Forecasts of lightning strokes and flashes produce different distributions as a function of the lightning threshold [(10 min)−1], which can be seen in Fig. 3. To compare the forecasts from the DLS model to the McCaul scheme, the same points in the distribution of McCaul were used. Besides comparing forecasts of at least one stroke or flash [(10 min)−1], it is important to compare forecasts at higher threshold values. For instance, it was inferred from the tabular distribution of flashes (not shown) that approximately 20% of the grid elements had at least 6 flashes (10 min)−1. In contrast, at least 20% of the grid elements had 10 strokes (10 min)−1. Similarly, 10% of the grid elements in each distribution had at least 11 flashes or 25 strokes (10 min)−1, respectively. Hence, the analysis below comparing these different lightning forecast schemes will compare forecasts of 6 flashes versus 10 strokes, and 11 flashes versus 25 strokes (10 min)−1. As the data were binned as integer values, we chose the integer value closest to the percentages mentioned. (The results of the analysis, though, were relatively insensitive to the choice of integer value within ±1 of the value chosen.)
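Matching points in the two distributions amounts to finding, for a given exceedance fraction (e.g., 20% or 10% of grid elements), the corresponding integer threshold in each distribution. A simple sketch of that quantile matching (our own helper, not code from the study):

```python
def exceedance_threshold(values, frac):
    """Return the smallest integer threshold t such that at most
    `frac` of the grid elements have value >= t. Applying this to
    the flash and stroke distributions at the same `frac` yields
    matched thresholds for comparing the two schemes."""
    n = len(values)
    t = 0
    while sum(1 for v in values if v >= t) / n > frac:
        t += 1
    return t
```

Running this at frac = 0.2 and frac = 0.1 on the flash and stroke counts would reproduce pairings of the kind quoted above (6 flashes vs 10 strokes, 11 flashes vs 25 strokes).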
As noted, the distributions of the forecasts with the DLS and McCaul schemes are different. Hence, the bootstrap method (Efron and Tibshirani 1993; Wilks 2006) was used to determine the significance of the differences in forecast ETS obtained with the DLS and diagnostic schemes. A random-number generator was used to create 1000 sets of paired forecasts, each with 26 case study days, sampled at random, with replacement from the actual study days. A histogram plot of the number of forecasts with forecast differences within 0.01 intervals produced simulated differences centered on the difference from the actual observed versus forecast data. A 95% confidence interval was computed for the true differences by trimming off the 25 most extreme values at both ends of the histogram (or, in total, 5% of the distribution). If the difference histogram interval did not include 0, it was concluded that the forecast methods were different at the 5% significance level (corresponding to the 95% level for the confidence interval). Moreover, narrower histograms have less uncertainty than wider ones. If the histogram interval includes 0, then the differences between the forecasts were not statistically significant at the 5% level.
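The paired-resampling test can be sketched as below. The trimming of α/2 of the resamples from each tail generalizes the 25-value trim described above (with 1000 resamples and α = 0.05, exactly 25 are trimmed from each end); the seeded generator is our addition for reproducibility.

```python
import random

def bootstrap_diff_ci(scores_a, scores_b, n_boot=1000, alpha=0.05,
                      seed=0):
    """Paired bootstrap for the mean score difference (A - B):
    resample case-study days with replacement, trim alpha/2 of the
    resampled differences from each tail, and report whether the
    resulting (1 - alpha) interval excludes 0."""
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    k = int(n_boot * alpha / 2)
    lo, hi = diffs[k], diffs[-k - 1]
    return lo, hi, not (lo <= 0.0 <= hi)
```

Resampling days (rather than grid points) respects the fact that scores within a single convective day are strongly correlated, which is the motivation for the day-level pairing in the text.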
The economic analysis is based on the method presented in Thompson and Brier (1955). While other papers have presented more complex forms of analysis or calculation of the optimal cost/loss ratio C/L (Thompson and Zucchini 1990), this paper follows the method presented in the original paper of Thompson and Brier (1955) because of its simple approach to the problem of calculating economic value. Moreover, the lightning dataset is large enough to avoid the small-sampling problem described in Thompson and Zucchini (1990). Here, the cost refers to the cost to the user of forecasts for taking preventive action against damage, and the loss refers to the cost incurred when the user does not take action and the damaging event occurs. The C/L ratio can be a unique ratio for each forecast user. The cost of producing the forecasts is not relevant to the user’s C/L ratio.
In brief, the economic savings attributable to probability forecasts P, referred to as savings over climatology, over forecasts based on climatology per unit forecast per unit loss L are
Here, Ef is the cost to protect, while Ep and Ep′ are the costs due to adverse weather events over N forecast days when the probabilities of such an event are >C/L or <C/L, respectively.
Economic analysis is most useful for a wide range of C/L, and this can be accomplished by combining the forecast and observed weather frequencies into a single table.
To do so, it is necessary to calculate Ng, which is the number of grid elements within ±0.01 forecast probabilities of the value of C/L. Within each interval, NObs is the number of those grid elements with observed lightning. Then, Ng and NObs were summed from the smallest intervals of C/L to the largest (0.015–0.995). The results are cumulative fractional probabilities FN and FNO, respectively, for Ng and NObs.
After manipulation, the following equations were then obtained as in Thompson and Brier (1955). When the probability is >C/L,
and when the probability is <C/L,
where Pc is the climatological probability of lightning.
Table 3 shows an example of these calculations, where the Pc is approximately 0.12. The cumulative probability of the hits is 0.071, and the economic saving is 0.029. Also shown in various figures is the economic savings over climatology for perfect forecasts, where perfect is found by substituting observations for forecasts. Keeping in mind that the completeness and accuracy of observed lightning is not 100%, comparing the economic value of perfect to actual forecasts gives a measure of context, or an indication of the room for improvement.
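Because the display equations from Thompson and Brier (1955) are not reproduced here, the sketch below implements only the generic static cost-loss comparison that underlies the SOC: protect whenever the forecast probability exceeds C/L, and compare the resulting expected expense per unit loss with the climatological strategy. It illustrates the idea under those stated assumptions and is not the paper's exact formulation.

```python
def savings_over_climatology(probs, obs, cost_loss, p_climo):
    """Static cost-loss sketch of SOC per unit loss. probs are
    forecast probabilities, obs are 0/1 lightning occurrences,
    cost_loss is the user's C/L ratio, and p_climo the
    climatological probability. The forecast user protects (paying
    C/L per unit loss) when p > C/L, otherwise risks the loss; the
    climatological user always protects if p_climo > C/L and never
    otherwise."""
    n = len(probs)
    # expected expense per unit loss with the probability forecasts
    e_fcst = sum(cost_loss if p > cost_loss else o
                 for p, o in zip(probs, obs)) / n
    # expected expense per unit loss with climatology alone
    e_climo = cost_loss if p_climo > cost_loss else sum(obs) / n
    return e_climo - e_fcst
```

With perfectly sharp forecasts, a climatology near 0.12 (as in the Table 3 example), and C/L = 0.3, the forecasts save the user most of the climatological expense; poorly calibrated forecasts can drive the savings negative.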
a. Mostly springtime convection
1) Choosing the value of the smoothing parameter
Figure 4 shows hourly observed and forecast probabilities of total (IC + CG) lightning for a forecast starting at 1200 UTC 21 May 2013, for the forecast from 1800 to 1900 UTC. The smoothing radius σ was 48 km. This is the second forecast hour after the end of lightning assimilation, or the first complete useful hour (F2). The model produced high probabilities (>80%) of total lightning in Oklahoma and Texas, as well as probabilities > 30% (40%) in Arkansas (Tennessee), within 10% of the observed lightning probabilities, but missed the >40% probability of lightning in eastern Arkansas. Broad areas of 10%–20% lightning probabilities were forecasted correctly in South Carolina, Florida, and southeastern New York, although there were some discrepancies in the location of small maxima and total areal coverage.
In contrast, using a smaller smoothing parameter (8 km) produced a map with a high level of correct detail, but also obvious areas in which forecast probabilities were high and observed probabilities were low or vice versa (not shown). When using a larger smoothing parameter (96 km), forecasts produced probability maps with no obvious errors, but the maximum probability of lightning was only 30%–40% (not shown). Moreover, many areas had a low probability of lightning relatively far from where lightning actually occurred. Frequency histograms for F3 and σ = 32, 48, 64, and 128 km are shown in Fig. 5. Only forecasts smoothed with σ = 32 and 48 km have forecast points (grid-element locations) where forecast probability values reach at least 80%. In contrast, with a smoothing parameter of σ = 64 km, the forecast probabilities were as high as 60%–70%, while σ = 128 km produced maximum probabilities of only 30%–40%. Hence, at first glance, such large smoothing parameter values appear inappropriate for forecasting lightning probabilities.
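The smoothing step itself can be sketched as follows, assuming a Gaussian kernel applied to a binary exceedance field (the function name and toy field are illustrative, not the actual model code). The sketch reproduces the qualitative behavior noted above: heavier smoothing caps the maximum probability.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lightning_probability(rate, threshold=1.0, sigma_km=48.0, dx_km=4.0):
    """Smooth a binary exceedance field into a neighborhood probability field.

    rate      : 2D forecast lightning rates [events (10 min)^-1]
    threshold : exceedance threshold defining an 'event'
    sigma_km  : Gaussian smoothing length scale
    dx_km     : model grid spacing (4 km here)
    """
    binary = (rate >= threshold).astype(float)
    return gaussian_filter(binary, sigma=sigma_km / dx_km, mode="constant")

rate = np.zeros((200, 200))
rate[80:120, 80:120] = 5.0     # a 160 km x 160 km block of active convection
p48 = lightning_probability(rate, sigma_km=48.0)
p96 = lightning_probability(rate, sigma_km=96.0)
print(p48.max(), p96.max())    # the larger sigma caps the peak probability
```

Even over a solid block of events, σ = 96 km spreads the probability mass so widely that the peak stays well below 50%, consistent with the histograms in Fig. 5.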
Figure 6 shows ROC curves for forecast hour 3, which was typical of other forecast hours between F1 and F5. ROC curves are shown for σ ≥ 8 km, and each had AUC values indicative of an acceptable level of skill. However, only forecasts with σ ≥ 16 km had useful skill, while the maximum AUC values were obtained with σ = 48 and 64 km. Hence, lightning probabilities made with σ = 48 and 64 km have better forecast resolution than forecasts made with other σ values.
Figure 7 shows a reliability graph for F3, which indicates that forecasts with σ = 48 km had the highest reliability. Forecasts with σ = 8 and 16 km were generally unreliable throughout the forecast period; considering that Roberts and Lean (2008) suggested 5Δx as the minimum neighborhood scale for skillful forecasts, this is perhaps not surprising. Forecasts with σ = 32 km generally overforecasted the probability of lightning, while forecasts with σ = 64 and 128 km generally underforecasted its probability (as noted above). These results contrast with the findings of Sobash et al. (2011), who found a trade-off between forecast reliability and skill. Here, the forecasts with σ = 48 km exhibited both forecast resolution (discriminating between events and nonevents) and reliability in predicting probability.
Figure 8 shows the savings over climatology for different values of σ. The maximum SOC was obtained for σ = 48 km. With σ = 96 km, the maximum savings were lower, while using forecasts made with σ = 24 km would lead predominately to a loss. As noted, Figs. 6 and 7 demonstrate that using σ = 48 km leads to forecasts with higher reliability and resolution, while Fig. 8 suggests that such factors lead to larger SOC.
Figure 9a shows the SOC using σ = 48 km for different forecast hours. Larger economic value generally corresponds to larger values of FSS, and occurs earlier in the forecast. All curves have negative economic value at very small values of C/L, while two curves, F7 and F9 (with lower FSS), have negative economic value over the large ranges C/L > 0.25 and C/L > 0.2, respectively. Interestingly, the F9 forecast had positive SOC when C/L < 0.2, even though its FSS of 0.48 was below the uniform skill level defined by Roberts and Lean (2008). This indicates that forecasts that are not considered useful based on the Roberts and Lean (2008) criteria can still have economic value to some users.
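For reference, the Roberts and Lean (2008) uniform skill level mentioned above is FSS_useful = 0.5 + f0/2, where f0 is the observed base rate. A minimal sketch, assuming the base rate equals the climatological probability of Table 3:

```python
def fss_useful(base_rate):
    """Roberts and Lean (2008) uniform-skill threshold: FSS_useful = 0.5 + f0 / 2."""
    return 0.5 + base_rate / 2.0

# With a base rate near the climatological probability of 0.12 (Table 3),
# an FSS of 0.48 falls below the 0.56 useful-skill threshold, yet such a
# forecast can still carry positive economic value for some C/L.
print(fss_useful(0.12))  # 0.56
```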
Figure 9b shows SOC results for different threshold values for F2. The SOC results were mostly positive for all threshold values. Yet, the maximum value of SOC decreased with increasing lightning threshold, and so did the range of C/L over which SOC was positive.
2) Benefit of lightning assimilation
Table 4 compares the total lightning forecasts from the DLS LDA and DLS CNTL simulations using the ETS. Significant improvements were obtained with LDA up to between 5 and 7 h of the forecast, depending on the forecast threshold [where 1, 10, 25, and 50 (10 min)−1 were tested]. Hence, these results are consistent with those of Lynn et al. (2015) for CG lightning, Sun et al. (2012) for precipitation amounts, and Fierro et al. (2015) for precipitation amounts.
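For reference, the ETS used in Table 4 can be computed from a 2 × 2 contingency table, as sketched below. This is an illustrative implementation; the neighborhood matching applied in this study is omitted.

```python
import numpy as np

def ets(fcst_event, obs_event):
    """Equitable threat score from binary forecast and observed event fields."""
    f = np.asarray(fcst_event, bool).ravel()
    o = np.asarray(obs_event, bool).ravel()
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    n = f.size
    hits_random = (hits + false_alarms) * (hits + misses) / n  # chance hits
    denom = hits + false_alarms + misses - hits_random
    return float((hits - hits_random) / denom) if denom else 0.0

f = np.array([1, 1, 0, 0, 1, 0])
o = np.array([1, 0, 0, 1, 1, 0])
print(ets(f, o))  # 0.2
```

A perfect forecast scores 1, and a forecast with no skill beyond chance scores near 0.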
Forecasting that at least one event will occur within a certain neighborhood radius is not the same as forecasting the probability of lightning. The latter is evaluated with the FSS. Table 5 shows that LDA forecasts had much higher FSSs than the CNTL forecasts for all thresholds, and almost all forecast hours were shown to have useful skill. These improvements occurred during the forecast hours in which LDA forecasts were significantly better (Table 4) than the CNTL forecasts. The impact of LDA on forecasts for thresholds of 10, 25, and 50 (10 min)−1 was especially important, as only forecasts with LDA produced useful probability predictions.
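The FSS itself compares neighborhood event fractions, as sketched below. This illustration uses a square uniform neighborhood rather than the Gaussian σ used in this study; the function and toy fields are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, window):
    """Fractions skill score over square neighborhoods of `window` grid points."""
    ff = uniform_filter((fcst >= threshold).astype(float), size=window, mode="constant")
    fo = uniform_filter((obs >= threshold).astype(float), size=window, mode="constant")
    mse = np.mean((ff - fo) ** 2)
    mse_ref = np.mean(ff ** 2) + np.mean(fo ** 2)   # worst-case reference MSE
    return 1.0 - mse / mse_ref if mse_ref > 0 else 1.0

obs = np.zeros((100, 100)); obs[40:60, 40:60] = 2.0
fcst = np.zeros((100, 100)); fcst[45:65, 40:60] = 2.0   # displaced 5 points
print(fss(fcst, obs, 1.0, 5), fss(fcst, obs, 1.0, 25))
```

For a slightly displaced feature the score rises with neighborhood size, which is why the choice of smoothing scale matters so much in the verification above.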
Figure 10 shows the savings over climatology for 1) perfect forecasts, 2) DLS LDA, and 3) DLS CNTL forecasts, for F3. DLS LDA forecasts have much higher maximum SOC values than the CNTL forecasts. Moreover, the DLS LDA had an SOC with positive values over a much larger range than the CNTL. Yet, DLS LDA forecasts have about ¼ the maximum SOC of perfect forecasts, indicating there is much room for improvement, despite the use of lightning assimilation.
3) Comparison with the McCaul diagnostic scheme
For a smoothing parameter σ of 48 km, both the DLS LDA and McCaul LDA forecasts had small biases for a range of threshold values (not shown). Hence, one can have confidence that the schemes did not require any additional tuning of parameters before the forecast lightning from one was statistically compared to the other.
A statistical comparison of the ETS values was then made for the forecast data without debiasing. For a threshold of ≥1 (10 min)−1, the analysis indicated that the DLS forecast had significantly higher values of ETS than the McCaul scheme during the forecast hours for which the forecasts have useful skill (not shown). However, for a threshold of 6/10 (10 min)−1, significance was obtained only for F5. The ETS values were not significantly different in any hour for a threshold of ≥11/25 (10 min)−1.
With regard to FSS, Table 6 shows that the DLS produced higher values of FSS for a threshold of ≥1 (10 min)−1 than the McCaul scheme, and that there were more hours of useful forecasts. Similar results were obtained for a threshold ≥ 10 (10 min)−1, although the reader should keep in mind that the results for the higher thresholds were generally not significantly different (when comparing ETS). The results were mixed for forecast probabilities of ≥25 (10 min)−1, with no clear advantage found.
Figure 10 also shows the SOC for the McCaul forecasts of ≥1 (10 min)−1. The McCaul forecasts were substantially more negative at low values of C/L, and the maximum value of its SOC was less than in the DLS forecasts. However, at a C/L higher than the climatological probability, the McCaul forecasts had slightly higher SOC values than the DLS forecasts.
One can integrate the SOC for the forecasts over all C/L, which is shown in Fig. 11. The integral of the savings over C/L shows that the DLS had a clear advantage in producing economic value from its forecasts for a threshold ≥ 1 (10 min)−1. Moreover, the relative advantage grew as the forecast hour proceeded incrementally from F2 through F9.
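The integration over C/L can be sketched with the trapezoidal rule; the SOC curve below is purely hypothetical, chosen only to resemble the peaked shape of the curves discussed above.

```python
import numpy as np

# hypothetical SOC curve on a grid of C/L values: peaked near the
# climatological probability, slightly negative in the tails
cl = np.linspace(0.015, 0.995, 99)
soc = np.maximum(0.03 - 0.1 * np.abs(cl - 0.12), -0.01)

# total economic value = integral of SOC over all C/L (trapezoidal rule)
total_value = float(np.sum(0.5 * (soc[1:] + soc[:-1]) * np.diff(cl)))
print(total_value)
```

A single number per forecast hour of this kind is what is compared across schemes in Fig. 11.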
For a threshold ≥ 10 (10 min)−1, the DLS had higher total economic value in F2, but not F4. Forecast hours F6, F7, and F9 each had negative total economic value in both forecasts. [However, one should again keep in mind the lack of statistical significance for these forecasts at higher threshold values (as noted above).]
4) Relevance to forecasting precipitation threshold amounts
Table 7 shows FSSs for forecasts of precipitation at thresholds of 0.1, 0.25, 0.5, and 1.0 in. h−1. Forecasts for the threshold of ≥0.1 in. h−1 were useful for at least forecast hours F2–F8, but for a threshold of 0.25 in. h−1 forecasts were useful only for hours F2–F4. For thresholds of 0.5 and 1.0 in. h−1, the forecasts were not useful. Table 8 shows that bias values for the lower threshold values were generally less than 20%, but were much higher for threshold values of 0.5 and 1 in. h−1. These variations in bias values with threshold intensity were similar to those obtained in Fierro et al. (2015).
Does accuracy in forecasting lightning correspond to accuracy in forecasting precipitation? Figure 12 shows the FSSs from the high forecast dataset for various precipitation thresholds versus the FSSs for lightning at a threshold of >1 (10 min)−1. Hence, there are four sets of points, each with a linear fit, one for each precipitation threshold. The highest correlation is between probability forecasts for thresholds of >1 (10 min)−1 and 0.1 in. h−1 of precipitation. However, correlations with the other precipitation thresholds (except 1 in. h−1) were still significant at the 95% confidence level. Hence, accuracy in forecasting the probability of lightning did correspond to accuracy in forecasting precipitation thresholds. Yet, such correlations were not significant when relating FSS values obtained with higher lightning thresholds to the various precipitation thresholds (not shown).
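The correlation analysis can be sketched as below, using synthetic per-day FSS pairs (the variable names, the built-in linear relation, and the 26-day sample size are illustrative only):

```python
import numpy as np
from scipy.stats import pearsonr

# hypothetical per-day FSS pairs for 26 case days, correlated by construction
rng = np.random.default_rng(2)
fss_lightning = rng.uniform(0.3, 0.9, 26)
fss_precip = 0.8 * fss_lightning + rng.normal(0.0, 0.05, 26)

r, p = pearsonr(fss_lightning, fss_precip)
print(round(r, 2), p < 0.05)   # strong, significant correlation by construction
```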
b. Summertime convection
Statistical scores were calculated with the smoothing coefficient used for mostly springtime convection (σ = 48 km), as it seemed impractical to have different values for different weather regimes. Forecast skill scores for forecast hours F1–F3 were from aggregated data obtained from forecasts made for all days and for all hours. Table 9 shows that forecasts of lightning probabilities made with DLS LDA for a threshold ≥ 1 (10 min)−1 were useful in F1 and F2, but not F3. Forecasts for a threshold ≥ 10 (10 min)−1 were useful only in F1, while forecasts for a threshold of ≥25 (10 min)−1 were not useful in any forecast hour. As expected, the predictability of the summertime storms was lower than that of the mostly springtime storms. However, within F1 and F2, the SOC results were similar in magnitude to those obtained with the mostly springtime convection (not shown).
Statistical scores from DLS LDA forecasts with initial forecast times of 0600, 1700, and 2200 UTC were also examined (Table 9). The 0600 UTC forecast actually pertains to nocturnal convection present at 0300 UTC, and its associated lightning was assimilated until 0630 UTC. The 1700 UTC forecast forced convection (through assimilation) from 1400 until 1730 UTC and had the largest amount of integrated lightning. Forecasts made at 0600 and 1700 UTC each had FSS values larger than the aggregate, along with very small bias (not shown).
The forecasts beginning at 2200 UTC were spun up from 1900 until 2230 UTC. These forecasts assimilate lightning in the late afternoon to early evening (eastern standard time). These forecasts had FSS values less than the aggregate, which was attributed to positive lightning bias (not shown). Simulations made by modifying the assimilation scheme to produce lower values of supersaturation for the same value of observed lightning did not improve the skill scores or appreciably change the bias of the forecasts (not shown). This suggests that moisture biases were present within the initial and lateral boundary conditions of the RAP, and that the assimilation scheme is relatively insensitive to modest changes (25%) of the original nudging coefficients.
Table 10 shows FSS values for DLS LDA and DLS CNTL forecasts at 1700 UTC (or, as noted, for the forecast period with the highest amount of lightning). Clearly, spinup in the LDA forecasts was faster in hours A1, A2, and A3 than in the CNTL forecasts (as well as in hour A4/F0). For a threshold of ≥1 (10 min)−1, the FSS in the LDA forecasts was substantially larger than in the CNTL forecasts in F1 and somewhat larger in F2, but smaller in F3. The results for F3 suggest that the effect of assimilating lightning on summertime convection can eventually be detrimental. Moreover, at higher threshold values, the impact of lightning assimilation was mostly detrimental in hours F1–F3, even though the DLS LDA forecasts produced much higher FSS values than the DLS CNTL in A4/F0.
As pointed out in Fierro et al. (2014, 2015), the potential for improvement using LDA depends in part upon errors/biases present in the microphysics: single-moment codes such as WSM6 are known to overestimate subcloud evaporation and produce strong cold pools. Moreover, it is well known, as noted, that summertime convection has low predictability, which explains why simulated airmass storms are found to generally decay shortly after data assimilation is applied (i.e., choking/collapse induced by strong outflow). Hence, the forecasting of summertime storms in the late afternoon to evening presents a greater challenge to forecast models than storms with a high number of severe storm reports.
The DLS LDA forecasts were also compared to forecasts derived using the McCaul scheme (with LDA), for forecasts beginning at 1700 UTC from 19 July to 14 August 2015. Table 11 shows FSS for thresholds of ≥1 event (i.e., stroke or flash) (10 min)−1 and 10/6 events (10 min)−1. The DLS had higher values of FSS than the McCaul scheme for A4/F0 (and previous hours; not shown), as well as F1–F3. Differences between the schemes were of similar magnitude to those obtained for strong convection. Note that the McCaul scheme did not produce flash origin density values large enough to reach and/or cross the ≥6 flashes (10 min)−1 threshold.
FSS values were also calculated (Table 12) for the same precipitation threshold values as in Table 7, but also for a threshold of 0.04 in. h−1 (or 1 mm). The forecasts for all dates with initial times of 1700 UTC (with 3.5 h of prior spinup) were chosen for this analysis. Only the smallest threshold of 0.04 in. h−1 had useful skill, during hours F1 and F2, but not F3. Unlike in the high forecast dataset, FSS values for lightning forecasts of >1 (10 min)−1 were not well correlated with FSS values for the precipitation thresholds. Hence, accurate forecasts of summertime convection were not indicative of accurate precipitation forecasts.
4. Discussion and conclusions
A convection-allowing 4-km forecast model was used to predict hourly lightning probability during mostly springtime and summertime convective storms using the dynamic lightning scheme. The mostly springtime storms were characterized by their relatively large number of storm reports, as compared to the summertime convection.
Reliability diagrams, relative operating characteristic curves, histograms, and fractions skill scores were used to evaluate the accuracy of the forecasts. While most forecasts with different smoothing parameters were skillful, forecast probabilities of at least one lightning event [(10 min)−1] were most reliable with a spatial smoothing parameter σ of 48 km. Hourly probability forecasts with higher threshold values were also found to have useful skill. The forecasts produced economic savings over forecasts made with climatology, including those with a higher lightning threshold, although not for all cost/loss ratios. Simulations comparing forecasts with and without lightning assimilation further demonstrated the utility of using lightning to more quickly spin up forecasts (at the 5% significance level). For both the high and low datasets (storm reports), lightning stroke forecasts made with the DLS had larger FSSs than forecasts of lightning flashes obtained with the McCaul diagnostic scheme for a threshold of >1 (10 min)−1 (at the 5% significance level).
It was encouraging that summertime convective storms could be predicted well enough to produce lightning forecasts with skill in the first two full forecast hours, and that the skill of these forecasts (with at least one event per 10 min in the first hour) was similar to that obtained in the high dataset. This finding was especially encouraging in light of Zhang and Sippel’s (2009) report that weakly forced moist convection has low predictability.
Yair et al. (2010) proposed using the lightning potential index as an indicator of heavy, possibly flooding, rain. They showed that large neighborhood averages had large maximum values when precipitation amounts were high. Here, it was shown that higher FSSs of individual case study days from forecasts of lightning probability of at least one event (10 min)−1 correlated well (significantly at the 5% level) with higher values of FSS from forecasts of hourly precipitation thresholds (up to 0.5 in.). However, FSSs from the same case study days for forecasts of higher lightning thresholds did not correlate significantly with the FSSs associated with the same precipitation thresholds. One possible explanation is that high lightning rates are often associated with colder cloud bases, implying that warm rain processes are less active than in clouds with lower cloud bases (D. Rosenfeld 2016, personal communication), leading to relatively lower rain amounts than in warm rain clouds (which also produce comparatively less lightning). Another possible reason is that the lightning forecast models used herein are not fully explicit.
As noted, the lightning assimilation scheme led to a faster (and more accurate) spinup of initial convection. This was the case for both the high and low datasets of storms. However, for a subset of summertime/low forecasts the advantage of using lightning assimilation was less apparent. Forecasts had useful skill for only 2 h after turning off the lightning assimilation. Moreover, accurate forecasts of lightning were not indicative of accurate forecasts of even light precipitation. Convection is driven by instability, even if the convection is weak. Light rain is produced by other mostly nonconvective mechanisms, in an air mass that would generally not support convection. Therefore, the skill of the model with respect to the two might differ, and the two should not be expected to be correlated.
Fierro et al. (2015) noted that their assimilation scheme performs best for outflow-dominated MCS/squall lines. These convective types chiefly rely on a timely and accurate positioning of the cold pool given a model environment broadly consistent with the observations. They noted that the assimilation scheme does less well in reproducing individual supercells, where a balance between storm-relative helicity and the cold pool is required to maintain the supercell structure.
For the most part, the DLS produced higher FSS and SOC values than did the McCaul diagnostic scheme. Specifically, it was significantly better at producing at least one lightning event during the mostly springtime and summertime convection and also had a higher economic value. It also had higher FSS values when comparing 6 flashes versus 10 strokes (10 min)−1, but the results were not statistically different. The DLS takes into account the history of the convective storms that produce lightning, as it uses a four-dimensional (space and time) variable from which lightning strokes are calculated. The McCaul scheme is diagnostic; it uses output forecast data at intervals, regardless of past history. It seems to be tuned to work best for higher threshold values [and this was confirmed by E. W. McCaul (2016, personal communication)]. As the two schemes had similar biases, these particular aspects of their formulations most likely explain the differences in their forecast accuracy, especially for small threshold values.
Fierro et al. (2015) showed that lightning assimilation improves precipitation skill scores in the eastern United States, including the FSS, during the warm season (May–July 2013). FSS scores for hourly thresholds of precipitation were larger than 0.5 only within the first three or four forecast hours, even though the positive impact of lightning assimilation on FSS extended out about 9 h from the time the assimilation was turned off. Roberts and Lean (2008) define forecasts with FSS > 0.5 as useful, depending on climatological values. The analysis of savings over climatology versus C/L suggests that the larger the FSS, the higher the economic value of the forecasts. Forecasts with FSS < 0.5 are more likely to have negative SOC values and possibly negative economic value over all C/L. Yet, forecasts with FSS < 0.5 from this study were still shown to have positive economic value for a certain range of C/L. While not the subject of this paper, it is suggested that forecast accuracy might be further improved through the use of three-dimensional variational data assimilation (3DVAR; Smith et al. 2014), including 3DVAR with lightning as one of its parameters (A. Fierro 2016, personal communication). A future goal is also to use forecast lightning to track the possible movement of convective storms within the forecast, and to use these lightning tracks with other model output to predict severe weather types such as tornadic or strong winds associated with these storms.
I thank NCDC (now NCEI) for providing the NAM boundary conditions, while RAP boundary conditions were derived from operational forecasts downloaded from NCEP. Earth Networks provided the lightning data, while Bill (Eugene) McCaul and Jonathan Case provided helpful advice on how to calculate McCaul total lightning values. All model forecast data and analysis programs were provided courtesy of Weather It Is, Ltd., and remain its intellectual property. GCIP/EOP Surface Precipitation NCEP/EMC 4KM Gridded Data (GRIB) stage IV data were provided by NCAR/EOL under sponsorship of the National Science Foundation (http://data.eol.ucar.edu/ or http://data.eol.ucar.edu/cgi-bin/codiac/fgr_form/id=21.093). I also thank the reviewers and editor for their helpful suggestions.
Additional Sensitivity Tests
Bias and FSS were compared for forecasts with 3 versus 5 h of model assimilation time for the high forecast experiments. The McCaul scheme was used because its forecasts of lightning are diagnostic and only sensitive to changes in forecast convection. The simulations that assimilated lightning for only 3 h had a lower bias than those with 5 h of lightning assimilation. However, the biases were generally small in both forecast datasets. The main effect of assimilating additional lightning (hours 4 and 5) was much higher FSS values in F1 and F2. Other forecast hours, though, showed more similar values of FSS (Table A1).
Finally, we ran a 5-day subset (9–13 August) of forecasts at 2200 UTC using the Thompson scheme instead of the WSM6 microphysics scheme. The impact of changing schemes on the model forecast precipitation bias was minimal (not shown).
The RAP data available on the NCEP operational download site contains nine soil temperature and moisture layers (/pub/data/nccf/com/rap/prod/rap.YYYYMMDD).
For instance, a forecast beginning at 0400 UTC used reanalysis RAP data from 0100, 0200, and 0300 UTC, as well as the 1-hourly forecast from the 0300 UTC RAP to create initial conditions for each hourly forecast. The lightning data were assimilated between 0100 and 0430 UTC. Assuming a ½-h latency period before new forecasts can be produced, we evaluated the accuracy of the forecast started at 0400 UTC from 0500 to 0800 UTC, in hourly increments.
One can find further information/details online (http://www.nssl.noaa.gov/users/brooks/public_html/feda/note/reliroc.html, and http://neacc.meteoinfo.ru/training/103-lecture-4-on-forecast-downscaling).
One might also calculate the frequency-weighted mean-squared deviation of the reliability curve from the diagonal in the diagram, but this metric does not take into account missed forecasts or indicate usefulness.
The reason 0 plays a special role is that a difference of 0 means no difference in accuracy between the two methods. If 0 is consistent with the data (i.e., falls within the distribution of resampled differences), then one cannot refute the hypothesis that there is no difference between the forecast approaches.
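This reasoning can be sketched as a simple bootstrap on paired score differences: if the 95% interval of resampled mean differences excludes 0, the hypothesis of no difference is rejected. The data below are hypothetical, and this illustrates the logic rather than the exact test used in this study.

```python
import numpy as np

def bootstrap_excludes_zero(diffs, n_boot=10000, seed=0):
    """True if the bootstrap 95% CI of the mean paired difference excludes 0."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs, float)
    # resample the paired differences with replacement, n_boot times
    means = rng.choice(diffs, size=(n_boot, diffs.size), replace=True).mean(axis=1)
    lo, hi = np.percentile(means, [2.5, 97.5])
    return not (lo <= 0.0 <= hi)

consistent = np.linspace(0.02, 0.08, 26)   # per-day differences, all positive
ambiguous = np.linspace(-0.05, 0.05, 26)   # differences straddling zero
print(bootstrap_excludes_zero(consistent), bootstrap_excludes_zero(ambiguous))
```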
The emphasis in this paper is on probability forecasts of lightning, and the probability P refers to the probability that lightning rates will exceed threshold values.