1. Introduction
The U.S. National Lightning Detection Network (NLDN) has undergone a number of upgrades over its >30-yr history, both to sensor technology and to the central processing system and its algorithms (e.g., Cummins et al. 1998; Orville 2008; Cummins and Murphy 2009). Much about the NLDN has changed since 2013, yet the last comprehensive published review of the network appeared over a decade ago (Cummins and Murphy 2009), and many recent descriptions of postupgrade performance refer to a network-wide upgrade in 2003 (e.g., Orville et al. 2011; Rudlosky and Fuelberg 2010). Hardware and algorithm updates, however, have important consequences for the interpretation and use of data, both in real-time operational contexts and in long-term climatological studies (e.g., Koshak et al. 2015; Medici et al. 2017).
The performance of the NLDN with respect to cloud-to-ground (CG) strokes and flashes is well validated on data prior to 2013. Mallick et al. (2014), e.g., using data from 2004 to 2012, established that the NLDN detected 94% of CG flashes and 75% of CG strokes in rocket-triggered flashes at Camp Blanding, Florida. They further found that the median location error of rocket-triggered strokes detected by the NLDN was 334 m over that same time period. Using video observations in other parts of the United States, Biagi et al. (2007) obtained comparable results. Specifically, in an area near the perimeter of the network in southern Arizona, Biagi et al. observed CG flash and stroke detection efficiencies of 93% and 76%, and in an area of northern Texas and southern Oklahoma (network interior), those detection efficiencies were 92% and 86%, respectively. The median random location errors of strokes that followed the same channel to ground were 424 m in southern Arizona and 282 m in the interior, respectively.
By contrast with the CG performance of the NLDN, estimates and limited validations of cloud lightning (IC) performance prior to 2013 suggested much lower detection efficiency. In the early 2000s, Vaisala embedded a test network of sensors within the NLDN in northern Texas (Murphy et al. 2006). That embedded network had shorter sensor baselines (distance between neighboring sensors) of approximately 150 km, as opposed to the operational NLDN, in which those distances are typically between 300 and 350 km. The IC flash detection efficiency of the embedded network was validated against a VHF total lightning mapping system centered on the Dallas–Fort Worth region, and the results over several storms ranged from 16% to 38%. Based on these results, and a distribution of range-normalized signal amplitudes from cloud discharge pulses taken previously over Florida, Murphy et al. (2006) estimated that the NLDN overall would have cloud lightning detection efficiency of around 10% in the interior of the network.
With a primary objective of increasing the cloud lightning detection efficiency substantially, all sensors in the NLDN were replaced with a fully digital signal processing–based sensor known by its trade name, LS7002, in spring and summer 2013 (Nag et al. 2014; Murphy et al. 2014). That sensor upgrade was, however, only the first major step along a path of upgrades aimed at improving IC detection efficiency. Subsequent changes involved an expanded data format sent by the sensors to the central processing system, a first round of updates to the algorithms in the central processing system in order to accommodate the new data format and attempt to improve the classification of cloud discharge pulses (henceforth, “IC pulses”) and CG strokes, a reassessment of performance, and a second round of algorithm improvements.
The first objective of this paper is to provide a review of all of the significant changes to the NLDN since 2013. The second objective is a detailed look at one of the algorithms that has changed over the years, namely the “classification” algorithm (the means of differentiating between CG strokes and IC pulses), and at how the classification affects long-term climatological analysis and interpretation of NLDN data. The paper is organized as follows. Section 2 provides a time line of changes to the NLDN since 2013. Section 3 provides additional detail about two nonclassification algorithm improvements introduced to the operational central processing suite in the NLDN. Sections 4 and 5 provide a detailed analysis of changes in the classification algorithm and how these affect interpretations of long-term historical datasets across upgrades, including times prior to the 2013 sensor upgrade. Section 5 also provides guidance on a simple means to normalize classifications across upgrades. In sections 3–5, the now-operational version of the central processing system is referred to as the “current,” or “November 2018,” version, whereas the prior operational version is called the “March 2016” version, and anything older is referred to as the “pre-2015” version. Section 6 addresses the location accuracy of the network, and section 7 gives a brief conclusion.
2. Time line of changes
Table 1 lists the dates of major changes in the NLDN starting with the 2013 sensor upgrade. The following paragraphs provide additional information about this time line.
Table 1. Time line of major changes in the NLDN since 2013.
A pair of companion papers at the 2014 International Lightning Detection Conference described the 2013 sensor upgrade of the NLDN (Nag et al. 2014; Murphy et al. 2014), which occurred between May and August 2013. As described briefly by Nag et al. (2014), the new sensors include the capability to apply digital filters to reduce local noise and thus attain better signal-to-noise ratios and, correspondingly, increased sensitivity to low-amplitude signals. The new sensors also include a new sensor data format that provides more parameters of the wave shapes of each detected event and bundles information about multiple pulses within so-called pulse trains. Specifically, as documented in Murphy et al. (2004a,b), the parameters are threshold crossing time, a correction to the onset time, rise time, peak-to-zero time, and amplitude of the main pulse, information about the opposite polarity peak (if any), including its time and relative amplitude, and a compressed representation of any pulse trains before or after the main pulse, including numbers of pulses and their relative times and amplitudes with respect to the main pulse. At first, as discussed in Murphy et al. (2014), the original and new data formats were delivered by each sensor in parallel, respectively, to the operational central processor at that time and to an updated central processor that was under test.
By using two Lightning Mapping Arrays (LMAs; Thomas et al. 2004), Murphy et al. (2014) showed that the detection efficiency of pure cloud flashes (i.e., no CG strokes) ranged from 30% to 58% in five storms. In those same storms, 59%–77% of all CG flashes also had IC pulses that were detected by the NLDN. Early analysis of these CG flashes indicated that IC pulses detected within CG flashes by the NLDN preferentially occur in the preliminary breakdown phase. In addition, Murphy et al. (2014) explored the difference in cloud flash detection efficiency as a result of the introduction of a method of time alignment and geolocation of multiple pulses within pulse trains. That method is described in Murphy et al. (2004a,b) and Murphy and Said (2018), with an additional summary provided in section 3 of this paper. The pulse train processing increased the detection efficiency of all flashes containing IC pulses by around 6% over 10 storm cases.
On 18 August 2015, the new data format from the sensors and the corresponding algorithm updates in the central processor became operational (Nag et al. 2016). The main change discussed by Nag et al. (2016) was not the pulse train processing, but rather, a change to the method of classifying IC pulses and CG strokes, with one final change made in March 2016, as described in section 4.
Beyond classification, Vaisala also learned, in partnership with a number of users of NLDN data, that the algorithm upgrades introduced in 2015 had created a small but noticeable population of incorrectly located lightning events. Following a detailed investigation of these events, some general improvements to the geolocation algorithms were introduced at the same time as the refinements to the classification algorithm.
Changes to reduce poorly located events and a further refinement of the classification algorithm were deployed in a single upgrade that became the operational central processing suite in the NLDN on 7 November 2018.
3. Nonclassification algorithm upgrades
a. Flash clustering
As described by Murphy and Nag (2015), prior to the 2013 sensor upgrade, it was assumed that the detection efficiency of IC pulses was sufficiently low that at most one IC pulse was ever detected per flash. Thus, the original flash clustering algorithm, described in detail by Cummins et al. (1998) in text associated with their Fig. 6, clustered only CG strokes into flashes. Since the 2013 sensor upgrade, however, the NLDN detects 5–10 times more IC pulses, including many that are associated with CG flashes. With the August 2015 version of the central processing system in the NLDN, Vaisala introduced a new flash clustering method that also considers IC pulses. At its core, the method retains the same CG stroke clustering rules as before: new CG strokes may be added to an existing flash as long as they occur no more than 1 s after the first stroke, no more than 500 ms after the last stroke that was added to the flash, and no more than 10 km from the first stroke. In addition, however, the new clustering method has a second maximum distance limit of 20 km that applies to IC pulses within the same flash. In flashes that contain only IC pulses, only the 20-km maximum distance limit applies. The larger maximum distance allowance takes into account the fact that IC pulses are expected to have a larger horizontal footprint than CG strokes, as well as the fact that the signal-to-noise ratio, and hence location accuracy, of IC pulses is not as good as that of CG strokes because of their generally lower signal amplitudes. Figure 1 shows an example of a CG flash that also contained a number of IC pulses, with the two maximum clustering distances also shown.
Most NLDN datasets are “stroke data” sets, meaning that no flash clustering is applied, and all IC pulses and CG strokes are reported individually. It is particularly important to note that, at the time of this paper, “flash” datasets from the NLDN are typically generated using the pre-2015 version of the clustering algorithm—that is, only CG strokes are grouped into flashes, but IC pulses are reported individually. Thus, users who require flash data are encouraged either to request a dataset from the updated clustering algorithm or to implement the clustering method illustrated in Fig. 1.
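For users who wish to apply the updated clustering to stroke-level data themselves, the rules given in section 3a can be sketched as follows. This is a minimal illustration and not the operational Vaisala implementation. The time and distance limits (1 s, 500 ms, 10 km, and 20 km) come from the text; everything else is an assumption made for illustration: the names Event, Flash, distance_km, and cluster_flashes, the use of a haversine distance, the application of the same time limits to IC pulses, the measurement of time and distance from the first and last event of any type in the flash, and the assignment of each event to the first flash that accepts it.

```python
import math
from dataclasses import dataclass, field

# Limits quoted in the text.
MAX_FLASH_DURATION_S = 1.0   # no more than 1 s after the first event
MAX_INTEREVENT_S = 0.5       # no more than 500 ms after the last event added
MAX_CG_DISTANCE_KM = 10.0    # CG strokes: within 10 km of the first event
MAX_IC_DISTANCE_KM = 20.0    # IC pulses: within 20 km of the first event


@dataclass
class Event:
    time_s: float
    lat: float
    lon: float
    kind: str  # "CG" or "IC"


@dataclass
class Flash:
    events: list = field(default_factory=list)


def distance_km(a, b):
    """Great-circle (haversine) distance between two events, in km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def cluster_flashes(events):
    """Group time-ordered events into flashes (assumed first-fit assignment)."""
    flashes = []
    for ev in sorted(events, key=lambda e: e.time_s):
        limit_km = MAX_CG_DISTANCE_KM if ev.kind == "CG" else MAX_IC_DISTANCE_KM
        for flash in flashes:
            first, last = flash.events[0], flash.events[-1]
            if (ev.time_s - first.time_s <= MAX_FLASH_DURATION_S
                    and ev.time_s - last.time_s <= MAX_INTEREVENT_S
                    and distance_km(first, ev) <= limit_km):
                flash.events.append(ev)
                break
        else:
            # No existing flash accepted the event, so it starts a new flash.
            flashes.append(Flash([ev]))
    return flashes
```

In this sketch, an IC pulse about 15 km from the first stroke joins the flash, whereas a CG stroke at the same location starts a new flash because it exceeds the 10-km limit.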
b. Pulse train processing
Murphy et al. (2004a,b) described a process by which correct time-of-arrival differences could be determined from a collection of pulses whose interpulse times were significantly shorter than the propagation times of the signals between neighboring sensors in a long-baseline network such as the NLDN. Without such a process, if individual sensors simply report times of arrival from some subset of the pulses above the detection threshold, or just a single time of arrival pertaining to one pulse in the train, then an incorrect association of misaligned times of arrival reported by different sensors is virtually assured. The result is either poorly located lightning events or positions that are ultimately rejected by the chi-square quality control parameter in the central processor.
The time alignment process of Murphy et al. (2004a,b) begins with a relatively coarse alignment of the times of arrival of the key pulse in the pulse train as detected by multiple sensors. Normally, this key pulse is the one with the largest absolute amplitude within a set of pulses spanning a total time period of a couple of milliseconds, although the choice of the key pulse is not absolutely critical. Following this initial time alignment, the method then identifies the sensor that detected the greatest number of pulses and uses that pulse sequence as a reference. The pulse sequences detected by the remaining sensors are then compared with the reference sequence with the objective of matching as many interpulse time intervals as possible. As long as at least some minimum number of time intervals can be matched, from some minimum number of sensors, then the time alignment is considered successful, and the central processor proceeds to calculate the positions of the successfully aligned pulses using the relevant times of arrival as well as an angle measurement from each sensor that reports at least two events in the pulse train.
Figure 2a (from Murphy and Said 2018) shows a set of pulses detected by four NLDN sensors in the central United States following just the coarse time alignment stage of the pulse train processing. The pulses were detected over a period of 2 ms. The measured amplitudes of all pulses are represented by the vertical axes in Fig. 2a, which all have the same scale. It is not readily obvious which pulses align in time until the finescale interval alignment stage is complete. The result of that stage is shown in Fig. 2b, together with some visual guides to several clusters of pulses that match and can therefore be geolocated by the NLDN central processor. Altogether, 17 pulses from just these four sensors were successfully time aligned via this process, and 14 of those produced successful quality-controlled positions.
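The finescale interval alignment stage can be illustrated with a simple stand-in technique. The sketch below is not the method of Murphy et al. (2004a,b), whose details are not given here. It is a brute-force search that tries every time shift implied by pairing one reference pulse with one pulse from another sensor and keeps the shift that matches the most pulses. The function names, the 2-μs matching tolerance, and the minimum of three matched pulses are all hypothetical values chosen for illustration.

```python
def best_alignment(reference_us, other_us, tolerance_us=2.0):
    """Find the time shift that aligns the most pulses of one sensor with
    the reference sequence.

    reference_us: pulse times (microseconds) from the sensor with the most pulses.
    other_us: pulse times from another sensor after coarse alignment.
    Returns (shift_us, matches), where matches is a list of
    (reference_index, other_index) pairs.
    """
    best_shift, best_matches = 0.0, []
    for t_ref in reference_us:
        for t_oth in other_us:
            shift = t_ref - t_oth  # candidate shift from this pairing
            matches, used = [], set()
            for j, t in enumerate(other_us):
                # Nearest reference pulse not yet matched under this shift.
                candidates = [(abs(r - (t + shift)), i)
                              for i, r in enumerate(reference_us) if i not in used]
                if not candidates:
                    break
                err, i = min(candidates)
                if err <= tolerance_us:
                    matches.append((i, j))
                    used.add(i)
            if len(matches) > len(best_matches):
                best_shift, best_matches = shift, matches
    return best_shift, best_matches


def alignment_succeeded(matches, min_matched_pulses=3):
    """Hypothetical success test: enough pulses matched for this sensor."""
    return len(matches) >= min_matched_pulses
```

With a reference sequence at 0, 40, 95, 180, and 260 μs and a second sensor reporting pulses at 47, 102, 150, and 267 μs, this search returns a shift of −7 μs with three matched pulses; the pulse at 150 μs has no counterpart and would not contribute a time of arrival.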
4. Classification algorithm and the effects of changes
The accuracy of classification of individual CG strokes and IC pulses has obvious operational importance to users who need to assess damage to structures or wildfire starts, or evaluate insurance claims. Changes in classification over the course of network upgrades also take on major significance in long-term studies of lightning risk/exposure and lightning climatology. In the NLDN, changes in classification due to upgrades are attributable both to changes to the sensors over the years and changes to algorithms in the central processing system, as well as interactions between the two. The goal of this section is to look back further than the 2013 sensor upgrade, considering changes to both sensor and central processor algorithms.
a. History of classification in the NLDN
Before the 1994–95 IMPACT upgrade to the NLDN (Cummins et al. 1998) and the 2002–03 upgrade of all remaining Lightning Position and Tracking System (LPATS) (time-only) sensors to IMPACT (angle and time) sensors (Biagi et al. 2007; Cummins and Murphy 2009; Orville et al. 2011), the primary objective was not classification per se, but rather the elimination of anything that was not recognized as a CG stroke at the sensor level. In pre-1994 direction-finding sensors as well as various models of the IMPACT sensor (1994 to the mid-2000s), this was accomplished by discarding, at the sensor level, waveforms that (i) had peak-to-zero (PTZ) times less than 15 μs, (ii) were bipolar, and/or (iii) had multiple peaks on the rising edge of the first half cycle (Krider et al. 1980). Likewise, the sensors also eliminated adjacent pairs of opposite-polarity peaks that were, in reality, bipolar pulses but were detected as if they were two separate unipolar pulses. Initially, the sensor-level rejection was based on the foregoing wave shape criteria given by Krider et al. (1980). IMPACT sensors used somewhat relaxed values of these same criteria so as to be able to detect more distant CG strokes (Cummins et al. 1998); at some point, the 15-μs PTZ restriction was reduced to 10 μs (Wacker and Orville 1999) with an option to go to 7 μs as used in a deployment in the Pacific during TOGA COARE (Lucas and Orville 1996). In any case, no information from any eliminated waveforms was even transmitted to the NLDN central processor at the time, and the central processor did not even have what would today be called a “classification algorithm.”
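The sensor-level rejection described above amounts to a simple filter on each waveform. The sketch below is only a schematic of the three criteria listed in the text. The function name and the representation of bipolarity as a Boolean flag and of the rising edge as a peak count are assumptions; the actual sensor criteria (Krider et al. 1980) are more detailed. The PTZ limit is left as a parameter because the text cites values of 15, 10, and 7 μs for different sensors and deployments.

```python
def passes_legacy_rejection(ptz_us, is_bipolar, rising_edge_peaks, min_ptz_us=15.0):
    """Return True if a waveform would be kept as a CG stroke candidate.

    ptz_us: peak-to-zero time in microseconds.
    is_bipolar: hypothetical flag marking a bipolar waveform.
    rising_edge_peaks: hypothetical count of peaks on the rising edge
        of the first half cycle.
    min_ptz_us: PTZ limit; 15 us originally, later relaxed (see text).
    """
    if ptz_us < min_ptz_us:
        return False  # criterion (i): PTZ time too short
    if is_bipolar:
        return False  # criterion (ii): bipolar waveform
    if rising_edge_peaks > 1:
        return False  # criterion (iii): multiple peaks on the rising edge
    return True
```

A rejected waveform in this scheme is simply dropped, which mirrors the fact that no information about eliminated waveforms reached the central processor.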
As part of the 1994–95 IMPACT upgrade of the NLDN, sensors of the LPATS model were incorporated into the NLDN (Cummins et al. 1998). The LPATS sensors initially had minimal capability to reject non-CG-stroke signals, but some of the rejection capabilities of the IMPACT sensors were subsequently added (Cummins et al. 1998). To make the best use of the signal rejection capabilities (as well as signal calibration) of the IMPACT sensors, however, the central processor was set to reject any lightning event that did not have at least one contributing IMPACT sensor (Orville and Huffines 2001). Despite this, Cummins et al. (1998) noted an increase in the detection of small positive “CG” strokes that were likely misidentified IC pulses and recommended that positive events with peak current <10 kA be regarded as IC pulses.
Between the 2002–03 upgrade, when all LPATS sensors were converted to IMPACT sensors (Biagi et al. 2007), and the time of a central processor update in April 2006 (Cummins et al. 2006; Fleenor et al. 2009), the focus shifted from the elimination of IC pulses by the sensors to the classification of detected events as either IC pulses or CG strokes at the central processor. Already by the time of the 1994–95 IMPACT upgrade, the sensors were transmitting one parameter, the PTZ time, that is relevant to classification, but the central processor was not set up to make use of the information. Starting with the 2002–03 sensor upgrade, however, the central processor became capable of using the PTZ time to distinguish between IC pulses and CG strokes. Note that the sensors at this time continued to use the signal rejection criteria described above, but the only information available to the central processor was PTZ time. Also note that IC pulses were not output from the central processor until April 2006, when limited IC pulse information was first made available (Cummins et al. 2006).
Previously, Cummins et al. (1998) indicated that positive events with peak currents below 10 kA should be regarded as likely IC pulses, but when IC pulses were first made available in NLDN data (Cummins et al. 2006), that threshold was raised to 15 kA and formally introduced into the NLDN central processing algorithm. Had this fixed threshold been in place in the Biagi et al. (2007) study, the percentage of misclassified positive events would have been 13% (18 of 137) in both directions (actual ICs misidentified as CGs and actual CGs misidentified as ICs). Fleenor et al. (2009) noted that some of the positive events in their dataset were misidentified because the sensors detected an opposite-polarity peak; inclusion of such events suggests that the overall misclassification rate in their dataset was around 33%.
As noted above in section 2, the 2013 upgrade of the sensors brought along a greatly expanded format of the data sent back from each sensor to the central processor. This includes all of the features that had previously been used as the basis of rejection at the sensor level (e.g., the degree of bipolarity of pulses, the number of peaks on the leading edge of the waveform), as well as the PTZ time. The new format also includes any pulse trains either preceding or following the main pulse. This new data format did not become the operational format until the August 2015 central processor upgrade, and thus, the pre-2015 classification algorithm still used just the PTZ time to distinguish between IC pulses and CG strokes. The pre-2015 method also included the fixed limit of +15 kA (Cummins et al. 2006) to reclassify any remaining small positive “CG strokes” as IC pulses.
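The pre-2015 classification logic described above can be summarized in a few lines. This is a schematic sketch only. The +15-kA limit comes from the text, but the PTZ threshold used by the central processor is not stated there, so it is a required argument with no default. The function name and the assumption that shorter PTZ times indicate IC pulses (by analogy with the sensor-level criteria in section 4a) are also illustrative.

```python
def classify_pre2015(ptz_us, peak_current_ka, ptz_threshold_us, positive_limit_ka=15.0):
    """Schematic PTZ-only classification with the fixed positive limit.

    ptz_us: peak-to-zero time in microseconds.
    peak_current_ka: signed estimated peak current in kA.
    ptz_threshold_us: assumed PTZ threshold; not specified in the text.
    positive_limit_ka: positive events below this value become IC pulses.
    """
    # Step 1: PTZ-only decision (assumed direction: short PTZ means IC).
    label = "CG" if ptz_us >= ptz_threshold_us else "IC"
    # Step 2: reclassify any remaining small positive "CG strokes" as IC pulses.
    if label == "CG" and 0 < peak_current_ka < positive_limit_ka:
        label = "IC"
    return label
```

Under these assumptions, a positive event of +12 kA is always reported as an IC pulse regardless of its PTZ time, whereas a negative event of the same magnitude is classified on PTZ time alone.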
The greatly expanded set of wave shape parameters offered by the new sensor data format enabled a change to the classification algorithm: a multiparameter classification, as described by Nag et al. (2016). This multiparameter classification method involves a linear combination of multiple features available in the new format: PTZ, peak current, rise time, waveform width, time to any opposite-polarity peak, degree of bipolarity, and duration and number of pulses in pulse trains, if any, before and after the main pulse. It is an explicitly polarity-segregated method insofar as different sets of weights are applied to positive and negative events. This algorithm was included in the August 2015 central processor upgrade but with the fixed limit of +15 kA, below which all positive events were automatically classified as IC pulses, still in place. That limit was subsequently switched off on 23 March 2016, and the result was documented in Figs. 5 and 6 of Nag et al. (2016), clearly indicating the reintroduction of the problem of small positive discharges that were most likely IC pulses but were classified as “CG strokes.” Because of this final change, the classification results described in subsequent sections of this paper refer to the “March 2016” classification, but all other aspects of the central processor that became operational on 18 August 2015, are still referred to as the “August 2015 central processor upgrade.”
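The structure of a polarity-segregated linear classifier, with and without the fixed +15-kA limit, can be sketched as follows. The operational weights and decision threshold are not given in this paper, so the weights, bias, feature names, and the convention that a positive score indicates a CG stroke are all hypothetical. Only the overall structure (a linear combination of waveform features, separate weight sets by polarity, and an optional positive-current limit) follows the description above.

```python
def linear_score(features, weights, bias=0.0):
    """Linear combination of the named waveform features."""
    return bias + sum(weights[name] * features[name] for name in weights)


def classify_linear(features, peak_current_ka, positive_model, negative_model,
                    positive_limit_ka=None):
    """Schematic polarity-segregated linear classification.

    positive_model, negative_model: (weights, bias) pairs; values are
        placeholders supplied by the caller, not operational weights.
    positive_limit_ka: 15.0 mimics the August 2015 configuration;
        None mimics the March 2016 configuration (limit switched off).
    """
    weights, bias = positive_model if peak_current_ka > 0 else negative_model
    # Assumed convention: positive score means CG stroke.
    label = "CG" if linear_score(features, weights, bias) > 0 else "IC"
    if positive_limit_ka is not None and 0 < peak_current_ka < positive_limit_ka:
        label = "IC"
    return label
```

Calling classify_linear on the same small positive event with positive_limit_ka=15.0 and then with positive_limit_ka=None shows how switching off the limit allows such an event to be reported as a “CG stroke” whenever the linear score favors that class.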
Because of the relative rarity of +CG strokes in most areas of the United States, the overall rate of misclassification was estimated by Nag et al. (2016) to be under 5%, or alternatively, the overall classification accuracy was expected to be at least 95%. Similar levels of classification accuracy were verified by Zhu et al. (2016) over most types of pulses. However, the significant percentage of misclassified small positive discharges prompted a thorough reanalysis of the classification problem. In 2017, Vaisala undertook a study in conjunction with M. D. Tran (at the time, at the University of Florida) in order to refine and improve upon the March 2016 multiparameter classification method.
The current (as of November 2018) classification algorithm takes into consideration the same essential parameters mentioned above with respect to the March 2016 classification algorithm but in a nonlinear method that also includes some quantities derived at the central processor once the position of each event is known (e.g., distance from sensor to lightning event). In addition, the current revision of the multiparameter classification no longer applies separate weights to positive and negative events. Instead, polarity is taken into account in the form of a signed signal amplitude, which is one of the many waveform characteristics.
b. Effects of classification changes, using data from summer, 2018
Historically, statistics on CG flashes and/or strokes, such as annual or monthly densities of negative and positive CG flashes or strokes, have been regarded as the most relevant information from the point of view of both exposure and risk analysis. CG flashes and strokes also happen to have the longest history of observations from automated ground-based lightning locating systems, though future climatological studies may start including IC pulses and flashes. The effects of various upgrades are observable in CG statistics. For example, in the north-central United States, Orville et al. (2011) show that the median peak current of first strokes in negative CG flashes dropped from the 18–20-kA range in most areas to the 8–12-kA range just as a result of the 2003 sensor upgrade. Fortunately, recent changes to the classification algorithm in the central processing system have been out of phase with respect to the 2013 sensor upgrade, so that sensor and algorithm upgrades are separable. Despite that separation, the interpretation of CG stroke and flash statistics over time remains complex.
Table 2 provides the counts of events over the entire NLDN in four classes, +CG, +IC, −CG, and −IC, on a set of 21 busy lightning days in July and August of 2018. The total number of events detected by the NLDN over those 21 days was approximately 41.1 million. The second column provides the same counts except using the classification scheme that was in effect in the pre-2015 central processing system, namely, the PTZ-only method together with the automatic reclassification of all positive events with estimated peak currents < +15 kA as IC pulses. The third column in Table 2 shows the counts of events in each of the four categories under the March 2016 linear multiparameter classification. The fourth column presents the counts using the current, November 2018, update to the multiparameter classification scheme. The remaining columns show the percent changes in each category under the current classification relative to the prior versions. It is important to note that Table 2 was made without any reprocessing of sensor-level data, so that only the classification differences are taken into account; the positions and times of all of the lightning events were produced by the location algorithm that was operational as of August 2015, and then variations in classification were applied separately.
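The bookkeeping behind a table of this kind is straightforward: the same set of located events is labeled under each classification scheme, the events are counted in the four classes, and the percent change is computed relative to the prior version. A minimal sketch follows; the function names and the toy labels in the usage note are illustrative and are not values from Table 2.

```python
from collections import Counter

CLASSES = ("+CG", "+IC", "-CG", "-IC")


def class_counts(labels):
    """Count events in each of the four polarity/type classes."""
    counts = Counter(labels)
    return {c: counts.get(c, 0) for c in CLASSES}


def percent_change(current, prior):
    """Percent change of current counts relative to prior counts.

    Returns None for a class whose prior count is zero.
    """
    return {
        c: 100.0 * (current[c] - prior[c]) / prior[c] if prior[c] else None
        for c in CLASSES
    }
```

For example, if a toy sample has four “+CG” events under a prior scheme and one under the current scheme, percent_change reports −75% for that class. Because the events themselves are identical in every column, the changes across all four classes reflect reclassification only.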
Table 2. Counts from 21-day sample in mid-2018, over the entire NLDN.
In general, we note a major shift (in relative terms) from positive-polarity CG strokes to positive-polarity IC pulses as a result of the current classification algorithm relative to the March 2016 method. This reclassification of many formerly “+CG” events includes most of the events with peak currents < 15 kA that were shown by Nag et al. (2016) to have been called “+CG” as a result of switching off the fixed threshold of +15 kA below which all positive discharges were automatically reclassified as IC pulses. The pre-2015 approach of applying that fixed threshold has also been followed in many long-term climatological studies that span network upgrades (e.g., Rudlosky and Fuelberg 2010; Orville et al. 2011; Koshak et al. 2015). Note in Table 2 that negative-polarity events also exhibit a shift from CG to IC under the current classification. As shown in more detail in the following subsections, the negative events that are reclassified as IC pulses have estimated peak currents below 10 kA, so the shift in their classification appears reasonable.
Because of regional differences in lightning characteristics, the following sections provide greater detail in two subregions of the United States. The first is the interior southeastern United States (32° to 37° latitude, −88° to −81° longitude—in other words, not too close to coastlines where detection efficiency drops off), where most CG strokes are negative and most IC flashes are “normal” insofar as they occur between a midlevel negative charge region in the cloud and an upper positive, and they thus have positive polarity. The second is the northern Great Plains (39° to 47° latitude, −102° to −96° longitude) because that area has a higher proportion of real positive CG strokes and a large proportion of low-altitude IC discharges that are expected to produce pulses of mainly negative polarity (Bruning et al. 2014; Fuchs and Rutledge 2018).
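Readers who wish to reproduce regional statistics of this kind from stroke-level data can select events with a simple latitude and longitude box. The sketch below uses the box limits given above; the dictionary keys, the event field names (lat, lon, label, peak_current_ka), and the treatment of the box edges as inclusive are assumptions made for illustration.

```python
from statistics import median

# (lat_min, lat_max, lon_min, lon_max) in degrees, from the text.
REGIONS = {
    "interior_southeast": (32.0, 37.0, -88.0, -81.0),
    "northern_great_plains": (39.0, 47.0, -102.0, -96.0),
}


def in_region(lat, lon, region):
    """True if the point lies inside the named box (edges inclusive)."""
    lat_min, lat_max, lon_min, lon_max = REGIONS[region]
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def median_abs_peak_current(events, region, label):
    """Median absolute peak current (kA) of events with a given class
    label inside a region, or None if there are no such events."""
    values = [abs(e["peak_current_ka"]) for e in events
              if e["label"] == label and in_region(e["lat"], e["lon"], region)]
    return median(values) if values else None
```

Applying median_abs_peak_current to the same events labeled under two classification versions gives the kind of before-and-after median comparison discussed in the following subsections.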
In the discussion of these two subregions, we present peak current distributions of events classified as positive and negative CG strokes (Figs. 3–6) under four different scenarios: red shows the November 2018 classification algorithm (currently operational), blue shows the March 2016 algorithm, a dashed green line shows the pre-2015 classification algorithm, and last, a dotted gray line shows the results of mimicking the pre-2002 hardware-based rejection of cloud signals. This last result was done by taking sensor data from the 21 busy lightning days in July and August of 2018 and eliminating all sensor records that would have failed the hardware rejection criteria from the pre-2002 IMPACT sensors, as described above in section 4a. The filtered sensor records were then reprocessed through the operational location algorithm but using the pre-2015 classification algorithm, because that is reasonably representative of how the central processor functioned at that time. The pre-2015 and mimicked pre-2002 distributions are shown as a point of reference, but the discussion in the following two sections centers on comparing the March 2016 and November 2018 algorithms.
c. Interior southeastern United States
Figure 3 shows distributions of peak current (showing up to 50 kA only) of events classified as positive CG strokes in the interior SE United States under the four conditions described above. Under the November 2018 method, the count of “+CG” strokes in the interior SE United States is reduced by 63% relative to the March 2016 classification method; that is, 63% of the events formerly classified as +CG are now classified as +IC pulses. However, the median peak current of those events that are still classified as “+CG” strokes is actually now slightly lower than under the March 2016 classification: +7 kA now versus +9 kA before. Positive CG strokes are relatively rare in the southeast United States; Zhu et al. (2016) had only 26 of 367 (7.1%) +CG strokes in their dataset, and they indicated that this was a higher-than-normal percentage. However, when +CG strokes do occur in the southeast United States, their peak currents are fairly high: where peak current is mentioned at all by Nag and Rakov (2012), out of 51 natural +CG strokes in their dataset from Gainesville, Florida, none had peak current below 31 kA. Biagi et al. (2007) found that only 6 of 41 +CG strokes (14%) confirmed on video in northern Texas and southern Oklahoma had peak current ≤ 10 kA. Thus, our finding that the median peak current of “+CG” strokes in the southeast United States is still rather small strongly suggests that the majority of events that are still classified as “+CG” in the SE United States are actually IC pulses. The current classification algorithm, while making a significant improvement over the March 2016 version, still appears to need work.
Negative CG strokes in the interior SE United States are not expected to have any substantial issue. However, we note that the current classification takes 12.4% of the −CGs as classified by the March 2016 algorithm and turns them into −IC pulses. The median (absolute value) of peak current increases from about 12 kA before to almost 14 kA under the current classification. Figure 4 shows the distributions of absolute value of peak current of events classified as −CG strokes in the SE United States, in the same format as Fig. 3. The current classification algorithm decreases the counts of events below 8 kA significantly, and essentially does nothing to the counts of events above 12 kA. The decrease in low-current events and the slight rise in the median peak current of −CG strokes probably indicate that the new classification properly reclassifies small negative events that had previously been classified as −CG strokes. The median peak currents of −CG strokes over the southeastern United States were shown by Orville et al. (2011) to be between 12 and 18 kA, and similar numbers were also presented by Koshak et al. (2015). Most of the median peak current values of subsequent −CG strokes measured directly (but not in the southeastern United States) are also in the 12–18-kA range (Rakov et al. 2013).
d. Northern Great Plains
In the northern Great Plains, the November 2018 classification method reclassifies slightly over 50% of the events formerly labeled as “+CG” strokes by the March 2016 method. Figure 5 shows the distributions of +CG peak currents in this region, in the same format as Fig. 3. The events that are left as “+CG” strokes are predominantly high-current events: The median peak current of “+CGs” in the northern plains rises from 21.5 kA under the March 2016 classification to 35 kA under the current classification method. We should note that the current algorithm was developed with essentially no ground-truth data from the high plains area, with the exception of some events from the Colorado LMA (Barth et al. 2015) that were manually classified based on the LMA data alone rather than waveforms or videos.
It is also important to note that the +CG peak current distribution in Fig. 5 has both some similarities and some notable differences with respect to prior information about +CG peak currents. Particularly, MacGorman and Taylor (1989), in central Oklahoma, observed a mode in the +CG peak current distribution at the equivalent of 9–13 kA (25–50 LLPU), similar to the low-current mode in Fig. 5, as well as an extended tail on the distribution extending to peak currents well beyond 50 kA, where Fig. 5 stops. In mesoscale convective systems in the central Great Plains, MacGorman and Morgenstern (1998) found essentially a superposition of three peak current distributions, with the “small-amplitude” distribution still giving large numbers of low-current strokes even if an estimated 15% of “false detections” (likely ICs) were to be removed.
In the northern Great Plains, +CG strokes compose a larger proportion of the CG strokes relative to other areas of the United States (Orville et al. 2011), and deep, low- to midaltitude positive charge regions are more common (MacGorman et al. 2005; Rust et al. 2005; Bruning et al. 2014; Fuchs et al. 2018; Fuchs and Rutledge 2018), as are high ratios of IC to CG flashes (Boccippio et al. 2001; MacGorman et al. 2011; Fuchs et al. 2015; Medici et al. 2017). The combination of anomalous charge structure, with deep positive charge in lower altitude portions of the storms, and high IC:CG ratios, also causes the northern plains region to be relatively rich in negative-polarity IC pulses, as confirmed by Fleenor et al. (2009) in storms in which CG strokes were primarily positive. Thus, we expect the current algorithm to reclassify a sizable portion of events classified as “−CG” strokes in the northern plains as IC pulses, particularly events with lower amplitudes, and indeed, this is what we find. Namely, the median (absolute value) peak current of “−CG” strokes under the March 2016 classification is 9.5 kA, but it rises to 12.5 kA under the November 2018 classification. About 29.5% of events in the northern plains that had been classified as −CG strokes are reclassified as −IC pulses under the November 2018 algorithm (a lower proportion than the 63% of “+CG” events that are reclassified in the southeast). Figure 6 shows the distribution of absolute value of peak current in events classified as −CG strokes in the northern plains, in the same form as Fig. 4.
5. Normalization of classification across years/upgrades
As just described, changes to the classification algorithm can lead to significant changes in the numbers and peak current distributions of events classified as CG strokes, negatives as well as positives. Thus, classification algorithm changes are a significant source of artificial discontinuities in long-term climatology studies, similar to but even larger than changes in detection efficiency. Detection efficiency can be normalized across network upgrades via a method described in Medici et al. (2017) and references therein. Classification, however, presents a different challenge. Here, we lay out a straightforward method of normalizing classification across upgrades, using just two parameters that are available in the most common end-user formats of NLDN data: 1) estimated peak current and 2) PTZ time. We predicate this discussion on the goal of normalizing to the current, November 2018, classification algorithm, the best that we have achieved to date with a network that detects a large percentage of IC pulses. We take this approach despite the fact that the preceding section shows that the current algorithm has some inaccuracies that remain to be characterized and corrected.
The two parameters mentioned above, peak current and PTZ, have significant weight in the current classification method. The simple two-parameter approach to normalizing across classification algorithm updates is a linear regression, with one set of coefficients applying to negative events and the other to positive events. The coefficients of the linear regressions are determined using the independently characterized dataset from 2017 that was used to train the now-operational November 2018 classification algorithm. Finally, the performance of these simple linear classifiers is compared with the full, November 2018 multiple-parameter classification method in two separate datasets, one from 2018 and the other from 2012. In both sets, the data were first reprocessed using the current classification method, because their original classifications were determined using pre-2018 algorithms. The classifications given by the November 2018 multiple-parameter classification algorithm are taken as “truth,” and these are compared against the results of the two-parameter linear regression.
Figure 7 presents a graphical summary of the simple linear classifier as applied to data from 21 days in summer 2018. The upper half of the graph corresponds to CG strokes, and the lower half to IC pulses. The left half applies to negative events, and the right half to positive events. All of the bars are scaled by the number of “true” events of each type, so that the vertical scale goes from 0 to 1. A pair of bars appears in each stack: one blue bar representing correctly classified events, and one red bar representing incorrectly classified events. The numbers that appear above or below each pair of bars give percentages of events that are classified in each type relative to the number of “true” events of that type. Thus, the “101.1” under the leftmost pair of bars says that the total number of events that are classified as −CG strokes using the simple two-parameter linear regression is 101.1% of the “true” number of −CG strokes in the 2018 dataset. We immediately see that the simple two-parameter linear regression slightly overcounts −CG strokes at the expense of −IC pulses. On the positive side, the simple reclassification method overcounts +CG strokes at the expense of +IC pulses. Because positive CG strokes are so rare relative to positive IC pulses, the misclassifications are such that the total number of events classified as +CG strokes is increased by 158.1% relative to the “true” number of +CG strokes in the test dataset, but those same misclassifications result in an underestimation of +IC events by just 4.9%.
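The bar scaling described for Fig. 7 can be illustrated with a short tally. The following is a minimal sketch, not the code used to produce the figure; it assumes that the red bar counts events assigned to a type by the linear classifier whose “true” type is different, and the function name `bar_summary` and the label strings are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bar_summary(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Summarize (true_label, predicted_label) pairs, one pair per event.

    For each type, all values are scaled by the number of "true" events
    of that type, as in the bar chart description.
    """
    pairs = list(pairs)
    true_counts = Counter(t for t, _ in pairs)
    # Events assigned to a type that really are of that type (blue bar).
    correct = Counter(p for t, p in pairs if t == p)
    # Events assigned to a type whose "true" type differs (red bar, assumed).
    incorrect = Counter(p for t, p in pairs if t != p)

    summary = {}
    for label, n_true in true_counts.items():
        n_assigned = correct[label] + incorrect[label]
        summary[label] = {
            "correct": correct[label] / n_true,
            "incorrect": incorrect[label] / n_true,
            "percent_of_true": 100.0 * n_assigned / n_true,
        }
    return summary


# Synthetic example: +CG strokes are rare relative to +IC pulses.
example = (
    [("+CG", "+CG")] * 8 + [("+CG", "+IC")] * 2
    + [("+IC", "+IC")] * 90 + [("+IC", "+CG")] * 10
)
result = bar_summary(example)
```

In this synthetic case, the same 10 misclassified +IC pulses inflate the +CG total far more, in relative terms, than the 2 misclassified +CG strokes change the +IC total, which is the asymmetry noted above for rare event types.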
The simple two-parameter linear regression is also applied to a set of NLDN data from 2012, which is before the 2013 sensor upgrade, the August 2015 central processor upgrade, and the March 2016 classification algorithm. The 2012 dataset is composed of 35 days sampled from the months of June, July, and August. Figure 8 shows a bar chart of the same type as Fig. 7 on the 2012 dataset. Although the network underwent a substantial improvement in detection efficiency in the northern United States in 1998 with the first installation of the Canadian Lightning Detection Network, and sensor upgrades in 2002–03 (Cummins and Murphy 2009), the basic sensor-level data remained essentially the same until the 2013 sensor upgrade, and the PTZ-only classification was in effect. Thus, although we have not repeated this analysis on datasets going back to 1998, we expect that the 2012 results are likely applicable to datasets from the April 2006 central processor update (Cummins et al. 2006) back to the 1994–95 IMPACT upgrade (Cummins et al. 1998). However, the application of this proposed method to data prior to 2006 deserves a dedicated study, which we defer to future work. A comparison of Fig. 8 with Fig. 7 shows that the relative impact of the simple two-parameter classification on negative CG strokes is minimal and consistent across these 2 years. The relative impacts of the simple two-parameter approach on the other categories are much more variable, in large part because of the massive increase in the number of IC pulses detected after the 2013 sensor upgrade.
Note that the classification algorithm is applied at the level of individual strokes/pulses. Changes in the counts of flashes are determined not only by the classifications of the individual strokes and pulses but also by the combination of these by the flash clustering algorithm (Fig. 1). A flash that contains any CG strokes is defined as a CG flash, and the polarity of the first CG stroke determines the polarity of the flash. A flash that contains only IC pulses is defined as an IC flash, and the polarity of the first pulse determines the polarity of the flash. Table 3 shows the percentage change in the counts of flashes, as well as strokes and pulses, as a result of applying the simple two-parameter linear classifier to subsets of the data from 2018 and 2012 followed by flash clustering. The linear classifier increases the 2018 −CG flash count by about 1.6% relative to the current operational classification algorithm and decreases the 2012 −CG flash count by 8.6%.
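The flash type and polarity rules stated above can be sketched as follows. This is an illustration only: it assumes the events have already been grouped into one flash by the clustering algorithm (which is not reproduced here), and the `Event` fields and the function name `label_flash` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    time_s: float           # event time in seconds (hypothetical field name)
    kind: str               # "CG" or "IC", as assigned by the classifier
    peak_current_ka: float  # signed peak current; the sign gives the polarity


def label_flash(events: List[Event]) -> Tuple[str, str]:
    """Return (flash_type, polarity) for events already grouped into one flash."""
    if not events:
        raise ValueError("a flash must contain at least one event")
    ordered = sorted(events, key=lambda e: e.time_s)
    cg_strokes = [e for e in ordered if e.kind == "CG"]
    # Any CG stroke makes this a CG flash, and its first CG stroke sets the
    # polarity. Otherwise it is an IC flash and the first pulse sets it.
    reference = cg_strokes[0] if cg_strokes else ordered[0]
    flash_type = "CG" if cg_strokes else "IC"
    polarity = "+" if reference.peak_current_ka > 0 else "-"
    return flash_type, polarity


mixed_flash = [
    Event(0.00, "IC", 5.0),
    Event(0.02, "CG", -14.0),
    Event(0.08, "CG", 20.0),
]
cloud_flash = [Event(0.00, "IC", 6.0), Event(0.01, "IC", -3.0)]
```

Because a single reclassified stroke can change both the type and the polarity of an entire flash, changes in stroke-level classification do not map one-to-one onto changes in flash counts.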
Stroke and flash counts from a 21-day sample in mid-2018 and a 28-day sample in mid-2012, using the simple two-parameter linear regressions (LR) vs the current (November 2018) classification algorithm.
The bottom line of Figs. 7 and 8 is that it is possible to apply a simple two-parameter linear regression to datasets prior to the November 2018 algorithm update and mimic the current classification of negative CG strokes with an error of less than 10%. Thus, this simple approach to reclassifying events from earlier datasets at least helps to deal with long-term studies of negative CG strokes and flashes. Positive events and negative IC pulses, however, are more severely affected by this simple approach. The appendix provides the linear regression coefficients derived in this analysis.
6. Location accuracy
Cramer and Cummins (2014) demonstrated the ability to produce ground-truth location accuracy information over wide areas by identifying lightning strokes to tall towers. Based on a sample of 2022 strokes in 2013, just after the NLDN sensor upgrade, they found a median location error of 83 m, with 1.15% of those strokes having location errors of 500 m or more. As noted by Cramer and Cummins, strokes to tall towers tend to be located more accurately than the general population of CG strokes, because of shorter rise times on the leading edge of waveforms and because of higher signal amplitudes that lead, in turn, to detection by larger numbers of sensors on average. From a sample of 62 rocket-triggered strokes at Camp Blanding, Florida, in 2013, Mallick et al. (2014) found a median location error of 173 m. Over most of the interior of CONUS, we anticipate a median location error of approximately 150 m from the general population of CG strokes (not specifically tower strokes or rocket-triggered strokes). Zhu et al. (2020) provide an update on tower strokes using the same essential approach and find an overall median location error of 84 m.
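The two statistics quoted above (the median location error and the fraction of strokes with errors of 500 m or more) can be computed from reported stroke positions and a known tower position. The sketch below is not the procedure of Cramer and Cummins (2014); it assumes a spherical Earth and a haversine great-circle distance, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption


def location_error_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in meters between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def summarize_errors(errors_m: Sequence[float], threshold_m: float = 500.0) -> Tuple[float, float]:
    """Return (median error, fraction of errors at or above threshold_m)."""
    if not errors_m:
        raise ValueError("need at least one location error")
    fraction = sum(e >= threshold_m for e in errors_m) / len(errors_m)
    return median(errors_m), fraction


# Synthetic errors in meters, for illustration only.
med, frac = summarize_errors([50.0, 83.0, 600.0])
```

A more careful analysis would use an ellipsoidal Earth model, but at separations of a few hundred meters the difference from the spherical approximation is small relative to the errors being measured.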
7. Conclusions
The U.S. NLDN has undergone a number of significant upgrades between 2013 and the time of this paper. The purpose of this review is to provide a description of those upgrades, particularly the algorithms applied in the central processing system.
Of particular importance both to operational users of NLDN data and to long-term lightning climatology studies is the proper classification of detected events as either CG strokes or IC pulses. In this paper, we provide a linear regression that only requires two variables that are available in common NLDN data formats and that permits the normalization of classifications of events across NLDN upgrades going back prior to the 2013 sensor upgrade. The simple two-variable linear regression reproduces the current operational classification of negative CG strokes to within 10%. It is less effective at reproducing the operational classification of positive events (particularly +CG strokes due to their overall rarity) and negative IC pulses. Future work will be dedicated to independently characterized datasets that can be used both to validate the classification algorithm and to make necessary improvements to it.
Acknowledgments
The NLDN and GLD360 data used in this paper belong to Vaisala Inc. These data are available for purchase or license from Vaisala by contacting B. Pearson (brooke.pearson@vaisala.com). We greatly appreciate the efforts of Don MacGorman and two anonymous reviewers to assist us via very helpful and insightful comments on the first draft.
APPENDIX
Coefficients of Two-Parameter Linear Regression Classifiers
Coefficients of the simple linear regressions are provided in Table A1.
Coefficients of the simple two-parameter linear regressions. The independently characterized dataset from 2017 was subdivided by polarity, and then the two features used to derive the linear regression coefficients were peak current (kA) and PTZ time (μs). Linear regression: $y = \theta_{Ipk} \times I_{pk} + \theta_{ptz} \times \mathrm{PTZ} + \theta_{intcp}$; then IC if $y > 0$, CG if $y \le 0$; $I_{pk}$ is the signed value of peak current given in kA and PTZ time is given in μs.
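The decision rule in the caption above can be sketched in a few lines. The coefficient values below are placeholders invented for illustration and are not the Table A1 values; the names `Coefficients` and `classify` are likewise hypothetical. The sketch assumes that the coefficient set is selected by the sign of the peak current, with one set for negative events and one for positive events.

```python
from typing import NamedTuple


class Coefficients(NamedTuple):
    theta_ipk: float    # multiplies the signed peak current (kA)
    theta_ptz: float    # multiplies the PTZ time (microseconds)
    theta_intcp: float  # intercept


def classify(ipk_ka: float, ptz_us: float, neg: Coefficients, pos: Coefficients) -> str:
    """Apply y = theta_ipk * Ipk + theta_ptz * PTZ + theta_intcp; IC if y > 0, else CG."""
    # Negative events use one coefficient set, positive events the other.
    coef = neg if ipk_ka < 0 else pos
    y = coef.theta_ipk * ipk_ka + coef.theta_ptz * ptz_us + coef.theta_intcp
    return "IC" if y > 0 else "CG"


# PLACEHOLDER coefficients, NOT the published values; substitute Table A1.
DEMO_NEG = Coefficients(theta_ipk=0.1, theta_ptz=-0.2, theta_intcp=3.0)
DEMO_POS = Coefficients(theta_ipk=-0.1, theta_ptz=-0.2, theta_intcp=4.0)

strong_negative = classify(-30.0, 10.0, DEMO_NEG, DEMO_POS)
weak_negative = classify(-5.0, 5.0, DEMO_NEG, DEMO_POS)
```

With the placeholder values, a strong negative event with a long PTZ time falls on the CG side of the boundary and a weak one with a short PTZ time falls on the IC side; the actual boundary depends entirely on the coefficients in Table A1.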
REFERENCES
Barth, M. C., and Coauthors, 2015: Overview of the Deep Convective Clouds and Chemistry (DC3) field campaign. Bull. Amer. Meteor. Soc., 96, 1281–1309, https://doi.org/10.1175/BAMS-D-13-00290.1.
Biagi, C. J., K. L. Cummins, K. E. Kehoe, and E. P. Krider, 2007: National Lightning Detection Network (NLDN) performance in southern Arizona, Texas, and Oklahoma in 2003–2004. J. Geophys. Res., 112, D05208, https://doi.org/10.1029/2006JD007341.
Boccippio, D. J., K. L. Cummins, H. J. Christian, and S. J. Goodman, 2001: Combined satellite- and surface-based estimation of the intracloud–cloud-to-ground lightning ratio over the continental United States. Mon. Wea. Rev., 129, 108–122, https://doi.org/10.1175/1520-0493(2001)129<0108:CSASBE>2.0.CO;2.
Bruning, E. C., S. A. Weiss, and K. M. Calhoun, 2014: Continuous variability in thunderstorm primary electrification and an evaluation of inverted-polarity terminology. Atmos. Res., 135–136, 274–284, https://doi.org/10.1016/j.atmosres.2012.10.009.
Cramer, J. A., and K. L. Cummins, 2014: Evaluating location accuracy of lightning location networks using tall towers. 23rd Int. Lightning Detection Conf./Fifth Int. Lightning Meteorology Conf., Tucson, AZ, Vaisala, https://www.vaisala.com/sites/default/files/documents/Cramer%20et%20al-Evaluating%20LA%20of%20LLN%20using%20tall%20towers-2014-ILDC-ILMC.pdf.
Cummins, K. L., and M. J. Murphy, 2009: An overview of lightning locating systems: History, techniques, and data uses, with an in-depth look at the U.S. NLDN. IEEE Trans. Electromagn. Compat., 51, 499–518, https://doi.org/10.1109/TEMC.2009.2023450.
Cummins, K. L., M. J. Murphy, E. A. Bardo, W. L. Hiscox, R. B. Pyle, and A. E. Pifer, 1998: A combined TOA/MDF technology upgrade of the U.S. National Lightning Detection Network. J. Geophys. Res., 103, 9035–9044, https://doi.org/10.1029/98JD00153.
Cummins, K. L., J. A. Cramer, C. J. Biagi, E. P. Krider, J. Jerauld, M. A. Uman, and V. A. Rakov, 2006: The U.S. National Lightning Detection Network: Post-upgrade status. Second Conf. on Meteorological Applications of Lightning Data, Atlanta, GA, Amer. Meteor. Soc., 6.1, https://ams.confex.com/ams/Annual2006/techprogram/paper_105142.htm.
Fleenor, S. A., C. J. Biagi, K. L. Cummins, E. P. Krider, and X.-M. Shao, 2009: Characteristics of cloud-to-ground lightning in warm-season thunderstorms in the central Great Plains. Atmos. Res., 91, 333–352, https://doi.org/10.1016/j.atmosres.2008.08.011.
Fuchs, B. R., and S. A. Rutledge, 2018: Investigation of lightning flash locations in isolated convection using LMA observations. J. Geophys. Res. Atmos., 123, 6158–6174, https://doi.org/10.1002/2017JD027569.
Fuchs, B. R., and Coauthors, 2015: Environmental controls on storm intensity and charge structure in multiple regions of the continental United States. J. Geophys. Res. Atmos., 120, 6575–6596, https://doi.org/10.1002/2015JD023271.
Fuchs, B. R., S. A. Rutledge, B. Dolan, L. D. Carey, and C. J. Schultz, 2018: Microphysical and kinematic processes associated with anomalous charge structures in isolated convection. J. Geophys. Res. Atmos., 123, 6505–6528, https://doi.org/10.1029/2017JD027540.
Idone, V. P., A. B. Saljoughy, R. W. Henderson, P. K. Moore, and R. B. Pyle, 1993: A reexamination of the peak current calibration of the National Lightning Detection Network. J. Geophys. Res., 98, 18 323–18 332, https://doi.org/10.1029/93JD01925.
Koshak, W. J., K. L. Cummins, D. E. Buechler, B. Vant-Hull, R. J. Blakeslee, E. R. Williams, and H. S. Peterson, 2015: Variability of CONUS lightning in 2003–12 and associated impacts. J. Appl. Meteor. Climatol., 54, 15–41, https://doi.org/10.1175/JAMC-D-14-0072.1.
Krider, E. P., R. C. Noggle, A. E. Pifer, and D. L. Vance, 1980: Lightning direction-finding systems for forest fire detection. Bull. Amer. Meteor. Soc., 61, 980–986, https://doi.org/10.1175/1520-0477(1980)061<0980:LDFSFF>2.0.CO;2.
Lucas, C., and R. E. Orville, 1996: TOGA COARE: Oceanic lightning. Mon. Wea. Rev., 124, 2077–2082, https://doi.org/10.1175/1520-0493(1996)124<2077:TCOL>2.0.CO;2.
MacGorman, D. R., and W. L. Taylor, 1989: Positive cloud-to-ground lightning detection by a direction-finder network. J. Geophys. Res., 94, 13 313–13 318, https://doi.org/10.1029/JD094iD11p13313.
MacGorman, D. R., and C. D. Morgenstern, 1998: Some characteristics of cloud-to-ground lightning in mesoscale convective systems. J. Geophys. Res., 103, 14 011–14 023, https://doi.org/10.1029/97JD03221.
MacGorman, D. R., W. D. Rust, P. Krehbiel, W. Rison, E. Bruning, and K. Wiens, 2005: The electrical structure of two supercell storms during STEPS. Mon. Wea. Rev., 133, 2583–2607, https://doi.org/10.1175/MWR2994.1.
MacGorman, D. R., I. R. Apostolakopoulos, N. R. Lund, N. W. S. Demetriades, M. J. Murphy, and P. R. Krehbiel, 2011: The timing of cloud-to-ground lightning relative to total lightning activity. Mon. Wea. Rev., 139, 3871–3886, https://doi.org/10.1175/MWR-D-11-00047.1.
Mallick, S., and Coauthors, 2014: Performance characteristics of the NLDN for return strokes and pulses superimposed on steady currents, based on rocket-triggered lightning data acquired in Florida in 2004–2012. J. Geophys. Res. Atmos., 119, 3825–3856, https://doi.org/10.1002/2013JD021401.
Medici, G., K. L. Cummins, D. J. Cecil, W. J. Koshak, and S. D. Rudlosky, 2017: The intracloud lightning fraction in the contiguous United States. Mon. Wea. Rev., 145, 4481–4499, https://doi.org/10.1175/MWR-D-16-0426.1.
Murphy, M. J., and A. Nag, 2015: Cloud lightning performance and climatology of the U.S. based on the upgraded U.S. National Lightning Detection Network. Seventh Conf. on Meteorological Applications of Lightning Data, Phoenix, AZ, Amer. Meteor. Soc., 8.2, https://ams.confex.com/ams/95Annual/webprogram/Manuscript/Paper262391/AMS2015_MALD_8.2_murphy.pdf.
Murphy, M. J., and R. K. Said, 2018: Toward lightning mapping with long-baseline LF networks. 25th Int. Lightning Detection Conf./Seventh Int. Lightning Meteorology Conf., Fort Lauderdale, FL, Vaisala, https://www.vaisala.com/sites/default/files/documents/Toward%20Lightning%20Mapping%20with%20Long-baseline%20LF%20Networks_M.J.%20Murphy%20and%20R.K.%20Said.pdf.
Murphy, M. J., K. L. Cummins, and A. E. Pifer, 2004a: Lightning detection and data acquisition system. U.S. Patent 6 791 311, 42 pp.
Murphy, M. J., K. L. Cummins, and A. E. Pifer, 2004b: Lightning detection and data acquisition system. U.S. Patent 6 788 043, 42 pp.
Murphy, M. J., N. W. S. Demetriades, R. L. Holle, and K. L. Cummins, 2006: Overview of capabilities and performance of the U.S. National Lightning Detection Network. Second Conf. on Meteorological Applications of Lightning Data, Atlanta, GA, Amer. Meteor. Soc., J2.5, https://ams.confex.com/ams/pdfpapers/103980.pdf.
Murphy, M. J., A. Nag, J. A. Cramer, and A. E. Pifer, 2014: Enhanced cloud lightning performance of the U.S. National Lightning Detection Network following the 2013 upgrade. 23rd Int. Lightning Detection Conf./Fifth Int. Lightning Meteorology Conf., Tucson, AZ, Vaisala, https://www.vaisala.com/sites/default/files/documents/Murphy%20et%20al-Improved%20NLDN%20Performance%20after%202013%20Upgrade-2014-ILDC-ILMC.pdf.
Nag, A., and V. A. Rakov, 2012: Positive lightning: An overview, new observations, and inferences. J. Geophys. Res., 117, D08109, https://doi.org/10.1029/2012JD017545.
Nag, A., M. J. Murphy, K. L. Cummins, A. E. Pifer, and J. A. Cramer, 2014: Recent evolution of the U.S. National Lightning Detection Network. 23rd Int. Lightning Detection Conf./Fifth Int. Lightning Meteorology Conf., Tucson, AZ, Vaisala, https://www.vaisala.com/sites/default/files/documents/Nag%20et%20al-Recent%20Evolution%20of%20the%20U.S.%20National%20Lightning%20Detection%20Network-2014-ILDC-ILMC.pdf.
Nag, A., M. J. Murphy, and J. A. Cramer, 2016: Update to the U.S. National Lightning Detection Network. 24th Int. Lightning Detection Conf./Sixth Int. Lightning Meteorology Conf., San Diego, CA, Vaisala, https://www.vaisala.com/sites/default/files/documents/Amitabh%20Nag%20et%20al.%20Update%20to%20the%20U.S.%20National%20Lightning%20Detection%20Network.pdf.
Orville, R. E., 2008: Development of the National Lightning Detection Network. Bull. Amer. Meteor. Soc., 89, 180–190, https://doi.org/10.1175/BAMS-89-2-180.
Orville, R. E., and G. R. Huffines, 2001: Cloud-to-ground lightning in the United States: NLDN results in the first decade, 1989–98. Mon. Wea. Rev., 129, 1179–1193, https://doi.org/10.1175/1520-0493(2001)129<1179:CTGLIT>2.0.CO;2.
Orville, R. E., G. R. Huffines, W. R. Burrows, and K. L. Cummins, 2011: The North American Lightning Detection Network (NALDN): Analysis of flash data: 2001–09. Mon. Wea. Rev., 139, 1305–1322, https://doi.org/10.1175/2010MWR3452.1.
Rakov, V. A., and Coauthors, 2013: Lightning parameters for engineering applications-An update on CIGRE WG C4.407 activities. 2011 Int. Symp. on Lightning Protection, Fortaleza, BR, IEEE, 294–297, https://doi.org/10.1109/SIPDA.2011.6088434.
Rudlosky, S. D., and H. E. Fuelberg, 2010: Pre- and post-upgrade distributions of NLDN reported cloud-to-ground lightning characteristics in the contiguous United States. Mon. Wea. Rev., 138, 3623–3633, https://doi.org/10.1175/2010MWR3283.1.
Rust, W. D., and Coauthors, 2005: Inverted-polarity electrical structures in thunderstorms in the Severe Thunderstorm Electrification and Precipitation Study (STEPS). Atmos. Res., 76, 247–271, https://doi.org/10.1016/j.atmosres.2004.11.029.
Thomas, R. J., P. R. Krehbiel, W. Rison, S. J. Hunyady, W. P. Winn, T. Hamlin, and J. Harlin, 2004: Accuracy of the Lightning Mapping Array. J. Geophys. Res., 109, D14207, https://doi.org/10.1029/2004JD004549.
Wacker, R. S., and R. E. Orville, 1999: Changes in measured lightning flash count and return stroke peak current after the 1994 U.S. National Lightning Detection Network upgrade. J. Geophys. Res., 104, 2151–2157, https://doi.org/10.1029/1998JD200060.
Zhu, Y., V. A. Rakov, M. D. Tran, and A. Nag, 2016: A study of National Lightning Detection Network responses to natural lightning based on ground truth data acquired at LOG with emphasis on cloud discharge activity. J. Geophys. Res. Atmos., 121, 14 651–14 660, https://doi.org/10.1002/2016JD025574.
Zhu, Y., W. Lyu, J. Cramer, V. Rakov, P. Bitzer, and Z. Ding, 2020: Analysis of location errors of the U.S. National Lightning Detection Network using lightning strikes to towers. J. Geophys. Res. Atmos., 125, e2020JD032530, https://doi.org/10.1029/2020JD032530.