1. Introduction
The Geostationary Lightning Mapper (GLM; Goodman 2013) on board the Geostationary Operational Environmental Satellite (GOES) R-Series has provided unprecedented observations of total lightning from space since becoming operational in 2017. GLM is the first optical sensor to observe lightning from geostationary orbit (Rudlosky et al. 2019). These observations have allowed researchers and forecasters to identify deep convection within the tropics and over the open ocean, where observations of deep convection are otherwise limited. The linkage between deep convection and lightning motivated case studies of GLM observations in tropical cyclones (TCs) by Duran et al. (2021) and Fierro et al. (2018), which showed the potential for GLM observations to be used for forecasting TC intensity change. However, a statistical analysis of GLM observations across multiple TC seasons has not yet been completed. Improved processing measures to mitigate artifacts in the GLM data took time to develop and implement, which has led to inconsistencies in the GLM data over time.
The algorithm development for the processing of raw GLM data has continuously improved, with more recent builds nearly eliminating many of these artifacts; however, older GLM observations have not been quality controlled with a consistent algorithm, leading to variations in the quality of the data since 2017. Observations from the GLM are known to contain several artifacts that can complicate the interpretation of lightning within TCs (Rudlosky et al. 2019; Rudlosky and Virts 2021). Spurious lightning, sun glint, blooming events, and “bar” artifacts along subarray boundaries have all been identified. Spurious lightning includes small, short-duration events where lightning is “observed” despite the absence of any clouds, while sun glint includes events where solar energy reflected from the surface is mislabeled as lightning. Bar artifacts occur along subarray boundaries where the sensor is more sensitive to changes in light. Blooming events occur when the photoelectric charge in a pixel becomes saturated and spills over to adjacent cells, leading to large-area false alarms. A consistent dataset of GLM observations is needed to improve user confidence in GLM observations for TC applications, which has motivated this work to create new methods to quality control GLM observations.
GLM observations over the tropical oceans are especially difficult to quality control because of ground-based lightning network limitations. Over the contiguous United States, GLM observations can be compared to Lightning Mapping Arrays (LMAs) which provide high-fidelity 3D observations of lightning using very high frequency (VHF) emissions produced during electrical breakdown (Chmielewski and Bruning 2016). A study by Rutledge et al. (2020) found that GLM observations have a high LMA-relative detection efficiency (DE), although a lower DE for GLM was found at larger viewing angles and where flash heights are lower. In these cases, lightning can be obscured by high concentrations of cloud water and cloud ice above the lightning. The DE of LMAs drops off rapidly away from the VHF sensors, so long-range ground-based lightning networks offer a better comparison with GLM observations over the ocean. Ringhausen and Bitzer (2021) compared the DE of GLM and the Earth Networks Total Lightning Network (ENTLN) for lightning within Hurricane Harvey (2017) finding that each system identified 60%–70% of the flashes in the other system. However, since Hurricane Harvey was examined while in the Gulf of Mexico and close to sensors, the DE comparisons by Ringhausen and Bitzer (2021) offer a best case scenario for the usage of ENTLN. Another long-range lightning network, the World Wide Lightning Location Network (WWLLN; Lay et al. 2004; Rodger et al. 2006), also is challenging to use as a verification baseline because it mostly detects the cloud-to-ground lightning flash component of total lightning. A comparison between GLM and WWLLN in Hurricane Maria by Fierro et al. (2018) showed several instances where the amount of GLM lightning was more than two orders of magnitude larger than that of WWLLN.
Other satellite observations could be used to evaluate the GLM, such as those from the Lightning Imaging Sensor (LIS), which flew on the International Space Station (ISS) from 2017 to 2023 (Blakeslee and Koshak 2016). LIS detects lightning using the same technique as the GLM, by detecting optical changes in light, and has similar biases such as a lower DE during the day and a higher DE at night (Goodman 2013). Since the ISS is in low-Earth orbit, the LIS is only able to capture lightning in 1–2-min periods within a region during an overpass. The small number of matched samples and the resulting lack of temporal continuity make LIS comparisons difficult when evaluating the GLM artifacts. The sample of matching LIS and GLM observations over TCs is even smaller, which prevents the use of LIS for GLM verification and artifact identification.
In this manuscript, we will explore two quality control (QC) methods to aid the community in using GLM observations for TC applications. A simple QC method using thresholding techniques will first be explored by taking advantage of the relationship between lightning area and energy. Artificial intelligence (AI) techniques will also be leveraged to QC the GLM observations. In particular, we seek to evaluate whether convolutional neural networks (CNNs) can offer improved quality control performance. AI techniques could be used to mimic the process of how trained meteorologists looking at gridded GLM imagery can see features that are suspicious when they do not resemble the spatial patterns of lightning. This ability of CNNs offers the possibility of improving on quality control that is based on values in individual pixels. An analysis and comparison of both techniques will be conducted for Hurricane Florence (2018).
To facilitate the goal of developing QC methods that can be applied to GLM for use on TCs, the manuscript is organized as follows. Section 2 will detail the data used in this study. Section 3 will discuss the threshold techniques and show how the lightning group area and group energy are used to develop quality control thresholds. Section 4 will discuss methods and use of a CNN to detect GLM anomalies to aid in the QC of lightning in TCs. Section 5 will summarize the QC methods and discuss limitations of the work.
2. Data
a. GLM
The GLM is an optical transient detector that detects lightning by subtracting the background brightness level from the observed brightness. Because GLM is an optical sensor, there is a diurnal variation in the DE, with better performance expected at night when the background brightness is low. Goodman (2013) estimated that the DE for GLM would be around 90% at night compared to 70% during the day. Bateman et al. (2021) found an average DE of 97% over the field of view of GLM on GOES-16 using a time window of ±30 s, with 98% of pixels having at least 70% DE at night. GLM’s field of view extends from 57°S to 57°N and from 135° to 15°W from the GOES-East position (75°W). GLM has a resolution of ∼8 km at nadir that extends to almost 14 km near the limb. GLM L2 data are corrected for parallax owing to the curvature of the Earth’s surface with an assumed cloud-top height (Virts and Koshak 2020; Bruning et al. 2019).
While many studies have utilized lightning flashes (e.g., Duran et al. 2021), this study will focus on lightning groups. GLM observations are grouped into three classes: events, groups, and flashes. Lightning events are the smallest component of the observed lightning and make up the basic unit of GLM data on a pixel (Goodman 2013). A collection of GLM events within a 2-ms frame that fill a contiguous region in the pixel array is aggregated into a single lightning group (Peterson 2019). Groups capture the optical signals from one or more physical lightning processes such as a flash or cloud pulse. Lightning groups can be clustered into a parent flash based on both the location and timing of the groups within specific thresholds (Peterson et al. 2022). The Lightning Cluster Filter Algorithm (LCFA) is used operationally to combine the events, groups, and flashes and is responsible for QCing the data within a set time window (Goodman 2013). There have been several LCFA builds since GLM data began flowing in 2017 that contribute to differences in the data quality, but the performance has continuously improved over time. A single dataset processed using a consistent LCFA build has not yet been completed but is in development (D. Mach 2024, personal communication).
GLM is able to capture lightning properties in ways that ground-based lightning detection networks cannot. Unlike WWLLN that captures mostly cloud-to-ground flashes, GLM captures both intracloud and cloud-to-ground lightning. Ground-based lightning detection networks also struggle with varying spatial detection efficiency of lightning, particularly over the open-ocean and away from the sensors, whereas GLM has smaller spatial DE variability except near the limb (Goodman 2013), and for GOES-16 GLM toward the northwest continental United States (Bateman et al. 2021). In addition, GLM is able to characterize lightning properties such as the lightning area and optical energy that can be used to better characterize the processes producing the lightning that ground-based networks cannot estimate. Bruning and MacGorman (2013) and Duran et al. (2021) have shown that small and intense convective updrafts can be indicated by frequent small area and low energy lightning flashes while broad weaker updrafts typically found in stratiform regions are indicated by a small number of large area and energetic flashes. A consistent long-term dataset of GLM observations is needed to fully take advantage of the benefits that GLM provides for understanding TCs.
This study will focus on GLM observations from 2017 to 2021 including the checkout period prior to the satellites becoming operational. Since TCs do not regularly occur in the South Atlantic or southeast Pacific, the analysis domain will be restricted to 25°–125°W and 0°–45°N.
b. Additional datasets
To supplement the GLM observations, the Advanced Baseline Imager (ABI) on GOES-16 will also be used. ABI has multiple bands in the visible and infrared that can be used to identify the presence of clouds and estimate their heights. ABI imagery is full disk and has finer resolution than GLM with ∼2-km resolution near nadir. The ABI observations are resampled with the GLM to a 0.1° × 0.1° grid in two ways, by taking the average and the minimum brightness temperature within the grid cells.
The WWLLN has been frequently used to examine lightning in TCs (e.g., Corbosiero and Molinari 2003; DeMaria et al. 2012; Stevenson et al. 2014; Fierro et al. 2018) and is used as a reference when evaluating GLM. Slocum et al. (2023) found that using either GLM flash density or WWLLN flash density provided similar results when incorporated into the Statistical Hurricane Intensity Prediction Scheme (SHIPS) Rapid Intensification Index (RII). WWLLN detects lightning using very low frequency radio waves emitted during a lightning pulse. Although WWLLN does not provide lightning groups, WWLLN flashes are global and have high temporal accuracy. Despite the limitations of WWLLN including both spatial and temporal variability in DE, WWLLN flashes will be used as a reference for GLM lightning groups and will be used by the CNN. A detailed comparison of GLM and WWLLN will not be completed at this time; however, the QC techniques developed here will be employed in a future comparison study. A brief comparison between flash rates from GLM and WWLLN by Fierro et al. (2018) found many times where hourly flash-rate ratios between both systems exceeded two orders of magnitude.
To document how the QC methods impact the lightning observations within TCs, we utilize TC observations from the National Hurricane Center (NHC) best track hurricane database (HURDAT2; Landsea and Franklin 2013). HURDAT2 contains 6-hourly TC tracks, intensity, and wind radii for both the Atlantic and east Pacific basins. TC tracks and intensities are linearly interpolated to every hour to be consistent with the resolution of the gridded GLM dataset. In order for a TC to be considered in the analysis at each hour, the entirety of a 50-km ring around the TC center must be within the analysis domain of interest.
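As an illustration of the track handling described above, the following minimal sketch interpolates 6-hourly fixes to hourly values and checks that a 50-km ring lies inside the analysis domain. The function names, the flat-Earth conversion of roughly 111.2 km per degree of latitude, and the sample track values are assumptions made for this example; they are not taken from the study's code.

```python
import numpy as np

def interpolate_track_hourly(hours, lat, lon, vmax):
    """Linearly interpolate 6-hourly best-track fixes to hourly values.

    hours: fix times in hours since an arbitrary reference (increasing).
    lon is in degrees east (negative for west); no dateline handling.
    """
    hours = np.asarray(hours, dtype=float)
    hourly = np.arange(hours[0], hours[-1] + 1.0, 1.0)
    return (hourly,
            np.interp(hourly, hours, lat),
            np.interp(hourly, hours, lon),
            np.interp(hourly, hours, vmax))

def ring_inside_domain(lat, lon, radius_km=50.0,
                       lat_bounds=(0.0, 45.0), lon_bounds=(-125.0, -25.0)):
    """Check that a ring of radius_km around (lat, lon) lies in the domain.

    Assumes a flat-Earth conversion (about 111.2 km per degree latitude).
    """
    dlat = radius_km / 111.2
    dlon = radius_km / (111.2 * np.cos(np.deg2rad(lat)))
    return bool(lat - dlat >= lat_bounds[0] and lat + dlat <= lat_bounds[1]
                and lon - dlon >= lon_bounds[0] and lon + dlon <= lon_bounds[1])

# Made-up fixes at 0, 6, and 12 h (illustrative values only).
hourly, lat_h, lon_h, vmax_h = interpolate_track_hourly(
    [0, 6, 12], [20.0, 21.2, 22.4], [-60.0, -61.5, -63.0], [50.0, 65.0, 80.0])
```

Linear interpolation in longitude is adequate here only because the analysis domain does not cross the date line.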
3. GLM QC using thresholds
a. Methods
Two rounds of QC thresholds will be applied to limit artifacts such as spurious lightning, the Bahama Bar and other artifacts along subarray boundaries, blooming events, and sun glint. First, we apply a QC routine to eliminate individual lightning groups that appear suspicious. The QC of individual lightning groups helps to remove spurious lightning and reduces the prominence of the Bahama Bar. Then, we develop and apply another QC filter on the gridded lightning accumulations that incorporates additional information such as ABI. This twofold QC threshold method allows for greater confidence in the lightning distribution and reduces noise in the dataset.
The QC filters work by applying thresholds to the characteristics of the observed lightning by GLM. Although no specific thresholds were found to perfectly QC the dataset, the values identified preferentially remove false alarms while minimizing the removal of true events. The removal of some true lightning flashes is acceptable for tropical cyclone applications because the spatial location of lightning relative to the storm center is thought to be more important for understanding intensity change (Stevenson et al. 2014). Because the focus of this work is on creating a consistent dataset for tropical applications without false alarms, strict thresholds are applied at the expense of true lightning to remove the artifacts. Note that the QC techniques can only remove lightning, so the impacts of lightning holes in the dataset cannot be corrected.
b. QC of individual lightning groups
The first QC routine implemented on the GLM data is to quickly gauge whether individual lightning groups are realistic. Because of data availability time constraints on GLM data, early builds of the LCFA used in real time still contain a number of artifacts. Applying a filter on the individual lightning groups is numerically expensive due to the large number of lightning groups that cover a large area. As such, the group QC method must be simple and will provide simple constraints on the analyzed lightning. We only consider Level 2 data lightning groups with a “good” GLM quality control flag and do not consider any correction to identify lightning that may have been falsely classified by the LCFA.
Lightning groups below the minimum energy or below the minimum area are removed from the dataset. Once the QC has been applied to each lightning group, we grid the lightning and calculate the lightning density. Figure 1a shows the lightning group density from 2017 to 2021 after the QC on the individual lightning groups. Although the initial QC improves the quality of the GLM observations and reduces the number of false flashes, artifacts such as blooming events around the instrument limb, the Bahama Bar, and sun glint around 10°N are still present. The areas with GLM artifacts are easier to identify by comparing the GLM gridded groups (Fig. 1a) to the WWLLN gridded flashes in Fig. 1d. Note that the scales are different between GLM and WWLLN because of the differences between lightning groups and lightning flashes, respectively. Overall, GLM detects a much larger amount of lightning due to its higher DE and the ability to capture weaker intracloud lightning.
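The group-level filter and gridding step can be sketched as follows. The two threshold constants are hypothetical placeholders, since the minimum energy and area values are not stated in this section, and the function name is likewise an assumption for illustration.

```python
import numpy as np

# Hypothetical placeholder thresholds; the study's actual minimum
# energy and area values are not given in this section.
MIN_ENERGY_J = 1.0e-15
MIN_AREA_M2 = 5.0e7

def filter_and_grid_groups(lat, lon, energy, area,
                           lat_bounds=(0.0, 45.0),
                           lon_bounds=(-125.0, -25.0),
                           cell_deg=0.1):
    """Drop groups below the minimum energy or area, then count the
    remaining groups in each grid cell of the analysis domain."""
    lat, lon = np.asarray(lat), np.asarray(lon)
    energy, area = np.asarray(energy), np.asarray(area)
    keep = (energy >= MIN_ENERGY_J) & (area >= MIN_AREA_M2)
    n_lat = int(round((lat_bounds[1] - lat_bounds[0]) / cell_deg))
    n_lon = int(round((lon_bounds[1] - lon_bounds[0]) / cell_deg))
    counts, _, _ = np.histogram2d(
        lat[keep], lon[keep], bins=[n_lat, n_lon],
        range=[lat_bounds, lon_bounds])
    return keep, counts

# Three made-up groups: only the first passes both thresholds.
keep, counts = filter_and_grid_groups(
    [25.03, 25.04, 30.0], [-80.01, -80.02, -70.0],
    [2e-15, 5e-16, 3e-15], [1e8, 1e8, 1e7])
```

Dividing the counts by the grid-cell area and the accumulation period would give a group density.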
c. QC of gridded lightning groups
After the lightning has been gridded, we can apply a more robust QC procedure that can consider the temporal and spatial variability of the lightning in coordination with other output from ABI. The GLM groups are accumulated hourly and gridded on a 0.1° × 0.1° grid within the domain of interest. Within each grid cell, the mean and variance of group area and energy are derived. For each grid cell, the minimum IR brightness temperature from ABI and the satellite zenith angle (SZA) are used to help flag additional GLM artifacts (as seen in Fig. 1b).
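A minimal sketch of the per-cell statistics is given below. It assumes each group has already been assigned a flattened grid-cell index for the hour in question, and it uses the sample variance (divisor n − 1); the text does not say which variance estimator was used, so that choice and the function name are assumptions.

```python
import numpy as np

def cell_statistics(cell_index, values, n_cells):
    """Return per-cell count, mean, and sample variance of `values`.

    cell_index: flattened grid-cell index of each group (0..n_cells-1).
    Empty cells get NaN mean; cells with < 2 groups get NaN variance.
    """
    cell_index = np.asarray(cell_index)
    values = np.asarray(values, dtype=float)
    count = np.bincount(cell_index, minlength=n_cells).astype(float)
    total = np.bincount(cell_index, weights=values, minlength=n_cells)
    mean = np.full(n_cells, np.nan)
    has_data = count > 0
    mean[has_data] = total[has_data] / count[has_data]
    # Two-pass variance: sum squared deviations from each cell's mean,
    # which avoids cancellation for very small values such as energies.
    dev2 = (values - mean[cell_index]) ** 2
    ss = np.bincount(cell_index, weights=dev2, minlength=n_cells)
    var = np.full(n_cells, np.nan)
    multi = count > 1
    var[multi] = ss[multi] / (count[multi] - 1.0)
    return count, mean, var

# Two groups in cell 0, none in cell 1, one in cell 2.
count, mean, var = cell_statistics([0, 0, 2], [1.0, 3.0, 5.0], n_cells=3)
```

The same function can be called once with group area and once with group energy.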
To help isolate blooming events, bar features, and other artifacts, cases were identified and examined in detail. On 10 April 2019, there were several artifacts found in the GLM Level 2 data. Figure 2a shows lightning accumulated over 1 h through 0400 UTC overlaid on the mean infrared brightness temperature over the 1 h from ABI. There are two areas where GLM observed lightning where it should not be. The first lightning artifact in Fig. 2a is a blooming event in the north-central United States with sharp lines and a large area covered in low-density lightning groups. The spatial uniformity of the blooming event has been partially removed by the area/energy thresholds, leaving the large and discontinuous feature. The second artifact in Fig. 2a is in the Gulf of Mexico near Louisiana where a band of lightning is observed in an area where no deep convection is present as evident by the ABI imagery.
In addition to the constraints on the maximum energy, we also apply constraints to the variance of the group area and energy. Figure 2d shows the relationship between the variance of energy and variance of area for each grid point with lightning in Fig. 2a. The variance in both area and energy did not isolate the blooming event, but there are several grid points where the variance in area was several orders of magnitude lower than the bulk of the distribution. A minimum threshold for the area and energy variance is applied where the minimum area variance is set to $10^{13.5}$ m$^{4}$ and the minimum energy variance is set to $10^{-30}$ J$^{2}$. It should be noted that since at least two points are required to calculate the variance, grid cells with only one lightning group are removed. Figure 2b shows the resulting lightning after the maximum energy and variance thresholds are applied. When both thresholds are applied, the blooming event and the false alarms near Louisiana are both removed from the sample.
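The minimum-variance check can be written compactly. In the sketch below, the two threshold values come from the text, while the function name and the array-based interface are assumptions for illustration; the maximum energy constraint is not included.

```python
import numpy as np

MIN_AREA_VARIANCE_M4 = 10.0 ** 13.5   # value stated in the text
MIN_ENERGY_VARIANCE_J2 = 1.0e-30      # value stated in the text

def variance_qc_mask(count, area_var, energy_var):
    """Return True where a grid cell passes the minimum-variance checks.

    Cells with fewer than two groups fail because the variance is
    undefined there (NaN variances also compare as False).
    """
    count = np.asarray(count)
    area_var = np.asarray(area_var, dtype=float)
    energy_var = np.asarray(energy_var, dtype=float)
    enough = count >= 2
    with np.errstate(invalid="ignore"):
        passes = ((area_var >= MIN_AREA_VARIANCE_M4)
                  & (energy_var >= MIN_ENERGY_VARIANCE_J2))
    return enough & passes

# Four made-up cells: single group, passing, low area variance,
# and low energy variance.
mask = variance_qc_mask(
    [1, 5, 5, 5],
    [np.nan, 1e15, 1e10, 1e15],
    [np.nan, 1e-29, 1e-29, 1e-32])
```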
To further illustrate the use of the maximum energy threshold and the group area and energy variance thresholds, Fig. 3 shows the same map of lightning but for the hour of lightning preceding 2000 UTC 10 April 2019. The prominent artifact at 2000 UTC is the Bahama Bar, which is evident in Fig. 3a. The distribution of the mean energy and mean area shows the expected relationship, with no high-energy blooming events contaminating the field such as in Fig. 2c. The Bahama Bar is removed using the minimum energy variance because the lightning groups occurring in the bar artifacts are too similar to one another relative to a normal distribution. The final gridded lightning after the QC thresholds in Fig. 3b shows that the Bahama Bar is removed while true flashes near the Bahamas are maintained.
We have shown that the simple QC thresholding techniques are able to remove several artifacts including the Bahama Bar, blooming events, and other areas of false flashes identified on 10 April 2019. In addition, we also constrain the lightning so that gridded cells with no clouds colder than 273 K within 1 h of the lightning are not considered. After applying the QC techniques to the entire dataset, we can aggregate the removed QC lightning and calculate the lightning density. Figure 1b shows the lightning density that the gridded QC flagged to be removed from the dataset. Consistent with the single case, the gridded QC helps remove coherent features such as the Bahama Bar from the dataset. The sun glint occurring near 10°N is also removed by the QC as well as a small fraction of lightning in other regions of the domain. Because of the restrictive maximum energy threshold applied to SZAs exceeding 50°, there are areas in the western United States and Mexico that have likely true lightning removed by the QC. The removal of some true continental lightning groups in these areas has little impact on the total lightning in TCs and will be ignored for the purposes of this study. Additional tuning would need to be undertaken to apply these QC techniques to analyze lightning over the United States or at high latitudes.
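The cloud constraint described above can be sketched as follows. This example interprets "within 1 h" as the hour of the lightning plus the adjacent hourly grids on either side, which is an assumption, as are the function names and the array layout.

```python
import numpy as np

def cloud_mask(min_bt, threshold_k=273.0):
    """True where a cell had a cloud colder than threshold_k within
    one hourly step either side of the current hour.

    min_bt: array (n_hours, n_lat, n_lon) of hourly minimum IR
    brightness temperature (K) in each grid cell.
    """
    min_bt = np.asarray(min_bt, dtype=float)
    # Pad the time axis with +inf so the first and last hours use
    # only the neighbors that exist.
    padded = np.pad(min_bt, ((1, 1), (0, 0), (0, 0)),
                    constant_values=np.inf)
    window_min = np.minimum(np.minimum(padded[:-2], padded[1:-1]),
                            padded[2:])
    return window_min < threshold_k

def apply_cloud_constraint(group_count, min_bt):
    """Zero out gridded lightning where no cold cloud was nearby in time."""
    return np.where(cloud_mask(min_bt), group_count, 0)

# Two cells over three hours: cell 0 is always warm, cell 1 is cold
# only in the middle hour.
bt = np.array([[[300.0, 300.0]], [[300.0, 260.0]], [[300.0, 300.0]]])
counts = np.full((3, 1, 2), 4)
qc_counts = apply_cloud_constraint(counts, bt)
```

In this toy case the lightning in cell 1 is retained at all three hours because the cold cloud falls inside each hour's window.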
The final GLM group density after removing the gridded QC lightning is shown in Fig. 1c. Overall, the removal of the artifacts from the QC methods employed here does not alter the spatial distribution of lightning. This result is important as the QC methods have the desired effect of removing clear artifacts in the GLM data without altering conclusions. The group density can be compared with the WWLLN flash density in Fig. 1d. Although the WWLLN DE is much lower for total lightning than GLM over land and considering that there are multiple groups per flash, the spatial patterns in the networks are consistent. A region of higher lightning activity is prevalent over the Gulf Coast and along the Gulf Stream with higher lightning totals in the Atlantic collocated near islands such as Cuba and Puerto Rico.
Although the GLM QC was able to significantly reduce the artifacts present in the GLM dataset, there are still features that exist within the dataset that may limit the utility of the observations. Figure 4 shows the mean of the hourly maximum energy and maximum area in each grid cell over the multiyear sample. The GLM subarray boundaries are prevalent in the mean of the maximum energy detected by GLM groups (Fig. 4a). These local minima in group energy are not caused by the QC methods employed in this study but are prevalent features in the GLM instrument as seen in Rudlosky et al. (2019). On average, there is lower mean maximum energy near the boundaries because the sensor is more sensitive and detects lower energy lightning. This limitation in the lightning energy has been shown in several other studies although no clear solution exists (Peterson and Lay 2020). The area of the lightning is less sensitive to the subarray boundaries. The spatial pattern of lightning area from Fig. 4b is consistent with Rudlosky et al. (2019) which showed larger lightning flashes in overocean convection. The maximum group area in Fig. 4b also shows an artifact that is most pronounced in the Caribbean Sea in the form of oscillating values along curved bands. The pattern is not caused by the QC techniques implemented here, and the source of this artifact is unknown. However, the pattern has a small amplitude that does not impact the conclusions of this study.
To determine what impact the QC procedures have on the amount of lightning within TCs, Fig. 5 shows the cumulative amounts of good and gridded QC flagged lightning within 100 km of all Atlantic TC centers as a function of year. The 100-km distance has been commonly used to identify the lightning within the inner core region of TCs to relate it to intensity changes (e.g., DeMaria et al. 2012; Stevenson et al. 2016). There is a larger number of flagged lightning in 2017, which includes the GLM checkout period and initial GLM LCFA. As the LCFA improved each year, the amount of QC flagged lightning by the simple QC technique was reduced. In 2018–19, the QC flagged 10%–12% of the total amount of lightning observed by GLM within 100 km of Atlantic TCs. The amount of flagged lightning inside TCs was reduced to only 2% of the total in 2021. In 2019, there was an increase in the amount of QC removed lightning relative to 2018, which was largely from Tropical Storm Rebekah (2019), a high latitude TC that had a number of blooming events where the removal of lightning was warranted (not shown).
4. GLM QC using AI
The previous section showed that simple threshold-based methods using area and energy from GLM can go a long way toward removing the most obvious artifacts. However, a limitation of this approach is that it makes use of the information in each pixel individually. On the other hand, a trained meteorologist visually inspecting GLM imagery can identify suspicious features based on their physical knowledge of what spatial patterns of lightning look like. CNNs are able to mimic this ability by learning spatial patterns in image data. Hilburn (2023) demonstrates that CNNs learn information about gradients in images using convolutional kernels and also capture multiscale information through the use of pooling layers. This information about the spatial context in which a pixel is embedded yields additional predictive power for applications of satellite imagery. In this section, we explore the ability of CNNs to learn spatial patterns of lightning and whether these patterns can be used to identify and remove suspicious features.
a. Methods
The motivation for this section is to have TC datasets for use in training machine learning (ML) based forecasting models that have lightning observations with consistent quality. To do that, we are first training ML models to predict the expected lightning without artifacts. There are two benefits for this technique including the output of a smoothed GLM dataset without artifacts and a tool for performing anomaly detection. The AI-based QC method is trained and tested on a fine resolution TC-centric dataset, where best track fixes from HURDAT2 are used to construct image patches centered on TCs. First, GLM data are resampled onto the ABI full disk fixed grids (Harris Corporation 2016) using the GLM group centroid latitudes and longitudes to find the corresponding ABI pixel. The parallax correction in GLM L2 was not removed before resampling. The GLM information (group area and energy) is summed over all neighboring pixels within the GLM group area, converting area to radius assuming circular geometry. This is an approximation to avoid traversing the event-group-flash tree as in Bruning et al. (2019), which makes the resampling fast for real-time display.
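The circular-footprint approximation can be illustrated with a short sketch. It converts a group area $A$ to an equivalent radius $r = \sqrt{A/\pi}$ and adds the group's value to every pixel whose center lies within that radius of the centroid pixel. The uniform 2-km pixel size and the function name are assumptions for this example; the actual ABI fixed grid has pixel sizes that vary away from nadir.

```python
import numpy as np

def add_group_footprint(grid, row, col, area_m2, value, pixel_m=2000.0):
    """Add `value` to every pixel whose center lies within the group's
    equivalent-circle radius of the centroid pixel (row, col).

    Assumes square pixels of uniform size pixel_m (a simplification).
    """
    radius_m = np.sqrt(area_m2 / np.pi)      # from A = pi * r**2
    r_pix = int(np.ceil(radius_m / pixel_m))
    # Clip the bounding box of the circle to the grid edges.
    r0, r1 = max(row - r_pix, 0), min(row + r_pix + 1, grid.shape[0])
    c0, c1 = max(col - r_pix, 0), min(col + r_pix + 1, grid.shape[1])
    rr, cc = np.ogrid[r0:r1, c0:c1]
    dist2_m2 = ((rr - row) ** 2 + (cc - col) ** 2) * pixel_m ** 2
    inside = dist2_m2 <= radius_m ** 2
    grid[r0:r1, c0:c1][inside] += value
    return grid

# One made-up group of 80 km^2 centered on an 11 x 11 pixel grid.
grid = add_group_footprint(np.zeros((11, 11)), 5, 5, 8.0e7, 1.0)
```

Calling the function once per group with `value=1.0` accumulates a group extent count, while passing the group energy accumulates energy over the same footprint.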
To create the datasets for ML, GLM was accumulated over 5-min periods which are then aggregated to longer accumulation periods (e.g., 6 h). Then, image patches of 512 × 512 pixels (approximately 1000 km × 1000 km) are extracted for each 6-hourly best track fix and the GOES data nearest in time to the 0000, 0600, 1200, and 1800 UTC synoptic times. Only image patches fully falling within the GLM event grid (a subset of the ABI full disk) are used, which naturally excludes samples near the limb. For the period 2017–21, this results in a total of 5541 samples, which includes some duplicates of the same storm at the same time when it was viewed by both GOES-East and GOES-West. In total there are 1297 6-hourly samples with observations from both GOES-East and GOES-West which are predominantly eastern Pacific TCs.
A CNN was trained to translate the six GOES inputs in Table 1 (top six rows) to the bias-corrected WWLLN output (bottom row), which is WWLLN data after applying the bias correction in Eq. (3). Inputs and outputs are scaled into the range 0–1 using the values in Table 1. Note that since storm-centered image patches are used and since geographic information (latitude and longitude) is not provided to the network, it cannot simply memorize that certain artifacts, like the Bahama Bar, occur in the same locations. The network architecture shown in Fig. 7 is a U-Net (Ronneberger et al. 2015), and with only eight filters in each convolutional layer, the model is rather small with only 12 537 trainable parameters. Hyperparameter optimization is used to explore similar model architectures varying the depth of the network and varying the number of filters per convolutional layer. The configuration shown is the largest model that can be used before overfitting becomes an issue. The model was trained using samples from 2019 to 2020, which corresponded to GLM LCFA build numbers 7–8. The testing dataset is from 2017 to 2018, which includes build numbers 4–6. Build numbers 4–5 in particular, which aligned with the preoperational checkout phase of the GOES-16 satellite, showed the largest differences between GLM and WWLLN before improved parallax correction and QC was introduced.
Table 1. Parameters for scaling variables into the range 0–1.
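A linear scaling into the range 0–1 can be sketched as below. The bounds used in the example (180 and 320 K for a brightness temperature) are hypothetical stand-ins, not the Table 1 values, and the clipping of out-of-range inputs is an assumption about how such values would be handled.

```python
import numpy as np

def scale_unit(x, vmin, vmax):
    """Map x linearly from [vmin, vmax] to [0, 1], clipping values
    that fall outside the bounds."""
    x = np.asarray(x, dtype=float)
    return np.clip((x - vmin) / (vmax - vmin), 0.0, 1.0)

def unscale_unit(y, vmin, vmax):
    """Invert scale_unit for values that were not clipped."""
    return np.asarray(y, dtype=float) * (vmax - vmin) + vmin

# Hypothetical bounds for an infrared brightness temperature (K).
scaled = scale_unit([180.0, 250.0, 320.0, 400.0], 180.0, 320.0)
restored = unscale_unit(scaled[:3], 180.0, 320.0)
```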
b. AI QC results
An AI-based QC technique described above is explored to remove lightning artifacts in a TC reference frame and to serve as an anomaly detection tool. A TC-centric framework is used because it allows the AI model to learn the spatial relationships between lightning and TC structure. The AI model is trained on GLM observations to try to predict WWLLN observations because the WWLLN dataset does not have blooming events, the Bahama Bar, or sun glint because it does not identify lightning optically. Although GLM groups and WWLLN flashes are different quantities, the implementation of the bias correction from Eq. (3) reduces the impact of the sensor differences. Regardless, trying to predict WWLLN observations using GLM as input is expected to provide a smoother and lower magnitude output given the previously discussed limitations of WWLLN. It should be noted that since WWLLN does not estimate the area of the lightning, only the group extent density is predicted by the AI technique. The lack of area/energy output is one of the limitations of the AI model which could be useful information to understand the convective evolution within TCs (Duran et al. 2021).
To illustrate how the AI-based QC algorithm could be used to reduce artifacts in the GLM dataset, we will discuss the case of Hurricane Florence (2018) at 0000 UTC 7 September. Hurricane Florence rapidly weakened from a peak intensity of 115 kt (1 kt ≈ 0.51 m s$^{-1}$) at 0000 UTC 6 September to a 60-kt tropical storm at 0000 UTC 7 September. The end of this weakening period was marked by relative maxima in lightning outside the core while the TC was still over the ocean. The structure of Florence became asymmetric during the weakening which is evident in the IR imagery (Fig. 8a). The Bahama Bar artifact can be seen at this time in Fig. 8b which shows the GLM group extent density on the TC-centered grid and in Fig. 8e which shows the GLM group centroid density on the 0.1° × 0.1° grid. Note that group extent density tends to have larger values than group centroid density because of overlapping group areas, and so the group centroid density in Fig. 8e is not scaled by the pixel area (about a factor of 4) to make it more visually comparable to group extent density with the same color scale. This case highlights the challenges of removing the Bahama Bar because it was within the area of cold brightness temperatures and occurred with a large amount of lightning in neighboring pixels.
The GLM group density predicted by the neural network model is shown in Fig. 8c. Overall, the model shows the ability to remove the nonphysical artifacts in the total lightning while making minimal impacts on the other realistic-looking spatial patterns below the Bahama Bar. In addition to removing the Bahama Bar, the ML model removed the small areas of lightning to the southwest of the TC where ABI showed no clouds. For this case, the ML model leveraged ABI information to help remove these lightning pixels. The ML model output also smoothed the lightning group extent density and reduced the highest values compared to the original GLM observations before the QC. Note that even with the WWLLN bias correction to GLM, the network reduces the highest values too much. This smoothing and reduction of extremes are common to CNNs and likely result from the use of a mean-square-error loss function in training the CNN. The use of weighted loss functions (Ebert-Uphoff et al. 2021) can address this issue. However, if we compare the predicted GLM observations to the WWLLN observed lightning over the same time period in Fig. 8d, we can determine that the ML model still provides more information on the extent of lightning than WWLLN does. We can also compare the ML model output to the threshold QC approach shown in Fig. 8f. Both approaches successfully remove the Bahama Bar, although the threshold approach does not reduce the lightning maximum found near the center of Florence. The threshold method also leaves more isolated lightning groups, both inside and outside the storm, than the ML approach. Some of these lightning pixels are along the Bahama Bar and could potentially be false flashes given that they were not detected by WWLLN. GLM lightning detections outside of cloudy areas can come from different sources. When high-energy particles impact the focal plane, they can produce “radiation dots” that manifest as isolated flashes. Another source is “overshoot/undershoot” events where brightness contrast boundaries (such as clouds) make it difficult for the LCFA to separate the background from the lightning signal.
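The damping of extremes attributed above to the mean-square-error loss can be illustrated with a toy comparison. The following is a minimal sketch rather than the loss used in this study: the function names, the exponential weighting, and the `scale` parameter are all assumptions made for illustration, and the densities are assumed to be normalized to the range 0 to 1.

```python
import math


def mse(pred, truth):
    """Plain mean-square error over paired pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def weighted_mse(pred, truth, scale=2.0):
    """Mean-square error with weights that grow with the target value.

    The weight exp(scale * t) is a hypothetical choice for illustration;
    targets are assumed to be normalized to the range [0, 1].
    """
    weights = [math.exp(scale * t) for t in truth]
    weighted_sq = sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, truth))
    return weighted_sq / sum(weights)


# Toy normalized densities: three quiet pixels and one lightning maximum.
truth = [0.0, 0.0, 0.0, 1.0]
peak_damped = [0.0, 0.0, 0.0, 0.5]      # underestimates the maximum by 0.5
background_off = [0.5, 0.0, 0.0, 1.0]   # misses a quiet pixel by 0.5

# Plain MSE cannot tell the two errors apart ...
print(mse(peak_damped, truth), mse(background_off, truth))
# ... while the weighted loss penalizes the damped maximum more heavily.
print(weighted_mse(peak_damped, truth), weighted_mse(background_off, truth))
```

Because pixels with little or no lightning dominate a typical sample, an unweighted loss gives the network little incentive to reproduce the rare maxima; a weighting of this kind shifts that balance toward the high-density pixels.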
An autoencoder approach (Goodfellow et al. 2016) was also examined using the same GLM and ABI variables in Table 1. Autoencoders are an unsupervised learning approach used when truth data are lacking. Since the biases of WWLLN cause a reduction in the magnitude of the predicted group extent density, the autoencoder was trained with GLM as the validation rather than the bias-corrected WWLLN. Since the GLM dataset has improved over time, the autoencoder was trained on the best-quality GLM samples from recent LCFA builds and then tested on the samples with artifacts from early builds. However, despite hyperparameter tuning, the autoencoder approach was not able to fully remove artifacts such as the Bahama Bar (not shown). An autoencoder simply learns the patterns of variability in the data, including any artifacts in the training data. Using a supervised learning approach (training to WWLLN) gives superior results because the network is learning both the patterns of variability (in the encoder branch of the network on the left side of Fig. 7) and a mapping between GLM and WWLLN patterns (in the decoder branch on the right side of Fig. 7).
In addition to providing the predicted GLM group extent density, the AI model can be used for anomaly detection to identify times when artifacts may be present. Figure 9 demonstrates this use by computing the root-mean-square differences between the ML prediction and the GLM and WWLLN observations. Since the model was trained to reproduce WWLLN, the differences between the prediction and WWLLN are consistently small, as expected (orange line). The differences between the prediction and the GLM observations are variable and can be quite large at some times (blue line). The largest differences are found in earlier samples, with smaller differences in the newer LCFA builds. Inspection of the individual samples found that samples with differences exceeding 1000 groups per square kilometer per 6 hours are all examples of nonphysical artifacts; an example artifact identified for sample 1391 is shown in the Fig. 9 inset. This threshold was used for detecting and removing anomalous samples from the dataset and identified 31 samples.
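The anomaly flagging described above can be sketched in a few lines. This is a minimal illustration, not the code used in this study: it assumes that the root-mean-square difference is taken over all pixels of each sample grid, and the function names and toy grids are hypothetical. Only the threshold of 1000 groups per square kilometer per 6 hours comes from the text.

```python
import math

# Threshold quoted in the text (groups per square kilometer per 6 hours).
ANOMALY_THRESHOLD = 1000.0


def rms_difference(predicted, observed):
    """Root-mean-square difference between two equally shaped 2D grids."""
    squared = [
        (p - o) ** 2
        for pred_row, obs_row in zip(predicted, observed)
        for p, o in zip(pred_row, obs_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


def flag_anomalies(predictions, observations, threshold=ANOMALY_THRESHOLD):
    """Return the indices of samples whose RMS difference exceeds the threshold."""
    return [
        index
        for index, (pred, obs) in enumerate(zip(predictions, observations))
        if rms_difference(pred, obs) > threshold
    ]


# Two toy 2 x 2 samples: one realistic, one resembling a blooming event.
predictions = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
observations = [[[10.0, 0.0], [0.0, 20.0]], [[3000.0, 3000.0], [3000.0, 3000.0]]]

print(flag_anomalies(predictions, observations))  # only the second sample is flagged
```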
Since the anomaly detection is based on large root-mean-square differences, the artifacts detected tend to be associated with large values of GLM group extent density. This approach therefore can detect sun glint or blooming events well, although artifacts on smaller scales such as spurious false lightning or the Bahama Bar do not get classified as anomalies. For the example shown in Fig. 8, the difference was 132 groups per square kilometer per 6 hours, well below the threshold of 1000 groups per square kilometer per 6 hours. A lower threshold was considered, but samples with differences exceeding 500 groups per square kilometer per 6 hours were all found to be landfalling cases where GLM spatial patterns were realistic, but GLM observed much larger group extent densities than WWLLN. This large difference between the predicted and observed values arises because WWLLN has a lower DE over land compared to GLM, as is evident in Fig. 1d. The threshold-based quality control on the group counts can also reveal such artifacts. Other metrics can be used for anomaly detection, such as the difference in the number of pixels with lightning exceeding some threshold, which might provide improved sensitivity to lower intensity but more widespread artifacts. The advantage of an anomaly detection approach such as this is that it can be automated and could potentially be used in real time to flag inconsistencies in the data.
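The alternative pixel-count metric mentioned above could look like the following. This is a hedged sketch only: the function names, the `min_density` value, and the toy grids are assumptions for illustration, and this metric was not evaluated in the study.

```python
def count_active_pixels(grid, min_density):
    """Count the pixels whose group density exceeds min_density."""
    return sum(1 for row in grid for value in row if value > min_density)


def active_pixel_difference(predicted, observed, min_density=1.0):
    """Observed minus predicted number of pixels with lightning above min_density."""
    return count_active_pixels(observed, min_density) - count_active_pixels(
        predicted, min_density
    )


# Toy 3 x 4 grids: the observation contains a weak bar along the top row
# that the prediction removes, plus one realistic maximum kept by both.
observed = [[5.0, 5.0, 5.0, 5.0], [0.0, 0.0, 0.0, 0.0], [0.0, 50.0, 0.0, 0.0]]
predicted = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 40.0, 0.0, 0.0]]

print(active_pixel_difference(predicted, observed))
```

In this toy case the root-mean-square difference is only about 4, far below any amplitude-based threshold, whereas the pixel count shows that four of the five observed lightning pixels were removed. A count-based metric is therefore one plausible way to flag weak but spatially extended artifacts.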
This section has shown that threshold-based and ML-based methods are both capable of removing nonphysical artifacts in GLM data. The threshold-based method has the advantages of being interpretable and of leaving unaltered the good pixels that are not flagged by the thresholds. The ML-based method has the advantage of providing more overall removal of suspicious features, but it also modifies pixels that do not appear to need it. To understand how the CNN is performing the quality control, an ablation study was conducted by training a version of the CNN using only the GLM group extent density and ABI brightness temperature as input, but not the GLM area or energy. This network was found to have very similar performance to the results shown in Fig. 8, despite not having the area or energy information. This suggests that the spatial patterns of lightning do indeed provide useful information content for quality control, but that this information largely overlaps the information provided by area and energy in individual pixels. Thus, while CNNs do capture unique information, simpler threshold-based methods using area and energy provide similar performance.
5. Summary and conclusions
The GLM has provided unprecedented lightning observations over the open ocean since becoming operational in 2017. However, a number of artifacts in the Level 2 GLM data still exist because the observations have not been quality controlled by a consistent version of the LCFA. These artifacts have made processing and using GLM observations for forecast aids challenging. To facilitate the use of GLM observations in large-scale TC studies, we have developed quality control methods to remove GLM false flashes caused by the Bahama Bar, sun glint, blooming events, and spurious lightning to make the dataset more consistent across time. We have explored the use of both simple thresholding approaches using lightning characteristics and complex ML techniques. Since this work is focused on TC applications, it should be noted that the QC algorithms make a trade-off: they emphasize the removal of artifacts at the cost of removing some true lightning.
A simple QC of GLM lightning groups was developed using thresholding techniques that take advantage of the relationship between the group area and group energy and the variance in GLM group area and group energy. Since group area and group energy scale with each other, we employ scaled thresholds for a maximum energy that helps to remove the blooming events in the dataset. The minima in the variance of the lightning attributes are important for reducing false alarms and the extent of the Bahama Bar, as the lightning groups within those grid cells tend to be more similar to each other than they would be in a natural distribution. When applied to TCs, the QC thresholds reduce the total amount of lightning within 100 km of TCs by 70% in 2017, 12% in 2019, and 2% in 2021. The decline from year to year in the lightning flagged by our simple QC techniques is attributed to the significant improvements in the real-time filtering techniques of the LCFA. The QC methods developed here will now be used to evaluate the lightning distribution in TCs in a subsequent study.
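The two ingredients of the threshold QC summarized above, a maximum energy that scales with group area and a minimum variance within each grid cell, can be sketched as follows. This is only an illustration of the idea: the function name, the form of the scaling, the units, and every numerical value are hypothetical and are not the thresholds used in this study (the software linked in the data availability statement contains the actual implementation).

```python
from statistics import pvariance

# Hypothetical values for illustration only; not the thresholds of this study.
MAX_ENERGY_PER_AREA = 2.0    # maximum energy allowed per unit group area
MIN_AREA_VARIANCE = 1.0      # minimum variance of group area within a cell
MIN_ENERGY_VARIANCE = 0.01   # minimum variance of group energy within a cell


def qc_grid_cell(groups):
    """Return the (area, energy) groups in one grid cell that pass the QC.

    Step 1 drops groups whose energy exceeds a maximum that scales with area,
    as a blooming event might. Step 2 rejects the whole cell when the surviving
    groups are nearly identical, as along a bar artifact. Cells with fewer than
    two groups are left as they are because a variance cannot be assessed.
    """
    kept = [(area, energy) for area, energy in groups
            if energy <= MAX_ENERGY_PER_AREA * area]
    if len(kept) < 2:
        return kept
    areas = [area for area, _ in kept]
    energies = [energy for _, energy in kept]
    if pvariance(areas) < MIN_AREA_VARIANCE or pvariance(energies) < MIN_ENERGY_VARIANCE:
        return []
    return kept


# A cell with varied groups and one excessively energetic group.
natural_cell = [(70.0, 5.0), (140.0, 30.0), (300.0, 90.0), (70.0, 400.0)]
# A cell whose groups are all identical, as might occur along a bar artifact.
bar_cell = [(70.0, 3.0), (70.0, 3.0), (70.0, 3.0)]

print(qc_grid_cell(natural_cell))  # the (70.0, 400.0) group is removed
print(qc_grid_cell(bar_cell))      # the whole cell is rejected
```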
A CNN trained on WWLLN data was also explored as a QC technique. The CNN was able to learn the spatial patterns and relationships among lightning variables. This knowledge can be applied to remove nonphysical artifacts from the GLM dataset when the difference between the predicted and true lightning counts is anomalously large. The AI-based QC method was able to identify anomalous samples such as the Bahama Bar and increase confidence in lightning observations in TC-centered reference frames. However, despite the additional complexity, the CNN did not appear to provide significant benefits over the much simpler and more interpretable threshold-based approach. Intermediate complexity methods, such as support vector regression (Erdmann et al. 2022), have shown promise in the related context of creating synthetic observations.
This work focuses on applications to TCs because there is no comparable baseline for GLM over the ocean. Because the DE of WWLLN decreases over land, the CNN trained on WWLLN cannot be applied over land. Therefore, the application of these methods to CONUS should be questioned and considered with care on a case-by-case basis. Future work will use a blended verification dataset for GLM by merging lightning networks with higher DE over land (e.g., ENTLN) with the WWLLN.
While the QC methods presented here are capable of removing the main artifacts contained in the GLM dataset and of reducing false alarms, there are instances where true lightning groups are removed by the QC methods. For instance, there are true lightning flashes in the southwestern United States that are removed by the strict maximum energy threshold for areas with a satellite zenith angle exceeding 50° (Fig. 1b). As such, improvements to the QC methods to reduce the number of true lightning flashes removed will be the topic of further study. The CNN also removes realistic lightning at times, as it learns the biases from WWLLN and reduces the amplitude of observed lightning. The use of the AI-based QC for an anomaly detection algorithm had limitations in that large differences between the predicted and observed lightning were often found when parts of TCs were over land, where the WWLLN DE is lower. The anomaly detection algorithm was also limited by the type of artifact, as it was not always able to identify the Bahama Bar because of the lower amounts of total lightning.
Acknowledgments.
This work has been funded by NOAA Award NA19OAR4320073. The authors thank Wallace Hogsett and four anonymous reviewers for providing helpful feedback.
Data availability statement.
The GLM dataset for L2 data can be found at National Centers for Environmental Information (NCEI) https://doi.org/10.7289/V5KH0KK6. The software used to implement the thresholding QC is publicly available at https://gitlab.com/Btrabing/climatology_glm. The large-scale and TC-centered datasets produced by this project can be requested from the authors.
REFERENCES
Bateman, M., D. Mach, and M. Stock, 2021: Further investigation into detection efficiency and false alarm rate for the Geostationary Lightning Mappers aboard GOES-16 and GOES-17. Earth Space Sci., 8, e2020EA001237, https://doi.org/10.1029/2020EA001237.
Blakeslee, R., and W. Koshak, 2016: LIS on ISS: Expanded global coverage and enhanced applications. Earth Observer, 28, 4–14.
Bruning, E. C., and D. R. MacGorman, 2013: Theory and observations of controls on lightning flash size spectra. J. Atmos. Sci., 70, 4012–4029, https://doi.org/10.1175/JAS-D-12-0289.1.
Bruning, E. C., and Coauthors, 2019: Meteorological imagery for the Geostationary Lightning Mapper. J. Geophys. Res. Atmos., 124, 14 285–14 309, https://doi.org/10.1029/2019JD030874.
Chmielewski, V. C., and E. C. Bruning, 2016: Lightning Mapping Array flash detection performance with variable receiver thresholds. J. Geophys. Res. Atmos., 121, 8600–8614, https://doi.org/10.1002/2016JD025159.
Corbosiero, K. L., and J. Molinari, 2003: The relationship between storm motion, vertical wind shear, and convective asymmetries in tropical cyclones. J. Atmos. Sci., 60, 366–376, https://doi.org/10.1175/1520-0469(2003)060<0366:TRBSMV>2.0.CO;2.
DeMaria, M., R. T. DeMaria, J. A. Knaff, and D. Molenar, 2012: Tropical cyclone lightning and rapid intensity change. Mon. Wea. Rev., 140, 1828–1842, https://doi.org/10.1175/MWR-D-11-00236.1.
Duran, P., and Coauthors, 2021: The evolution of lightning flash density, flash size, and flash energy during Hurricane Dorian’s (2019) intensification and weakening. Geophys. Res. Lett., 48, e2020GL092067, https://doi.org/10.1029/2020GL092067.
Ebert-Uphoff, I., R. Lagerquist, K. Hilburn, Y. Lee, K. Haynes, J. Stock, C. Kumler, and J. Q. Stewart, 2021: CIRA guide to custom loss functions for neural networks in environmental sciences -- Version 1. arXiv, 2106.09757v1, https://doi.org/10.48550/arXiv.2106.09757.
Erdmann, F., O. Caumont, and E. Defer, 2022: A geostationary lightning pseudo-observation generator utilizing low-frequency ground-based lightning observations. J. Atmos. Oceanic Technol., 39, 3–30, https://doi.org/10.1175/JTECH-D-20-0160.1.
Fierro, A. O., S. N. Stevenson, and R. M. Rabin, 2018: Evolution of GLM-observed total lightning in Hurricane Maria (2017) during the period of maximum intensity. Mon. Wea. Rev., 146, 1641–1666, https://doi.org/10.1175/MWR-D-18-0066.1.
Goodfellow, I., Y. Bengio, and A. Courville, 2016: Deep Learning. The MIT Press, 773 pp., http://www.deeplearningbook.org.
Goodman, S. J., 2013: The GOES-R Geostationary Lightning Mapper (GLM). Atmos. Res., 125–126, 34–49, https://doi.org/10.1016/j.atmosres.2013.01.006.
Harris Corporation, 2016: GOES-R Series Product Definition and User’s Guide (PUG) Volume 5: Level 2+ Products. NOAA NASA Tech. Rep. DCN-7035538, 726 pp., https://www.goes-r.gov/products/docs/PUG-L2+-vol5.pdf.
Hilburn, K. A., 2023: Understanding spatial context in convolutional neural networks using explainable methods: Application to interpretable GREMLIN. Artif. Intell. Earth Syst., 2, 220093, https://doi.org/10.1175/AIES-D-22-0093.1.
Landsea, C. W., and J. L. Franklin, 2013: Atlantic hurricane database uncertainty and presentation of a new database format. Mon. Wea. Rev., 141, 3576–3592, https://doi.org/10.1175/MWR-D-12-00254.1.
Lay, E. H., R. H. Holzworth, C. J. Rodger, J. N. Thomas, O. Pinto Jr., and R. L. Dowden, 2004: WWLLN global lightning detection system: Regional validation study in Brazil. Geophys. Res. Lett., 31, L03102, https://doi.org/10.1029/2003GL018882.
Peterson, M., 2019: Research applications for the Geostationary Lightning Mapper operational lightning flash data product. J. Geophys. Res. Atmos., 124, 10 205–10 231, https://doi.org/10.1029/2019JD031054.
Peterson, M., and E. Lay, 2020: GLM observations of the brightest lightning in the Americas. J. Geophys. Res. Atmos., 125, e2020JD033378, https://doi.org/10.1029/2020JD033378.
Peterson, M., T. E. L. Light, and D. Mach, 2022: The illumination of thunderclouds by lightning: 2. The effect of GLM instrument threshold on detection and clustering. Earth Space Sci., 9, e2021EA001943, https://doi.org/10.1029/2021EA001943.
Ringhausen, J. S., and P. M. Bitzer, 2021: An in-depth analysis of lightning trends in Hurricane Harvey using satellite and ground-based measurements. J. Geophys. Res. Atmos., 126, e2020JD032859, https://doi.org/10.1029/2020JD032859.
Rodger, C. J., S. Werner, J. B. Brundell, E. H. Lay, N. R. Thomson, R. H. Holzworth, and R. L. Dowden, 2006: Detection efficiency of the VLF World-Wide Lightning Locations Network (WWLLN): Initial case study. Ann. Geophys., 24, 3197–3214, https://doi.org/10.5194/angeo-24-3197-2006.
Ronneberger, O., P. Fischer, and T. Brox, 2015: U-Net: Convolutional networks for biomedical image segmentation. arXiv, 1505.04597v1, https://doi.org/10.48550/arXiv.1505.04597.
Rudlosky, S. D., and K. S. Virts, 2021: Dual Geostationary Lightning Mapper observations. Mon. Wea. Rev., 149, 979–998, https://doi.org/10.1175/MWR-D-20-0242.1.
Rudlosky, S. D., S. J. Goodman, K. S. Virts, and E. C. Bruning, 2019: Initial Geostationary Lightning Mapper observations. Geophys. Res. Lett., 46, 1097–1104, https://doi.org/10.1029/2018GL081052.
Rutledge, S. A., K. A. Hilburn, A. Clayton, B. Fuchs, and S. D. Miller, 2020: Evaluating Geostationary Lightning Mapper flash rates within intense convective storms. J. Geophys. Res. Atmos., 125, e2020JD032827, https://doi.org/10.1029/2020JD032827.
Slocum, C. J., J. A. Knaff, and S. N. Stevenson, 2023: Lightning-based tropical cyclone rapid intensification guidance. Wea. Forecasting, 38, 1209–1227, https://doi.org/10.1175/WAF-D-22-0157.1.
Stevenson, S. N., K. L. Corbosiero, and J. Molinari, 2014: The convective evolution and rapid intensification of Hurricane Earl (2010). Mon. Wea. Rev., 142, 4364–4380, https://doi.org/10.1175/MWR-D-14-00078.1.
Stevenson, S. N., K. L. Corbosiero, and S. F. Abarca, 2016: Lightning in eastern North Pacific tropical cyclones: A comparison to the North Atlantic. Mon. Wea. Rev., 144, 225–239, https://doi.org/10.1175/MWR-D-15-0276.1.
Virts, K. S., and W. J. Koshak, 2020: Mitigation of Geostationary Lightning Mapper geolocation errors. J. Atmos. Oceanic Technol., 37, 1725–1736, https://doi.org/10.1175/JTECH-D-19-0100.1.