1. Introduction
Lightning is defined as electrical discharges within the atmosphere, more particularly within and between clouds [intra- and intercloud (IC)] or between clouds and the ground (CG). Transient lightning phenomena also occur between the cloud and the upper atmosphere, e.g., sprites and jets. While cloud electrification and lightning initiation are still under study, it is widely accepted that cloud ice and graupel are necessary to separate charges within clouds (e.g., Luque et al. 2020; Emersic and Saunders 2020; Lyu et al. 2019; Kolmasova et al. 2019; Takahashi et al. 2017; MacGorman and Rust 1998; Brooks et al. 1997). In particular, convection creates favorable conditions for lightning, and the updraft strength can correlate well with the total lightning rate (e.g., Deierling and Petersen 2008). Ávila et al. (2010) found a high correlation between the occurrence of deep convection and lightning over land at a global scale. Hence, lightning is an effective tracer of deep convection.
The new generation of geostationary (GEO) satellites carries optical lightning sensors, among other instruments. The Geostationary Lightning Mapper (GLM) of the Geostationary Operational Environmental Satellite (GOES)-R series, the Lightning Mapping Imager (LMI) on board the Chinese Fengyun-4 satellites (Yang et al. 2017), and the upcoming Meteosat Third Generation Lightning Imager (MTG-LI; Dobber and Grandell 2014) will provide GEO lightning observations at a global scale. This satellite-based, large-scale, continuous observation of lightning offers new information for climate monitoring and studies. In addition, the assimilation of GEO lightning data in numerical weather prediction (NWP) can help to improve the initial state of the model. Most recent lightning data assimilation studies use gridded flash extent density (FED), for example, Allen et al. (2016) and Fierro et al. (2019).
To assimilate new observation types in NWP models, it is desirable to develop an assimilation scheme prior to the instrument launch and data availability. The simulation of appropriate realistic pseudo-observations precedes the development of any assimilation scheme, especially when the sensor is not yet in operation. Such synthetic observations can be derived from existing GEO sensors over other regions, that is, GLM, and ground-based lightning locating systems (LLSs). In addition, low-Earth-orbit (LEO) missions such as the Lightning Imaging Sensor (LIS) on the Tropical Rainfall Measuring Mission (TRMM) satellite (e.g., Christian et al. 1999; Cecil et al. 2005) and on board the International Space Station (ISS) (Blakeslee and Koshak 2016; Blakeslee et al. 2020) provide space-based lightning observations. One can also use ground-based networks, for example, the National Lightning Detection Network (NLDN) (e.g., Cummins and Murphy 2009), Meteorage (e.g., Schulz et al. 2016; Erdmann et al. 2020), and lightning mapping arrays (LMAs) (e.g., Rison et al. 1999; Thomas et al. 2004; Coquillat et al. 2019). While the satellite sensors detect the optical emission of lightning at 777.4 nm, the ground-based networks operate at different frequencies that match the electromagnetic radiation emitted by different lightning processes. NLDN and Meteorage use low-frequency (LF) sensors that are most sensitive to discharge processes such as return strokes for CG flashes. Most LF networks can distinguish CG and IC signals. The CG flash detection (with return strokes) is usually reliable, whereas the IC flash detection efficiency (DE) increases within the network and for shorter baselines given one LF sensor type (S. Pedeboy 2020 and 2021, personal communication). Global LF networks have lower DE and accuracy than national and regional LF networks (e.g., Nag et al. 2015).
LMA stations sense very high-frequency (VHF) signals of lightning leader propagation and allow for three-dimensional (3D) channel mapping (e.g., Rison et al. 1999). Their drawback is the limited range. An LMA network provides coverage within a radius of typically a few hundred kilometers (e.g., Thomas et al. 2004; Koshak et al. 2004; Chmielewski and Bruning 2016; Coquillat et al. 2019).
Biron et al. (2008) resampled TRMM-LIS data on an MTG-LI-like grid to assess the potential performance of the MTG-LI with emphasis on the influence of varying minimal detectable radiant energy. However, this method, which relies on LEO lightning data, is not suitable for producing continuous pseudo-observations in the same area for operational applications because of the long revisit time. Stano (2013) demonstrated a simple method to create pseudo-GLM gridded products using LMA data. The pseudo-GLM data served to train forecasters on the use of GLM data products. GLM’s Algorithm Working Group investigated a function that transforms LMA sources to optical lightning observations. The technique combines TRMM-LIS flash statistics and observed LMA flashes (Bateman 2013). The same method was applied by Schultz et al. (2016) to study automated storm tracking and lightning jump algorithms using GLM pseudo-observations. Höller and Betz (2010) presented a simple statistical model for transforming stroke-type data of the LF network LINET (Betz et al. 2009) to pseudo-MTG-LI optical events. The statistical relations were studied by comparing LINET strokes to concurrent TRMM-LIS groups. They then created a pixel matrix of the future MTG-LI and used TRMM-LIS statistics of radiance and event number per group to obtain pseudo-MTG-LI events. Their work aimed to propose a statistics-based method to create optical pseudo-observations of lightning from a given set of LF LINET strokes. The available satellite lightning data came solely from the LEO TRMM-LIS mission, and the number of cases was fairly limited (705 coincident flashes).
Recent studies assessing the GLM performance have shown that the DE varies within the field of view. GLM detects almost 90% of the flashes in the southeastern United States (e.g., Marchand et al. 2019; Murphy and Said 2020). The flash DE is statistically lower in other regions such as Colorado. Rutledge et al. (2020) showed that the GLM performance depends on the charge structure and the hydrometeor distribution. In particular, electrically “anomalous” storms degraded the GLM flash DE. The GLM flash DE also depends on the size and duration of flashes. Zhang and Cummins (2020) found that small, short-duration flashes are more likely to be missed by GLM than larger flashes.
This paper introduces in-depth techniques and results of creating GEO lightning pseudo-observations. The GEO lightning pseudo-observation generator is developed using NLDN records in the United States and can be applied to all NLDN-like ground-based LLSs, e.g., Meteorage in France. One key part of the generator uses machine learning (ML) to relate NLDN-like observations to the extent and duration of the generated optical flashes. The generator simulates the GEO lightning pseudo-observations at the flash level including events and thus flash extent. FED grids can be derived from the generated pseudo-observations to serve as assimilation input data. This work prepares in particular the assimilation of pseudo-MTG-LI data in the Météo-France operational mesoscale numerical weather prediction system Applications de la Recherche à l’Opérationnel à Méso-Echelle (AROME) in France. As MTG-LI will produce GLM-like data, and the French Meteorage network observes lightning similarly to NLDN in the United States (Erdmann 2020, chapter II.2.4), the developed GEO lightning pseudo-observation generator can be used to simulate realistic pseudo-MTG-LI data.
The main objective of this study is the generation of a realistic GEO lightning FED field. The aim is not to reproduce individual flashes correctly but to reproduce the FED product. Therefore, the most important characteristics are the overall flash number and the flash extent. FED does not depend directly on the flash duration, the event number per flash, or the flash energetics. The developed generator should provide synthetic MTG-LI FED over France for data assimilation studies (not in the scope of the present paper). The application in our study is not intended for operational use, even though the developed algorithm could be used for operational applications or for training forecasters and users.
Section 2 introduces both the NLDN and GLM instruments. It also describes the dataset with coincident GLM and NLDN flashes. Section 3 explains in depth the strategy to mimic GLM data from NLDN observations. This includes a two-part GEO lightning pseudo-observation generator and different ML models to relate GLM and NLDN flash characteristics. Section 4 presents pseudo-GLM observations, their comparison to real GLM observations, and the evaluation of the two-part generator. FED from real and pseudo-GLM observations is compared for the different ML-based generators. Recommendations for suitable GEO lightning pseudo-observation generators are given.
2. Instruments and data
GLM and NLDN make use of different lightning detection and locating techniques. This section introduces important specifications of both instruments and the studied dataset. It briefly describes the developed methods to match and compare GLM and NLDN observations, and to infer flash characteristics needed for training ML models.
a. GLM
The GLM is an optical sensor on board the GOES-R series (currently GOES-16 at 75°W and GOES-17 at 137°W). This study uses the GOES-16 GLM data only. The GLM detects total lightning including IC and CG during day and night. Although it cannot directly distinguish IC from CG signals, Koshak and Solakiewicz (2015) show that some ICs and CGs can be statistically differentiated. Because daytime lightning is difficult to detect against bright, sunlit clouds, thresholds and filters are applied to separate the lightning optical signal from background and other light sources. Lightning is detected in a narrow (1 nm) band centered at the 777.4-nm oxygen line in the near-infrared. The wide field-of-view (FOV) image is focused on a high-speed charge-coupled device (CCD) focal plane with a nearly hemispheric FOV coverage (1372 × 1300 pixels). The variable-pitch pixel CCD yields pixels of about 8 km at nadir and only 14 km at the edge of the FOV (Goodman et al. 2013). Images are produced continuously in time frames of 2 ms.
NASA’s GLM lightning data algorithm produces level-2 data with lightning information as events, groups, and flashes. The x, y coordinates of the focal plane are transformed to latitude and longitude coordinates of an estimated cloud-top ellipsoid (with a height of 14 km at the equator and 6 km at the poles). Bruning et al. (2019) describe the effects of using this ellipsoid on GLM parallax with respect to any known ground-relative reference. GLM events are single illuminated pixels that pass the optical filters and are thus identified as lightning signals. Their location is defined as the center of the illuminated pixel. Adjacent events observed in the same 2-ms time frame are merged to form a group. Next, groups are combined into flashes. NASA’s clustering algorithm uses a weighted Euclidean distance (WED) with limits of 16.5 km in latitude and longitude direction and 330 ms in time. Two groups with a WED of less than 1 are assigned to the same flash. The WED criterion is tested for pairs of events with one event in each group (Mach 2020).
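The WED test described above can be sketched as follows. This is a minimal illustration, not NASA's operational code: the squared, limit-normalized sum used for the WED is an assumed form, the planar kilometer coordinates are a simplification, and the function names are hypothetical. Only the limits (16.5 km, 330 ms) and the threshold of 1 come from the text.

```python
# Limits quoted in the text: 16.5 km in each horizontal direction, 330 ms in time.
WED_DX_KM = 16.5
WED_DY_KM = 16.5
WED_DT_S = 0.330


def weighted_euclidean_distance(dx_km, dy_km, dt_s):
    """Assumed WED form: sum of squared separations, each normalized by its limit."""
    return (dx_km / WED_DX_KM) ** 2 + (dy_km / WED_DY_KM) ** 2 + (dt_s / WED_DT_S) ** 2


def same_flash(group_a, group_b):
    """Return True if any event pair (one event per group) has a WED below 1.

    Each group is a list of (x_km, y_km, t_s) events in a local planar frame
    (hypothetical input layout chosen for this sketch).
    """
    return any(
        weighted_euclidean_distance(xa - xb, ya - yb, ta - tb) < 1.0
        for xa, ya, ta in group_a
        for xb, yb, tb in group_b
    )
```

For example, two single-event groups separated by 10 km and 100 ms would be assigned to the same flash under these assumptions, whereas a separation of exactly 16.5 km at the same time would not, because the WED then equals 1.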
The reader is referred to Goodman et al. (2013), the GLM Product Performance Guide for Data Users (Koshak et al. 2018), and Goodman et al. (2012) for further information on GLM details. Mach (2020) analyzed the GLM algorithms recently.
b. The NLDN
The NLDN (Cummins and Murphy 2009) consists of more than 100 Vaisala, Inc., LS7002 ground sensors in the contiguous United States (CONUS). It detects LF electromagnetic signals generated by fast lightning discharges such as return strokes or by intracloud components. Thanks to a combination of magnetic direction finding and time-of-arrival techniques, only two sensors are needed to construct the horizontal location (latitude and longitude, no altitude) and time of a signal. NLDN locates total lightning, including CG and IC discharges. According to Vaisala (2013), up to 95% and better than 50% of all CG and IC lightning, respectively, is detected. Zhu et al. (2016) found that one-third of 153 IC pulses were detected by NLDN, and 86% were classified correctly. NLDN detected 92% of 367 return strokes, and also 92% were correctly classified as CG. The median location accuracy approaches 250 m for CG strokes in the interior of the network. Lightning can be located at long range (1500 km), but the location accuracy in the interior of the network is significantly higher than outside. NLDN also measures the peak current amplitude of the LF source. NLDN data used in this study include time (resolved at 1 ms), the location as latitude and longitude, the peak current amplitude (kA), the polarity, the type (CG or IC) of the LF source, and quality parameters, e.g., the location error ellipse axes. Although Vaisala merges strokes into flashes (within 10 km and 1 s), this study retrieves NLDN flash-level data using the algorithm developed by Erdmann et al. (2020) for Meteorage records in France. Hence, pulses/strokes are merged into a flash if they occur within both 20 km and 0.4 s. The dataset is not further separated in this work, and the term pulse/stroke is used to represent all NLDN detections on the stroke–pulse level.
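A merging rule of this kind can be illustrated with a short sketch. It is not the algorithm of Erdmann et al. (2020), whose details are not given here. It assumes a greedy, time-ordered pass in which a pulse/stroke joins the first flash that already contains a member within 20 km and 0.4 s; the function names, the input layout, and the spherical-Earth distance are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def group_into_flashes(pulses, max_km=20.0, max_s=0.4):
    """Greedily merge (time_s, lat, lon) pulses/strokes into flashes.

    A pulse joins the first existing flash with any member within both
    max_km and max_s (assumed linkage rule); otherwise it starts a new flash.
    """
    flashes = []
    for pulse in sorted(pulses):  # process in time order
        t, lat, lon = pulse
        for flash in flashes:
            if any(
                abs(t - ft) <= max_s and haversine_km(lat, lon, flat, flon) <= max_km
                for ft, flat, flon in flash
            ):
                flash.append(pulse)
                break
        else:
            flashes.append([pulse])
    return flashes
```

The linkage choice matters for long flashes: testing against any member lets a flash grow beyond 20 km and 0.4 s in total, whereas testing only against the first stroke would cap both.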
c. Database for the current study
The general dataset consists of 6 months of GLM and NLDN records, from 15 March to 15 September 2018. NLDN data were provided in a region between 30° and 35°N and between 95° and 82°W. GLM data before 26 September 2018 need a time-of-flight (TOF) correction that takes into account the time lightning photons need to travel from the cloud tops (approximated at 10 km of altitude) to the GLM orbit. Our study applies a dynamical TOF correction with values ranging from 122.8 to 124.9 ms in the region of interest.
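The order of magnitude of the TOF correction can be checked with simple geometry. The sketch below is not the dynamical correction applied in this study. It assumes a spherical Earth of radius 6371 km, a nominal geostationary orbit radius of 42 164 km, and a sub-satellite point on the equator at 75°W; the constants and the function name are illustrative assumptions. Only the 10-km cloud-top altitude and the region bounds come from the text.

```python
import math

C_KM_PER_S = 299_792.458   # speed of light
EARTH_RADIUS_KM = 6371.0   # assumption: spherical Earth
GEO_RADIUS_KM = 42_164.0   # assumption: nominal geostationary orbit radius
SAT_LON_DEG = -75.0        # GOES-16 position quoted in the text (75°W)
CLOUD_TOP_KM = 10.0        # cloud-top altitude quoted in the text


def time_of_flight_ms(lat_deg, lon_deg):
    """Photon travel time (ms) from a cloud top at (lat, lon) to the satellite."""
    r = EARTH_RADIUS_KM + CLOUD_TOP_KM
    # Cosine of the central angle between the cloud top and the sub-satellite point.
    cos_gamma = math.cos(math.radians(lat_deg)) * math.cos(math.radians(lon_deg - SAT_LON_DEG))
    # Law of cosines for the slant range.
    slant_km = math.sqrt(GEO_RADIUS_KM ** 2 + r ** 2 - 2.0 * GEO_RADIUS_KM * r * cos_gamma)
    return 1000.0 * slant_km / C_KM_PER_S


# Corners of the study region nearest to and farthest from the satellite.
tof_near = time_of_flight_ms(30.0, -82.0)
tof_far = time_of_flight_ms(35.0, -95.0)
```

Under these assumptions the two corners give values of roughly 123 and 125 ms, which is consistent with the range quoted above.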
To handle the large amount of GLM data and hence to limit the data processing time, a reduction of the 6-month dataset was necessary. The complete lightning dataset is studied to identify lightning-active days (start and end at 0000 UTC), defined by the number of GLM flashes and the number of GLM events. Ten days with significant lightning activity and different storm types during both day and night are selected. Table 1 summarizes the number of GLM events and flashes as well as NLDN pulses/strokes and flashes recorded in the region during each of the 10 selected days. Table 1 also states the dominant weather situation during each of the 10 days. At least one day per month is selected to represent possible climatological differences of the lightning within the region. All further analyses use these 10 days so as to reduce the immense amount of GLM event-scale data. The resulting dataset comprises 1 133 671 GLM flashes and 1 115 675 NLDN flashes. Missing data are identified through an analysis of instrument activity during 20-s time windows equal to those of the GLM L2 data files. The number of flashes is reduced to 1 132 051 GLM flashes and 1 115 585 NLDN flashes due to possible short periods of instrument inactivity. Hence, the difference in the number of observed flashes is less than 2% of the flash counts, and both instruments operated continuously during the selected days. As the effect of downtimes of an instrument can be disregarded, the following analysis uses all available data. Three among the 10 days are chosen to test the generators with uncorrelated data and to assess the variability in the results (test days). The test days (7 April, 26 May, and 31 July 2018) feature both thermally driven convection and dynamic forcing at air mass boundaries. In the following, the different weather regimes with different lightning activity are briefly described for the test days, as the final FED product is analyzed only for these three days.
Table 1. Study dates (in 2018) with the amounts of GLM and NLDN data. The three rightmost columns indicate whether the data are used for ML-based generator building (GB) or the generator test (GT) part, the time of most lightning activity in the region (D: local daytime; N: local nighttime), and the primary forcing (trigger) for storm development and lightning.


For instance, on 7 April 2018, the weather was dominated by a major cold front that traversed the region from northwest to southeast. Temperatures dropped by about 10 K behind the front. The strong dynamic forcing caused a mesoscale convective system (MCS) with linear structure. This system produced the vast majority of flashes observed during the test period of 7 April 2018 until it left the studied region at about 1200 UTC.
The date of 26 May 2018 was characterized by relatively warm surface temperatures with slightly decreasing temperatures from west to east within the region. Moisture was transported into the region by a weak tropical depression over Cuba and later southern Florida. Convection occurred mainly in the local afternoon as a result of surface heating. Well-defined cells formed and propagated slowly southward in the cyclonic flow.
Daytime temperatures widely exceeded 30°C and remained at about 25°C at night within the region on 31 July 2018. Moisture was advected into the region from the Gulf of Mexico while a dryline approached from the northwest. A multicell storm cluster formed in the convergence zone at local nighttime and propagated eastward, driven by a short baroclinic wave aloft. The second peak of lightning activity resulted from thermal convection in the eastern portion of the region before the dry air moved in and inhibited further convection.
d. Data processing algorithms—Flash-scale data and identification of matches
NLDN and GLM observe lightning independently of each other. The comparison of the two LLSs needs, however, coincident observations. This work uses the matching algorithm introduced by Erdmann et al. (2020). Coincident observations are defined at the flash scale for flashes within 20 km and 1.0 s. The criteria are tested for any pair of events and pulses/strokes. Two parent flashes are matched if one event (pulse/stroke) meets both the spatial and the temporal criteria to any pulse/stroke (event) of the given flash. The algorithm does not analyze the flash mean position but the event and pulse/stroke locations.
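The matching criterion can be illustrated with the sketch below. It is not the implementation of Erdmann et al. (2020). It assumes that each flash is given as a list of (time_s, lat, lon) components and that distances are great-circle distances on a spherical Earth; the function names and the input layout are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flashes_match(glm_events, nldn_pulses, max_km=20.0, max_s=1.0):
    """True if any GLM event and any NLDN pulse/stroke meet both criteria.

    The test uses component locations and times, not flash mean positions.
    """
    return any(
        abs(te - tp) <= max_s and haversine_km(late, lone, latp, lonp) <= max_km
        for te, late, lone in glm_events
        for tp, latp, lonp in nldn_pulses
    )
```

Because a single qualifying pair is enough, a spatially extended GLM flash can match an NLDN flash whose only pulse lies far from the GLM flash centroid.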
GLM flash-level data are included in the GLM L2 science data and emanate from NASA’s GLM L2 clustering algorithm. Mach (2020) found recently that NASA’s GLM clustering algorithm was stable for different spatial and temporal merging criteria (mainly for storms with flash rates below about 40 flashes per minute). In the present study, the performance of NASA’s GLM L2 clustering algorithm for 1 h on 26 May was investigated. NASA’s L2 GLM clustering algorithm succeeded in merging many events and in detecting large flashes. The GLM operational algorithm still limits the maximum size of flashes because of computational restrictions. However, such cases are rare and hardly influence the data generators as statistical approaches are used for both training and testing here.
The matching of GLM and NLDN flashes (for the 10-day dataset) leads to 948 872 matched GLM flashes and 971 102 matched NLDN flashes. Some flashes from one system are matched to more than one flash in the other system, and it happens more often that one GLM flash matches multiple NLDN flashes than vice versa. The relative flash DE of a given LLS is defined as the ratio of flashes observed by both the given and the reference LLS to the total number of flashes observed by the reference LLS. It yields 87.0% (of 1 115 585 NLDN flashes) and 83.8% (of 1 132 051 GLM flashes) for GLM and NLDN, respectively. Figure 1 illustrates the flash DE of both GLM and NLDN within the studied region, along with the 2D density of observed flash centroids (gray isocontour). The flash DE remains consistent within the entire domain. The local minimum in the northeast is caused by a low number of observed flashes for the two 1° × 1° pixels in Fig. 1. The high flash DE of GLM agrees with the results of Marchand et al. (2019), who found the GLM DE relative to ground-based Earth Networks Total Lightning Network (ENTLN) flashes exceeding 80% for most of the southeastern CONUS. They used 35 km and 330 ms as spatial and temporal matching criteria, respectively. Murphy and Said (2020) compared, among others, the relative DE of GLM and NLDN. They matched flashes within 20 km between GLM flash centroids and the first NLDN pulse/stroke per flash and within 200 ms between the flash time windows (defined by the start and end times), and they report similar flash DE values on the large scale in the southeastern CONUS. A new approach to the GLM flash DE and false alarm ratio (FAR) is introduced by Bateman and Mach (2020) and Bateman et al. (2021): combining several ground-based networks to provide reference data and using coarse matching criteria of 50 km and 10 min, they found a flash DE of over 90% and a FAR just above 5% for the GLM on GOES-16.
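The two relative DE values follow directly from the flash counts given above. The short check below uses only those counts; the function name is a hypothetical helper.

```python
def relative_de_percent(matched_reference_flashes, total_reference_flashes):
    """Relative flash DE: share of reference-LLS flashes also seen by the given LLS."""
    return 100.0 * matched_reference_flashes / total_reference_flashes


# GLM DE uses NLDN as reference: matched NLDN flashes over all NLDN flashes.
glm_de = relative_de_percent(971_102, 1_115_585)
# NLDN DE uses GLM as reference: matched GLM flashes over all GLM flashes.
nldn_de = relative_de_percent(948_872, 1_132_051)

print(f"GLM relative DE:  {glm_de:.1f}%")   # 87.0%
print(f"NLDN relative DE: {nldn_de:.1f}%")  # 83.8%
```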

Fig. 1. Relative flash detection efficiency per 1° × 1° pixel (color) for the full 10-day dataset for (a) GLM and (b) NLDN. Grayscale lines contour (as per the shades on the right-side legend) the flash number at the 0th (1 flash), 50th, 80th, and 95th percentile of the flash-number distribution per 0.25° × 0.25° pixel (only for pixels with flash activity).
Citation: Journal of Atmospheric and Oceanic Technology 39, 1; 10.1175/JTECH-D-20-0160.1

3. Methods
This section defines the concepts and the strategy to generate GEO lightning pseudo-observations. The methods are designed to use NLDN data and are evaluated using real GLM observations. MTG-LI will provide total lightning observations with a data structure similar to that of GLM observations. It will also consist of events, groups, and flashes. Although MTG-LI’s spatial resolution (4.5 km at nadir versus 8.0 km at nadir) and temporal resolution (1 vs 2 ms) will be higher than those of GLM, the methods presented here can still be applied to simulate MTG-LI observations. A comparison of ISS-LIS records over the domain of this study (United States) and the target region (France) revealed statistically similar flash characteristics (Erdmann 2020, chapter II.1-2). In addition, for both regions of interest, statistics on NLDN and Meteorage LF lightning observations relative to ISS-LIS records were consistent. The FED as explained in the following section is simulated on a 5 km × 5 km resolution grid approximating the MTG-LI grid.
a. Definition of the FED
Flash extent density is a gridded product that sums, over a given time integration period, the projections of the locations of flash components (e.g., events and pulses/strokes) onto a given regular grid mesh. For each flash, FED pixels with any lightning observation are identified, and pixels with multiple observations (e.g., multiple NLDN pulses/strokes) are counted once per flash. This gives a grid of pixels with either lightning (value 1) or no lightning (value 0) for each flash. The FED product considers all flashes within a given time integration period and sums up the occurrence of flashes per pixel. Hence, pixels with lightning have FED values of one or more flashes per pixel. The FED shows the spatial distribution of lightning activity within the given time period. For example, the propagation of convective cores can be tracked over several successive time integration periods.
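The definition above can be sketched in a few lines. The example assumes that each flash is a list of (lat, lon) component positions and that the grid is regular in latitude and longitude with a given origin and spacing; the function name and the sparse dictionary output are choices made for this illustration.

```python
import math
from collections import Counter


def flash_extent_density(flashes, lat0, lon0, dlat, dlon):
    """Count, per grid pixel, the number of flashes with at least one component in it.

    flashes: list of flashes, each a list of (lat, lon) component positions.
    Returns a Counter mapping (row, col) pixel indices to flash counts.
    """
    fed = Counter()
    for flash in flashes:
        # A set keeps each pixel once per flash, however many components fall in it.
        pixels = {
            (math.floor((lat - lat0) / dlat), math.floor((lon - lon0) / dlon))
            for lat, lon in flash
        }
        fed.update(pixels)
    return fed
```

Pixels without lightning are simply absent from the result (an implicit value of 0). Since each flash contributes independently, the FED of a long period equals the sum of the FEDs of the shorter periods that partition its flashes.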
The FED in this study is calculated on a regular latitude–longitude grid with an average pixel size of 5 km × 5 km. To obtain the regular latitude–longitude grid, the distance of 5 km is transformed to latitudinal and longitudinal distances evaluated at the pixel at the center of the study region. Appendix A describes in detail how the GEO pixel grid is transformed to the regular FED grid. In the present study, FEDs are analyzed per 60-min time integration period. The 1-h period maintains the information needed to locate tracks of convective cores and the most electrified regions, while it is also long enough to capture several storms distributed within the full domain. There might be, however, multiple storms at one location during 60 min. The FED integration period can be changed as needed since our GEO lightning pseudo-observation generator simulates data at the flash level. The sum of multiple short FED periods is equal to the FED of a corresponding long period, but the computation of one long period is more efficient. Hence, this work simulates FED per hour for computational reasons. It should be mentioned, however, that other FED time integration periods are currently under investigation, and the assimilation of MTG-LI will use a shorter FED time integration period.
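The conversion of the 5-km pixel size to grid spacing in degrees can be approximated as follows. This sketch assumes a spherical Earth of radius 6371 km and takes 32.5°N, the central latitude of the 30° to 35°N study region, as the reference latitude; the function name is hypothetical, and the exact procedure of appendix A may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def grid_spacing_deg(pixel_km, center_lat_deg):
    """Latitude and longitude spacing (degrees) of a pixel_km-wide pixel."""
    km_per_deg_lat = math.pi * EARTH_RADIUS_KM / 180.0
    # Meridians converge poleward, so one degree of longitude is shorter.
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(center_lat_deg))
    return pixel_km / km_per_deg_lat, pixel_km / km_per_deg_lon


dlat, dlon = grid_spacing_deg(5.0, 32.5)
print(f"dlat = {dlat:.4f} deg, dlon = {dlon:.4f} deg")
```

Because the spacing is fixed at the central latitude, pixels are slightly wider than 5 km in the south of the domain and slightly narrower in the north, which is why the pixel size is described as an average.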
b. Work flow—The simulation of GEO pseudo-observations of FED
The simulation of pseudo-GLM flashes from NLDN observations is performed in two parts. First, our GEO lightning pseudo-observation generator uses the flash database with the coincident GLM and NLDN flashes and their characteristics. This part, called the target generator, employs ML techniques. It is based on statistical relationships between the NLDN characteristics (features) and the characteristics of the concurrently observed GLM flashes (targets). The target generator is detailed in the following section. It is implemented using different approaches, which are explained thereafter. They include simple linear regressions as well as more sophisticated ML models. The second part of the GEO lightning pseudo-observation generator, described in the last section here, simulates pseudo-GLM events using the simulated GLM flash characteristics.
1) Simulate pseudo-GLM flash characteristics
Coincident NLDN and GLM flashes are analyzed with regard to their characteristics, including the flash extent and flash duration (both GLM and NLDN) as well as the event number per flash (GLM) or pulse/stroke number per flash (NLDN). The flash extent is a characteristic distance for the illuminated area for GLM or simply the distance between point sources for NLDN. It sums up the distance between the lowest and highest latitude [the north–south (NS) extent] and the distance between the lowest and highest longitude [the west–east (WE) extent] of events or pulses/strokes of the flash. GLM flash extent relies on the pixel center position but does not include the pixel extensions. Single-pixel GLM flashes and single pulse/stroke NLDN flashes have an extent of 0.0 km. Flash duration is defined as the time between the first and last frames; therefore, a flash confined to a single frame has a duration of 0.0 s, that is, GLM flashes with all events at the same time and NLDN flashes with all pulses/strokes at the same time. The maximum and mean signal strengths, defined from the LF peak currents and radiant energies as measured by NLDN and GLM, respectively, are evaluated per flash to represent flash energetics. In addition, a CG stroke ratio is calculated for NLDN flashes by dividing the number of CG strokes of the flash by the total pulse/stroke number. Previous studies (e.g., Thomas et al. 2000; Marchand et al. 2019; Erdmann et al. 2020; Murphy and Said 2020; Rutledge et al. 2020) found that characteristics of flashes observed by optical satellite LLSs depend on, among other factors, the flash altitude. Flash components identified as CG strokes propagate on average at lower altitudes than the IC components.
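The extent, duration, and CG stroke ratio defined above can be computed as in the sketch below. It assumes that each flash component is a (time_s, lat, lon, is_cg) tuple, that degrees are converted to kilometers on a spherical Earth, and that the WE extent is evaluated at the mid-latitude of the flash; these conventions and the function name are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth
KM_PER_DEG = math.pi * EARTH_RADIUS_KM / 180.0


def flash_characteristics(components):
    """Extent (km), duration (s), component count, and CG ratio of one flash.

    components: list of (time_s, lat, lon, is_cg) tuples (hypothetical layout).
    """
    times = [c[0] for c in components]
    lats = [c[1] for c in components]
    lons = [c[2] for c in components]
    ns_km = (max(lats) - min(lats)) * KM_PER_DEG
    mid_lat = 0.5 * (max(lats) + min(lats))
    we_km = (max(lons) - min(lons)) * KM_PER_DEG * math.cos(math.radians(mid_lat))
    n_cg = sum(1 for c in components if c[3])
    return {
        "extent_km": ns_km + we_km,          # NS extent plus WE extent
        "duration_s": max(times) - min(times),
        "n_components": len(components),
        "cg_ratio": n_cg / len(components),  # meaningful for NLDN flashes only
    }
```

A flash with a single component returns an extent of 0.0 km and a duration of 0.0 s, in line with the definitions above.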
In total, there are five GLM flash characteristics (flash duration, event number per flash, flash extent, and mean and maximum event radiant energy per flash) and six NLDN flash characteristics (flash duration, pulse/stroke number per flash, flash extent, mean and maximum LF amplitude per flash, and CG stroke ratio). Details on the distributions of the flash characteristics are provided by Erdmann (2020, chapter II.3.4).
Linear regressions between any two GLM and NLDN flash characteristics showed that GLM flash duration has Pearson correlation coefficients R above 0.64 with NLDN flash duration and with the number of pulses/strokes per flash. GLM event number per flash and GLM flash extent feature R of 0.08–0.43 with the complete set of features. Mean and maximum event radiant energies per GLM flash are not correlated with any NLDN flash characteristic on the flash scale and, as noted in the introduction, are not needed for synthetic FED generation. Hence, they are excluded from the ML targets. The remaining targets are GLM flash duration, event number per flash, and flash extent.
Building the GEO lightning pseudo-observation generator requires independent generator building (GB) and generator testing (GT) data for the generator design and for the verification of the generated product, that is, the FED, respectively. The split of our dataset is illustrated in Fig. 2. The GB data consist of 7 days and the GT data consist of the remaining 3 days (test days) of the full dataset (see section 2c and Table 1). The GB includes an ML part. Here, only matched flashes are considered so as to compare feature and target values (see Fig. 2). Features (input data) of the ML are the six NLDN characteristics, and targets (output data) are GLM flash duration, event number per flash, and flash extent. Feature and target sample sizes are given as the number of matched flashes detected by GLM and NLDN, respectively, and are not equal in general (section 2d). Since training the ML models requires the same sample size for the features and targets, two (or more) flashes matched to the same flash of the other LLS are merged, and characteristics of the merged flashes are combined. The resulting ML data (dark orange in Fig. 2) consist of 672 794 flashes, each sample with six NLDN features and three GLM targets. The ML part further splits this set of ML data randomly into independent ML training and ML validation data at a ratio of 90%–10%. The ML models are thus trained with 605 515 flashes. The ML validation data serve to calculate goodness-of-fit scores for each applied ML technique. Then the different ML models are compared and the model parameters (e.g., the number of trees or the number of neural network layers, see appendix B, section a) are tuned based on the scores. The 3-day GT dataset is used to evaluate each generator as a whole including the ML and event generation parts. The test exercise exploits both observed GLM and generated, NLDN-based pseudo-GLM datasets as two independent populations.
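The random 90%–10% split of the ML data can be sketched as follows. The shuffling method, the seed, and the rounding of the training-set size are assumptions for illustration; rounding 0.9 × 672 794 gives the 605 515 training flashes stated above.

```python
import random


def split_indices(n_samples, train_fraction=0.9, seed=0):
    """Randomly split sample indices into disjoint training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(672_794)
print(len(train_idx), len(val_idx))
```

Each index refers to one merged flash sample with its six NLDN features and three GLM targets, so features and targets stay aligned when both are selected with the same index list.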

Fig. 2. Illustration of splitting the 10-day dataset with GLM and NLDN flashes into 7-day generator building (GB) and 3-day generator testing (GT) data. The GB data are further processed for the ML part.
Citation: Journal of Atmospheric and Oceanic Technology 39, 1; 10.1175/JTECH-D-20-0160.1

The generator simulates one pseudo-GLM flash for each observed NLDN flash. Thereby, it is assumed that flashes detected by GLM only and flashes detected by NLDN only compensate each other. The assumption is justified because (i) GLM and NLDN feature flash DEs of the same order, (ii) both GLM-only and NLDN-only flashes were smaller in extent and shorter in duration than the flashes with coincident observations (see also, e.g., Zhang and Cummins 2020; Erdmann et al. 2020), and (iii) most GLM-only and NLDN-only flashes were found in the same regions in proximity to convective cores where high flash rates were observed (as in Zhang and Cummins 2020). The overall GLM and NLDN flash numbers (see Table 1) differ by only a few percent. However, there are days on which NLDN detected more flashes than GLM (7 and 14 April) and days on which GLM detected more flashes than NLDN (26 May, 3 June, and 7 August).
Here, only GLM flashes are simulated, and only where NLDN records exist. The algorithms do not single out NLDN flashes that GLM would not detect. In addition, no algorithm has been developed to create flashes detected only by GLM. For those two configurations, developing dedicated algorithms would require taking into account the microphysical properties of the cloud profiles as well as a model that generates the lightning activity as realistically as possible to mimic GLM-only and NLDN-only flashes. The goal of the lightning generator is to provide synthetic LI records that are more representative than those used so far, while acknowledging some limitations in our models, in order to develop a new proof of concept for assimilating space-based lightning observations. Another aspect concerns the detection of optical flashes at day and night. One can consider developing a GEO pseudo-observation generator for both day and nighttime with potentially different relations between LF flash characteristics and GEO flash characteristics. However, as this paper includes a variety of methods and the first approach to use ML techniques to simulate GEO flashes, day and nighttime flashes are not separated. Likewise, flashes over land and sea are not separated.
The aforementioned assumption means that flashes detected by NLDN only are treated similarly to those coincidently detected by both NLDN and GLM. As the number of NLDN-only flashes is significantly lower than the number of NLDN flashes with GLM match (given a GLM flash DE relative to NLDN of 87% for the full 10-day dataset), the assumption only affects about 13% of the simulated flashes. Statistics of GLM targets and FED fields inferred from the generated pseudo-GLM flashes are compared to those from all observed GLM flashes during the three days.
The comparisons of statistics of the observed and simulated targets include the distribution mean, median, minimum, and maximum. The root-mean-square error (RMSE) between characteristics of individual (simulated and real) GLM flashes is also computed, but only for the 295 313 NLDN flashes with GLM match (representing a GLM flash DE of 86.7% for the test days). The evaluation additionally uses two statistical scores that are defined for the cumulative (in fact empirical) distribution functions (CDFs): the Kolmogorov–Smirnov statistic (KS; Massey 1951) and the Cramér–von Mises criterion (CvM; Anderson 1962), which measure the distance between the observed and simulated CDFs of the targets. Both the KS and the CvM tests can verify the null hypothesis that two samples belong to the same population.
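As an illustration of the two scores, the following sketch computes the two-sample KS statistic (maximum distance between the empirical CDFs) and the two-sample CvM criterion in the form given by Anderson (1962), i.e., the squared CDF distance summed over the pooled sample and weighted by NM/(N+M)². The function name `ecdf_distances` is hypothetical, and this is not necessarily the implementation used in this study; SciPy provides `scipy.stats.ks_2samp` and `scipy.stats.cramervonmises_2samp` as ready-made alternatives.

```python
import numpy as np

def ecdf_distances(observed, simulated):
    """Return (KS, CvM) distances between the empirical CDFs of two samples."""
    obs = np.sort(np.asarray(observed, dtype=float))
    sim = np.sort(np.asarray(simulated, dtype=float))
    pooled = np.concatenate([obs, sim])
    # Empirical CDFs evaluated at every value of the pooled sample
    cdf_obs = np.searchsorted(obs, pooled, side="right") / obs.size
    cdf_sim = np.searchsorted(sim, pooled, side="right") / sim.size
    diff = cdf_obs - cdf_sim
    ks = np.max(np.abs(diff))                       # normalized, 0 to 1
    n, m = obs.size, sim.size
    cvm = n * m / (n + m) ** 2 * np.sum(diff ** 2)  # grows with sample size
    return ks, cvm

# Two fully separated toy samples give the maximum KS distance of 1
ks, cvm = ecdf_distances([0.0, 1.0], [2.0, 3.0])
```

The KS value depends only on the largest CDF gap, whereas the CvM value accumulates the gap over the whole distribution, which is why the two scores can rank generators differently.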
2) ML-based target generators relating NLDN flash characteristics to GLM flash characteristics
The previous sections explained that our GEO lightning pseudo-observation generator consists of two parts, the ML-based target generator and the simulation of GEO pseudoevents. Section a of appendix B briefly describes the different ML models used in the ML-based part of this generator. The ML-based algorithms relate NLDN flash characteristics to GLM flash characteristics in this work. Hence, all ML models are supervised models with the same training data. The models are taken from Python’s scikit-learn library (sklearn; Pedregosa et al. 2011).
This study uses seven different ML model types (details are in section a of appendix B): multivariate linear regressions (LinReg), third-degree polynomial regressions (Poly), extra-trees regressors (ETR) as a form of random forests, bagging with K-nearest neighbor regressors (BAGR KNN), multilayer perceptron neural networks (MLP), linear support vector regressors (linSVR), and histogram gradient boosting regressors (HGBR).
3) Multistep approach
Targets of a multitarget ML training can be correlated; for example, GLM event number per flash is strongly correlated to GLM flash extent, with a correlation coefficient R of 0.74. To the best knowledge of the authors, models of Python’s sklearn library do not take advantage of correlations between targets. Indeed, the so-called single target (ST) approaches do not consider correlations between targets; however, such correlations can help to improve the skill of ML models and thus the prediction of the generators. Borchani et al. (2015) summarize methods to deal with multitarget regressions and take advantage of correlations between targets. Their paper compares the ST approach to multiple multitarget approaches, e.g., multitarget regressor stacking (MTRS), regression chains (RC), multioutput support vector regression, multitarget regression trees, and rule methods. Spyromitros-Xioufis et al. (2016) introduced the stacked ST (SST) and ensemble RC (ERC). These methods can be computationally expensive, with high memory costs (Mastelini et al. 2019). As Aguiar et al. (2019) state, choosing the most suitable approach needs previous testing and depends on the task.
The flowchart in Fig. 3 shows a computationally efficient multitarget approach that simplifies the SST. As a starting point, there are NLDN features and GLM targets as input for the ML training. The approach combines ST models (Fig. 3a) of three classes (colored) for the training. The application case only uses the NLDN features as first input. Therefore, a multistep approach is required. An application example is shown in Fig. 3b. More details about our approach can be found in appendix C.

Flowchart of the multistep approach illustrating the possible predictions of a given target using different combinations of features and pseudofeatures [(a) training]. (b) The application shows the example of the num ext(a) configuration (section b of appendix B).

In summary, the multistep approach modifies the ML input feature-set selection and thus the configuration of the corresponding generator. It is a form of multitarget regression that can take advantage of correlations between the ML targets. Section b of appendix B summarizes the available feature-set selections for the ML as different configurations of generators. Figure 3b shows just one example of the application that is also detailed in appendix C. Section 4 will demonstrate whether the additional GLM pseudofeatures can help to tune the pseudo-GLM simulation toward observed GLM data.
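The idea of the multistep approach can be sketched for a num ext(a)-like chain in which the flash duration is predicted from the NLDN features alone, the event number from the NLDN features plus the duration, and the flash extent from the NLDN features plus duration and event number. The sketch rests on several assumptions: an ordinary least-squares fit replaces the sklearn ST models, the chained models are trained with observed GLM targets as additional features (our reading of Fig. 3a), and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear(features, target):
    """Least-squares fit with intercept; stand-in for any ST regressor."""
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict_linear(coef, features):
    return np.column_stack([features, np.ones(len(features))]) @ coef

def train_chain(nldn, duration, number, extent):
    """Training: observed GLM targets act as additional features."""
    return {
        "duration": fit_linear(nldn, duration),
        "number": fit_linear(np.column_stack([nldn, duration]), number),
        "extent": fit_linear(np.column_stack([nldn, duration, number]), extent),
    }

def apply_chain(chain, nldn):
    """Application: only NLDN features exist, so predictions become pseudofeatures."""
    duration = predict_linear(chain["duration"], nldn)
    number = predict_linear(chain["number"], np.column_stack([nldn, duration]))
    extent = predict_linear(chain["extent"],
                            np.column_stack([nldn, duration, number]))
    return duration, number, extent

# Synthetic, correlated targets (illustrative only)
rng = np.random.default_rng(1)
nldn = rng.random((500, 6))
duration = nldn @ np.array([0.5, 0.2, 0.1, 0.3, 0.0, 0.4]) + 0.05 * rng.standard_normal(500)
number = 3.0 * duration + nldn[:, 0] + 0.05 * rng.standard_normal(500)
extent = 0.5 * number + 0.05 * rng.standard_normal(500)

chain = train_chain(nldn[:400], duration[:400], number[:400], extent[:400])
pred_duration, pred_number, pred_extent = apply_chain(chain, nldn[400:])
```

In the application step, each predicted target is fed forward as a pseudofeature, so errors in the first step propagate to the later ones; this is the price paid for exploiting the correlation between targets.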
4) Applied scaling methods
The min–max scaler is an alternative standardization method that is more robust to small standard deviations and for different feature ranges than the common standard scaler (sklearn documentation).
Some generators perform well with unscaled data (i.e., direct input of data with physical units) used as a reference input method during the ML part. All results presented in this paper are rescaled to physical units.
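A minimal sketch of min–max scaling and of the rescaling to physical units is given below. The class `MinMaxScaling` is a hypothetical NumPy stand-in written for illustration; sklearn offers `MinMaxScaler` and `StandardScaler` for the same purpose. The example values are invented.

```python
import numpy as np

class MinMaxScaling:
    """Scale each column to [0, 1] and map predictions back to physical units."""

    def fit(self, values):
        values = np.asarray(values, dtype=float)
        self.low = values.min(axis=0)
        span = values.max(axis=0) - self.low
        # Guard against constant columns to avoid division by zero
        self.span = np.where(span > 0, span, 1.0)
        return self

    def transform(self, values):
        return (np.asarray(values, dtype=float) - self.low) / self.span

    def inverse_transform(self, scaled):
        return np.asarray(scaled, dtype=float) * self.span + self.low

# Invented targets: flash duration (s) and flash extent (km)
targets = np.array([[0.0, 5.0], [0.4, 20.0], [1.6, 80.0]])
scaler = MinMaxScaling().fit(targets)
scaled = scaler.transform(targets)
restored = scaler.inverse_transform(scaled)
```

Unlike standardization, the scaling divides by the range rather than by the standard deviation, so columns with a very small standard deviation do not inflate the scaled values.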
5) Generate pseudo-GLM events
The studied domain is divided into regular latitude–longitude pixels of adjustable size that represent the pseudo-GLM pixel matrix. Any given latitude–longitude position is projected onto that pixel matrix to determine the corresponding pixel and thus the shape of one pseudo-GLM event. Using a regular grid simplifies and speeds up the simulation of pseudo-GLM events significantly. Each regularly shaped pseudo-GLM event covers an area equal to the average size of the observed, irregularly shaped GLM events in the region of interest. Analyzing simulations built on this regular pseudo-GLM grid should lead to results that are statistically similar to those for the irregular grid of the GLM observations.
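Projecting a position onto a regular pixel matrix amounts to an integer division of the offset from the grid origin by the pixel size. In the sketch below, the function name `to_pixel`, the grid origin, and the 0.1° pixel size are hypothetical values chosen for illustration, not the grid used in this study.

```python
import numpy as np

def to_pixel(lat, lon, lat0, lon0, dlat, dlon):
    """Return (row, col) indices of the regular grid pixel containing (lat, lon).

    lat0/lon0: southwest corner of the grid; dlat/dlon: pixel size in degrees.
    """
    row = np.floor((np.asarray(lat, dtype=float) - lat0) / dlat).astype(int)
    col = np.floor((np.asarray(lon, dtype=float) - lon0) / dlon).astype(int)
    return row, col

# Hypothetical grid: origin at 30.0N, 96.0W with 0.1 degree pixels
rows, cols = to_pixel([30.25, 30.95], [-95.15, -95.95],
                      lat0=30.0, lon0=-96.0, dlat=0.1, dlon=0.1)
```

Because the lookup is a vectorized arithmetic operation rather than a search through irregular pixel footprints, it scales well to hundreds of thousands of flashes.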
The target generator of the GEO lightning pseudo-observation generator simulates the targets based on the given NLDN flash characteristics. These pseudo-GLM targets provide the information to derive individual pseudo-GLM events. As the target generator may produce targets with values smaller than the observed (and physical) limits, the targets are adjusted to account for the known thresholds. For instance, negative flash extent or negative flash duration is set to zero, and there are at least two pseudo-GLM events per flash in accordance with NASA GLM data processing (Mach 2020). Pseudo-GLM flash NS and WE extents are calculated based on the simulated pseudo-GLM flash extent applying the same ratio as the NS and WE extents of the corresponding NLDN flash. If the NLDN flash contains a single pulse or stroke, the NS-to-WE ratio is set to 1.
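The adjustment of simulated targets to the known thresholds, and the split of the flash extent into NS and WE components, might look as follows. Two points are assumptions made for this sketch and are not stated in the text: the event number is rounded to the nearest integer, and the flash extent is treated as the diagonal of the NS–WE rectangle (extent² = NS² + WE²). The function names are hypothetical.

```python
import numpy as np

def adjust_targets(duration_s, n_events, extent_km):
    """Enforce physical limits: no negative values, at least two events per flash."""
    duration = np.maximum(np.asarray(duration_s, dtype=float), 0.0)
    extent = np.maximum(np.asarray(extent_km, dtype=float), 0.0)
    # ASSUMPTION: event numbers are rounded to integers before applying the minimum
    events = np.maximum(np.rint(n_events), 2).astype(int)
    return duration, events, extent

def split_extent(extent_km, nldn_ns_km, nldn_we_km):
    """Split the pseudo-GLM extent using the NS-to-WE ratio of the NLDN flash.

    ASSUMPTION: the extent is the diagonal, extent**2 = NS**2 + WE**2.
    A single pulse/stroke (NS = WE = 0) gets an NS-to-WE ratio of 1.
    """
    ns = np.asarray(nldn_ns_km, dtype=float)
    we = np.asarray(nldn_we_km, dtype=float)
    diag = np.hypot(ns, we)
    single = diag == 0
    safe = np.where(single, 1.0, diag)
    ns_frac = np.where(single, np.sqrt(0.5), ns / safe)
    we_frac = np.where(single, np.sqrt(0.5), we / safe)
    return extent_km * ns_frac, extent_km * we_frac

duration, events, extent = adjust_targets([-0.1, 0.3], [0.4, 7.6], [-2.0, 12.0])
ns_km, we_km = split_extent(10.0, 3.0, 4.0)
ns_single, we_single = split_extent(10.0, 0.0, 0.0)
```

With a different definition of the flash extent, only `split_extent` would need to change; the thresholding in `adjust_targets` is independent of it.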
First, the locations of pseudo-GLM events are generated. Using the simulated pseudo-GLM flash extent and its NS and WE components, a rectangular subdomain on the pseudo-GLM pixel matrix is defined. The center of this subdomain houses the NLDN flash position centroid, and the corresponding pixel constitutes the first event of the pseudo-GLM flash. Any pixel within the subdomain may also become a pseudo-GLM event of this pseudo-GLM flash. Three constraints have been designed to generate subsequent pseudo-GLM events: (i) each event of the flash has at least one adjacent or diagonal neighbor within one flash, thus avoiding spatial gaps; (ii) pixels are primarily selected starting at the first event and propagating (meaning increasing distance to the first event) toward the subdomain border to approximate the simulated flash extent; and (iii) additional pixels can be selected randomly within the rectangular area until the simulated event number is reached. In consequence one single pixel of the subdomain can contain more than one pseudo-GLM event. Since pixels of the subdomain are not guaranteed to contain a pseudo-GLM event, this random selection also affects the final FED product.
Then the pseudo-GLM events are time stamped. In the present study, the matching of GLM and NLDN flashes revealed that the median time offset between the mean time of a given NLDN flash and the mean time of the matched GLM flash was about 8 ms. The NLDN and GLM average flash durations were 0.24 and 0.39 s, respectively. Hence, the mean times of matched NLDN and GLM flashes are relatively close, while GLM flashes last on average longer than NLDN flashes. As a consequence, the mean time of the NLDN flash defines the mean time of the pseudo-GLM flash that is also the time stamp of the first pseudo-GLM event. Our generator is built to generate realistic FED fields. Only the spatial distribution of the events is needed to infer FED. Hence, the temporal occurrences of pseudoevents are uniformly and arbitrarily distributed over the duration of one flash. Pseudoevent times are then rounded to the time frames of the mimicked GEO LLS, that is, to 2-ms frames for pseudo-GLM data. The only constraint is that any adjacent pixel occurs within 330 ms (i.e., the time criterion to separate flashes in NASA’s GLM L2 algorithm). One 2-ms frame often contains several pseudo-GLM events.
4. Results
Figure 4 shows the example of one simulated pseudo-GLM flash created with the final GEO lightning pseudo-observation generator based on a linSVR model, the corresponding GLM and NLDN observations, and the observed and simulated GLM flash characteristics. One can see the difference between the real GLM grid and the regular pseudo-GLM grid of the simulation (Fig. 4c). The difference between observed and simulated flash extent is within the size of one GLM pixel for this example. The simulated flash duration exceeds the observed flash duration significantly. There is also an overestimation of the number of GLM events by the generator.

An example of one simulated flash with corresponding GLM and NLDN observations on 26 May 2018. The final GEO lightning pseudo-observation generator is used including a linear SVR model, i.e., linSVR num ext(a) plus. Shown are time series of (a) latitudes, (b) longitudes and (c) a map. The map includes characteristics of the observed and simulated GLM flash. The time interval shown matches the simulated flash duration of 640 ms.

Results are obtained from the 3-day test dataset. It contains 340 712 NLDN flashes that are used to simulate the same number of pseudo-GLM flashes. Statistics of the pseudo-GLM flashes are compared to the statistics of all 338 579 observed GLM flashes. First, the distributions of the simulated and observed GLM flash extent, flash duration, and event number per flash are compared. The best target generators are used to simulate pseudo-GLM events and eventually compute the pseudo-FED product. The FED is analyzed statistically for both observed and simulated GLM data of the three test days. The minimum discrepancy between observation and simulation will indicate the most suitable target generator configuration for the final GEO lightning pseudo-observation generator.
a. Evaluating the target generators—Distributions of GLM flash extent, flash duration, and event number per flash
In a general sense, a wide range of values is observed for all target distributions. The GLM flash duration ranged from 0.0 to 16.4 s. Observed GLM flashes comprised between 2 and 1395 events. The test data feature GLM flash extent between 0 and 277 km. The target generators should handle these ranges of values and predict target statistics similar to the statistics of observed GLM flashes.
Table 2 summarizes the findings, with statistics, the KS, and the CvM of the distributions of observed and simulated GLM flash duration, event number per flash, and flash extent for the full 3-day test data. The table contains distribution statistics for the respective target generator with smallest difference between observed and simulated characteristics over the test period. Results for the linSVR num ext(a) plus generator are shown as reference. Statistics of the simulated pseudo-GLM and the observed distributions are referred to as simulated statistics and observed statistics, respectively. This analysis was also conducted for each test day. The results are presented in section a of appendix D.
Comparison of distribution statistics for observed GLM data and the best generator for each target during the full test period. The recommended linSVR-based generator is shown in boldface type. Details about the target generator names are provided in section b of appendix B.


The majority of the target generators features mean values similar to the observed means for all three target characteristics. The simulated medians, however, exceed the observed medians in most cases, especially for the number of events per flash, suggesting a tendency to overestimate the target values. The previously described behavior is true for all but the linSVR-based generators. The linSVR filters the dataset in advance to build the prediction on the support vectors [section a(6) of appendix B]. That results (in this study) in lower differences between the simulated and observed median values as compared to using the other ML model types. The mean values of linSVR-based predictions are, however, often smaller than the observed mean, especially for the event number per flash. Table 2 demonstrates this behavior of linSVR-based generators. To detail one example, the recommended linSVR-based generator (see section 4b and the boldface type in Table 2) underestimates median and mean flash extent by about 4.5% and 11.7%, respectively. The mean event number per flash is also underestimated by about 29.6%; however, the median event number per flash is overestimated by 20%. The linSVR-based generator creates, compared to the observations, not enough flashes with an event number in the tails of the distribution, i.e., close to the observed minimum and maximum event numbers. Hence, it cannot mimic the full range of the observed event numbers per flash. This linSVR-based generator still outperforms all other generators with respect to the median considering the full 3-day test data.
Some general conclusions can be drawn about the generator performances for the observed range and variability of the target values. The target generator minimum often approximates or slightly exceeds the observed minimum, whereas the maximum is underestimated in most cases. This particular behavior can even be seen for the best target generators (Table 2) because the number of small flashes with characteristics close to the minimum observed target values is relatively high. The rare, highest observed values are often underrepresented in the statistical approach. It is further found that observed GLM flash statistics can vary for a given set of the six observed NLDN features. This is the case as our six NLDN features cannot completely explain the range of target values even if the statistics derived here are significant in terms of the large sample size. The large values of the RMSE per flash in Table 2 and also appendix D, section a, result from the deterministic nature of the ML models in combination with a set of features that cannot include all physical influences on the targets, for example, cloud properties. The RMSE values of the GLM flash extent are similar to the mean values, whereas they reach twice the mean for both GLM flash duration and event number per flash. Here, the optimization of our GEO lightning pseudo-observation generator for FED that depends mostly on the flash extent is evident. A relatively wide range of target values is in particular found for small NLDN flashes with NLDN pulse/stroke number, extent, and duration near the lower end of the distributions (not shown). Large (meaning long extent, long duration, and many pulses/strokes or events) NLDN flashes usually coincide with large GLM flashes. As the NLDN features are somewhat correlated to the GLM targets, the high RMSE due to a small NLDN flash as input also leads to a high RMSE when predicting small GLM flashes.
KS and CvM assign a quantitative value to measure the distance between two samples. While KS is normalized (values of 0–1), the CvM value depends in general on the distance between simulated and observed CDFs and the sample size. As the sample size is kept constant for all generators, CvM in fact provides a common measure of the agreement between observed and simulated targets. Both KS and CvM feature lower values for the GLM flash duration than for both the GLM flash extent and the GLM event number per flash considering the full test dataset (Table 2). This result is in accordance with the strong correlation coefficients between observed GLM flash duration and NLDN features (see also section 1). KS and CvM for the flash duration rely mainly on the underestimation of long duration flashes. As an exception, the recommended linSVR num ext(a) plus generator not only underestimates the maximum flash duration but also cannot produce single-frame flashes. Therefore, KS and CvM are higher for the flash duration than for the flash extent here. Comparing the three target distributions, KS and CvM reach their highest values for the GLM event number per flash, for which the weakest correlations to features were observed.
The performances of the generators for the full 3-day test data and each test day (section a of appendix D) indicate that the choice of a suitable target generator can be situational. The objective now is to find a configuration that best approximates the observed GLM flashes and target distributions. Therefore, the differences between the simulated and observed statistics (i.e., mean, median, minimum, maximum, RMSE, KS, and CvM) are calculated and normalized for each statistic. The normalization divides each absolute difference by the maximum absolute difference of all target generators for a given statistic. A value of 1 represents the worst target generator for the given statistic, while a value of 0 indicates no difference to the observation. In addition, and to summarize all the information, the so-called normalized difference average (NDA) is introduced to average the normalized absolute differences and scores for a given generator. The perfect generator would yield an NDA of zero. NDAs of the target generators can be directly compared to identify the highest performer. NDA is calculated per target and for all three targets overall.
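The NDA computation described above can be sketched as follows. The function name and the numbers are hypothetical; in particular, the sketch assumes that the reference value for the scores (RMSE, KS, CvM) is zero, so that their "difference to the observation" is the score itself.

```python
import numpy as np

def normalized_difference_average(observed, simulated):
    """Return normalized absolute differences and the NDA per generator.

    observed:  shape (n_stats,), reference value of each statistic
               (ASSUMPTION: 0 for scores such as RMSE, KS, CvM)
    simulated: shape (n_generators, n_stats)
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    abs_diff = np.abs(simulated - observed)
    worst = abs_diff.max(axis=0)  # largest difference among all generators
    normalized = np.divide(abs_diff, worst,
                           out=np.zeros_like(abs_diff), where=worst > 0)
    return normalized, normalized.mean(axis=1)

# Invented example: three generators; statistics are mean, median, and KS
observed = [10.0, 8.0, 0.0]
simulated = [[12.0, 8.0, 0.2],
             [9.0, 11.0, 0.4],
             [14.0, 9.0, 0.1]]
normalized, nda = normalized_difference_average(observed, simulated)
best_generator = int(np.argmin(nda))
```

In this invented example the first generator obtains the lowest NDA (1/3) even though it is not the best for every statistic, which illustrates how the NDA summarizes heterogeneous scores into one ranking.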
Overall NDAs for all three targets range from 0.35 for the linSVR num ext raw generator to 0.87 for the MLP num ext(a) raw generator. The best (i.e., lowest NDA) 24 target generators all use a linSVR, and the performances of the best target generators vary only within the range of uncertainty given in section 1. For example, the difference between the first and tenth ranked target generator is only 0.04 NDA. The NDA ranking of target generators reveals a clustering explained by the ML model type, with linSVR-based generators performing the best, followed by BAGR KNN dist–based, ETR-based, and polynomial regression-based generators. MLP- and HGBR-based generators exhibit the highest NDAs.
The generators yielding the lowest NDA values are mostly those using the multistep approach. In addition, the use of all six (plus; section b of appendix B) instead of only four (default) NLDN features improved the performances of the majority of tested generators. The feature and target scaling had little effect on the generator performances, although scaling is usually recommended for ML applications. The ML model type has in fact the highest impact on the simulation of pseudo-GLM flashes and thus on the target generator performances.
Figure 5 visualizes the statistics of all tested target generators for the flash extent as the most impactful characteristic on FED. It groups the results for each statistic by ML model type. Seven ML model types were used to build the generators (section 2 and appendix Table B1 except RF). Each distribution contains the results of 28 generators using this ML model type including seven feature-set selections and two optional attributes (Table B2 of section b of appendix B). Figure 5 shows these results as normalized differences and scores for the three test days combined. It reveals that the boxplot minima for the linSVR type generators are the closest to zero for most statistics. BAGR KNN dist–based generators feature the second-lowest values of KS and CvM. The finding is supported by results for each test day (section a of appendix D) showing best performances for the targets by BAGR KNN dist–based generators on 7 April 2018 and by linSVR-based generators on 26 May and 31 July 2018. Some boxplots exhibit a wide range of outcomes. The range shows that all ML model types are sensitive to the configuration. The NDA of the best generator, i.e., linSVR num ext(a2), is equal to 0.28. The associated outcomes for flash duration and event number per flash statistics (section b of appendix D) confirm linSVR-based generators as most suitable to simulate GLM targets for the entire test period. Hence, results for the individual targets agree with the overall NDA analysis.

Normalized absolute difference of statistics and scores (titles) between distributions of observed and simulated GLM flash extent (0 means equal to observation; 1 represents the worst simulation). The boxplots represent the distributions of 28 target generator results per ML type (x axis) including the interquartile range (IQR; blue box), 1.5 times the IQR (whiskers), and outliers (black cross). The horizontal green line gives the median. Results are for the full test dataset. The abbreviations for ML type are in appendix Table B1.

Confidence in the results
The confidence in the outcomes is evaluated for the two parts of the GEO lightning pseudo-observation generator. The uncertainty of the outcomes is expressed as the range of outcomes given the same configuration. First, three selected generators with constant configuration are each trained 10 times using the same full training dataset (only three, for computational efficiency). This assesses the training variability of the ML model. The selected generators are BAGR KNN dist num plus, MLP alpha8 num raw, and linSVR num ext(a) plus. They are labeled by their ML model type as BAGR KNN dist, MLP, and linSVR, respectively. Figure 6 shows the distributions (boxplots) of targets for the full test data for the three generators (x axis), each trained 10 times, for pseudo-GLM flash duration (Fig. 6a), pseudo-GLM number of events per flash (Fig. 6b), and pseudo-GLM flash extent (Fig. 6c). The predicted target range of the 10 trained generators is smaller than the variability due to different ML model types and due to different configurations of one ML type. The 10 BAGR KNN dist–based simulations feature a very narrow range of outcomes for all statistics. The 10 trainings of both the presented linSVR and the presented MLP yield a range of values from 0.2 to 0.4 normalized absolute difference for most statistics. The range of the minimum event number per flash (Fig. 6b) and the minimum flash extent (Fig. 6c) reaches about 0.5 and up to 0.7 for the linSVR and MLP-based generators, respectively. Here, the uncertainty from retraining these two generators becomes as high as the variability seen for the different ML model types (cf. Figs. D2 and 5). The range of normalized absolute difference for the maximum event number predicted based on 10 equally configured linSVR models is also about 0.6. In addition, the range of normalized absolute differences is always wider for the mean than for the median.
Despite a relatively high uncertainty in some statistics, the overall trends as described in the previous section remain valid. Statistics sensitive to distribution outliers, such as the mean and minimum, exhibit higher uncertainties than more robust statistics, such as the median, KS, and CvM. Some target generators, for example, the BAGR KNN dist–based one, appear to provide very robust predictions. The uncertainty range is usually smaller than the overall range of values for each statistic.

Normalized absolute difference of statistics and scores (titles) between distributions of observed and simulated GLM (a) flash duration, (b) event number per flash, and (c) flash extent; y range (0–1) as in appendix Fig. D1 [for (a)], appendix Fig. D2 [for (b)], and Fig. 5 [for (c)]. Boxplots (as in Fig. 5) represent the distribution for training the same model (x axis) 10 times during the first step of the simulation. The abbreviations for ML type are in appendix Table B1.

The variability in the results introduced by the second part of the GEO lightning pseudo-observation generator, i.e., generating pseudo-GLM events, was also tested and is much smaller than that of the ML part (not shown). Hence, the overall range of targets for a given generator configuration is similar to that shown in Fig. 6.
b. Evaluating observed and simulated FED
Hourly FED maps are calculated for both GLM observed and simulated flashes. They will be referred to as observed and simulated FED, respectively, in the following. The evaluation includes the hourly FED summed-up over the domain (termed FED sum), the electrified areas defined as pixels with positive FED (i.e., greater than 0 flashes per 5 km × 5 km pixel per hour), and a visual inspection of convective cores. As the choice of the ML model type has the highest impact on the overall performance of the GEO lightning pseudo-observation generator, the results are mainly discussed with respect to the ML model types.
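As a minimal sketch of the two domain-wide quantities defined above, the following computes the FED sum and the electrified area from one hourly FED field. The function name `fed_summary` and the small example grid are hypothetical; only the 5 km × 5 km pixel size and the FED > 0 criterion come from the text.

```python
import numpy as np

PIXEL_AREA_KM2 = 5.0 * 5.0  # FED grid pixels of 5 km x 5 km, as stated in the text


def fed_summary(fed):
    """Return (FED sum, number of lightning pixels, electrified area in km^2)."""
    fed = np.asarray(fed, dtype=float)
    fed_sum = float(fed.sum())
    # A lightning pixel is any pixel with positive hourly FED.
    n_lightning_pixels = int(np.count_nonzero(fed > 0))
    electrified_area = n_lightning_pixels * PIXEL_AREA_KM2
    return fed_sum, n_lightning_pixels, electrified_area


# Hypothetical 3 x 3 hourly FED field (flashes per pixel per hour)
example_fed = [[0.0, 1.5, 0.0],
               [2.0, 4.0, 0.5],
               [0.0, 0.0, 0.0]]
fed_sum, n_pix, area_km2 = fed_summary(example_fed)
```

The same function can be applied to the observed and to the simulated field so that both are evaluated with identical definitions.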
Figure 7 compares the observed FED (Fig. 7a) with the simulated FED of three selected generator configurations (Figs. 7b–d) for the example of 2000–2100 UTC 26 May 2018. The three generator configurations represent a selection of the variety of tested generators with different ML model types, feature-set selections, and scaling, as detailed below (see also appendix B). Simulated FED fields capture the coarse geographical distribution of the observed FED. One can identify the most active regions (highest FED values), which are situated at similar locations for the observed and simulated FEDs. The numbers in the top corners of Figs. 7a–d indicate the number of lightning pixels with FED > 0 flashes per 5 km × 5 km pixel per hour on the left and the FED sum on the right. The product of the number of lightning pixels and the area per pixel yields the electrified area. The linSVR [num ext(a) plus; appendix Table B2] in Fig. 7b uses GLM duration as an additional feature when simulating GLM event number, and then GLM duration and GLM event number to simulate GLM extent. This linSVR-based generator performs among the best for the simulation of GLM targets overall, and it appears to be among the best also for the FED sum. It underestimates the electrified area in most cases (as in the example in Figs. 7a,b). The MLP-based simulation (num raw; appendix Table B2) of the FED in Fig. 7c uses unscaled features and targets. GLM flash extent and flash duration relate only to four NLDN features (without mean amplitude and CG stroke ratio). The attribute num means that pseudo-GLM flash duration and flash extent are obtained directly from ST approaches. Those simulated targets serve as pseudofeatures to derive the pseudo-GLM event number. This MLP-based generator performs among the best for the electrified area, but overestimates GLM flash extent, GLM event number per flash, and eventually the FED sum.
Figure 7d maps the FED as simulated by the BAGR KNN dist–based generator (num plus; appendix Table B2) that uses all six NLDN features and the num method (see above). It is the best performing generator using the BAGR KNN dist ML model type. Although this generator overestimates the target medians and the FED sum, it belongs to the best 25% of generators for both FED sum and electrified area. It performed best for the 7 April 2018 test case with the dominant squall line that produced most of the large-extent lightning flashes. In general, all three generators overestimate the 1-h FED sum in Fig. 7. The linSVR-based generator simulates an FED sum significantly closer to the observed FED sum than the MLP-based and the BAGR KNN dist–based generators do. The linSVR, however, underestimates the number of lightning pixels, which is best simulated by the MLP-based generator here.

(a) Observed and simulated hourly FED using (b) linSVR num ext(a) plus, (c) MLP alpha8 num raw, and (d) BAGR KNN dist num plus generator for 2000–2100 UTC 26 May 2018. The FED grid uses pixels of 5 km × 5 km. The abbreviations for ML type are in appendix Table B1.
Citation: Journal of Atmospheric and Oceanic Technology 39, 1; 10.1175/JTECH-D-20-0160.1

The results are further investigated for the 3-day test period by comparing pixel-to-pixel simulated and observed hourly FED. Figure 8 shows the 2D histograms, computed for the entire 3-day test dataset, for the same linSVR- (Fig. 8a), MLP- (Fig. 8b), and BAGR KNN dist (Fig. 8c)-based generators as used in Fig. 7. In general, the Pearson correlation coefficients R of 0.91–0.92 indicate well-correlated distributions of observed and simulated FED. Figure 8 also shows that the range of simulated FED is wider than the range of observed FED (gray box). The corresponding tendency to overestimate the FED in the simulation is confirmed by the regression lines (light green), which have steeper slopes than the equal-value line (black). In particular, the MLP-based (Fig. 8b) and the BAGR KNN dist–based generator (Fig. 8c) usually overestimate the FED more than the linSVR-based generator (Fig. 8a). The y intercepts near 0 indicate good agreement for regions without lightning activity. These findings agree with the example in Fig. 7.
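The pixel-to-pixel comparison above can be sketched as follows, assuming the observed and simulated hourly FED fields are available as equally shaped arrays. The function name `compare_fed` and the example values are hypothetical; a fitted slope greater than 1 corresponds to the overestimation discussed in the text.

```python
import numpy as np


def compare_fed(observed, simulated):
    """Return Pearson R, regression slope, and y intercept of simulated vs observed FED."""
    obs = np.ravel(np.asarray(observed, dtype=float))
    sim = np.ravel(np.asarray(simulated, dtype=float))
    r = np.corrcoef(obs, sim)[0, 1]
    # First-degree least squares fit: sim = slope * obs + intercept
    slope, intercept = np.polyfit(obs, sim, deg=1)
    return float(r), float(slope), float(intercept)


# Hypothetical pixel values: the simulation is 25% higher than the observation everywhere
obs_pixels = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
sim_pixels = 1.25 * obs_pixels
r, slope, intercept = compare_fed(obs_pixels, sim_pixels)
```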

Pixel-to-pixel (5 km × 5 km) simulated vs observed hourly FED for the 3-day test period using the same (a) linSVR-, (b) MLP-, and (c) BAGR KNN dist–based generators as in Fig. 7. The gray box and white margins indicate the upper limits of distributions on each axis. The abbreviations for ML type are in appendix Table B1.
Citation: Journal of Atmospheric and Oceanic Technology 39, 1; 10.1175/JTECH-D-20-0160.1

All 196 generators are evaluated for Dreal and Dabs of both FED sum and electrified area. As the ML part of the generator causes significantly larger differences than the derivation of pseudo-GLM events (the second part), the results are again mainly discussed with regard to the different ML configurations.
The Dreal and Dabs are calculated for the 3-day test period. For the FED sum, the 28 linSVR-based generators tested rank as the best 28 configurations in the comparison, that is, they have the lowest Dabs. Table 3 presents the results for the best 20 and worst 5 generators as ranked by Dabs of FED sum. The best GEO lightning pseudo-observation generators exhibit a Dabs of 22%–25%, while Dreal is close to zero, that is, situations with over- and underestimated FED sum balance each other. The worst generators (some of the MLP- and ETR-based configurations) lead to an FED sum almost 2 times as high as the observed values. Similar, positive values of both Dreal and Dabs for the FED sum mean that most generators overestimate the FED sum. This agrees well with Fig. 8. The exception is found for the linSVR-type generators, which often underestimate the FED sum, with Dreal ranging from −22% to +39%. Figure 8a shows one example of a linSVR with positive Dreal.
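The interplay of Dreal and Dabs can be illustrated with a short sketch. The exact definitions are not given in this section, so the code assumes that Dreal is the signed and Dabs the absolute hourly difference (simulation minus observation), each summed over the period and expressed in percent of the summed observation. The function name and the hourly values are hypothetical.

```python
def relative_differences(observed, simulated):
    """Return (Dreal, Dabs) in percent of the summed observation.

    Assumed definitions (not taken from the source text):
      Dreal = 100 * sum(sim - obs) / sum(obs)
      Dabs  = 100 * sum(|sim - obs|) / sum(obs)
    """
    total_obs = sum(observed)
    if total_obs <= 0:
        raise ValueError("the observed total must be positive")
    pairs = list(zip(observed, simulated))
    d_real = 100.0 * sum(s - o for o, s in pairs) / total_obs
    d_abs = 100.0 * sum(abs(s - o) for o, s in pairs) / total_obs
    return d_real, d_abs


# Hypothetical hourly FED sums: over- and underestimation partly cancel in Dreal
obs_hourly = [100.0, 200.0, 100.0]
sim_hourly = [120.0, 180.0, 110.0]
d_real, d_abs = relative_differences(obs_hourly, sim_hourly)
```

Under these assumed definitions, a Dreal near zero combined with a clearly positive Dabs indicates balanced over- and underestimation, whereas similar positive values of both indicate systematic overestimation.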
Comparison of Dreal and Dabs in percent of observed value for the FED sum during the full test period. The best 20 and the worst 5 of the 196 generators (ranked by Dabs) are included. The recommended linSVR-based generator is shown in boldface type. Details about the target generator names are provided in section b of appendix B.


As mentioned, the best 28 generators for the FED sum are all of type linSVR. The best 10 generators use the multistep approach (num and num ext, Table 3). The use of mean LF amplitude and CG fraction (plus) as additional NLDN features has a minor effect on the simulation of FED sum.
Results for the electrified area are in general closer to the observation than the FED sum. They are shown in Table 4 for the best 20 and worst 5 generators as ranked by Dabs of the electrified area. The generators with the lowest Dabs, HGBR type, differ absolutely by about 7.5% from the observed electrified area. The vast majority of all tested target generators underestimate the electrified area (negative Dreal). Multiple generators of various types feature Dabs of less than 10%, e.g., using HGBR, Poly, BAGR KNN dist, ETR, or MLP models. The linSVR-based generators, which performed best for the FED sum, exhibit the highest differences to the observation here with Dabs from 15% to 35% (all with negative Dreal). For example, the best performer for the FED sum is ranked as third worst for the electrified area with a high underestimation of the area.
Comparison of Dreal and Dabs in percent of observed value for the electrified area during the full test period. The best 20 and the worst 5 of the 196 generators (ranked by Dabs) are included. In addition, the recommended linSVR-based generator, separating the best and worst generators, is shown in boldface type. Details about the target generator names are provided in section b of appendix B.


The best 20 generators for the electrified area take advantage of the multistep approach in 15 cases. Also, 15 of those 20 ML-based generators use all NLDN features (Table 4). Comparing only the linSVR-based generators, all 10 leading generators use six rather than only four NLDN features. This result underlines the value of including all NLDN features and of the multistep approach.
The computational cost of our multistep approach remains higher than that of the ST approach; however, the additional cost arises only during the training of the generator. Applying trained multistep generators is relatively fast, i.e., it takes about as long as applying ST generators. The best generator without multistep approach [linSVR(a) raw] exhibits Dabs more than 14% higher than the best generator for the FED sum (Table 3). In addition, Dreal exceeds 23%, indicating that the FED sum is mostly overestimated. The FED sum simulation is most sensitive to the choice of the generator; hence, this choice is particularly important for obtaining realistic synthetic FED. The multistep approach helps in particular to obtain a more realistic FED sum than ST-based generators. For the electrified area, however, generators not using the multistep approach can perform as well as the best generators (Table 4). If only the electrified area is of interest, common ST models can be used. The multistep generator linSVR num ext(a) plus is successfully applied to simulate GLM FED (section 4) and also MTG-LI FED over France (not in the present paper).
The recommended GEO lightning pseudo-observation generator balances the simulation of all pseudo-GLM target distributions, FED sum, and electrified area. It is named the linSVR num ext(a) plus generator. This configuration features an overall NDA of 0.39, and a Dabs to observed FED sum and electrified area of 24.9% and 21.3%, respectively. This generator uses all available features and utilizes the multistep approach. First, GLM flash duration is predicted from all six NLDN features, and then used as an additional pseudofeature to predict the event number per flash. Last, the pseudo-GLM flash extent is simulated from NLDN features and the pseudofeatures GLM flash duration and event number. Both features and targets are scaled [section 3b(4)]. The linSVR ML technique is more time-efficient in training than the MLP and the bagging-based techniques (e.g., BAGR KNN dist and ETR), and it also needs less disk space to be stored. These are two other advantages of the linSVR num ext(a) plus generator.
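The multistep chain described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the configuration used in the study: the hyperparameters, the use of `StandardScaler` for feature and target scaling, the helper names, and the synthetic data are all hypothetical, and the sketch assumes that observed GLM targets serve as the additional features during training while predicted pseudofeatures are chained during application.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR


def make_scaled_linsvr():
    """Linear SVR with standardized features and a standardized target."""
    regressor = make_pipeline(StandardScaler(), LinearSVR(max_iter=10000, random_state=0))
    return TransformedTargetRegressor(regressor=regressor, transformer=StandardScaler())


def fit_multistep(x_nldn, duration, event_number, extent):
    # Assumption: observed GLM targets are the extra features during training.
    m_dur = make_scaled_linsvr().fit(x_nldn, duration)
    m_num = make_scaled_linsvr().fit(np.column_stack([x_nldn, duration]), event_number)
    m_ext = make_scaled_linsvr().fit(
        np.column_stack([x_nldn, duration, event_number]), extent)
    return m_dur, m_num, m_ext


def predict_multistep(models, x_nldn):
    m_dur, m_num, m_ext = models
    # Step 1: duration from the six NLDN features
    dur = m_dur.predict(x_nldn)
    # Step 2: event number from NLDN features plus pseudo-duration
    num = m_num.predict(np.column_stack([x_nldn, dur]))
    # Step 3: extent from NLDN features plus both pseudofeatures
    ext = m_ext.predict(np.column_stack([x_nldn, dur, num]))
    return dur, num, ext


# Synthetic stand-in data with six features per flash (not real NLDN or GLM data)
rng = np.random.default_rng(0)
x_train = rng.normal(size=(300, 6))
dur_train = x_train @ rng.normal(size=6) + rng.normal(scale=0.1, size=300)
num_train = 2.0 * dur_train + x_train[:, 0] + rng.normal(scale=0.1, size=300)
ext_train = 0.5 * num_train + dur_train + rng.normal(scale=0.1, size=300)

models = fit_multistep(x_train, dur_train, num_train, ext_train)
x_new = rng.normal(size=(50, 6))
pseudo_dur, pseudo_num, pseudo_ext = predict_multistep(models, x_new)
```

Chaining predictions in this way means that errors in the pseudo-duration propagate into the event number and extent, which is one reason the order of the steps matters.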
Figure 9 presents the hourly FED sum (Fig. 9a) and electrified area (Fig. 9b) with the overall value (top) and the difference to the observation (bottom) for the 31 July 2018 test case. The observed FED and results for the 10 generators with lowest Dabs are plotted. Figure 9a includes in addition results of the best generator for electrified area (lime), and Fig. 9b the results of the best generator for FED sum (orange). The figure also shows the number of hourly simulated pseudo-GLM flashes (histogram). Similar figures for the other two test days were also evaluated but are not shown here because identical conclusions are drawn. The absolute values (Fig. 9, top) show that the FED sum (Fig. 9a) reacts directly to the number of (simulated) flashes. The electrified area curves (Fig. 9b) appear to have a time offset relative to changes in the flash number, suggesting that within 1 h a lower number of relatively large flashes can electrify a similar area as a higher number of smaller flashes. An increasing (decreasing) flash rate during the development (decay) of convective storms does not automatically mean a larger (smaller) electrified area, since even fewer flashes can still illuminate a large portion of the cloud via scattering. The simulated FED reproduces this behavior very well. In particular, the simulated FED features similar hours with highest FED and electrified area as the observed FED.

(a) Hourly sum of FED and (b) hourly electrified area within the region of interest. For (a) and (b), the top plot (label 1) is absolute values and number of simulated flashes per hour and the bottom plot (label 2) is difference of simulation minus observation. The observation is plotted in blue, and the remaining colors represent the 10 best generators for FED sum [in (a)] and electrified area [in (b)]. The best generator of FED sum in (a) is also included in (b) (orange), and the best generator from (b) is included in (a) (lime). Results are for 31 Jul 2018. Details about the generator names are provided in section b of appendix B.
Citation: Journal of Atmospheric and Oceanic Technology 39, 1; 10.1175/JTECH-D-20-0160.1

It is observed that the simulated FED sum usually exceeds the corresponding observation during the phases of highest flash amounts within the region (Fig. 9a). This could mean that NLDN detects significantly more flashes than GLM during these times, and thus the number of simulated flashes is significantly higher than the number of observed GLM flashes. These findings agree with Zhang and Cummins (2020), who found that the GLM DE decreases for high flash rates and with shorter extent and duration flashes, which are observed during the mature phase of a thunderstorm.
Note that the absolute values (Figs. 9a,b, top) and difference to the observation (Figs. 9a,b, bottom) for the FED sum (Fig. 9a) have the same order of magnitude. In contrast, the difference (Fig. 9b, bottom) is one order of magnitude smaller than the absolute values (Fig. 9b, top) for the electrified area. Hence, the difference to observed FED and also the spread between generators with different configurations are much greater for the FED sum than for the electrified area. Therefore, more weight is put on the ranking by FED sum than on the ranking by electrified area when choosing the recommended generator. Eventually, the linSVR-based generator emerges as the recommendation in an overall evaluation context. If, however, for a certain objective the electrified area is most important, several HGBR-, MLP-, or even ETR-based generators perform better than the recommended linSVR-based generator.
In a Monte Carlo approach, FEDs for 10 of a total of 100 realizations of the recommended linSVR generator are calculated for the three test days. Figure 10 illustrates the median (line) and range (shaded) of FED sum and electrified area on 31 July 2018. The variability of both the FED sum and the electrified area has the same order of magnitude as the difference between the leading generators (Fig. 9). Figure 10 also confirms that the linSVR-based generator tends to underestimate the electrified area. The vast majority of the time, all 10 realizations simulate a lower electrified area than the GLM observations indicate. However, all 10 realizations remain relatively close to the observed FED sum at most times (except for the cases with intense convection, as discussed earlier). Note that this linSVR-based generator does not appear among the best 10 generators for the electrified area (Fig. 9b).
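The median and range across realizations, as described above, could be obtained as in the following sketch. It assumes the hourly values are stacked as one row per realization; the function name and the three short example series are hypothetical.

```python
import numpy as np


def ensemble_envelope(realizations):
    """Return hourly (median, minimum, maximum) across generator realizations.

    `realizations` has shape (n_realizations, n_hours).
    """
    values = np.asarray(realizations, dtype=float)
    return np.median(values, axis=0), values.min(axis=0), values.max(axis=0)


# Hypothetical hourly FED sums from three realizations over three hours
hourly_fed_sum = [[10.0, 20.0, 30.0],
                  [12.0, 18.0, 33.0],
                  [11.0, 25.0, 27.0]]
median, lower, upper = ensemble_envelope(hourly_fed_sum)
```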

As in Fig. 9, but with 10 repetitions of the recommended linSVR num ext(a) plus generator. Median (line) and range (shaded) of 10 generator repetitions are for 31 Jul 2018.
Citation: Journal of Atmospheric and Oceanic Technology 39, 1; 10.1175/JTECH-D-20-0160.1

5. Summary
This study analyzed in detail the simulation of GEO lightning pseudo-observations in two parts: First pseudo-GLM flash characteristics are simulated and then pseudo-GLM events are derived. The data generator uses only LF ground-based data. There is no additional cloud information used in the generator. The entire process is nontrivial because relations (correlations) between characteristics of coincident LF ground-based and optical satellite lightning observations are often weak at the flash scale.
A multivariate analysis using several features and targets is conducted to achieve more robust flash characteristics. Simulated GEO flash characteristics (targets) are obtained via ML models. Targets include GLM flash extent, GLM flash duration, and GLM event number per flash. An independent test dataset is then introduced to compare the statistics of simulated pseudo-GEO flashes to the observed GEO, i.e., GLM, flash characteristics. In a second part, the simulated targets are used to mimic individual GEO events on a regular latitude–longitude grid.
After testing different ML models used in the first part of our generator, a linear SVR (linSVR)-based GEO lightning pseudo-observation generator is recommended. The results of multiple linSVR configurations turned out to be similar. In more detail, our recommendation is to use a linSVR with feature and target scaling, which uses all NLDN and pseudo-GLM features in a multistep approach.
The type of the ML model chosen in the first part of our GEO lightning pseudo-observation generator has a major impact on the simulated flashes. In fact, the performance ranking of tested target generators reveals clusters per ML model type. Whereas the vast majority of generators produce pseudo-GLM flashes with flash characteristic means close to the observed ones, they simultaneously overestimate the medians of flash characteristics. Therefore, they produce too few small flashes as compared to the GLM observations. Only linSVR-based generators were able to simulate pseudo-GLM flash characteristics with distribution medians close to the observation for the 3-day test dataset. This gain is achieved at the expense of slightly underestimating the target means. It is then found that FED sums from linSVR-based generators are closer to the observed FED sum than for all other generators; however, the electrified area is at least 10% smaller than the observed electrified area.
Besides the type of the ML model, the set of features and the feature scaling impact the results. In particular, including (pseudo) GLM flash characteristics in the set of features improved the predictions of most ML models as target generators and thus the overall performance of the GEO lightning pseudo-observation generator.
In general, generators that perform well for the FED sum exhibit high Dabs for the electrified area and vice versa. For example, the best generator for the electrified area with Dabs and Dreal of 7% and −2%, respectively, highly overestimates the FED in most cases with Dabs and Dreal of 75% and 72%, respectively. On the other hand, the best generator for the FED sum with Dabs and Dreal of 22% and 2%, respectively, always underestimates the electrified area with Dabs and Dreal of 27% and −27%, respectively. Figure 9 illustrates this finding for the example of test day 31 July 2018.
The developed GEO lightning pseudo-observation generator provides exactly one pseudo-GEO flash for each LF flash. It does not distinguish whether an LF flash, that is, an NLDN flash, is detected by the GEO LLS, that is, GLM. During the application of the generator, there is no information whether a given NLDN flash could be detected by the GEO LLS. Additional assumptions, for example, using flash characteristics, would then be needed to distinguish the LF flashes with and without GEO match. In addition, our GEO lightning data generator does not include a specific part to simulate GEO flashes that are not directly coincident to any LF flash. Here, the pragmatic approach of using all LF flashes as input is justified by the similar flash DE of the LF (i.e., NLDN) and the GEO (i.e., GLM) LLS, which gives overall similar amounts of GLM and NLDN flashes. NLDN and GLM flashes without any coincident observation were then analyzed. They are referred to as NLDN-only and GLM-only flashes, respectively. It was observed that both the NLDN-only and GLM-only flashes occurred mostly in proximity to the convective cores and regions of overall high flash rates. The number of observed GLM-only and NLDN-only flashes was in general on the same order of magnitude. It is assumed that pseudo-GLM flashes simulated from the NLDN-only flashes substitute the observed GLM-only flashes. It should be mentioned that some simulated pseudo-GLM flashes might overlap as the pseudo-GLM flash extent is usually greater than the NLDN flash extent. Overlapping pseudo-GLM flashes should actually be merged; however, this is not further studied here. As one possible consequence, the simulated pseudo-GLM FED can be somewhat higher than the observed GLM FED (as seen for most configurations of generators). In particular, the simulated hourly FED values are often higher than observed in situations when many NLDN flashes were observed.
On the other hand, a simulated FED that is lower than the observed FED at the rim of cells indicates that NLDN flashes cannot represent the scattering of light as seen by GLM. Peterson et al. (2020) showed that optically detected flashes can appear large near storm edges due to light reflected off nearby clouds. Simulated FED (based on NLDN observations) could then be closer to the actual flash channel extent as derived from LMA-type observations than the observed FED, especially at the rim of cells. Nevertheless, the simulation might differ from what the satellite sensor sees.
Our method is configured and refined for NLDN Vaisala sensors. NLDN flash statistics were compared to coincident GLM flashes and their extent, duration, and event number. For an application in other regions than the United States and/or with different LF networks, NLDN operational specification and observations might be compared with the ones of the other LF network in order to identify the necessity for adapting the input data. This comparison can be of direct (e.g., NLDN and GLD360) or indirect (e.g., NLDN and Meteorage compared to ISS-LIS as common reference; Erdmann 2020) nature.
The studied dataset is limited to a region in the SE United States and for the months of March to September. GLM features high flash DE (e.g., Marchand et al. 2019; Murphy and Said 2020) in this region satisfying our objective to build a high-fidelity generator to simulate GEO lightning data. However, the limited dataset lacks winter storms that may have different characteristics. For the application of our generator in Europe, this should be a minor limitation as winter storms rarely occur here. Taszarek et al. (2020) found that 3.6% of flashes over Europe occurred during the European winter. Wintertime flashes might be important over SE Europe and the Mediterranean Sea. The performance of the data generator will depend on the LF network performance, for example, flash DE. Realistic data can only be expected in regions where the LF network provides good coverage. The simulated data are, thus, restricted by the quality and range of the LF data input. The SE U.S. region features mostly normal polarity storms while storms with different charge structure occur more often in other parts of the United States. For example, Rutledge et al. (2020) show that flash characteristics and GLM flash DE are altered for storms with anomalous charge structure. In addition, the data used to train our GEO lightning data generator were recorded in this region well covered and far from the edges of the GLM’s (on GOES-16) field of view. Simulating data of a GEO LLS near the edges of the field of view needs caution about parallax effects and an increase in the area one event covers.
The GLM data include a parallax correction. Our GEO lightning pseudo-observation generator assumes that GLM observations are correctly located. The simulated flashes are placed according to the LF lightning data. If the GEO LLS that should be mimicked uses a different parallax correction than GLM, an adaption may become necessary to obtain realistic data of this LLS.
A comparison of GLM and NLDN during day and night, and for intracloud (IC) and cloud-to-ground (CG) flashes revealed similar relationships between NLDN and GLM flash characteristics. The dataset for the ML includes all observed flashes, without a separation of these flash types. In addition, all applied ML models aim to optimize average characteristics. This study uses deterministic approaches without a definition of a confidence interval of the outcomes. As one result, the tails of the characteristics’ distributions, e.g., exceptionally small flashes, are underrepresented in the simulation compared to the observation.
Supplementary data might improve the present GEO lightning data generator. Cloud information and brightness temperature data could provide additional features for the ML, e.g., cloud-top height, and also information about more likely scattering directions, e.g., in anvils of convective clouds or in stratiform cloud lightning. (Doppler-)Radar data would provide even more versatile possibilities to include cloud structures, dynamics, and microphysics.
We do not know instrument downtimes from the data. Data may come with flags, but they do not give reliable information about the instrument status. We used a two-step approach to identify downtimes: (i) the flash DE is less than 50% and (ii) the number of flashes observed is less than 10% of the reference LLS.
It can be noticed from Table 2 that this linSVR-based generator also overestimates the minima of event number per flash and flash extent. For those two targets, the simulated maxima are closer to the observed maxima than for the best-performing generator in Table 2, causing overall similar and even lower KS and CvM for the linSVR num ext(a) plus.
The RF is included in the table for completeness. Only ETR as a special RF model is used in the study.
Acknowledgments.
This work is part of the Ph.D. thesis of Felix Erdmann funded by the Centre National d’Études Spatiales (CNES) and Météo-France. This article is funded by Météo-France, the SOLID project, and the EXAEDRE project (ANR-16-CE04-0005). This work was supported by the French National program Les Enveloppes Fluides et l’Environnement (LEFE), project ASMA. We thank the AERIS/ICARE Data and Services Center for providing access to GLM data. The authors thank Ronald L. Holle (Vaisala) for providing the NLDN data and reviewing the NLDN specific information. We acknowledge the guidance of Chien Wang (MOPGA Recipient Scientist at Laboratoire d’Aérologie, Toulouse, France) on machine-learning applications. The authors thank the three anonymous reviewers and Eric Bruning for their detailed and constructive criticism. We declare that there are no conflicts of interest.
Data availability statement.
NLDN data are available from Vaisala, and data as presented in this paper were provided by Ronald L. Holle. GLM data are in general available from NOAA (GOES-R Algorithm Working Group 2018). The GLM data as presented in this paper were downloaded from the French AERIS/ICARE Data and Services Center of the Université de Lille, where the files were stored 19 April 2018.
APPENDIX A
GEO Pixel Slicing for FED
Deriving FED requires knowledge about flash locations or, in case of satellite images, the positions of lightning data pixels. GLM products do not come with this necessary information. Therefore, the real GLM grid is reconstructed locating the centers of all events of the full half-year dataset. This large dataset was used to ensure that the reconstruction of the GLM grid would be complete; that is, there was at least one event at each GLM pixel. A time-invariant real GLM grid is assumed. Because individual pixels appear to wobble locally with time and do not appear on a regular grid due to microvibrations of the satellite platform, spacecraft jitter, and variable-pitch CCD, a k-means cluster analysis is performed to identify the statistical mean location of each pixel center. Corner points of pixels are then defined as the mean locations between the centers of the four pixels adjacent to each point. It is assumed that corner points can be connected by straight lines to represent the pixel shapes. This assumption is not entirely true because the regular CCD grid is projected on Earth (more precisely, on the cloud-top ellipsoid; section 2a); however, the FED should be less impacted by this assumption than by assuming a regular GLM grid. Shapes of GLM events do not usually match the FED grid pixel shapes. One GLM event with average side length of 8.7 km can overlap multiple FED pixels with side length of 5 km to some degree. The fractions of the GLM event within each pixel of the FED grid are summed up while integrating over the period. This slicing of GLM events reduces the effect of producing gaps or double counts of GLM pixel when transformed to the regular FED grid, as recently described by Bruning et al. (2019).
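The slicing of one GLM event onto the FED grid can be illustrated with a simplified sketch. It assumes axis-aligned rectangular events in a planar kilometer coordinate system, whereas the reconstructed GLM pixels are general quadrilaterals; the function names and the example event are hypothetical, with only the 8.7-km event side length and the 5-km FED pixel size taken from the text.

```python
import math


def overlap_fraction(event, pixel):
    """Fraction of the event area inside the pixel.

    Both rectangles are given as (x_min, y_min, x_max, y_max) in km.
    """
    ex0, ey0, ex1, ey1 = event
    px0, py0, px1, py1 = pixel
    width = max(0.0, min(ex1, px1) - max(ex0, px0))
    height = max(0.0, min(ey1, py1) - max(ey0, py0))
    return width * height / ((ex1 - ex0) * (ey1 - ey0))


def slice_event(event, pixel_size_km=5.0):
    """Distribute one event over the regular FED grid; keys are (column, row) indices."""
    ex0, ey0, ex1, ey1 = event
    fractions = {}
    for i in range(math.floor(ex0 / pixel_size_km), math.ceil(ex1 / pixel_size_km)):
        for j in range(math.floor(ey0 / pixel_size_km), math.ceil(ey1 / pixel_size_km)):
            pixel = (i * pixel_size_km, j * pixel_size_km,
                     (i + 1) * pixel_size_km, (j + 1) * pixel_size_km)
            fraction = overlap_fraction(event, pixel)
            if fraction > 0.0:
                fractions[(i, j)] = fraction
    return fractions


# Hypothetical event with 8.7 km side length that straddles four FED pixels
event_fractions = slice_event((1.0, 1.0, 9.7, 9.7))
```

Because the fractions of one event sum to one, each event contributes exactly once to the grid, which avoids both gaps and double counts.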
APPENDIX B
Definitions of the ML Algorithms
a. ML model types
This section defines the seven ML model types that are trained in the study. The basic idea of each ML model type is introduced, and specifications and important parameters for their tuning are briefly described. As mentioned in section 2, Python’s sklearn package is used. Model names are given as they appear in the sklearn library and documentation (https://scikit-learn.org/stable/), which provides further details.
1) Multivariate linear regression
The first approach is the most commonly used linear regression: sklearn.linear_model.LinearRegression. It is applied simultaneously to all features and targets and is, thus, a multivariate linear regression (LinReg). The algorithm seeks the linear functions of the features that minimize the sum of squared errors between the predicted and the observed targets. It is an ordinary least squares fit in a space with dimensions equal to the number of features times the number of targets.
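A minimal sketch of such a multivariate fit is given below. The data are synthetic and noise free, and the sizes (four features, three targets) are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))   # synthetic stand-in for the features
W = rng.normal(size=(4, 3))     # hypothetical true coefficients
Y = X @ W + 0.5                 # three targets, exactly linear in X

# One call fits all targets at once (ordinary least squares)
model = LinearRegression().fit(X, Y)
pred = model.predict(X)
```

The fitted coefficient matrix has one row per target and one column per feature.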
2) Multivariate polynomial regression
The polynomial regression (Poly) is an adjustment to the multivariate linear regression. It fits a polynomial of degree 3 (rather than a linear function) to minimize the sum of squared differences between predicted targets and the corresponding observations in the training dataset. The cubic polynomial model is chosen based on the initial correlation analysis with relations between any one feature and one target. The low polynomial degree allows fast computation.
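One common way to build a cubic regression in sklearn is to expand the features with PolynomialFeatures and pass them to a linear regression. The sketch below assumes this construction; the text does not state how Poly was implemented, and the single-feature data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)   # one synthetic feature
y = x[:, 0] ** 3 - 2.0 * x[:, 0]                # exactly cubic target

# Expand to x, x^2, x^3 and fit by ordinary least squares
poly = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                     LinearRegression())
poly.fit(x, y)
y_hat = poly.predict(x)
```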
3) Random-forest regressor
A random forest (RF) is an ML algorithm using bootstrapping and applying single decision trees to each bootstrap sample. The overall result is the average of the outcomes of all the decision trees. The minimum leaf size defines the minimum sample size at the end of a branch of the decision tree. A specific form of the RF is called extra trees: sklearn.ensemble.ExtraTreesRegressor (ETR; Geurts et al. 2006). ETR enforces randomness by not only selecting random features in each subset but also splitting depending on the best randomly produced thresholds instead of looking for the most distinctive threshold (as in RF). ETR usually reduces the variance and increases the bias of the model relative to RF. In general, a higher number of trees improves the performance but also increases the computation time. Our RF implementation uses an ETR model with 50 decision trees. The number of decision trees results from a sensitivity test (ETRs with 5, 10, 20, 50, 100, and 500 trees were tested) that weighed performance, measured as R² score (see the sklearn documentation), against computational effort. Here, the GB dataset with independent ML training and validation (i.e., calculating the R² scores) data (see section 1) is used. The minimum leaf size is set to two; that is, a remaining sample of two data points defines the end of the branch. A single-point leaf size would increase the variability of the trees and would lead to a higher likelihood of overfitting.
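The stated configuration (50 trees and a minimum leaf size of two) maps onto sklearn parameters as sketched below. The data are synthetic, and random_state is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))   # synthetic features
Y = np.column_stack([X[:, 0] ** 2, X[:, 1] + X[:, 2], np.abs(X[:, 3])])

etr = ExtraTreesRegressor(n_estimators=50,      # 50 decision trees
                          min_samples_leaf=2,   # minimum leaf size of two
                          random_state=0)       # assumption: fixed seed
etr.fit(X, Y)
pred = etr.predict(X)
```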
4) Bagging regressor with K-nearest neighbor regressor
Bootstrap aggregation, or bagging for short (Breiman 1996), uses subsamples drawn by bootstrapping from the entire dataset. This step is similar to the RF regressor. The algorithm used to treat the subsamples can, however, be chosen (not always a decision tree). This paper applies the bagging regressor sklearn.ensemble.BaggingRegressor combined with the K-nearest neighbor (KNN) regressor (e.g., Altman 1992) sklearn.neighbors.KNeighborsRegressor on each of 50 subsamples. The number of neighbors is set to the default of the five closest points, and distance weighting is applied for Euclidean distances. The KNN finds the closest neighbors with a K-dimensional tree (KD tree) method (Bentley 1975), which reduces the number of distance calculations compared to a brute-force approach that calculates distances between all data points. The KNN regressor in combination with distance weighting should represent the actual range of the subsample training data better than a decision tree (as used in RF and ETR). The expense might be an increase in overfitting of the data.
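A minimal sketch of this combination follows. It assumes a single synthetic target and a fixed random_state; the base estimator is passed positionally because its keyword name differs between sklearn versions.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))        # synthetic features
y = X[:, 0] + 0.5 * X[:, 1] ** 2     # one synthetic target

knn = KNeighborsRegressor(n_neighbors=5,         # five closest points
                          weights="distance",    # distance weighting
                          algorithm="kd_tree")   # KD-tree neighbor search
bag = BaggingRegressor(knn, n_estimators=50, random_state=0)  # 50 subsamples
bag.fit(X, y)
pred = bag.predict(X)
```

The default KNN metric in sklearn is the Minkowski distance with p = 2, that is, the Euclidean distance.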
5) Multilayer perceptron neural network
Multilayer perceptrons (MLPs) are a form of neural networks in supervised ML (Glorot and Bengio 2010). They consist of different layers of neurons, where the input layer neurons represent the features and the output layer neurons represent the simulated targets. An adjustable number of hidden layers can connect the input and output layers. Each neuron initially transforms the values from the previous layer in a weighted linear summation. Then a (non)linear activation function is used. Parameters of our MLP model, sklearn.neural_network.MLPRegressor, were determined after testing different configurations to balance computation time and accuracy. It uses one hidden layer with 50 neurons. The activation function is the rectified linear unit function. Additionally, an early stopping criterion is applied if there is no improvement over 20 consecutive iterations. The early stopping requires splitting the training dataset randomly, whereby 10% are used to verify the improvement of the model and 90% remain as the actual training dataset. The tolerance for the stopping criterion is reduced from the default 10⁻⁴ to 10⁻⁸ to allow a higher number of iterations. The alpha parameter for the L2 penalty was also reduced from the default 10⁻⁴ to 10⁻⁸ after testing different values. The lower alpha led to faster training while maintaining the model skill. This change is indicated by the label alpha8 in the names of the MLP-based generators. Furthermore, the default Adam solver (Kingma and Ba 2014) and a constant learning rate are used, along with adjusted parameters beta1 (0.7), beta2 (0.9), and epsilon (10⁻¹⁰) for the decay rates and the numerical stability in the Adam solver.
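The parameters listed above correspond to MLPRegressor arguments as sketched below. The data are synthetic, and max_iter and random_state are assumptions that are not given in the text.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                       # synthetic features
Y = np.column_stack([X[:, 0], X[:, 1] * X[:, 2], np.abs(X[:, 3])])

mlp = MLPRegressor(hidden_layer_sizes=(50,),        # one hidden layer, 50 neurons
                   activation="relu",
                   solver="adam", learning_rate="constant",
                   alpha=1e-8,                      # L2 penalty
                   tol=1e-8,                        # stopping tolerance
                   early_stopping=True,
                   validation_fraction=0.1,         # 10% held out for stopping
                   n_iter_no_change=20,
                   beta_1=0.7, beta_2=0.9, epsilon=1e-10,
                   max_iter=300, random_state=0)    # assumptions
mlp.fit(X, Y)
pred = mlp.predict(X)
```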
6) Support vector regressor
The support vector regressor (SVR) is based on support vector machine (SVM) algorithms. A set of hyperplanes is constructed. To this end, a defined kernel function is applied to achieve a separation of data clusters (by the hyperplanes) for the regression. The kernel function can be a linear or nonlinear function (i.e., polynomial or radial basis function). Linear SVR (linSVR) is faster and uses less memory than SVR with nonlinear kernel functions. Nonlinear SVR usually provides better separation of different clusters in the data and thus a higher score than linear SVR. The distances of the nearest data points to the hyperplanes (so-called functional margins) are maximized. Points with a larger functional margin lead to less uncertainty for the prediction than data close to the hyperplanes. SVM in general analyzes all data, while the cost function (L1 loss) depends on a subset of the training data, referred to as support vectors. Support vectors are a set of data points with some distance from the target values that still allow the correct prediction. The systematic reduction of the training data makes this model type fundamentally different from the remaining model types of this study. Further information is also provided by Smola and Schölkopf (2004).
Because of our large sample size (672 794 flashes), only the linSVR sklearn.svm.LinearSVR is used in this study in its default configuration. As for the MLP, an early stopping criterion is used for a lack of improvement between consecutive iterations.
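Because LinearSVR predicts a single target, a sketch of its use needs one model per target. The loop below is an assumption about how this could be organized, not a description of the study's code; the data are synthetic and random_state is added for reproducibility.

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))   # synthetic, already scaled features
targets = {
    "flash duration": X[:, 0] + 0.1 * rng.normal(size=300),
    "event no.": X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=300),
    "flash extent": 0.5 * X[:, 3] + 0.1 * rng.normal(size=300),
}

# One default-configured linear SVR per target
models = {name: LinearSVR(random_state=0).fit(X, y)
          for name, y in targets.items()}
pred = {name: m.predict(X) for name, m in models.items()}
```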
7) Histogram-based gradient boosting regression tree
Boosting is, besides bagging, another approach to reduce overfitting of ML models. It combines an ensemble of weak learners into one strong learner. The histogram gradient boosting regression sklearn.ensemble.HistGradientBoostingRegressor (HGBR) is much faster than regular gradient boosting regressors. Data are first binned into 256 integer-valued bins. The algorithm can then leverage histograms instead of relying on sorted continuous values when building the decision trees. The number of splitting points is reduced, and the algorithm becomes time efficient, inspired by LightGBM (Ke et al. 2017). The first step of the HGBR averages the target values and calculates residuals (average difference of observation to prediction) with a least squares loss function. Based on these residuals, a small decision tree is built, along with a learning rate. The learning rate limits the influence of a single small decision tree in the final ensemble to avoid overfitting. Then new predictions are computed using the averages and the decision tree for residuals. Based on the new predictions, new residuals are calculated, and a new decision tree is created. The final model combines several of these decision trees to pull the target averages toward the observations. The maximum number of iterations is 500, and the early stopping criterion takes effect after 50 iterations without significant improvement of the loss value.
b. Naming convention for the GEO lightning pseudo-observation generator configurations
This section defines the meaning of names given to different configurations of a target generator. The names and abbreviations of the ML model types can be found in Table B1. The given ML model types are used in the first part of the GEO lightning pseudo-observation generator, referred to as the target generator. Table B2 summarizes the feature usage available for each ML model type in the target generator. The feature-set selections indicate whether a single-target or multistep approach is used. The feature-set selection called NLDN is the default configuration as described. Generators with extension of only default, plus, raw, or raw plus are single-step approaches, that is, using the model of class 1 three times in Fig. 3. Multistep simulations always simulate the GLM flash duration in the first step here. The order of the remaining targets, that is, number of events per flash and the GLM flash extent, is not fixed. The extension “num” indicates one additional step only for the pseudo-GLM event number per flash using the pseudo-GLM flash duration as pseudofeature. GEO lightning pseudo-observation generator configurations with extension num ext and num ext(a) have two additional steps using different pseudofeatures as shown in Table B2. The num ext(a2) generators use only the GLM flash duration as pseudofeature, and thus two models of class 2 as in Fig. 3.
Table B1. ML model types with abbreviations.


Table B2. Naming conventions of the target generator configurations used. The name extensions in column 1 are used following the ML model type. The remaining three columns indicate the utilized features during the ML training for each of the three targets GLM flash duration (flash duration), number of events per flash (event no.), and GLM flash extent (flash extent). NLDN indicates that NLDN flash duration, the number of pulses/strokes per flash, NLDN flash extent, and the maximum LF amplitude are used as features. The GLM pseudofeatures flash duration, flash extent, and/or event number can complement the NLDN features for some configurations. Feature-set selections define how one target (header) is generated, i.e., ST or multistep approach. The attributes are optional and may replace default in the generator name. Combinations of a feature-set selection with 0, 1, or 2 attributes are possible.


The attributes define a modification of the feature-set selections with binary character. The plus attribute indicates that NLDN LF amplitude and CG fraction are added to the list of features. Attribute raw means that no feature and target scaling were used. Combinations of the given feature-set selections and attributes are possible; for example, an unscaled model with NLDN mean LF amplitude and CG stroke ratio as additional features that uses the GLM flash duration as pseudofeature for the event number per flash gets the extension num(a) raw plus. The total number of generator configurations is 196: There are seven ML model types (Table B1 except RF). For each ML model type there are seven feature-set selections resulting from the single and multistep approaches, and for each combination of ML model and feature-set selection again four different attribute usages (Table B2), that is, none, plus, raw, or raw plus. The 196 generator configurations (28 for each ML model type) define the base for the statistical results presented in section 4.
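The count of 196 configurations can be checked by enumerating the combinations. In the sketch below, the model abbreviations are those used in the text, whereas the feature-set selection labels are placeholders because the full list is given only in Table B2.

```python
from itertools import product

model_types = ["LinReg", "Poly", "ETR", "BAGR KNN dist", "MLP", "linSVR", "HGBR"]
selections = [f"selection_{i}" for i in range(1, 8)]   # placeholder labels
attributes = ["", "plus", "raw", "raw plus"]           # none, plus, raw, raw plus

# One name per combination; empty attributes are dropped from the name
configs = [" ".join(part for part in combo if part)
           for combo in product(model_types, selections, attributes)]

per_model = len(configs) // len(model_types)
```

The enumeration yields 7 × 7 × 4 = 196 names, that is, 28 per ML model type.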
APPENDIX C
The Multitarget Multistep Approach
This section describes a multitarget regression that simplifies the idea of the stacked single target (SST) approach (Spyromitros-Xioufis et al. 2016). In this study, there are six NLDN features (as physical input) and three GLM targets (as physical simulated variables) per sample, that is, per flash. The three GLM targets are denoted Ti, Tj, and Tk; Ti can represent any of the three targets. The indexes i, j, and k indicate the order of obtaining the final targets. Targets that are used like features are referred to as pseudofeatures, that is, Tj and Tk in Fig. 3a. With this dataset, there are in general four different ways to simulate the target Ti. The four ST models are shown as the training part in Fig. 3a. There are three classes of models: Yellow is the model class 1 without pseudofeatures, gray indicates model class 2 using one pseudofeature, and the red is for model class 3 using two pseudofeatures. The model M→i constitutes the common ML model, that is, class 1, with only the NLDN features as input. One (i.e., Tj or Tk) or two (i.e., Tj and Tk) of the three targets can be added to the input as pseudofeatures to simulate the target Ti. The resulting models Mj→i (using Tj as pseudofeature), Mk→i (using Tk with the features), and Mj,k→i (using Tj and Tk with the features) may indeed take advantage of correlations between the predicted target and the targets that are used as pseudofeatures.
The application case only uses the NLDN features as first input. Therefore, a multistep approach is required. Figure 3b presents the example application for a three-step approach that first predicts the pseudo-GLM flash duration, then the pseudo-GLM event number per flash, and last the pseudo-GLM flash extent. This configuration is denoted num ext(a) (see Table B2 for details on the configuration naming). The first step, M→i, uses the NLDN features and predicts the first pseudo-GLM characteristic M→i(NLDN), that is, pseudo-GLM flash duration. The second step, Mi→j, uses the NLDN features and the result of the first step, M→i(NLDN), that is, the pseudofeature GLM flash duration. This model of class 2 predicts the second pseudo-GLM characteristic Mi→j[NLDN, M→i(NLDN)], that is, the pseudo-GLM event number per flash. Both predicted pseudo-GLM characteristics (i.e., GLM flash duration and event number per flash) can then be used as pseudofeatures to predict the third target with the class-3 model Mi,j→k. Hence, the final target prediction Mi,j→k〈NLDN, Mi→j[NLDN, M→i(NLDN)]〉 depends on the NLDN features and both previous predictions for this configuration. In general, a model of class 3 can also use two pseudofeatures produced by two models of class 1. Also, two models of class 2 could be used to simulate the remaining two targets after the first step. Utilizing a model of class 1 three times is equal to the common ML ST approach. Hence, several combinations of models of different classes are possible and have been investigated here.
The ML training for the multistep approach can be performed in parallel for the models M→i, Mj→i, and Mj,k→i. The approach can use all ML model types as the training creates independent learners. Our multistep approach adapts the idea of the SST but uses GLM observations instead of simulated pseudo-GLM targets during the ML training. A trained generator can be applied even if the observations are not available by using the corresponding pseudo-observations in their place. This method assumes similarity between observations and pseudo-observations; however, the pseudo-observations only approximate the real observations. Our approach therefore does not account for errors that propagate through successive steps. However, the training is more efficient than for an SST approach as all generator parts can be trained simultaneously rather than waiting for the pseudo-observations to be created. Computational efficiency was necessary due to the large number of generators tested in this paper and in the perspective of an operational-like application. The results (section 4) showed that our multistep approach aids in simulating realistic pseudo-GLM observations and the performance is often better than with common ST models without pseudofeatures.
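The difference between training and application in the three-step case can be sketched as follows. The sketch assumes synthetic data and uses a linear regression for every step purely for brevity; the variable names and the helper generate are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 400
nldn = rng.normal(size=(n, 4))                                    # stand-in NLDN features
duration = nldn[:, 0] + 0.1 * rng.normal(size=n)                  # target T_i
events = 2.0 * duration + nldn[:, 1] + 0.1 * rng.normal(size=n)   # target T_j
extent = 0.7 * events + nldn[:, 2] + 0.1 * rng.normal(size=n)     # target T_k

# Training: observed targets act as pseudofeatures, so the three fits
# are independent of each other and could run in parallel.
m1 = LinearRegression().fit(nldn, duration)                                 # class 1
m2 = LinearRegression().fit(np.column_stack([nldn, duration]), events)      # class 2
m3 = LinearRegression().fit(np.column_stack([nldn, duration, events]), extent)  # class 3

def generate(features):
    """Application: each step feeds its prediction to the next step."""
    d = m1.predict(features)
    e = m2.predict(np.column_stack([features, d]))
    x = m3.predict(np.column_stack([features, d, e]))
    return d, e, x

pseudo_duration, pseudo_events, pseudo_extent = generate(nldn)
```

During application, the later steps receive predictions rather than observations, so errors of the first step enter the following steps even though the models never saw such errors in training.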
Although the correlations between the NLDN features and both GLM flash extent and event number per flash are relatively weak, the NLDN features improve the prediction during each step as seen through feature drop tests (not shown). Indeed, all features have a positive effect on the model score. Because of strong correlations between GLM flash duration and NLDN features flash duration and pulse/stroke number, and to reduce the number of ML-based target generators, only the multistep approaches that predict the GLM flash duration in the first step (M→i) are considered. There remains only one model of class 2 in Fig. 3a and three ways to simulate a target Ti. The GLM flash duration is also weakly correlated with both GLM flash extent and event number (R of approximately 0.10 and 0.17, respectively), and GLM flash extent and event number per flash are well correlated (R of ∼0.74). Thus, the first step always provides the pseudo-GLM flash duration from the NLDN flash characteristics as features. The second step uses the simulated flash duration in addition to the NLDN features to simulate one or both of pseudo-GLM flash extent and event number per flash. The pseudofeature used in model class 2 (Fig. 3) is fixed in this paper to be flash duration leaving only one realization of model class 2 to simulate a second target. To further reduce the number of multistep configurations, the approaches that simulate the flash extent but not the event number per flash through a multistep process are not further considered since (i) GLM flash duration shows weaker correlation with GLM flash extent than with the event number per flash and (ii) the ST approach for event number per flash from NLDN features exhibits the lowest skill of the three targets. A potential third step may simulate the last GLM target based on NLDN features and the two remaining simulated pseudo-GLM characteristics as additional pseudofeatures. 
The paper describes generator configurations using only the GLM duration (strongest correlations) or using GLM duration and a second target as additional pseudofeatures to simulate the remaining target (GLM flash extent or event number per flash).
Our multistep approach aims at producing more realistic pseudo-GLM flash extent and event number per flash than using the NLDN features alone. The NLDN features also remain important because the correlations between some targets are weak.
APPENDIX D
Supplementary Results for each Test Day and Target Distribution Statistic
This section contains detailed results for each test day; the main paper presents these results for the combined 3-day period. The second part includes the figures and analysis of the normalized difference between distribution statistics of observed and simulated GLM flash duration and event number per flash. The results are presented in a similar way as for the flash extent statistics in the main paper.
a. Target generator results for each test day
Tables D1, D2, and D3 present the results for 7 April, 26 May, and 31 July 2018, respectively. As explained for Table 2 with results for the three days combined, the tables show the observed statistics for each target distribution, the outcomes using the best performing generator, and statistics of data simulated with the recommended linSVR num ext(a) plus generator. In most cases, the target generators simulate mean values close to the observed means of the three target distributions. Median values are usually underestimated by the target generators as seen in Tables D2 and D3. Results for the 7 April 2018 test case differ from the general behavior (Table D1). That day saw exceptionally large flashes with high event numbers per flash that likely occurred within the MCS and the squall line. As a consequence, the ML-based target generators underestimated the means of the observed flash characteristics for that test case, but the medians of simulated and observed targets are similar.
Table D1. Comparison of distribution statistics for observed GLM data and the best generator for each target on 7 Apr 2018. The recommended linSVR-based generator is shown in boldface type. Details about the target generator names are provided in section b of appendix B.


The results for each test day overall resemble the results for the combined 3-day test data (see Table 2). Minimum values are often only slightly overestimated for the three targets, while the simulated maxima do not reach the observed maxima for any of the targets on any of the 3 test days (Tables D1–D3). The linSVR-based target generator outperforms all other generator types on 26 May and 31 July 2018. On 7 April, with exceptionally large flashes, different BAGR KNN dist–based generators are found to be the best performers for all three targets (Table D1). The choice of the most suitable generator appears to be situational; that is, there is no generator that performs better than all other generators in all cases. The recommended linSVR num ext(a) plus (boldface in Tables D1–D3) performs on par with the best generator for the event number per flash and flash extent on 26 May and 31 July 2018. The event number per flash is significantly underestimated by this linSVR-based generator on 7 April 2018 for the mentioned reason. The flash extent, as the most important target for the FED, is also underestimated on that day; however, the CvM is only about one-half of the CvM for event number per flash, meaning a more realistic simulation of the flash-extent distribution than the event-number-per-flash distribution.
b. Normalized statistics for difference between observation and simulation for GLM flash duration and event number per flash
Figures D1 and D2 group the results for each statistic by the seven ML model types. As explained in section 4a, each distribution contains the results of 28 generators (see also appendix B, section b, Table B2).

Fig. D1. Normalized absolute difference of statistics and scores (titles) between distributions of observed and simulated GLM flash duration (0 means equal to observation; 1 represents the worst simulation). The boxplots represent the distributions of 28 target generator results per ML type (x axis) including the IQR (blue box), 1.5 times the IQR (whiskers), and outliers (black cross). The horizontal green line gives the median. Results are for the full test dataset. The abbreviations for ML type are in Table B1.
Citation: Journal of Atmospheric and Oceanic Technology 39, 1; 10.1175/JTECH-D-20-0160.1


Fig. D2. As in Fig. D1, but for the normalized absolute difference of statistics and scores (titles) between distributions of observed and simulated GLM event number per flash (0 means equal to observation; 1 represents the worst simulation).
Figure D1 shows the normalized differences and scores of different target generators for the GLM flash duration for the three test days combined. The GLM flash duration distribution is equally well simulated by a variety of ML-based target generators as the narrow spread of the medians (green line) indicates. In detail, a linSVR-based generator and an MLP-based generator can predict the mean well, an MLP-based generator and an ETR-based generator are best for the maximum, and a linSVR-based generator exhibits the lowest differences for the median as well as both KS and CvM scores. In total, a linSVR-based target generator (linSVR num ext raw) best approximates the observed distribution of the GLM flash duration in this comparison, with an NDA value of 0.30.
For the GLM event number per flash in Fig. D2, linSVR and BAGR KNN dist models make the best target generators. The lowest NDA of 0.45 is obtained for several linSVR and BAGR KNN dist–based generators, for example, linSVR num ext(a) raw plus and BAGR KNN dist num ext raw. For test day 7 April 2018 (a dominant mesoscale system with above-average mean and median GLM event numbers per flash), all generators underestimated the event number per flash. As generators using a linSVR usually predict lower values than the other generators, they underestimate the observed statistics even more on 7 April 2018. Nevertheless, for the full test data, there are linSVR-based generators that predict the mean event number as well as the best target generator, which is MLP based, as demonstrated by the lower whiskers in Fig. D2. LinSVR-based generators are again most suitable to predict the event-number distribution median.
REFERENCES
Aguiar, G. J., E. J. Santana, S. M. Mastelini, R. G. Mantovani, and S. Barbon Jr., 2019: Towards meta-learning for multi-target regression problems. Eighth Brazilian Conf. on Intelligent Systems, Salvador, Brazil, IEEE, https://doi.org/10.1109/BRACIS.2019.00073.
Allen, B. J., E. R. Mansell, D. C. Dowell, and W. Deierling, 2016: Assimilation of pseudo-GLM data using the ensemble Kalman filter. Mon. Wea. Rev., 144, 3465–3486, https://doi.org/10.1175/MWR-D-16-0117.1.
Altman, N. S., 1992: An introduction to kernel and nearest-neighbor nonparametric regression. Amer. Stat., 46, 175–185, https://doi.org/10.1080/00031305.1992.10475879.
Anderson, T. W., 1962: On the distribution of the two-sample Cramér-von Mises criterion. Ann. Math. Stat., 33, 1148–1159, https://doi.org/10.1214/aoms/1177704477.
Ávila, E. E., R. E. Bürgesser, N. E. Castellano, A. B. Collier, R. H. Compagnucci, and A. R. Hughes, 2010: Correlations between deep convection and lightning activity on a global scale. J. Atmos. Sol.-Terr. Phys., 72, 1114–1121, https://doi.org/10.1016/j.jastp.2010.07.019.
Bateman, M., 2013: A high-fidelity proxy dataset for the Geostationary Lightning Mapper (GLM). Sixth Conf. on the Meteorological Application of Lightning Data, Austin, TX, Amer. Meteor. Soc., 725, https://ams.confex.com/ams/93Annual/webprogram/Paper223473.html.
Bateman, M., and D. Mach, 2020: Preliminary detection efficiency and false alarm rate assessment of the Geostationary Lightning Mapper on the GOES-16 satellite. J. Appl. Remote Sens., 14, 032406, https://doi.org/10.1117/1.JRS.14.032406.
Bateman, M., D. Mach, and M. Stock, 2021: Further investigation into detection efficiency and false alarm rate for the Geostationary Lightning Mappers aboard GOES-16 and GOES-17. Earth Space Sci., 8, e2020EA001237, https://doi.org/10.1029/2020EA001237.
Bentley, J. L., 1975: Multidimensional binary search trees used for associative searching. Commun. ACM, 18, 509–517, https://doi.org/10.1145/361002.361007.
Betz, H. D., K. Schmidt, P. Laroche, P. Blanchet, W. P. Oettinger, E. Defer, Z. Dziewit, and J. Konarski, 2009: LINET—An international lightning detection network in Europe. Atmos. Res., 91, 564–573, https://doi.org/10.1016/j.atmosres.2008.06.012.
Biron, D., L. D. Leonibus, P. Laquale, D. Labate, F. Zauli, and D. Melfi, 2008: Simulation of Meteosat Third Generation-Lightning Imager through Tropical Rainfall Measuring Mission: Lightning Imaging Sensor data. Proc. SPIE, 7087, 77–88, https://doi.org/10.1117/12.794764.
Blakeslee, R., and W. Koshak, 2016: LIS on ISS: Expanded global coverage and enhanced applications. Earth Obs., 28, 4–14.
Blakeslee, R., and Coauthors, 2020: Three years of the lightning imaging sensor onboard the International Space Station: Expanded global coverage and enhanced applications. J. Geophys. Res. Atmos., 125, e2020JD032918, https://doi.org/10.1029/2020JD032918.
Borchani, H., G. Varando, C. Bielza, and P. Larrañaga, 2015: A survey on multi-output regression. WIREs: Data Min. Knowl. Discovery, 5, 216–233, https://doi.org/10.1002/widm.1157.
Breiman, L., 1996: Bagging predictors. Mach. Learn., 24, 123–140, https://doi.org/10.1007/BF00058655.
Brooks, I. M., C. P. R. Saunders, R. P. Mitzeva, and S. L. Peck, 1997: The effect on thunderstorm charging of the rate of rime accretion by graupel. Atmos. Res., 43, 277–295, https://doi.org/10.1016/S0169-8095(96)00043-9.
Bruning, E. C., and Coauthors, 2019: Meteorological imagery for the Geostationary Lightning Mapper. J. Geophys. Res. Atmos., 124, 14 285–14 309, https://doi.org/10.1029/2019JD030874.
Cecil, D. J., S. J. Goodman, D. J. Boccippio, E. J. Zipser, and S. W. Nesbitt, 2005: Three years of TRMM precipitation features. Part I: Radar, radiometric, and lightning characteristics. Mon. Wea. Rev., 133, 543–566, https://doi.org/10.1175/MWR-2876.1.
Chmielewski, V. C., and E. C. Bruning, 2016: Lightning Mapping Array flash detection performance with variable receiver thresholds. J. Geophys. Res. Atmos., 121, 8600–8614, https://doi.org/10.1002/2016JD025159.
Christian, H. J., and Coauthors, 1999: The Lightning Imaging Sensor. 11th Int. Conf. on Atmospheric Electricity, Guntersville, AL, NASA, 746–749.
Coquillat, S., and Coauthors, 2019: SAETTA: High-resolution 3-D mapping of the total lightning activity in the Mediterranean Basin over Corsica, with a focus on a mesoscale convective system event. Atmos. Meas. Tech., 12, 5765–5790, https://doi.org/10.5194/amt-12-5765-2019.
Cummins, K. L., and M. J. Murphy, 2009: An overview of lightning locating systems: History, techniques, and uses, with an in-depth look at the U.S. NLDN. IEEE Trans. Electromagn. Compat., 51, 499–518, https://doi.org/10.1109/TEMC.2009.2023450.
Deierling, W., and W. A. Petersen, 2008: Total lightning activity as an indicator of updraft characteristics. J. Geophys. Res., 113, D16210, https://doi.org/10.1029/2007JD009598.
Dobber, M., and J. Grandell, 2014: Meteosat Third Generation (MTG) Lightning Imager (LI) instrument performance and calibration from user perspective. 23rd Conf. on Characterization and Radiometric Calibration for Remote Sensing, Logan, UT, Utah State University.
Emersic, C., and C. Saunders, 2020: The influence of supersaturation at low rime accretion rates on thunderstorm electrification from field-independent graupel-ice crystal collisions. Atmos. Res., 242, 104962, https://doi.org/10.1016/j.atmosres.2020.104962.
Erdmann, F., 2020: Préparation à l’utilisation des observations de l’imageur d’éclairs de Météosat troisième génération pour la prévision numérique à courte échéance (Preparation for the use of Meteosat Third Generation Lightning Imager observations in short-term numerical weather prediction). Ph.D. thesis, Université Toulouse 3–Paul Sabatier, 292 pp., http://thesesups.ups-tlse.fr/4947/.
Erdmann, F., E. Defer, O. Caumont, R. J. Blakeslee, S. Pédeboy, and S. Coquillat, 2020: Concurrent satellite and ground-based lightning observations from the Optical Lightning Imaging Sensor (ISS-LIS), the low-frequency network Meteorage and the SAETTA Lightning Mapping Array (LMA) in the northwestern Mediterranean region. Atmos. Meas. Tech., 13, 853–875, https://doi.org/10.5194/amt-13-853-2020.
Fierro, A. O., Y. Wang, J. Gao, and E. R. Mansell, 2019: Variational assimilation of radar data and GLM lightning-derived water vapor for the short-term forecasts of high-impact convective events. Mon. Wea. Rev., 147, 4045–4069, https://doi.org/10.1175/MWR-D-18-0421.1.
Geurts, P., D. Ernst, and L. Wehenkel, 2006: Extremely randomized trees. Mach. Learn., 63, 3–42, https://doi.org/10.1007/s10994-006-6226-1.
Glorot, X., and Y. Bengio, 2010: Understanding the difficulty of training deep feedforward neural networks. 13th Int. Conf. on Artificial Intelligence and Statistics, Sardinia, Italy, AISTATS, 249–256.
GOES-R Algorithm Working Group, 2018: NOAA GOES-R Series Geostationary Lightning Mapper (GLM) level 2 lightning detection: Events, groups, and flashes. Subsets used: Flash and event. NOAA National Centers for Environmental Information, accessed 1 November 2018, https://doi.org/10.7289/V5KH0KK6.
Goodman, S. J., D. Mach, W. Koshak, and R. Blakeslee, 2012: GLM lightning cluster-filter algorithm. NOAA NESDIS Center for Satellite Application and Research Algorithm Theoretical Basis Doc., 73 pp.
Goodman, S. J., and Coauthors, 2013: The GOES-R Geostationary Lightning Mapper (GLM). Atmos. Res., 125–126, 34–49, https://doi.org/10.1016/j.atmosres.2013.01.006.
Höller, H., and H.-D. Betz, 2010: Study on inter-comparison of LIS and ground-based lightning location system observations. EUMETSAT–Deutsches Zentrum für Luft- und Raumfahrt Rep. ITT 09/996, 44 pp.
Ke, G., Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, 2017: LightGBM: A highly efficient gradient boosting decision tree. 31st Conf. on Neural Information Processing Systems, Long Beach, CA, NeurIPS, 3146–3154, http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf.
Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, https://arxiv.org/abs/1412.6980.
Kolmasova, I., T. Marshall, S. Bandara, S. Karunarathne, M. Stolzenburg, N. Karunarathne, and R. Siedlecki, 2019: Initial breakdown pulses accompanied by VHF pulses during negative cloud-to-ground lightning flashes. Geophys. Res. Lett., 46, 5592–5600, https://doi.org/10.1029/2019GL082488.
Koshak, W. J., and R. J. Solakiewicz, 2015: A method for retrieving the ground flash fraction and flash type from satellite lightning mapper observations. J. Atmos. Oceanic Technol., 32, 79–96, https://doi.org/10.1175/JTECH-D-14-00085.1.
Koshak, W. J., and Coauthors, 2004: North Alabama Lightning Mapping Array (LMA): VHF source retrieval algorithm and error analyses. J. Atmos. Oceanic Technol., 21, 543–558, https://doi.org/10.1175/1520-0426(2004)021<0543:NALMAL>2.0.CO;2.
Koshak, W. J., D. Mach, M. Bateman, P. Armstrong, and K. Virts, 2018: GOES-16 GLM level 2 data full validation data quality: Product performance guide for data users. NASA Marshall Space Flight Center Guide, 16 pp., https://lightning.umd.edu/documents/GS_Status/GOES16_GLM_FullValidation_ProductPerformanceGuide.pdf.
Luque, M. Y., F. Nollas, R. G. Pereyra, R. E. Bürgesser, and E. E. Ávila, 2020: Charge separation in collisions between ice crystals and a spherical simulated graupel of centimeter size. J. Geophys. Res. Atmos., 125, e2019JD030941, https://doi.org/10.1029/2019JD030941.
Lyu, F., S. A. Cummer, Z. Qin, and M. Chen, 2019: Lightning initiation processes imaged with very high frequency broadband interferometry. J. Geophys. Res. Atmos., 124, 2994–3004, https://doi.org/10.1029/2018JD029817.
MacGorman, D. R., and W. D. Rust, 1998: The Electrical Nature of Storms. Oxford University Press, 422 pp.
Mach, D. M., 2020: Geostationary Lightning Mapper clustering algorithm stability. J. Geophys. Res. Atmos., 125, e2019JD031900, https://doi.org/10.1029/2019JD031900.
Marchand, M., K. Hilburn, and S. D. Miller, 2019: Geostationary Lightning Mapper and Earth Networks lightning detection over the contiguous United States and dependence on flash characteristics. J. Geophys. Res. Atmos., 124, 11 552–11 567, https://doi.org/10.1029/2019JD031039.
Massey, F. J., 1951: The Kolmogorov-Smirnov test for goodness of fit. J. Amer. Stat. Assoc., 46, 68–78, https://doi.org/10.2307/2280095.
Mastelini, S. M., V. G. T. da Costa, E. J. Santana, F. K. Nakano, R. C. Guido, R. Cerri, and S. Barbon, 2019: Multi-output tree chaining: An interpretative modelling and lightweight multi-target approach. J. Signal Process. Syst., 91, 191–215, https://doi.org/10.1007/s11265-018-1376-5.
Murphy, M. J., and R. K. Said, 2020: Comparisons of lightning rates and properties from the U.S. National Lightning Detection Network (NLDN) and GLD360 with GOES-16 Geostationary Lightning Mapper and Advanced Baseline Imager data. J. Geophys. Res. Atmos., 125, e2019JD031172, https://doi.org/10.1029/2019JD031172.
Nag, A., M. J. Murphy, W. Schulz, and K. L. Cummins, 2015: Lightning locating systems: Insights on characteristics and validation techniques. Earth Space Sci., 2, 65–93, https://doi.org/10.1002/2014EA000051.
Pedregosa, F., and Coauthors, 2011: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.
Peterson, M., S. Rudlosky, and D. Zhang, 2020: Changes to the appearance of optical lightning flashes observed from space according to thunderstorm organization and structure. J. Geophys. Res. Atmos., 125, e2019JD031087, https://doi.org/10.1029/2019JD031087.
Rison, W., R. J. Thomas, P. R. Krehbiel, T. Hamlin, and J. Harlin, 1999: A GPS-based three-dimensional lightning mapping system: Initial observations in central New Mexico. Geophys. Res. Lett., 26, 3573–3576, https://doi.org/10.1029/1999GL010856.
Rutledge, S. A., K. A. Hilburn, A. Clayton, B. Fuchs, and S. D. Miller, 2020: Evaluating Geostationary Lightning Mapper flash rates within intense convective storms. J. Geophys. Res. Atmos., 125, e2020JD032827, https://doi.org/10.1029/2020JD032827.
Schultz, E. V., C. J. Schultz, L. D. Carey, D. J. Cecil, and M. Bateman, 2016: Automated storm tracking and the lightning jump algorithm using GOES-R Geostationary Lightning Mapper (GLM) proxy data. J. Oper. Meteor., 4, 92–107, https://doi.org/10.15191/nwajom.2016.0407.
Schulz, W., G. Diendorfer, S. Pedeboy, and D. R. Poelman, 2016: The European lightning location system EUCLID—Part 1: Performance analysis and validation. Nat. Hazards Earth Syst. Sci., 16, 595–605, https://doi.org/10.5194/nhess-16-595-2016.
Smola, A. J., and B. Schölkopf, 2004: A tutorial on support vector regression. Stat. Comput., 14, 199–222, https://doi.org/10.1023/B:STCO.0000035301.49549.88.
Spyromitros-Xioufis, E., G. Tsoumakas, W. Groves, and I. Vlahavas, 2016: Multi-target regression via input space expansion: Treating targets as inputs. Mach. Learn., 104, 55–98, https://doi.org/10.1007/s10994-016-5546-z.
Stano, G. T., 2013: Fusing total lightning data with Aviation Weather Center and Storm Prediction Center operations during the GOES-R visiting scientist program. Sixth Conf. on Meteorological Applications of Lightning Data, Austin, TX, Amer. Meteor. Soc., 724, https://ams.confex.com/ams/93Annual/webprogram/Paper215183.html.
Takahashi, T., S. Sugimoto, T. Kawano, and K. Suzuki, 2017: Riming electrification in Hokuriku winter clouds and comparison with laboratory observations. J. Atmos. Sci., 74, 431–447, https://doi.org/10.1175/JAS-D-16-0154.1.
Taszarek, M., J. T. Allen, P. Groenemeijer, R. Edwards, H. E. Brooks, V. Chmielewski, and S.-E. Enno, 2020: Severe convective storms across Europe and the United States. Part I: Climatology of lightning, large hail, severe wind, and tornadoes. J. Climate, 33, 10 239–10 261, https://doi.org/10.1175/JCLI-D-20-0345.1.
Thomas, R. J., P. R. Krehbiel, W. Rison, T. Hamlin, D. J. Boccippio, S. J. Goodman, and H. J. Christian, 2000: Comparison of ground-based 3-dimensional lightning mapping observations with satellite-based LIS observations in Oklahoma. Geophys. Res. Lett., 27, 1703–1706, https://doi.org/10.1029/1999GL010845.
Thomas, R. J., P. R. Krehbiel, W. Rison, S. J. Hunyady, W. P. Winn, T. Hamlin, and J. Harlin, 2004: Accuracy of the Lightning Mapping Array. J. Geophys. Res., 109, D14207, https://doi.org/10.1029/2004JD004549.
Vaisala, 2013: Vaisala thunderstorm advanced total lightning sensor LS7002. Vaisala Doc., 2 pp., https://www.vaisala.com/sites/default/files/documents/WEA-MET-ProductSpotlight-LS7002-B212227EN-A.pdf.
Yang, J., Z. Zhang, C. Wei, F. Lu, and Q. Guo, 2017: Introducing the new generation of Chinese geostationary weather satellites, Fengyun-4. Bull. Amer. Meteor. Soc., 98, 1637–1658, https://doi.org/10.1175/BAMS-D-16-0065.1.
Zhang, D., and K. L. Cummins, 2020: Time evolution of satellite-based optical properties in lightning flashes, and its impact on GLM flash detection. J. Geophys. Res. Atmos., 125, e2019JD032024, https://doi.org/10.1029/2019JD032024.
Zhu, Y., V. A. Rakov, M. D. Tran, and A. Nag, 2016: A study of National Lightning Detection Network responses to natural lightning based on ground truth data acquired at LOG with emphasis on cloud discharge activity. J. Geophys. Res. Atmos., 121, 14 651–14 660, https://doi.org/10.1002/2016JD025574.