Abstract

An analytic perturbation method is introduced for retrieving the lightning ground flash fraction in a set of N lightning flashes observed by a satellite lightning mapper. The value of N must be large, typically in the thousands, and the satellite lightning optical observations consist of the maximum group area (MGA) produced by each flash. Moreover, the method subsequently determines the flash type (ground or cloud) of each of the N flashes. Performance tests of the method were conducted using simulated observations that were based on Optical Transient Detector (OTD) and Lightning Imaging Sensor (LIS) data. It is found that the mean ground flash fraction retrieval errors are below 0.04 across the full range 0–1 under the nominal conditions defined. In general, it is demonstrated that the retrieval errors depend on many factors (i.e., the number N of satellite observations, the magnitude of random and systematic instrument measurement errors, the ground flash fraction itself, and the number of samples used to form certain climate distributions employed in the method). The fraction of flashes accurately flash typed by the method averaged better than 78%. Overall, the accuracy of ground flash fraction and flash-typing retrievals degrade as the simulated population ground and cloud flash MGA distributions become more identical. Finally, because the analytic perturbation method was found to be quite robust (i.e., it performed well for several arbitrary underlying MGA distributions), it is not restricted to the lightning problem studied here but can be applied to any inverse problem having a similar problem statement.

1. Introduction

The Geostationary Operational Environmental Satellite-R series (GOES-R) is due to launch in early 2016 as of this writing. In preparation for this launch and in order to optimize return on investment, many ongoing research activities are being carried out to explore and exploit the full information content of the GOES-R instrument data.

One instrument on GOES-R is the Geostationary Lightning Mapper (GLM) described in Goodman et al. (2013). The GLM will map the locations and the time of occurrence of total (ground flash and cloud flash) lightning activity continuously day and night with near-uniform storm-scale spatial resolution and with a product refresh rate of under 20 s over the Americas and adjacent oceanic regions. The GLM is based on two heritage low-Earth-orbiting National Aeronautics and Space Administration (NASA) satellite lightning mappers: the Optical Transient Detector (OTD) (Christian et al. 1996, 2003) and the Lightning Imaging Sensor (LIS) (Christian et al. 1992, 1999).

Although the above-mentioned lightning mappers are designed to provide total lightning activity, these instruments do not directly provide flash type (ground or cloud flash) classification; that is, they do not determine the flashes that strike the ground. This is understandable, since the optically thick thundercloud obscures the view, making it difficult to determine flash type. Early examinations of data from the OTD and LIS confirm this difficulty. At optical frequencies, the thundercloud multiply scatters the lightning source, and this results in a diffuse cloud-top emission that prevents one from deciding whether the lightning channel below cloud top connects to ground.

Nonetheless, the spatiotemporal pattern of the diffuse cloud-top emission itself provides key information about whether or not the channel connects to ground, as first pointed out in Koshak (2010). Specifically, by intercomparing OTD and National Lightning Detection Network (NLDN; Cummins and Murphy 2009) data, Koshak (2010) showed that the mean maximum group area (MGA) for ground flashes was substantially larger than the mean MGA for cloud flashes, so that for spaceborne detectors, MGA is a fundamental variable for discriminating ground flashes from cloud flashes. [Note: a lightning flash typically lasts a few tenths of a second and typically consists of several optical groups. The term group follows the vernacular used to describe a contiguous region of illumination seen by a lightning mapper during one (2 ms) instrument frame time; Mach et al. 2007.]

The hypothesis proposed by Koshak (2010) is that the return stroke (and its accompanying simultaneous discharges, if any) produces a relatively large optical group within the flash. In effect, a statistically large MGA is useful as a “return stroke detector.” The mean ground flash MGA also substantially exceeded the mean cloud flash MGA when 9 years (2003–11) of LIS and NLDN data were intercompared (Koshak and Solakiewicz 2012). In addition, the study by Rudlosky and Shea (2013) intercompared World Wide Lightning Location Network (WWLLN) data with LIS data and found that LIS flashes associated with WWLLN ground flash detections had substantially larger mean MGA values than LIS flashes not associated with WWLLN ground flash detections. Some of the difference in the mean MGA value can possibly be attributable to WWLLN triggering only on the more energetic lightning events. But, this does not explain all of the difference; that is, the study by Koshak (2010) and Koshak and Solakiewicz (2012) still found substantial differences in the mean MGA values when (the more sensitive) NLDN data were used for the ground flash detections. Whereas the NLDN provides a ground flash detection efficiency of 90%–95% and a ground flash location accuracy of better than 500 m across the conterminous United States (CONUS) (Cummins and Murphy 2009), the WWLLN has respective values of 11% and under 5 km globally as summarized in Hutchins et al. (2012) and references therein.

The follow-on studies by Koshak and Solakiewicz (2011) and Koshak (2011) introduced the first attempts to retrieve the ground flash fraction based on the MGA variable suggested by Koshak (2010). That is, for a satellite lightning mapper that observes a large number N of flashes, these retrieval methods are designed to estimate the fraction of flashes that strike the ground. As discussed in Koshak (2010), ground flash fraction retrieval algorithms would have significant utility in a variety of scientific areas, such as severe weather studies, lightning nitrogen oxide (NOx) production for air quality and global chemistry climate modeling, lightning–convection relationships, the contribution of lightning to the global electric circuit, and cross-sensor validation.

The retrieval algorithm in Koshak and Solakiewicz (2011) depended on using a fixed value for the mean ground flash MGA and a fixed value for the mean cloud flash MGA; the two means were derived for CONUS by comparing NLDN and OTD observations. Thus, retrieval errors are expected to increase when other geographical regions and seasons are considered.

The more sophisticated retrieval algorithm in Koshak (2011) attempted to overcome this limitation by employing a mixed exponential distribution model to describe the ground and cloud flash (shifted) MGA distributions. In this approach, one would retrieve not only the ground flash fraction for the region–season under consideration but also the optimum values of the population means of each shifted MGA distribution; that is, the approach was to find the three parameters of the fixed (mixed exponential distribution) model that best describes the satellite observations.

Even with this beneficial advancement, one can still argue that a different region/season might have ground and cloud flash MGA population distributions that deviate too much from the fixed mathematical model employed. In addition, both retrieval algorithms provide no way for flash typing specific flashes. These two limitations have motivated the present study.

In this paper we introduce a model-independent method for retrieving the ground flash fraction in a set of N flashes observed by a satellite lightning mapper (such as OTD, LIS, or the future GLM). The method also subsequently determines the flash type of each of the N flashes.

The retrieval method introduced here is called the Analytic Perturbation Method (APM), and it consists of two basic phases: 1) a “burn in” phase and 2) an operational phase wherein the APM is applied to retrieve the ground flash fraction, the ground and cloud flash MGA density functions, and the flash type of each flash. The details of the APM are provided in section 2. In section 3, we provide detailed performance tests of the APM. Mathematical error analyses are provided in section 4 that provide additional insight and context. A summary of the study is given in section 5.

2. APM

This section introduces the mathematical details of the APM. Section 2a describes the burn-in phase. Section 2b describes how the ground flash fraction, the density function of ground flash MGAs, and the density function of cloud flash MGAs are retrieved from a set of N flashes observed by a lightning mapper. Finally, section 2c describes how the N flashes are individually flash typed.

a. Burn-in phase

Before the APM can be applied to retrieve the quantities mentioned above, the lightning mapper ideally first collects a large sample of flashes from the target (i.e., region and season of interest) that have already been flash typed using an independent validation dataset. This collection period or “burn-in” phase estimates the mathematical forms of the underlying ground and cloud flash MGA population density functions for the target. For example, the GLM would sample a large number of flashes over the CONUS using NLDN data for flash typing.

If independent validation datasets are unavailable for a particular target, then one is forced to apply burn-in results from a different target (e.g., CONUS, and for the same season of interest, where independent ground validation is ample). In taking this route, one would need to anticipate degradation in APM solution retrieval accuracy (see section 4 for details). However, it has been demonstrated in Koshak and Solakiewicz (2011) that, at least for CONUS (a region of highly variable lightning, highly variable thunderstorm structures, and highly variable cloud morphologies with diverse cloud-scattering properties) the ground and cloud flash values of MGA do not vary greatly. In other words, it might be possible to adequately apply the APM to most regions of the world using only the CONUS-based burn-in results as an approximation. Again, section 4 quantifies how accurate such a process must be in order to retain reasonably accurate APM retrievals.

To summarize, the burn-in phase can be viewed as a training or information collection period for the retrieval algorithm. After this period, the retrieval algorithm can be run autonomously for a wide range of targets, including possibly those targets that may have no, or inadequate, independent lightning flash type validation measurements available.

b. Ground flash fraction and MGA density functions

Suppose a lightning mapper detects N lightning flashes over a particular region in a given interval of time. Each flash is composed of one or more “optical groups,” and so the MGA of each flash can be obtained. The MGA of a single group flash is simply the area of that single group. If two or more groups in a flash have equivalent areas A, and this area A is the MGA, then this is simply the value of MGA for the flash (i.e., one does not numerically enhance the value of the MGA in an attempt to account for the several equivalent maxima). In addition, it is assumed for the present discussion that the N MGA values are exact; that is, they have no measurement error associated with them.

Next, the N MGA values are placed into an N-vector x. In general, Ng of these MGAs will be from ground flashes and Nc of these MGAs will be from cloud flashes, where N = Ng + Nc. Therefore, one can also construct the Ng-vector xg to hold the ground flash MGAs, and one can construct the Nc-vector xc to hold the cloud flash MGAs.

From x, one can easily find the frequency distribution of the MGA values. Specifically, one picks an MGA bin width (typically 20 km2) and some appropriate MGA range (say, 0–4000 km2), and then determines to what bin each element of x belongs. The number of bins is given by n, which in this example would equal 4000 km2/20 km2 = 200. Hence, after tallying the number of occurrences in each bin, the MGA frequency distribution can be written as an n-vector U. Because U consists of the union of ground and cloud flash MGAs in general, it is called the union frequency distribution, or the mixture distribution for brevity. Similarly, a ground flash MGA frequency distribution, G, can be derived from xg, and a cloud flash MGA frequency distribution, C, can be derived from xc. Clearly, the sum of ground and cloud MGA frequency distributions must equal the mixture distribution, that is,

 
formula

Dividing this relationship by N yields an expression in terms of probability density functions (pdfs), that is,

 
formula

where

 
formula

and the ground flash fraction, α, is given by

 
formula

Next, one can set up a fundamental retrieval equation. To do this, the presence of measurement errors in the MGA values needs consideration. These errors will propagate into the pdfs shown in (2). Characterizing the errors added to u by the vector ε, the observed mixture density m is given by m = u + ε. So, when measurement errors are included, the analog to (2) is given by

 
formula

Here, the lhs represents a set of measurements (originating from the satellite MGA observations x) and the rhs is a modeling of those measurements. The retrievable parameters in the model are αr, gr, and cr.

The densities (g, c) associated with the satellite observations x will be some deviation from the associated population densities [denoted here by (gpop, cpop)] from which they are respectively drawn. Hence, the retrieved densities can be viewed as a perturbation from the climatological estimate of the population densities; that is,

 
formula

where (a, b) are the ground and cloud flash climatological estimates of the population densities obtained by the burn-in phase, respectively, and (d, e) are the perturbations. To help visualize the n-dimensional space employed, Fig. 1 provides a conceptual (three dimensional) illustration of the climate vectors (a, b), their perturbants (d, e), and the mixture density vector (m).

Fig. 1.

Geometry associated with the perturbation approach (depicted conceptually in three dimensions). The blue-shaded regions are spherical.

Fig. 1.

Geometry associated with the perturbation approach (depicted conceptually in three dimensions). The blue-shaded regions are spherical.

Figure 2 emphasizes that the hyperplane formed by the retrieved densities (gr, cr) must be perturbed from the climate vectors in such a way that the (fixed) mixture density m is in this hyperplane. In other words, one wants to pick the perturbants (d, e) such that (5) holds; the skewed vectors (gr, cr) span the hyperplane vector space.

Fig. 2.

The solution space hyperplane depicted in three dimensions; the hyperplane must be moved so that it contains the (fixed) m.

Fig. 2.

The solution space hyperplane depicted in three dimensions; the hyperplane must be moved so that it contains the (fixed) m.

As an additional aid in understanding the solution process, one can define a reference density mo that is in the hyperplane space spanned by the climate vectors as

 
formula

where the reference ground flash fraction, αo, is given by solving (7):

 
formula

Since the observation m is not (in general) in the hyperplane space spanned by the climate vectors, it deviates from the reference mo by some perturbation p(αo) ≡ po, that is,

 
formula

Now, if Ng is large, then g likely deviates little from gpop. Similarly, if Nc is large, then c likely deviates little from cpop. This motivates finding the values of (αr, d, e) that minimize a scalar function, H = H(αr, d, e), given by

 
formula

which is subject to the constraint given in (5). The retrieved densities in (5) are as given in (6). For example, if the observations x contained mostly cloud flashes, then the ground flash fraction is near zero, and one would retrieve a value for αr close to zero. This implies that d2 could get relatively large without making H too large (desirable, since g likely deviates from gpop). It also implies that e2 should be forced to stay relatively small, otherwise H will get too large (desirable, since c likely deviates relatively little from cpop).

One can guarantee that (5) will hold exactly by choosing one perturbant properly. Substituting (6) into (5) and solving for d gives

 
formula

Alternatively, one could just as well have eliminated e in terms of d; the choice does not change the final retrieval solution.

Substituting (11) into (10) produces a new scalar function Q = Q(αr, e), given by

 
formula

where the Ci are not to be confused with the components of C in (1) and

 
formula

Next, one can find the e that minimizes Q by solving the system of equations ∂Q/∂ek = 0, with k = 1, 2, …, n, which gives the critical point perturbant

 
formula

Substituting (14) into (12) gives a new scalar function S = S(αr), given by

 
formula

The expressions in (13) can be used to further simplify (14) and (15); the algebra is lengthy but straightforward. Also, (14) is substituted into (11) to get the associated critical point form for d. The rather interesting results are

 
formula

Using the results in (16), one can find αr that minimizes S by solving the equation dS/r = 0. This gives the final APM solution as

 
formula

The final perturbants are dcrit(αr = α1) = ecrit(αr = α1) = mm1 = p(α1) = p1. Note that the solution is in terms of the three vector densities (m, a, b), where m is derived from the satellite MGA observations x, and the climate vectors (a, b) are obtained from the burn-in process. Since (m, a, b) are each pdfs, they are normalized (their elements sum to unity) and their elements are nonnegative (negative probabilities are not allowed). By inspection of the last two equations in (17), one can see that (gr, cr) are normalized. In this lightning retrieval problem, we have found minor instances of some of the elements of cr going very slightly negative but not enough to adversely affect our retrieval results. In any case, one can always zero out negative retrieved elements and then renormalize the vector.

Figure 3 provides a geometrical summary of the solution process. The steps taken to minimize H while satisfying (5) have resulted in there being only one viable type of perturbation, given by p1. Note that for an arbitrary perturbation [i.e., arbitrary (gr, cr)], one obtains the usual algebraic result of

 
formula

The perturbation p1 can be described as a transformation T: (a, b) → (gr, cr) given by the last two equations in (17); that is, the transformation can be expressed as

 
formula

Here, we use the mnemonic where the vectors play the role of scalars, so that normal matrix multiplication can be used to reproduce the two transformation equations in (17). In addition, βr = 1 − αr as in (13). The transformation matrix in (19) defines . The first application of perturbs one from a to gr, and from b to cr, by the amount p1 as discussed. In other words, substituting the last two equations of (17) into (18) yields α1.

Fig. 3.

A geometrical conception of the APM solution process (depicted in three dimensions).

Fig. 3.

A geometrical conception of the APM solution process (depicted in three dimensions).

Finally, the perturbation (transformation) has an interesting mathematical property. Once one completes the transformation (), additional applications of the same transformation result in no further changes in (αr, gr, cr). This is because j = , for j = 1, 2, 3, … . That is, pre-multiplying both sides of (19) by any number of times still gives back the lhs of (19). In other words, and from the perspective of the physical inverse problem at hand, if m happens to be already in the space spanned by the climate vectors, then no perturbations are necessary to model the observation vector m.

c. Flash type

In this section, we show that it is possible to mathematically infer the flash type (i.e., ground or cloud) of each of the N flashes observed. In other words, a distinct advantage of the APM is that it retrieves estimates of not only the ground flash fraction α but also estimates of the ground and cloud flash MGA probability density functions, g(x) and c(x), respectively. [Note: in this section we use the notation of continuous functions instead of vectors, i.e., g(x) denotes g, c(x) denotes c, and so on.] The retrieved values in (17) and the observed MGA of a flash allow one to estimate the probability Pg(x) that the observed flash is a ground flash. The probability that the flash is a cloud flash is simply 1 − Pg(x). To understand how this can be accomplished, note the following:

 
formula

Now, the number of ground flashes with MGAs in the interval I given by (xδ) < y < (x + δ) is

 
formula

where y is a dummy integration variable. Similarly, the number of cloud flashes with MGAs in the interval I is

 
formula

The fraction of flashes with MGAs in the interval I that are ground flashes is denoted as fg(x, δ). Similarly, the fraction of flashes with MGAs in I that are cloud flashes is denoted as fc(x, δ). These fractions are given by

 
formula

Substituting (21) and (22) into (23) and noting that Ng = αN and that Nc = (1 − α)N, one obtains

 
formula
 
formula

In addition, note that α = fg(x, ∞), where g(x) and c(x) are zero for x < 0.

In the limit of small δ, the integrals in the above-given two equations each have the simplified form

 
formula

So, one obtains

 
formula
 
formula

Here, Pg(x) is the probability that the flash is a ground flash given that it has an MGA value equal to x. Similarly, Pc(x) is the probability that the flash is a cloud flash given that it has an MGA value equal to x. Note from (27) and (28) that Pg(x) + Pc(x) = 1 as it should; that is, Pc(x) = 1 − Pg(x) is the complimentary probability.

An alternate way to obtain (27) is to start with Bayes’s theorem, which gives

 
formula

Here, p(x | tg) is the probability density function of MGAs for ground flashes; that is, p(x | tg) = g(x), where tg indicates flash type “ground.” In addition, p(tg) is the probability of choosing a ground flash from the set of all N observed flashes, hence p(tg) = α. Finally, p(x) is the probability of drawing an MGA value of x from the mixture of flashes; that is, p(x) = m(x) = αg(x) + (1 − α)c(x), where the expression for m(x) was derived in (2). Substituting these expressions into (29) gives

 
formula

But, p(tg | x) is the probability that the flash is a ground flash given that it has an MGA = x; that is, p(tg | x) is equivalent to Pg(x) even though the notations differ. Hence, the result in (27) has once again been obtained.

To summarize, the APM first retrieves estimates of [α, g(x), c(x)] given by their discrete vector forms (αr, gr, cr) as obtained from (17). These estimates are used to estimate Pg(x) in (27). Each of the N flashes observed has a specific MGA value x. So, the APM-retrieved flash type probability and flash-typing conditions are

 
formula

3. Performance tests

To evaluate the performance of the APM, simulated retrievals are conducted. To accomplish this, the population densities (gpop, cpop) are created from which both the burn-in observations and the N operational measurements can be drawn.

To simulate realistic populations, we use the OTD MGA distributions obtained from Koshak (2010). The ground and cloud flash MGA distributions are shown in the bottom two panels of Fig. 4. These frequency distributions are interpolated to high resolution and smoothed using Interactive Data Language (IDL) utilities; they are then scaled by the respective sample sizes to obtain pdfs. The pdfs are shown in the top-right panel of Fig. 4.

Fig. 4.

Summary of OTD MGA analyses (adapted from Koshak 2010). (top left) Ground flash locations (red dots) and cloud flash locations (blue dots). (bottom left) Ground and (bottom right) cloud flash MGA distributions. (top right) High-resolution smoothed MGA pdfs derived from the OTD data.

Fig. 4.

Summary of OTD MGA analyses (adapted from Koshak 2010). (top left) Ground flash locations (red dots) and cloud flash locations (blue dots). (bottom left) Ground and (bottom right) cloud flash MGA distributions. (top right) High-resolution smoothed MGA pdfs derived from the OTD data.

Since LIS data are of generally higher quality than OTD, simulations are also conducted using population densities that are based on LIS MGA distributions obtained from Koshak and Solakiewicz (2012). Figure 5 provides the results for LIS in the same format as provided in Fig. 4. For all of our simulations, it was convenient and computationally expedient to use an MGA range from 0 to 2000 km2, ignoring the few flashes that exist beyond this upper limit.

Fig. 5.

As in Fig. 4, but for LIS MGA analyses (adapted from Koshak and Solakiewicz 2012).

Fig. 5.

As in Fig. 4, but for LIS MGA analyses (adapted from Koshak and Solakiewicz 2012).

Figure 6 provides an overview of the retrieval error analysis associated with the simulation runs. By normalizing the ground and cloud flash MGA frequency distributions given in the bottom panels of Fig. 4 (or Fig. 5), one obtains the respective densities (a*, b*) shown in Fig. 6. Interpolating these densities to high resolution, and smoothing, gives the simulated population densities (gpop, cpop). These densities are then converted to cumulative distribution functions (CDFs).

Fig. 6.

Overview of the simulator used to evaluate APM performance. The error added to the MGA observations x consists of simulated random and systematic measurement errors described in the main text.

Fig. 6.

Overview of the simulator used to evaluate APM performance. The error added to the MGA observations x consists of simulated random and systematic measurement errors described in the main text.

To simulate the burn-in phase, a large number ηg of samples from the ground flash CDF are drawn; simulated random and systematic instrument measurement errors are added to each MGA sample. Similarly, a large number ηc of samples from the cloud flash CDF are drawn and instrument measurement errors added. This sampling produces a ground flash MGA distribution and a cloud flash MGA distribution; scaling these distribution by ηg and ηc, respectively, produces the associated climate vector pdfs (a, b).

With the value of (a, b) given, one can now simulate the observation of N flashes and apply the APM to analyze the flashes. One is interested in the APM accuracy as a function of the true ground flash fraction α. Therefore, a total of 21 values of the true ground flash fraction (α = 0, 0.05, 0.10, 0.15, …, 0.95, 1.0) are examined. For each of these values of α, 100 trials (i.e., simulated APM retrievals) are performed. Each trial consists of uniquely drawing Ng samples from the ground flash CDF and Nc samples from the cloud flash CDF, adding the simulated random and systematic errors to these simulated MGAs, and then applying the APM to obtain the desired retrieved quantities. Here, N = Ng + Nc and α = Ng/N. Hence, a total of 2100 retrievals are performed for one analysis run (i.e., one error plot). All of our simulation runs were performed twice (i.e., once using OTD data to build the population densities and once using LIS data to build the population densities). For brevity, and because LIS data are generally of higher quality, the test results given below are based on the LIS data; one exception is the comparative analysis (see Fig. 9) discussed later.

Both random and systematic measurement errors are simulated. Simulated random measurement error added to each MGA value is chosen from a uniform distribution ranging from (−R to +R) km2, where R is chosen to be about twice the instrument nadir pixel footprint; the nadir footprints are approximately 16 km2 (LIS) and 64 km2 (OTD). Simulated systematic measurement error, S, is the negative systematic error associated with the finite instrument pixel footprint (taken as the nadir footprint values mentioned above). That is, each MGA value is reduced by truncation via the conversion x′ = long(x/FP)FP, where FP is the instrument footprint, and long is the IDL utility that converts its argument to an integer value. Hence, S ranges from 0 to −16 km2 for LIS and 0 to −64 km2 for OTD. In some of the LIS-based sensitivity tests to follow, the value of S takes on specific fixed values (i.e., −16, −8, −4, 4, 8, and 16 km2).

The performance test results in Fig. 7 demonstrate how decreasing the sampling size in the burn-in phase degrades the APM solution accuracy. For these simulations, N = 5000, R = 32 km2, and S is the negative systematic error due to the finite pixel footprint, as discussed above. Each error plot shows statistics of the ground flash fraction retrieval error (Δα = |αrα|) as a function of the true ground flash fraction α. Curves for the mean of Δα (black curve), the standard deviation of Δα (red curve), the median of Δα (blue curve), and the maximum (green) and minimum (turquoise) of Δα are provided. Note that the retrieval errors are sufficiently small when the burn-in phase collects just ηg = 5000 ground flashes and ηc = 5000 cloud flashes. Degradation in the APM solutions occurs below these values, and is significant when 1000 or fewer are collected for each flash type.

Fig. 7.

The effect of reducing the sample size (ηg and ηc) in the burn-in phase.

Fig. 7.

The effect of reducing the sample size (ηg and ηc) in the burn-in phase.

The performance test results in Fig. 8 demonstrate how decreasing the value of N degrades the APM solution accuracy. For these simulations, R and S are as described in the previous paragraph, and the burn-in sample sizes are ηg = 40 000, ηc = 120 000. Note that there is not much return on investment in increasing N from 5000 to 7500, and that decreasing N below 2500 shows noticeable increases in solution retrieval error.

Fig. 8.

The effect of reducing the number of flashes (N) observed. Color key as in Fig. 7.

Fig. 8.

The effect of reducing the number of flashes (N) observed. Color key as in Fig. 7.

Figure 9 summarizes the performance of the APM when LIS data are used to generate the population densities (left column) and when OTD data are employed (right column). These simulations use the values N = 5000, ηg = 40 000, and ηc = 120 000. For the LIS data run, the values of R and S are as given above for Figs. 7 and 8, but for the OTD data run the value of R = 64 km2 and S ranges from 0 to 64 km2. In addition to the statistics of Δα, statistics of the fraction correctly flash typed are also provided (two bottom panels in Fig. 9). The results are very encouraging. The overall mean error in Δα is only 0.027 (LIS data run) and 0.018 (OTD data run). The associated average fraction of flashes correctly flash typed is 0.780 and 0.797, respectively. Even though the OTD data run involves larger values of R and S, it provides better retrievals. This is because the LIS ground and cloud flash MGA distributions overlap more than the OTD ground and cloud flash distributions (see bottom panels in Figs. 4 and 5). Specifically, the difference in the mean MGA is 277.4 km2 (= 493.0 – 215.6 km2) for the OTD data but only 213.5 km2 (= 465.4 – 251.9 km2) for the LIS data.

Fig. 9.

Summary of APM performance using (left) LIS data to construct the MGA population and using (right) OTD data: (top) alpha error and (bottom) fraction correct. Color key as in Fig. 7. See text for additional details.

Fig. 9.

Summary of APM performance using (left) LIS data to construct the MGA population and using (right) OTD data: (top) alpha error and (bottom) fraction correct. Color key as in Fig. 7. See text for additional details.

Figure 10 demonstrates that one can apply the APM method to arbitrary population densities that have nothing to do with lightning; that is, the APM is a general inversion technique that can be applied to any inverse problem having a similar problem statement to the lightning problem studied here. In this particular test, the population densities are defined by phase-shifted absolute value sine functions that have been appropriately normalized. We have also mixed various exponential distributions with absolute value sine functions and achieved similar performance by the APM.

Fig. 10.

The effect of reducing the distinction between the underlying population densities (the densities are shown in the left panel, one in black, the other in blue). Population densities having nothing to do with lightning were purposely employed to demonstrate that the APM is a general inversion technique. The right panel follows the color key as in Fig. 7.

Fig. 10.

The effect of reducing the distinction between the underlying population densities (the densities are shown in the left panel, one in black, the other in blue). Population densities having nothing to do with lightning were purposely employed to demonstrate that the APM is a general inversion technique. The right panel follows the color key as in Fig. 7.

Figure 10 also demonstrates that if the population densities for the two species involved (i.e., ground and cloud flashes in the case of the lightning problem) become less distinct, the retrieval errors increase. If there is no distinction, then there is no information content and the inverse problem is ill posed.

Finally, Tables 14 provide additional simulation runs that evaluate the impact of the simulated measurement errors R and S. All of these performance tests employ N = 5000 flash observations. It is known that the burn-in approach is only approximate; that is, the climate vector pdfs (a, b) are only approximations to (g, c), respectively. Such approximations result in APM solution retrieval errors. Therefore, in order to conduct a true sensitivity test of the effects of R or S on APM solution retrieval errors, it is necessary to shut off the errors due to the burn-in phase approximation; that is, from the known value of x in the simulation, the values of (g, c) are extracted and used in the APM solution rather than the approximations (a, b).

Table 1.

The effect of random measurement errors when one sets a = g and b = g in the simulation. The quantities Δα and f are both dimensionless fractions (no units). See text for a description of the table headings.

The effect of random measurement errors when one sets a = g and b = g in the simulation. The quantities Δα and f are both dimensionless fractions (no units). See text for a description of the table headings.
The effect of random measurement errors when one sets a = g and b = g in the simulation. The quantities Δα and f are both dimensionless fractions (no units). See text for a description of the table headings.
Table 2.

The effect of random measurement errors when one obtains a and b from the burn-in phase with ηg = 40 000 and ηc = 120 000.

The effect of random measurement errors when one obtains a and b from the burn-in phase with ηg = 40 000 and ηc = 120 000.
The effect of random measurement errors when one obtains a and b from the burn-in phase with ηg = 40 000 and ηc = 120 000.
Table 3.

The effect of systematic measurement errors when one sets a = g and b = g in the simulation.

The effect of systematic measurement errors when one sets a = g and b = g in the simulation.
The effect of systematic measurement errors when one sets a = g and b = g in the simulation.
Table 4.

The effect of systematic measurement errors when one obtains a and b from the burn-in phase with ηg = 40 000 and ηc = 120 000.

The effect of systematic measurement errors when one obtains a and b from the burn-in phase with ηg = 40 000 and ηc = 120 000.
The effect of systematic measurement errors when one obtains a and b from the burn-in phase with ηg = 40 000 and ηc = 120 000.

Hence, for the simulation run associated with Table 1, the burn-in phase approximation is shut off, the value of S = 0, and the value of R is ramped up. Note that as the value of R increases, the error statistics (Min, Avg, Std, Median, Max) of Δα increase. The variable f in Table 1 is the fraction correctly flash typed. Because the APM flash typing is probabilistic in nature, there is less of a trend in the statistics of f as a function of R.

To obtain the true impact of ramping up the value of R, the burn-in phase approximation (with ηg = 40 000, ηc = 120 000) is turned back on, and S = 0 still holds. The results are provided in Table 2. Now, there is virtually no impact of varying R because these random errors are small in comparison to the errors associated with the burn-in phase approximation.

Similarly, Table 3 shuts off the burn-in phase approximation, sets R = 0, and varies S. A value of S = nad16 denotes the variable negative systematic measurement error associated with the LIS finite pixel footprint (taken as the nadir value of 16 km2). The error statistics of Δα increase as the magnitude of S increases, and the statistics of f are again more variable. Note that the negative values of S had a much more adverse impact on the minimum f than did the positive values of S.

The true impact of varying S is obtained by turning the burn-in phase approximation back on (with ηg = 40 000, ηc = 120 000) and setting R = 0 (see results in Table 4). Once again, there is virtually no impact of varying S because these systematic errors are small in comparison to the errors associated with the burn-in phase approximation.

4. Mathematical error analysis

In this section we clarify exactly what retrieval errors Δα are expected given the percent burn-in errors Ωa = 100|ga|/|g| and Ωb = 100|cb|/|c|. This is just a mathematical test to see how these burn-in errors mathematically map into Δα. For this analysis, N = 5000, R = 0, S = 0, and the LIS data are used to obtain the population densities. The usual APM burn-in approach is not followed (i.e., the parameters ηg and ηc are not used or assigned any values). Instead, each of the 100 trials for a given known α is conducted by assigning each element of a as follows: ai = gi + (2r − 1)γ, where r is a uniform random variable. An analogous expression is used to assign each element of b. Negative components are resampled, and the vectors are appropriately normalized. As before, each trial consists of N = 5000 observations that are randomly drawn from the population densities.

The test results are provided in Table 5. As γ increases, the values of Δα, Ωa, and Ωb all increase, as expected.

Table 5.

The effect of percent errors (Ωa, Ωb) in the burn-in approximation on the ground flash fraction retrieval error.

The effect of percent errors (Ωa, Ωb) in the burn-in approximation on the ground flash fraction retrieval error.
The effect of percent errors (Ωa, Ωb) in the burn-in approximation on the ground flash fraction retrieval error.

Note that for γ = 0.001, the average of Δα is 0.020, which is very close to the average values shown for the nominal runs provided in Fig. 9. Moreover, the corresponding percent errors (Ωa, and Ωb) are both under 4%. This implies that the APM burn-in approach carried out for Fig. 9 is doing quite well. That is, the burn-in approach is good enough to move the climate vectors (a, b) to within about 4% of the true pdfs (g, c), respectively.

The results in Table 5 provide additional valuable insight. Suppose region A did not have independent observations available for conducting a burn-in phase but region B did. One wonders if the burn-in results from region B could be used for region A. Table 5 helps determine if this is feasible. For example, if one wanted to retrieve the ground flash fraction to within (on average) 0.09 in region A, then according to Table 5 a γ of about 0.003 would be needed. That is, the population densities in region A should not be more than about 10.4% different from the population densities in region B; that is, Ωa = 10.40% and Ωb = 10.46%. Note from Table 5 that the ratio ave(Δα)/ave(Ωa, Ωb) is under about 0.009/1%.

5. Summary

An Analytic Perturbation Method (APM) was introduced for retrieving several important unknowns from satellite lightning mapper observations of N lightning flashes. The value of N must be large, typically in the thousands. Each lightning flash illuminates the cloud top, and a lightning mapper registers one or more optical groups per flash. The key lightning mapper observation exploited by the APM is the maximum group area (MGA) of each of the N flashes. Hence, the APM mathematically inverts these N MGA values to determine the following: the ground flash fraction (i.e., the number of ground flashes divided by N), the probability density functions of the ground and cloud flash MGAs, and the flash type (ground or cloud) of each of the N flashes.

As part of the method, the lightning mapper first collects a large number (≳5000) of samples of both ground and cloud flashes from the region/season of interest that have already been flash typed using an independent validation dataset. This training period is called a “burn in” phase. Alternatively, if independent validation is not available, then the burn-in results from a different region–season can be tried, but with the expectation that APM solution retrieval error will possibly increase. Under the conditions of our analysis, every 1% error in the burn-in process appears to add about 0.009 or less to the APM mean ground flash fraction retrieval error. Long-term drift in the climate vectors (which are used to bias the retrieval results) can be mitigated by repeating the burn-in phase; a more aggressive approach would be to perform the burn-in phase on a regular basis. One could even consider using the APM results themselves to estimate upgrades to the climate vectors; that is, APM solutions obtained prior to ample drift in the climate vectors could in principal be accurate enough to update the climate vectors, thereby effecting a “self-propagating” system. This of course might only work up to a certain limit because of the accumulated effect of retrieval errors over a long period. In such a case, a new burn-in phase would be required to “reset” the system. Ultimately, the training period allows the retrieval algorithm to be applied autonomously to targets of interest during a designated operational phase.

A detailed simulator has been coded to perform extensive performance tests of the APM. The results are encouraging. The overall mean error in the retrieved ground flash fraction is only 0.027 (LIS data run) and 0.018 (OTD data run). The mean ground flash fraction errors as a function of the true ground flash fraction itself are under 0.04. The associated average percent of flashes correctly flash typed are 78.0% and 79.7%, respectively. Therefore, we believe the APM is a viable approach for inverting future GLM-observed flashes, especially over CONUS, where a burn-in phase can be accurately conducted using NLDN data. For other regions, Vaisala Inc.’s global lightning dataset (GLD360) ground flash data could be employed to complete the burn-in phase. Or, the CONUS burn-in results could be used, but with possible increases in solution retrieval error.

Acknowledgments

This research has been supported by the NOAA GOES-R Risk Reduction Program (managed by Ms. Ingrid Guch and Dr. Mark DeMaria), and by the Lightning Imaging Sensor (LIS) project (program manager Ramesh Kakar, NASA Headquarters) as part of the NASA Earth Science Enterprise (ESE) Earth Observing System (EOS) project. Special thanks to Dr. Steve Goodman, senior (chief) scientist, GOES-R System Program, for his guidance throughout this work effort.

REFERENCES

REFERENCES
Christian
,
H. J.
,
R. J.
Blakeslee
, and
S. J.
Goodman
,
1992
: Lightning Imaging Sensor for the Earth Observing System. NASA Tech. Memo. NASA TM-4350, 44 pp.
Christian
,
H. J.
,
K. T.
Driscoll
,
S. J.
Goodman
,
R. J.
Blakeslee
,
D. M.
Mach
, and
D. E.
Buechler
,
1996
: The Optical Transient Detector (OTD). Proceedings of the 10th International Conference on Atmospheric Electricity, ICAE, 368–371.
Christian
,
H. J.
, and Coauthors
,
1999
: The Lightning Imaging Sensor. 11th International Conference on Atmospheric Electricity, NASA Conf. Proc. NASA/CP-
1999
-
209261
, 746–749.
Christian
,
H. J.
, and Coauthors
,
2003
:
Global frequency and distribution of lightning as observed from space by the Optical Transient Detector
.
J. Geophys. Res.
,
108
, 4005, doi:.
Cummins
,
K. L.
, and
M. J.
Murphy
,
2009
:
An overview of lightning locating systems: History, techniques, and data uses, with an in-depth look at the U.S. NLDN
.
IEEE Trans. Electromagn. Compat.
,
51
,
499
518
, doi:.
Goodman
,
S. J.
, and Coauthors
,
2013
:
The GOES-R Geostationary Lightning Mapper (GLM)
.
Atmos. Res.
,
125–126
,
34
49
, doi:.
Hutchins
,
M. L.
,
R. H.
Holzworth
,
J. B.
Brundell
, and
C. J.
Rodger
,
2012
:
Relative detection efficiency of the World Wide Lightning Location Network
.
Radio Sci.
,
47
, RS6005, doi:.
Koshak
,
W. J.
,
2010
:
Optical characteristics of OTD flashes and the implications for flash-type discrimination
.
J. Atmos. Oceanic Technol.
,
27
,
1822
1838
, doi:.
Koshak
,
W. J.
,
2011
:
A mixed exponential distribution model for retrieving ground flash fraction from satellite lightning imager data
.
J. Atmos. Oceanic Technol.
,
28
,
475
492
, doi:.
Koshak
,
W. J.
, and
R. J.
Solakiewicz
,
2011
:
Retrieving the fraction of ground flashes from satellite lightning imager data using CONUS-based optical statistics
.
J. Atmos. Oceanic Technol.
,
28
,
459
473
, doi:.
Koshak
,
W. J.
, and
R. J.
Solakiewicz
,
2012
: The ground flash fraction retrieval algorithm employing differential evolution: Simulations and applications. NOAA Satellite Science Week Meeting, Kansas City, MO, NOAA. [Available online at http://www.goes-r.gov/downloads/2012-Science-Week/poster-abstracts-tues/06_Koshak.pdf.]
Mach
,
D. M.
,
H. J.
Christian
,
R. J.
Blakeslee
,
D. J.
Boccippio
,
S. J.
Goodman
, and
W. L.
Boeck
,
2007
:
Performance assessment of the Optical Transient Detector and Lightning Imaging Sensor
.
J. Geophys. Res.
,
112
, D09210, doi:.
Rudlosky
,
S. D.
, and
D. T.
Shea
,
2013
:
Evaluating WWLLN performance relative to TRMM/LIS
.
Geophys. Res. Lett.
,
40
, 2344–2348, doi:.