1. Introduction
The studies by Koshak (2007) and Koshak (2010) provided the first detailed statistical distributions of ground and cloud flash optical characteristics measured from the Optical Transient Detector (OTD). It was found that these distributions overlapped considerably, thereby making it difficult to build an algorithm that can discriminate between ground and cloud flashes. However, it was also found that the means of these distributions were quite different for ground and cloud flashes.
Therefore, following the recommendation in Koshak (2010), our approach to the problem of flash-type discrimination is to consider mean optical statistics rather than individual optical measurements. Conceptually, we use the Central Limit Theorem of statistics to convert the original overlapping optical distributions into distributions of the means (see Fig. 10 of Koshak 2010). The distributions of the means have little overlap when the means are taken over a sufficiently large number of flashes. Consequently, we focus on retrieving the fraction of ground flashes in a set of N flashes instead of discriminating flashes on a flash-by-flash basis. We give special attention to two important optical parameters, the maximum number of events in a group (MNEG) and the maximum group area (MGA), since these were cited in Koshak (2010) as particularly useful variables for ground flash fraction retrieval.
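The benefit of averaging can be illustrated with a minimal sketch. It assumes two hypothetical populations whose means and standard deviations are chosen only for illustration (they are not values from Koshak 2010), and it uses the standard result that the mean of n independent samples has standard deviation σ/√n. The function name and the numbers are assumptions made for this example.

```python
import math

def separation_of_means(mu_g, sigma_g, mu_c, sigma_c, n):
    """Distance between the two sample-mean distributions, expressed in
    units of their combined standard error, for means taken over n flashes."""
    standard_error = math.sqrt((sigma_g**2 + sigma_c**2) / n)
    return abs(mu_g - mu_c) / standard_error

# Hypothetical, illustrative population statistics (arbitrary units).
MU_G, SIGMA_G = 12.0, 8.0   # assumed ground flash mean and spread
MU_C, SIGMA_C = 8.0, 6.0    # assumed cloud flash mean and spread

for n in (1, 25, 100, 400):
    d = separation_of_means(MU_G, SIGMA_G, MU_C, SIGMA_C, n)
    print(f"n = {n:4d}: separation = {d:.1f} standard errors")
```

With these assumed values the individual distributions (n = 1) are separated by less than one standard error, whereas the separation grows as √n when means are used, which is the sense in which the distributions of the means overlap less.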
By obtaining the ground flash fraction, one can determine the ratio Z of cloud flashes to ground flashes. The Z ratio is thought to be particularly useful in a number of areas including severe weather warning, lightning–convection relationships, lightning nitrogen oxide (NOx) production, the contribution of lightning to the global electric circuit, and cross-sensor validation (see Koshak 2010 and Boccippio et al. 2001 for further discussion).
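Because the ground flash fraction is α = Ng/N and Z = Nc/Ng, the two quantities are related by Z = (1 − α)/α. A short sketch of the conversion follows; the function names are illustrative and not part of the original work.

```python
def z_ratio(ground_fraction):
    """Ratio of cloud flashes to ground flashes, Z = (1 - alpha) / alpha."""
    if not 0.0 < ground_fraction <= 1.0:
        raise ValueError("ground_fraction must lie in (0, 1]")
    return (1.0 - ground_fraction) / ground_fraction

def ground_fraction_from_z(z):
    """Inverse relation, alpha = 1 / (1 + Z), for Z >= 0."""
    if z < 0.0:
        raise ValueError("z must be nonnegative")
    return 1.0 / (1.0 + z)

# Example: 30 ground flashes and 70 cloud flashes give alpha = 0.3.
print(z_ratio(0.3))
```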
In this study, we introduce a technique for retrieving the ground flash fraction (and hence the Z ratio) of a set of N lightning flashes that occur within a specific region and that are observed by a space-based lightning imager [e.g., OTD, the Lightning Imaging Sensor (LIS), or the future GOES-R Geostationary Lightning Mapper (GLM)]. The retrieval method and the associated retrieval error theory are described in sections 2–4. A more general version of the retrieval method is introduced in section 5. Section 6 discusses the relationship between the simple and generalized forms of the retrieval method, and section 7 discusses solution nonuniqueness. Section 8 shows graphical representations of the solution retrieval process and illustrates how retrieval errors are reduced when the sample size of observations is increased. Finally, section 9 applies the retrieval method to actual conterminous United States (CONUS) OTD lightning data that have been partitioned into ground and cloud flashes using independent ground-based observations; this assesses the accuracy of the retrieval method. The retrieval errors are shown to be encouragingly small when an optimal space-based lightning imager observable (such as MNEG or MGA) is used.
2. The mean equation and ground flash fraction
Consider a set of i = 1, … , N flashes that are observed over a time period Δt by a satellite lightning imager (e.g., a low earth-orbiting sensor like the LIS or OTD, or a geostationary sensor like GLM). As shown in the example of Fig. 1, each observed ground flash is indicated by a “g” and each cloud flash by a “c”; a small value of N is shown solely for brevity and is no indication of an acceptable value of N (indeed it will be shown later that N must be sufficiently large to bring retrieval errors down to an acceptable level).
Fig. 1. A set of N flashes occurring in a region during time period Δt. A “g” denotes a ground flash, and a “c” denotes a cloud flash. The desire is to retrieve the fraction of ground flashes.
Citation: Journal of Atmospheric and Oceanic Technology 28, 4; 10.1175/2010JTECHA1408.1
For each of the N flashes, the sensor measures a particular flash optical characteristic x. For example, this characteristic could be any one of the following: flash radiance, flash area, flash duration, the number of optical groups in the flash, the number of optical events in the flash, the maximum number of events in a 2-ms sensor frame time for a given flash, radiance of the first event in the flash, radiance of the brightest group, maximum number of events in a group, maximum group area, and so on. Here, the basic terminology of OTD/LIS data is used; that is, a flash is composed of optical groups, and each optical group is composed of optical events (see Mach et al. 2007).
Note that x is not limited to flash-level properties; for example, one could use the area of the first group in a flash rather than flash area itself or both. In general, one is free to choose any optical information from the optical data (including concocting derived variables from the data); hence, the list of possible optical characteristics is virtually unlimited. However, a certain set of optical characteristics will outperform another set, in general. Based on numerical results provided in section 9, we are able to recommend reasonably optimal optical characteristics to employ.
3. The applied form of the mean equation
4. The single-characteristic solution and associated retrieval errors




5. The multiple-characteristic solution


The retrieval error associated with the multiple-characteristic solution is derived in the following section. As one might expect, the retrieval error is related to ρk in (8).
6. The relationship between the single- and multiple-characteristic solutions



7. Solution nonuniqueness


In a real retrieval problem the errors will be nonzero, the discriminant will be nonpositive, and so the solution in (23) provides, in general, two complex roots. We would be forced of course to pick the real part of this solution (which is just the multiple-characteristic solution for an arbitrary f). Moreover, employing the complex solution in (23) is of no help because it will drive L to zero no matter what value of f is chosen (even if e is large). In other words, one f is as good as another, and so the generalized cost function does not help us pick an optimum f.
However, from the second and third equations in (25), one can see that it is also possible to arrive at the absolute minimum L = 0 by employing an arbitrary value of α′ between 0 and 1 and a value of f for which egk = eck = ɛk (case A = 0), or by employing the multiple-characteristic solution and a value of f for which D(f) = 0 (case A > 0). In practice, the normal situation will be A > 0 (since by Koshak 2010 the ground and cloud flash mean characteristics are typically distinct); hence, one can perform a numerical minimization of the function F(f) ≡ L(f, αmul) = −D(f)/[4A(f)] to obtain the optimum f; the corresponding ground flash fraction would then be retrieved as αmul(f) = −B(f)/[2A(f)].
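The expressions −B/(2A) and −D/(4A) are those of the vertex of a quadratic. As a minimal sketch, assume that for fixed f the cost function has the form L = Aα′² + Bα′ + C with A > 0 and discriminant D = B² − 4AC; the definitions of the coefficients in terms of f are not reproduced here, so the numerical coefficients below are purely hypothetical.

```python
def quadratic_minimum(a, b, c):
    """Return (alpha_min, l_min) for L(alpha) = a*alpha**2 + b*alpha + c, a > 0.

    The minimizer is -b / (2a) and the minimum value is -D / (4a),
    where D = b**2 - 4ac is the discriminant.
    """
    if a <= 0.0:
        raise ValueError("a must be positive for a minimum to exist")
    discriminant = b * b - 4.0 * a * c
    return -b / (2.0 * a), -discriminant / (4.0 * a)

# Hypothetical coefficients, for illustration only.
A, B, C = 2.0, -2.0, 1.0
alpha_min, l_min = quadratic_minimum(A, B, C)

# Cross-check by evaluating the quadratic directly at the minimizer.
direct = A * alpha_min**2 + B * alpha_min + C
print(alpha_min, l_min, direct)
```

In this assumed example the discriminant is negative, so the quadratic has no real root and its minimum is positive, which mirrors the situation in which L cannot be driven to zero with a real α′.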

Fig. 2. The family of lines identifying the nonunique solution space for α′. Any real problem has N finite, so there will be a finite number of nonunique solutions on the lines drawn. The black points shown correspond to the example in Table 2 and are described in the main text.
Table 1. The number of possible solutions, the number of distinct values of α′, and the granularity in α′ as a function of the number of flashes observed. Values are rounded at the third decimal place.
To investigate this situation more closely, consider the simple example of only one characteristic (n = 1) and N = 5 flashes. A satellite lightning imager records 5 optical values given by (1, 4, 5, 7, 8), in arbitrary units, for the single lightning characteristic; measurement errors are ignored (i.e.,
Table 2. An example of solution nonuniqueness for a case of N = 5 flashes where the satellite lightning imager measured 5 values (1, 4, 5, 7, 8) for a particular kth characteristic. The bold italicized line (solution 25) represents the correct solution (i.e., the truth).
To relate the general result in (31) to the specific example in Table 2, the results in Table 2 are plotted in Fig. 2. The 32 solutions in Table 2 correspond to 32 ordered pairs (rck, rgk), but solution 1 has rck undefined and solution 32 has rgk undefined. Solution 11 has the same value of (rck, rgk) as solution 10; similarly, solutions 14, 21, 24, and 29 repeat other solutions. This leaves a total of 25 (=32 − 2 − 5) distinct solution points (rck, rgk) to plot, and these 25 points are shown in Fig. 2 as the black dots. These black dots serve as a reminder that, in any real problem, the value of N is finite so that an infinite number of solutions do not exist.
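The counting in this example can be reproduced with a short enumeration. The sketch below assumes, for illustration, that each candidate solution is one assignment of the five measured values to ground ("g") or cloud ("c") and that the plotted pair consists of the mean of the values labeled cloud and the mean of the values labeled ground; the exact definitions of rck and rgk are not reproduced here, and the ordering of the assignments need not match the solution numbering of Table 2.

```python
from fractions import Fraction
from itertools import product

values = (1, 4, 5, 7, 8)  # the five measured optical values (arbitrary units)

def subset_mean(subset):
    """Exact mean of a nonempty list of integers."""
    return Fraction(sum(subset), len(subset))

labelings = list(product("gc", repeat=len(values)))  # all ground/cloud assignments
pairs = set()         # distinct (cloud mean, ground mean) points
alpha_values = set()  # distinct candidate ground flash fractions

for labels in labelings:
    ground = [v for v, lab in zip(values, labels) if lab == "g"]
    cloud = [v for v, lab in zip(values, labels) if lab == "c"]
    alpha_values.add(Fraction(len(ground), len(values)))
    if ground and cloud:  # a mean is undefined when one class is empty
        pairs.add((subset_mean(cloud), subset_mean(ground)))

print(len(labelings))     # 32 candidate solutions
print(len(pairs))         # 25 distinct points
print(len(alpha_values))  # 6 distinct ground flash fractions (0, 1/5, ..., 1)
```

Under this assumption the enumeration yields 32 assignments, two of which leave one class empty, and 25 distinct points, in agreement with the count given above.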
8. Graphical representation of solution process
The retrieval of the ground flash fraction using the single-characteristic solution provided in (6) can be understood in a graphical way. Since the multiple-characteristic solution is just a linear superposition of single-characteristic solutions as shown in (16), this section also helps one better understand the multiple-characteristic solution.
Figure 3a illustrates the solution process and the role of solution nonuniqueness; it begins with the case of N = 100 observed flashes. For these 100 flashes, the true value of the ground flash fraction α is assumed to be 0.3. So this implies that there are Ng = 30 ground flashes and Nc = 70 cloud flashes. We consider just one characteristic, say MNEG, so all “k” subscripts are dropped from appropriate variables. To retrieve an answer using the single-characteristic solution in (6), we must have a sensor measurement
Fig. 3. (a) A graphical representation of the relationship between the retrieved ground flash fraction (intersection of green lines), the true ground flash fraction (intersection of black lines), the “lines of ambiguity”, and the distributions involved. The peaks of the normal distributions have been scaled to unity for plot clarity; the distribution of
The true mean value of the ground flash MNEG optical characteristic is given by the horizontal black line fg =
Note that for the finite sampling of N flashes, the mean values of
The most probable solution αMP is associated with the intersection of the horizontal green line fg = μg and the vertical green line fc = μc. These two lines intersect at point C. The value αMP is shown at the top middle of the plot, in green, as 0.2445. The resulting retrieval error in this example is ρ = αMP − α = −0.0555, as shown in the upper-right portion of the plot. Note that substituting αMP in for α in (33) defines another LOA (the green slanted line through the points A and C).
Figure 3b shows what happens when the value of N increases. In this sensitivity analysis, the values (
(b) A continuation of (a) for larger values of the number of flashes N analyzed. Note that as N is increased, the retrieval error, ρ (“rho”), decreases. Neglecting satellite measurement errors, absolute convergence of ρ to zero will only occur if the means of the ground (red) and cloud (blue) normal distributions match the actual population means for the particular geographic location studied.
Finally, Fig. 3c shows the same type of results as in Fig. 3b except that a larger truth value of α = 0.7 is assumed. Because of (8), and since eg < ec < 0 holds in these two examples, the retrieval error in Fig. 3c is larger in magnitude than in Fig. 3b. That is, ρ is proportional to αeg + (1 − α)ec when no satellite measurement errors are present. This equals 0.3eg + 0.7ec for the case in Fig. 3b, and the (more negative) value 0.7eg + 0.3ec for the case in Fig. 3c.
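The dependence of the error on α can be checked numerically with a minimal sketch. It assumes that the retrieval inverts a two-component mixture of means, that eg and ec denote the differences between the sample means of the N flashes and the assumed population means (eg = fg − μg, ec = fc − μc), and that satellite measurement errors are absent. All numerical values are hypothetical and chosen only to satisfy eg < ec < 0.

```python
def retrieve_fraction(mean_obs, mu_g, mu_c):
    """Invert the two-component mixture of means for the ground flash fraction."""
    return (mean_obs - mu_c) / (mu_g - mu_c)

def retrieval_error(alpha, mu_g, mu_c, e_g, e_c):
    """Error rho = retrieved fraction - true fraction, with no sensor error.

    The observed mean is built from the actual sample means (mu + e),
    while the retrieval uses the assumed population means mu_g and mu_c.
    """
    f_g = mu_g + e_g   # actual ground flash sample mean
    f_c = mu_c + e_c   # actual cloud flash sample mean
    mean_obs = alpha * f_g + (1.0 - alpha) * f_c
    return retrieve_fraction(mean_obs, mu_g, mu_c) - alpha

# Hypothetical values (arbitrary units) with e_g < e_c < 0.
MU_G, MU_C = 12.0, 8.0
E_G, E_C = -0.4, -0.1

for alpha in (0.3, 0.7):
    rho = retrieval_error(alpha, MU_G, MU_C, E_G, E_C)
    expected = (alpha * E_G + (1.0 - alpha) * E_C) / (MU_G - MU_C)
    print(f"alpha = {alpha}: rho = {rho:.4f}, closed form = {expected:.4f}")
```

Under these assumptions ρ equals [αeg + (1 − α)ec]/(μg − μc), so the error is more negative for α = 0.7 than for α = 0.3.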
(c) As in the previous figure, but for a larger value of the ground flash fraction; that is, α = 0.7.
9. Numerical tests
To test our retrieval methods, we applied them to actual CONUS OTD data. National Lightning Detection Network (NLDN) data were used to independently determine the true ground flash fraction within any particular region of the CONUS. Comparing our retrieved ground flash fraction with the known value enabled us to directly assess retrieval errors.
A total of 52 locations across the CONUS were considered. At each location, a total of N = 1000 OTD flashes were analyzed; that is, a circular ring was centered on each location, and the ring radius was increased until it enclosed 1000 flashes. Since these 1000 flashes were partitioned into ground and cloud flashes using NLDN, the true ground flash fraction was obtained for each of the 52 circular regions.
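Growing a circle until it encloses N flashes is equivalent to taking the distance from the center to the Nth nearest flash. The following is a minimal sketch of that step, assuming flash positions are given as (latitude, longitude) pairs in degrees and using the haversine great-circle distance on a spherical Earth; the function names, the Earth radius value, and the coordinates are illustrative assumptions, not details of the actual processing.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def enclosing_radius_km(center, flashes, n):
    """Smallest radius about `center` that encloses n of the given flashes."""
    if len(flashes) < n:
        raise ValueError("fewer than n flashes are available")
    distances = sorted(great_circle_km(center[0], center[1], lat, lon)
                       for lat, lon in flashes)
    return distances[n - 1]

# Tiny hypothetical example: four flashes along one meridian, n = 3.
center = (35.0, -90.0)
flashes = [(35.0, -90.0), (36.0, -90.0), (37.0, -90.0), (40.0, -90.0)]
print(enclosing_radius_km(center, flashes, 3))
```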
Figure 4 shows the cloud flashes (blue dots) and ground flashes (red dots) across CONUS, and the total number of each of these flashes is provided in the upper-right-hand corner of each plot. To test retrievals of larger ground flash fraction values, we simply removed some cloud flashes; this implies making the circular ring larger so that N remains at a value of 1000. The NLDN-confirmed average (±standard deviation) ground flash fraction for the 52 regions is provided in the upper-left-hand corner of each plot. Note that the 52 locations (i.e., centers of each circular region) are easiest to see in the bottom plot of Fig. 4.
Fig. 4. The rms retrieval error ρrms in the ground flash fraction for 52 circular regions (centered on the black dots) analyzed across CONUS using the single-characteristic solution with the MNEG optical characteristic. Some cloud flashes are removed in the middle and bottom plots to increase the (spatially dependent) known test ground flash fraction in each region. The CONUS-averaged value
A retrieval using (35) is performed for each of the 52 regions, and the retrieval errors (ρ1, ρ2, … , ρ52) are obtained. The root-mean-square (rms) retrieval error ρrms for the 52 regions is then computed; the value of ρrms is provided in the upper left of each plot in Fig. 4. Since the ground flash fraction varies in general from 0 to 1, we regard the rms errors shown as acceptably small; that is, ρrms is a reasonably small fraction of unity (the full-scale range of the ground flash fraction). Hence, the single-characteristic solution in (35) appears to be a reasonable way to estimate the ground flash fraction across CONUS.
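The rms statistic is the square root of the mean of the squared regional errors. A minimal sketch follows; the function name and the sample values are hypothetical and are not results from the 52 regions.

```python
import math

def rms_retrieval_error(retrieved, truth):
    """Root-mean-square of the errors rho_i = retrieved_i - truth_i."""
    if len(retrieved) != len(truth) or not retrieved:
        raise ValueError("inputs must be nonempty and of equal length")
    errors = [r - t for r, t in zip(retrieved, truth)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical retrieved and true ground flash fractions for three regions.
retrieved = [0.30, 0.50, 0.25]
truth = [0.20, 0.60, 0.35]
print(rms_retrieval_error(retrieved, truth))
```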
Table 3 shows an example of how the other optical characteristics performed for the middle plot in Fig. 4 [note: the value
Table 3. Example of the variability of the 7 optical characteristics across CONUS and the associated rms retrieval error. The MNEG was slightly better than MGA in most cases that we looked at, but MGA slightly beats MNEG in the case given below. Some multiple-characteristic solutions are shown (last three rows) for comparison.
10. Summary
It was pointed out in Koshak (2010) that the distributions of optical properties for ground and cloud flashes overlap extensively, and this makes flash-type discrimination fundamentally difficult. However, Koshak (2010) also showed that the mean optical properties of the ground and cloud distributions are quite distinct and therefore suggested that the mean data from a finite sampling of flashes should be closely examined to infer the relative frequency of ground and cloud flashes within the sample. This is the course of action we have undertaken in this paper.
We have introduced a theory for relating mean optical properties of a set of N flashes to the ground flash fraction. The fundamental equation is given in (3), and it is essentially an expression of mixtures. That is, when a set of ground flashes (having some mean optical characteristic) is mixed with a set of cloud flashes (having some distinctly different mean optical characteristic), the mean optical characteristic of the mixture depends on how many ground and cloud flashes are mixed. This mixture process is analogous to what one finds in basic chemistry where various chemical types are mixed together in a flask; in our problem, there are only two “chemical types” (i.e., ground flashes and cloud flashes).
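The mixture idea can be made concrete with a small sketch. It assumes the error-free two-component form, in which the mean of the mixture is α times the ground flash mean plus (1 − α) times the cloud flash mean; this is offered as an illustration of the mixing concept rather than a reproduction of (3). The data values and function names are hypothetical.

```python
def mixture_mean(alpha, mean_g, mean_c):
    """Mean of a mixture with ground fraction alpha and class means mean_g, mean_c."""
    return alpha * mean_g + (1.0 - alpha) * mean_c

def fraction_from_mixture(mean_mix, mean_g, mean_c):
    """Solve the mixture relation for the ground fraction alpha."""
    if mean_g == mean_c:
        raise ValueError("class means must differ for alpha to be determined")
    return (mean_mix - mean_c) / (mean_g - mean_c)

# Hypothetical optical values for 3 ground and 7 cloud flashes (arbitrary units).
ground = [10.0, 14.0, 12.0]
cloud = [6.0, 8.0, 10.0, 8.0, 9.0, 7.0, 8.0]

mean_g = sum(ground) / len(ground)
mean_c = sum(cloud) / len(cloud)
mean_all = sum(ground + cloud) / (len(ground) + len(cloud))

alpha = fraction_from_mixture(mean_all, mean_g, mean_c)
print(alpha)  # close to 0.3, i.e., 3 ground flashes out of 10
```

When the class means used in the inversion are the exact means of the two subsets, the fraction is recovered exactly (up to floating-point rounding); retrieval errors arise when those means must instead be estimated from other data.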
Hence, average cloud-top lightning optical characteristics derived from OTD/LIS, or the future GLM, implicitly provide information about the relative number of ground and cloud flashes in a set of N observed flashes. The questions are as follows: how well can this information be extracted, and what is the best approach for doing so? This paper represents just an initial attempt to retrieve the ground flash fraction information. It is important to note that the retrieval problem is fundamentally difficult, and there are many different ways to approach the retrieval problem. The straightforward mean mixing theory used here is just one first-order approach, and we believe it will not be the final (optimal) approach. Nonetheless, it is an important first step that elucidates many aspects of the overall inverse problem.
Though the mixing theory in (3) is straightforward, the inclusion of practical errors and the desire to explore using multiple optical characteristics in a single retrieval complicate the mathematics. The generalization of (3) to include practical errors is given in (5), and this leads to the basic expression in (6) for retrieving the ground flash fraction. The formula for extending the retrieval to multiple optical characteristics is provided in (15). The relationship between the single and multiple optical characteristic solutions has been derived, and analytic expressions for the retrieval errors associated with each methodology are provided. Rather than using the multiple-characteristic solution process, we demonstrate that it is best to test several optical characteristics first (e.g., via simulated retrievals using the single-characteristic solution) and then choose from this set the best performer. In addition, solution nonuniqueness is discussed in detail, and practical illustrations of solution ambiguity are provided.
In an attempt to “pull everything together,” we provide graphical representations of the single-characteristic solution retrieval process. Figure 3 illustrated how the distributions of the mean ground and cloud flash optical characteristics “pick out” a solution given the mean mixture observation
To directly understand how well the single-characteristic solution retrieval works across CONUS, we tested it for 52 different regions using actual OTD lightning data. The OTD data were independently partitioned into ground and cloud flashes using NLDN data, so that the true ground flash fraction was known for each region. Comparing our retrieval results with the truth allowed us to compute retrieval errors. The rms errors were encouragingly small (less than 11.1% in all cases, and as low as 6.1%). This implies that the sample ground and cloud flash CONUS means of MNEG are in fact reasonable respective estimates of the true ground and cloud flash mean MNEG at any given region (of the 52 regions we examined across CONUS).
If the CONUS means of MNEG (or MGA) are reasonable estimates of the true population mean MNEG (or MGA) values at arbitrary locations across the globe, then the single-characteristic solution retrieval method could in principle be applied worldwide. Even though the CONUS has diverse lightning, diverse thunderstorm types, and a variety of cloud morphologies with highly distinct light scattering properties, we have found that the mean MNEG (and MGA) optical properties do not fluctuate to the degree that would make ground flash fraction retrieval errors unacceptably large across CONUS. Nonetheless, the authors are of the opinion that an optimal retrieval algorithm would attempt to retrieve not only the ground flash fraction but the specific population mean MNEG (or MGA) values for the arbitrary region of the world under consideration. Within the mathematical framework we have provided here, such an attempt leads to solution nonuniqueness, but other approaches might be possible.
Acknowledgments
This research has been supported by the NOAA/NESDIS/STAR GOES-R Risk Reduction Program under Space Act Agreement No. NA07AANEG0284 (Ms. Ingrid Guch, Chief, NOAA/NESDIS/STAR Cooperative Research Programs Division; Dr. Mark DeMaria, Chief, NOAA/NESDIS Regional and Mesoscale Meteorology Branch; and Dr. Steven J. Goodman, Senior (Chief) Scientist, GOES-R System Program), and by the Lightning Imaging Sensor (LIS) project (Program Manager, Ramesh Kakar, NASA Headquarters) as part of the NASA Earth Science Enterprise (ESE) Earth Observing System (EOS) project. We would also like to express our thanks to the NASA Postdoctoral Program (NPP) under which co-author Dr. Richard Solakiewicz served during a portion of this research effort.
APPENDIX
The Roots of the Discriminant Function
In addition, note that the discriminant can equal zero even when the errors are nonzero. For example, setting egk = eck =
REFERENCES
Boccippio, D. J., K. L. Cummins, H. J. Christian, and S. J. Goodman, 2001: Combined satellite- and surface-based estimation of the intracloud-cloud-to-ground lightning ratio over the continental United States. Mon. Wea. Rev., 129, 108–122.
Koshak, W. J., 2007: OTD observations of continental US ground and cloud flashes. Proc. 13th Int. Conf. on Atmospheric Electricity, Beijing, China, ICAE, 823–826.
Koshak, W. J., 2010: Optical characteristics of OTD flashes and the implications for flash-type discrimination. J. Atmos. Oceanic Technol., 27, 1822–1838.
Mach, D. M., H. J. Christian, R. J. Blakeslee, D. J. Boccippio, S. J. Goodman, and W. L. Boeck, 2007: Performance assessment of the Optical Transient Detector and Lightning Imaging Sensor. J. Geophys. Res., 112, D09210, doi:10.1029/2006JD007787.