Retrieving the Fraction of Ground Flashes from Satellite Lightning Imager Data Using CONUS-Based Optical Statistics

W. J. Koshak NASA Marshall Space Flight Center, Huntsville, Alabama

and
R. J. Solakiewicz Chicago State University, Chicago, Illinois


Abstract

A retrieval method is introduced for estimating the fraction of ground flashes in a set of N flashes observed from either a low earth-orbiting or geostationary satellite lightning imager. The methodology exploits the fact that mean optical characteristics of ground and cloud flashes differ, and hence a properly posed equation set for mean conditions of a set of N observed flashes can be mathematically inverted to estimate the ground flash fraction (and hence the cloud flash-to-ground flash ratio). Explicit analytic expressions for the retrieval errors are derived, and numerical tests of the retrieval method are provided to quantify retrieval accuracy. It has been found that the retrieval method works best when only one optimum optical parameter is used (the single-characteristic solution approach) rather than a mixture of optical parameters (the multiple-characteristic solution approach); that is, the suboptimum optical parameters in the mix degrade retrieval accuracy. Since the retrieval method uses conterminous United States (CONUS)-averaged values of the lightning optical measurements, retrieval errors tend to be smallest in geographical regions whose specific mean lightning optical measurements are closest to the CONUS mean values. The rms ground flash fraction retrieval errors for 52 widely distributed regions across CONUS ranged from 0.061 to 0.111, depending on the true ground flash fraction sought.

Corresponding author address: Dr. William Koshak, Earth Science Office, VP61, NASA Marshall Space Flight Center, Robert Cramer Research Hall, 320 Sparkman Dr., Huntsville, AL 35805. E-mail: william.koshak@nasa.gov

1. Introduction

The studies by Koshak (2007) and Koshak (2010) provided the first detailed statistical distributions of ground and cloud flash optical characteristics measured from the Optical Transient Detector (OTD). It was found that these distributions overlapped considerably, thereby making it difficult to build an algorithm that can discriminate between ground and cloud flashes. However, it was also found that the means of these distributions were quite different for ground and cloud flashes.

Therefore, following the recommendation in Koshak (2010), our approach to the problem of flash-type discrimination is to consider mean optical statistics rather than individual optical measurements. Conceptually, we use the Central Limit Theorem of statistics to convert the original overlapping optical distributions into distributions of the means (see Fig. 10 of Koshak 2010). The distributions of the means have little overlap when the means are taken over a sufficiently large number of flashes. Consequently, we focus on retrieving the fraction of ground flashes in a set of N flashes instead of discriminating flashes on a flash-by-flash basis. We give special attention to two important optical parameters, the maximum number of events in a group (MNEG) and the maximum group area (MGA), since these were cited in Koshak (2010) as particularly useful variables for ground flash fraction retrieval.

By obtaining the ground flash fraction, one can determine the ratio Z of cloud flashes to ground flashes. The Z ratio is thought to be particularly useful in a number of areas including severe weather warning, lightning–convection relationships, lightning nitrogen oxide (NOx) production, the contribution of lightning to the global electric circuit, and cross-sensor validation (see Koshak 2010 and Boccippio et al. 2001 for further discussion).

In this study, we introduce a technique for retrieving the ground flash fraction (and hence the Z ratio) of a set of N lightning flashes that occur within a specific region and that are observed by a space-based lightning imager [e.g., OTD, the Lightning Imaging Sensor (LIS), or the future GOES-R Geostationary Lightning Mapper (GLM)]. The retrieval method and the associated retrieval error theory are described in sections 2–4. A more general version of the retrieval method is introduced in section 5. Section 6 discusses the relationship between the simple and generalized forms of the retrieval method, and section 7 discusses solution nonuniqueness. Section 8 shows graphical representations of the solution retrieval process and illustrates how retrieval errors are reduced when the sample size of observations is increased. Finally, section 9 applies the retrieval method to actual conterminous United States (CONUS) OTD lightning data that have been partitioned into ground and cloud flashes using independent ground-based observations; this assesses the accuracy of the retrieval method. The retrieval errors are shown to be encouragingly small when an optimal space-based lightning imager observable (such as MNEG or MGA) is used.

2. The mean equation and ground flash fraction

Consider a set of i = 1, … , N flashes that are observed over a time period Δt by a satellite lightning imager (e.g., a low earth-orbiting sensor like the LIS or OTD, or a geostationary sensor like GLM). As shown in the example of Fig. 1, each observed ground flash is indicated by a “g” and each cloud flash by a “c”; a small value of N is shown solely for brevity and is no indication of an acceptable value of N (indeed it will be shown later that N must be sufficiently large to bring retrieval errors down to an acceptable level).

Fig. 1.

A set of N flashes occurring in a region during time period Δt. A “g” denotes a ground flash, and a “c” denotes a cloud flash. The desire is to retrieve the fraction of ground flashes.

Citation: Journal of Atmospheric and Oceanic Technology 28, 4; 10.1175/2010JTECHA1408.1

For each of the N flashes, the sensor measures a particular flash optical characteristic x. For example, this characteristic could be any one of the following: flash radiance, flash area, flash duration, the number of optical groups in the flash, the number of optical events in the flash, the maximum number of events in a 2-ms sensor frame time for a given flash, radiance of the first event in the flash, radiance of the brightest group, maximum number of events in a group, maximum group area, and so on. Here, the basic terminology of OTD/LIS data is used; that is, a flash is composed of optical groups, and each optical group is composed of optical events (see Mach et al. 2007).

Note that x is not limited to flash-level properties; for example, one could use the area of the first group in a flash instead of, or in addition to, the flash area itself. In general, one is free to choose any optical information from the optical data (including concocting derived variables from the data); hence, the list of possible optical characteristics is virtually unlimited. However, a certain set of optical characteristics will outperform another set, in general. Based on numerical results provided in section 9, we are able to recommend reasonably optimal optical characteristics to employ.

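Considering a set of k = 1, … , n characteristics given by (xi1, xi2, … , xin) for the ith observed flash, and letting Ng and Nc = N − Ng denote the numbers of ground and cloud flashes among the N flashes, the average of the kth characteristic across the N flashes is
$$x_k \equiv \frac{1}{N}\sum_{i=1}^{N} x_{ik} = \frac{N_g}{N}\,x_{gk} + \frac{N_c}{N}\,x_{ck}, \tag{1}$$
where the mean ground and cloud flash characteristics are
$$x_{gk} \equiv \frac{1}{N_g}\sum_{i \in \text{ground}} x_{ik}, \qquad x_{ck} \equiv \frac{1}{N_c}\sum_{i \in \text{cloud}} x_{ik}. \tag{2}$$
So (1) can be rewritten as
$$x_k = \alpha\,x_{gk} + (1-\alpha)\,x_{ck}, \qquad \alpha \equiv \frac{N_g}{N}. \tag{3}$$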
The parameter α is the ground flash fraction that we are interested in retrieving from the m = Nn satellite measurements given by xik, with i = 1, … , N; k = 1, … , n. The first equation in (3) is fundamental. It expresses the mean of a particular optical characteristic as a weighted mixture of the associated ground and cloud flash optical properties. For example, if the ground flash fraction is unity (all N flashes are ground flashes) then the mean optical characteristic is simply the mean optical characteristic of the ground flashes.

3. The applied form of the mean equation

In any real problem, one must be cognizant of sensor measurement errors ɛik and also those errors (egk, eck) involved with estimating the unknowns (xgk, xck). Inclusion of these errors leads to the following set of expressions:
$$q_k = x_k + \varepsilon_k, \qquad f_{gk} \equiv x_{gk} + e_{gk}, \qquad f_{ck} \equiv x_{ck} + e_{ck}. \tag{4}$$
The first expression in (4) can be derived by simply defining the measurement of the kth characteristic in the ith flash as qik ≡ xik + ɛik and then averaging this expression over the N flashes. The remaining two equations in (4) are definitions. Using (4), one obtains the generalization of the first equation in (3) as
$$q_k = \alpha\,(f_{gk} - e_{gk}) + (1-\alpha)\,(f_{ck} - e_{ck}) + \varepsilon_k. \tag{5}$$
The left side of this equation is the average (over all N flashes) of the actual sensor measurement values of the kth characteristic. [So, for example, if the kth characteristic was flash radiance, one would take the average of the N flash radiances to compute qk.] The radiance measurement errors are (ɛ1k, … , ɛNk) with average error ɛk [this appears as the last term on the right side of (5)].

4. The single-characteristic solution and associated retrieval errors

It is possible to retrieve the ground flash fraction using a single characteristic. Using the form given in the first equation of (3), we have qk = αkfgk + (1 − αk)fck. That is, the same form in (3) is obeyed, and when the quantities (qk, fgk, fck) are respective approximations to the quantities (xk, xgk, xck), then αk approximates α. Hence, we immediately obtain the single-characteristic solution αk given by
$$\alpha_k = \frac{q_k - f_{ck}}{f_{gk} - f_{ck}}. \tag{6}$$
Whereas the value qk on the right-hand side of (6) is provided by the sensor, the variables (fgk, fck) are respective estimates of (xgk, xck). In principle, any reasonable estimates of (xgk, xck) can be used. In this writing we note that the variables (xgk, xck) each have a statistical distribution with respective population means (μgk, μck). So our approach is to simply set (fgk, fck) equal to sample mean estimates of these population means (see section 8 for additional details). Hence, the right-hand side of (6) can be computed to obtain a value for αk. In what follows, and to maintain generality, we continue using the (fgk, fck) notation in (6) with the understanding that one can employ any reasonable method to assign values to (fgk, fck), and that the sample mean assignment is just one particular approach.
Now, the retrieval error is
$$\rho_k \equiv \alpha_k - \alpha. \tag{7}$$
Using the relationships in (4), the expression in (7) can be rewritten (after some algebra) as
$$\rho_k = \frac{\varepsilon_k - \alpha\,e_{gk} - (1-\alpha)\,e_{ck}}{f_{gk} - f_{ck}}. \tag{8}$$
Hence, using the last two equations in (4), the retrieval error has the following important properties:
e9
This means that whenever the estimate fgk is very close in value to the estimate fck, the denominator in (8) will be very close to zero, which results in magnifying the errors in the numerator of (8). This results in the magnitude of ρk being unacceptably large. Conversely, when the estimate fgk is sufficiently distinct in value from the estimate fck, ρk will be a smaller value and can even be zero if the first condition cited in the first equation of (9) is met.
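The single-characteristic retrieval and its error can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and all numerical values are hypothetical, and the closed-form error assumes the definitions qk = xk + ɛk, fgk = xgk + egk, and fck = xck + eck.

```python
def single_characteristic_alpha(q_k, f_gk, f_ck):
    """Ground flash fraction from one characteristic: (q_k - f_ck) / (f_gk - f_ck)."""
    if f_gk == f_ck:
        raise ValueError("f_gk equals f_ck: the ground flash fraction cannot be retrieved")
    return (q_k - f_ck) / (f_gk - f_ck)


def retrieval_error(alpha, eps_k, e_gk, e_ck, f_gk, f_ck):
    """Closed-form error, assuming q_k = x_k + eps_k, f_gk = x_gk + e_gk, f_ck = x_ck + e_ck."""
    return (eps_k - alpha * e_gk - (1.0 - alpha) * e_ck) / (f_gk - f_ck)


# Hypothetical values (not taken from the paper).
alpha_true, x_gk, x_ck = 0.3, 6.5, 4.0
eps_k, e_gk, e_ck = 0.05, -0.2, 0.1

x_k = alpha_true * x_gk + (1.0 - alpha_true) * x_ck  # true mixture mean
q_k = x_k + eps_k                                    # measured mean with sensor error
f_gk, f_ck = x_gk + e_gk, x_ck + e_ck                # imperfect estimates of the means

alpha_k = single_characteristic_alpha(q_k, f_gk, f_ck)
rho_direct = alpha_k - alpha_true
rho_formula = retrieval_error(alpha_true, eps_k, e_gk, e_ck, f_gk, f_ck)
print(alpha_k, rho_direct, rho_formula)
```

The two error values agree to rounding, and shrinking the gap between `f_gk` and `f_ck` in this sketch inflates both, which is the magnification effect described above.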

5. The multiple-characteristic solution

Given the mean equation in (5), one can consider generalizing the single-characteristic solution by simultaneously considering all k = 1, … , n characteristics in the retrieval process. This is done by minimizing a scalar cost function of the form
e10
where, once again, the values of (fgk, fck) are fixed based on assigning them reasonable values; for example, by making the assignments as discussed in the previous section. Minimizing (10) allows one to find the value of α′ in the model {α′fgk + (1 − α′)fck} that optimally describes the measurements qk, with k = 1, … , n.
In this approach (and for analyses in subsequent sections to follow), the following short-hand notation is invoked for convenience:
e11
Carrying out the algebra implied by (10) gives a quadratic result
$$L(\alpha') = A\,\alpha'^2 + B\,\alpha' + C, \tag{12}$$
where the coefficients in the quadratic (and their signs) are
e13
For brevity, the dependence of B and C on ɛ has been suppressed. The minimum of the quadratic function L(α′) is obtained by taking the derivative and equating to zero
$$\frac{dL}{d\alpha'} = 2A\,\alpha' + B = 0. \tag{14}$$
Solving (14) yields the multiple-characteristic solution αmul given by
$$\alpha_{mul} = -\frac{B(\mathbf{f})}{2A(\mathbf{f})}, \tag{15}$$
where f is fixed by assignment. Note that (15) is in fact a minimum of L since the second derivative test gives d²L/dα′² = 2A(f) ≥ 0, where the degenerate case A(f) = 0 ⇒ fgk = fck can be ignored since it corresponds to a singularity, that is, an inability to retrieve the ground flash fraction. For example, this degenerate condition was shown not to hold for a variety of characteristics examined in Koshak (2010).
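A minimal Python sketch of the multiple-characteristic solution follows. It assumes the cost function is the unweighted sum of squared residuals between qk and the model α′fgk + (1 − α′)fck; the coefficient expressions in the code follow from that assumption rather than from the printed equations, and all names and values are hypothetical.

```python
def quadratic_coefficients(q, f_g, f_c):
    """Coefficients of L(a) = A*a**2 + B*a + C for an assumed unweighted least-squares cost."""
    d = [g - c for g, c in zip(f_g, f_c)]   # f_gk - f_ck
    r = [qk - c for qk, c in zip(q, f_c)]   # q_k - f_ck
    A = sum(dk * dk for dk in d)
    B = -2.0 * sum(rk * dk for rk, dk in zip(r, d))
    C = sum(rk * rk for rk in r)
    return A, B, C


def alpha_mul(q, f_g, f_c):
    """Minimizer of the quadratic cost, -B / (2A)."""
    A, B, _ = quadratic_coefficients(q, f_g, f_c)
    if A == 0:
        raise ValueError("A = 0 (f_gk = f_ck for all k): retrieval is singular")
    return -B / (2.0 * A)


# Hypothetical error-free example with two characteristics and a true fraction of 0.3.
f_g = [6.5, 120.0]
f_c = [4.0, 80.0]
q = [0.3 * g + 0.7 * c for g, c in zip(f_g, f_c)]

print(alpha_mul(q, f_g, f_c))
```

With perfect estimates and no measurement error, the sketch recovers the true fraction to rounding.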

The retrieval error associated with the multiple-characteristic solution is derived in the following section. As one might expect, the retrieval error is related to ρk in (8).

6. The relationship between the single- and multiple-characteristic solutions

It is natural to wonder what the relationship is between the multiple-characteristic solution in (15) and the single-characteristic solution given in (6). The relationship can be found by rewriting (6) as qk − fck = αk(fgk − fck), and then substituting this into (15) to get
e16
where the scaled weights are
e17
The scaled weights are nonnegative and range from a minimum to a maximum value given by
e18
So the multiple-characteristic solution in (15) is a general solution that reduces to the single-characteristic solution given in (6) when n = 1. It can be viewed as a weighted mean of the single-characteristic solutions. Hence, the characteristic with the largest scaled weight has the greatest influence on the value of αmul. Since each (fgk, fck) is intended to estimate each mean (xgk, xck), the characteristic whose ground flash mean is maximally different from its cloud flash mean will have the greatest influence on the value of αmul.
If each estimate is perfect [i.e., (fgk = xgk, fck = xck) for k = 1, … , n] and there is no measurement error (i.e., ɛk = 0 for k = 1, … , n) then both (6) and (15) reduce to the true ground flash fraction α given in (3); that is,
e19
Here, the first equation in (3) was used as the definition of α.
Finally, (7) can be rewritten as αk = α + ρk, which when substituted into (16) gives
e20
Hence, the retrieval error associated with the multiple-characteristic solution is simply a weighted mean of the individual retrieval errors ρk for each kth characteristic; that is,
e21
This is an interesting result. Given a set of characteristics associated with the independent retrieval errors (ρ1, … , ρn), one can always order the numbering of the characteristics such that (|ρ1| ≤ |ρ2| ≤ |ρ3| ≤ … ≤ |ρn|). Since the values of the scaled weights range between 0 and 1, and since the values of ρk are arbitrary (i.e., negative, zero, or positive), one can get both constructive and destructive interference of errors from the linear superposition in (21). In other words, the multiple-characteristic error could possibly be smaller in magnitude than |ρ1|, the smallest single-characteristic error, in the case of destructive interference. However, one does not expect to be this lucky. What we have found (e.g., Table 3 of section 9) is that the best single characteristic yields a smaller error magnitude than the multiple-characteristic solution.
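The weighted-mean relationship can be checked numerically. The sketch below assumes scaled weights proportional to (fgk − fck)², which is what an unweighted least-squares cost would give; the weight formula, the function names, and the numbers are assumptions for illustration only.

```python
def single_alphas(q, f_g, f_c):
    """Single-characteristic solutions, one per characteristic."""
    return [(qk - c) / (g - c) for qk, g, c in zip(q, f_g, f_c)]


def scaled_weights(f_g, f_c):
    """Assumed scaled weights: squared ground-cloud separation, normalized to sum to 1."""
    squares = [(g - c) ** 2 for g, c in zip(f_g, f_c)]
    total = sum(squares)
    return [s / total for s in squares]


# Hypothetical estimates and measured means for three characteristics.
alpha_true = 0.3
f_g = [6.5, 120.0, 3.0]
f_c = [4.0, 80.0, 2.5]
q = [4.9, 91.0, 2.7]

alphas = single_alphas(q, f_g, f_c)
weights = scaled_weights(f_g, f_c)
alpha_mul = sum(w * a for w, a in zip(weights, alphas))

# Errors superpose with the same weights because the weights sum to one.
rhos = [a - alpha_true for a in alphas]
rho_mul = alpha_mul - alpha_true
rho_weighted = sum(w * r for w, r in zip(weights, rhos))

print(alphas, weights, alpha_mul, rho_mul, rho_weighted)
```

In this example the second characteristic carries nearly all of the weight simply because its ground and cloud estimates are numerically far apart, so its individual error dominates the combined error.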

7. Solution nonuniqueness

Up to this point, we have considered the vector f to be fixed since we assigned its components to reasonable estimates [e.g., as discussed in sections 4 and 5]. However, instead of making these assignments to f, one might wonder if it is possible to retrieve optimum values for (f, α′) by minimizing the cost function
e22
This cost function is identical to (10) and (12) except that now L is considered as a function of the variables (f, α′) rather than of just α′.
Since L in (22) is a sum of squared terms, we must have L ≥ 0. Therefore, the absolute minimum of L is clearly L = 0. Given an arbitrary real vector f, the roots of L(f, α′) are determined by setting (22) to zero. This yields the standard quadratic equation Aα′² + Bα′ + C = 0, with solution
$$\alpha' = -\frac{B}{2A} \pm \frac{\sqrt{D}}{2A}. \tag{23}$$
Here, the discriminant function is
$$D \equiv B^2 - 4AC. \tag{24}$$
Note that the first term on the right-hand side of (23) is just the expression for the multiple-characteristic solution given in (15). Furthermore, we get the following relationships
e25
The second equation in (25) is obtained by noting from (13) that fgk = fck when A = 0 and applying (5). The third equation in (25) is obtained by simply evaluating (22) at α′ = αmul given in (15). In addition, A = 0 implies that D = 0 [i.e., A = 0 implies that fgk = fck, which implies that B = 0; hence D = B² − 4AC = (0)² − 4(0)C = 0, where C ≥ 0 from the third equation in (13)]. Using this result and rearranging the third equation in (25) gives
e26
Since L and A are each a sum of squares, L ≥ 0 and A ≥ 0 must hold so that the discriminant is nonpositive
$$D(\mathbf{f}) \le 0. \tag{27}$$
Additionally, the appendix shows that the discriminant is zero for the case A > 0 when each single-characteristic solution given by (6) is equivalent to the multiple-characteristic solution; that is,
e28
The appendix also shows that the discriminant is negative solely because of the errors (e, ɛ), and the discriminant is zero when these errors are zero; that is, D(e = 0, ɛ = 0) = 0. A zero discriminant can also occur with nonzero errors if the errors cancel each other out (destructive interference).
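The sign of the discriminant can be illustrated with exact rational arithmetic. The sketch below again assumes an unweighted sum-of-squares cost (so the coefficient formulas are assumptions), and the numerical values are hypothetical.

```python
from fractions import Fraction as F


def coefficients(q, f_g, f_c):
    """A, B, C of the quadratic cost for an assumed unweighted least-squares form."""
    d = [g - c for g, c in zip(f_g, f_c)]
    r = [qk - c for qk, c in zip(q, f_c)]
    A = sum(x * x for x in d)
    B = -2 * sum(x * y for x, y in zip(r, d))
    C = sum(x * x for x in r)
    return A, B, C


def cost(alpha, q, f_g, f_c):
    """Sum of squared residuals between q_k and the two-component mixture model."""
    return sum((qk - alpha * g - (1 - alpha) * c) ** 2 for qk, g, c in zip(q, f_g, f_c))


alpha_true = F(3, 10)
f_g = [F(13, 2), F(120)]
f_c = [F(4), F(80)]
q_exact = [alpha_true * g + (1 - alpha_true) * c for g, c in zip(f_g, f_c)]
q_noisy = [q_exact[0] + F(1, 10), q_exact[1]]  # perturb one measured mean

A0, B0, C0 = coefficients(q_exact, f_g, f_c)
D_exact = B0 * B0 - 4 * A0 * C0                # zero when all errors vanish

A1, B1, C1 = coefficients(q_noisy, f_g, f_c)
D_noisy = B1 * B1 - 4 * A1 * C1                # negative once errors are present
alpha_mul = -B1 / (2 * A1)
cost_at_minimum = cost(alpha_mul, q_noisy, f_g, f_c)

print(D_exact, D_noisy, alpha_mul, cost_at_minimum, -D_noisy / (4 * A1))
```

Using `Fraction` avoids rounding noise, so the zero discriminant in the error-free case and the identity between the minimum cost and −D/(4A) hold exactly in this sketch.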

In a real retrieval problem the errors will be nonzero, the discriminant will be nonpositive, and so the solution in (23) provides, in general, two complex roots. We would be forced of course to pick the real part of this solution (which is just the multiple-characteristic solution for an arbitrary f). Moreover, employing the complex solution in (23) is of no help because it will drive L to zero no matter what value of f is chosen (even if e is large). In other words, one f is as good as another, and so the generalized cost function does not help us pick an optimum f.

However, from the second and third equations in (25), one can see that it is also possible to arrive at the absolute minimum L = 0 by employing an arbitrary value of α′ between 0 and 1 and a value of f for which egk = eck = ɛk (case A = 0), or by employing the multiple-characteristic solution and a value of f for which D(f) = 0 (case A > 0). In practice, the normal situation will be A > 0 (since by Koshak 2010 the ground and cloud flash mean characteristics are typically distinct); hence, one can perform a numerical minimization of the function F(f) ≡ L(f, αmul) = −D(f)/[4A(f)] to obtain the optimum f; the corresponding ground flash fraction would then be retrieved as αmul(f) = −B(f)/[2A(f)].

Unfortunately, this approach also will not work. The reason is as follows. Given a ground flash fraction α′ = αmul, the value of L = 0 when each kth summand in (22) is zero; that is, qk − αmulfgk − (1 − αmul)fck = 0. Rearranging this expression and using (26) gives
e29
This is just an alternate expression of what is given in (A4) of the appendix. Now, when one attempts to solve the problem by minimizing F(f), (29) says that there are actually many possible values of f that make F(f) = 0. In fact, there are an infinity of solutions since the expression on the left-hand side in (29) is the equation of a line for the continuous variables fgk and fck; we call this line a line of ambiguity (LOA). [Note: there are actually k = 1, … , n LOAs defined in (29), each with a slope (αmul − 1)/αmul and “y-intercept” qk/αmul.] In effect, there is more than one solution f that can perfectly generate the observed mean values (q1, … , qn). This is what is meant by “solution nonuniqueness.” Hence, the solution obtained by minimizing F(f) depends on the starting point in the parameter search space. Few iterations are required, and the search quickly terminates at a nearby zero of F(f). This situation is obviously unacceptable because one is just picking a mathematically acceptable solution, but the solution can be quite far from the correct answer (the truth).
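A short Python sketch makes the ambiguity concrete: for a fixed observed mean and a fixed ground flash fraction, every point on the line with slope (α − 1)/α and intercept q/α reproduces the observation exactly. The function name and the sample values are hypothetical.

```python
from fractions import Fraction


def loa_f_g(f_c, q, alpha):
    """Ground estimate on the line of ambiguity for a given cloud estimate f_c."""
    return (alpha - 1) / alpha * f_c + q / alpha


q = Fraction(5)          # observed mean of the characteristic
alpha = Fraction(2, 5)   # ground flash fraction held fixed

for f_c in (Fraction(0), Fraction(2), Fraction(4), Fraction(7)):
    f_g = loa_f_g(f_c, q, alpha)
    residual = q - alpha * f_g - (1 - alpha) * f_c  # zero for every point on the line
    print(f_c, f_g, residual)
```

Each pair printed here is a different f that fits the same observation with zero residual, which is why minimizing over f cannot single out the true means.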
When α′ is arbitrary—that is, not necessarily equal to the multiple-characteristic solution—then an examination of (22) produces an expression more general than in (29) that is given by
e30
Solution nonuniqueness can be illustrated in a compact graphical form by dividing the left-hand side of (30) by qk to obtain the kth “scaled” LOA
$$r_{gk} = \frac{\alpha' - 1}{\alpha'}\,r_{ck} + \frac{1}{\alpha'}, \tag{31}$$
where rgk ≡ fgk/qk and rck ≡ fck/qk. The acceptable values of α′ are restricted to the range 0–1, and the family of lines given by (31) for the particular values (α′ = 0, 0.1, 0.2, 0.3, … , 1) is provided in Fig. 2. Unacceptable values are also indicated (α′ < 0, α′ > 1, and α′ undefined).
Fig. 2.

The family of lines identifying the nonunique solution space for α′. Any real problem has N finite, so there will be a finite number of nonunique solutions on the lines drawn. The black points shown correspond to the example in Table 2 and are described in the main text.

Once again, because f is allowed to vary continuously, the LOAs defined in (31) result in an infinite number of zeros of L and therefore an infinite number of possible solutions. This is certainly true when one just considers the mathematical form of the cost function. However, in any actual problem, one must remember that the number of flashes observed N is finite. This means that there cannot actually be an infinite number of possible solutions. In general, the number of possible solutions η generated by each kth characteristic is given as a sum of combinatorial terms
$$\eta = \sum_{j=0}^{N} \binom{N}{j} = 2^N. \tag{32}$$
The η solutions fall on the lines described by (31). Table 1 shows how quickly the number of possible solutions increases with increasing N. The table also shows how the “granularity” of α′, denoted by G, improves with N; for example, for N = 3, the only possibilities are α′ = 0, ⅓, ⅔, 1 whereas for N = 5, the possibilities are α′ = 0, 0.2, 0.4, 0.6, 0.8, 1.
Table 1.

The number of possible solutions, the number of distinct values of α′, and the granularity in α′ as a function of the number of flashes observed. Values are rounded at the third decimal place.
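The quantities described in this caption can be tabulated with a few lines of Python. This is a sketch under two assumptions: that the number of possible solutions is the sum of binomial coefficients over all possible ground flash counts, and that the granularity G is the spacing 1/N between adjacent values of α′. The chosen values of N are arbitrary.

```python
from math import comb


def solution_counts(n_flashes):
    """Return (number of solutions, number of distinct alpha' values, granularity)."""
    eta = sum(comb(n_flashes, j) for j in range(n_flashes + 1))  # equals 2**n_flashes
    distinct_alpha = n_flashes + 1                               # alpha' = 0, 1/N, ..., 1
    granularity = round(1.0 / n_flashes, 3)                      # assumed definition of G
    return eta, distinct_alpha, granularity


for n in (3, 5, 10, 20, 100):
    print(n, *solution_counts(n))
```

The solution count grows exponentially while the granularity improves only as 1/N, consistent with the N = 3 and N = 5 cases quoted in the text.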

To investigate this situation more closely, consider the simple example of only one characteristic (n = 1) and N = 5 flashes. A satellite lightning imager records 5 optical values given by (1, 4, 5, 7, 8), in arbitrary units, for the single lightning characteristic; measurement errors are ignored (i.e., qk = xk). Since n = 1, the cost function in (22) has just one (squared) term. From (32) there must be η = 32 possible solutions. Suppose, however, that we also know from independent measurements [say from National Lightning Detection Network (NLDN) data] that the third and fifth flashes are ground flashes, and the first, second, and fourth flashes are cloud flashes. This implies that xgk = (5 + 8)/2 = 6.5, and xck = (1 + 4 + 7)/3 = 4.0, where k = 1. The true ground flash fraction is therefore α′ = α = ⅖ = 0.4, and the average is xk = (1 + 4 + 5 + 7 + 8)/5 = 5.0. Note that the average is appropriately reproduced when the true values (xgk = 6.5, xck = 4.0, α = 0.4) are substituted into the first equation in (3); that is, 0.4 × 6.5 + 0.6 × 4.0 = 5.0. Now, as shown in Table 2, there are 31 additional choices that also give a value of xk = 5.0. That is, since one does not know in general which flashes are ground flashes and which are cloud flashes, one has to consider all 32 possibilities. The true situation (bold italicized in Table 2) is just one of many possibilities. These multiple (but finite number of) solutions must fall on the LOAs defined by (31).

Table 2.

An example of solution nonuniqueness for a case of N = 5 flashes where the satellite lightning imager measured 5 values (1, 4, 5, 7, 8) for a particular kth characteristic. The bold italicized line (solution 25) represents the correct solution (i.e., the truth).

To relate the general result in (31) to the specific example in Table 2, the results in Table 2 are plotted in Fig. 2. The 32 solutions in Table 2 correspond to 32 ordered pairs (rck, rgk), but solution 1 has rck undefined and solution 32 has rgk undefined. Solution 11 has the same value of (rck, rgk) as solution 10; similarly, solutions 14, 21, 24, and 29 repeat other solutions. This leaves a total of 25 (=32 − 2 − 5) distinct solution points (rck, rgk) to plot, and these 25 points are shown in Fig. 2 as the black dots. These black dots serve as a reminder that, in any real problem, the value of N is finite so that an infinite number of solutions do not exist.
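The counting in this example can be reproduced by brute-force enumeration. The sketch below lists every ground/cloud partition of the five measured values, checks that each partition reproduces the overall mean, and counts the distinct scaled points; the enumeration order is arbitrary and is not meant to match the solution numbering of Table 2.

```python
from fractions import Fraction
from itertools import combinations

values = (1, 4, 5, 7, 8)
N = len(values)
x_mean = Fraction(sum(values), N)


def mean_or_none(items):
    """Exact mean of a list, or None when the list is empty (mean undefined)."""
    return Fraction(sum(items), len(items)) if items else None


solutions = []
for n_ground in range(N + 1):
    for ground in combinations(range(N), n_ground):
        x_g = mean_or_none([values[i] for i in ground])
        x_c = mean_or_none([values[i] for i in range(N) if i not in ground])
        solutions.append((ground, Fraction(n_ground, N), x_g, x_c))


def mixture(alpha, x_g, x_c):
    """alpha*x_g + (1 - alpha)*x_c, skipping a term whose mean is undefined."""
    ground_part = alpha * x_g if x_g is not None else 0
    cloud_part = (1 - alpha) * x_c if x_c is not None else 0
    return ground_part + cloud_part


all_reproduce_mean = all(mixture(a, g, c) == x_mean for _, a, g, c in solutions)

# Scaled points (r_c, r_g); partitions with an undefined mean cannot be plotted.
points = {(c / x_mean, g / x_mean) for _, _, g, c in solutions
          if g is not None and c is not None}

truth = next(s for s in solutions if s[0] == (2, 4))  # third and fifth flashes are ground
print(len(solutions), all_reproduce_mean, len(points), truth)
```

The enumeration gives 32 partitions, all of which reproduce the mean of 5.0, and 25 distinct plottable points, in agreement with the count given above.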

8. Graphical representation of solution process

The retrieval of the ground flash fraction using the single-characteristic solution provided in (6) can be understood in a graphical way. Since the multiple-characteristic solution is just a linear superposition of single-characteristic solutions as shown in (16), this section also helps one better understand the multiple-characteristic solution.

Figure 3a illustrates the solution process and the role of solution nonuniqueness; it begins with the case of N = 100 observed flashes. For these 100 flashes, the true value of the ground flash fraction α is assumed to be 0.3. So this implies that there are Ng = 30 ground flashes and Nc = 70 cloud flashes. We consider just one characteristic, say MNEG, so all “k” subscripts are dropped from appropriate variables. To retrieve an answer using the single-characteristic solution in (6), we must have a sensor measurement x (i.e., ɛ = 0 is assumed here) and two estimates fg and fc.

Fig. 3.

(a) A graphical representation of the relationship between the retrieved ground flash fraction (intersection of green lines), the true ground flash fraction (intersection of black lines), the “lines of ambiguity,” and the distributions involved. The peaks of the normal distributions have been scaled to unity for plot clarity; the distributions of xg (red) and xc (blue) are shown.

Using the form of the first equation in (30) and regarding the variables fg and fc as coordinate axes, we have the LOA given by
$$f_g = \frac{\alpha - 1}{\alpha}\,f_c + \frac{x}{\alpha}. \tag{33}$$
This line has a “y intercept” of x/α and an “x intercept” x/(1 − α). The LOA is shown as the black slanting line in Fig. 3a. Note that the LOA passes through the point (x, x) —that is, point A. In nature, the mean MNEG has a statistical distribution. The mean MNEG for ground flashes xg has a distribution as shown in red, and the mean MNEG for cloud flashes xc has a distribution as shown in blue. Because of the Central Limit Theorem, these are each assumed to be close enough to normal distributions since the sample sizes used to compute the means are equal to, or exceed, 30 (i.e., Ng = 30 and Nc = 70 as mentioned above). The peaks of each normal distribution are scaled to unity for plot clarity. The observed mean MNEG of the mixture of ground and cloud flashes x has a value given as “xbar” in the plot and is indicated graphically as the pink box with corner point A.

The true mean value of the ground flash MNEG optical characteristic is given by the horizontal black line fg = xg, and the true mean value of the cloud flash MNEG optical characteristic is given as the black vertical line fc = xc. These two lines intersect at point B. The LOA passing through the points A and B defines the true value α, whose value is shown at the top left of the plot, in black, as 0.3; that is, the slope of the black line through points A and B is (α − 1)/α as indicated in (33).

Note that for the finite sampling of N flashes, the mean values of xg and xc in nature need not be associated with the peaks of the red and blue normal distributions; these peaks only represent the most probable values of xg and xc. To emphasize this point, we show xg to be below its population mean, and we show xc to be above its population mean.

The most probable solution αMP is associated with the intersection of the horizontal green line fg = μg and the vertical green line fc = μc. These two lines intersect at point C. The value αMP is shown at the top middle of the plot, in green, as 0.2445. The resulting retrieval error in this example is ρ = αMP − α = −0.0555, as shown in the upper-right portion of the plot. Note that substituting αMP in for α in (33) defines another LOA (the green slanted line through the points A and C).

Now, in any actual problem, we do not know the values of (μc, μg) for a particular geographic location. In the single-characteristic solution approach, we estimate these population means using the CONUS OTD results for MNEG in Koshak (2010); that is,
e34
So errors in these estimates would technically also propagate into the final retrieval error; however, the examples in Fig. 3 neglect this error contribution. In addition, satellite measurement errors are also neglected in these examples.

Figure 3b shows what happens when the value of N increases. In this sensitivity analysis, the values (xc, xg) remain at the same proportionate values used previously; that is, they always keep the same relative offsets from their respective means (μc, μg). As one can see, increasing N necessarily decreases the retrieval error ρ. This is a fundamental advantage of this technique, especially since thunderstorms can produce high flash rates, making N large in a relatively short time.
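The dependence of the retrieval error on N can also be explored with a small Monte Carlo experiment. This sketch is not the sensitivity analysis of Fig. 3b: it draws individual flash values from assumed normal distributions with hypothetical means and standard deviations, treats the population means as perfectly known, and ignores measurement error.

```python
import math
import random


def rms_retrieval_error(n_flashes, alpha_true, mu_g, mu_c, sigma_g, sigma_c, trials, rng):
    """RMS error of the single-characteristic retrieval over repeated random samples."""
    n_g = round(alpha_true * n_flashes)
    n_c = n_flashes - n_g
    squared_error = 0.0
    for _ in range(trials):
        total = sum(rng.gauss(mu_g, sigma_g) for _ in range(n_g))
        total += sum(rng.gauss(mu_c, sigma_c) for _ in range(n_c))
        q = total / n_flashes                      # observed mean of the mixture
        alpha_hat = (q - mu_c) / (mu_g - mu_c)     # retrieval with f_g = mu_g, f_c = mu_c
        squared_error += (alpha_hat - n_g / n_flashes) ** 2
    return math.sqrt(squared_error / trials)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for n in (100, 400, 1600):
    # Hypothetical distribution parameters, chosen only for illustration.
    print(n, rms_retrieval_error(n, 0.3, 10.0, 6.0, 5.0, 4.0, 200, rng))
```

Under these assumptions the rms error falls roughly as one over the square root of N, as expected when the retrieval depends on a sample mean.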

Fig. 3.

(b) A continuation of (a) for larger values of the number of flashes N analyzed. Note that as N is increased, the retrieval error, ρ (“rho”), decreases. Neglecting satellite measurement errors, absolute convergence of ρ to zero will only occur if the means of the ground (red) and cloud (blue) normal distributions match the actual population means for the particular geographic location studied.

Finally, Fig. 3c shows the same type of results as in Fig. 3b except that a larger truth value of α = 0.7 is assumed. Because of (8), and since eg < ec < 0 holds in these two examples, the retrieval error in Fig. 3c is larger in magnitude than in Fig. 3b. That is, ρ is proportional to αeg + (1 − α)ec when no satellite measurement errors are present. This equals 0.3eg + 0.7ec for the case in Fig. 3b, and the (more negative) value 0.7eg + 0.3ec for the case in Fig. 3c.

Fig. 3.

(c) As in the previous figure, but for a larger value of the ground flash fraction; that is, α = 0.7.

9. Numerical tests

To test our retrieval methods, we applied them to actual CONUS OTD data. NLDN data were used to independently determine the true ground flash fraction within any particular region of the CONUS. Comparing our ground flash fraction retrieval to the known value enabled us to directly assess retrieval errors.

Both the single- and multiple-characteristic solution methods were tested. For each of these methods, the values (fck, fgk) were estimated using the population mean estimates obtained by Koshak (2010). So, for example, the explicit form of the single-characteristic solution retrieval formula (when the MNEG characteristic is employed) is
e35
where the starred constants are given in (34), and qMNEG is obtained from the OTD data.

A total of 52 locations across the CONUS were considered. At each location, a total of N = 1000 OTD flashes were analyzed; that is, a circular ring was centered on each location, and the ring radius was increased until it enclosed 1000 flashes. Since these 1000 flashes were partitioned into ground and cloud flashes using NLDN, the true ground flash fraction was obtained for each of the 52 circular regions.
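The region-selection step described above can be sketched as follows. This is an illustration only: it assumes flash positions are given in a local planar coordinate system (a real analysis would use great-circle distances), and the function name and the (x, y, is_ground) tuple layout are hypothetical.

```python
import math


def nearest_n_region(center, flashes, n):
    """Grow a circle about `center` until it encloses exactly n flashes.

    flashes: iterable of (x_km, y_km, is_ground) tuples, where is_ground is
    True for flashes confirmed as ground flashes (e.g., by NLDN).
    Returns (radius_km, ground_flash_fraction) for the enclosed flashes.
    """
    flashes = list(flashes)
    if len(flashes) < n:
        raise ValueError("not enough flashes to fill the region")
    cx, cy = center
    # Sorting by distance and keeping the n nearest flashes is equivalent to
    # increasing the radius until n flashes are enclosed.
    ranked = sorted(flashes, key=lambda f: math.hypot(f[0] - cx, f[1] - cy))
    selected = ranked[:n]
    radius = math.hypot(selected[-1][0] - cx, selected[-1][1] - cy)
    ground_fraction = sum(1 for f in selected if f[2]) / n
    return radius, ground_fraction
```

Removing cloud flashes from the input before calling the function forces the radius to grow in order to keep n fixed, which is how the larger test values of the ground flash fraction are obtained.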

Figure 4 shows the cloud flashes (blue dots) and ground flashes (red dots) across CONUS, and the total number of each of these flashes is provided in the upper-right-hand corner of each plot. To test retrievals of larger ground flash fraction values, we simply removed some cloud flashes; this requires enlarging the circular ring so that N remains 1000. The NLDN-confirmed average (±standard deviation) ground flash fraction for the 52 regions is provided in the upper-left-hand corner of each plot. Note that the 52 locations (i.e., centers of each circular region) are easiest to see in the bottom plot of Fig. 4.

Fig. 4.

The rms retrieval error ρrms in the ground flash fraction for 52 circular regions (centered on the black dots) analyzed across CONUS using the single-characteristic solution with the MNEG optical characteristic. Some cloud flashes are removed in the middle and bottom plots to increase the (spatially dependent) known test ground flash fraction in each region. The CONUS-averaged value α (±std dev) for the 52 regions is given in the upper-left corner of each plot. See text for additional discussion.


A retrieval using (35) is performed for each of the 52 regions, and the retrieval errors (ρ1, ρ2, … , ρ52) are obtained. The root-mean-square (rms) retrieval error ρrms for the 52 regions is then computed; the value of ρrms is provided in the upper left of each plot in Fig. 4. Since the ground flash fraction varies in general from 0 to 1, we regard the rms errors shown as acceptably small; that is, ρrms is a reasonably small fraction of unity (the full-scale range of the ground flash fraction). Hence, the single-characteristic solution in (35) appears to be a reasonable way to estimate the ground flash fraction across CONUS.
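As a minimal illustration of the rms statistic described above, the sketch below computes ρrms from paired retrieved and true ground flash fractions. The function name is hypothetical, and the sign convention (retrieved minus true) is an assumption that does not affect the rms value.

```python
import math


def rms_error(retrieved, truth):
    """Root-mean-square of the per-region retrieval errors (retrieved - truth)."""
    if len(retrieved) != len(truth) or not retrieved:
        raise ValueError("inputs must be nonempty and of equal length")
    squared = [(r - t) ** 2 for r, t in zip(retrieved, truth)]
    return math.sqrt(sum(squared) / len(squared))
```

In the tests reported here, the two input sequences would each hold 52 values, one per circular region.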

Table 3 shows an example of how the other optical characteristics performed for the middle plot in Fig. 4 [note: the value α = 0.506 ± 0.124 shown at the top of Table 3 is just the average of the actual ground flash fraction values for the 52 regions, as also provided in the middle plot of Fig. 4]. In this particular case, the MGA slightly outperformed the MNEG (see last column in Table 3), although the MNEG slightly outperformed the MGA in most of the cases we examined. Table 3 lists the single optical characteristics from worst performer (flash duration) to best performer (MGA). The mean, standard deviation (std dev), and ratio (std dev/mean) of each optical characteristic for the 52 regions are provided. An optical characteristic that fluctuates widely across CONUS would not be a good variable to use in the single-characteristic solution process. That is, estimating the two values (fck, fgk) in (6) with the fixed estimates of the population means obtained by Koshak (2010) is inaccurate if one has evidence that the values (xck, xgk) vary considerably over CONUS. Fortunately, neither MGA nor MNEG varies as much across CONUS as the other optical characteristics shown. Finally, Table 3 shows that the best single-characteristic solutions (i.e., using MGA or MNEG) outperformed the various multiple-characteristic solutions computed.

Table 3.

Example of the variability of the 7 optical characteristics across CONUS and the associated rms retrieval error. The MNEG was slightly better than MGA in most cases that we looked at, but MGA slightly beats MNEG in the case given below. Some multiple-characteristic solutions are shown (last three rows) for comparison.


10. Summary

It was pointed out in Koshak (2010) that the distributions of optical properties for ground and cloud flashes overlap extensively, and this makes flash-type discrimination fundamentally difficult. However, Koshak (2010) also showed that the mean optical properties of the ground and cloud distributions are quite distinct and therefore suggested that the mean data from a finite sampling of flashes should be closely examined to infer the relative frequency of ground and cloud flashes within the sample. This is the course of action we have undertaken in this paper.

We have introduced a theory for relating mean optical properties of a set of N flashes to the ground flash fraction. The fundamental equation is given in (3), and it is essentially an expression of mixtures. That is, when a set of ground flashes (having some mean optical characteristic) is mixed with a set of cloud flashes (having some distinctly different mean optical characteristic), the mean optical characteristic of the mixture depends on how many ground and cloud flashes are mixed. This mixture process is analogous to what one finds in basic chemistry where various chemical types are mixed together in a flask; in our problem, there are only two “chemical types” (i.e., ground flashes and cloud flashes).

Hence, average cloud-top lightning optical characteristics derived from OTD/LIS, or the future GLM, implicitly provide information about the relative number of ground and cloud flashes in a set of N observed flashes. The questions are as follows: how well can this information be extracted, and what is the best approach for doing so? This paper represents just an initial attempt to retrieve the ground flash fraction information. It is important to note that the retrieval problem is fundamentally difficult, and there are many different ways to approach the retrieval problem. The straightforward mean mixing theory used here is just one first-order approach, and we believe it will not be the final (optimal) approach. Nonetheless, it is an important first step that elucidates many aspects of the overall inverse problem.

Though the mixing theory in (3) is straightforward, the inclusion of practical errors and the desire to explore the use of multiple optical characteristics in a single retrieval complicate the mathematics. The generalization of (3) to include practical errors is given in (5), and this leads to the basic expression in (6) for retrieving the ground flash fraction. The formula for extending the retrieval to multiple optical characteristics is provided in (15). The relationship between the single and multiple optical characteristic solutions has been derived, and analytic expressions for the retrieval errors associated with each methodology are provided. Rather than using the multiple-characteristic solution process, we demonstrate that it is best to test several optical characteristics first (e.g., via simulated retrievals using the single-characteristic solution) and then choose from this set the best performer. In addition, solution nonuniqueness is discussed in detail, and practical illustrations of solution ambiguity are provided.

In an attempt to “pull everything together,” we provide graphical representations of the single-characteristic solution retrieval process. Figure 3 illustrates how the distributions of the mean ground and cloud flash optical characteristics “pick out” a solution given the mean mixture observation x. These plots are also well suited to illustrating the beneficial effects of the Central Limit Theorem; as the sample size N increases, the retrieval errors decrease.

To directly understand how well the single-characteristic solution retrieval works across CONUS, we tested it for 52 different regions using actual OTD lightning data. The OTD data were independently partitioned into ground and cloud flashes using NLDN data, so that the true ground flash fraction was known for each region. Comparing our retrieval results with the truth allowed us to compute retrieval errors. The rms errors were encouragingly small (no more than 11.1% of full scale in all cases, and as low as 6.1%). This implies that the sample ground and cloud flash CONUS means of MNEG are in fact reasonable respective estimates of the true ground and cloud flash mean MNEG at any given region (of the 52 regions we examined across CONUS).

If the CONUS means of MNEG (or MGA) are reasonable estimates of the true population mean MNEG (or MGA) values at arbitrary locations across the globe, then the single-characteristic solution retrieval method could in principle be applied worldwide. Even though the CONUS has diverse lightning, diverse thunderstorm types, and a variety of cloud morphologies with highly distinct light scattering properties, we have found that the mean MNEG (and MGA) optical properties do not fluctuate to the degree that would make ground flash fraction retrieval errors unacceptably large across CONUS. Nonetheless, we believe that an optimal retrieval algorithm would attempt to retrieve not only the ground flash fraction but also the specific population mean MNEG (or MGA) values for the arbitrary region of the world under consideration. Within the mathematical framework we have provided here, such an attempt leads to solution nonuniqueness, but other approaches might be possible.

Acknowledgments

This research has been supported by the NOAA/NESDIS/STAR GOES-R Risk Reduction Program under Space Act Agreement No. NA07AANEG0284 (Ms. Ingrid Guch, Chief, NOAA/NESDIS/STAR Cooperative Research Programs Division; Dr. Mark DeMaria, Chief, NOAA/NESDIS Regional and Mesoscale Meteorology Branch; and Dr. Steven J. Goodman, Senior (Chief) Scientist, GOES-R System Program), and by the Lightning Imaging Sensor (LIS) project (Program Manager, Ramesh Kakar, NASA Headquarters) as part of the NASA Earth Science Enterprise (ESE) Earth Observing System (EOS) project. We would also like to express our thanks to the NASA Postdoctoral Program (NPP) under which co-author Dr. Richard Solakiewicz served during a portion of this research effort.

APPENDIX

The Roots of the Discriminant Function

A general condition for describing when D = 0 can be obtained using (26). With A > 0 so that the multiple-characteristic solution in (15) is defined, the right-hand side of (26) is
ea1
This is zero if, and only if, each summand is zero; that is,
ea2
The right-hand side can be simplified to
ea3
But the left-hand side of (A3) is just the form of the single-characteristic solution given in (6). Hence, given A > 0 and a set of k = 1, … , n characteristics, if each single-characteristic solution is equivalent to the multiple-characteristic solution for the set, the discriminant must be zero. To summarize, this can be expressed as
ea4
For completeness, it is worth clarifying the effect of errors on the discriminant. By analytically propagating all the errors, it can be shown that the discriminant in (24) is zero when all the errors are zero. For brevity, we had not previously written (A, B, C, D) as explicit functions of the various errors, but of course these variables do depend on the errors in the problem. From (4), (11), (13), and (24) we have
ea5
By a considerable amount of algebra, the discriminant function can be written
ea6
where
ea7
The expression in (A6) is the error representation of the discriminant function. It explicitly shows that when all the errors are zero, the discriminant is zero; that is, D(x, e = 0, ɛ = 0) = 0. This conclusion can also be reached by straightforward algebra using the definitions in (A5) and the first equation in (3).

In addition, note that the discriminant can equal zero even when the errors are nonzero. For example, setting egk = eck = ɛk in (A5) and applying the first equation in (3) results in the nullification D = B² − 4AC = 4α²A² − 4A(α²A) = 0.

Hence, the errors alone are responsible for making D negative, but under the right conditions, nonzero errors can cancel out and thereby result in a zero discriminant. So in general, with an imperfect sensor (i.e., ɛ ≠ 0) and an imperfect estimation of x (i.e., e ≠ 0), the discriminant will be driven to a nonpositive value. As a last remark, note that the nonpositivity of the discriminant provided in (27) can also be proven using the Cauchy–Schwarz inequality
ea8
with Vk ≡ fgk − fck, Wk ≡ qk − fck, and the use of the definitions in (13).
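The behavior of the discriminant can be checked numerically. The sketch below rests on an assumption: the definitions in (13) are not reproduced in this section, so the forms A = Σ(fgk − fck)², B = −2Σ(fgk − fck)(qk − fck), and C = Σ(qk − fck)² are inferred from the identity D = B² − 4AC = 4α²A² − 4A(α²A) quoted above and from the Cauchy–Schwarz argument. The function name and the sample values are hypothetical.

```python
def discriminant(f_g, f_c, q):
    """Discriminant D = B^2 - 4AC for n optical characteristics.

    Assumed (inferred) definitions, with V_k = f_gk - f_ck and W_k = q_k - f_ck:
        A = sum(V_k^2),  B = -2 * sum(V_k * W_k),  C = sum(W_k^2).
    By the Cauchy-Schwarz inequality, D <= 0 for any real inputs.
    """
    v = [g - c for g, c in zip(f_g, f_c)]
    w = [qk - c for qk, c in zip(q, f_c)]
    a = sum(x * x for x in v)
    b = -2.0 * sum(x * y for x, y in zip(v, w))
    c = sum(y * y for y in w)
    return b * b - 4.0 * a * c


if __name__ == "__main__":
    f_g = [2.0, 5.0, 1.5]  # hypothetical ground flash means, one per characteristic
    f_c = [1.0, 3.0, 0.5]  # hypothetical cloud flash means
    alpha = 0.25
    # Error-free observations: every characteristic implies the same alpha.
    q_exact = [c + alpha * (g - c) for g, c in zip(f_g, f_c)]
    print(discriminant(f_g, f_c, q_exact))
    # Perturbing one observation makes the single-characteristic solutions disagree.
    print(discriminant(f_g, f_c, [1.25, 3.5, 1.0]))
```

When every single-characteristic solution agrees, the discriminant vanishes; any disagreement among them drives it negative, consistent with (A4) and (27).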

REFERENCES

  • Boccippio, D. J., K. L. Cummins, H. J. Christian, and S. J. Goodman, 2001: Combined satellite- and surface-based estimation of the intracloud-cloud-to-ground lightning ratio over the continental United States. Mon. Wea. Rev., 129, 108–122.

  • Koshak, W. J., 2007: OTD observations of continental US ground and cloud flashes. Proc. 13th Int. Conf. on Atmospheric Electricity, Beijing, China, ICAE, 823–826.

  • Koshak, W. J., 2010: Optical characteristics of OTD flashes and the implications for flash-type discrimination. J. Atmos. Oceanic Technol., 27, 1822–1838.

  • Mach, D. M., H. J. Christian, R. J. Blakeslee, D. J. Boccippio, S. J. Goodman, and W. L. Boeck, 2007: Performance assessment of the Optical Transient Detector and Lightning Imaging Sensor. J. Geophys. Res., 112, D09210, doi:10.1029/2006JD007787.
Save
  • Boccippio, D. J., Cummins K. L. , Christian H. J. , and Goodman S. J. , 2001: Combined satellite- and surface-based estimation of the intracloud-cloud-to-ground lightning ratio over the continental United States. Mon. Wea. Rev., 129, 108122.

    • Search Google Scholar
    • Export Citation
  • Koshak, W. J., 2007: OTD observations of continental US ground and cloud flashes. Proc. 13th Int. Conf. on Atmospheric Electricity, Beijing China, ICAE, 823–826.

    • Search Google Scholar
    • Export Citation
  • Koshak, W. J., 2010: Optical characteristics of OTD Flashes and the implications for flash-type discrimination. J. Atmos. Oceanic Technol., 27, 18221838.

    • Search Google Scholar
    • Export Citation
  • Mach, D. M., Christian H. J. , Blakeslee R. J. , Boccippio D. J. , Goodman S. J. , and Boeck W. L. , 2007: Performance assessment of the Optical Transient Detector and Lightning Imaging Sensor. J. Geophys. Res., 112, D09210, doi:10.1029/2006JD007787.

    • Search Google Scholar
    • Export Citation
