## 1. Introduction

Space launches and landings at the Kennedy Space Center (KSC) are subject to strict weather-related constraints (see, e.g., Bauman and Businger 1996). Nearly 75% of all space shuttle countdowns between 1981 and 1994 were delayed or scrubbed, with about one-half of these due to weather (Hazen et al. 1995). Of the various weather constraints, the primary weather challenge is to forecast lightning 90 min before a first strike and within a 20 n mi radius of the launch complexes (Fig. 1a). The National Lightning Detection Network indicates that this region has one of the highest lightning flash densities in the country, averaging 10 flashes per square kilometer per year, confirming that lightning has a significant impact on the Kennedy Space Center (Huffines and Orville 1999; Orville and Huffines 1999). The first concern is the safety of personnel working on the complex, and the next is protection for $10 billion rocket launching systems and platforms that include the space shuttle, Athena, Pegasus, Atlas, Trident II, and Titan IV. Last, delay costs can run anywhere from $90,000 for a 24-h delay to $1,000,000 if the shuttle must land at another facility and be transported back to the KSC (Bauman and Businger 1996).

Modeling and observational studies conclude that patterns and locations of Florida convection are related to the interaction of the synoptic wind field with the mesoscale sea breeze (Estoque 1962; Neumann 1971; Pielke 1974; Boybeyi and Raman 1992). The sea-breeze circulation and patterns of convection have different characteristics depending on whether the low-level flow has an onshore, offshore, or alongshore component with respect to Florida's east coast (Arritt 1993).

Onshore easterly flow typically generates less vigorous convection than offshore westerly flow (Foote 1991). However, onshore flow is characterized by a shallow low-level maritime moist layer, capped by a subsidence layer with dry conditions aloft, creating difficulties in predicting convection associated with this type of regime (Pielke 1974; Bauman et al. 1997). Blanchard and Lopez (1985) show that the majority of convection takes place in the sea-breeze and lake-breeze convergence zones. They state that deep convection is sparse and requires low-level forcing, which generally occurs only when the east coast sea breeze has advanced westward and merges with the west coast sea breeze. Although convection can develop independently of a sea-breeze frontal merger, it is usually weaker than when the fronts merge.

Reap (1994) found that southwesterly flow tends to be unstable and produce more lightning strikes along the Florida east coast than easterly flow. The southwesterly flow also contains deeper moisture and accounts for two-thirds of the lightning strikes during the summer at KSC. In contrast, easterly flow accounts for less than 5% of the total lightning flashes (Watson et al. 1991).

The International Station Meteorological Climate Summary (NCDC 1996) for the KSC (March 1968–February 1978) indicates an annual average of 76 days with thunderstorms. Most of the thunderstorms (81.2%) occur from May through the end of September. In fact, the 45th Weather Squadron (WS) at Patrick Air Force Base (AFB) issues more than 1200 lightning watches and warnings per year.

The 45th WS uses numerical weather prediction models and many observation systems to detect and predict lightning in support of the space center needs. The latter include satellite data, weather radars, rawinsondes, and five lightning detection systems. The lightning detection and ranging (LDAR) system is a seven-antenna radio-wave time-of-arrival system that provides a three-dimensional picture of in-cloud, cloud-to-cloud, cloud-to-clear-air, and cloud-to-ground lightning. The cloud-to-ground lightning surveillance system is a five-antenna magnetic direction finding system. The launch pad lightning warning system (LPLWS) is a network of 31 surface electric field mills. The National Lightning Detection Network (NLDN) is a national network of magnetic direction finding and time-of-arrival antennas. The A. D. Little, Inc., sensor is an older system using one antenna to estimate the lightning distance from the magnetic pulse change. Most of these lightning detection systems are more fully described by Harms et al. (1997).

For the 1999 thunderstorm season, the 45th Weather Squadron's capability to forecast thunderstorms detected by the NLDN and LPLWS is 97.5%, of which 79.1% meet the desired 90-min lead time (NCDC 1996). The KSC false alarm rate is 43.2%. There is room for improvement in these statistics, particularly in reducing the false alarm rate.

### a. GPS and the role of water vapor

The water molecule has an asymmetric distribution of charge that results in a permanent dipole moment. This unique structure results in the large latent heat associated with changes of phase, makes possible the phenomenon of lightning, and allows integrated precipitable water vapor (IPWV) to be monitored using the global positioning system (GPS).

Several mechanisms have been proposed to account for the generation of electrical charge separation in clouds. In general, these theories recognize the importance of microphysical processes involving water in its various phases (e.g., Willis et al. 1994).

Bevis et al. (1992, 1994) describe the methodology for using GPS to monitor atmospheric water vapor from ground-based GPS sites and present an error analysis. Duan et al. (1996) provide the first direct estimation of IPWV by eliminating the need for any external comparison using water vapor radiometer observations (Rocken et al. 1993). Businger et al. (1996) describe meteorological applications of atmospheric monitoring by GPS for use in weather and climate studies and in numerical weather prediction models.

The National Oceanic and Atmospheric Administration (NOAA) Forecast Systems Laboratory (FSL) established the first GPS network dedicated to atmospheric remote sensing of water vapor (Wolfe and Gutman 2000). Since its inception in 1994, the NOAA GPS network has been steadily expanding, with GPS receivers sited or planned in all 50 states. Florida, like many other states, has plans to develop a relatively large network of GPS receivers for applications other than weather forecasting (Fig. 1b). The dual use of these sites, however, is expected to provide valuable data for the improvement of short-term cloud and precipitation forecasts, with consequent improvements in transportation safety. Recently, a sliding window technique for processing GPS data was developed jointly at the University of Hawaii and the Scripps Institution of Oceanography, University of California at San Diego. The method has been implemented operationally at FSL and provides estimates of IPWV every 30 min, with about 18-min processing time or latency, and an rms accuracy of ∼3.5% of the mean value of IPWV (Fig. 2). GPS IPWV data are available in near–real time on the Web (courtesy of NOAA/FSL at http://gpsmet.fsl.noaa.gov/realtimeview/jsp/rti.jsp).

In this paper we describe a statistical approach that combines IPWV data from a GPS site located at the Kennedy Space Center with other meteorological data to develop a new GPS lightning index. The goal of this effort is to improve the skill in forecasting a first strike at the Kennedy Space Center.

## 2. Data resources

The thunderstorm season at the KSC is from May through September. Data for the 1999 summer season were divided into two periods: period A (14 April–9 June) and period B (10 June–26 September). These particular dates were chosen on the basis of the distribution of thunderstorms and the data availability associated with instrumentation down time.

The period B data were used to create the logistic regression model that composes the GPS lightning index. Period B data contained 46 event days and provided robust predictor data. During this period, GPS IPWV values are much higher and show more variability than in the winter and are somewhat higher than during period A. The period A data were reserved for an independent test using the GPS lightning index results.

The Kennedy Space Center has a dense array of weather sensors. One of the challenges of this research is determining which meteorological variables would add skill in a lightning prediction index. Twenty-three potential predictors were initially evaluated (Table 1). The availability of real-time GPS IPWV was the primary motivation for undertaking the research presented in this paper. Since October 1998, GPS IPWV data have become available with 30-min temporal resolution for a GPS site located at 28.48°N, 80.38°W, roughly the center of the Cape Canaveral landmass just north of the primary landing strip. Data for this research cover 1 yr from October 1998 to October 1999. The GPS site had missing data, mostly due to communications problems between the site and the facility that collects the data for NOAA/FSL. Any day that had a total of more than 4 h of missing data was eliminated from the analysis.

The Kennedy Space Center data resources offered benefits that make this study possible (Fig. 1a). Upper-air soundings are often taken more than twice a day, depending on launches and weather conditions, so there is a slight increase in the temporal resolution. Also, there are U.S. Air Force weather observers at Cape Canaveral Air Station 24 h a day. Adding a human element to the observation codes, especially in the remarks section, provided an increase in understanding of the meteorological conditions.

Electric field mills provided another source of data in this investigation. There are 31 field mills that measure the electric potential of the atmosphere in volts per meter (V m^{−1}) every 5 min. The maximum field mill value was used for the 30-min window ranging from the top of the hour until the half-hour mark. This maximum value was assigned to the GPS IPWV value taken 15 min after the hour. Likewise, the maximum value in the window from the half-hour mark to the top of the next hour was assigned to the GPS IPWV value taken 45 min after the hour. Typical fair weather electric field mill values ranged from 70 to 800 V m^{−1}. During inclement weather when the potential for lightning existed, values would increase substantially, sometimes reaching values of 12 000 V m^{−1} during a lightning event. The only suspect values were around 1000 UTC [0500 eastern standard time (EST)]. From a normal field of 100–200 V m^{−1} just before 1000 UTC, field mill values would jump, sometimes up to 3000 V m^{−1}, with no apparent meteorological cause. Marshall et al. (1999) explain this sunrise effect as the local, upward mixing of the denser, low-lying, electrode-layer charge.
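The pairing of 5-min field mill readings with the 30-min GPS IPWV times can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: the function name `window_max_by_gps_time`, the `(minute_of_day, value)` input layout, the choice to place a reading at exactly the half-hour mark in the second window, and the sample readings are all hypothetical.

```python
from collections import defaultdict


def window_max_by_gps_time(readings):
    """Map each 30-min window to its maximum field mill value.

    readings: iterable of (minute_of_day, volts_per_meter) tuples at 5-min
              spacing (hypothetical layout chosen for this sketch).
    Returns {gps_minute_of_day: max_value}, where the key is the GPS IPWV
    time at 15 or 45 min past the hour (the midpoint of the window).
    """
    maxima = defaultdict(lambda: float("-inf"))
    for minute, value in readings:
        # Assumption: a reading at exactly :30 belongs to the second window.
        window_start = (minute // 30) * 30
        gps_minute = window_start + 15
        maxima[gps_minute] = max(maxima[gps_minute], value)
    return dict(maxima)


# Illustrative values only (not data from the study).
sample = [(0, 120.0), (5, 150.0), (25, 140.0), (30, 900.0), (55, 3000.0)]
result = window_max_by_gps_time(sample)
```

With the sample above, the first window is paired with the GPS value at 15 min past the hour and the second with the value at 45 min past the hour.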

Other data used include the *K* index (KI), freezing level from rawinsondes, and surface temperature, dewpoint, pressure, and wind direction taken from station observations. The KI considers the static stability of the 850–500-mb layer. The KI is given by

$$\mathrm{KI} = (T_{850} - T_{500}) + T_{d850} - (T_{700} - T_{d700}),$$

where *T*_{850} and *T*_{d850} are the dry-bulb temperature and dewpoint temperature at 850 mb, and *T*_{500} is the dry-bulb temperature at 500 mb. The quantity *T*_{700} − *T*_{d700} is the 700-mb dewpoint depression. In order for the KI to correspond with the 30-min GPS temporal resolution, KI values were interpolated linearly between sounding times.
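The KI calculation and the linear interpolation between soundings can be illustrated with a short sketch. The function names, the use of decimal hours for time, and the sounding values are assumptions made for this example and are not taken from the study.

```python
def k_index(t850, t500, td850, t700, td700):
    """K index from temperatures and dewpoints in degrees Celsius."""
    return (t850 - t500) + td850 - (t700 - td700)


def interpolate_linear(t, t0, v0, t1, v1):
    """Linearly interpolate a value at time t between (t0, v0) and (t1, v1)."""
    if t1 == t0:
        raise ValueError("sounding times must differ")
    frac = (t - t0) / (t1 - t0)
    return v0 + frac * (v1 - v0)


# Hypothetical soundings at 1000 and 2200 UTC (not data from the study).
ki_first = k_index(t850=18.0, t500=-8.0, td850=14.0, t700=8.0, td700=2.0)
ki_second = k_index(t850=19.0, t500=-7.0, td850=16.0, t700=9.0, td700=6.0)

# KI assigned to a GPS IPWV time of 1600 UTC, halfway between the soundings.
ki_at_1600 = interpolate_linear(16.0, 10.0, ki_first, 22.0, ki_second)
```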

Lightning detection and ranging (LDAR) data provide 80%–90% detection efficiency of cloud-to-ground strokes (Cummins et al. 1998). LDAR was used as ground truth to verify when and where a lightning event occurred. LDAR data are voluminous; the sensors detect point discharges. With a time resolution on the order of milliseconds, one lightning flash can have up to 20 000 LDAR points, and one thunderstorm can have thousands of flashes, so one thunderstorm can have up to tens of millions of LDAR points. These LDAR points (in meters) are ranged from a central site in the *x*, *y*, and *z* directions (Maier et al. 1996). A point is classified as belonging to a new flash if it occurs more than 300 ms after, or more than 5000 m from, the previous point. In addition, a flash must comprise two or more points that meet these criteria, that is, points occurring within 300 ms in time and nearer than 5000 m in space. These are the same criteria that the National Weather Service, Melbourne, Florida, uses to verify step-leader points as a lightning flash. In the research presented in this paper, the first strike was verified using LDAR data and was matched to the nearest corresponding GPS IPWV data.
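The flash criteria described above (300 ms, 5000 m, at least two points) can be sketched as a simple sequential grouping. This is an illustrative reading of the criteria, not the operational algorithm; the tuple layout `(t_seconds, x_m, y_m, z_m)`, the function name, and the sample points are assumptions.

```python
import math

MAX_GAP_S = 0.300    # 300 ms between consecutive points
MAX_DIST_M = 5000.0  # 5000 m between consecutive points


def group_flashes(points):
    """Group LDAR points (t_seconds, x_m, y_m, z_m) into flashes.

    A point joins the current group when it is within 300 ms and 5000 m of
    the previous point; otherwise it starts a new group. Groups with fewer
    than two points are discarded.
    """
    groups = []
    current = []
    for point in sorted(points):  # tuples sort by time first
        if current:
            prev = current[-1]
            dt = point[0] - prev[0]
            dist = math.dist(point[1:], prev[1:])
            if dt > MAX_GAP_S or dist > MAX_DIST_M:
                groups.append(current)
                current = []
        current.append(point)
    if current:
        groups.append(current)
    return [g for g in groups if len(g) >= 2]


# Illustrative points only (not data from the study).
sample_points = [
    (0.000, 0.0, 0.0, 5000.0),
    (0.010, 100.0, 0.0, 5200.0),
    (0.020, 200.0, 50.0, 5100.0),
    (1.000, 0.0, 0.0, 5000.0),      # isolated point, discarded
    (2.000, 9000.0, 0.0, 6000.0),
    (2.050, 9100.0, 0.0, 6100.0),
]
flashes = group_flashes(sample_points)
```

Comparing each point only with its immediate predecessor is a simplification; a production scheme might compare against every point in the current flash.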

## 3. Development of a lightning prediction index

### a. Logistic regression

Regression methods provide the best opportunity for data analysis concerned with describing the relationship between a response variable and one or more predictor variables. Since the event to be forecast was the first strike of a lightning event, a binary logistic regression model was chosen as opposed to a linear regression model.

What distinguishes a logistic regression model from a linear regression model is that the outcome variable in logistic regression is binary or dichotomous (Hosmer and Lemeshow 1989). The two outcomes are "yes," the lightning event occurred, or "no," it did not.

The quantity *π*_{j} = *E*(*Y*|*x*_{j}) represents the conditional mean of a lightning strike (*Y*) given a predictor (*x*) when the logistic distribution is used. The specific form of the logistic regression model is

$$\pi_j = \frac{\exp(\beta_0 + \beta x_j)}{1 + \exp(\beta_0 + \beta x_j)},$$

where *π*_{j} is the probability of a response for the *j*th covariate, *β*_{0} is the intercept, *β* is a vector of unknown coefficients associated with the predictor, and *x*_{j} is a predictor variable associated with the *j*th covariate. Next, Hosmer and Lemeshow (1989) use a logit transformation of *π*_{j} defined as

$$g(\pi_j) = \ln\left(\frac{\pi_j}{1 - \pi_j}\right) = \beta_0 + \beta_j X_j.$$

The transformed quantity *g*(*π*_{j}) has many of the desirable properties of a linear model. The logit *g*(*x*) is linear in its parameters, may be continuous, and may range from −∞ to +∞, depending on the range of *x*.
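The link between the logit and the logistic form can be made explicit with a short worked step. Writing *g* for the value of the logit (a shorthand introduced here for convenience), solving for the probability gives

$$g = \ln\left(\frac{\pi_j}{1 - \pi_j}\right) \;\Longrightarrow\; \frac{\pi_j}{1 - \pi_j} = e^{g} \;\Longrightarrow\; \pi_j = \frac{e^{g}}{1 + e^{g}} = \frac{1}{1 + e^{-g}}.$$

As $g \to -\infty$ the probability tends to 0, and as $g \to +\infty$ it tends to 1, so an unbounded linear predictor always maps to a probability between 0 and 1.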

### b. The predictors

The initial set of predictors included 23 variables (Table 1), which were tested for their potential contribution to the regression. In the table, changes in IPWV with time (ΔIPWV) for periods ranging from 1 to 12 h are designated as variables 2 through 13. The purpose of including changes in IPWV is to capture differences in the airmass moisture over a number of time intervals. Positive values indicate that IPWV has increased over the time interval.

The logistic regression model applied to these various data provides estimates of the coefficients, standard error of the coefficients, *p* values, *t* ratio, and odds ratio. The coefficient of each predictor is the estimated change in the link function with a one-unit change in the predictor, assuming all other factors and covariates are the same. The last three statistics are described briefly below. The reader is referred to Wilks (1995) for additional discussion and examples.

The *p* value, also known as rejection level or level of the test, is a probability with values ranging from 0 to 1. If the *p* value is small, it means that the difference between the sample means is unlikely to be a coincidence and that the parameter may have a statistical significance. Statistical hypothesis testing is carried out by setting up a null hypothesis. The null hypothesis is rejected if the probability of the observed test statistic is less than or equal to the test level. The test level is chosen in advance of the computations; commonly, the 5% level is chosen. In the work described in this paper, a more stringent test level of 1% (99% significance level) was chosen. Predictors that did not meet the 99% significance level, as determined by the *p* value in the model results, were eliminated.

Of the initial 23 variables, this test left only four predictors that met the 99% significance level: electric field mill maximum (V m^{−1}), GPS IPWV, 9-h ΔIPWV, and KI (Table 2). If we set the coefficients to zero as the null hypothesis, the estimated coefficients show that the four remaining predictors all have a *p* value ≤ 0.01. This indicates that the parameters are not zero with a 99% significance level, and we can reject the null hypothesis and use the estimated coefficients. Given the fact that three of the variables are sensitive to moisture, it is important to note that they are not well correlated. The highest correlation coefficient was 0.47 between IPWV and KI.

The *t* ratio, also known as the *t* test or *z* value, is an implicit test that bears directly on the meaningfulness of the fitted regression. The *t* ratio is obtained by dividing the coefficient by its standard deviation. Dividing by the standard deviation weights the accuracy of the coefficient. Smaller standard deviations lead to larger *t* ratios, positive or negative. If the estimated *t* ratio (or slope) is too small, then the regression is not informative. The magnitude of the *t* ratios in Table 2 provides evidence that the coefficients are significant and belong in the GPS lightning index.
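The screening step described above can be sketched as follows: a coefficient is divided by its standard deviation to form the ratio, and a two-sided *p* value is compared with the 1% test level. The sketch assumes the large-sample normal approximation for the ratio; the coefficient and standard deviation shown are hypothetical and are not the entries of Table 2.

```python
import math

TEST_LEVEL = 0.01  # 1% test level used in the text


def t_ratio(coefficient, standard_deviation):
    """Ratio of a fitted coefficient to its standard deviation."""
    return coefficient / standard_deviation


def two_sided_p_value(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical fitted values (not from Table 2).
z = t_ratio(coefficient=0.322, standard_deviation=0.08)
p = two_sided_p_value(z)
keep_predictor = p <= TEST_LEVEL
```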

The odds ratio is a fundamental method for studying categorical data. In this approach, one first calculates the odds of success instead of failure for one group and the odds of success instead of failure for the other group, and then the ratio of the two sets of odds is taken. The odds ratio is a measure of association in this research. It approximates how much more likely (or unlikely) it is that a predictor contributes to the outcome of a lightning strike (Hosmer and Lemeshow 1989). Review of the odds ratios in Table 2 indicates that some predictors have a greater impact than others. An odds ratio very close to 1 indicates that a one-unit increase minimally affects a lightning event. A more meaningful difference is found with 9-h ΔIPWV. An odds ratio of 1.38 indicates that the odds of a lightning event increase by 1.38 times with each unit increase in 9-h ΔIPWV. The odds ratios calculated for changes in IPWV over shorter and longer time intervals are less and are discussed in the next section.
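In a logistic regression, the odds ratio for a one-unit change in a predictor equals the exponential of that predictor's coefficient. The sketch below uses the reported odds ratio of 1.38 for 9-h ΔIPWV to show what this implies; the coefficient is back-calculated from that odds ratio rather than read from Table 2, and the function names are illustrative.

```python
import math


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor with coefficient beta."""
    return math.exp(beta)


def odds_multiplier(ratio, delta):
    """Factor by which the odds change for an increase of `delta` units."""
    return ratio ** delta


# Coefficient implied by the reported odds ratio (back-calculated, an assumption).
beta_9h = math.log(1.38)

# A two-unit increase in 9-h dIPWV multiplies the odds by 1.38 squared.
two_unit_factor = odds_multiplier(odds_ratio(beta_9h), 2)
```

An odds ratio close to 1 corresponds to a coefficient close to 0, which is why such predictors have little effect per unit change.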

### c. Relationship between predictors and predictand

The relationship between time series of discrete events can be studied by a technique known as the superposed epoch method (Panofsky and Brier 1958). The first strike of lightning is defined as a discrete event since it is the critical component of the forecast for the KSC. Since lightning occurs at various times of the day, the superposed epoch method creates composites of the data surrounding the 27 lightning events during the thunderstorm period. For each lightning event the time of first strike was denoted as *T*_{0}. Hours prior to that key time were denoted as *T*_{0–1}, *T*_{0–2}, *T*_{0–3}, and so on. Figure 3a depicts the composite GPS IPWV values leading up to the first strike. The general increase in IPWV suggests a correlation between increasing IPWV values and the time of the first strike. Contrary to GPS IPWV, electric field mill values show random fluctuations leading right up to the first strike when the field mill values spike up to indicate a lightning event has occurred. In this case, there is very little warning time for forecasting lightning events. This predictor remains in the model because of its relationship with lightning 90 min prior to the first strike (Fig. 3a). Although this figure shows slight increases up until the first strike itself, in the 5-min resolution (raw), the increases are much more dramatic (not shown).
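The compositing step of the superposed epoch method can be sketched as an average over events at each lag before the key time. The data layout (a list of equally spaced values plus the index of the first strike for each event), the function name, and the sample values are assumptions made for illustration.

```python
def superposed_epoch(events, max_lag):
    """Composite values at lags 0..max_lag before each event's key time.

    events: list of (values, key_index) pairs, where `values` is an equally
            spaced series and `key_index` marks the first strike (T0).
    Returns a list whose element k is the mean over events of the value
    k steps before T0, or None when no event has data at that lag.
    """
    composite = []
    for lag in range(max_lag + 1):
        samples = [values[key - lag] for values, key in events if key - lag >= 0]
        composite.append(sum(samples) / len(samples) if samples else None)
    return composite


# Illustrative IPWV-like series in mm (not data from the study).
sample_events = [
    ([30.0, 32.0, 35.0, 38.0], 3),
    ([33.0, 34.0, 36.0], 2),
]
composite = superposed_epoch(sample_events, max_lag=2)
```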

Days containing no lightning at all, nonevent days, have IPWV values that hover around 35 mm. The nonevent graph depicts average GPS IPWV values for 20 days during the thunderstorm period (Fig. 3b). As seen in the graph, the 24-h run is relatively flat, exhibiting minor fluctuations, including a subtle nocturnal decline from 0315 UTC (2215 EST) until 1115 UTC (0615 EST) and a rise to 1815 UTC (1315 EST) that can be attributed to solar heating.

A series of scatter diagrams was used to document relationships between the predictors and the lightning events. The scatterplots show data for all thunderstorm days. Only the 30-min data up to the time of the first lightning strike are plotted. No data following the first strike are included in the plots. Lightning events were considered independent if there was a 12-h period between the end of one lightning event and the start of another. Figure 4a shows that GPS IPWV values >35 mm are more conducive to lightning strikes. Conversely, no first-strike events were noted when the GPS IPWV values were <33 mm.

When the KI is plotted with electric field mill data (Fig. 4b), a clear bias is present. Lightning events are much more likely to occur when the KI is ≥26. This stability index proved more relevant than the total totals (TT) index. A study on nowcasting convective activity for the KSC conducted by Bauman et al. (1997) concluded that of all the stability indices, only the KI was found to have a modest utility in discriminating convective activity. This can be attributed to the fact that the KI captures a moisture layer from 850 to 700 mb, as opposed to just one reference point of 850-mb dewpoint temperature in the TT index. Typical Cape Canaveral KI values ranging from 26 to 30 yield an airmass thunderstorm probability of 40%–60%. A value of 31–35 yields a probability range of 60%–80%, and 36–40 yields a probability of 80%–90%. Values for this study hover around 50%–60%, which represent odds similar to those of flipping a coin.

Initially, changes of IPWV over 1 h were used (1-h ΔIPWV). Eventually, the time interval was extended to 12 h (12-h ΔIPWV). Again, the superposed epoch method was applied to the data. Use of the 1- to 12-h ΔIPWV predictors enabled the GPS lightning index to capture the IPWV changes of the air mass. As it turned out, the 9-h ΔIPWV predictor had more impact statistically than the other ΔIPWV predictors. Figure 5a shows that the odds ratio reaches a convincing maximum for 9-h ΔIPWV, with lower odds ratios associated with both longer and shorter time intervals over which the change in IPWV is taken. Looking at field mill values and 1-h ΔIPWV (Fig. 4c), values associated with lightning are scattered on either side of the zero line with the average value around 2.5 mm. In the 9-h ΔIPWV plot (Fig. 4d), by comparison, average values are double the 1-h ΔIPWV averages, and no lightning strikes occurred below the zero line. Other intervals, such as 6-h ΔIPWV (not shown), do show a tendency of increased lightning as IPWV increased. However, the 9-h ΔIPWV exhibits the most prominent increase of IPWV in the 5 h prior to the first strike (Fig. 5b), and the 9-h ΔIPWV prevails statistically as the best predictor in the logistic regression model (Fig. 5a).

The 9-h ΔIPWV examines changes in IPWV associated with meteorological events that span the course of 9 h. Figure 6 shows that most of the first lightning events occur during the mid- to late afternoon. Mechanisms linked to this time frame may include effects associated with the diurnal cycle of solar heating and moisture properties associated with the sea breeze and, as discussed in the introduction, the impact of synoptic circulations. The relationship between change in GPS IPWV and lightning occurrence proves a little more challenging to explain. Increases in the 9-h ΔIPWV likely indicate an increase in midlevel moisture, which plays an important role in the stability of convective clouds. Moist air (as opposed to dry air) entrained into developing cumulus clouds results in less reduction of buoyancy from evaporative cooling.

A possible mechanism for increased midlevel moisture is the interaction of various mesoscale boundaries associated with the geography and diurnal circulations over central Florida. Sea breezes originating along the west coast advance and interact with east coast sea breezes, resulting in enhanced convergence when they meet. The 9-h ΔIPWV predictor could be detecting the increased moisture associated with sea-breeze fronts.

Other mechanisms that affect changes in IPWV include deeper moisture associated with southwesterly flow regimes, indicating an increase in the maritime moist layer (Reap 1994). Another important mechanism is divergence aloft associated with the passage of weak jet streaks or flow enhancements aloft (Bauman et al. 1997). Divergence aloft is associated with jet entrance and exit regions and draws moisture up to midlevels in the troposphere. These events create environmental conditions favorable to convective activity and lightning, which are modulated by the diurnal cycle. It should be noted that the current analysis does not definitively identify the underlying physical processes that select for 9-h ΔIPWV as a predictor; consequently, the current analysis has not ruled out the possibility that the selection of 9-h ΔIPWV as a predictor could be the result of random variability. This remains a goal for future research.

## 4. Assessing the utility of the GPS lightning index

To determine the effectiveness of the GPS lightning index in describing the outcome variable, the fit of the estimated logistic regression must now be assessed. This is referred to as goodness of fit. Hosmer and Lemeshow (1989) recommend three methods to determine goodness of fit: Pearson residual, deviance residual, and the Hosmer–Lemeshow test (Table 3). They also introduce a decile-of-risk method for observed and expected frequencies (Table 4), as well as measures of association between the response variable and the predicted probabilities (Table 5).

The *p* values range from 0.605 to 1.000 for the Pearson and deviance residuals and for the Hosmer–Lemeshow tests (Table 3). These large values give no evidence of an inadequate fit, indicating that the model fits the data adequately. If the *p* values were less than the accepted level (0.05), the tests would indicate an inadequate model fit.

The results of applying the decile-of-risk grouping strategy to the estimated probabilities computed from the model for lightning strikes are given in Table 4. The data in Table 4 are grouped by their estimated probabilities from lowest to highest in 0.1 increments. Thus, group 1 contains the data with the lowest estimated probabilities (≤0.1) while group 10 contains data with the highest estimated probabilities (>0.9). Since the total number of observations is 995, each decile group total must be evenly distributed for proper comparison. Therefore, the Hosmer and Lemeshow strategy breaks down each group into a total of 99 or 100 observations.

The following will help to explain the meaning of Table 4. The observed frequency in the yes (*y* = 0, a lightning strike) group for the seventh decile (≤0.7) of risk is 26, meaning that there were 26 lightning events actually observed from the seventh decile group. These are the events that have an estimated probability of occurring of ≤0.7. In a similar fashion, the corresponding estimated expected frequency for this seventh decile is 25.8, which is the sum of the modeled probabilities for these lightning events to occur. The observed frequency for the no lightning (*y* = 1) group is 99 − 26 = 73, and the estimated frequency is 99 − 25.8 = 73.2. Table 4 provides sufficient evidence that the model does fit the data well because the observed and expected frequencies are very close.

The values in Table 5 are calculated by pairing the observations with different response values. Here 221 yes lightning strikes and 774 no lightning events were recorded during the thunderstorm period. This results in 221 × 774 = 171 054 pairs with different response values. Based on the GPS lightning index, a pair is concordant if the observation with a lightning strike has the higher estimated probability of a strike, discordant if the opposite is true, and tied if the probabilities are equal. These values are used as a comparative measure of prediction. Measures of association (Table 5) show the number and percentage of concordant, discordant, and tied pairs. These values measure the association between the observed responses and the predicted probabilities.
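The pair counting behind the measures of association can be sketched directly. Each lightning observation is paired with each no-lightning observation, and the estimated probabilities of a strike are compared. The function name and the sample probabilities are illustrative assumptions; only the counts 221 and 774 come from the text.

```python
def pair_association(strike_probs, no_strike_probs):
    """Count concordant, discordant, and tied pairs.

    strike_probs:    estimated strike probabilities for observations with lightning.
    no_strike_probs: estimated strike probabilities for observations without.
    """
    concordant = discordant = tied = 0
    for p_yes in strike_probs:
        for p_no in no_strike_probs:
            if p_yes > p_no:
                concordant += 1
            elif p_yes < p_no:
                discordant += 1
            else:
                tied += 1
    return concordant, discordant, tied


# Illustrative probabilities only (not data from the study).
counts = pair_association([0.8, 0.6, 0.3], [0.2, 0.6])

# Number of pairs with different responses for the counts quoted in the text.
total_pairs = 221 * 774
```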

### a. Testing the GPS lightning index

The GPS lightning index is computed from the fitted logistic regression model as

$$\hat{y} = \frac{1}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4)}, \tag{5}$$

where *ŷ* is the predictand (index value), *β* is the coefficient for each predictor, *x* is the value of the predictor, and the subscripts indicate which predictor it is for. In this case, the GPS lightning index coefficients for each predictor listed in Table 2 are substituted into Eq. (5). The meaning of this equation is most easily understood in the limits, as (*β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}*x*_{3} + *β*_{4}*x*_{4}) → ±∞. As the exponential function in the denominator becomes arbitrarily large, the predicted value approaches 0, indicating a lightning strike. As the exponential function in the denominator approaches 0, the index value approaches 1, indicating a nonevent. Thus, it is guaranteed that the logistic regression will produce properly bounded probability estimates. The index value was calculated for the entire dataset for both test periods.

The index time series for each day during the thunderstorm period were reviewed to identify recurring patterns and the best index threshold value (ITV). Index values for nonevent days typically fluctuate very close to 1.0 (Fig. 7). When the index value falls below 0.7, lightning events follow, with few false alarms (discussed later in section 4d). An ITV of 0.7 proved to be best suited for a forecast threshold. A level of 0.8 often recovered to 0.9, indicating a nonlightning event, while a level of 0.6 provided insufficient lead time before the first strike. A running mean time series of index values 10–12 h prior to the first strike graphically captures the predictive value of the GPS lightning index. Figures 7c–e depict typical lightning-event days. In these cases, the ITV of 0.7 was reached up to 10 h prior to the first strike.
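The way an index time series is turned into a forecast can be sketched as follows: the index is evaluated from a linear combination of predictors, and the first time it falls below the ITV of 0.7 is compared with the time of the first strike. The coefficients are left as inputs because Table 2 is not reproduced here; the function names, the use of decimal hours, and the sample series are assumptions made for illustration.

```python
import math

ITV = 0.7  # index threshold value from the text


def index_value(beta0, betas, xs):
    """Index = 1 / (1 + exp(beta0 + sum(beta_k * x_k)))."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    if eta > 700.0:  # avoid OverflowError in math.exp; the limit is 0
        return 0.0
    return 1.0 / (1.0 + math.exp(eta))


def first_crossing(times, values, threshold=ITV):
    """Return the first time at which the index is below the threshold."""
    for t, v in zip(times, values):
        if v < threshold:
            return t
    return None


# Illustrative index series (hours UTC, index value); not data from the study.
times = [8.0, 8.5, 9.0, 9.5]
values = [0.95, 0.82, 0.66, 0.40]
first_strike = 14.0

crossing = first_crossing(times, values)
lead_time_h = first_strike - crossing
meets_lead_time = lead_time_h >= 1.5  # 90-min desired lead time
```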

### b. Categorical forecasts for period B

Forecast verification is needed to test the predictive accuracy of the GPS lightning index. Anytime the index value fell below the ITV of 0.7 and up to 90 min prior to first strike (meeting 90-min desired lead time), it was counted as a yes forecast. A contingency table (Table 6) is used to evaluate the GPS lightning index's prediction capabilities (Wilks 1995). For the thunderstorm period (subscript *n*), a total of 46 days were evaluated. Twenty-five thunderstorm days were observed and forecast by the GPS lightning index (quadrant *a*). Five thunderstorm days were forecast to occur but did not (quadrant *b*). Three storm days were observed to occur but the model failed to respond (quadrant *c*). In 13 remaining days the model did not forecast a lightning event and none was observed (quadrant *d*). The data from these quadrants are now used to determine the accuracy measures for a binary forecast. In particular the false alarm rate (FAR) is a measure of the forecast events that fail to materialize: FAR = *b*/(*a* + *b*). The probability of detection (POD) is a measure of the forecast events that did occur: POD = *a*/(*a* + *c*). The most direct measure of the accuracy of categorical forecasts is the hit rate: hit rate = (*a* + *d*)/*n*. A frequently used alternative to the hit rate, particularly when the event to be forecast (as the “yes” event) occurs substantially less frequently than the nonoccurrence (“no” event), is the threat score (TS): TS = *a*/(*a* + *b* + *c*). The results of these calculations are shown in Table 7. The GPS lightning index proved its utility, particularly in the area of FARs.
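The four accuracy measures can be computed directly from the quadrant counts. The sketch below uses the period B counts quoted in the text (*a* = 25, *b* = 5, *c* = 3, *d* = 13); the function name is illustrative, and the printed values are computed here rather than copied from Table 7.

```python
def categorical_scores(a, b, c, d):
    """Accuracy measures for a 2 x 2 contingency table.

    a: event forecast and observed      b: event forecast, not observed
    c: event observed, not forecast     d: event neither forecast nor observed
    """
    n = a + b + c + d
    return {
        "FAR": b / (a + b),
        "POD": a / (a + c),
        "hit_rate": (a + d) / n,
        "TS": a / (a + b + c),
    }


period_b = categorical_scores(a=25, b=5, c=3, d=13)
for name, value in period_b.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts the FAR evaluates to 5/30, or about 16.7%, which the text quotes as 16.6%.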

Although period-B data are not statistically independent, the application of the GPS lightning index to these data reduced the FAR to 16.6%. This is ∼27 percentage points below the KSC's previous FAR. The POD result was only 8 percentage points less than the KSC POD for the 1999 season. In making these comparisons it should be noted that the time window of the GPS lightning index is 12.5 h. The time window associated with KSC forecasts varies with synoptic situation but is 4–6 h on average.

### c. Categorical forecasts for the independent period A

Using the same 90-min desired lead time and the ITV criterion of 0.7 for the independent period A (subscript *i*, Table 6), a total of 21 days were evaluated. Seven thunderstorm days were observed and forecast by the GPS lightning index (quadrant *a*). Three thunderstorm days were forecast to occur but did not (quadrant *b*). One storm day was observed to occur but the model failed to respond (quadrant *c*). On the 10 remaining days the model did not forecast a lightning event and none was observed (quadrant *d*). The data from these quadrants are again used to determine the accuracy measures for a binary forecast (FAR and POD). The results of these calculations are shown in Table 8.

The GPS lightning index FAR for the independent period A was 13.2 percentage points lower than the KSC result of 43.2% (Table 8). POD was down by only 10%. These measures could easily be improved by adding a forecaster's input of additional knowledge. Reference to satellite and Doppler radar data would give a forecaster the benefit of knowing the tracks and intensities of thunderstorms moving into the area. The GPS lightning index's capability to improve FARs would enhance mission readiness. Mission functions that cease for lightning would not be delayed by a forecast of lightning that does not occur.
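As a quick arithmetic check, assuming the period-A quadrant counts quoted above (*a* = 7, *b* = 3, *c* = 1, *d* = 10), the FAR and POD definitions give:

$$
\mathrm{FAR}_i = \frac{b}{a+b} = \frac{3}{7+3} = 30.0\%, \qquad
\mathrm{POD}_i = \frac{a}{a+c} = \frac{7}{7+1} = 87.5\%.
$$

The difference from the KSC FAR is then 43.2% − 30.0% = 13.2 percentage points.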

Results for period B are slightly better than for period A. This is attributed to the seasonal availability of atmospheric moisture during the summer. The GPS trends show an increase in the amount of IPWV as well as more fluctuations during period B.

As with any attempt to forecast a meteorological event, timing is critical. The lightning model output indicates a potential when the ITV is met. To get a better understanding of how the GPS lightning index performs with regard to the timing of the first strike, the distribution of lead times once the ITV is met is plotted in Fig. 8a. A wide range of lead times (0–12 h) is observed. The lead times display an approximately normal distribution. The majority of the lead times fall between 3 and 7.5 h.
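The lead-time calculation can be illustrated with a short Python sketch. It is not the authors' procedure: the function names, the half-hourly sampling, the date, and the index values below are hypothetical, and the sketch assumes that the ITV is "met" when the index is at or below 0.7 and that only a single day's series is examined.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

ITV = 0.7                             # index threshold value from the text
DESIRED_LEAD = timedelta(minutes=90)  # desired lead time from the text


def lead_time(
    times: Sequence[datetime],
    index_values: Sequence[float],
    first_strike: datetime,
    itv: float = ITV,
) -> Optional[timedelta]:
    """Time from the first sample at or below the ITV to the first strike.

    Returns None if the ITV is never met at or before the first strike.
    """
    for t, value in zip(times, index_values):
        if t > first_strike:
            break  # samples after the strike cannot count as a forecast
        if value <= itv:
            return first_strike - t
    return None


def is_yes_forecast(lead: Optional[timedelta]) -> bool:
    """A yes forecast requires the ITV to be met with at least the desired lead."""
    return lead is not None and lead >= DESIRED_LEAD


# Hypothetical half-hourly index series for one day (illustrative values only).
start = datetime(1999, 6, 15, 12, 0)
values = [0.95, 0.90, 0.85, 0.72, 0.68, 0.66, 0.70, 0.75, 0.74, 0.71]
times = [start + timedelta(minutes=30 * k) for k in range(len(values))]
first_strike = start + timedelta(hours=5)

lead = lead_time(times, values, first_strike)
print(lead, is_yes_forecast(lead))
```

In this made-up series the index first reaches the ITV 2 h after the start, so the lead time is 3 h and the day counts as a yes forecast. A lead time of only 1 h would fail the 90-min test and be counted as a missed event.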

### d. Missed events

During the independent test the first missed event occurred on 19 May 1999 (Fig. 8b). The GPS lightning index forecast the lightning event, but only 1 h prior to first strike, thus not meeting the 90-min required lead time. An event on 13 May 1999 is a prime example of a false alarm (Fig. 8c). Observations on this day show early morning fog that may have inhibited development of convection. However, the ITV was met and no lightning occurred.

The GPS lightning index was checked to see how it handled rain showers with no lightning, nocturnal events, and back-to-back events. Figure 8d depicts a day marked by distant nocturnal lightning (>30 n mi from KSC), morning fog, and afternoon towering cumulus in all quadrants. Although the index shows fluctuations, the ITV was never met, and the lightning index correctly handled this event. During a nocturnal example (Fig. 8e), the ITV was met 7 h prior to the first strike around 0500 UTC (midnight EST). Although the index successfully predicted this event, it should be noted that the increase in the index roughly 2 h prior to the lightning strike could confuse forecasters. Last, the lightning index captured back-to-back events (Fig. 8f). In this case, a nocturnal thunderstorm ended just around 0600 UTC (0100 EST). Nine hours later the ITV was met, and the first strike followed 4.5 h later.

An interesting phenomenon that occurs frequently in the time series after the ITV is met is the tendency for the index to flatten or increase prior to the first strike (e.g., Figs. 7e,f and Fig. 8e). This may reflect compensating mesoscale subsidence associated with developing thunderstorms drying out the atmosphere above the GPS site.

## 5. Summary and conclusions

A new GPS lightning index is developed to provide a tool for the Kennedy Space Center's primary weather forecast challenge. Two periods were chosen, period A (14 April–9 June 1999) and period B (10 June–26 September 1999), to develop and evaluate the GPS lightning index after examining a year's worth of operational GPS IPWV data with reference to the climatology of lightning occurrence in southern Florida. A binary logistic regression model was used to identify which of a set of 23 predictors contributed skill in forecasting a lightning event. Four predictors proved important for forecasting lightning events: maximum electric field mill values, GPS IPWV, the 9-h change of IPWV (9-h ΔIPWV), and *K* index.

Maximum electric field mill values increased substantially during inclement weather when the potential for lightning existed, sometimes reaching values of 12 000 V m⁻¹ during a lightning event. However, this variable lacked sufficient long-term (90 min plus) predictability. Composites of GPS IPWV and 9-h ΔIPWV several hours prior to an initial lightning strike show an increase of precipitable water for the site. By using current GPS IPWV and 9-h ΔIPWV values, the model captures not only the current IPWV of the atmosphere but also changes in midlevel moisture associated with diurnal and synoptic-scale circulations. The KI diagnoses convective activity by examining the moisture in the layer from 850 to 700 mb and the stability of the lapse rate. Given that three of the variables are sensitive to moisture, it is important to note that they are not well correlated. The highest correlation coefficient was 0.47 between IPWV and KI.

The GPS lightning index is a binary logistic regression equation that includes the four predictors multiplied by their coefficients. When a time series of the GPS lightning index is plotted, a common pattern emerges. Whenever the lightning index drops to 0.7 or below, lightning almost always follows within 12.5 h. An index threshold value (ITV) of 0.7 was identified, and lightning events are forecast whenever the ITV is met. Forecast verification results obtained by using a contingency table revealed a decrease of 26.6 percentage points from the KSC's previous-season false alarm rate during period B and a decrease of 13.2 percentage points for period A. For the KSC, a decrease in false alarm rates means that missions will not be halted for a forecast lightning event that does not occur. In addition, the KSC met their desired lead time (90-min notification) 79.1% of the time during 1999. Because the forecast verification was set up with reference to the 90-min desired lead-time criterion, POD results from the thunderstorm test period (89.2%) and period A (87.5%) also reflect the desired lead-time statistic. In this research, if a storm failed to meet the desired lead time it was counted as a missed event. Thus, the GPS lightning index also improves on the previous KSC lead-time statistic by about 10 percentage points.

The primary utility of the GPS lightning index is in alerting a forecaster to the possibility of lightning. Armed with the lightning index time series, a forecaster can improve the lead time and false alarm rate in lightning forecasts. However, the lightning index has a fairly large time window within which the lightning can occur. At the time the GPS lightning index falls below the threshold value, the atmosphere may not yet display the signs of a thunderstorm and imminent lightning. Therefore, the forecaster needs to rely on other resources such as radar and satellite to help to refine the timing of the lightning event.

Future work will consist of testing the GPS lightning index using data from future thunderstorm seasons. It may be possible to refine the index with the addition of predictor variables not included in this study, such as low-level divergence, thunderstorm motion, radar data, and so on.

If the GPS lightning index fulfills its early promise, it may be useful to consider a similar statistical approach to help to predict related weather phenomena. Observations show a correlation between increases of GPS IPWV and heavy precipitation (Businger et al. 1996), suggesting that a logistic regression model could provide the basis for a new flash-flood index.

## Acknowledgments

The data resources and collection that went into this research were substantial and could not have been accomplished without assistance from James Foster and Mike Bevis (GPS data) and Susan Derussy (radiosonde data). We thank Gary Barnes, Pao-Shin Chu, and three anonymous reviewers for numerous suggestions for improvements to the manuscript. Nancy Hulbirt provided assistance with the figures. This research was supported by the U.S. Air Force through its Air Weather program and NOAA Grant NA67RJ0154.

## REFERENCES

Arritt, R. W., 1993: Effects of large-scale flow on characteristic features of the sea breeze. *J. Appl. Meteor.*, **32**, 116–125.

Bauman, W. H., and S. Businger, 1996: Nowcasting for space shuttle landings at Kennedy Space Center, Florida. *Bull. Amer. Meteor. Soc.*, **77**, 2295–2305.

Bauman, W. H., and M. L. Kaplan, 1997: Nowcasting convective activity for space shuttle landings during easterly flow regimes. *Wea. Forecasting*, **12**, 78–107.

Bevis, M., S. Businger, T. A. Herring, C. Rocken, R. A. Anthes, and R. H. Ware, 1992: GPS meteorology: Remote sensing of atmospheric water vapor using the global positioning system. *J. Geophys. Res.*, **97**, 15787–15801.

Bevis, M., S. Chiswell, T. A. Herring, R. A. Anthes, C. Rocken, and R. H. Ware, 1994: GPS meteorology: Mapping zenith wet delay onto precipitable water. *J. Appl. Meteor.*, **33**, 379–386.

Blanchard, D. O., and R. E. Lopez, 1985: Spatial patterns of convection in south Florida. *Mon. Wea. Rev.*, **113**, 1282–1299.

Boybeyi, Z., and S. Raman, 1992: A three-dimensional numerical sensitivity study of convection over the Florida peninsula. *Bound.-Layer Meteor.*, **60**, 325–359.

Businger, S., and Coauthors, 1996: The promise of GPS in atmospheric monitoring. *Bull. Amer. Meteor. Soc.*, **77**, 5–17.

Cummins, K. L., M. J. Murphy, E. A. Bardo, W. L. Hiscox, R. B. Pyle, and A. E. Pifer, 1998: A combined TOA/MDF technology upgrade of the U.S. National Lightning Detection Network. *J. Geophys. Res.*, **103**, 9035–9044.

Duan, J., and Coauthors, 1996: GPS meteorology: Direct estimation of the absolute value of precipitable water. *J. Appl. Meteor.*, **35**, 830–838.

Estoque, M. A., 1962: The sea breeze as a function of the prevailing synoptic situation. *J. Atmos. Sci.*, **19**, 24–25.

Foote, G. B., 1991: Scientific overview and operations plan for the convection and precipitation/electrification program. National Center for Atmospheric Research, Boulder, CO, 145 pp.

Harms, D., B. Boyd, R. Lucci, M. Hinson, and M. Maier, 1997: Systems used to evaluate the natural and triggered lightning threat to the Eastern Range and Kennedy Space Center. Preprints, *28th Conf. on Radar Meteorology,* Austin, TX, Amer. Meteor. Soc., 240–241.

Hazen, D. S., W. P. Roeder, B. F. Lorens, and T. L. Wilde, 1995: Weather impacts on launch operations at the Eastern Range and Kennedy Space Center. Preprints, *Sixth Conf. on Aviation Weather Systems,* Dallas, TX, Amer. Meteor. Soc., 270–275.

Hosmer, D. W., and S. Lemeshow, 1989: *Applied Logistic Regression*. John Wiley and Sons, 307 pp.

Huffines, G., and R. E. Orville, 1999: Lightning ground flash density and thunderstorm duration in the continental United States: 1989–96. *J. Appl. Meteor.*, **38**, 1013–1019.

Maier, L., C. Lennon, P. Krehbiel, and M. Maier, 1996: Lightning as observed by a four dimensional lightning location system at Kennedy Space Center. *Proc. 10th Int. Conf. on Atmospheric Electricity,* Osaka, Japan, Int. Commission on Atmospheric Electricity, 280–283.

Marshall, T. C., W. D. Rust, M. Stolzenburg, W. P. Roeder, and P. R. Krehbiel, 1999: A study of enhanced fair-weather electric field occurring soon after sunrise. *J. Geophys. Res.*, **104**, 24455–24469.

NCDC, 1996: *International Station Meteorological Climate Summary.* Version 4.0. CD-ROM. [Available from National Climatic Data Center, 151 Patton Ave., Asheville, NC 28801-5001.]

Neumann, C. J., 1971: The thunderstorm forecasting system at the Kennedy Space Center. *J. Appl. Meteor.*, **10**, 921–936.

Orville, R. E., and G. R. Huffines, 1999: Lightning ground flash measurements over the contiguous United States: 1995–97. *Mon. Wea. Rev.*, **127**, 2693–2703.

Panofsky, H. A., and G. W. Brier, 1958: *Some Applications of Statistics to Meteorology*. The Pennsylvania State University, 224 pp.

Pielke, R. A., 1974: A three-dimensional numerical model of the sea breezes over south Florida. *Mon. Wea. Rev.*, **102**, 115–139.

Reap, R. M., 1994: Analysis and prediction of lightning strike distributions associated with synoptic map types over Florida. *Mon. Wea. Rev.*, **122**, 1698–1715.

Rocken, C., R. Ware, T. Van Hove, F. Solheim, C. Alber, J. Johnson, M. Bevis, and S. Businger, 1993: Sensing atmospheric water vapor with the global positioning system. *Geophys. Res. Lett.*, **20**, 2631–2634.

Watson, A. I., R. L. Holle, R. E. Lopez, R. Ortiz, and J. R. Nicholson, 1991: Surface convergence as a short-term predictor of cloud-to-ground lightning at Kennedy Space Center. *Wea. Forecasting*, **6**, 49–64.

Wilks, D., 1995: *Statistical Methods in Atmospheric Sciences*. Academic Press, 464 pp.

Willis, P. T., J. Hallett, R. A. Black, and W. Hendricks, 1994: An aircraft study of rapid precipitation development and electrification in a growing convective cloud. *Atmos. Res.*, **33**, 1–24.

Wolfe, D. E., and S. I. Gutman, 2000: Development of the NOAA/ERL ground-based GPS water vapor demonstration network: Design and initial results. *J. Atmos. Oceanic Technol.*, **17**, 426–440.

Table 1. Initial predictors.

Table 2. Logistic regression table.

Table 3. Goodness-of-fit tests.

Table 4. Observed (Obs) and expected (Exp) frequencies.

Table 5. Measures of association.

Table 6. Contingency table for categorical forecast of discrete predictands, giving relationship between counts (letters *a–d*) of forecast event pairs for the dichotomous categorical verification. Quadrant *a* denotes the occasions when the lightning was forecast to occur and did. Quadrant *b* denotes the occasions when the lightning was forecast to occur but did not. Quadrant *c* denotes the occasions when the lightning was not forecast to occur but did. Quadrant *d* denotes the occasions when the lightning was not forecast to occur and did not. Subscripts *n* and *i* indicate dependent test (thunderstorm period) and independent test (period A).

Table 7. Period-B test results (10 Jun–26 Sep 1999).

Table 8. Independent test results (period A, 14 Apr–9 Jun 1999).

\* School of Ocean and Earth Science and Technology Contribution Number 5930.