The recent successful deployment of the Geostationary Lightning Mapper (GLM) on board the Geostationary Operational Environmental Satellite R series (GOES-16/17) provides nearly uniform spatiotemporal measurements of total lightning (intracloud plus cloud to ground) over the Americas and adjacent vast oceanic regions. This study evaluates the potential value of assimilating GLM-derived water vapor mixing ratio on short-term (≤6 h), cloud-scale (dx = 1.5 km) forecasts of five severe weather events over the Great Plains of the United States using a three-dimensional variational (3DVAR) data assimilation (DA) system. Toward a more systematic assimilation of real GLM data, this study conducted sensitivity tests aimed at evaluating the impact of the horizontal decorrelation length scale, DA cycling frequency, and the time window size for accumulating GLM lightning observations prior to the DA. Forecast statistics aggregated over all five cases suggested that an optimal forecast performance is obtained when lightning measurements are accumulated over a 10-min interval and GLM-derived water vapor mixing ratio values are assimilated every 15 min with a horizontal decorrelation length scale of 3 km. This suggested configuration for the GLM DA together with companion experiments (i) not assimilating any data, (ii) assimilating radar data only, and (iii) assimilating both GLM and radar data were evaluated for the same five cases. Overall, GLM data have shown potential to help improve the short-term (<3 h) forecast skill of composite reflectivity fields and individual storm tracks. While this result also held for accumulated rainfall, longer-term (≥3 h) forecasts were generally characterized by noteworthy wet biases.
Timely and accurate short-term convection-allowing (<3 km) numerical weather prediction (NWP) is critical for making informed decisions to warn the public and stakeholders of high-impact, and occasionally life-threatening, weather events. Increasing the lead times of such warnings is challenging given the complex, nonlinear physical processes governing the evolution of atmospheric weather phenomena across a large range of scales. Pioneering work by Lilly (1990) outlined the principal scientific challenges of the development of convective-scale NWPs and underlined that increasing computer power combined with advances in NWP parameterizations and data assimilation (DA) methods are expected to play critical roles in improving forecasts in the shorter and longer term (Stensrud et al. 2009, 2013).
In addition to NWP-related strides, the increasing availability of detailed datasets spanning a large range of scales played a significant role toward recent improvements of convective-scale severe weather forecasting. Various observational datasets at the meso and convective scales from in situ to remote sensing instruments have been used in convective-scale DA applications. These include, for instance, traditional observations (e.g., pressure, moisture, temperature, and wind) from surface instruments like mesonets and automated surface observing systems (e.g., Fujita et al. 2007; Knopfmeier and Stensrud 2013; Sobash and Stensrud 2015) and upper-air observation platforms such as rawinsonde, wind profiler, aircraft communications addressing and reporting systems (ACARS; e.g., Hitchcock et al. 2016; Coniglio et al. 2016), radial velocity and reflectivity from Weather Surveillance Radar-1988 Doppler (WSR-88D) (e.g., Snyder and Zhang 2003; Gao and Stensrud 2012; Yussouf et al. 2015; Johnson et al. 2015; Wang and Wang 2017), satellite measurements (e.g., Jones et al. 2016, 2018; Zhang et al. 2018), and combinations of these observations (e.g., Benjamin et al. 2016; Hu et al. 2017). Among these, Doppler radar data (i.e., radial velocity and reflectivity factor) were proven to be critical toward convective-scale forecast improvements. It is well established, however, that ground-based radar networks suffer from poor coverage in mountainous terrains due to beam blockage (Zhang et al. 2011; Maddox et al. 2002) and oceanic regions lying beyond the network’s range. Ancillary detection of deep convection is critical for these regions, given that mountainous areas are susceptible to forest fires while some oceanic regions are prone to hurricane and tropical systems.
Technical advances in spaceborne instruments such as the Geostationary Operational Environmental Satellite R series (GOES-16/17; Goodman et al. 2013) provide an unprecedented opportunity for researchers and operational forecasters to monitor the atmosphere over vast regions devoid of conventional data at nearly uniform spatiotemporal resolution (e.g., Jones et al. 2018; Minamide and Zhang 2018; Zhang et al. 2018). The Geostationary Lightning Mapper (GLM) on board each of the GOES-16/17 satellites provides continuous detection of total lightning over most of the Western Hemisphere with high temporal (20 s) and spatial (8–12 km) resolution (Goodman et al. 2013). Many studies have shown that lightning activity, in general, is an unambiguous indicator of electrified mixed phase convection (Petersen and Rutledge 1998; Rudlosky and Fuelberg 2013; Makowski et al. 2013), especially total lightning (cloud to ground plus intracloud) (e.g., MacGorman et al. 1989; Carey and Rutledge 1998; Wiens et al. 2005; Kuhlman et al. 2006; Fierro et al. 2006; Deierling and Petersen 2008; MacGorman et al. 2011; Weiss et al. 2012; Medici et al. 2017).
Previous studies focusing on lightning DA have generally demonstrated positive impacts on NWP forecast primarily through functional relationships based on proxies exhibiting a well-established linkage with lightning, such as latent heating (Alexander et al. 1999; Chang et al. 2001; Pessi and Businger 2009), specific humidity (Papadopoulos et al. 2005, 2009), the trigger function in convection parameterization schemes (Mansell et al. 2007; Lagouvardos et al. 2013; Giannaros et al. 2016; Heath et al. 2016), the water vapor mass mixing ratio (Fierro et al. 2012, 2016, 2019; Zhang et al. 2017), or radar reflectivity (Wang et al. 2014). In addition to simpler and computationally efficient nudging DA methods (Fierro et al. 2012; Marchand and Fuelberg 2014), variational (e.g., Fierro et al. 2016, 2019; Zhang et al. 2017) and ensemble-based approaches (e.g., Mansell 2014; Allen et al. 2016; H. Wang et al. 2018) have been investigated to assimilate lightning. To the best of our knowledge, there have not yet been any studies evaluating more systematically the impact of assimilating GLM-derived water vapor on short-term forecasts of high-impact convective events in the variational framework at the cloud-scale (~1.5 km) with high-frequency cycling (≤15 min).
Toward the goal of investigating how to best use GLM lightning measurements in a cycled variational (3DVAR) framework, five cases that occurred over the Great Plains of the United States in May 2018 were examined. The 3DVAR DA system that was initially developed at the Center for Analysis and Prediction of Storms (CAPS; Gao et al. 2004), and further updated at the National Severe Storms Laboratory (NSSL) is employed in this study (Gao and Stensrud 2012; Gao et al. 2013). Fierro et al. (2019) adapted the previous 3DVAR method for lightning (ground-based network) to use GLM data. The current study builds upon Fierro et al.’s (2019) proof-of-concept adaptation by performing a detailed sensitivity analysis of the impact of assimilating GLM-derived water vapor mixing ratio on short-term forecasts of storm-scale weather events by varying (i) the length of the accumulation interval of the GLM data prior to the DA, (ii) the frequency of DA cycling, and (iii) the scale of horizontal decorrelation. In light of the significant improvements in short-term forecasts benefiting from the assimilation of radar data (reflectivity factor and radial wind), GLM DA experiments are further evaluated against radar DA experiments.
2. Brief description of GLM-observed lightning data
As mentioned in the introduction, the GLM instrument mounted on board the GOES-16/17 satellites provides a unique opportunity to monitor total lightning over the Western Hemisphere with nearly uniform spatiotemporal resolution. The GLM detects optical pulses emitted by lightning discharges night and day, without being able to differentiate between flash type and polarity. The GLM is expected to detect ~90% of all flashes during the night and ~70% during the day with a daily average exceeding 70% (Goodman et al. 2013). The spatial resolution of the individual pixels varies between 8 and 12 km over the contiguous Unites States. Data processing provides a hierarchy of geolocated and time-stamped lightning data products (level 2 lightning data products), which include the events, groups, and flashes (Goodman et al. 2013; Rudlosky et al. 2019). At each pixel, the GLM tracks the average background energy. During each 2-ms sampling frame, an event is defined if a single pixel exceeds the background threshold. The GLM lightning cluster-filter algorithm (LCFA) then assembles one or more simultaneous GLM events in adjacent pixels into groups, and finally combines one or more sequential groups separated by less than 330 ms and 16.5 km into flashes (Mach et al. 2007). Similar to Fierro et al. (2019), only the GLM flash product is used in this study. The level 2 GLM data products also contain a built-in adjustment for parallax due to the viewing angle of cloud tops from the satellite. That correction is based on satellite position, latitude and longitude of each pixel, and an assumed cloud-top height, which is dependent on latitude. The parallax errors over the Great Plains of the United States is estimated to be on the order of 10 km or less (Fierro et al. 2018a) and, thus, is expected to have negligible impacts on shorter-term, storm-scale forecasts.
3. Methodology/model and data assimilation configuration
a. Forecast model
This work employs version 3.7.1 of the nonhydrostatic Weather Research and Forecasting (WRF) Model with the Advanced Research WRF dynamic core (WRF-ARW; Skamarock et al. 2008). The selected physical parameterizations are the NSSL two-moment 4-ice category bulk microphysics scheme (Ziegler 1985; Mansell et al. 2010; Mansell and Ziegler 2013), the Dudhia shortwave (SW) radiation scheme (Dudhia 1989), the Rapid Radiative Transfer Model (RRTM) longwave (LW) radiation scheme (Mlawer et al. 1997), the Yonsei University (YSU) planetary boundary layer (PBL) scheme (Hong et al. 2006), and the Rapid Update Cycle (RUC) land surface scheme (Benjamin et al. 2004). The DA experiments are performed on one single domain with a uniform horizontal grid spacing of 1.5 km and horizontal dimensions of 500 × 500 grid points (i.e., 750 km × 750 km). There are 51 vertical levels with a top set at 50-hPa corresponding to an altitude of about 20–22 km. The computational time step for the integration is 6 s.
b. Data assimilation procedures
1) 3DVAR system
This study employs the NSSL 3DVAR system (Gao et al. 2013; Gao and Stensrud 2014), which is an upgrade from the Advanced Regional Prediction System (ARPS) 3DVAR code (Gao et al. 2004; Hu et al. 2006a,b). This earlier version was only able to assimilate surface and radar radial velocity observations and update the following six state variables: the pressure p, potential temperature θ, water vapor mixing ratio qυ, horizontal wind components u and υ, and vertical wind component w. The adjustments of water vapor and hydrometer variables were originally performed by means of a complex cloud analysis scheme (Hu et al. 2006a), appearing to be too aggressive due to the introduction of relatively large amount of ice and water mass into the model domain (Schenkman et al. 2011; Fierro et al. 2014; Gao et al. 2016). To mitigate this drawback, NSSL 3DVAR was upgraded to include the capability of assimilating radar reflectivity factor (Gao and Stensrud 2012) to adjust the mass mixing ratios for rainwater (qr), snow (qs), and hail (qh) in the same variational framework in which radar radial velocity data and other traditional datasets are assimilated. The reflectivity forward operator uses classification of hydrometer types so that updates are physically consistent by preventing, for example, snow mass from being introduced at temperatures below the freezing level. Another feature of NSSL 3DVAR not used in this study is the incorporation of ensemble-estimated covariance in a variational approach (Gao and Stensrud 2014; Gao et al. 2016). In addition, NSSL 3DVAR includes a direct interface for WRF-ARW (Zhuang et al. 2016; Y. Wang et al. 2018).
The data assimilated in this study include WSR-88D level II radar data and pseudo water vapor mixing ratio observations derived from GLM-observed total lightning data, both of which are described in more detail in the subsections below.
This work utilizes WSR-88D level II data (i.e., reflectivity factor and radial velocity) from multiple radars covering the forecast domain. Prior to the DA, necessary quality control (QC) procedures including radial velocity dealiasing and removal of weak radar returns or nonmeteorological scatters are performed on each radar. For thinning purpose, QC-ed radar data from multiple sites are then interpolated onto the WRF model grid. If there are multiple radars overlapping at a given grid point, the largest radar reflectivity value is chosen. The QC and processing of radar data follow the same procedures as in Gao et al. (2013) and Fierro et al. (2016, 2019). Additionally, the radar-derived velocity–azimuth display (VAD) wind observations are assimilated into NSSL 3DVAR to adjust the wind components of the storm environment.
3) GLM total lightning
To provide guidance for utilizing GLM lightning observations in real-time DA applications, lightning observations first are accumulated up to the analysis time within a fixed time window. That is, for a given window size Δt and analysis time t, the GLM total flash data (from 20-s operational packets) accumulated between t − Δt and t are used in the assimilation. It is worth underlining that this approach differs from previous works where, instead, observations falling within a time window centered at the analysis time t (i.e., observations from t − Δt/2 to t + Δt/2) are assimilated. In other words, this implementation of the GLM DA does not consider future observations (relative to the analysis time, i.e., smoothing); which topic will be deferred to future research. One chief task of this study is to determine an optimal Δt for the GLM data, which were varied from 10 to 60 min in the present study. The lightning data assimilation (LDA) procedure follows, overall, a similar philosophy as Fierro et al. (2019) wherein the model background qυ values are adjusted around the locations where lightning occurs, regardless of flash rate. In other words, the creation of pseudo-qυ observations is performed on a binary basis and is, thus, only contingent on whether or not a flash is observed at a given grid point.
Figure 1 illustrates a difference of strategy adopted in this work with respect to Fierro et al. (2019) to interpolate the observed GLM data onto the model grid. Each GLM flash has a latitude–longitude coordinate denoting its pixel centroid with an average 8–12 km pixel footprint resolution (Goodman et al. 2013). To account for this coarser resolution relative to the typical convection-allowing grid spacing used (dx ≤ 3 km), Fierro et al. (2019) assumed that, given a GLM flash centroid, all the grid cells contained within an assumed, fixed (e.g., ~9 × 9 km2) area around the centroid on the model grid (blue region in Fig. 1) were assigned the same lightning rate values as the centroid observations. In this study, however, only the (1.5 × 1.5 km2) grid cell containing the flash centroid (referred to as lightning column) is assigned the actual observed flash rate (i.e., the cell at the center of the grid in Fig. 1). The reduction of the lightning footprint on the model grid is motivated by the systematic trade-off seen between shorter-term forecast improvements and increases in wet biases in the forecast when precipitation- or moisture-related variables are adjusted (increased) during the assimilation (Fierro et al. 2015, 2016). The decorrelation length from the 3DVAR is then used to determine a distance-weighted factor such that grid points located farther away from the centroid bear lower weight (i.e., smaller effect) reminiscent of the Gaspari and Cohn (1999) correlation function with compact support used in EnKF methods. At each lightning column, a pseudo qυ field is created wherein the background qυ within a fixed layer (here 3 km) above the lifted condensation level (LCL) is adjusted to near saturation (set RH to 95%, unless the background RH already exceeds 95%). Different layer depths above the LCL ranging from 2 to 10 km were evaluated and yielded overall similar results, consistent with Fierro et al. (2016). The GLM pseudo-qυ observations are produced using the saturation water vapor mixing ratio qυsat and the adjusted background RH of 95% through qυ = qυsat × RH and then assimilated into the 3DVAR system. Different saturation RH values ranging from 80% to 95% were tested and yielded overall qualitatively very similar results.
Similar to Fierro et al. (2016, 2019), the background and observation error for qυ is set to 1 × 10−2 and 3 × 10−3 kg kg−1, respectively, with a smaller observation error aiming to impose more weight on observations during the 3DVAR minimization. These values were obtained by trial and error in a manner that would ensure comparable weights assigned to lightning observations with respect to other storm-scale dataset (in this case radar data) that are used for the real-time Spring Forecast Experiment (SFE).
As indicated by Xie et al. (2011) and Li et al. (2010), a multipass recursive filter (from larger to finer length scales) yields generally superior forecast performance over single-pass 3DVAR methods. This study adopts a two-pass recursive filter, each with a prescribed, fixed horizontal decorrelation length scale (labeled L1 and L2, respectively) and vertical decorrelation length scale. GLM lightning data are only used in the second pass. As in Fierro et al. (2019), L1 = 24 km was chosen for the first pass. Different values of L2 for the second pass were tested for the GLM-based LDA as will be discussed in sections 4 and 5. The vertical decorrelation lengths adopted in this study are 4 grid points in the first pass and 2 grid points in the second pass.
4. Experimental design
To perform a more systematic evaluation of the impact of assimilating GLM lightning-derived water vapor on short-term forecasts, five high-impact weather events from the 2018 SFE hosted by the NOAA Hazardous Weather Testbed (HWT) are examined, namely, 1, 14, 19, 28, and 29 May. The NSSL Experimental Warn-on-Forecast System (WoFS) developed by the Warn-on-Forecast (WoF) project has been run in real time for SFE since 2016. The study domain is selected on a daily basis based on the “day 1” convective outlook product from the Storm Prediction Center (SPC). The same simulation domains as those from WoFS real-time SFE runs are utilized in this study and presented in Fig. 2 with the locations of radar sites used overlaid for all five cases.
The flowchart of the cycled DA and forecast system is illustrated in Fig. 3. For each case, the model is cold started at 1800 UTC and the 3DVAR is cycled at 15-min (or 60-min) intervals until 0300 UTC the following day. Starting from 1900 to 0300 UTC, a 3-h forecast is launched every hour. The initial and lateral boundary conditions are derived from the 3-km High-Resolution Rapid Refresh (HRRR) forecasts initialized at 1800 UTC and downscaled onto the 1.5-km simulation domain (Fig. 2). To better gauge the respective impacts of the radar and/or GLM DA, a control experiment (denoted as CTRL) is conducted first by advancing the model forward without assimilating any observations.
The observed composite reflectivity fields (CREF) from NSSL’s Multi-Radar Multi-Sensor (MRMS) product (Smith et al. 2016) and the Stage IV hourly rainfall estimates from the National Centers for Environmental Prediction (Baldwin and Mitchell 1997) were used to evaluate the simulations. To facilitate the comparison between observations and model simulations, all the observations were interpolated onto the 1.5-km model grid (Fig. 2).
Contingency elements including the probability of detection (POD), the false alarm rate (FAR), and critical success index (CSI) along with the frequency bias and neighborhood-based approach such as the equitable threat scores (ETS; Clark et al. 2010) and the fractions skill scores (FSS; Roberts and Lean 2008) were used to quantitatively evaluate the DA impact on the analysis and short-term forecasts for each experiment individually, and in the aggregate.
To determine more optimal settings for assimilating GLM lightning-derived water vapor mixing ratio in convective-scale NWP models, this study first performed several sensitivity tests related to the DA configuration. Focus was directed on the horizontal decorrelation length scale L2 [ranging from 3 to 12 km to limit the overspreading of qυ described in Fierro et al. (2016, 2018b)], the frequency of 3DVAR cycling (15–60 min) and the length of the time window Δt for accumulating GLM lightning data up to the analysis time t (10–60 min). Additional sensitivity tests using GLM accumulation periods centered at time t were conducted and showed overall similar analysis/forecast performance. The main objective is to explore the potential usage of GLM-derived water mixing ratio in a 3DVAR data assimilation system for real-time forecasts of high-impact weather events. Therefore, at each analysis time, it is reasonable to assimilate observations that have been collected up to the current analysis time to minimize latency. The design of the sensitivity test experiments and associated nomenclature are listed in Table 1. For example, the experiment denoted as “GLM_15min_10win_12km” assimilates water vapor mixing ratio derived from 10-min accumulated GLM total lightning data every 15 min using L2 = 12 km. The flow diagrams for the 15- and 60-min 3DVAR cycling experiments are shown in Figs. 3a and 3b, respectively. In addition to the three aforementioned DA parameters, these sensitivity tests evaluated how the GLM DA strategy herein (Fig. 1) performed against the original method from Fierro et al. (2019).
Based on aggregate forecast evaluation metrics for all five cases, the GLM experiment in Table 1 yielding the best results will be selected for evaluation against (i) CTRL, (ii) a DA experiment only assimilating WSR-88D radar data (RAD), and (iii) a DA experiment assimilating GLM total lightning data in conjunction with WSR-88D radar data (referred to as GLM + RAD).
5. Results and discussion
a. Sensitivity tests for the GLM-based DA experiments
Prior to describing the impact of the GLM DA on the analysis and short-term forecast, it is useful to first provide a brief quantitative depiction of the qυ adjustments made by the 3DVAR. For the sake of brevity, the 1 May 2018 case is used for illustration.
On 1 May 2018, environmental conditions favored the development of severe thunderstorms across Kansas and southeastern portions of Nebraska. Between 1930 and 2000 UTC, the first convective cells initiated along a front across western Kansas and southeastern Nebraska. In the following hours, more storms formed along the front to eventually merge into a quasi-linear mesoscale convective system (QLCS) as they moved northeastward. Based on National Weather Service (NWS) local storm reports, this QLCS produced several large hail (up to baseball size) and tornado events. Between 2300 and 0200 UTC 2 May alone, a total of 13 tornadoes and several large hail events were reported in Kansas, Nebraska, and Oklahoma.
The flash origin density rate data (i.e., the number of flashes in each grid cell) derived from the accumulated GLM flash centroid observations at 2300 UTC in each experiment (Table 1) for the 1 May 2018 case are given in Fig. 4. Comparing Figs. 4a and 4b, it becomes clearer that the “spreading” strategy used in Fierro et al. (2019) leads to a larger areal coverage of nonzero flash density rates, indicating that the qυ adjustments from the 3DVAR will be performed on a larger number of grid columns (or grid volume). Because the probability of a given GLM flash centroid to fall within an individual 9 × 9 km2 box will be larger than within a finer 1.5 × 1.5 km2 box, the spreading method of Fierro et al.(2019) will also lead to higher flash density rates on the model grid (Figs. 4a,b). Although the current study imposes qυ adjustments regardless of lightning rates, Fig. 4 illustrates that consideration must be taken if the qυ adjustments are assumed to depend on flash density as in Fierro et al. (2016). Figures 4c–e highlight and contrast for this case how the number of nonzero lightning cells decreases as the accumulation interval for the lightning decreases (smaller rates).
To assess how the GLM DA impacts the qυ fields for each of the sensitivity experiments listed in Table 1, the 3–7 km (MSL) layer-averaged qυ fields of the analysis are examined for 1 May 2018. Note that due to the cycling, each experiment will evolve a different background over time. Therefore, in this study, the qυ differences relative to CTRL instead of the qυ increments (analysis background) are analyzed at 2300 UTC (Fig. 5). It is worth noting that because 2300 UTC is not the first 3DVAR cycle (Fig. 3a), the qυ differences can be negative owing to successive model adjustments to the background fields from the prior 3DVAR cycles. Rationale for selecting 2300 UTC was because this time produced a copious amount of lightning, better highlighting the differences and sensitivities between the experiments.
As expected, the largest positive qυ differences are collocated with areas of nonzero flash density (Fig. 4). As L2 decreases, the qυ differences are imposed on progressively smaller areas (Figs. 5a–c). For a given value of L2 (here at 3 km), Figs. 5b and 5d highlight how the spreading strategy of Fierro et al. (2019) yields approximately comparable qυ differences as those produced by the nonspreading strategy herein using a larger decorrelation length (L2 = 6 km, Fig. 5b). Comparison of Fig. 5e against Fig. 5c reveals that 15-min DA (five cycles) produces noticeably larger qυ differences than 60-min DA (1 cycle). This is because, over a 1-h period, DA experiment GLM_15min_10win_3km accumulates the impact (qυ adjustments) of five individual 3DVAR cycles, compared to only one 3DVAR cycle in GLM_60min_60win_3km. The 60-min DA experiments show that a slightly larger areal coverage of positive qυ differences will ensue if a longer accumulation interval is used (Figs. 5e–g).
The analyzed CREF fields at 2300 UTC are shown in Fig. 6 for each of the GLM-based experiments in Table 1 together with CTRL and the MRMS observations. To better differentiate and analyze the comparisons, the QLCS is divided into four main regions, labeled A to D from south to north (Fig. 6a). CTRL fails to simulate the tornadic storm in region A (Figs. 6a,b) while all 15-min cycling DA experiments successfully captured it (Figs. 6c–f). Highlighting the benefit of using more frequent 3DVAR cycles, Figs. 6g–i reveal that all 60-min cycling DA experiments fail to analyze the storm despite the positive qυ adjustments incurred there (Figs. 5e–g). The observed storms in region B produced several tornadoes and large hail events. CTRL misses some of the cells in this region and the intensity is overall weaker than observed (Fig. 6b). Although, both 15- and 60-min DA experiments improved the forecast of these isolated cells, the forecasted CREF values are notably larger than observed. The advantage of using higher-frequency 15-min cycling over 60 min is illustrated in Figs. 6c–i: none of the 60-min cycling experiments forecasts the individual storms well enough (Figs. 6g–i), while, in contrast, the 15-min DA experiments are able to produce storm cells that are more isolated in nature and, in turn, more concordant with the observations despite a slight northward bias (Fig. 6a). In region C, all simulations appear to reproduce the largely stratiform, weaker convection there, with the GLM-based DA experiments generally overestimating reflectivity values. In region D, all 15-min DA experiments exhibit reasonable storm coverage with generally larger-than-observed reflectivity values (Figs. 6c–f). In contrast, the 60-min DA experiments produce slightly smaller storm coverage but with reflectivity values generally more aligned with the observations (Figs. 6g–i). Generally, all GLM-based DA experiments show noteworthy improvements over CTRL with the 15-min DA experiments producing the best results, in terms of intensity, structure, location, and areal coverage. In line with the qυ differences highlighted in Fig. 5, the usage of a larger decorrelation length scale yields larger values and areal coverage of simulated reflectivity (Figs. 6c–e). Although the difference is arguably small, storm coverage generally decreases as the lightning accumulation interval decreases (Figs. 6g–i). Consistent with earlier findings, the spreading strategy from Fierro et al. (2019) is effectively equivalent to increasing the decorrelation length scale (Figs. 6d,f).
To analyze the overall performance of each GLM DA experiment, aggregate forecast metrics for composite reflectivity fields and accumulated rainfall were computed for 19 separate 3-h forecasts across all five cases (viz. 2100–0000 UTC 1 May 2018, 1900–2100 UTC 19 May 2018, and 0000–0300 UTC 14, 28, and 29 May 2018). These metrics are: POD, CSI, FAR, success ratio (SR = 1 − FAR), the frequency bias (referred to as bias) and the neighborhood-based ETS. The FSS values were also computed showing, overall, very similar results to the ETS (not shown). The neighborhood radius shown for the ETS is 12 km (8 grid points). To provide a more concise and clearer depiction of forecast performance, this work makes use of the categorical performance diagram described in Roebber (2009), which conveniently merge bias, POD, FAR (through SR), and CSI information into one graph. Forecast skill increases as the POD, SR, bias, and CSI all approach unity, meaning that a perfect forecast lies on the upper-right corner of the diagram.
The improvements from the GLM-based DA experiments relative to CTRL become more evident when visualized in this diagram (Figs. 7a,b). Consistent with Fierro et al. (2016), the 15-min DA experiments indicate that when the decorrelation length scale L2 increases, so does the bias along with progressively larger POD and smaller SR and CSI. The 60-min DA experiments produce smaller CSI values than their 15-min counterparts. The nearly identical placement of the dots on the diagram for both GLM_15min_10win_3km_SPREAD and GLM_15min_10win_6km (Table 1), confirms that the spreading strategy used in Fierro et al. (2019) is effectively equivalent to increasing the decorrelation length scale in the nonspreading strategy. When increasing the CREF threshold from 30 dBZ (Fig. 7a) to 40 dBZ (Fig. 7b), the POD, SR, and CSI values decrease in all experiments. Overall, the experiment labeled GLM_15min_10win_3km produces the best forecast as evidenced by higher POD and CSI values together with a lower bias. Generally speaking, when inspecting the respective aggregate ETS aggregated over 19 forecasts, it also becomes clear that all GLM-based DA experiments outperform CTRL, and that the 15-min cycling experiments produce overall larger scores than with 60-min cycling (Figs. 7c,d). Metrics for hourly precipitation using thresholds of 2.5, 5, and 10 mm relative to the Stage IV estimates have similar forecast behavior and performance, though the relative differences in scores between the experiments are more pronounced than for CREF (Figs. 8a–f). For precipitation, it is also worth noting that between 1 and 3 h forecast, (i) the biases exhibit a larger relative increase than for CREF, and (ii) the SRs at 3 h forecasts are generally smaller than for CREF (owing to larger FAR). This suggests a larger sensitivity of precipitation predictions to some of the factors contributing to forecast errors such as spurious convection, bias in the microphysics (Fierro et al. 2015) and/or phase errors.
The aggregate performance diagrams and ETS time series for rainfall and CREF all indicate that the GLM_15min_10win_3km experiment—referred to as “GLM” in the remainder of the manuscript—generally produces the best forecast.
b. Overall evaluation
To better gauge the impact and added value of the assimilation of GLM data on short-term, cloud-scale forecasts, the GLM DA experiment (i.e., GLM_15min_10win_3km) will be evaluated against simulations assimilating (level II) radar data (RAD) and assimilating both GLM and radar data (GLM + RAD). To maintain consistency with the original (most optimal) GLM experiment, both radar DA experiments (i.e., RAD and GLM + RAD) make use of 15-min cycling frequency as well (Fig. 3a). In the following sections, the results from two case studies representative of the best and worst forecast according to the performance diagram and ETS/FSS values will be first discussed followed by an analysis of the aggregate forecast statistics over all five cases.
1) Best forecast scenario: 1 May 2018 case
Although CTRL is able to produce some of the observed storms at 1-h forecast (Figs. 9a,b), all three DA experiments GLM, RAD, and GLM + RAD improve the forecast further, as evidenced by the two storms just north of the Oklahoma border and one cell on the border between Kansas and Nebraska. Among the three DA experiments, only GLM is able to produce the two isolated supercellular storms just north of the Oklahoma border (Fig. 9c). Additionally, all DA experiments capture the QLCS structure better than CTRL. Both experiments assimilating radar data, that is, RAD and GLM + RAD, tend to produce a more linear convective mode than in the observations. It is relevant to note, however, that GLM also produces a more linear convective mode but in southeast Nebraska and western Iowa (Figs. 9a,c,d,e).
Generally speaking, all DA experiments improve the areal coverage of 3-h APCP < 40 mm further (Fig. 10). For amounts exceeding 50 mm, GLM produces a fairly reasonable areal coverage and amount prediction. However, all experiments appear to overpredict the areal coverage as well as the precipitation amount, which is most evident for the cluster of storms that crossed from Nebraska into Iowa (Figs. 10c–e).
To provide a more thorough assessment of the impact of the GLM DA, the 2–5 km updraft helicity (UH) tracks computed from 15-min model output are depicted in Fig. 11 and were overlaid with the NWS local storm reports. Although CTRL predicts the storm tracks associated with most tornadoes and hail events, it fails in predicting the two strong tracks in south-central Kansas and one additional track to the north of it (Fig. 11a). GLM predicts most of the tracks and their locations aligned generally well with the NWS local storm reports. The GLM storm tracks, however, exhibit a northward displacement bias for storms near the border of Nebraska and Kansas and a weaker intensity for one of the storms that produced two hail events in south-central Kansas (Fig. 11b). The tracks for all storms associated with NWS local storm reports are well forecasted by RAD, with the exception of weaker tracks in north-central Kansas associated with tornadoes and hail reports (Fig. 11c). When assimilating both GLM and radar data, the rotation tracks become more consistent with the NWS local storm reports (Fig. 11d). A similar analysis was conducted for 30+ dBZ CREF tracks and revealed overall very similar qualitative behaviors (not shown).
2) Worst forecast scenario: 14 May 2018 case
On 14 May 2018, severe thunderstorms developed along and ahead of a quasi-stationary front moving through the central Great Plains. Four tornadoes were reported in Kansas—three of them in Cowley County and the fourth one in Chautauqua County (Fig. 12a). Hail up to baseball size (7.0 cm) was reported in western Kansas (near Scott city) and as large as ping pong balls (3.8 cm) in southeast Kansas. These severe thunderstorms also produced flooding rains over Chanute in Neosho County setting a 24-h rainfall record of 2.96 in., and Iola in Allen County of 2.34 in. Additionally, several thunderstorms initiated along a dryline in northwest Texas and the Oklahoma panhandle, and moved northeastward producing several large hail and strong wind reports there.
For the analysis of the CREF forecast, the study domain is, as before, divided into four key regions, labeled from A to D (Fig. 12a). For brevity of illustration, this analysis focuses on 1-h forecasts initialized at 0200 UTC 15 May 2018. In A, all DA experiments reasonably predict most of the storms, with GLM generally producing the best results (Fig. 12c). RAD and GLM + RAD systematically displace the storms southward and with larger-than-observed CREF (Figs. 12d,e). All DA experiments reasonably reproduce most of the storms in B in terms of location and intensity (Figs. 12c–e). The CTRL run generates fewer storms and generally suffers from a notable southward bias (Fig. 12b), while GLM + RAD produce the largest overestimate for their areal coverage. For the storms in C, GLM generally overpredicts their intensity (Fig. 12c) while RAD does not distinctly improve over CTRL in this area (Fig. 12d). In region C, however, radar data are helpful in suppressing some of the spurious radar echoes over Missouri when combined with GLM lightning data (GLM + RAD, Fig. 12e). In region D, all the experiments underestimate CREF and fail to simulate the two main storm cells there.
When examining the 3-h precipitation forecasts, CTRL produces a distinctive southward bias in the APCP maximum observed in southeastern Kansas (Figs. 13a,b). While GLM, RAD, and GLM + RAD alleviate this displacement error somewhat (Figs. 13c–e), GLM overestimates the rainfall amounts there (Figs. 13a,c), with GLM + RAD exacerbating rainfall overprediction further in eastern Kansas (Fig. 13e). Consistent with the CREF forecasts, RAD and GLM + RAD both overpredict precipitation amounts in southwestern Oklahoma (Figs. 13d,e).
The CREF tracks throughout the 3-h forecast and UH tracks (not shown) exhibit overall similar behavior to the rainfall and CREF forecasts. In contrast to the southwest bias in CTRL, all DA experiments predict the CREF tracks in southern Kansas and northern Oklahoma reasonably well, demonstrating their superiority relative to CTRL. In this region, the CREF tracks from RAD are the most consistent with the NWS local storm reports, while both RAD and GLM + RAD generally overestimate their areal coverage there, agreeing well with CREF and rainfall forecasts in Figs. 12 and 13. As discussed before, CTRL as well as all DA experiments fail to predict the storms in western of Kansas and, consequently, the hail storm tracks over that region.
3) Aggregate performance
Similar to the GLM sensitivity experiments discussed earlier, aggregate forecast statistics for 19 separate 3-h forecasts from all five cases were computed to provide a more comprehensive view of the performance of GLM, RAD, and GLM + RAD. The aggregate metrics include POD, CSI, SR, bias, and ETS (radius of 12 km). These were calculated, as before, for CREF and 1-h accumulated precipitation with thresholds of 30 and 40 dBZ and 2.5, 5, and 10 mm, respectively. These statistics were also computed for the FSS and revealed overall similar behavior to the ETS (not shown).
In terms of CREF, all DA experiments outperform CTRL and produce biases ranging between 0.7 and 1.4 (Fig. 14). GLM produces the highest POD but largest bias, RAD has the smallest POD but smallest bias, while GLM + RAD lies in between (Figs. 14a,b). GLM + RAD exhibits slightly higher ETS than RAD for 0–1 h CREF forecast at the cost of adding more bias due to the overprediction and spurious cells (Figs. 14c,d). For the 30-dBZ threshold, the improvement of ETS and CSI by GLM + RAD over RAD persists until the 3-h forecast. At 40 dBZ, however, RAD produces higher ETS and CSI for the 1.5–3 h forecast. GLM has lower ETS for the first 1.5 h of forecast but then surpasses GLM + RAD. The difference between RAD and GLM is more notable during the first half hour, where RAD produces much higher ETS than GLM. After that, GLM outperforms RAD in terms of ETS, POD, and CSI forecast at 30 dBZ but produced similar ETS scores at 40 dBZ. The ETSs for all DA experiments undergo a sharp decrease with time after 1 h forecast. Such a marked drop is especially evident for RAD and GLM + RAD, a result that is consistent with past cloud-scale lightning DA works highlighting that longer-term solutions tend to progressively become bounded by bias and errors contained in the large-scale environment (e.g., Fierro et al. 2015).
The hourly rainfall forecasts (Fig. 15) exhibit broadly consistent results with those from CREF. The 1-h forecasts from GLM + RAD and RAD are relatively similar, with higher POD and SR than GLM (Figs. 15a–c). At 3-h forecast, however, GLM produces the largest POD (at the cost of producing the largest bias and lower SR). RAD shows overall the largest value of SR and CSI but smallest value of POD (Figs. 15a–c). Similar to CREF, both RAD and GLM + RAD produce the best ETS at 1 h, whose values experience a marked drop afterward (Figs. 15d–f). At 2–3 h forecasts, the ETS for GLM and GLM + RAD remain similar. For the 2-h forecast, GLM produces the highest ETS at 2.5 and 5.0 mm while RAD produces the best ETS at 10 mm. For all thresholds, RAD generally shows the largest ETS at 3-h forecasts although being small (e.g., <0.4 at 10 mm) and not significantly different from CTRL (~0.25, Fig. 15f).
Overall, the assimilation of GLM lightning data over radar has a neutral to positive impact on the short-term forecasts of convective-scale severe weather events. In line with Fierro et al. (2019), the assimilation of GLM lightning data alone is able to notably improve short-term (1–2 h) forecast skill over CTRL. The sharp, gradual loss in model skill only a few hours (~3 h) following the analysis time has also been reported in prior cloud-scale 3DVAR DA studies assimilating radar (e.g., Schenkman et al. 2011; Gao et al. 2018) and/or lightning data (Fierro et al. 2015, 2019). When simultaneously assimilating both GLM and radar data, forecast skill is improved further relative to when either only radar or only lightning data are assimilated, especially for 0–1 h forecasts. Similar to previous DA studies involving adjustments of moisture- or precipitation-related proxy variables for lightning (or hydrometeors through a cloud analysis), however, a higher POD is generally accompanied by an increase in wet bias. Fierro et al. (2015) indicated that in the context of qυ-based LDA, there generally was a systematic trade-off between wet biases in the forecast and short-term forecast improvements. Such wet biases earlier on in the forecast can have profound detrimental impacts on the longer-term solution through faster saturation of errors at the smaller scales, and thus should be addressed in future endeavors either by building on the qυ conservation concept proposed in Fierro et al. (2019) or by exploring alternative means to confine the impact of moisture adjustments inferred by lightning or other types of observations near convective cores.
c. DA statistics
Following the line of work of Fierro et al. (2019), the variation of total cost function together with DA statistics including the mean and root-mean-square (RMS) of the innovation and analysis residual during DA cycles are analyzed to provide an evaluation of the quality and performance of the 3DVAR analyses. Because the results for each case were qualitatively similar, only one case is presented for brevity, namely, 1 May 2018. The cost function minimized iteratively by the 3DVAR algorithm during each analysis is defined, for instance, by Eq. (1) in Gao et al. (2013). For a converging analysis, the cost function is anticipated to decrease with iterations in each minimization step.
The innovation defines the difference between the observation and the background, and the analysis residual measures the difference between the observation and the analysis. The mean innovation (referred to as MEANinnov) and MEAN analysis residual (referred to as MEANres) are calculated to estimate a measure of bias between the observation and model. The RMS statistics for the innovation (referred to as RMSinnov) and analysis residual (referred to as RMSres) provide a measure of the Euclidian distance between the observation and the background and between the observation and the analysis, respectively. The reader is invited to consult Fierro et al. (2019) for details behind their formulations. Because statistics from both 3DVAR passes have similar behavior, only those for the second pass are discussed.
In general, the magnitude of the (normalized) total cost function decreases with iterations, with GLM producing the steepest drop (about 90%) during the first few iterations (Fig. 16a). In RAD and GLM + RAD, significantly more radar observations are being assimilated (volumetric scans) and because of the nonlinearity relationship between reflectivity and hydrometer variables, the total cost function decreases less rapidly (Figs. 16b,c). The lower values of RMSres compared to corresponding values of RMSinnov for all variables in each experiment indicate that the analysis is closer to the observations than the background, which confirms that the 3DVAR system generally ingests all the available observations reasonably well (Figs. 16d–f). The slight decrease of RMSinnov for the GLM-derived qυ and radial velocity υr with DA cycles in all experiments indicates particularly good assimilation of GLM-derived qυ observations and radar radial velocity observations by the system. The nonlinearity of the radar reflectivity operator (Gao et al. 2016) coupled with phase errors and the development of spurious convection all account for the gradual increases seen for RMSinnov with cycling before 0000 UTC (Figs. 16e,f).
Positive MEANinnov and MEANres values (close to zero) for qυ in (Figs. 16g,i) indicate that qυ fields from both the background and the analysis are smaller than the GLM-derived pseudo observations for qυ (which assumes near saturation w.r.t. water). This indicates that the 3DVAR analysis properly adjust the background qυ toward saturation within lightning columns. MEANinnov and MEANres values for υr are quite small (Figs. 16h,i), indicating little overall bias. Both background and analysis have higher reflectivity values than the radar observations, as revealed by the negative MEANinnov and MEANres values for reflectivity in both RAD-based experiments (Figs. 16h,i). This systematic overestimation of reflectivity by the 3DVAR analysis accounts for the overprediction of rainfall as seen, for instance, in Figs. 10d and 10e.
6. Summary and conclusions
This study evaluated the impact of assimilating pseudo-qυ observations derived from spaceborne GLM total lightning data into convective-scale NWP for short-term severe weather forecasts. The NSSL 3DVAR system was utilized to assimilate GLM lightning-derived pseudo-qυ observations by building on initial concepts put forth in Fierro et al. (2016, 2019) wherein water vapor mass mixing ratio is adjusted to near saturation within a confined layer at and around observed lightning locations.
To provide a more systematic assessment of the performance of GLM lightning DA, five high-impact weather events that occurred in May 2018 over the Great Plains of the United States were examined. For each of the five cases, the study domain was chosen based on the SPC “day 1” convective outlook product. To determine an optimal configuration for the GLM DA, sensitivity tests were conducted first, which focused on evaluating the impact of the (i) horizontal decorrelation length scale, (ii) 3DVAR cycling frequency, and (iii) accumulation period for the GLM data prior to each 3DVAR analysis. Qualitative and quantitative assessments of composite reflectivity fields and accumulated precipitation analysis/forecast against observations using aggregate statistics revealed that assimilating 10-min accumulated GLM lightning data every 15 min with a decorrelation length scale of 3 km for qυ yielded the best results.
With the above settings for the GLM DA, two companion experiments assimilating level II radar data (VAD-derived winds, radial velocity, and radar reflectivity) with or without GLM total lightning-derived pseudo-qυ observations were conducted. Overall, improvements on short-term (0–3 h) forecasts in the aggregate were revealed by all assimilation experiments including either radar and/or GLM lightning data. Similar to prior cloud-scale DA studies, forecast skill was characterized by a marked drop after about 1 h. For these five cases, the forecast skill for the experiment assimilating only GLM lightning-derived qυ data was comparable to that of the radar-only DA experiment (Fierro et al. 2016). Similar to previous cloud-scale lightning DA studies, the short-term improvements in precipitation forecasts were generally accompanied by a noteworthy increase in wet bias, especially later in the simulation (≥3 h, e.g., Fierro et al. 2015). Overall, higher 0–1 h forecast skill was obtained when radar data were assimilated. The combination of GLM and RAD helps improve POD and ETS forecast but at the cost of exacerbating any existing wet biases (Fierro et al. 2016).
One of the critical limitations of the present LDA scheme lies in its inability to suppress spurious convection even when combined with current radar DA approach. Current parallel work is investigating modifications to the radar DA scheme wherein radar reflectivity fields from background and observed fields are used in concert to identify areas of spurious convection where negative qυ increments should be imposed. Besides, available traditional observations such as pressure, temperature, moisture, and wind observations from surface observing platforms such as mesonets, as well as upper-air observing instruments such as radiosondes and wind profiles, could also be effectively assimilated in future work to help better represent the observed near-storm environment. It is relevant to also point out that this study only considered GLM flash centroids as an effort to primarily facilitate comparisons between previous studies that assimilated typical point-flash data from ground-based sensors (e.g., Fierro et al. 2012; Marchand and Fuelberg 2014; Fierro et al. 2019). Further research will be devoted at leveraging the GLM group and event locations, which better depict the spatial extent of the GLM flashes and, hence, could help better represent convective and stratiform areas in the analysis (Peterson 2019). Auxiliary spaceborne datasets such as the GOES-R Series Advanced Baseline Imager (ABI) product should also be used in tandem with the GLM to better depict the nature of clouds beyond electrically active regions.
Funding was provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA–University of Oklahoma Cooperative Agreement NA11OAR4320072, U.S. Department of Commerce. This work was further supported by the National Oceanic and Atmospheric Administration (NOAA) of the U.S. Department of Commerce under Grant NOAA-NWS-NWSPO-2018-2005317 (Award NA18NWS4680063). The simulations were conducted on NSSL’s local HPC resources (Odin), with auxiliary computer resources provided by the Oklahoma Supercomputing Center for Education and Research (OSCER) hosted at the University of Oklahoma and by NOAA (Jet). Thanks also go out to Gerry Creager for IT support at NSSL and to Thomas Jones for his valuable comments and suggestions on an earlier version of the manuscript.