1. Introduction
Thunderstorms have caused approximately $200 billion (U.S. dollars) in property damage since the beginning of the millennium (NOAA 2019). Most severe weather hazards occur with only tens of minutes of advance warning (average tornado warning lead time ≈ 17 min) because the National Weather Service (NWS) issues warnings based upon the detection of a hazard, either from surface reports, radar detections, or an imminent threat as determined by the forecaster (Stensrud et al. 2013). Under the current warning paradigm (i.e., warn-on-detection), the average warning lead time for detected tornadoes has remained relatively constant since 1986 (Stensrud et al. 2013) and is unlikely to increase substantially without degrading warning skill (e.g., by increasing the number of false alarms; Brooks 2004). To extend warning lead time, the NWS is working to adopt the Warn-on-Forecast paradigm (Stensrud et al. 2009, 2013), in which frequently updated convection-allowing model (CAM) ensemble forecast guidance is incorporated into the warning process. Because convective-scale forecast errors grow rapidly, a skillful forecast system must limit not only sources of model error but also initial condition errors, which requires a skillful data assimilation (DA) system.
Various Warn-on-Forecast prototype systems (e.g., Yussouf et al. 2013; Wheatley et al. 2015; Jones et al. 2016; Snook et al. 2016; Lawson et al. 2018; Labriola et al. 2019; Stratman et al. 2020) use an ensemble Kalman filter (EnKF; Evensen 1994, 2003) to assimilate observations and initialize 0–6-h CAM forecasts. An EnKF is a Monte Carlo implementation of the Kalman filter (Kalman 1960), which is itself based on optimal estimation theory. EnKFs derive background error covariance statistics from an ensemble of short-term forecasts instead of evolving the covariances in time with computationally expensive prediction equations (Evensen 1994, 2003). This technique is particularly attractive compared with static covariance models, which often assume that the background error covariance is spatially homogeneous and isotropic (Parrish and Derber 1992; Purser et al. 2003). Error covariances at the convective scale are inhomogeneous and anisotropic and are thus better represented by flow-dependent covariances derived from the forecast ensemble. The filter can also update unobserved model state variables from a limited number of observed quantities. This is the case for radar observations, for which radar reflectivity Z and radial velocity Vr are the primary observations available at the convective scale. Many observing system simulation experiments (OSSEs) and real-data experiments have demonstrated that assimilating Z and Vr observations improves the estimated thermodynamic, dynamic, and microphysical state variables (e.g., Snyder and Zhang 2003; Zhang et al. 2004; Dowell et al. 2004; Caya et al. 2005; Tong and Xue 2005; Xue et al. 2010; Dawson et al. 2012; Jung et al. 2012; Johnson et al. 2015; Snook et al. 2015; Supinie et al. 2016; Wang and Wang 2017; Tong et al. 2020). Assimilating other convective-scale observations, such as polarimetric radar observations and satellite radiances, may also decrease initial condition errors (Jung et al. 2008; Jones et al. 2018b; Zhang et al. 2019; Putnam et al. 2019).
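To make the idea of flow-dependent, ensemble-derived covariances concrete, the following is a minimal NumPy sketch (not the GSI implementation) of how one observation updates the ensemble mean of every state variable, including an unobserved one, through sample covariances. The function name `enkf_mean_update` is hypothetical, the observation operator is assumed to simply select one state element, and only the mean update is shown.

```python
import numpy as np


def enkf_mean_update(ens, obs_index, y_obs, obs_err_var):
    """Update the ensemble mean with one observation via sample covariances.

    ens:         (n_members, n_state) background ensemble
    obs_index:   index of the observed state element (linear H assumed)
    y_obs:       observed value
    obs_err_var: observation error variance R
    """
    n_members = ens.shape[0]
    mean = ens.mean(axis=0)
    pert = ens - mean                      # state perturbations
    hx = ens[:, obs_index]                 # observation priors H(x)
    hx_pert = hx - hx.mean()
    # Flow-dependent covariance between every state element and H(x).
    cov_x_hx = pert.T @ hx_pert / (n_members - 1)
    var_hx = hx_pert @ hx_pert / (n_members - 1)
    gain = cov_x_hx / (var_hx + obs_err_var)   # Kalman gain
    return mean + gain * (y_obs - hx.mean())


# Two-variable toy: the second variable is never observed but is correlated
# with the first, so the observation still updates it.
rng = np.random.default_rng(0)
observed = rng.normal(size=40)
ens = np.column_stack([observed, 2.0 * observed])
print(enkf_mean_update(ens, obs_index=0, y_obs=1.0, obs_err_var=0.5))
```

In this toy the unobserved variable's increment is exactly twice that of the observed one because its sample covariance with the observation prior is twice as large; with a small ensemble the same mechanism also produces spurious increments far from the observation.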
While EnKF systems have demonstrated their potential to assimilate radar observations, the performance of the DA system and the skill of subsequent forecasts are contingent upon optimizing the DA configuration. Each DA system contains many tuning parameters, and their optimal values are sensitive to the model configuration and the observations assimilated (e.g., Dowell et al. 2004; Tong and Xue 2005; Sobash and Stensrud 2013; Johnson and Wang 2017).
Previous studies (e.g., Zhang et al. 2004; Dowell and Wicker 2009; Johnson and Wang 2017) have found that DA system configurations can have a substantial impact on subsequent forecast skill. For example, EnKF experiments employ covariance inflation and localization to mitigate poor estimates of the background error statistics. Additional considerations when assimilating radar observations include observation error specification, data thinning interval, and assimilation frequency. The remainder of this introduction discusses these techniques, which are commonly used to improve EnKF initial condition estimates.
EnKF systems are often underdispersive owing to limited ensemble size and unaccounted-for model errors (e.g., Houtekamer and Mitchell 1998; Romine et al. 2014), which makes the ensemble overconfident and reduces the influence of observations. Repeated assimilation of dense observations can reduce ensemble spread so much that the filter effectively ignores new observations, leading to filter divergence (Jazwinski 1970; Anderson and Anderson 1999). Spread maintenance (inflation) algorithms artificially increase ensemble variance to better represent the true uncertainty of the atmospheric state during DA (Anderson 2001; Hamill et al. 2001). Some algorithms add noise to the posterior ensemble (e.g., Dowell and Wicker 2009; Sobash and Wicker 2015), while others, such as the relaxation-to-prior-spread (RTPS) algorithm (Whitaker and Hamill 2012), relax the posterior ensemble spread back toward the prior ensemble spread by a specified fraction. These methods have met with varying degrees of success across experiments, and a combination of them is often used to optimize EnKF performance (Jung et al. 2012).
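A minimal NumPy sketch of RTPS in the Whitaker and Hamill (2012) form: each posterior perturbation is rescaled so that the posterior standard deviation becomes (1 − α)σa + ασb at every grid point while the posterior mean is unchanged. The function name is hypothetical, and reading the percentages used later in this study (e.g., 99%) directly as α is an assumption made only for illustration; under that reading, α = 1.1 would push the posterior spread beyond the prior spread.

```python
import numpy as np


def rtps(prior, posterior, alpha):
    """Relaxation to prior spread (Whitaker and Hamill 2012).

    prior, posterior: (n_members, n_state) ensembles before/after the update.
    alpha: relaxation coefficient; the returned ensemble has standard
           deviation (1 - alpha) * sigma_a + alpha * sigma_b at each point.
    """
    post_mean = posterior.mean(axis=0)
    post_pert = posterior - post_mean
    sigma_b = prior.std(axis=0, ddof=1)
    sigma_a = posterior.std(axis=0, ddof=1)
    # Leave points with zero analysis spread unchanged (nothing to rescale).
    safe_sigma_a = np.where(sigma_a > 0.0, sigma_a, 1.0)
    factor = np.where(sigma_a > 0.0,
                      alpha * (sigma_b - sigma_a) / safe_sigma_a + 1.0,
                      1.0)
    return post_mean + post_pert * factor


rng = np.random.default_rng(1)
prior = 2.0 * rng.normal(size=(40, 5))
posterior = 0.5 * prior + 0.3          # toy analysis with half the spread
inflated = rtps(prior, posterior, alpha=0.9)
print(inflated.std(axis=0, ddof=1) / prior.std(axis=0, ddof=1))
```

Because the toy analysis has exactly half the prior spread, the printed ratios are about 0.95 (0.5 + 0.9 × 0.5) at every point.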
The quality of the ensemble-estimated background error covariance is often poor because it is computed from an insufficient number of ensemble members (e.g., Houtekamer and Mitchell 1998; Hamill et al. 2001). Such sampling errors cause grid points to become spuriously correlated with distant observations, so the filter erroneously updates the background forecast far from the assimilated observation. A distance-based, compactly supported weighting function that approximates a Gaussian (Gaspari and Cohn 1999) is often used to localize the influence of observations; the optimal cutoff radius depends on many factors, including ensemble size, observation type, density, and location, computational cost, model resolution, and the correlation length scale of the model dynamics (Sobash and Stensrud 2013; Ying et al. 2018). Most studies (e.g., Snyder and Zhang 2003; Zhang et al. 2004; Dowell et al. 2004; Caya et al. 2005; Tong and Xue 2005; Aksoy et al. 2009; Jung et al. 2012; Sobash and Stensrud 2013; Johnson et al. 2015; Wheatley et al. 2015) use a small cutoff radius for radar observations (4–18 km horizontal, 4–8 km vertical) because the observations are dense and convective-scale flows have small spatial correlation length scales.
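The Gaspari and Cohn (1999) function is a fifth-order piecewise rational function with half-width c that falls to zero at a separation of 2c. The NumPy sketch below assumes that the "cutoff radius" quoted in this paper is the distance at which the weight vanishes (i.e., 2c); that convention, and the function name, are assumptions for illustration rather than a description of GSI internals.

```python
import numpy as np


def gaspari_cohn(distance, cutoff):
    """Gaspari and Cohn (1999) fifth-order piecewise rational weight.

    distance: separation(s) between observation and grid point
    cutoff:   radius at which the weight reaches zero (2c in GC notation)
    """
    c = cutoff / 2.0
    z = np.atleast_1d(np.abs(np.asarray(distance, dtype=float))) / c
    w = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    w[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                - (5.0 / 3.0) * zi**2 + 1.0)
    w[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return w


# Horizontal weights for a 12-km cutoff (the CTRL radar value).
print(gaspari_cohn([0.0, 3.0, 6.0, 9.0, 12.0], cutoff=12.0))
# Vertical weight for a 0.7 cutoff in ln(p): 850 vs 700 hPa.
print(gaspari_cohn(abs(np.log(850.0) - np.log(700.0)), cutoff=0.7))
```

A common choice, assumed here, is to multiply the horizontal and vertical weights to obtain the total localization factor applied to each covariance.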
Radar data require preprocessing prior to assimilation because observations are provided in radar polar coordinates at resolutions finer than those of most forecast models. Most EnKF experiments assimilate coarsened radar data that are interpolated to a regular grid or to the model grid itself. For example, Xue et al. (2006) assimilate radar observations that are interpolated to the horizontal model grid but kept on the radar elevation levels in the vertical. After interpolation, some experiments assimilate only a fraction of the available radar observations to decrease computational expense (Gao and Xue 2008) or to mitigate the effects of spatially correlated observation errors by removing neighboring observations (Chang et al. 2014). Thinning assimilated observations can also reduce the loss of ensemble spread during EnKF DA, although this does not necessarily improve forecast skill (Aksoy et al. 2012). Radar data are most commonly thinned at regular, user-specified intervals throughout the model domain. Despite these potential benefits, thinning can remove important in-storm observational information and limit analysis skill.
Modeling errors on the convective scale in particular are poorly understood and thus represent a challenge when assimilating radar observations. Although unrealistic, most EnKF-based studies assume that radar observation errors are Gaussian, constant in standard deviation, and spatially uncorrelated. Well-calibrated WSR-88D radars have observational errors of approximately 1 dBZ (Z) and 1 m s−1 (Vr) (Doviak and Zrnić 1993; Ryzhkov et al. 2005), but DA experiments typically assume larger errors to account for representativeness errors and other uncertainties (such as those in the observation operators). Increasing the assumed observation errors also alleviates ensemble underdispersion by decreasing spread reduction during DA and has been shown to improve the performance of ensemble analyses (e.g., Dowell et al. 2004; Snook et al. 2013).
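Why a larger assumed observation error reduces spread loss can be seen from the scalar Kalman filter, in which the analysis variance is σa² = σb²σo²/(σb² + σo²). The short sketch below evaluates the fraction of background spread that survives one assimilated observation; the 5-dBZ background spread is a hypothetical value chosen only for illustration, and real EnKF updates involve many observations and localization.

```python
import math


def posterior_spread(sigma_b, sigma_o):
    """Analysis standard deviation after one scalar Kalman update."""
    var_a = sigma_b**2 * sigma_o**2 / (sigma_b**2 + sigma_o**2)
    return math.sqrt(var_a)


sigma_b = 5.0  # hypothetical background Z spread (dBZ)
for sigma_o in (1.0, 6.0, 9.0):
    kept = posterior_spread(sigma_b, sigma_o) / sigma_b
    print(f"sigma_o = {sigma_o:.0f} dBZ: {kept:.0%} of the spread retained")
```

With these illustrative numbers, roughly 20%, 77%, and 87% of the spread survives for assumed errors of 1, 6, and 9 dBZ, respectively.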
To obtain accurate initial conditions for convective storms, radar observations are often assimilated at high frequency. Real-time systems often assimilate radar observations every 15 min (e.g., Wheatley et al. 2015; Johnson et al. 2017; Snook et al. 2019), although improvements to computational infrastructure and new weather radar technology such as phased-array radars (Weber et al. 2007; Zrnic et al. 2007; Heinselman and Torres 2011; Curtis and Torres 2011) provide the opportunity to assimilate observations more frequently in the future. OSSEs that assimilate rapid-scan radar information show a decrease in the spinup time for convection and reduced errors in both observed and unobserved variables (Zhang et al. 2004; Xue et al. 2006; Yussouf and Stensrud 2010). Real-data studies have successfully assimilated radar information at relatively high frequencies (≤5 min) (Snook et al. 2011, 2016; Jung et al. 2012; Supinie et al. 2017; Labriola et al. 2017; Stratman et al. 2020); however, unless extra care is taken, rapid updates can introduce imbalances that the model cannot adjust to before the next DA cycle, degrading both analysis and forecast skill (Wang et al. 2013; Johnson and Wang 2017).
Most studies designed within the Warn-on-Forecast framework evaluate forecast and analysis sensitivity to a limited number of EnKF parameters for radar DA, including covariance inflation (Dowell and Wicker 2009), covariance localization (Sobash and Stensrud 2013), observation data thinning (Gao and Xue 2008), DA frequency (Yussouf and Stensrud 2010; Stratman et al. 2020), and prescribed observation errors (Gao and Xue 2008; Dowell et al. 2011; Snook et al. 2013). Evaluating only a few parameters allows these studies to analyze forecast sensitivities in greater detail; however, forecast skill can remain suboptimal because the other parameters are left untuned. Further, these experiments do not consider the computational limitations of running a real-time CAM forecast system over a large domain such as the contiguous United States (CONUS). This is the first study to tune the EnKF radar DA parameters for a real-time short-term CAM forecast system that can be deployed over the full CONUS domain. Comparing forecasts initialized with many different EnKF configurations shows how each parameter affects forecast skill and which configuration yields the most skillful forecasts.
This study evaluates short-term (0–6 h) forecasts for a mesoscale convective system (MCS) event on 28–29 May 2017 that are initialized using an EnKF system based on the NCEP operational Gridpoint Statistical Interpolation (GSI) framework, using a forecast ensemble that is based upon the Center for Analysis and Prediction of Storms (CAPS) Storm Scale Ensemble Forecast (SSEF) run during the 2017 Hazardous Weather Testbed Spring Forecast Experiment (CAPS 2017). The rest of the paper is organized as follows: section 2 provides a brief overview of the case study, a description of the experiments, and verification procedures. The control experiment results including objective and subjective forecast evaluations are discussed in section 3, while section 4 discusses the radar DA parameter sensitivity experiment results. Section 5 summarizes the results of this study and discusses potential future directions for research.
2. Event overview, experiment configuration, and verification methodology
a. Case overview
The focus of this study is to forecast the evolution of an MCS and nearby isolated convective storms that produced strong winds, hail, and tornadoes in Texas, Louisiana, and Mississippi on 28–29 May 2017 (Fig. 1). Upper-level wind patterns were favorable for convective development during this event: a trough was located over the central United States, and Texas was to the right of a jet entrance region. At approximately 2000 UTC 28 May, multiple thunderstorms initiated along a frontal boundary extending from Arkansas toward the Texas–Mexico border. Thunderstorms that initiated over eastern Texas quickly grew upscale between 2200 and 0000 UTC as they ingested unstable air (CAPE > 2200 J kg−1). The storms eventually merged to form a squall line that affected Louisiana and Mississippi between 0000 and 0600 UTC 29 May (Fig. 2). Storms embedded within the line produced several weak tornadoes and multiple wind reports extending from Shreveport to Jackson (Fig. 1). Between 0000 and 0200 UTC, isolated thunderstorms in Texas (Figs. 2a,b) produced multiple hail and wind reports (Fig. 1), but by 0300 UTC (Fig. 2c) many of these storms had weakened and formed a large region of mostly stratiform precipitation. With the exception of convection near Houston and the MCS located east of Jackson, precipitation remained stratiform through 0600 UTC (Fig. 2d).

Fig. 1. Diagram of the 28–29 May 2017 forecast domain; states and countries are labeled in bold. A legend for hail, wind, and tornado SPC storm reports is provided in the upper-left corner. Cities referred to in the study are marked with a fuchsia “X.”
Citation: Weather and Forecasting 36, 1; 10.1175/WAF-D-20-0071.1


Fig. 2. MRMS observed column-maximum Z valid at (a) 0015, (b) 0100, (c) 0300, and (d) 0600 UTC. Cities referred to in the study are marked with a fuchsia “X” and are the same as in Fig. 1.
The presence of active MCS convection both during the DA period preceding the forecast initial time and during several hours of the forecast makes this case suitable for investigating the impact of DA and its configuration on subsequent storm forecasts. Forecasts focus on the Texas, Louisiana, and Mississippi region. Experiment conclusions are limited because this is a single case study; however, analyzing a single case allows this study to test multiple EnKF parameters and determine an optimal radar DA configuration that can be employed in the future.
b. Control experiment DA system settings
This subsection describes the DA configuration of the control experiment (hereafter CTRL; Table 1); the modified DA configuration parameters are discussed in section 2d. This study uses the GSI EnKF system for DA. The GSI system performs observation quality control (QC) and applies forward observation operators to the model background to generate observation priors. The filter is a version of the ensemble square root filter (EnSRF; Whitaker and Hamill 2002) that precalculates the observation priors and then updates them together with the state variables (Anderson and Collins 2007). The GSI EnKF system was recently updated to assimilate radar observations: radar forward operators were added to the GSI suite, and the EnKF system was given the capability to update hydrometeor variables (e.g., Johnson et al. 2015; Jones et al. 2018a; Tong et al. 2020). CAPS recently added a radar reflectivity forward operator consistent with the Thompson microphysics scheme. The new operator follows Jung et al. (2008), using the T-matrix method (Vivekanandan et al. 1991; Bringi and Chandrasekar 2001) for raindrops and the Rayleigh scattering approximation for ice hydrometeor species (i.e., snow, hail, graupel) to calculate scattering amplitudes. In this study, radar observations undergo automatic QC, including velocity dealiasing, using a procedure developed by CAPS (Brewster et al. 2005). These enhancements have allowed the GSI EnKF system to be used in several storm-scale modeling studies (e.g., Johnson et al. 2015; Johnson and Wang 2017; Johnson et al. 2017; Jones et al. 2018b; Jung et al. 2018a,b; Chipilski et al. 2020).
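The serial EnSRF update with precomputed observation priors can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not GSI code: localization is omitted, observations are processed one at a time, and the function name `ensrf_serial` is hypothetical. For each observation, the mean is updated with the Kalman gain, perturbations are updated with the reduced gain αK, where α = 1/(1 + √(R/(HPHᵀ + R))) (Whitaker and Hamill 2002), and the priors of the remaining observations are updated exactly like state variables (Anderson and Collins 2007).

```python
import numpy as np


def ensrf_serial(x_ens, y_prior, y_obs, obs_var):
    """Serial EnSRF; observation priors are updated alongside the state.

    x_ens:   (n_members, n_state) background state ensemble
    y_prior: (n_members, n_obs) precomputed observation priors H(x)
    y_obs:   (n_obs,) observed values
    obs_var: (n_obs,) observation error variances R
    Localization is omitted for brevity.
    """
    x = x_ens.astype(float).copy()
    hx = y_prior.astype(float).copy()
    n = x.shape[0]
    for j in range(hx.shape[1]):
        hx_mean = hx[:, j].mean()
        hx_pert = hx[:, j] - hx_mean
        var_hx = hx_pert @ hx_pert / (n - 1)
        denom = var_hx + obs_var[j]
        # Square-root factor that reduces the gain applied to perturbations.
        alpha = 1.0 / (1.0 + np.sqrt(obs_var[j] / denom))
        innov = y_obs[j] - hx_mean
        # Update the state and all observation priors with the same formulas.
        for arr in (x, hx):
            mean = arr.mean(axis=0)
            pert = arr - mean
            gain = (pert.T @ hx_pert) / (n - 1) / denom
            arr[:] = mean + gain * innov + pert - alpha * np.outer(hx_pert, gain)
    return x, hx


# Toy check: a directly observed scalar state.
rng = np.random.default_rng(2)
xb = rng.normal(loc=10.0, scale=2.0, size=(30, 1))
xa, ya = ensrf_serial(xb, xb.copy(), np.array([12.0]), np.array([1.0]))
print(xb.var(ddof=1), xa.var(ddof=1))
```

For a directly observed scalar, this update reproduces the Kalman-filter posterior mean and variance exactly, which is the property the deterministic square-root formulation is designed to provide without perturbing the observations.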
Table 1. CTRL experiment configuration.


CTRL follows the 2017 CAPS storm-scale ensemble configuration (CAPS 2017), except for the DA parameters (Table 1), which are inherited from the GSI EnKF system used to generate the 2019 CAPS storm-scale ensemble initial conditions (Clark et al. 2019). CTRL is initialized from the 1800 UTC North American Mesoscale (NAM) analysis plus perturbations derived from 3-h forecasts of the 1500 UTC cycle of the Short-Range Ensemble Forecast (SREF) (Table 1). Perturbations are constructed by taking the difference between two selected members among the 24 SREF members. Each difference and its negative counterpart (equal in magnitude but opposite in sign) are added to the NAM analysis to produce two perturbed members. This is repeated to create 39 perturbed members; the first ensemble member remains unperturbed. SREF and NAM forecasts provide lateral boundary conditions. For more details on the initial and lateral boundary conditions, see Table 3 of the “CAPS Spring Forecast Experiment Program Plan” (CAPS 2017).
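The paired-perturbation construction described above can be sketched as follows. This is a hypothetical illustration: the pair selection used in the toy example is invented, and because the source does not say how an odd count of 39 perturbed members is reached, the sketch simply stops once 39 members exist, so the last pair contributes only its positive member.

```python
import numpy as np


def build_paired_ensemble(analysis, sref, pairs, n_perturbed=39):
    """Hypothetical sketch: control analysis plus +/- paired SREF differences.

    analysis:    (n_state,) control analysis (e.g., NAM)
    sref:        (n_sref, n_state) short-range ensemble forecasts
    pairs:       list of (i, j) SREF member index pairs (selection assumed)
    n_perturbed: number of perturbed members to return after member 0
    """
    members = [analysis.copy()]               # member 0 stays unperturbed
    for i, j in pairs:
        diff = sref[i] - sref[j]
        for sign in (1.0, -1.0):              # positive, then mirrored
            if len(members) - 1 < n_perturbed:
                members.append(analysis + sign * diff)
    if len(members) - 1 < n_perturbed:
        raise ValueError("not enough pairs for the requested ensemble size")
    return np.stack(members)


rng = np.random.default_rng(3)
analysis = rng.normal(size=8)
sref = rng.normal(size=(24, 8))
pairs = [(k, k + 1) for k in range(0, 24, 2)] + [(k, k + 2) for k in range(0, 16, 2)]
ens = build_paired_ensemble(analysis, sref, pairs)
print(ens.shape)
```

Mirroring each difference keeps every complete pair centered on the analysis, so the perturbations do not shift the ensemble mean away from the NAM analysis except through an unpaired final member.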
Conventional observations (e.g., surface stations, buoys, soundings) are assimilated hourly between 1900 and 0000 UTC, and radar (Z and Vr) observations are assimilated every 15 min during the last hour. A schematic of the DA cycles is shown in Fig. 3. Covariance localization uses the Gaspari and Cohn (1999) weighting function with user-specified cutoff radii. In this study, the horizontal cutoff radius is 300 km for conventional observations and 12 km for radar observations. The vertical cutoff radius for all observations is 0.7 scale height (in natural log of pressure). The RTPS covariance inflation algorithm is used to restore the spread of the analysis to 99% of the background spread. At 0000 UTC, an ensemble of 10 free forecasts is initialized from the final analyses and run to 0600 UTC. The forecast members are initialized following the same procedure as the CAPS real-time system: eight forecasts are initialized from individual member analyses selected with physics and graupel density diversity in mind, and two are initialized from the ensemble mean analysis [see Table 4 of CAPS (2017)]. The 0000 UTC NAM forecast and the 2100 UTC SREF forecasts provide lateral boundary conditions. The forecast members employ a diverse suite of model physics options, chosen to enhance forecast diversity. The forecast configuration is discussed in the following subsection.

Fig. 3. Flow diagram detailing the CTRL DA experiment configuration. A bold vertical line at 1800 UTC marks when the ensemble of forecasts is first initialized. The “X” symbols mark when surface observations are assimilated, and the downward-pointing arrows mark when radar observations are assimilated. The final free forecasts start at 0000 UTC, after 6 h of DA cycles.
For radar DA, Z observations come from the Multi-Radar Multi-Sensor (MRMS; Smith et al. 2016) products. The MRMS system performs QC on data from 140 WSR-88D radars and generates a mosaic of the observations on a three-dimensional grid with a horizontal resolution of 0.01° latitude × 0.01° longitude and 33 vertical levels. Radial velocity Vr observations are processed using CAPS’s software package, which includes QC; the data are interpolated to the model grid horizontally and kept on the radar elevation levels in the vertical for each radar site (Xue et al. 2006). The Z and Vr observations are thinned horizontally to every 6 km during assimilation. Reflectivity Z observations are also thinned vertically to every 1 km in radar echoes (Z > −90 dBZ) and to every 2 km in clear-air regions (Z < −90 dBZ). This thinning is necessary to fit the GSI EnKF analyses into available computer memory when a large, continent-size domain is used, as in the Community Leveraged Unified Ensemble (CLUE) experiment, because GSI memory usage is inefficient. The Vr and Z observation errors are assumed to be 3 m s−1 and 6 dBZ, respectively, during DA.
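The thinning rules above can be illustrated with a small NumPy sketch that returns a mask of retained Z observations. It is an assumption-laden illustration, not the GSI procedure: horizontal thinning keeps every k-th model grid point (6 km on a 3-km grid gives k = 2), and in each retained column a level is kept only if it lies at least 1 km (echo, Z > −90 dBZ) or 2 km (clear air) above the last kept level. The function name and the "distance from last kept level" rule are hypothetical.

```python
import numpy as np


def thin_reflectivity(z_dbz, heights_m, dx_model_km=3.0, dx_thin_km=6.0,
                      dz_echo_m=1000.0, dz_clear_m=2000.0, echo_thresh=-90.0):
    """Boolean mask of reflectivity observations retained after thinning.

    z_dbz:     (nz, ny, nx) reflectivity on the model horizontal grid
    heights_m: (nz,) observation level heights, increasing upward
    """
    nz, ny, nx = z_dbz.shape
    step = max(1, int(round(dx_thin_km / dx_model_km)))  # 6 km / 3 km -> 2
    keep = np.zeros(z_dbz.shape, dtype=bool)
    for j in range(0, ny, step):
        for i in range(0, nx, step):
            last_kept = -np.inf
            for k in range(nz):
                in_echo = z_dbz[k, j, i] > echo_thresh
                dz = dz_echo_m if in_echo else dz_clear_m
                # Keep a level only if it is far enough above the last kept one.
                if heights_m[k] - last_kept >= dz:
                    keep[k, j, i] = True
                    last_kept = heights_m[k]
    return keep


heights = np.arange(0.0, 10001.0, 250.0)   # hypothetical 250-m levels
z_echo = np.full((heights.size, 4, 4), 30.0)
mask = thin_reflectivity(z_echo, heights)
print(mask.sum(), "of", mask.size, "observations kept")
```

In this toy echo-filled volume, only 4 of the 16 columns survive horizontal thinning, and each keeps 11 levels (every 1 km from 0 to 10 km).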
c. Prediction model settings
Aside from the smaller forecast domain focused on the MCS-affected region, the grid specifications and model physics largely follow the CAPS GSI EnKF-initialized ensemble (Jung et al. 2018b) that is part of the CLUE (Clark et al. 2018). Forecasts are run using version 3.8.1 of the Advanced Research version of the Weather Research and Forecasting Model (WRF-ARW; Skamarock et al. 2008). The forecast domain spans much of Texas, Louisiana, and Mississippi (Fig. 1) with 433 × 241 horizontal grid points and 51 vertical levels in sigma-pressure coordinates. The horizontal grid spacing is 3 km. The vertical grid follows the 2017 CLUE configuration, in which the finest vertical grid spacing is located near the surface. All ensemble members use the Rapid Radiative Transfer Model for general circulation models (RRTMG; Mlawer et al. 1997; Iacono et al. 2008) to represent shortwave and longwave radiation. During DA (1800–0000 UTC), forecasts are run with the aerosol-aware Thompson microphysics scheme (Thompson et al. 2008; Thompson and Eidhammer 2014) but with varying graupel density across the members. The 0000 UTC EnKF-initialized forecasts vary the microphysics scheme among ensemble members [Thompson (Thompson et al. 2008), Morrison (Morrison and Grabowski 2008), Milbrandt and Yau (MY; Milbrandt and Yau 2005), National Severe Storms Laboratory (NSSL; Mansell et al. 2010), and Predicted Particle Properties (P3; Morrison and Milbrandt 2015)]. Ensemble members also employ different planetary boundary layer parameterizations [Mellor–Yamada–Janjić (MYJ; Janjić 1990, 1996, 2001), Yonsei University (YSU; Hong et al. 2006), and Mellor–Yamada–Nakanishi–Niino (MYNN; Nakanishi and Niino 2009)] and the Noah land surface model (Chen and Dudhia 2001) during both the DA and forecast periods. An in-depth overview of ensemble member physics options is provided in CAPS (2017).
d. DA configuration experiments
Sensitivity experiments are designed to evaluate the impact of the EnKF DA configuration on forecast skill. Each experiment repeats the CTRL DA procedure (Table 1) except that one aspect of the DA configuration for radar observations is modified. The experiments evaluate forecast sensitivity to data thinning, covariance localization, covariance inflation, observation errors, and assimilation frequency for radar observations. These parameters are tested over a range of values employed by previous studies (e.g., Gao and Xue 2008; Dowell et al. 2011; Sobash and Stensrud 2013; Snook et al. 2013; Wheatley et al. 2015; Supinie et al. 2017; Stratman et al. 2020). The remainder of this subsection defines the range of tested parameter values.
Data thinning experiments assimilate radar observations thinned over increasingly large intervals either horizontally (3, 6, 9 km) or vertically (500 m, 1 km, 2 km) and are labeled by their horizontal and vertical thinning intervals (ThinH3V1, ThinH6V1, ThinH9V1, ThinH3V0.5, ThinH3V2). Following CTRL, all experiments thin Z observations over a vertical interval twice as large in clear-air regions. Radial velocity Vr observations are not thinned further in this study because the data are kept on about a dozen radar elevation angles and are thus already sparser than the MRMS Z. The covariance localization experiments vary the localization radius for radar observations either horizontally (6, 12, 18 km) or vertically (0.4, 0.7, 1.0 scale height) and are named by the horizontal and vertical radii applied (CovH6V0.7, CovH12V0.7, CovH18V0.7, CovH12V0.4, CovH18V1.0). The covariance inflation experiments vary the percentage (80%, 90%, 99%, 110%) by which the posterior ensemble spread is relaxed toward that of the prior ensemble via the RTPS algorithm when radar observations are assimilated (2300–0000 UTC); these experiments are named by the inflation factor applied (Inf80, Inf90, Inf99, Inf110). Observation error experiments vary the observation errors for Vr (3, 6 m s−1) and Z (6, 9 dBZ) and are named by the assumed errors (3ms6dBZ, 6ms9dBZ).
Experiments that assimilate radar data at higher frequencies have had mixed success in previous studies. Stratman et al. (2020) suggest that assimilating observed Z more frequently can cause predicted storms to spin up more quickly and better suppress spurious convection. However, frequent assimilation can also introduce imbalances into the ensemble that propagate with time and degrade forecast skill (Hu and Xue 2007; Johnson and Wang 2017). To determine the impact of radar DA frequency, Z and Vr observations are assimilated at 5-, 10-, and 15-min intervals during the final hour of DA (2300–0000 UTC). The 5- and 15-min intervals roughly correspond to the frequency at which WSR-88D radars sample the atmosphere (~5 min) and the frequency at which some current real-time systems assimilate radar observations (e.g., Wheatley et al. 2015; Jung et al. 2018a), respectively. DA frequency experiments are named by the assimilation interval (5, 10, 15 min).
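The cost implication of the DA frequency experiments can be seen by listing the analysis times each interval produces. The sketch below uses only the Python standard library and assumes, for illustration, that both endpoints of each window (e.g., 2300 and 0000 UTC) are analysis times; the source does not state whether the first radar analysis occurs at 2300 or 2315 UTC, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta


def da_times(start, end, interval_min):
    """Analysis times from start to end inclusive at a fixed interval."""
    times, t = [], start
    while t <= end:
        times.append(t)
        t += timedelta(minutes=interval_min)
    return times


start = datetime(2017, 5, 28, 23, 0)
end = datetime(2017, 5, 29, 0, 0)
for interval in (5, 10, 15):
    cycles = da_times(start, end, interval)
    print(f"{interval:2d}-min radar DA: {len(cycles)} analyses")

# Conventional observations: hourly, 1900-0000 UTC.
conventional = da_times(datetime(2017, 5, 28, 19, 0), end, 60)
print("conventional analyses:", [t.strftime("%H%M") for t in conventional])
```

Under this inclusive-endpoint assumption, 5-min cycling requires 13 radar analyses in the final hour versus 5 for 15-min cycling, which is one reason higher frequencies are harder to afford over a CONUS-size domain.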
e. Forecast evaluation
The 0000–0600 UTC forecast Z is subjectively and objectively verified against observations to evaluate the predicted evolution of storm structure. This forecast evaluation period corresponds to when the impact of assimilated radar observations is most prominent (Kain et al. 2010), and when the Warn-on-Forecast paradigm is expected to offer the most benefit to operational forecasters (e.g., Stensrud et al. 2009).
During forecast evaluation, small errors in storm placement can substantially degrade objective scores through a double penalty: a displaced storm produces both a miss where the storm was observed and a false alarm where it was predicted. This problem is exacerbated when verifying localized events (e.g., convective storms) but can be ameliorated by verifying the occurrence of an event within a prescribed radius. The neighborhood maximum ensemble probability (NMEP; Schwartz et al. 2010; Schwartz and Sobash 2017) method is used to generate probabilistic forecasts. This study verifies the probability of Z exceeding 40 dBZ [P(Z > 40 dBZ)] within a 12-km neighborhood. This neighborhood radius reduces the impact of small displacement errors while keeping the short-term forecast precise enough to resolve storm structures related to localized severe weather impacts. A Gaussian filter with a smoothing length scale of 12 km is then applied to smooth the probabilistic forecasts. This study verifies a relatively high Z threshold in order to focus on the location of predicted storm cores and to exclude regions of stratiform precipitation from the statistics.
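A NumPy-only sketch of the NMEP procedure: each member's field is converted to a binary "exceeds 40 dBZ anywhere within 12 km" field, the binary fields are averaged over members, and the result is smoothed with a Gaussian filter. Assumptions (not from the source): a 3-km grid with a circular neighborhood of 4 grid points, a Gaussian standard deviation equal to the 12-km length scale, zero padding with renormalization near domain edges, and hypothetical function names.

```python
import numpy as np


def neighborhood_exceedance(field, thresh, radius_pts):
    """1 where any grid point within a circular radius exceeds thresh."""
    exceed = (field > thresh).astype(np.int8)
    ny, nx = exceed.shape
    padded = np.pad(exceed, radius_pts)          # zero (no-event) padding
    out = np.zeros_like(exceed)
    for dj in range(-radius_pts, radius_pts + 1):
        for di in range(-radius_pts, radius_pts + 1):
            if dj * dj + di * di <= radius_pts * radius_pts:
                window = padded[radius_pts + dj:radius_pts + dj + ny,
                                radius_pts + di:radius_pts + di + nx]
                out = np.maximum(out, window)
    return out


def _smooth_axis(a, kernel, axis):
    """Zero-padded 1-D convolution of `a` with a symmetric kernel."""
    half = kernel.size // 2
    pad = [(0, 0)] * a.ndim
    pad[axis] = (half, half)
    padded = np.pad(a, pad)
    n = a.shape[axis]
    out = np.zeros(a.shape, dtype=float)
    for k, w in enumerate(kernel):
        out += w * np.take(padded, np.arange(k, k + n), axis=axis)
    return out


def gaussian_smooth(field, sigma_pts):
    """Separable Gaussian filter, renormalized near the domain edges."""
    half = int(np.ceil(3.0 * sigma_pts))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_pts) ** 2)
    kernel /= kernel.sum()

    def smooth(a):
        return _smooth_axis(_smooth_axis(a, kernel, 0), kernel, 1)

    return smooth(field) / smooth(np.ones_like(field, dtype=float))


def nmep(members, thresh=40.0, radius_km=12.0, sigma_km=12.0, dx_km=3.0):
    """Neighborhood maximum ensemble probability, then Gaussian smoothing."""
    radius_pts = int(round(radius_km / dx_km))
    binary = np.stack([neighborhood_exceedance(m, thresh, radius_pts)
                       for m in members])
    prob = binary.mean(axis=0)                    # fraction of members
    return gaussian_smooth(prob, sigma_km / dx_km)


# Toy: one of 10 members has a single 50-dBZ point.
field = np.zeros((40, 40))
field[20, 20] = 50.0
members = [field] + [np.zeros((40, 40))] * 9
print(nmep(members).max())
```

Taking the neighborhood maximum before averaging over members is what distinguishes NMEP from a simple gridpoint ensemble probability: a member counts as a "yes" at a point if it has an exceedance anywhere nearby, which removes the double penalty for small displacements.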
Probabilistic forecast skill is objectively evaluated using the Brier skill score (BSS; Brier 1950), which can be expressed through the three components of the Murphy (1973) decomposition of the Brier score: reliability, resolution, and uncertainty (Table 2). Reliability measures the weighted squared difference between forecast probability and observed relative frequency; forecast skill improves as it is minimized. Resolution, which should be maximized to improve ensemble performance, measures the weighted squared difference between the observed relative frequency for each forecast probability and the climatological (sample) frequency. Unlike the other two components, uncertainty depends only on the climatological frequency of the event and cannot be changed through calibration. Reliability diagrams display the reliability and resolution components visually by plotting observed relative frequency against forecast probability for a series of increasing probability bins. In a perfectly reliable system, forecast probability equals observed frequency, and the reliability curve falls along the one-to-one line. If forecast probability is larger (smaller) than the observed frequency, the curve falls below (above) the one-to-one line and the forecast is high (low) biased. Reliability diagrams also show the frequency of occurrence of each forecast probability, which indicates forecast sharpness.
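The Murphy (1973) decomposition can be checked numerically with the short NumPy sketch below. It assumes that the BSS reference forecast is the sample climatology (so BSS = (RES − REL)/UNC) and bins forecasts by their distinct values, which makes BS = REL − RES + UNC hold exactly; with coarser probability bins, as in a reliability diagram, the identity holds only approximately. The function name is hypothetical, and the exact formulas in Table 2 may differ in detail.

```python
import numpy as np


def brier_decomposition(prob, obs):
    """Brier score and its Murphy (1973) components.

    prob: forecast probabilities; obs: binary outcomes (0/1).
    Bins are the distinct forecast values, so BS = REL - RES + UNC exactly.
    """
    prob = np.asarray(prob, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    n = prob.size
    clim = obs.mean()                       # sample climatological frequency
    bs = np.mean((prob - obs) ** 2)
    rel = res = 0.0
    for p_k in np.unique(prob):
        in_bin = prob == p_k
        n_k = in_bin.sum()
        o_k = obs[in_bin].mean()            # observed frequency in bin k
        rel += n_k * (p_k - o_k) ** 2
        res += n_k * (o_k - clim) ** 2
    rel /= n
    res /= n
    unc = clim * (1.0 - clim)
    bss = 1.0 - bs / unc                    # reference: sample climatology
    return {"bs": bs, "rel": rel, "res": res, "unc": unc, "bss": bss}


rng = np.random.default_rng(4)
prob = rng.integers(0, 11, size=500) / 10.0
obs = (rng.random(500) < prob).astype(int)
print(brier_decomposition(prob, obs))
```

Because the synthetic outcomes are drawn with the forecast probabilities themselves, this toy forecast is nearly reliable, so its BSS is driven mostly by resolution.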
Table 2. Scores used to evaluate forecasts in this study. The number of forecasts is N, the number of forecasts for a given probability threshold k is n_k,


3. CTRL results
Shortly after the start of the forecast, at 0015 UTC (Fig. 4a), CTRL predicts P(Z > 40 dBZ) to be relatively high (>0.90) within the confines of observed storms, including the MCS located near Shreveport, Louisiana. Although CTRL predicts P(Z > 40 dBZ) to exceed 0.7 for isolated storms along the Mexico–Texas border at 0015 UTC (Fig. 4a), the probabilities diminish by 0100 UTC (Fig. 4b). This region is cooler and drier than locations near the coast, suggesting that the environment is less conducive to convective development. Several forecasts in the CTRL ensemble (Fig. 4a) also predict spurious storms initiating north and east of San Antonio. Reliability diagrams show that the 0015 UTC probabilistic forecasts exhibit a slight overprediction bias for both low- and moderate-probability events [P(Z > 40 dBZ) < 0.6] (Fig. 5), primarily due to the spurious convection in Texas. Despite this bias, the BSS at 0015 UTC (0.52) is relatively high compared with later forecast times because CTRL is unbiased for high-probability events [P(Z > 40 dBZ) > 0.8].

Fig. 4. The P(Z > 40 dBZ) predicted by CTRL valid at the labeled times. Thick black contours enclose locations where observed Z > 40 dBZ. Background maps are as in Fig. 2.

Reliability diagrams for the CTRL probabilistic forecasts shown in Fig. 4. Line colors correspond to when the forecasts are valid, and the BSS for each forecast time is included in the legend.
As the MCS moves eastward through Louisiana (0100–0300 UTC), CTRL predicts the P(Z > 40 dBZ) to remain relatively high (>0.7) near the observed storms (Figs. 4b,c). Some CTRL members predict the isolated convection near San Antonio to grow in scale and form a line of storms that increases moderate forecast probabilities [P(Z > 40 dBZ) > 0.4] in Central Texas by 0100 UTC (Fig. 4b). This line of storms is spurious; the observed storms south and west of Houston at 0100 UTC are isolated (Fig. 2b) and the observed MCS is farther east. By 0300 UTC, most of the observed storms in Central Texas weaken (Fig. 2c) and leave behind a large swath of stratiform precipitation and some weak storms. CTRL also predicts the storms in Central Texas to weaken around 0300 UTC, which causes forecast probabilities near San Antonio to decrease (Fig. 4c). The BSS at 0100 UTC is lower than at 0300 UTC (Fig. 5) because more spurious storms are predicted at the earlier time. Spurious storms also cause the 0100 UTC reliability curve (Fig. 5) to become high-biased for moderate and high probability events [P(Z > 40 dBZ) > 0.4]. By 0300 UTC, when the spurious storms begin to weaken, the reliability curve (Fig. 5) becomes less biased and more closely follows the one-to-one line.
CTRL predicts storms to move too quickly, and by 0600 UTC the predicted MCS (Fig. 4d) is east of the observed storms. Storm motion biases such as this are common in CAM forecasts and are often a consequence of model errors (e.g., Yussouf et al. 2016). Displacement errors cause the BSS at 0600 UTC to become negative and the reliability curve to become high-biased (Fig. 5), indicating that CTRL has no objective skill relative to climatology at this time. Note, however, that the P(Z > 40 dBZ) still exceeds 0.7 on the southern edge of the observed MCS (Fig. 4d). Increasing the neighborhood radius used to calculate the NMEP could improve these scores but would make the forecasts less precise, so a larger radius is not used. Despite displacement errors, CTRL demonstrates some qualitative skill and predicts the MCS to become less organized between 0500 and 0600 UTC, approximately the same time as in the observations (Fig. 2d).
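The role of the neighborhood radius can be illustrated with the sketch below, which follows one common formulation of the NMEP: a member counts as a hit at a grid point if its field exceeds the threshold anywhere within a circular neighborhood, and the probability is the fraction of members that hit. The circular neighborhood, the radius in grid points, and the function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np


def disk_offsets(radius_pts):
    """Integer (di, dj) offsets lying within a circle of radius_pts grid points."""
    r = int(radius_pts)
    return [(di, dj) for di in range(-r, r + 1) for dj in range(-r, r + 1)
            if di * di + dj * dj <= radius_pts ** 2]


def nmep(member_fields, threshold, radius_pts):
    """Neighborhood maximum ensemble probability (hypothetical helper).

    member_fields: array (n_members, ny, nx), e.g. simulated Z in dBZ.
    Returns an (ny, nx) array of probabilities in [0, 1].
    """
    exceed = (np.asarray(member_fields, dtype=float) > threshold).astype(np.uint8)
    _, ny, nx = exceed.shape
    r = int(radius_pts)
    # Zero padding means points outside the domain never count as exceedances.
    padded = np.pad(exceed, ((0, 0), (r, r), (r, r)))
    hit = np.zeros_like(exceed)
    for di, dj in disk_offsets(radius_pts):
        # hit[m, i, j] becomes 1 if exceed[m, i + di, j + dj] is 1 for any offset.
        np.maximum(hit, padded[:, r + di:r + di + ny, r + dj:r + dj + nx], out=hit)
    return hit.mean(axis=0)  # fraction of members with a neighborhood hit
```

Enlarging `radius_pts` can only add hits, so a larger radius raises probabilities near displaced storms (such as the 0600 UTC MCS) at the cost of blurring where the storms are.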
4. Radar DA parameter results
Many studies show that radar observations provide important storm-scale information and, when assimilated by an optimally configured EnKF, improve analyzed storm structure (e.g., Snyder and Zhang 2003; Dowell et al. 2004; Tong and Xue 2005) and subsequent forecasts (e.g., Snook et al. 2012, 2015). Modifying parameters directly related to radar DA, including data thinning, covariance localization radius and inflation, observation errors, and assimilation frequency, initially impacts storm structure and the near-storm environment. With the exception of the DA frequency experiments (Fig. 6g), which impact Z forecast skill several hours after DA, the differences between experiments are most prominent during the first forecast hour because large- and small-scale errors grow quickly (e.g., Melhauser and Zhang 2012) and erode the initial benefits of radar DA. By 0015 UTC, reliability curves (and BSSs) are already similar among all experiments (Fig. 6). Despite considerable overlap between the reliability curves, experiments that assimilate more radar observations (i.e., smaller data thinning intervals; Figs. 6a,b) or employ a larger horizontal covariance localization radius (Fig. 6c) are more skilled. Impacts of radar DA are localized and not readily apparent in objective verification metrics computed over the full model domain and multiple storm systems. The subjective forecast evaluations in the remainder of this section show how these DA parameters impact Z forecasts for individual storm systems (i.e., isolated convection and the MCS).

Reliability diagrams for the forecast P(Z > 40 dBZ) valid at 0015 UTC for the labeled experiments including (a),(b) radar data thinning; (c),(d) covariance localization radius; (e) covariance inflation; (f) observation errors; and (g) DA frequency. Experiments correspond to different colors, and the BSS for each experiment is included in the legend.
a. Data thinning
Thinning radar observations can remove important finescale storm details. For small storms, such as those near San Antonio, thinning removes many, if not all, of the available in-storm observations. ThinH3V1 (Fig. 7b) predicts larger P(Z > 40 dBZ) than ThinH9V1 (Fig. 7c), particularly for small storms near Mexico, because it assimilates more in-storm observations. Assimilating clear-air observations also improves forecast skill by suppressing spurious convection. ThinH6V2, which assimilates few clear-air observations (every 4 km vertically), predicts more spurious convection, causing the P(Z > 40 dBZ) to exceed 0.7 outside of an observed Z core near San Antonio (Fig. 7e). These spurious storms contribute to a modest overprediction bias in the ThinH6V2 reliability curve for moderate to high probability thresholds [P(Z > 40 dBZ) > 0.6] (Fig. 6b).
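As a rough illustration of how the thinning intervals change what is assimilated, the hypothetical box-thinning sketch below keeps one observation per horizontal-by-vertical box, using a finer vertical interval inside storms than in clear air. The actual GSI thinning algorithm and the Z value separating in-storm from clear-air observations are not described in this section, so `storm_dbz`, the first-come selection rule, and the function name are assumptions.

```python
import numpy as np


def thin_obs(x_km, y_km, z_km, refl_dbz, dx_km, dz_storm_km, dz_clear_km, storm_dbz=20.0):
    """Return indices of observations kept by a hypothetical box-thinning scheme.

    In-storm observations (refl >= storm_dbz, an assumed threshold) use the vertical
    interval dz_storm_km; clear-air observations use the coarser dz_clear_km.
    The first observation encountered in each box is kept.
    """
    x, y, z, refl = (np.asarray(a, dtype=float) for a in (x_km, y_km, z_km, refl_dbz))
    in_storm = refl >= storm_dbz
    dz = np.where(in_storm, dz_storm_km, dz_clear_km)
    # Box index in each direction; the storm flag keeps the two categories separate.
    keys = zip(np.floor(x / dx_km).astype(int), np.floor(y / dx_km).astype(int),
               np.floor(z / dz).astype(int), in_storm)
    kept, seen = [], set()
    for idx, key in enumerate(keys):
        if key not in seen:
            seen.add(key)
            kept.append(idx)
    return np.array(kept, dtype=int)
```

With CTRL-like settings (6-km horizontal, 1-km in-storm and 2-km clear-air vertical intervals), a storm only a few kilometers across falls into a handful of horizontal boxes, which is consistent with the sensitivity of small storms to thinning noted above.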

The P(Z > 40 dBZ) predicted by (a) CTRL and the (b),(c) horizontal and (d),(e) vertical radar data thinning experiments valid at 0015 UTC. Thick black contours represent locations where observed Z > 40 dBZ. Dashed black and red squares mark the verification subdomains discussed in Table 3. Background maps are as in Fig. 2.
To determine how Z forecast skill varies between storm regions, BSSs are calculated over forecast subdomains that encompass either the isolated convection in Texas (black square, Fig. 7) or the MCS (red square, Fig. 7) (Table 3). The 15-min forecasts predict the MCS with more skill (BSS ≥ 0.51) than the isolated convection (BSS ≥ 0.35). The higher MCS skill is partly because the MCS is mature throughout the radar DA period (Fig. 8), while some isolated storms, particularly those northeast of San Antonio, rapidly grow and merge during the final 15 min of the DA window (Figs. 8d,e). Studies often assimilate observations of mature storms over many DA cycles to reduce initial condition errors and improve forecast skill (e.g., Snyder and Zhang 2003; Dawson et al. 2012; Stratman et al. 2020). Thus, isolated convection BSSs are lower in part because fewer DA cycles capture the evolution of these storms. The MCS forecasts are also less sensitive to data thinning (Fig. 7): MCS BSSs decrease less (Table 3) when experiments assimilate fewer radar observations (i.e., ThinH9V1, ThinH6V2).
Data thinning experiment BSSs at 0015 UTC verifying the P(Z > 40 dBZ). Skill scores are computed over subdomains (Fig. 7) that encompass either the isolated convection (black square) or the MCS (red square).



MRMS observed column maximum Z between 2300 and 0000 UTC, when radar observations are assimilated. Background maps are as in Fig. 2.
b. Covariance localization
Increasing the horizontal covariance localization radius for radar observations makes the ensembles more confident in high Z values near observed storms in short-term forecasts. CovH6V0.7 (Fig. 9b) predicts smaller P(Z > 40 dBZ) than either CovH12V0.7 (Fig. 9a) or CovH18V0.7 (Fig. 9c) for the isolated storms near the Mexico–Texas border. CovH18V0.7 also predicts fewer spurious storms between San Antonio and Houston (Fig. 9c), which decreases the P(Z > 40 dBZ) outside of observed storm cores. Because CovH18V0.7 and CovH12V0.7 predict fewer spurious storms in Texas and have improved resolution, these ensembles score a larger BSS than CovH6V0.7 at 0015 UTC (Fig. 6c). Increasing the vertical localization radius modestly increases the BSS (Fig. 6d), but the probabilistic forecasts appear largely insensitive to this EnKF parameter (Figs. 9d,e). Although Z forecast skill is relatively insensitive to the vertical covariance localization radius in this study, Sobash and Stensrud (2013) suggest that ensembles employing a smaller vertical localization radius produce analyses with smaller root-mean-square errors. The differing results likely stem from several factors, such as changes in the vertical distribution of radar observations, differences in how the vertical covariance localization radii are applied, and the fact that the referenced study is an OSSE that assumes perfect-model conditions.
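Covariance localization multiplies each observation's influence on a grid point by a distance-dependent taper. The sketch below assumes the Gaspari–Cohn fifth-order function with the taper reaching zero at the stated radius, and applies the vertical taper in ln(p) so that the vertical radius is expressed in scale heights. Whether the GSI EnKF maps its radii onto the taper in exactly this way is an assumption, and the function names are hypothetical.

```python
import numpy as np


def gaspari_cohn(dist, cutoff):
    """Gaspari-Cohn fifth-order taper that is exactly 0 at and beyond `cutoff`.

    Assumes the half-width c = cutoff / 2, so the compact support 2c equals cutoff.
    Always returns an array (scalars become shape (1,)).
    """
    z = np.atleast_1d(np.abs(np.asarray(dist, dtype=float))) / (0.5 * cutoff)
    taper = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi = z[inner]
    taper[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                    - (5.0 / 3.0) * zi**2 + 1.0)
    zo = z[outer]
    taper[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                    + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return taper


def localization_weight(horiz_km, p_obs_hpa, p_grid_hpa, horiz_radius_km, vert_radius_lnp):
    """Product of a horizontal taper (km) and a vertical taper in ln(p) (scale heights)."""
    dlnp = np.abs(np.log(np.asarray(p_obs_hpa, dtype=float))
                  - np.log(np.asarray(p_grid_hpa, dtype=float)))
    return gaspari_cohn(horiz_km, horiz_radius_km) * gaspari_cohn(dlnp, vert_radius_lnp)
```

Under these assumptions, an observation 5 km from a grid point at the same pressure receives weights of roughly 0.003, 0.34, and 0.63 for 6-, 12-, and 18-km horizontal radii, so the smallest radius barely updates neighboring points.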

The P(Z > 40 dBZ) predicted by (a) CTRL and the (b),(c) horizontal and (d),(e) vertical covariance localization experiments valid at 0015 UTC. Thick black contours represent locations where observed Z > 40 dBZ. Dashed squares in (a)–(c) mark the boundaries of the Fig. 11 domain. Background maps are as in Fig. 2.
CovH6V0.7 predicts more spurious convection to develop because its ensemble is initialized at 0000 UTC (Fig. 10a) with more spurious radar echoes than CovH12V0.7 (Fig. 10b) or CovH18V0.7 (Fig. 10c). The smaller localization radius employed by CovH6V0.7 reduces the number of surrounding grid points updated by each observation, so the EnKF removes less precipitation from clear-air regions. Regions of spurious precipitation intensify to form new convection, degrading CovH6V0.7 skill during the first forecast hour relative to the other covariance localization experiments (Fig. 6c).

The 0000 UTC ensemble mean analysis simulated Z at 3 km above ground level (AGL). Thick black contours mark where the observed Z at 3 km AGL exceeds 5 dBZ, and the background maps are as in Fig. 2.
Modifying the horizontal covariance localization radius also alters the near-surface environment, which can have a detrimental impact on the predicted thunderstorm evolution. In the CovH6V0.7 ensemble mean analysis (Fig. 11a), storms produce warmer cold pools than in CovH12V0.7 (Fig. 11b) or CovH18V0.7 (Fig. 11c). Because cold pools alter the evolution of convection (e.g., Droegemeier and Wilhelmson 1985; Rotunno et al. 1988), these changes in the near-surface environment affect the forecasts. The 30-min forecasts initialized from the ensemble mean analyses (Figs. 11d–f) show that CovH12V0.7 (Fig. 11e) and CovH18V0.7 (Fig. 11f) predict storm updrafts to form along the cold pool boundaries. CovH6V0.7 predicts storm updrafts that are smaller in area and more fragmented (Fig. 11d), in part because of the initially weak cold pool (Fig. 11a). Although the impacts on Z forecast skill are relatively small for this case study, altering the near-storm environment, such as cold pool intensity, could degrade forecast skill for other cases.

Air temperature T at the lowest model level above the surface for the horizontal covariance localization radius experiments. Thick black contours mark where the predicted updrafts exceed 1 m s−1, and the fuchsia “X” marks the locations of San Antonio and Houston.
Improved initial condition estimates cause CovH12V0.7 and CovH18V0.7 to predict the P(Z > 40 dBZ) with more skill than CovH6V0.7 during the first forecast hour (Fig. 6c). These horizontal covariance localization radii are similar to the optimal radius (18 km) determined by Sobash and Stensrud (2013) but larger than that used by Tong and Xue (2005) (6 km). The optimal localization radius employed by Tong and Xue (2005) is smaller in part because that study assimilates radar observations at every model grid point. Localization radii are sensitive to observation density (e.g., Gao and Xue 2008) and thus can change with the number of assimilated observations.
c. Covariance inflation
Although BSSs do not change substantially among the covariance inflation experiments (Fig. 6e), increasing the RTPS inflation factor increases the high bias of the reliability curves. As the RTPS inflation factor increases, the areal coverage of low and moderate forecast probabilities [P(Z > 40 dBZ) ≤ 0.5] increases modestly in the vicinity of observed convection (Fig. 12). Experiments run with the three smallest inflation factors (i.e., Inf80, Inf90, Inf99) do not substantially alter Z forecast performance (Fig. 6e); however, the Inf110 reliability curve has the largest overprediction bias and the lowest BSS. This is because Inf110 predicts the largest increase in P(Z > 40 dBZ) ahead of observed convection near San Antonio (Fig. 12d). Results suggest that no forecast skill is gained by inflating the posterior ensemble spread beyond that of the prior ensemble for this case study. Although Z forecast skill is relatively insensitive to the RTPS inflation factor, covariance inflation remains important to prevent the collapse of ensemble spread and potential filter divergence when assimilating a dense network of observations.
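The inflation experiments can be related to the RTPS formula with the sketch below, assuming the relaxation-to-prior-spread form of Whitaker and Hamill (2012) and that each experiment name gives the factor alpha as a percentage (Inf80 corresponds to alpha = 0.8). The array layout (members by state elements) and the function name are illustrative assumptions.

```python
import numpy as np


def rtps(prior, posterior, alpha):
    """Relaxation to prior spread applied to a posterior ensemble (hypothetical helper).

    prior, posterior: arrays of shape (n_members, n_state). Posterior perturbations
    are rescaled so that, at every state element, the new spread equals
    alpha * sigma_prior + (1 - alpha) * sigma_posterior. The mean is unchanged.
    """
    prior = np.asarray(prior, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    mean_a = posterior.mean(axis=0)
    pert_a = posterior - mean_a
    sig_b = prior.std(axis=0, ddof=1)
    sig_a = posterior.std(axis=0, ddof=1)
    # Where the posterior has no spread, leave the members unchanged.
    safe = np.where(sig_a > 0, sig_a, 1.0)
    factor = np.where(sig_a > 0, 1.0 + alpha * (sig_b - sig_a) / safe, 1.0)
    return mean_a + pert_a * factor
```

Because the new spread is alpha * sigma_prior + (1 - alpha) * sigma_posterior, any alpha greater than 1 makes the posterior spread exceed the prior spread wherever the analysis reduced spread, which is the Inf110 case.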

The P(Z > 40 dBZ) predicted by the covariance inflation experiments valid at 0015 UTC. Thick black contours represent locations where observed Z > 40 dBZ. Background maps are as in Fig. 2.
d. Observation errors
Apart from minor differences, the 3ms6dBZ and 6ms9dBZ reliability curves are similar at 0015 UTC (Fig. 6f), and consequently both ensembles have similar BSSs. Because the probabilistic forecasts also differ only slightly (not shown), these experiments are not discussed further.
e. DA frequency
Reflectivity forecast skill is moderately sensitive to how frequently the GSI EnKF system assimilates radar observations. The 5Min (Fig. 13a) and 10Min (Fig. 13b) forecasts predict fewer spurious storms than 15Min (Fig. 13c), which decreases forecast probabilities outside of the observed storm cores. Assimilating radar observations more frequently suppresses the coverage of spurious convection (Stratman et al. 2020); however, the benefit of more frequent DA cycling is short-lived. By 0015 UTC, the 5Min forecasts predict the P(Z > 40 dBZ) to increase outside of the observed storm cores (Fig. 13d) because spurious convection becomes more widespread.

The P(Z > 40 dBZ) predicted by (a),(d),(g) 5Min; (b),(e),(h) 10Min; and (c),(f),(i) 15Min valid at the labeled times. Thick black contours represent locations where observed Z > 40 dBZ. Background maps are as in Fig. 2.
Experiments that assimilate observations less frequently (i.e., 10Min, 15Min) exhibit the most skill throughout the remainder of the forecast period. Z forecasts differ most for the Texas convection but are relatively similar for the MCS. All DA frequency experiments predict high forecast probabilities [P(Z > 40 dBZ) > 0.7] in eastern Texas at 0300 UTC (Figs. 13g–i); however, the largest 5Min probabilities (Fig. 13g) are displaced from the observed storm cores. The 5Min ensemble also predicts low and moderate forecast probabilities [P(Z > 40 dBZ) < 0.5] to be more widespread throughout the domain because it predicts more storms. Errors in storm placement and the enhanced coverage of spurious storms modestly reduce the 5Min BSS throughout much of the forecast period.
Experiments that assimilate observations more frequently (i.e., 5Min, 10Min) are the most skilled at 0005 UTC (BSS ≥ 0.57) because they predict fewer spurious storms, which also gives their reliability curves a smaller high bias (Fig. 14a). Reliability curves for all DA frequency experiments become more similar by 0015 UTC (Fig. 6g) as the predicted spurious convection grows in coverage. 10Min is the most skilled at 0015 UTC (BSS = 0.56) because its reliability curve is closest to the one-to-one line for most moderate probability thresholds [0.3 < P(Z > 40 dBZ) < 0.7]. After 0100 UTC, the 5Min BSS is modestly lower than those of the other DA frequency experiments. For example, the 5Min 0300 UTC reliability curve (Fig. 14b) exhibits an increased high bias and a decreased BSS because the predicted storms are displaced from regions of high observed Z and because the ensemble predicts low and moderate forecast probabilities over a wider area (Fig. 13g). While 10Min and 15Min predict the P(Z > 40 dBZ) with more skill at 0300 UTC (BSS ≥ 0.28), both reliability curves (Fig. 14b) have a similar, albeit modestly smaller, high bias.

Reliability diagrams for the probabilistic forecasts shown in (a) Figs. 13a–c and (b) Figs. 13g–i. DA frequency experiments correspond to different colors, and the BSS for each experiment is included in the legend.

The ensemble mean absolute pressure tendency during the first six forecast hours for the three DA frequency experiments.
Results of the DA frequency experiments concur with previous studies (e.g., Wang et al. 2013; Johnson et al. 2017; Pan and Wang 2019). Johnson and Wang (2017) hypothesize that imbalances introduced during DA do not have enough time to adjust to the model before the next assimilation cycle. It is noted that many other studies (e.g., Aksoy et al. 2009; Jung et al. 2012; Dawson et al. 2012) have successfully assimilated radar observations much more frequently (≤5-min intervals). Stratman et al. (2020), who assimilate phased array radar observations every minute, suggest that frequent DA cycling can quickly spin up thunderstorms and suppress spurious convection. Because most of the adjustment occurs within the first few forecast minutes, and all experiments have similar levels of surface pressure noise after 5 min (Fig. 15), the negative impact of assimilating radar observations every 5 min is likely small compared with the potential benefit of quickly spinning up observed storms within the model and suppressing spurious convection. The skill of 5Min could likely be improved further by more thoroughly tuning the GSI EnKF system. For real-time forecasting systems where computational cost is a significant issue, the finding that assimilating radar observations at 10–15-min intervals yields good results is encouraging.
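The noise diagnostic in Fig. 15 can be approximated by averaging the magnitude of the pressure tendency over the domain and the ensemble, as in the sketch below. The pressure level, output interval, and units used for the figure are not stated in this section, so surface pressure in Pa, a fixed output interval in seconds, and the function name are assumptions.

```python
import numpy as np


def mean_abs_pressure_tendency(psfc, dt_s):
    """Domain- and ensemble-mean |dp/dt| (Pa per s) between consecutive output times.

    psfc: array (n_members, n_times, ny, nx) of surface pressure in Pa.
    dt_s: output interval in seconds. Returns one value per time interval.
    """
    p = np.asarray(psfc, dtype=float)
    tendency = np.diff(p, axis=1) / dt_s  # shape (n_members, n_times - 1, ny, nx)
    # Averaging |dp/dt| (not dp/dt) keeps oscillating gravity-wave noise from cancelling.
    return np.abs(tendency).mean(axis=(0, 2, 3))
```

Large values in the first few intervals followed by a rapid drop would indicate that DA-induced imbalances are adjusting quickly, which is the behavior described for all three DA frequency experiments.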
5. Summary and discussion
This study evaluates short-term (0–6 h) convection-allowing model (CAM) forecasts initialized using the GSI EnKF system enhanced with radar data assimilation (DA) capabilities. Forecasts are run for a mesoscale convective system (MCS) event that occurred in the southern United States on 28–29 May 2017 and produced tornadoes, strong wind gusts, and hail. The control configuration of the GSI EnKF system resembles that of the Center for the Analysis and Prediction of Storms (CAPS) storm-scale ensemble forecast run during the Hazardous Weather Testbed Spring Forecast Experiment as part of the CLUE. In addition to verifying forecasts run using this configuration, sensitivity experiments are run to evaluate the impact of GSI EnKF configuration choices for assimilating radar observations, including data thinning, covariance localization and inflation, observation error specification, and DA frequency. To assess the impact of DA on forecasts, 10-member multiphysics ensemble forecasts and deterministic forecasts are run from the final EnKF ensemble and mean analyses, respectively.
The CTRL DA configuration produces among the most skilled forecasts in this study while remaining computationally efficient for real-time use. The initial CTRL ensemble is centered on the 1800 UTC NAM analysis with perturbations derived from 3-h SREF forecasts. The DA system assimilates thinned radar observations (6 km horizontally; 1 km vertically in in-storm regions and 2 km vertically in clear-air regions) every 15 min to remain computationally efficient for experiments run over the CONUS domain. Although forecast skill can be improved further by assimilating more observations (i.e., less data thinning, increased DA frequency), such experiments require additional computing resources and become impractical for real-time use over a domain as large as the CONUS. The covariance localization radius for Z and Vr is set to 12 km in the horizontal and 0.7 scale height in the vertical to update unobserved regions. CTRL skillfully predicts the evolution of the MCS during the first three forecast hours; however, the predicted storm later becomes displaced from observations because it moves too quickly. Despite displacement errors that limit objective forecast skill, CTRL predicts the MCS to weaken during the final forecast hour, approximately the same time as observed. CTRL predicts isolated convection with less skill: the ensemble predicts small storms located near the domain boundary with less confidence and predicts nearby spurious convection.
Modifying radar DA parameters, including data thinning and covariance localization radii and inflation, impacts forecast skill most during the first forecast hour; large- and small-scale errors grow rapidly and erode the benefits of radar DA, which mainly improves storm-scale structures. Results show that increasing the horizontal covariance localization radius and assimilating more radar observations (i.e., less data thinning) increases forecast confidence in high reflectivity values near observed storms and decreases the coverage of spurious convection. Small, isolated convection is most sensitive to changes in these parameters. In contrast, the predicted MCS is less sensitive to the covariance localization radius or data thinning because the storm system is mature throughout the radar DA window and because the MCS is larger in scale, so many more in-storm observations are available for assimilation. Changing the assumed radial velocity and reflectivity observation errors (3–6 m s−1 and 6–9 dBZ, respectively) does not substantially impact forecast skill in this study. Covariance inflation is an important aspect of EnKF DA and prevents the collapse of ensemble spread when assimilating a dense network of observations (e.g., radar observations). Forecast skill is relatively insensitive to the range of relaxation-to-prior-spread (RTPS) inflation factors (80%–110%) employed in this study; however, when the posterior ensemble spread is inflated beyond that of the prior ensemble (i.e., 110% RTPS), forecast skill decreases because the ensemble overpredicts storm coverage.
DA frequency experiments demonstrate that assimilating radar observations more frequently (i.e., every 5 min) initially improves forecast skill by suppressing spurious storms; however, by 15 min into the forecast, spurious convection initiates and forecast skill decreases. The enhanced coverage of spurious convection and storm position errors give the experiment that assimilates radar observations every 5 min modestly less skill than the experiments that assimilate observations less frequently (i.e., every 10 or 15 min) throughout much of the forecast period. These results concur with Johnson and Wang (2017), who hypothesize that assimilating observations too frequently degrades forecast skill because DA-induced imbalances cannot adjust to the model before the next cycle. Although assimilating observations every 10 min improves objective forecast skill in this study, Stratman et al. (2020) note that even more frequent DA cycling can reduce the spinup time for storms and suppress spurious convection.
Although this study evaluates several key aspects of the experimental design, many other factors that impact forecast performance should be investigated. For example, future studies should compare forecast differences between mixed-physics and single-physics ensembles. CTRL is a mixed-physics ensemble, which can enhance forecast diversity (Snook et al. 2012; Johnson and Wang 2017) but can also introduce more systematic biases into the forecast system than a single-physics ensemble (e.g., Romine et al. 2013). Future studies should also evaluate forecast sensitivity to DA parameters for conventional observations (e.g., ground-based observations and soundings) because these observations modify the environment and can substantially impact storm evolution (e.g., Sobash and Stensrud 2015; Snook et al. 2015). The conclusions of this work are limited to a single case study, and the results are sensitive to many factors (e.g., storm morphology, environment, ensemble and domain configuration, observation availability, and DA system). Additional experiments are required to determine whether the results are robust across a variety of cases. Even with these limitations, the results provide an initial EnKF configuration that can be deployed in future studies, and the insights gained from this work contribute to the development of future ensemble DA systems.
Acknowledgments
This work was supported by NOAA Joint Technology Transfer Initiative (JTTI) Grant NA18OAR4590385. Computing was performed on the XSEDE Stampede2 supercomputer at the University of Texas Advanced Computing Center (TACC). Ensemble background and initial conditions were generated under the support of NOAA CSTAR Grant NA10NWS4680001. We thank the three anonymous reviewers, whose constructive comments helped to improve the manuscript.
REFERENCES
Aksoy, A., D. C. Dowell, and C. Snyder, 2009: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part I: Storm-scale analyses. Mon. Wea. Rev., 137, 1805–1824, https://doi.org/10.1175/2008MWR2691.1.
Aksoy, A., S. Lorsolo, T. Vukicevic, K. J. Sellwood, S. D. Aberson, and F. Zhang, 2012: The HWRF Hurricane Ensemble Data Assimilation System (HEDAS) for high-resolution data: The impact of airborne Doppler radar observations in an OSSE. Mon. Wea. Rev., 140, 1843–1862, https://doi.org/10.1175/MWR-D-11-00212.1.
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903, https://doi.org/10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.
Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758, https://doi.org/10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.
Anderson, J. L., and N. Collins, 2007: Scalable implementations of ensemble filter algorithms for data assimilation. J. Atmos. Oceanic Technol., 24, 1452–1463, https://doi.org/10.1175/JTECH2049.1.
Brewster, K. A., M. Hu, M. Xue, and J. Gao, 2005: Efficient assimilation of radar data at high resolution for short-range numerical weather prediction. Int. Symp. on Nowcasting Very Short Range Forecasting, Toulouse, France, WMO/World Weather Research Program, 3.06.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1–3, https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
Bringi, V. N., and V. Chandrasekar, 2001: Polarimetric Doppler Weather Radar. Cambridge University Press, 636 pp.
Brooks, H. E., 2004: Tornado-warning performance in the past and future. Bull. Amer. Meteor. Soc., 85, 837–843, https://doi.org/10.1175/BAMS-85-6-837.
CAPS, 2017: 2017 CAPS Spring Forecast Experiment Program Plan. Center for Analysis and Prediction of Storms, 28 pp., http://forecast.caps.ou.edu/SpringProgram2017_Plan-CAPS.pdf.
Caya, A., J. Sun, and C. Snyder, 2005: A comparison between the 4DVAR and the ensemble Kalman filter techniques for radar data assimilation. Mon. Wea. Rev., 133, 3081–3094, https://doi.org/10.1175/MWR3021.1.
Chang, W., K.-S. Chung, L. Fillion, and S.-J. Baek, 2014: Radar data assimilation in the Canadian high-resolution ensemble Kalman filter system: Performance and verification with real summer cases. Mon. Wea. Rev., 142, 2118–2138, https://doi.org/10.1175/MWR-D-13-00291.1.
Chen, F., and J. Dudhia, 2001: Coupling an advanced land surface–hydrology model with the Penn State–NCAR MM5 modeling system. Part I: Model implementation and sensitivity. Mon. Wea. Rev., 129, 569–585, https://doi.org/10.1175/1520-0493(2001)129<0569:CAALSH>2.0.CO;2.
Chipilski, H. G., X. Wang, and D. B. Parsons, 2020: Impact of assimilating PECAN profilers on the prediction of bore-driven nocturnal convection: A multiscale forecast evaluation for the 6 July 2015 case study. Mon. Wea. Rev., 148, 1147–1175, https://doi.org/10.1175/MWR-D-19-0171.1.
Clark, A. J., and Coauthors, 2018: The Community Leveraged Unified Ensemble (CLUE) in the 2016 NOAA/Hazardous Weather Testbed Spring Forecasting Experiment. Bull. Amer. Meteor. Soc., 99, 1433–1448, https://doi.org/10.1175/BAMS-D-16-0309.1.
Clark, A. J., and Coauthors, 2019: Spring Forecasting Experiment 2019: Program overview and operations plan. NOAA, 39 pp., https://hwt.nssl.noaa.gov/sfe/2019/docs/HWT_SFE2019_operations_plan.pdf.
Curtis, C. D., and S. M. Torres, 2011: Adaptive range oversampling to achieve faster scanning on the National Weather Radar Testbed phased-array radar. J. Atmos. Oceanic Technol., 28, 1581–1597, https://doi.org/10.1175/JTECH-D-10-05042.1.
Dawson, D. T., II, L. J. Wicker, E. R. Mansell, and R. L. Tanamachi, 2012: Impact of the environmental low-level wind profile on ensemble forecasts of the 4 May 2007 Greensburg, Kansas, tornadic storm and associated mesocyclones. Mon. Wea. Rev., 140, 696–716, https://doi.org/10.1175/MWR-D-11-00008.1.
Doviak, R., and D. S. Zrnić, 1993: Doppler Radar and Weather Observations. 2nd ed. Academic Press, 562 pp.
Dowell, D. C., and L. J. Wicker, 2009: Additive noise for storm-scale ensemble data assimilation. J. Atmos. Oceanic Technol., 26, 911–927, https://doi.org/10.1175/2008JTECHA1156.1.
Dowell, D. C., L. J. Wicker, and C. Snyder, 2011: Ensemble Kalman filter assimilation of radar observations of the 8 May 2003 Oklahoma City supercell: Influences of reflectivity observations on storm-scale analyses. Mon. Wea. Rev., 139, 272–294, https://doi.org/10.1175/2010MWR3438.1.
Dowell, D. C., F. Zhang, L. J. Wicker, C. Snyder, and N. A. Crook, 2004: Wind and temperature retrievals in the 17 May 1981 Arcadia, Oklahoma, supercell: Ensemble Kalman filter experiments. Mon. Wea. Rev., 132, 1982–2005, https://doi.org/10.1175/1520-0493(2004)132<1982:WATRIT>2.0.CO;2.
Droegemeier, K. K., and R. B. Wilhelmson, 1985: Three-dimensional numerical modeling of convection produced by interacting thunderstorm outflows. Part I: Control simulation and low-level moisture variations. J. Atmos. Sci., 42, 2381–2403, https://doi.org/10.1175/1520-0469(1985)042<2381:TDNMOC>2.0.CO;2.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143–10 162, https://doi.org/10.1029/94JC00572.
Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–367, https://doi.org/10.1007/s10236-003-0036-9.
Gao, J., and M. Xue, 2008: An efficient dual-resolution approach for ensemble data assimilation and tests with simulated Doppler radar data. Mon. Wea. Rev., 136, 945–963, https://doi.org/10.1175/2007MWR2120.1.
Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757, https://doi.org/10.1002/qj.49712555417.
Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790, https://doi.org/10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.
Heinselman, P. L., and S. M. Torres, 2011: High-temporal-resolution capabilities of the National Weather Radar Testbed Phased-Array Radar. J. Appl. Meteor. Climatol., 50, 579–593, https://doi.org/10.1175/2010JAMC2588.1.
Hong, S. Y., Y. Noh, and J. Dudhia, 2006: A new vertical diffusion package with an explicit treatment of entrainment processes. Mon. Wea. Rev., 134, 2318–2341, https://doi.org/10.1175/MWR3199.1.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811, https://doi.org/10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.
Hu, M., and M. Xue, 2007: Impact of configurations of rapid intermittent assimilation of WSR-88D radar data for the 8 May 2003 Oklahoma City tornadic thunderstorm case. Mon. Wea. Rev., 135, 507–525, https://doi.org/10.1175/MWR3313.1.
Iacono, M. J., J. S. Delamere, E. J. Mlawer, M. W. Shephard, S. A. Clough, and W. D. Collins, 2008: Radiative forcing by long-lived greenhouse gases: Calculations with the AER radiative transfer models. J. Geophys. Res., 113, D13103, https://doi.org/10.1029/2008JD009944.
Janjić, Z. I., 1990: The step-mountain coordinate: Physical package. Mon. Wea. Rev., 118, 1429–1443, https://doi.org/10.1175/1520-0493(1990)118<1429:TSMCPP>2.0.CO;2.
Janjić, Z. I., 1996: The surface layer in the NCEP Eta Model. 11th Conf. on Numerical Weather Prediction, Norfolk, VA, Amer. Meteor. Soc., 354–355.
Janjić, Z. I., 2001: Nonsingular implementation of the Mellor–Yamada level 2.5 scheme in the NCEP Meso Model. NCEP Office Note 437, 61 pp.
Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.
Johnson, A., and X. Wang, 2017: Design and implementation of a GSI-based convection-allowing ensemble data assimilation and forecast system for the PECAN field experiment. Part I: Optimal configurations for nocturnal convection prediction using retrospective cases. Wea. Forecasting, 32, 289–315, https://doi.org/10.1175/WAF-D-16-0102.1.
Johnson, A., X. Wang, J. R. Carley, L. J. Wicker, and C. Karstens, 2015: A comparison of multiscale GSI-based EnKF and 3DVar data assimilation using radar and conventional observations for midlatitude convective-scale precipitation forecasts. Mon. Wea. Rev., 143, 3087–3108, https://doi.org/10.1175/MWR-D-14-00345.1.
Johnson, A., X. Wang, and S. Degelia, 2017: Design and implementation of a GSI-based convection-allowing ensemble-based data assimilation and forecast system for the PECAN field experiment. Part II: Overview and evaluation of a real-time system. Wea. Forecasting, 32, 1227–1251, https://doi.org/10.1175/WAF-D-16-0201.1.
Jones, T. A., K. H. Knopfmeier, D. M. Wheatley, G. J. Creager, P. Minnis, and R. Palikonda, 2016: Storm-scale data assimilation and ensemble forecasting with the NSSL experimental Warn-on-Forecast system. Part II: Combined radar and satellite data experiments. Wea. Forecasting, 31, 297–327, https://doi.org/10.1175/WAF-D-15-0107.1.
Jones, T. A., P. S. Skinner, K. H. Knopfmeier, E. R. Mansell, P. Minnis, R. Palikonda, and W. J. Smith, 2018a: Comparison of cloud microphysics schemes in a Warn-on-Forecast system using synthetic satellite objects. Wea. Forecasting, 33, 1681–1708, https://doi.org/10.1175/WAF-D-18-0112.1.
Jones, T. A., X. Wang, P. Skinner, A. Johnson, and Y. Wang, 2018b: Assimilation of GOES-13 imager clear-sky water vapor (6.5 μm) radiances into a Warn-on-Forecast system. Mon. Wea. Rev., 146, 1077–1107, https://doi.org/10.1175/MWR-D-17-0280.1.
Jung, Y., M. Xue, G. Zhang, and J. M. Straka, 2008: Assimilation of simulated polarimetric radar data for a convective storm using the ensemble Kalman filter. Part I: Observation operators for reflectivity and polarimetric variables. Mon. Wea. Rev., 136, 2228–2245, https://doi.org/10.1175/2007MWR2083.1.
Jung, Y., M. Xue, and M. Tong, 2012: Ensemble Kalman filter analyses of the 29–30 May 2004 Oklahoma tornadic thunderstorm using one- and two-moment bulk microphysics schemes, with verification against polarimetric radar data. Mon. Wea. Rev., 140, 1457–1475, https://doi.org/10.1175/MWR-D-11-00032.1.
Jung, Y., and Coauthors, 2018a: Development of GSI-based EnKF and hybrid EnVar data assimilation capabilities for continental-scale 3-km convection-permitting ensemble forecasting and testing via NOAA Hazardous Weather Testbed Spring Forecasting Experiments. 29th Conf. on Weather Analysis and Forecasting, Denver, CO, 7A.3, https://ams.confex.com/ams/29WAF25NWP/webprogram/Paper345776.html.
Jung, Y., M. Xue, G.