This study tests the hypothesis that assimilating mid- to upper-tropospheric, meso-α- to synoptic-scale observations collected in upstream, preconvective environments is insufficient to improve short-range ensemble convection initiation (CI) forecast skill over the set of cases considered by the 2013 Mesoscale Predictability Experiment (MPEX) because of a limited influence upon the lower-tropospheric phenomena that modulate CI occurrence, timing, and location. The ensemble Kalman filter implementation within the Data Assimilation Research Testbed as coupled to the Advanced Research Weather Research and Forecasting (WRF) Model is used to initialize two nearly identical 30-member ensembles of short-range forecasts for each case: one initial condition set that incorporates MPEX dropsonde observations and one that excludes these observations. All forecasts for a given mission begin at 1500 UTC and are integrated for 15 h on a convection-permitting grid encompassing much of the conterminous United States. Forecast verification is conducted probabilistically using fractions skill score and deterministically using a 2 × 2 contingency table approach at multiple neighborhood sizes and spatiotemporal event-matching thresholds to assess forecast skill and support hypothesis testing. The probabilistic verification represents the first of its kind for numerical CI forecasts. Forecasts without MPEX observations have high fractions skill score and probabilities of detection on the meso-α scale but exhibit a considerable high bias for forecast CI event count. Assimilating MPEX observations has a negligible impact upon forecast skill for the cases considered, independent of verification metric, as the MPEX observations result in only subtle differences primarily manifest in the position and intensity of atmospheric features responsible for focusing and/or triggering deep, moist convection.
Convection initiation (CI), in the context of this study, refers to the sequence of events leading to deep, moist convection, in which air parcels accelerate beyond their level of free convection to create a visible cloud top with a rapid increase in cloud depth, followed by precipitation development (Kain et al. 2013). CI is triggered by a lower-tropospheric convergence mechanism; common examples include drylines, elevated convergence zones, frontal boundaries, gust fronts, horizontal convective rolls, orographic circulations, sea breezes, and undular bores (Jorgensen and Weckwerth 2003; Weckwerth and Parsons 2006; Burghardt et al. 2014). Further, CI is a classic scale-interaction problem, requiring a favorable interaction between phenomena on multiple scales. The synoptic and meso-α scales establish the thermodynamic and kinematic environment favorable for CI (Weisman et al. 2008). The meso-β scale contributes to horizontal variability in the large-scale environment in which CI occurs, and meso-γ- to microscale phenomena determine the local planetary boundary layer (PBL) lifting, moistening, and environmental variability crucial to CI timing and location (e.g., Markowski et al. 2006; Weckwerth et al. 2008; Kain et al. 2013; Burghardt et al. 2014). Consequently, CI is a rare event at any given location (Lock and Houston 2015).
In a favorable environment with sufficiently large vertical wind shear, CI can lead to the development of severe thunderstorms capable of producing damaging surface winds, flash flooding, large hail, and tornadoes. Severe storms annually cause substantial loss of life and accounted for the largest number of U.S. billion-dollar natural disaster events during 1980–2014 (NCDC 2015). Despite the significant societal impacts that can result, accurately predicting the initiation, intensity, and evolution of deep, moist convection, whether or not it reaches severe levels, remains a significant challenge for numerical weather prediction (NWP) models and human forecasters. Contributions to forecast error include the stochastic nature of the atmospheric system, the dependence of CI and subsequent convective evolution upon physical processes on multiple scales, shortcomings in the physical parameterization packages employed within convection-permitting numerical simulations, and data quality and availability (Weckwerth and Parsons 2006).
Characteristics of convective storms are strongly tied to the environment in which they develop, so it is important to accurately represent the initiation environment when forecasting such events (Benjamin et al. 2010; Wandishin et al. 2010; Weisman et al. 2015). Several recent studies have been conducted using convection-permitting (CP; horizontal grid spacing of 4 km or less) NWP simulations to quantify CI predictability. Duda and Gallus (2013) investigated the relationship between large-scale forcing and CI predictability for 36 primarily warm-season CI events preceding mesoscale convective system (MCS) formation. Key findings include a mean absolute displacement error of 105 km, no systematic timing bias, and no significant relationship between CI forecast skill and large-scale forcing magnitude, the latter of which was attributed primarily to the importance of smaller-scale features to the CI process. Similar results were obtained by Burghardt et al. (2014) for 27 warm-season CI episodes in the central High Plains in subkilometer horizontal grid spacing deterministic numerical simulations. Burghardt et al. (2014) also documented an overproduction of CI events within the numerical simulations, particularly near higher terrain. Kain et al. (2013) found that numerical models can resolve, if crudely, physical processes important for CI. Despite no systematic ensemble bias in CI timing, Kain et al. (2013) argued that probabilistic numerical CI forecasts were inadequate indicators of subsequent convective evolution.
In general, initial conditions (ICs) exert a greater influence on short- to medium-range (0–36 h) convective forecast skill than does model configuration (Weisman et al. 2008; Romine et al. 2013). Targeted observations, which augment the regular observation network with additional, specifically chosen observations, are thought to improve model ICs by improving the initial representation of atmospheric features on the scales resolved by the targeted observations (Majumdar et al. 2011). In previous studies, observations have mostly been targeted at synoptic-scale systems for the purposes of improving global model 1–3-day forecasts (e.g., Majumdar et al. 2011; Majumdar 2016). Targeted dropwindsonde observations from field projects such as the NOAA Synoptic Surveillance program (Aberson 2010) and Dropsonde Observations for Typhoon Surveillance near the Taiwan Region (DOTSTAR; Wu et al. 2007; Chou et al. 2011) have been found to provide statistically significant positive impacts to tropical cyclone track forecasts. Targeted observations have, on average, shown smaller yet still positive impacts upon extratropical, synoptic-scale forecasts. Representative examples include the 1997 Fronts and Atlantic Storm-Track Experiment (FASTEX; e.g., Joly et al. 1999; Bergot 1999; Montani et al. 1999) and the early-to-mid-2000s Atlantic THORPEX (The Observing System Research and Predictability Experiment) Regional Campaign (A-TReC; e.g., Fourrié et al. 2006; Rabier et al. 2008). Majumdar (2016) provides a comprehensive summary of targeted observation studies.
On the meso- and smaller scales, the International H2O Project (IHOP_2002) sampled the three-dimensional, time-varying moisture field via in situ and remote sensing techniques to better understand convective processes (Weckwerth et al. 2008). For the IHOP_2002 12–13 June 2002 CI event, Liu and Xue (2008) conducted numerical sensitivity experiments to assess how data assimilation frequency and targeted observations influenced CI prediction. The simulation that assimilated the most data subjectively produced the best forecast, as the additional observations removed both the resolution-related delay of CI and the overly moist lower-tropospheric ICs (Liu and Xue 2008). However, other experiments that excluded targeted observations better predicted CI timing and location for some cell groups. Two more recent field projects that have targeted observations for mesoscale phenomena are the Hydrological Cycle in the Mediterranean Experiment (HyMeX) and the Deep Propagating Gravity Wave Experiment over New Zealand (DEEPWAVE), each briefly described in Majumdar (2016). However, research into the forecast impact of targeted observations collected during these campaigns remains in its nascent stages.
The Mesoscale Predictability Experiment (MPEX; Weisman et al. 2015) hypothesized that the collection of nonroutine synoptic- and meso-α-scale observations in the upstream, preconvective environment and their subsequent assimilation into CP numerical forecasts would significantly improve short-lead forecasts of the timing, location, and mode of CI and subsequent convective evolution. Operations involved two missions per active program day: 1) an early morning mission with the NCAR Gulfstream-V aircraft, well upstream of anticipated convective storms, in which dropsonde and microwave temperature profiler observations were collected; and 2) an afternoon-to-evening mission with mobile sounding units to sample the preconvective environment and quantify upscale convective influences (Weisman et al. 2015; Trapp et al. 2016). Dropsonde observations were collected across the U.S. Intermountain West during 15 research flights (RFs) between 15 May and 15 June 2013. An average of 28, a minimum of 17, and a maximum of 33 dropsondes were released during each RF (hereafter, case; Weisman et al. 2015). Ensemble sensitivity analysis (Torn and Hakim 2008) was used to identify mid- to upper-tropospheric phenomena to which subsequent deep, moist convection forecasts were most sensitive for a given case, from which targeted dropsonde observation locations (for a forecast metric of accumulated precipitation) were determined. Consequently, MPEX dropsonde observations primarily sampled mid- to upper-tropospheric kinematic and thermodynamic features in upstream, preconvective environments.
Romine et al. (2016) investigated the impact of assimilating targeted MPEX dropsonde observations on probabilistic short-range forecast skill for accumulated precipitation. Evaluating forecasts against rawinsonde, METAR, and mesonet observations showed only small differences between ensemble forecasts that did and did not assimilate MPEX dropsondes. Despite notable case-to-case variation in forecast skill and dropsonde observation impact, assimilating targeted MPEX observations resulted in a small but statistically significant forecast skill improvement for accumulated precipitation forecasts. Forecast skill improvement was greatest for cases that best sampled the objectively determined atmospheric features—nominally, in the mid- to upper troposphere—to which the subsequent precipitation forecast was most sensitive.
In this study, we seek to quantify the influence of assimilating MPEX dropsonde observations on short-range (0–15 h) ensemble CI forecast skill. As noted above, CI is triggered by a lower-tropospheric convergence mechanism (e.g., Weckwerth and Parsons 2006), with CI occurrence, timing, and location critically dependent on PBL lifting and moistening (Markowski et al. 2006; Weckwerth et al. 2008). However, MPEX dropsonde observations primarily sampled upstream mid- to upper-tropospheric phenomena to which subsequent deep, moist convection (and, specifically, accumulated precipitation) forecasts were most sensitive (Romine et al. 2016), with this sensitivity primarily manifest in the position of those phenomena. In other words, the in situ and upstream boundary layer phenomena to which subsequent forecasts were most sensitive were not sampled well, if at all, by MPEX dropsonde observations (Torn and Romine 2015; Berman et al. 2017; Torn et al. 2017). Given that the meso-α- to synoptic-scale boundaries along which CI occurs (fronts, drylines, etc.) typically translate, at least in part, in concert with upstream mid- to upper-tropospheric cyclonic disturbances, any impact of assimilating MPEX dropsonde observations upon subsequent CI forecasts is likely to be manifest in modulating the positions of a subset of the lower-tropospheric boundaries along which CI occurs. Consequently, given that MPEX sampled a diverse range of both weakly and strongly forced cases (Weisman et al. 2015), we hypothesize that assimilating MPEX dropsonde observations is insufficient to result in statistically significant improvements in short-range (0–15 h) CI forecast skill over the set of cases sampled by MPEX.
The remainder of the manuscript is structured as follows. Section 2 describes the methodology, including ensemble analysis and simulation configuration, CI identification, and forecast verification methods. Event statistics and forecast skill metrics are presented in section 3, case study analyses for three chosen MPEX cases are presented in section 4, and a summary and avenues for future work are discussed in section 5.
a. Experimental design
The forecast model and ensemble analysis system used for this study are identical to that described in Romine et al. (2016). Summarizing, the ensemble adjustment Kalman filter (EAKF; Anderson 2001, 2003) implemented within the Data Assimilation Research Testbed (DART; Anderson et al. 2009) package is coupled with version 3.4.1 of the Advanced Research Weather Research and Forecasting (WRF-ARW; Skamarock et al. 2008) Model to obtain two identically configured 50-member ensemble analyses of the preconvective atmospheric state for each of the 15 MPEX cases. The first, or Control ensemble, does not assimilate MPEX dropsonde observations. The second, or Updated ensemble, assimilates MPEX dropsonde observations only for that case. Further MPEX dropsonde targeting information is provided later in this section.
The analysis domain upon which all data assimilation is conducted has a horizontal grid spacing of 15 km (415 × 325 grid points) and covers the conterminous United States, the Gulf of Mexico, and portions of Canada, Mexico, the eastern North Pacific Ocean, and the western North Atlantic Ocean. The domain contains 40 terrain-following vertical levels between the surface and 50 hPa with approximately 8 levels within the planetary boundary layer (PBL). The Mellor–Yamada–Janjić (MYJ; Janjić 1994, 2002) PBL, Thompson et al. (2008) hybrid double-moment microphysical, RRTM for GCMs (RRTMG) longwave and shortwave radiation including ozone and aerosol climatologies (Mlawer et al. 1997; Tegen et al. 1997; Iacono et al. 2008), Noah (Chen and Dudhia 2001) land surface, and Tiedtke (Tiedtke 1989; Zhang et al. 2011) cumulus convection parameterizations are utilized by the cycled analysis system (Table 1).
The initial 50-member ensemble analyses are produced by adding Gaussian random samples with zero mean and covariance derived from the NCEP background error covariance matrix to the 1800 UTC 30 April 2013 Global Forecast System (GFS) analysis interpolated to the 15-km WRF Model domain (Schwartz et al. 2015) using the WRF Model’s Community Variational/Ensemble Data Assimilation System (WRFDA) package (Barker et al. 2012). Lateral boundary conditions for the initial and subsequent 6-hourly analysis times are obtained using the fixed covariance perturbation technique of Torn et al. (2006), as applied to the 0-h GFS analysis and 6-h GFS forecast, respectively. Following, for example, Torn (2010), Romine et al. (2013), and Schwartz et al. (2015), sampling error is minimized and ensemble spread is maintained using sampling error correction (Anderson 2012), adaptive Gaspari–Cohn observation localization (Gaspari and Cohn 1999; Anderson 2012), and adaptive time- and space-varying prior inflation (Anderson 2009). Although widely used in modern mesoscale ensemble atmospheric cycled data assimilation applications using the ensemble adjustment Kalman filter, it should be emphasized that these methods for mitigating against sampling error and maintaining ensemble spread are imperfect and largely ad hoc in nature. In particular, covariance localization can result in unrealistic analysis increments and imbalanced model initial conditions, whereas inflation dilutes the extent to which the estimated background error statistics are truly flow dependent (Houtekamer and Zhang 2016).
Analysis fields updated by WRF-DART include horizontal wind components, perturbation potential temperature, geopotential height, water vapor, instantaneous diabatic heating rate, and the mixing ratios and number concentrations for all carried microphysical species. Routine observations assimilated by WRF-DART include mandatory- and significant-level rawinsonde data; surface-based METAR, buoy, and ship observations; Aircraft Meteorological Data Relay (AMDAR) reports; satellite-derived horizontal atmospheric motion vectors (AMVs; Velden et al. 2005); and thinned global positioning system (GPS)-derived radio occultation (Kursinski et al. 1997) refractivity observations (e.g., as in Romine et al. 2013; Schwartz et al. 2015; Torn and Romine 2015). AMDAR observations are averaged over boxes with dimensions of 30 km in the horizontal and 25 hPa in the vertical as in Torn (2010) and AMVs are averaged over 60 km in the horizontal. Surface observations with model terrain and station height differing by more than 300 m and/or located within three grid lengths of the lateral boundaries are excluded. The characteristics of all assimilated observations are compared to the preassimilation atmospheric state space provided by the ensemble, and observations whose squared difference from the ensemble mean estimate exceeds 3 times the sum of the prior ensemble and observational error variances are rejected by the assimilation system (Romine et al. 2013). Over 99% of available MPEX dropsonde observations passed these internal consistency checks and were subsequently assimilated (Romine et al. 2016, see their Fig. 4). Assumed errors for each observation type match those in Romine et al. (2013).
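The internal consistency check described above can be sketched in a few lines of Python. This is an illustrative reconstruction of the rejection rule only, not code from the DART system; the function name and its default factor of 3 mirror the criterion quoted from Romine et al. (2013):

```python
import numpy as np

def passes_outlier_check(obs_value, prior_ensemble, obs_error_var, factor=3.0):
    """Return True if an observation passes the consistency check:
    its squared departure from the prior ensemble-mean estimate must
    not exceed `factor` times the sum of the prior-ensemble and
    observation error variances."""
    departure_sq = (obs_value - np.mean(prior_ensemble)) ** 2
    total_var = np.var(prior_ensemble, ddof=1) + obs_error_var
    return bool(departure_sq <= factor * total_var)
```

An observation far outside the envelope implied by the combined prior and observation uncertainty is thus withheld from assimilation rather than allowed to produce a large, likely spurious, analysis increment.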
Cycled analysis continues at an interval of 6 h through 0000 UTC 15 June. However, at 0000 UTC on the day of each RF, two distinct forks of the ensemble analyses are obtained. Each utilizes a 1-h cycling interval through 1500 UTC, with Control assimilating only the observations listed above and Updated also assimilating MPEX dropsonde observations. The hourly cycling is performed to reduce time-dependent background errors (Romine et al. 2016). All MPEX data are quality controlled by the staff of NCAR’s Earth Observing Laboratory prior to assimilation, and dropsonde observation quality is assessed by evaluating the observations against Control posterior analyses, as described in Romine et al. (2016, see their section 2c). Note that the dropsonde observations were recently found to have a significant dry bias under cold and dry conditions (Vömel et al. 2016), which, during MPEX, were most commonly observed in the middle to upper troposphere (Romine et al. 2016). As the potential for a nascent thermal to remain positively buoyant for a sufficient duration as to initiate deep, moist convection is partially influenced by entrainment from its surroundings, it is possible that this bias could influence the results presented herein. However, any such impact is expected to be minor given the nature of the bias and the overriding influence of boundary layer processes in triggering CI. Relative to the routine observation vertical distribution, the greatest proportional increase in observation counts resulting from assimilating MPEX targeted dropsonde observations occurs in the middle troposphere (Romine et al. 2016).
Observation targeting is determined from ensemble sensitivity analysis (Torn and Hakim 2008). Ensemble sensitivity analysis provides an estimate of the linear relationship between a chosen forecast metric and model state variables at earlier forecast times. During MPEX, 24–48-h output from daily 1200 UTC real-time ensemble forecasts is used to identify regions where a chosen forecast metric—here, accumulated precipitation averaged over time and space windows that varied between cases, and not a CI-specific metric—had notable variability between ensemble members. Ensemble sensitivity analysis is used to identify the meteorological variables and features at the times of next-day MPEX flight missions to which the forecast metric is most sensitive. The collected observations are then assimilated into later model cycles, as described above, from which observation impact upon subsequent forecast evolution is quantified. Further details are provided in Weisman et al. (2015) and Romine et al. (2016).
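The core of ensemble sensitivity analysis is a member-by-member linear regression of the forecast metric onto an earlier state variable (Torn and Hakim 2008). A minimal sketch, with illustrative names not drawn from the MPEX targeting code:

```python
import numpy as np

def ensemble_sensitivity(metric, state):
    """Ensemble-based sensitivity estimate dJ/dx: the regression
    coefficient of the forecast metric J (one value per member)
    onto an earlier state variable x, i.e., cov(J, x) / var(x)."""
    J = np.asarray(metric, dtype=float)
    x = np.asarray(state, dtype=float)
    return np.cov(J, x, ddof=1)[0, 1] / np.var(x, ddof=1)
```

Applied at every grid point and level of the analysis, this yields sensitivity maps from which regions with large (and physically coherent) values can be chosen as dropsonde targets.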
For each RF, the first 30 ensemble analyses from each 50-member ensemble analysis system are used to initialize free forecasts. Forecasts are conducted using a two-way-interacting 15-km–3-km nested domain. The outer 15-km domain is identical to the analysis domain. The inner 3-km domain (1046 × 871 grid points) extends from the U.S. Intermountain West to the Appalachian Mountains and from Baja California to the Canadian border (Fig. 1). The model configuration for the free forecasts is identical to that utilized by the cycled analysis system with the exception that cumulus convection is treated explicitly on the 3-km domain. Simulations for the Control and Updated ensembles are initialized at 1500 UTC and are integrated forward 15 h.
b. CI event identification
Following Kain et al. (2013) and Burghardt et al. (2014), CI events in this study are defined as objects in the radar reflectivity field with reflectivity ≥35 dBZ at the −10°C isotherm for a minimum of 30 min. The 35-dBZ reflectivity threshold is chosen following Gremillion and Orville (1999), who found this threshold to effectively distinguish observed thunderstorms from non-thunder-producing precipitation. The −10°C level is selected to prevent contamination of the results from melting hydrometeors (bright banding) primarily within stratiform precipitation regions. The 30-min criterion is used to ensure only sustained CI events are considered.
To identify observed CI events, Level II Next Generation Weather Radar (NEXRAD) reflectivity is obtained for 42 radars across the central United States for the duration of each simulation (Fig. 1). Warning Decision Support System–Integrated Information (WDSS-II; Lakshmanan et al. 2007) spatial analysis tools (Lakshmanan 2012) are used to merge (w2merger; Lakshmanan et al. 2006; Lakshmanan and Humphrey 2014) the NEXRAD data to a uniform 0.03° latitude × 0.03° longitude gridded domain (black square in Fig. 1) and interpolate the result to the height of the −10°C isotherm extracted from the nearest-in-time 13-km 0-h Rapid Update Cycle (RUC; Benjamin et al. 2004) hourly analyses. Next, the watershed transform technique implemented in the WDSS-II w2segmotionll tool (Lakshmanan et al. 2009) is used to identify individual features that encompass at least four contiguous pixels with reflectivity ≥35 dBZ on a single WDSS-II scale with no data smoothing. These features are then tracked forward in time (Lakshmanan and Smith 2010), from which feature motion estimates are obtained (Lakshmanan and Smith 2010; Lakshmanan et al. 2014). Unique features that persist for at least 30 min are classified as CI events, with CI time and location set to their specific values at the start of the 30-min evaluation period. While this procedure primarily identifies isolated CI events (e.g., separate from ongoing deep, moist convection), distinct features within or adjacent to existing regions of deep, moist convection meeting the thresholds described above are also identified as CI events.
To identify simulated CI events, forecast reflectivity at the −10°C level of each ensemble member is computed in-line with model execution every 5 min. These data are then bilinearly interpolated to the same grid as the observed data, upon which the same WDSS-II algorithms described above are used to identify and track model-simulated CI events. As implemented here, bilinear interpolation computes the average of the four simulated grid points nearest the grid point to which the data are being interpolated, with the contribution of each simulated grid point weighted by its distance from the interpolated grid point. It should be noted, however, that no further attempt is made to sample the model data in a similar manner to that of the observations (e.g., to account for radar gaps, terrain blocking, and nonuniform vertical data distribution). As most observed and modeled CI events for the set of MPEX cases considered occur east of elevated terrain, this is believed to have minimal effect on the results.
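The persistence criterion applied to tracked reflectivity features can be illustrated as follows. This is a simplified sketch, not the WDSS-II implementation: it assumes the 5-min scan interval used for the model output, and the function name and track representation are ours:

```python
def first_ci_time(track_times, min_duration=30.0):
    """Return the CI time for one tracked feature, defined as the first
    time the feature meets the reflectivity criterion, provided the
    feature persists for at least `min_duration` minutes; returns None
    otherwise. `track_times` is a sorted list of scan times (minutes)
    at which the feature met the 35-dBZ-at-(-10 deg C) criterion."""
    if not track_times:
        return None
    start = track_times[0]
    last = start
    for t in track_times[1:]:
        if t - last > 5.0:  # gap larger than one scan: feature not sustained
            break
        last = t
    return start if (last - start) >= min_duration else None
```

A feature detected on seven consecutive 5-min scans (30 min of persistence) is classified as a CI event timed at its first detection, consistent with the event definition above.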
c. Verification metrics
Both probabilistic and deterministic forecast verification metrics are used in this study. Probabilistic forecast skill is evaluated using the fractions skill score (FSS; Roberts and Lean 2008; Schwartz et al. 2010). To compute FSS, observed and modeled CI event locations, the latter for all ensemble members of both the Control and Updated ensembles, are first used to define binary event fields over the period 1800–0600 UTC on the 0.03° × 0.03° evaluation grid used to identify CI events (Fig. 1). Event fractions (or probabilities; the number of CI events divided by the number of neighborhood grid points) are then computed at every grid point for both time-independent (ignoring CI event time differences) and time-dependent neighborhoods. For the time-independent verification, square neighborhoods of 50-, 100-, and 200-km side half-lengths are considered. For the time-dependent verification, a square neighborhood of 100-km side half-length and a 1-h time window is considered for consistency with the deterministic verification described below. This procedure yields the observed neighborhood probabilities directly. For the Control and Updated ensembles, a neighborhood ensemble probability is additionally obtained for each case following Schwartz et al. (2010) by averaging the individual ensemble members’ modeled event fractions. From these quantities, FSS is computed following Roberts and Lean (2008):
FSS = 1 − MSE/MSEref,

where MSE, or mean-square error, and MSEref, or a reference MSE that represents the maximum-possible MSE for the observed neighborhood and modeled neighborhood ensemble probabilities, are given by

MSE = (1/NxNy) Σ(i=1..Nx) Σ(j=1..Ny) (Oi,j − Mi,j)²,

MSEref = (1/NxNy) [Σ(i=1..Nx) Σ(j=1..Ny) Oi,j² + Σ(i=1..Nx) Σ(j=1..Ny) Mi,j²],
where Nx is the number of east–west grid points in the verification domain, Ny is the number of north–south grid points in the verification domain, Oi,j is the observed neighborhood probability at grid point (i, j), and Mi,j is the modeled neighborhood ensemble probability at grid point (i, j). To our knowledge, this study is the first attempt to probabilistically verify CI forecasts, independent of forecast lead time, although Mecikalski et al. (2015) describe a probabilistic, very-short-range CI nowcast algorithm that is largely verified deterministically.
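Once the observed neighborhood probabilities and the modeled neighborhood ensemble probabilities are in hand, the FSS computation itself is compact. A minimal Python sketch (function and variable names are ours, not from the study's verification code):

```python
import numpy as np

def fractions_skill_score(obs_frac, mod_frac):
    """FSS (Roberts and Lean 2008) from precomputed observed
    neighborhood probabilities and modeled neighborhood ensemble
    probabilities on the verification grid. FSS = 1 - MSE/MSEref,
    where MSEref is the largest MSE attainable from the two fields."""
    O = np.asarray(obs_frac, dtype=float)
    M = np.asarray(mod_frac, dtype=float)
    mse = np.mean((O - M) ** 2)
    mse_ref = np.mean(O ** 2) + np.mean(M ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```

FSS ranges from 0 (no overlap between the probability fields) to 1 (identical fields), increasing as the neighborhood size grows.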
Deterministic verification uses the flow-dependent error metric of Burghardt et al. (2014) to quantify the proximity between modeled and observed events in both time and space for each ensemble member and event. The error metric takes the following form:
C = Errord + Velocityc × Errort,

where C is the spatiotemporal error (km), Errord is the spatial difference (km) between the modeled and observed CI events, and Errort is the temporal error (h) between the modeled and observed CI events. Velocityc is the translation velocity (km h−1) of the observed reflectivity object associated with the observed CI event being considered; this differs from Burghardt et al. (2014), who estimated translation velocity using a layer-mean wind. The flow dependence allows timing errors to be collapsed into the spatial dimension while making allowance for storm motion variation (Burghardt et al. 2014). Next, spatiotemporal thresholds of maximum Errord and maximum Errort of 50 km (0.5 h)−1, 100 km (1 h)−1, and 200 km (2 h)−1 are applied. Particular focus is given to the 100 km (1 h)−1 threshold, as it represents a balance of space and time errors on the mesoscale and allows for comparison to the results of Burghardt et al. (2014). The pairing between observed and modeled CI events with the lowest C is designated as the match, or hit, for the observed event, presuming the modeled CI event is not a better match to a different observed event. Thus, the C metric provides a measure of match goodness (i.e., a lower C error implies a modeled CI event closer in time and space to an observed CI event).
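The error metric and the matching thresholds above can be sketched directly. This is an illustrative reconstruction (function names are ours); the timing error is converted to a distance using the observed cell's translation speed, so that C is expressed entirely in kilometers:

```python
def spatiotemporal_error(error_d_km, error_t_h, velocity_c_kmh):
    """Flow-dependent spatiotemporal error C (km): the spatial error
    plus the timing error collapsed into a distance via the observed
    reflectivity object's translation speed."""
    return error_d_km + abs(velocity_c_kmh) * abs(error_t_h)

def within_threshold(error_d_km, error_t_h, max_d_km, max_t_h):
    """Apply the maximum spatial and temporal matching thresholds,
    e.g., 100 km and 1 h for the intermediate threshold."""
    return error_d_km <= max_d_km and abs(error_t_h) <= max_t_h
```

For example, a modeled event 40 km away and 30 min early relative to an observed cell translating at 30 km h−1 incurs C = 40 + 30 × 0.5 = 55 km; among all candidate pairings passing the thresholds, the one with the lowest C is taken as the hit.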
Deterministic verification is performed over the entire radar analysis domain in Fig. 1 using a 2 × 2 contingency table for dichotomous yes–no forecasts (Wilks 1995; Table 2). Multiple previous studies have used this approach to verify forecasts of convection occurrence or initiation (e.g., Fowle and Roebber 2003; Kain et al. 2013; Burghardt et al. 2014). Herein, true positives (or matches; a), false positives (or false alarms; b), and false negatives (or misses; c) are identified. True negatives (d), or correct forecasts of nonevents, are not evaluated due to the ambiguity in defining nonevents from an object-based event identification method. Following Roebber (2009), four quality measures are derived from contingency table classifications. Probability of detection [POD; Eq. (4)] is the ratio of correctly forecast CI events to the total number of observed CI events and ranges from 0 to 1, with higher values indicating a higher proportion of correctly forecast CI events. False alarm ratio [FAR; Eq. (5)] is the ratio of unobserved forecast CI events to the total number of forecast CI events and ranges from 0 to 1, with higher values indicating a greater number of false positives. Bias [Eq. (6)] is the ratio of total forecast CI events to total observed CI events; values less (greater) than 1 indicate fewer (more) forecast than observed events. Finally, critical success index [CSI; Eq. (7)], or threat score, is the ratio of correctly forecast CI events to the total number of observed and forecast CI events, with a value of 1 representing a perfect forecast:

POD = a/(a + c),  (4)

FAR = b/(a + b),  (5)

Bias = (a + b)/(a + c),  (6)

CSI = a/(a + b + c).  (7)
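The four quality measures follow directly from the contingency table entries; a minimal sketch (the function name is ours):

```python
def contingency_scores(hits, false_alarms, misses):
    """POD, FAR, bias, and CSI from the 2x2 contingency table entries
    a (hits), b (false alarms), and c (misses). The correct-negative
    entry d is not needed for these four measures."""
    a, b, c = hits, false_alarms, misses
    pod = a / (a + c)          # fraction of observed events forecast
    far = b / (a + b)          # fraction of forecast events unobserved
    bias = (a + b) / (a + c)   # forecast-to-observed event ratio
    csi = a / (a + b + c)      # threat score
    return pod, far, bias, csi
```

For instance, 30 hits, 30 false alarms, and 20 misses give POD = 0.6, FAR = 0.5, bias = 1.2, and CSI = 0.375, i.e., an event-overforecasting system of the kind documented in section 3.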
3. Forecast evaluation
a. Overall CI event characteristics
Both observed and modeled CI events most frequently occur during the local late afternoon to early evening hours (Fig. 2a). The Control and Updated ensembles each forecast approximately twice as many CI events as are observed, with slightly increased spread in event count within the Updated ensemble relative to the Control that may result from an increase in analysis spread in the Updated relative to Control ensemble near dropsonde locations (Romine et al. 2016). For a matching threshold of 100 km (1 h)−1, hits most frequently occur during the local late afternoon hours (2000–2300 UTC; Figs. 2c,d). Misses most frequently occur during the local early evening hours (2200–0000 UTC), while false alarms most frequently occur during the local middle to late afternoon hours (1900–2200 UTC). Hit, miss, and false alarm probabilities and event counts are approximately constant in the first 2–3 h following sunset, after which time each becomes smaller. Given the high forecast bias in Fig. 2a, it is perhaps unsurprising that false alarms occur nearly twice as frequently as hits at all forecast times (Fig. 2d). Hits occur slightly more frequently than misses during the local daytime hours; the inverse is true after sunset. There are no significant differences in event probability or count between the Control and Updated ensembles over the set of MPEX cases (Figs. 2c,d). The ambient environments supporting CI vary substantially between cases, resulting in significant CI timing, location, and frequency differences from one MPEX case to another (e.g., Fig. 2b for frequency). Despite this, the Control and Updated ensembles each forecast too many CI events relative to observations for all 15 cases, as is demonstrated further in section 3b.
In addition, no significant differences between the Control and Updated ensembles exist for any of the metrics presented above (not shown), such that in the verification the Control and Updated ensembles are more similar to each other than either is to observations, as in Romine et al. (2016).
b. Forecast skill evaluation
The time-independent average FSS increases with neighborhood size (Fig. 3a). Although the average FSS is relatively high for all neighborhood sizes, significant forecast skill variation about the average FSS exists between MPEX cases. Control and Updated FSS performance when averaged over all cases is nearly identical for each neighborhood, with small differences noted between the Control and Updated ensembles for individual cases (Fig. 3b). When event timing is considered, average FSS is largest in the local evening hours and lowest in the local afternoon and overnight hours (Fig. 3c). Significant forecast skill variation about the average FSS exists between cases at all forecast times (Figs. 3c,d). Despite the use of a neighborhood half-width twice as large as Romine et al. (2016), average FSS with time is lower for CI than for accumulated precipitation at 0.25, 1.0, and 10.0 mm h−1 thresholds [cf. Fig. 3c to Fig. 15 of Romine et al. (2016)]. However, it should be noted that the verification region is more expansive in the present study [cf. Fig. 1 to Fig. 11 of Romine et al. (2016)], which may account for part of the FSS differences. Control and Updated ensemble performance is nearly identical at all forecast times when averaged over all 15 MPEX cases (Fig. 3c), but larger differences emerge for individual cases at selected times (Fig. 3d).
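The fractions skill score compares neighborhood event fractions between forecast and observed binary fields. A minimal NumPy sketch, assuming square neighborhoods on a regular grid (the study's grid, ensemble handling, and exact neighborhood construction are not reproduced here):

```python
import numpy as np

def neighborhood_fractions(binary, half_width):
    """Fraction of grid points flagged within the square neighborhood of each point."""
    h, n = half_width, 2 * half_width + 1
    padded = np.pad(np.asarray(binary, dtype=float), h)
    s = padded.cumsum(axis=0).cumsum(axis=1)
    s = np.pad(s, ((1, 0), (1, 0)))  # summed-area table with zero border
    ny, nx = np.shape(binary)
    box = (s[n:n + ny, n:n + nx] - s[:ny, n:n + nx]
           - s[n:n + ny, :nx] + s[:ny, :nx])
    return box / (n * n)

def fss(fcst, obs, half_width):
    """Fractions skill score: 1 is a perfect forecast; 0 indicates no skill."""
    f = neighborhood_fractions(fcst, half_width)
    o = neighborhood_fractions(obs, half_width)
    mse = np.mean((f - o) ** 2)
    mse_ref = np.mean(f ** 2) + np.mean(o ** 2)  # zero only if both fields are empty
    return 1.0 - mse / mse_ref
```

A spatially displaced event scores higher as the neighborhood half-width grows, which is the behavior summarized in Fig. 3a.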
For a specific spatiotemporal matching threshold, there is modest variability in measures of deterministic forecast skill, including POD, FAR, CSI, and bias (Figs. 4 and 5; Table 3), for a given MPEX case. For most cases, greater variability exists between individual ensemble members for a given case than between cases (Fig. 5) and between the Control and Updated ensembles for a given case (Figs. 4 and 5; Table 3). Ensemble-averaged CSI is ~0.1 at the strictest spatiotemporal matching threshold [50 km (0.5 h)−1], ~0.25 at an intermediate threshold [100 km (1 h)−1], and ~0.4 at the most lenient threshold [200 km (2 h)−1]. However, as the time and space thresholds become more lenient, the subjective utility of the forecast (such as to a forecaster) decreases. Notably, ensemble-averaged CSI is approximately equal across cases at the 50 km (0.5 h)−1 and 100 km (1 h)−1 matching thresholds; it is only at the 200 km (2 h)−1 threshold where case-to-case variation becomes larger (Fig. 5). This was not seen in the cases considered by Burghardt et al. (2014), where greater variability between cases was noted (albeit with deterministic forecasts); this difference merits further investigation. For all 15 research flights (RFs), mean ensemble bias is greater than unity, indicating a mean model overproduction of CI events that is consistent with Burghardt et al. (2014) and Coniglio et al. (2016). Mean skill of the Updated ensemble is nearly identical to that of the Control ensemble for all cases and verification thresholds (Figs. 4 and 5; Table 3), leading to a preliminary conclusion that the assimilation of MPEX dropsonde observations is not sufficient to result in increased CI timing and location forecast skill.
At the 100 km (1 h)−1 spatiotemporal matching threshold, average ensemble distance error, timing error, and timing bias for matched events are all small (Table 3). Mean distance error and timing bias over all matched events are 45.7 km and 0.9 min for the Control ensemble and 45.5 km and 1.1 min for the Updated ensemble, respectively, where a positive timing bias denotes that the model is later than observations. The near-zero timing biases are consistent with Kain et al. (2013), Duda and Gallus (2013), and Burghardt et al. (2014), providing further evidence that convection-permitting ensemble forecasts are capable of accurately simulating the CI diurnal cycle. Mean absolute time error for matched events for the Control and Updated ensembles is 26.7 and 26.6 min, respectively. However, these results should be interpreted in light of the substantial frequency bias that exists for both Control and Updated ensembles for all cases (section 3a) (i.e., subjective forecast utility is less than what it would be in the presence of a near-unity frequency bias).
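The study matches modeled CI events to observed events with a flow-dependent metric; the bookkeeping of hits, false alarms, misses, and matched-event distance and timing errors can nonetheless be illustrated with a simpler fixed-threshold greedy match (a sketch only, not the authors' method):

```python
import math

# Simplified greedy spatiotemporal matching sketch. The 100 km and 1 h
# defaults mirror the intermediate threshold in the text; the matching
# order and fixed (non-flow-dependent) thresholds are illustrative.
def match_events(fcst, obs, max_km=100.0, max_hr=1.0):
    """fcst, obs: lists of (x_km, y_km, t_hr) CI events. Returns hits,
    false alarms, misses, matched distance errors (km), and matched timing
    errors (h; positive means the forecast event is later than observed)."""
    unmatched = list(range(len(obs)))
    hits, dist_err, time_err = 0, [], []
    for fx, fy, ft in fcst:
        best, best_d = None, None
        for i in unmatched:
            ox, oy, ot = obs[i]
            d = math.hypot(fx - ox, fy - oy)
            if d <= max_km and abs(ft - ot) <= max_hr and (best is None or d < best_d):
                best, best_d = i, d
        if best is not None:  # each observed event may be matched only once
            unmatched.remove(best)
            hits += 1
            dist_err.append(best_d)
            time_err.append(ft - obs[best][2])
    return hits, len(fcst) - hits, len(unmatched), dist_err, time_err
```

Averaging the returned distance and timing errors over all matched events yields quantities analogous to those reported in Table 3.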
4. Case studies
Whether in a probabilistic or deterministic context, mean skill score differences between the Control and Updated ensembles are small. For example, the largest mean CSI [100 km (1 h)−1 spatiotemporal matching threshold] and FSS (time independent; 100-km neighborhood half-width) improvements in the Updated ensemble are 0.03 and 0.06, respectively, for RF12 (Figs. 3b and 5). The largest mean CSI and FSS degradations in the Updated ensemble are 0.02 and 0.045, respectively, for RF6 (Figs. 3b and 5). These small differences are not statistically significant. For each case, CSI variation between individual ensemble members is similar within the Control and Updated ensembles, and the magnitude of this intraensemble CSI variation is larger than the difference in ensemble-averaged CSI between the two ensembles (Fig. 5; Table 3).
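The excerpt does not specify the significance test applied to these paired skill differences; a paired sign-flip permutation test on per-case score differences is one standard choice and is sketched here for illustration only:

```python
import random

# Illustrative paired sign-flip permutation test (not necessarily the test
# used in the study) for H0: the mean per-case (Updated - Control) score
# difference is zero.
def paired_permutation_pvalue(control, updated, n_perm=10000, seed=0):
    """Two-sided Monte Carlo p-value via random sign flips of paired differences."""
    rng = random.Random(seed)
    diffs = [u - c for c, u in zip(control, updated)]
    observed = abs(sum(diffs) / len(diffs))
    exceed = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Here the paired samples would be per-case CSI or FSS values over the 15 MPEX cases; a large p-value is consistent with the null result reported above.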
Despite small probabilistic and deterministic skill differences between the Control and Updated ensembles, three cases are selected for further investigation: RF6 (23 May 2013, Figs. 6a,d; Romine et al. 2016), RF10 (31 May 2013, Figs. 6b,e; Torn and Romine 2015), and RF12 (8 June 2013, Figs. 6c,f; Burlingame et al. 2017). These cases differ with regard to the meso-α- to synoptic-scale environments in which CI occurs and the upstream mid- to upper-tropospheric features sampled by MPEX dropsonde observations. As previously noted, the largest mean FSS and CSI improvement (degradation) over the set of cases considered occurs with RF12 (RF6). RF6 is a relatively poorly forecast case, consistent with the result in Romine et al. (2016), whereas RF12 forecast skill is near average. In contrast, RF10 is a relatively well-forecast case, but one with FSS and CSI degradation similar to that of RF6.
a. RF6: 23 May 2013
RF6 sampled (Fig. 6a) the environments surrounding a lower-tropospheric cold front in the northern Texas and western Oklahoma Panhandles (Fig. 6d) and a weak short-wave trough in northern New Mexico (Fig. 6a). Through ensemble sensitivity analysis, these features were objectively identified to result in the greatest forecast impact (for accumulated precipitation) over the Texas Panhandle later that day (Romine et al. 2016). Lower-tropospheric flow originating from the Gulf of Mexico allowed for robust lower-tropospheric moisture (e.g., as implied by 850-hPa equivalent potential temperature ≥340 K) to be present across west central Texas (Fig. 6d). This, in conjunction with steep midlevel lapse rates associated with an elevated mixed layer advected downwind of the southern Rockies, contributed to surface-based convective available potential energy (SBCAPE) greater than 3000 J kg−1 (not shown). Convection initiated after 1900 UTC (Fig. 2b) between Lubbock, Texas, and Childress, Texas, along the lower-tropospheric front and in southeastern New Mexico and southwestern Texas along a dryline. Most observed CI events on this day occurred from Childress southwest to the Texas–Mexico border (Fig. 6d).
Observed CI event neighborhood probability is highest in far western Texas, with a secondary maximum to the east from the southern Texas Panhandle southward to the Rio Grande River (Fig. 7a). Both Control and Updated ensembles poorly forecast CI event probabilities, with each predicting the highest neighborhood ensemble probabilities in far northern Mexico, west-central Texas, and along the northeastern Oklahoma–southeastern Kansas border (Fig. 7b). Note, however, that terrain blocking and radar proximity hinder observed CI event identification in northern Mexico, with satellite imagery (not shown) suggesting an observed CI frequency in northern Mexico similar to the Control and Updated ensembles. Forecast skill differences between ensembles (Fig. 3) are primarily manifest in southwestern Texas, where Control ensemble neighborhood probability is higher, and west-central Texas, where maximum Control ensemble neighborhood probabilities are located farther southeast (Fig. 7b). The Control and Updated ensembles are more similar to each other than either ensemble is to observations (cf. Figs. 7a,b).
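Neighborhood ensemble probability, as shown in Fig. 7, can be understood as the fraction of ensemble members producing at least one CI event within a neighborhood of each grid point. A brute-force sketch, assuming square neighborhoods (the study's exact neighborhood definition may differ):

```python
import numpy as np

# Brute-force sketch: fraction of members with >= 1 event within the square
# neighborhood of each grid point (illustrative; loops kept for clarity).
def neighborhood_ensemble_probability(member_events, half_width):
    member_events = np.asarray(member_events, dtype=bool)
    n_mem, ny, nx = member_events.shape
    h = half_width
    hit = np.zeros((n_mem, ny, nx), dtype=bool)
    for m in range(n_mem):
        padded = np.pad(member_events[m], h)  # zero (False) padding at edges
        for r in range(ny):
            for c in range(nx):
                hit[m, r, c] = padded[r:r + 2 * h + 1, c:c + 2 * h + 1].any()
    return hit.mean(axis=0)  # probability in [0, 1] at each grid point
```

Applying the same operation to the observed binary event field (a single "member") yields the observed neighborhood probability against which the ensemble fields are compared.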
Assimilating MPEX dropsonde observations for RF6 increases surface-based CAPE in west Texas and eastern New Mexico in the Updated ensemble mean relative to the Control ensemble mean (Figs. 8a–c and 9). Both are associated with higher surface-based CAPE than the corresponding Rapid Refresh model analysis (Fig. 8d). Consequently, CI occurs farther to the west (e.g., Fig. 8c), on average, in Updated relative to both Control and observations (Figs. 7b and 8d). Subsequent simulated CI events that occur along the southward-spreading outflow boundary from this initial convection (not shown) occur farther to the northwest in Updated relative to both Control and observations (Fig. 7b). Assimilating MPEX dropsonde observations for RF6 also decreases surface-based CAPE, while increasing surface-based CIN, in far southwestern Texas (Figs. 8a–c and 9). This mitigates CI potential relative to Control and, again, observations (Fig. 7a). Whereas surface-based CAPE differences between the Updated and Control ensemble means are relatively consistent with time (Figs. 8a–c), coherent differences in the surface-based CIN field in southwest Texas and southeast New Mexico do not emerge until later, near the time of CI (cf. Figs. 9a,b). Further investigation is needed to elucidate the causes of the disparate response between these two thermodynamic fields.
b. RF10: 31 May 2013
RF10 sampled (Fig. 6b) a strong upper-tropospheric westerly jet streak over the central high plains to the south of a vertically stacked cyclone over South Dakota (Fig. 6b; Torn and Romine 2015). In the lower troposphere, a cold front extended south-southwestward from the vertically stacked cyclone through northwestern Oklahoma to the southern Texas Panhandle, with lower-tropospheric flow originating in the western Gulf of Mexico promoting a warm, moist boundary layer ahead of the cold front (Fig. 6e). By 2100 UTC, SBCAPE values along this front reached upward of 4500 J kg−1 in Oklahoma (not shown). On this day, CI first occurred in eastern Kansas and Missouri along the cold front by 1930 UTC (Figs. 2b and 6e), with subsequent events occurring in west-central Oklahoma after 2100 UTC near the intersection of the cold front and dryline. It is along this corridor where most observed CI events occurred on this day (Fig. 6e). Discrete supercells produced by the initial convection in west-central Oklahoma led to destructive tornadoes near El Reno, Oklahoma, before transitioning into a quasi-stationary MCS that caused deadly flash flooding near Oklahoma City, Oklahoma, during the evening (Torn and Romine 2015; Schumacher 2015).
Observed CI event neighborhood probability is highest along the cold front (Fig. 6e) from southwest Missouri west-southwestward to west-central Oklahoma (Fig. 7c). Along this well-defined triggering mechanism, both Updated and Control ensembles reasonably forecast CI event probability (Fig. 7d). Both the Control and Updated ensembles have high forecast event probabilities in northern Iowa (Fig. 7d), however, where observed event probability was comparatively low. Forecast skill differences between ensembles (Fig. 3) primarily manifest in northern Iowa and far southwestern Oklahoma, with lower ensemble neighborhood probability in Control relative to Updated in both locations (Fig. 7d). Albeit to a lesser extent than with RF6, the Control and Updated ensembles are again more similar to each other than either is to observations (cf. Figs. 7c,d). Over the set of cases considered, the fewest observed CI events occur with RF10 (Figs. 2a,b), contributing in part to the comparatively high CSI spread across the Control and Updated ensembles for this case (Fig. 5).
Assimilating MPEX dropsonde observations for RF10 shifts the simulated cold front slightly westward, shifts a simulated dryline bulge in the southeastern Texas Panhandle southward, and sharpens the simulated horizontal surface-based CAPE gradient across the dryline along the Oklahoma–Texas border in Updated relative to Control (Fig. 10). This results in elevated CI probabilities in southwest Oklahoma in Updated relative to both Control and observations (Figs. 7c,d). In Iowa, assimilating MPEX dropsonde observations for RF10 shifts a simulated short wave embedded within the synoptic-scale upper-tropospheric cyclonic flow slightly eastward relative to Control (Fig. 11a); however, both Updated and Control are too fast, on average, with the eastward translation of this feature relative to Rapid Refresh analyses (Fig. 11d). Convection initiated ahead of this feature in northern Iowa during the local afternoon hours in both Updated and Control ensembles (Figs. 7d and 11a), whereas only congested cumulus occurred in association with the observed feature (not shown). In contrast to RF6, surface-based CAPE (Fig. 10a) and 300-hPa potential vorticity (Fig. 11a) differences between the Updated and Control ensemble means are small at initialization and grow in both spatial extent and magnitude with time (Figs. 10b,c and 11b,c).
c. RF12: 8 June 2013
RF12 sampled (Fig. 6c) the environment surrounding an upper-tropospheric short-wave trough and accompanying northwesterly flow across the northern Intermountain West and high plains (Fig. 6c). Ahead of this trough, a lower-tropospheric wind shift extended south-southwestward from central South Dakota to western Kansas and thence westward along the Colorado–New Mexico border (Fig. 6f). Ahead of this feature, the boundary layer was not as warm or as moist as in RF6 and RF10, with 850-hPa equivalent potential temperature generally between 330 and 340 K in the warm sector in Nebraska and Kansas (Fig. 6f), contributing to SBCAPE of up to 2000 J kg−1 (not shown). Widespread CI occurred along this feature from far southwestern Iowa to the Texas Panhandle (Fig. 6f; Burlingame et al. 2017), with the first observed CI events along this feature occurring in northern Kansas by 2000 UTC. A few instances of CI occurred before 2000 UTC (Fig. 2b) in northwestern Nebraska with the upper-tropospheric trough itself.
Observed CI event probability is highest along the southeastward-advancing wind shift (Fig. 6f) from the Texas Panhandle to central Kansas, with a secondary maximum farther northeast from central Kansas to southeast Nebraska (Fig. 7e). Although the Control and Updated ensembles each forecast CI along this boundary, the highest ensemble neighborhood probabilities are found between central Kansas and southeast Nebraska in each, with secondary maxima farther southwest from central Kansas to the Texas Panhandle (Fig. 7f). Furthermore, both Control and Updated ensembles have high forecast event probabilities in western Nebraska (Fig. 7f) near the surface low (Fig. 6f) where observed event probability was comparatively low. Forecast skill differences between ensembles (Fig. 3) primarily manifest in western Nebraska, where Updated ensemble neighborhood probability is lower, and from southwestern Kansas to western Oklahoma, where Updated ensemble neighborhood probability is higher (Fig. 7f). As with RF6 and RF10, the Control and Updated ensembles are more similar to each other than either is to observations (cf. Figs. 7e,f).
Assimilating MPEX dropsonde observations for RF12 reduces simulated surface-based CAPE along and ahead of the simulated lower-tropospheric wind shift in Nebraska and north-central Kansas (Fig. 12), mitigating simulated CI frequency in Nebraska in Updated relative to Control (Fig. 7f). Further, assimilating MPEX dropsonde observations for RF12 increases simulated surface-based CAPE in southwestern Kansas, the Oklahoma and Texas Panhandles, and eastern New Mexico in Updated relative to Control (Fig. 12). This results in increased CI probabilities in Updated relative to Control (Fig. 7f) from southwestern Kansas to western Oklahoma during the forecast. Surface-based CAPE differences between the Updated and Control ensemble means at initialization (Fig. 12a) quickly grow in magnitude and scale (Figs. 12b,c), with the largest difference magnitudes found along ensemble mean surface-based CAPE gradients (e.g., southeast Colorado, northeast Nebraska) or the lower-tropospheric wind shift (e.g., central Nebraska, west Kansas). Note also that both Updated and Control forecast isolated CI events near and ahead of the upper-tropospheric trough, whereas only shallow convection is found in observations (Fig. 13). This may be related to the chosen microphysics parameterization and its influence upon the computation of simulated reflectivity (e.g., Stratman et al. 2013), although further investigation is warranted.
5. Summary and discussion
This study tested the hypothesis that assimilating mid- to upper-tropospheric, meso-α- to synoptic-scale MPEX dropsonde observations collected in upstream, preconvective environments is insufficient to improve short-range (3–15 h) CI forecast skill due to a limited influence upon the lower-tropospheric phenomena that modulate CI occurrence, timing, and location. Output from two 30-member, convection-permitting WRF-ARW ensembles, one each in which MPEX dropsonde observations were (Updated) and were not (Control) assimilated, was used to verify simulated CI forecasts and test this hypothesis. Verification was conducted over multiple mesoscale spatiotemporal thresholds using FSS to assess probabilistic forecast skill and a flow-dependent metric to match modeled CI events from each ensemble member to observations, from which measures of deterministic forecast skill (POD, FAR, and CSI) were computed. Consistent with prior studies (e.g., Duda and Gallus 2013; Burghardt et al. 2014), the CI diurnal cycle was well predicted for all cases.
Forecast CI event count was large relative to observations in both ensembles for all cases considered, a result that is consistent with prior studies (e.g., Burghardt et al. 2014; Coniglio et al. 2016). There likely exist multiple causes for this high frequency bias. For 3 of the 15 cases considered herein, Burlingame et al. (2017) demonstrated this result to be robust to the choice of PBL parameterization, with the largest frequency bias for local-closure parameterizations that forecast moister yet slightly cooler daytime boundary layers. In an evaluation of simulations conducted during the 2009 and 2010 Hazardous Weather Testbed Spring Forecasting Experiments, Stratman et al. (2013) demonstrated a high frequency bias for the occurrence of >40-dBZ simulated reflectivity values in 0–12-h WRF-ARW simulations. A similar high frequency bias is noted for RF6, RF10, and RF12 in this study (not shown). Notably, the simulations of Stratman et al. (2013), Burghardt et al. (2014), Coniglio et al. (2016), and this study each use the Thompson et al. (2008) microphysics parameterization. This suggests that at least part of the high forecast CI event frequency bias is related to the microphysics parameterization and its influence on simulated reflectivity. However, it is likely that other modeling system components such as horizontal grid spacing (e.g., Bryan and Morrison 2012) also contribute to this result. Further investigation is warranted into the relative contributions of different sources of model error, including its manifestation in the cycled analysis system as well as subsequent free forecasts, toward the forecast CI event bias (and other forecast characteristics) documented herein.
Deterministic forecast skill for the cases considered herein was of similar magnitude to that described in previous works that used similar methodologies to study CI predictability (e.g., Duda and Gallus 2013; Kain et al. 2013; Burghardt et al. 2014). Although subjectively high, probabilistic CI forecast skill was lower than for accumulated precipitation for the 15 cases considered herein and by Romine et al. (2016). Probabilistic forecast skill was largest near the peak of the CI diurnal cycle in the local evening hours, with lesser skill noted in the afternoon and overnight. Though ensemble forecasts could reasonably predict the mesoscale regions in which CI would occur at approximately the correct time, they were unable to predict precisely when and where a given CI event would occur within those corridors. This is generally consistent with the findings of previous CI predictability investigations with convection-allowing models (Duda and Gallus 2013; Kain et al. 2013; Burghardt et al. 2014). Further, while forecast skill was higher at more lenient verification thresholds, such forecasts are expected to have limited forecaster utility.
On average and for each case considered, assimilating MPEX dropsonde observations had negligible impact on all CI forecast skill measures considered. This impact is similar to, though somewhat smaller than, the small positive impact (≤0.02 FSS) found by Romine et al. (2016) for accumulated precipitation over the same cases using identical simulation methods. This likely results, in part, from the different forecast feature of interest between the two studies, with accumulated precipitation being the field that the targeted observations were specifically intended to improve. Further investigation is necessary to determine the extent to which forecast sensitivity for CI is different from that for accumulated precipitation. However, an analysis of the CI event locations relative to the areas covered by the accumulated precipitation thresholds of Romine et al. (2016) indicates substantial differences, with lower-threshold regions including light precipitation associated with larger-scale phenomena and higher-threshold regions including only heavy precipitation associated with organized convective systems (not shown). Although this too requires further investigation, we hypothesize that the greater implied synoptic-scale control on the accumulated precipitation field than on the CI event field (as described in the introduction) is responsible for the small increase in forecast skill in Romine et al. (2016) but no change in skill in the present study.
The forecast impact from assimilating MPEX dropsonde observations is also similar to, though somewhat smaller than, that identified by prior observation targeting studies for tropical cyclones and synoptic-scale midlatitude cyclones (e.g., Bergot 1999; Montani et al. 1999; Fourrié et al. 2006; Wu et al. 2007; Rabier et al. 2008; Aberson 2010; Chou et al. 2011; Majumdar 2016). Targeted observation assimilation did not always improve short-range CI predictions for the 12–13 June 2002 IHOP event studied by Liu and Xue (2008), which is also consistent with the results presented here despite significant differences in modeling systems, data assimilation and observation targeting methods, collected observation types, and case characteristics between the two studies. However, it should be noted that the specific findings presented here are formally only valid for the model configuration, selected cases, observation targeting strategies, and verification metrics used herein.
Three cases were investigated in more detail: RF6, a poorly forecast case with reduced skill in Updated relative to Control; RF10, a well-forecast case with reduced skill in Updated relative to Control; and RF12, an average skill case with increased skill in Updated relative to Control. For each of the three cases considered, assimilating MPEX dropsonde observations resulted in small shifts in the positions of lower-tropospheric atmospheric boundaries along which CI preferentially occurred, with case-to-case variability noted in the magnitude and spatial extent of initial differences as well as their subsequent growth between the Updated and Control ensembles. Assimilating MPEX dropsonde observations also modified ambient thermodynamic conditions along and ahead of these lower-tropospheric boundaries, manifest primarily on the meso-β scale for RF6 and RF10 and on the meso-α to synoptic scale for RF12. RF6 was a case where MPEX dropsonde observations poorly sampled the preconvective upstream disturbance and its environment (Romine et al. 2016), whereas RF12 was a case where MPEX dropsonde observations better sampled the preconvective upstream disturbance and its environment. The extent to which the spatial scale of analysis and forecast differences resulting from the assimilation of MPEX dropsonde observations is directly correlated with how well a given feature was sampled merits further investigation with a larger sample of cases. For all cases, however, the Updated and Control ensembles were more alike than either was to observations.
As finite large-scale initial condition errors are a significant contributor to short-range convective-scale error growth (e.g., Durran and Weyn 2016), it is probable that the synoptic-scale phenomena to which CI forecasts are most sensitive, particularly in the lower troposphere, were not adequately sampled to result in significant skill improvement in the Updated ensemble. For the cases considered, Romine et al. (2016) suggest that a suboptimal data assimilation system and assumptions underlying the ensemble sensitivity analysis method used to determine targeted observation locations (e.g., linearity, a normally distributed forecast metric) may also contribute to a small targeted observation impact on accumulated precipitation forecast skill. In applying ensemble sensitivity analysis with a CI-specific forecast metric to two dryline CI episodes, Hill et al. (2016) demonstrate that coherent meso- to synoptic-scale sensitivities tied to features known to influence CI exist at lead times of a few hours (for surface fields) to 12–24 h (at higher altitudes). However, they leave application of their results to observation targeting and predictability for future study. As a result, it is likely that a denser observation network than was used during MPEX, including a robust lower-tropospheric observation capability and with observation locations selected using a CI-specific metric, is necessary to result in significant forecast skill improvement from the assimilation of targeted observations. As spatial variability that can distinguish between CI and null events only exists at short lead times and on small spatial scales (e.g., Madaus and Hakim 2016), however, it is unclear as to precisely how much forecast skill at the lead times considered in this study can be improved through an improved observation strategy, independent of data assimilation and model error considerations. 
Future study is planned to quantify the intrinsic predictability of CI and the case-to-case variability in the practical predictability of CI, and to identify the resources (including observation strategies) necessary for practical predictability to approach its intrinsic limits.
We acknowledge Ryan Torn for assistance in preparing the ensemble initial conditions, Adam Clark and Scott Dembek for providing code to compute simulated reflectivity at the −10°C isotherm in-line with model execution, and Bryan Burlingame for assistance with conducting the ensemble numerical simulations and identifying observed and modeled CI events. This work benefited from reviews by three anonymous reviewers and Monthly Weather Review Editor Altug Aksoy. This material is based on work supported by the National Science Foundation under Grant AGS-1347545. The National Center for Atmospheric Research is sponsored by the National Science Foundation. The first author acknowledges partial support from a University of Wisconsin–Milwaukee Advanced Opportunity Fellowship. Computing resources were provided by the NCAR Yellowstone supercomputer (CISL 2012).
Current affiliation: National Weather Service, Chanhassen, Minnesota.