Several data assimilation and forecast experiments are undertaken to determine the impact of special observations taken during the second Verification of the Origins of Rotation in Tornadoes Experiment (VORTEX2) on forecasts of the 5 June 2009 Goshen County, Wyoming, supercell. The data used in these experiments are those from the Mobile Weather Radar, 2005 X-band, Phased Array (MWR-05XP); two mobile mesonets (MM); and several mobile sounding units. Data sources are divided into “routine,” including those from operational Weather Surveillance Radar-1988 Dopplers (WSR-88Ds) and the Automated Surface Observing System (ASOS) network, and “special” observations from the VORTEX2 project.
VORTEX2 data sources are denied individually from a total of six ensemble square root filter (EnSRF) data assimilation and forecasting experiments. The EnSRF data assimilation uses 40 ensemble members on a 1-km grid nested inside a 3-km grid. Each experiment assimilates data every 5 min for 1 h, followed by a 1-h forecast. All experiments are able to reproduce the basic evolution of the supercell, though the impact of the VORTEX2 observations was mixed. The VORTEX2 sounding data decreased the mesocyclone intensity in the latter stages of the forecast, consistent with observations. The MWR-05XP data increased the forecast vorticity above approximately 1 km AGL in all experiments and had little impact on forecast vorticity below 1 km AGL. The MM data had negative impacts on the intensity of the low-level mesocyclone, by decreasing the vertical vorticity and indirectly by decreasing the buoyancy of the inflow.
The ultimate goal of the NOAA Warn-on-Forecast project (Stensrud et al. 2009) is to issue reliable probabilistic tornado warnings based upon explicit numerical ensemble predictions of tornadoes, rather than based on radar detection. Achieving this goal poses many challenges, ranging from understanding the behavior of tornadoes and their parent supercell thunderstorms to understanding the behavior of the data assimilation and numerical prediction systems used. Given these difficulties and knowing that the forecast will never be perfect, probabilistic information can be useful in the decision-making process. Recent results using ensemble Kalman filter (EnKF) methods (Evensen 2003) to initialize convective-scale ensembles for the prediction of low-level vertical vorticity swaths yield favorable comparisons to observed tornado damage tracks (Dawson et al. 2012; Yussouf et al. 2013), providing hope that the goal of the Warn-on-Forecast project is attainable. Snook et al. (2012, 2015) also demonstrated probabilistic forecasting skill for low-level mesovortices associated with observed tornadoes within a mesoscale convective system when EnKF was used for data assimilation.
In addition to a rigorously developed data assimilation algorithm, however, we require data that resolve the features of interest. Radar data provide the best opportunity for resolving supercells. However, one problem for detecting tornadoes is that because of the curvature of Earth, a single radar can only observe the lowest levels of the atmosphere over a very short range. Low-level radar observations have already proven useful in EnKF analyses of a supercell (Tanamachi et al. 2013) and in EnKF analyses and forecasts of a tornadic mesoscale convective system (Snook et al. 2011), suggesting that it is important to have low-level radar observations.
One source of low-level radar observations is the second Verification of the Origins of Rotation in Tornadoes Experiment (VORTEX2; Wurman et al. 2012) conducted during the springs of 2009 and 2010. The purpose of the experiment was to build integrated datasets of supercell thunderstorms using many observation platforms, including multiple mobile Doppler radars. Approximately 50 thunderstorms were sampled during the VORTEX2 campaign, many of which were nontornadic or weakly tornadic, and observations from several mobile radars often are available.
One tornadic storm in particular from the 2009 campaign has already received much attention in the literature: the Goshen County, Wyoming, storm of 5 June 2009. Most of the work thus far has focused on physical understanding of supercell and tornado dynamics (e.g., Markowski et al. 2012a,b; Wakimoto et al. 2011; Atkins et al. 2012; Marquis et al. 2014; Kosiba et al. 2013). Work by Marquis et al. (2014) is unique in that it used EnKF analyses of the storm in order to close gaps in data coverage of the storm, though the study focused on physical understanding. Some preliminary forecasting work has been done using this supercell, notably by Dowell et al. (2010). Their experiments used a continental United States (CONUS)-scale outer ensemble at 15-km grid spacing and a nest along the Front Range of the Rocky Mountains at 3-km grid spacing. They did not assimilate observations from VORTEX2, but they did assimilate observations from the Meteorological Assimilation Data Ingest System (MADIS) and six WSR-88Ds. Their experiments found that assimilation of the WSR-88Ds improved the correspondence between observed storm reports and areas of high probability of updraft helicity generated by their ensembles.
After a forecast, either deterministic or ensemble, has been obtained, one must decide the best way to evaluate the performance of that forecast. Several methods have been used in the literature. For deterministic forecasts, various observation-space statistics can be generated, as in Aksoy et al. (2010), including mean difference (bias), root-mean-square difference, and equitable threat score (ETS). Direct comparisons of model-predicted fields with radar observations in the radar coordinates have also been performed (e.g., Xue et al. 2014). On finescale grids, positioning errors become more prominent than on coarse-scale grids, and double penalization, in which the model is penalized both for having a feature where it did not occur and for not having the feature where it did occur, becomes an issue. One way to alleviate this problem is to use “neighborhood” scores, such as the fractions skill score (FSS; Roberts and Lean 2008), which evaluate the forecast not only at a given grid point but also at all grid points within a certain radius. For ensemble forecasts, one could use probabilistic skill scores, such as the relative operating characteristic (ROC) score, and ensemble probability. For example, Snook et al. (2012, 2015) used a neighborhood ensemble probability for reflectivity and low-level vertical vorticity exceeding certain thresholds to alleviate the double penalty, and ROC was used in Snook et al. (2015) for evaluating the probabilistic forecasting skill for precipitation. Object-based methods can also be used for tracking discrete convective-scale features in numerical predictions (e.g., Johnson et al. 2013; Clark et al. 2014). One subjective method, used by Dawson et al. (2012), is to visually compare the probability of time-maximum vorticity at a given level exceeding a threshold against an actual tornado track.
In this study, data-denial experiments are conducted to determine the impact of special VORTEX2 observations on the analysis and prediction of the Goshen County supercell storm. As the study is part of the larger Warn-on-Forecast goal, there is a focus on the utility of these observations in an operational NWP setting. This study is similar in design to Dowell et al. (2010), but it deals specifically with special observations taken during VORTEX2 and evaluating their importance to create a realistic forecast of supercell behavior. Similar experiments have also been conducted by Marquis et al. (2014). Tanamachi et al. (2013) are the closest in spirit to the current study, as they investigated the impact of University of Massachusetts, mobile, X-band, polarimetric Doppler radar (UMass X-Pol) data on analyses of the Greensburg, Kansas, supercell of 4 May 2007. However, they did not examine storm-scale (1 h) forecasts resulting from these analyses. Snook et al. (2012) examined the impact of data from limited-range X-band radars on 3-h forecasts of a mesoscale convective system. This study is unique in examining the relative impact of special VORTEX2 observations on convective-scale analyses and forecasts of a tornadic supercell. Studies looking at convective-scale forecasts but not investigating the impact of different data sources based on ensemble data assimilation methods include Dawson et al. (2012), Yussouf et al. (2013), and Jung et al. (2012).
The rest of this paper is organized as follows. Section 2 gives information about the observations used and the design of the experiments. The general evolution of the simulated supercell and comparisons to observed data are discussed in section 3. Section 4 focuses on the impact of the observations on the analyses and forecasts. Conclusions are given in section 5.
2. Experiment design
a. Prediction model configurations
The Advanced Regional Prediction System (ARPS; Xue et al. 2000, 2003), version 5.3, is used for the forward prediction model, and the ensemble square root filter (EnSRF; Whitaker and Hamill 2002) is used as the data assimilation method. The data assimilation and forecast experiments are performed on two domains: an inner domain with 1-km grid spacing one way nested inside an outer domain with 3-km grid spacing (Fig. 1), each with 40 ensemble members. Each member on the inner domain receives boundary conditions from the corresponding member on the outer domain. The location for the inner domain is centered on one of the VORTEX2 soundings launched in the inflow region to the supercell, and it is large enough to contain the storm throughout its life cycle (Fig. 1). Full physics and terrain are used for all experiments, including Lin et al. (1983) microphysics, with the rain intercept parameter reduced by a factor of 20 from the default value, as suggested by Snook and Xue (2008) and Dawson et al. (2010). The Lin scheme was used instead of a theoretically more accurate, but more computationally expensive, double- or triple-moment scheme in order to mimic a potential operational configuration. A planetary boundary layer scheme based on Sun and Chang (1986) is used for all experiments. More details on the physics options can be found in Xue et al. (2001). Selected configuration settings for the experiments for both domains are given in Table 1. These parameter settings were chosen after numerous experiments. For example, smaller observation error standard deviations (OESD) often led to filter divergence because of the large volume of data. Using larger OESDs helped prevent filter divergence and maintain ensemble spreads at desirable levels.
b. Ensemble filter configurations
The EnKF data assimilation system based on the EnSRF algorithm is the one developed for the ARPS modeling system, as documented in a number of papers, including Xue et al. (2007), Tong and Xue (2008), and Y. Wang et al. (2013). The initial ensemble on the outer domain was generated by adding initial perturbations to the North American Mesoscale Forecast System (NAM) analysis at 1800 UTC 5 June 2009. Similar to Jung et al. (2012), the added initial perturbations were spatially correlated and generated by applying a recursive filter (Lorenc 1992) to random Gaussian perturbations. This procedure is a computationally more efficient alternative to the perturbation smoothing procedure introduced into the ARPS EnKF data assimilation system by Tong and Xue (2008) and used in many later studies (e.g., Xue et al. 2009, 2010). In addition, perturbations with a smaller spatial correlation scale were added to the 1-km inner domain ensemble at the start of this nested grid ensemble to introduce additional convective-scale perturbations. On the outer and inner domains, the horizontal decorrelation length scales of the perturbations were 12 and 6 km, respectively, and the vertical decorrelation length scale was 3 km for both domains. The perturbation magnitudes in terms of standard deviations were 2 m s−1 for the zonal and meridional components of wind (u and υ, respectively), 1 K for potential temperature θ, 0.6 g kg−1 for water vapor mixing ratio qυ on the outer domain, and 0.4 g kg−1 for qυ on the inner domain. These magnitudes were chosen by trial and error to give as much spread as possible in the initial conditions without initiating too much spurious convection.
Deep convection initiated along the Laramie Mountains in southeastern Wyoming after 1930 UTC 5 June 2009, with the convection showing supercellular characteristics after 2100 UTC (Markowski et al. 2012a). Thus, on the outer domain (cf. Fig. 1), routine radar and surface data are assimilated every 30 min between 1800 and 2100 UTC, while sounding and profiler data are assimilated every hour during the same period. Between 2100 and 2200 UTC, radar and surface data are assimilated every 5 min, with profiler and sounding data still assimilated every hour. Clear-air reflectivity observations from all three WSR-88D radars are assimilated in order to suppress spurious convection (Tong and Xue 2005). Data from all WSR-88Ds are assimilated on the outer domain, while MWR-05XP data are not. On the inner domain, radar and surface data are assimilated every 5 min between 2100 and 2200 UTC, with sounding and profiler data assimilated every hour. Data from the KRIW radar (Riverton, Wyoming) are not assimilated on the inner domain because of the distance of the radar from the domain. See Fig. 1c for a timeline of the experiment and radar data assimilation configurations.
Spread during the assimilation period is maintained using a combination of static multiplicative (Anderson 2001) and adaptive relaxation to prior spread (RTPS; Whitaker and Hamill 2012) covariance inflation. For the first two cycles, multiplicative inflation with inflation factor αmult = 1.03 is used over the entire domain. From there (1930 UTC) on, a combination of multiplicative inflation with αmult = 1.20 in regions where the reflectivity of the ensemble mean is greater than 15 dBZ and RTPS everywhere in the domain with αadapt = 0.9 is used. Other filter parameter configurations, including localization radii and assumed OESDs for the various observation types, are found in Table 1.
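As a rough illustration of the two inflation steps described above, the following Python sketch applies multiplicative inflation and RTPS to a small one-variable ensemble. The function names and the sample values are hypothetical, and the RTPS expression assumes the usual form in which the posterior spread is relaxed toward the prior spread (Whitaker and Hamill 2012); it is not the ARPS EnSRF code.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def spread(xs):
    """Sample standard deviation of the ensemble (N - 1 denominator)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def multiplicative_inflation(members, alpha_mult):
    """Scale each member's deviation from the ensemble mean by alpha_mult."""
    m = mean(members)
    return [m + alpha_mult * (x - m) for x in members]

def rtps_inflation(prior, posterior, alpha_adapt):
    """Relax the posterior spread toward the prior spread.

    Resulting spread = (1 - alpha_adapt) * sigma_a + alpha_adapt * sigma_b,
    where sigma_b and sigma_a are the prior and posterior spreads.
    """
    sigma_b = spread(prior)
    sigma_a = spread(posterior)
    factor = alpha_adapt * (sigma_b - sigma_a) / sigma_a + 1.0
    m = mean(posterior)
    return [m + factor * (x - m) for x in posterior]

# Hypothetical potential temperature values (K) at one grid point
prior = [298.0, 300.0, 302.0, 304.0]
posterior = [300.5, 301.0, 301.5, 302.0]  # spread reduced by the analysis

inflated_rtps = rtps_inflation(prior, posterior, alpha_adapt=0.9)
inflated_mult = multiplicative_inflation(posterior, alpha_mult=1.20)
```

With αadapt = 0.9, the relaxed spread sits much closer to the prior spread than to the posterior spread, which is why RTPS acts most strongly where the observations reduced the spread the most.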
The data used in the experiments are divided into two groups: routine observations and special observations (i.e., those taken as part of the VORTEX2 experiment). Routine observations include surface temperature, moisture, and horizontal wind from the Automated Surface Observing System (ASOS) network, horizontal wind from the NOAA profiler network, and level II reflectivity and radial velocity from three WSR-88Ds: Cheyenne, Wyoming (KCYS); Denver, Colorado (KFTG); and KRIW. As mentioned earlier, the radar data assimilation includes clear-air reflectivity observations to help suppress convection where precipitation is not observed. As the experiments take place between synoptic times, no routine rawinsonde data are available. Manual quality control was performed on data from KCYS and KFTG, including velocity unfolding and ground clutter removal. Data from KRIW were not manually quality controlled, as the radar was out of range of the supercell; however, those data were subjected to automated quality control routines, such as velocity unfolding and automated ground clutter removal [see Brewster et al. (2005) for details].
Special observations include surface temperature, moisture, and horizontal wind from the NSSL mobile mesonet (NSSL MM; Straka et al. 1996) and the Texas Tech University StickNet (Weiss and Schroeder 2008), and upper-air temperature, moisture, and horizontal wind from mobile sounding units. The locations of the assimilated conventional data can be found in Fig. 2. Because of limitations in the implementation of the data assimilation algorithm, we retain the synoptic-scale assumption that each sounding profile is representative of the column above its release point at the beginning of the hour following its release. These are represented by the dashed lines in Fig. 2.
The NSSL MM probes were mounted on cars, labeled P1–P8, excluding “P7,” and thus could be mobile while taking data. The StickNet probes were stationary once deployed. Both datasets included several quality control flags for suspicious readings of the observed quantities. The high temporal resolution of these datasets, with observations at intervals of 1 s or less, allowed for a strict quality control procedure while keeping temporal errors in the observations used for assimilation low; thus, observations were discarded if any of the quality control flags were set. The NSSL MM cars also have two temperature sensors, a “slow”-response sensor and a “fast”-response sensor, and a flag is included in the data file that specifies whether the sensors are well aspirated. The slow-response sensor is used wherever the sensors were well aspirated; the fast-response sensor is used otherwise. The time constant for the slow-response sensor is 1 min, and errors due to finite instrument response time are considered negligible.
In addition, superobservations (superobs) were created from the MM data with the 11 closest observations to the target time. To create the superobs, the minimum and maximum observations for each quantity over the 11-s interval were removed and the remaining observations were averaged together. If the wind direction observations fell over three or more quadrants, then the winds were set to missing. None of the raw observations that were averaged to create a given superob was more than 180 s apart in time or 1 km apart in space.
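The superob procedure above amounts to a trimmed mean plus a wind direction check. A minimal Python sketch follows, assuming 1-s data and hypothetical function names; it drops one minimum and one maximum value, and it omits the 180-s and 1-km separation checks.

```python
def make_superob(times, values, target_time, n_closest=11):
    """Trimmed mean of the n_closest observations to target_time.

    The single smallest and single largest values are removed before
    averaging. Returns None if too few observations remain (an assumption
    of this sketch, not a rule stated in the text).
    """
    # Rank observation indices by their time distance from the target time
    order = sorted(range(len(times)), key=lambda i: abs(times[i] - target_time))
    selected = sorted(values[i] for i in order[:n_closest])
    if len(selected) < 3:
        return None
    trimmed = selected[1:-1]  # drop the minimum and the maximum
    return sum(trimmed) / len(trimmed)

def wind_direction_is_usable(directions_deg):
    """False if the directions span three or more 90-degree quadrants."""
    quadrants = {int((d % 360.0) // 90.0) for d in directions_deg}
    return len(quadrants) < 3

# Hypothetical 1-s temperature record (deg C) and a target time of 10 s
times = list(range(21))
temps = [20.0 + 0.1 * t for t in times]
superob_temp = make_superob(times, temps, target_time=10)
```

Removing the extremes before averaging limits the influence of a single spurious reading on the superob while keeping the averaging window short.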
An additional source of special observations is the Mobile Weather Radar, 2005 X-band, Phased Array (MWR-05XP; Bluestein et al. 2010; French et al. 2013). For the 5 June 2009 deployment, the volumetric update interval is 6–9 s, and volumes are scanned up to 20° in elevation. A nice property of the phased-array radar for data assimilation is that elevation sweeps are taken nearly simultaneously in the context of the scan. This reduces or eliminates position differences between elevation angles associated with the scanning strategies of dish radars. Additionally, MWR-05XP data begin at 2143 UTC, 9 min before tornadogenesis. Furthermore, while the MWR-05XP scanned up to 20° in elevation, Doppler on Wheels (DOW) radars scanned up to 16° in elevation (Kosiba et al. 2013), the NOAA/NSSL mobile, X-band, dual-polarization radar (NOXP) scanned up to 11° (Schwartz and Burgess 2010), and the UMass X-Pol scanned up to 14.4° (J. Snyder 2013, personal communication). Thus, using MWR-05XP allows for better sampling of the midlevels of the storm than the other mobile radars scanning the storm.
The angular resolution of the MWR-05XP is fairly coarse: the half-power beamwidth is 1.8° and 2.0° in azimuth and elevation, respectively, and the sampling intervals are 1.5° in both directions. This is permissible for this study because the radar is located no more than 25 km, and sometimes as close as 10 km, away from the updraft region of the storm, during the assimilation period (French et al. 2013). The angular sampling interval results in a maximum azimuthal gate spacing of ~650 m at a range of 25 km, while the radial gate spacing is 150 m. Since these are both less than the grid spacing used for the inner domain, even in the worst case, the data density is greater than the grid density. Additionally, the focus for this study is on the low-level mesocyclone, which is well resolved by the MWR-05XP data and mostly unobserved by the nearest WSR-88D. In addition to the relatively coarse angular resolution, another drawback is that the MWR-05XP truck is not level, which can introduce position errors in the data. Fortunately, for the 5 June 2009 deployment, the road on which the radar truck was sited was relatively level [see Fig. 1a of French et al. (2014)]. Thus, for this dataset, the position errors are likely less than 500 m and considered to be negligible for purposes of this study.
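The azimuthal spacing quoted above follows from the arc length between adjacent beams, range multiplied by the sampling interval in radians. A short Python check, with a hypothetical function name, is:

```python
import math

def azimuthal_spacing_m(range_m, sampling_interval_deg):
    """Arc length (m) between adjacent beams at the given range."""
    return range_m * math.radians(sampling_interval_deg)

INNER_GRID_SPACING_M = 1000.0  # 1-km inner domain

spacing_far = azimuthal_spacing_m(25_000.0, 1.5)   # farthest range, ~650 m
spacing_near = azimuthal_spacing_m(10_000.0, 1.5)  # closest range
```

Both values are smaller than the 1-km grid spacing of the inner domain, consistent with the statement that the data density exceeds the grid density even at the farthest range.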
The MWR-05XP Doppler velocity data were dealiased and data contaminated by ground clutter were removed (French et al. 2013). Because a radar beam at X-band wavelength attenuates strongly in heavy precipitation, reflectivity data are not used directly in the assimilation, as this would require attenuation correction in the forward operator and increase the complexity of the experiments. Because of the large resolution volumes in the MWR-05XP data, the velocity data are prone to sidelobe contamination in regions of strong reflectivity gradients, such as along the forward flank. To mitigate this contamination, radial velocity data are used only in areas where reflectivity from MWR-05XP is greater than 30 dBZ.
All radar data are interpolated linearly to model gridpoint locations (see the next section) in the horizontal direction. In the vertical direction, the data are left on the original conical sweep surfaces, and all elevation angles from all radars are used. WSR-88D volumes are available approximately every 4–5 min, and data are assimilated by volume. For each assimilation time, the closest volume prior to the assimilation time is used, meaning the nominal time for the assimilation always occurs during the scan. The relatively slow forward speed of the storm (~10 m s−1) allows for volume-by-volume use of data in this manner without incurring large errors.
The total number of observations is heavily skewed toward the radar data. At 2200 UTC on the inner domain, the number of observations of radial velocity from KCYS is 63 618, those from KFTG are 9495, and those from MWR-05XP are 1059. Also at this time, there are 638 138 reflectivity observations from KCYS and 284 715 reflectivity observations from KFTG (including clear-air reflectivity observations). For each radar and each observed quantity, the number of radar observations is the same order of magnitude across time steps. By comparison, there are just four soundings with 224 sounding observations of each quantity at 2200 UTC, three ASOS observations, and nine MM observations.
d. Data-denial experiments
The control (CTRL) run assimilates all routine observations, including those from the WSR-88Ds, as well as MWR-05XP, MM, and sounding data. To investigate the impact of each VORTEX2 data source individually, MWR-05XP data, MM data, and sounding data were removed from the assimilation in turn. These experiments will be referred to as NO_MWR, NO_MM, and NO_SND, respectively. Additionally, MWR-05XP and MM data were removed together in one experiment (NO_MM_MWR), and all three data sources were removed in another experiment (NO_V2). VORTEX2 data were only denied on the inner domain in order to have a consistent set of boundary conditions and thereby reduce the number of potential sources of variability.
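For reference, the six experiments can be summarized as a small lookup of which special data sources are denied on the inner domain. The Python structure below is only an illustrative summary; the source labels ("MWR-05XP", "MM", "SND") and the helper name are hypothetical and do not reflect the actual experiment configuration files.

```python
SPECIAL_SOURCES = ("MWR-05XP", "MM", "SND")

# Special (VORTEX2) data sources denied in each experiment (inner domain only)
DENIED = {
    "CTRL": set(),
    "NO_MWR": {"MWR-05XP"},
    "NO_MM": {"MM"},
    "NO_SND": {"SND"},
    "NO_MM_MWR": {"MM", "MWR-05XP"},
    "NO_V2": {"MWR-05XP", "MM", "SND"},
}

def assimilated_special_sources(experiment):
    """Special data sources still assimilated in the given experiment."""
    return [s for s in SPECIAL_SOURCES if s not in DENIED[experiment]]
```

Routine observations are assimilated in every experiment, so only the special sources differ between entries.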
3. Analyses and evolution of the supercell
First, the model reflectivity is examined to determine the basic evolution of the supercell in the model. Reflectivity is computed using a beam-weighting technique on radar scan surfaces to facilitate direct comparison to radar observations. The technique is the same as that used in Xue et al. (2006), and the result is referred to as “simulated reflectivity.” Instead of examining the ensemble mean, which has a tendency to “smear” the storm as placement errors increase, we examine the ensemble member closest to the mean, which is identified in the same manner as Yussouf et al. (2013). The storm motion of the member closest to the ensemble mean mirrors that of the observed storm (Fig. 3), though the observed storm moved more slowly. The major difference is that the reflectivity in all five experiments shown weakens significantly by 2220 UTC in comparison to that observed. This behavior has been observed in a few previous studies of high-resolution supercell simulations (Yussouf et al. 2013; Lei et al. 2009). Lei et al. (2009) suspected that smaller-scale structures, such as the low-level rotation, could be underpredicted at 1-km grid spacing. Additionally, in the forecasts, by 2240 UTC, a secondary storm has developed to the southwest and has become the dominant storm by 2300 UTC. The development of the secondary storm occurred in the observed atmosphere between 2300 and 0000 UTC (not shown), and the secondary storm produced a second weak tornado at 2349 UTC. The storm in the NO_V2 experiment had the strongest updraft at the end of the forecast period. The storm in the NO_MM_MWR experiment evolved in largely the same fashion as that in NO_V2 and is therefore not shown.
In terms of the observation-space statistics [Fig. 4; summarized below; see Dowell and Wicker (2009) for a full description of the parameters], the experiments are largely the same. The consistency ratios are similar across all experiments, as RTPS covariance inflation and multiplicative inflation were applied to help maintain the ensemble spread. Even with the covariance inflation, the observation-averaged ensemble spread is less than the root-mean-square innovation throughout the assimilation period and forecast, a typical sign of underdispersion. The under- or overdispersion of an ensemble can be quantified by the consistency ratio (Fig. 4b); values near or above 1, as seen in these experiments, would typically mean the ensemble is properly dispersed. Here, however, the high values are mostly due to the large OESD, 7 dBZ, which appears in the numerator of the consistency ratio equation [Eq. (3.4) of Dowell and Wicker 2009] and is likely inflating the statistic. Thus, we consider the ensemble to be underdispersed. However, as noted earlier, this underdispersion did not necessarily degrade the filter performance, consistent with previous studies (Tanamachi et al. 2013; Dawson et al. 2012; Snook et al. 2011).
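To illustrate how a large OESD can inflate the consistency ratio, the Python sketch below assumes the commonly used form of the statistic, in which the sum of the observation error variance and the mean ensemble variance in observation space is divided by the mean squared innovation. The function name and the sample reflectivity values are hypothetical; Dowell and Wicker (2009) give the exact definition.

```python
def consistency_ratio(obs, ens_in_obs_space, obs_error_sd):
    """Assumed form: (sigma_obs^2 + mean ensemble variance) / mean squared innovation.

    obs[k] is the k-th observation; ens_in_obs_space[k] holds the ensemble
    members mapped to that observation.
    """
    n_obs = len(obs)
    mean_var = 0.0
    mean_sq_innov = 0.0
    for y, members in zip(obs, ens_in_obs_space):
        n = len(members)
        m = sum(members) / n
        mean_var += sum((h - m) ** 2 for h in members) / (n - 1)
        mean_sq_innov += (y - m) ** 2  # innovation relative to the ensemble mean
    mean_var /= n_obs
    mean_sq_innov /= n_obs
    return (obs_error_sd ** 2 + mean_var) / mean_sq_innov

# Hypothetical reflectivity values (dBZ): two observations, three members each
obs = [40.0, 30.0]
ens = [[34.0, 36.0, 38.0], [28.0, 30.0, 32.0]]

ratio_large_oesd = consistency_ratio(obs, ens, obs_error_sd=7.0)
ratio_small_oesd = consistency_ratio(obs, ens, obs_error_sd=2.0)
```

In this toy case the ensemble spread (2 dBZ) is smaller than the root-mean-square innovation (about 2.8 dBZ), yet the ratio computed with a 7-dBZ OESD is well above 1, which is the situation described in the text.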
Neither the MM nor the MWR-05XP data involve direct assimilation of reflectivity observations, so the storm reflectivity in the model is established through cross covariances among model state variables and through interactions between the variables during the prediction. The behavior of the observation-space statistics for radial velocity (υr) is similar.
To quantitatively evaluate the performance of the ensemble, the areas under ROC curves (AUCs; Mason 1982; Wilks 2006) are computed and presented in Fig. 5. The score is derived from ROC curves, and AUC has the ability to discriminate between events and nonevents. The range of values for AUC is 0.0–1.0, with 1.0 representing a perfect probabilistic forecast and 0.5 representing probabilistic forecasts no better than random. To compute AUC, simulated reflectivity is computed for each ensemble member for all 14 tilts from KCYS. The verifying observations were taken from KCYS only, which covered the majority of the inner domain, save the far northeastern corner (see Fig. 1).
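The AUC calculation can be sketched in a few lines of Python. The example below assumes that the forecast probability at each point is the fraction of members with simulated reflectivity at or above the threshold, and it computes the area with the rank-based (Mann–Whitney) form, which is equivalent to the trapezoidal area under the ROC curve. Function names and sample values are hypothetical, and the sketch is not the verification code used in the study.

```python
def ensemble_probability(member_values, threshold):
    """Fraction of ensemble members at or above the threshold."""
    return sum(v >= threshold for v in member_values) / len(member_values)

def roc_auc(probabilities, observed_events):
    """Area under the ROC curve from forecast probabilities and binary outcomes.

    Counts, over all (event, nonevent) pairs, how often the event point
    received the higher probability; ties count one half.
    """
    event_p = [p for p, o in zip(probabilities, observed_events) if o]
    nonevent_p = [p for p, o in zip(probabilities, observed_events) if not o]
    if not event_p or not nonevent_p:
        raise ValueError("AUC needs at least one event and one nonevent")
    score = 0.0
    for pe in event_p:
        for pn in nonevent_p:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(event_p) * len(nonevent_p))

# Hypothetical simulated reflectivity (dBZ): four points, four members each
members = [[50, 48, 46, 30], [20, 26, 10, 5], [30, 40, 27, 26], [10, 12, 24, 8]]
observed_dbz = [47, 10, 33, 12]
threshold = 25

probs = [ensemble_probability(m, threshold) for m in members]
events = [z >= threshold for z in observed_dbz]
auc = roc_auc(probs, events)
```

A forecast that assigns the same probability everywhere scores 0.5 under this definition, matching the no-skill value quoted above.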
The AUCs for all experiments start near 0.95 for reflectivity ≥25 dBZ and near 0.9 for reflectivity ≥45 dBZ (Fig. 5). For the 25-dBZ threshold, which corresponds to areas of any precipitation, the score for each experiment falls steadily until the skill reaches 0.5 by the 1-h forecast. The primary reason for the decrease in performance with increasing lead time is the increase in false alarms; that is, regions where reflectivity ≥25 dBZ was forecast but not observed. The false alarms can be seen in Fig. 3, particularly in regions northwest of the supercell, where convection was forecast but not observed. For the 45-dBZ threshold (Fig. 5b), which corresponds to areas of heavy precipitation, the AUC score remains higher than that for the 25-dBZ threshold until approximately the 45-min forecast. At this point, the performance starts to decrease and by 2255 UTC it drops below the 0.7 value suggested by Kong et al. (2011) as the threshold for a useful forecast. The relatively steady AUC in the early part of the forecast is likely because most areas of spurious convection in the forecast have reflectivity less than 45 dBZ. Thus, they do not show up as false alarms for that threshold.
Additionally, the NO_V2 and NO_MM_MWR experiments have the highest AUC for both thresholds throughout each experiment, though the differences are not statistically significant for the 25-dBZ threshold. However, a close look at the reflectivity patterns reveals that all experiments but CTRL show splitting storms that do not match observations (see Fig. 3). AUCs for high thresholds are less reliable because of displacement errors of reflectivity cores and underestimation of the areal coverage of those cores in CTRL, in addition to the small sample size.
The performance of the ensemble, in terms of ensemble mean forecast, is also evaluated against MM data (Fig. 6). Three MM instruments are used in the comparison: one StickNet site (“102B” in Fig. 6) and two NSSL MM cars [probes 2 and 4 (P2 and P4, respectively) in Fig. 6]. Experiments assimilating MM data generally better fit the analyses and forecasts at all three sites. The 102B site starts on the forward-flank gust front at 2200 UTC and transitions to the rear-flank cold pool by 2215 UTC. The corresponding increase in u occurs 10 min too soon in the ensembles, suggesting that the cold pool is advancing faster in the model than in the real atmosphere. The same behavior is noted for P4, which starts in the inflow at 2130 UTC and transitions to the rear-flank cold pool by 2215 UTC in the observed atmosphere. This transition occurs 5 min too early in the ensembles, again suggesting that the ensemble cold pool is too fast. P2 starts in the inflow as well and transitions to the rear-flank cold pool very near the mesocyclone around 2150 UTC. The P2 car also moves eastward and transitions back to the inflow by 2240 UTC. These transitions are captured in the temperature fields of the ensembles but less so in the u wind fields. The early transitions from inflow to rear-flank cold pool suggest that, more so than the observed storm, the modeled storm is being forced by the cold pool, which helps explain the faster propagation of the ensemble storm when compared to the observed storm.
This can be seen in Fig. 7. In the CTRL experiment, P1 and P4 in the inflow are much colder than the ensemble mean, meaning the ensemble mean forecast is too warm. However, the temperature at P2 just behind the rear-flank gust front is near the ensemble mean. The same can be seen from the difference plots for other experiments; the differences between the forecasts are approximately the same between the inflow and outflow, but the P2 is closer to the CTRL experiment than P1 and P4, meaning the gradient is stronger in the ensemble. In addition, the assimilation of MM υ observations at the time of Fig. 7 (2150 UTC) cooled the region just west of the gust front, and the assimilation of MM potential temperature observations had little effect on the cold pool temperature. The effect of these is that the propensity of the gust front to propagate like a density current is enhanced compared to the real atmosphere. Overall, however, the ensembles are able to capture the general trend of the observations well, even though the ensemble is generally warmer and drier than observed.
Finally, in Fig. 6, clustering of experiments with and without MM observations is evident. This is likely a product of the near-surface nature of the observations, which are subject to the land surface model and boundary layer scheme in the forward prediction model. Both of these are constant across all members and experiments, meaning that the evolution of the near-surface environment has little variability. This issue could be alleviated by using a multiphysics ensemble (Berner et al. 2011). In any case, these results suggest the ensembles are capturing the general evolution of the supercell reasonably well.
4. Forecast mesocyclone tracks
In examining the forecast mesocyclone tracks in these experiments, we use the updraft helicity (UH) metric (Fig. 8). UH is used as defined in Kain et al. (2008), except herein the limits of integration are from the surface to 3 km AGL so as to pick out low-level rotating updrafts. Additionally, instead of a nine-point smoother, a neighborhood ensemble probability has been computed using a circular neighborhood 2.5 km in radius. Here, the probability of UH represents a mesocyclone-scale circulation as a proxy for a tornado track. The results presented here use a UH threshold of 75 m2 s−2, but they are relatively insensitive to the threshold used.
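A minimal Python sketch of the two steps above is given here: UH as the vertical integral of the product of vertical velocity w and vertical vorticity ζ from the lowest level to 3 km AGL, and a neighborhood ensemble probability. The neighborhood definition used, the fraction of members exceeding the threshold anywhere within the circular neighborhood, is an assumption of this sketch, as are the function names, the trapezoidal integration, and the sample profile.

```python
def updraft_helicity(z_agl_m, w, zeta, z_top_m=3000.0):
    """Trapezoidal integral of w * zeta over full layers at or below z_top_m.

    Layers that extend above z_top_m are skipped rather than partially
    integrated, which is a simplification of this sketch.
    """
    uh = 0.0
    for k in range(len(z_agl_m) - 1):
        z0, z1 = z_agl_m[k], z_agl_m[k + 1]
        if z1 > z_top_m:
            break
        uh += 0.5 * (w[k] * zeta[k] + w[k + 1] * zeta[k + 1]) * (z1 - z0)
    return uh

def neighborhood_probability(uh_members, threshold, radius_pts):
    """Fraction of members with UH >= threshold within radius_pts grid points.

    uh_members[m][j][i] is the UH field of member m on a regular grid.
    """
    n_mem = len(uh_members)
    ny, nx = len(uh_members[0]), len(uh_members[0][0])
    r = int(radius_pts)
    # Grid offsets that fall inside the circular neighborhood
    offsets = [(dj, di) for dj in range(-r, r + 1) for di in range(-r, r + 1)
               if dj * dj + di * di <= radius_pts ** 2]
    prob = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            hits = 0
            for field in uh_members:
                if any(0 <= j + dj < ny and 0 <= i + di < nx
                       and field[j + dj][i + di] >= threshold
                       for dj, di in offsets):
                    hits += 1
            prob[j][i] = hits / n_mem
    return prob

# Hypothetical column: heights (m AGL), w (m/s), and zeta (1/s)
z = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
w = [0.0, 10.0, 20.0, 30.0, 30.0]
zeta = [0.01] * 5
uh_column = updraft_helicity(z, w, zeta)

# Two hypothetical members on a 5 x 5 grid with 1-km spacing
member_a = [[0.0] * 5 for _ in range(5)]
member_a[2][2] = 100.0
member_b = [[0.0] * 5 for _ in range(5)]
prob = neighborhood_probability([member_a, member_b], threshold=75.0, radius_pts=2.5)
```

On a 1-km grid, a radius of 2.5 grid points corresponds to the 2.5-km neighborhood radius, and the threshold of 75 matches the UH threshold of 75 m2 s−2 quoted above.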
The UH swaths are compared to the Warning Decision Support System–Integrated Information (WDSS-II) rotation swath product (Smith and Elmore 2004). The peak probabilities in the forecast mesocyclone tracks from all six experiments generally follow the WDSS-II rotation track in space, as expected for having initialized the 1-h forecasts after tornadogenesis. This again supports the conclusion that the ensembles are capturing the evolution of the mesocyclone in the forecast. However, some differences are apparent between the individual experiments, discussed in the next sections.
a. Impact of mobile sounding observations
Omitting the sounding data results in a larger area of UH probabilities greater than 50% in the latter half of the forecast than when the sounding data are assimilated (Fig. 8). That is, probabilities are higher in the NO_SND and NO_V2 experiments than in the other experiments, whereas the WDSS-II rotation track suggests a weaker mesocyclone during this period of the forecast.
To determine the reason for the higher UH probabilities in the experiments without soundings, we examine the difference between the CTRL and NO_SND experiments and between the NO_MM_MWR and NO_V2 experiments (Fig. 9). The source of the decreased UH in the CTRL and NO_MM_MWR experiments appears to be the sounding launched farthest southeast. That sounding produces southeastward increments in the velocity, which correspond to a decrease in wind speed in the inflow to the storm. The information from the soundings is advected northwestward with the wind in the inflow. The effect on the storm is that the soundings decrease vertical velocity at 500 m AGL and therefore decrease low-level vorticity stretching. Both of these have a direct impact on the UH.
b. Impact of mobile radar observations
The MWR-05XP data appear to have no systematic effects on the UH probability swaths. However, additional information can be extracted by examining how the low-level vorticity field (Fig. 10) evolves with time. The CTRL and NO_MM experiments have initialized the most intense mesocyclone in terms of vertical vorticity ζ in the ensemble mean. The next-most-intense mesocyclone in the mean is in the NO_MM_MWR experiment, followed by the NO_MWR experiment, which has the least intense mesocyclone in the ensemble mean. As expected, the experiments assimilating MWR-05XP data have a more intense mean low-level mesocyclone than the experiments not assimilating them. This is consistent with the findings of Tanamachi et al. (2013). Without MWR-05XP data, the members of the NO_MM_MWR experiment show larger variability in the placement of the ζ = 0.015 s−1 contours, whereas the individual members of the CTRL and NO_MM experiments are relatively consistent in the placement of those contours. This suggests that the high-spatial-resolution MWR-05XP data reduce the uncertainty in the placement of the vortex. Larger variability in the placement of the ζ maxima in the ensemble is partially responsible for the lower ensemble mean ζ in experiments without MWR-05XP data.
By the 5-min forecast (Fig. 11), the mesocyclones in all four experiments have weakened considerably. The most dramatic weakening is in the CTRL experiment, which has one of the strongest mesocyclones in the analysis and one of the weakest mesocyclones in the 5-min forecast. The decrease in intensity of the ensemble mean ζ in this experiment appears to be primarily the result of the decreased intensity in the individual members. The NO_MM and NO_MM_MWR members also exhibit a reduction in intensity, but the magnitude of the reduction is smaller than in the CTRL and NO_MWR experiments. A rapid reduction in mesocyclone intensity has also been observed in the 5-min forecast of a mesoscale convective system using an EnKF (Snook et al. 2011). Some of the weakening in the NO_MM and NO_MM_MWR experiments appears to be associated with differences in placement of the vortex. The vertical vorticity analyses of Atkins et al. (2012) suggest that the observed low-level mesocyclone undergoes a general intensification between 2200 and 2210 UTC, but this is not seen in any of the ensembles here, as they are likely still in the adjustment period of the initial forecast (Putnam et al. 2014).
Furthermore, during the assimilation period, MWR-05XP data increased the ensemble mean of the maximum ζ over the domain, particularly between approximately 2.5 and 4 km MSL (1–2.5 km AGL; Fig. 12). However, there is no discernible difference between the 5-min ζ forecasts at 300 m AGL from the NO_MM and NO_MM_MWR experiments in Fig. 11. We view this as evidence that the inclusion of MWR-05XP data does not always increase near-surface ζ in a forecast, even though the influence is apparent in the analysis.
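The link between placement variability and a weaker ensemble mean can be illustrated with an idealized calculation. In the sketch below, every member contains an identical Gaussian vortex and only its position varies; the vortex shape, peak value, radius, grid, and position spreads are hypothetical choices made for illustration and are not taken from the experiments.

```python
import numpy as np

def gaussian_vortex(x, y, x0, y0, zeta_max=0.03, radius_km=1.5):
    """Idealized vertical vorticity field (1/s) centered at (x0, y0)."""
    dist_sq = (x - x0) ** 2 + (y - y0) ** 2
    return zeta_max * np.exp(-dist_sq / (2.0 * radius_km ** 2))

def ensemble_mean_peak(position_spread_km, n_members=40, seed=0):
    """Return (peak of the ensemble-mean field, mean of the member peaks)."""
    rng = np.random.default_rng(seed)
    axis = np.arange(-10.0, 10.5, 0.5)  # km
    x, y = np.meshgrid(axis, axis)
    # Vortex centers scattered about the origin with the requested spread.
    centers = position_spread_km * rng.standard_normal(size=(n_members, 2))
    members = np.stack([gaussian_vortex(x, y, cx, cy) for cx, cy in centers])
    peak_of_mean = members.mean(axis=0).max()
    mean_of_peaks = members.max(axis=(1, 2)).mean()
    return peak_of_mean, mean_of_peaks

tight = ensemble_mean_peak(0.0)  # all members agree on placement
loose = ensemble_mean_peak(2.0)  # members disagree on placement
```

With identical member intensities, the peak of the ensemble-mean field falls as the position spread grows, while the mean of the member peaks stays close to the single-vortex value. Comparing the two quantities is one way to separate placement disagreement from genuine weakening of the members.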
c. Impact of mobile mesonet observations
An examination of Fig. 8b shows that the NO_MWR experiment has the smallest area of 50% UH probability of all the experiments in the first part of the forecast, implying that adding MM observations in the absence of the MWR-05XP data decreased the intensity of the mesocyclone in the ensemble. In addition, ζ at 300 m AGL decreases drastically in the first 5 min of the 1-h forecast (Figs. 10 and 11) in experiments that assimilate MM data (CTRL and NO_MWR). Furthermore, two side experiments were performed that are identical in configuration to the CTRL experiment, except that in one, MM wind observations were omitted, and in the other, MM thermodynamic observations were omitted. In both of these experiments, the ensemble mean ζ at 300 m AGL is larger, though this appears to be mainly due to better agreement among the members in placement of the mesocyclone. Further, ζ decreases in the 5-min forecast as quickly in both side experiments as in the CTRL experiment. The MM observations were anticipated to provide better information on surface conditions around the storm, so a decrease in mesocyclone intensity and longevity resulting from MM assimilation was unexpected during an intense period in the observed storm.
A possible reason for the unexpected behavior of the mesocyclones can be found in examining the forecast ζ during the analysis period (Fig. 12). Whereas the last 10 min of the assimilation period in the NO_MM and NO_MM_MWR experiments feature a general increase in or maintenance of forecast low-level ζ, the same period in the CTRL and NO_MWR experiments features a decrease in forecast low-level ζ. Atkins et al. (2012) and Kosiba et al. (2013) found that 2150 UTC begins a period of intensification in the observed low-level mesocyclone, so the low-level mesocyclones in the CTRL and NO_MWR experiments do not mimic the observed evolution. Additionally, the experiments assimilating MM data decrease ζ in the 2150 UTC analyses, where the forecast ζ at 2150 UTC is already weaker in those experiments than in the experiments not assimilating MM data.
To explain the decrease in ζ at the 2150 UTC analysis, we examine the analysis increment of ζ produced by wind observations from individual MM cars (Fig. 13). It is apparent that the υ-wind observation from P2 contributes the overwhelming majority of the ζ decrease, with most of the contributions from the other observations being neutral or slightly positive. This is partially the result of the large deviation of the ensemble mean υ wind from the observed υ, 5.6 m s−1, with the ensemble mean wind being much stronger than observed. This is consistent with the cold pool being too strong, as noted above.
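The way a single wind observation can change ζ through ensemble covariances can be sketched with the serial EnSRF update for one observation (Whitaker and Hamill 2002 form). The sketch omits covariance localization and inflation, and the two-variable state, the observation error variance, and the positive covariance between υ and ζ are synthetic assumptions chosen only to show the mechanism; only the 5.6 m s−1 departure is taken from the text.

```python
import numpy as np

def ensrf_single_ob(ens, hx, ob, ob_err_var):
    """Serial EnSRF update of an ensemble with one observation.

    ens: (n_state, n_members) prior ensemble; hx: (n_members,) prior ensemble
    mapped to the observation; ob: observed value; ob_err_var: error variance.
    """
    n_members = ens.shape[1]
    ens_mean = ens.mean(axis=1, keepdims=True)
    ens_pert = ens - ens_mean
    hx_mean = hx.mean()
    hx_pert = hx - hx_mean
    hx_var = hx_pert @ hx_pert / (n_members - 1)
    cov = ens_pert @ hx_pert / (n_members - 1)  # state-observation covariance
    gain = cov / (hx_var + ob_err_var)          # Kalman gain for the mean
    # Reduced gain factor applied to the perturbations.
    alpha = 1.0 / (1.0 + np.sqrt(ob_err_var / (hx_var + ob_err_var)))
    mean_a = ens_mean[:, 0] + gain * (ob - hx_mean)
    pert_a = ens_pert - alpha * np.outer(gain, hx_pert)
    return mean_a[:, None] + pert_a

# Synthetic 40-member prior: row 0 is v at the observation site (m/s),
# row 1 is zeta at a nearby point (1/s), constructed to covary positively with v.
rng = np.random.default_rng(1)
v = 10.0 + 2.0 * rng.standard_normal(40)
zeta = 0.02 + 0.001 * (v - v.mean()) + 0.0005 * rng.standard_normal(40)
ens = np.vstack([v, zeta])

ob = v.mean() - 5.6  # observed v is 5.6 m/s weaker than the ensemble mean
ob_err_var = 1.0     # assumed observation error variance, (m/s)^2
analysis = ensrf_single_ob(ens, v, ob, ob_err_var)
increment = analysis.mean(axis=1) - ens.mean(axis=1)
```

In this toy setup the negative innovation in υ projects onto ζ through the positive sample covariance, so the ζ increment is negative; a large innovation therefore produces a large ζ change even when the observation itself is accurate.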
In addition to the cold pool being too cold, the MM observations decrease the temperature of the near inflow to the storm (Fig. 7). This affects the buoyancy of the parcels lifted in the updraft; less buoyant parcels will not be lifted as quickly. This is reflected in the ensemble mean of the domain-maximum forecast ∂w/∂z (Fig. 14), particularly in the 2155 and 2200 UTC forecasts. The quantity ∂w/∂z was chosen to evaluate the stretching of ζ while removing the effect of the ζ already present. It also reflects the vertical acceleration, which is proportional to the buoyancy of the inflow air. One would expect low-level ∂w/∂z to be higher in cases with larger low-level buoyancy (Markowski and Richardson 2014), and this is consistent with Fig. 14. Both the CTRL and NO_MWR experiments have lower buoyancy in the inflow, and they both exhibit lower ∂w/∂z than the NO_MM and NO_MM_MWR experiments. Thus, preexisting ζ undergoes less stretching in experiments that assimilate MM observations, which accounts for the weaker low-level mesocyclone in those experiments. In addition to the forecast ∂w/∂z, which is at least partially a reflection of the low-level buoyancy in the model, the increases in ∂w/∂z by the data assimilation are greater in the NO_MM experiment than in the CTRL experiment. The same is true to a lesser extent in the NO_MM_MWR experiment versus the NO_MWR experiment. Thus, the assimilation of MM data is also directly decreasing vertical acceleration, and therefore the stretching of ζ.
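For reference, the stretching argument above can be written out using the vertical gradient of vertical velocity, ∂w/∂z. The form below is the textbook vertical vorticity equation under a Boussinesq approximation with Coriolis and mixing terms neglected; this simplification is assumed here for illustration and is not a statement of the budget computed in the experiments. The numerical value in the last line is purely illustrative.

```latex
% Tilting and stretching contributions to vertical vorticity following a parcel
\frac{D\zeta}{Dt}
  = \underbrace{\boldsymbol{\omega}_h \cdot \nabla_h w}_{\text{tilting}}
  + \underbrace{\zeta \, \frac{\partial w}{\partial z}}_{\text{stretching}}

% With tilting neglected and \partial w / \partial z held constant along the parcel path:
\zeta(t) = \zeta_0 \exp\!\left( \frac{\partial w}{\partial z} \, t \right)

% Illustrative doubling time for \partial w / \partial z = 0.01\ \mathrm{s^{-1}}:
t_2 = \frac{\ln 2}{0.01\ \mathrm{s^{-1}}} \approx 69\ \mathrm{s}
```

Because the stretching term is the product of ζ and ∂w/∂z, evaluating ∂w/∂z alone isolates the rate at which existing vorticity would be amplified, independent of how much vorticity is already present.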
We have presented data assimilation and forecast experiments using data from the 5 June 2009 Goshen County, Wyoming, supercell. This storm was well observed by the VORTEX2 project, including by the MWR-05XP mobile, X-band, phased-array radar; two mobile mesonets (MMs); and several sounding teams. Data were divided into “routine” and “VORTEX2” (V2) observations; groups of V2 observations were removed one by one or all together from the assimilation in each experiment to determine the impact of special high-resolution observations on forecasts of the supercell. Data were assimilated at 5-min intervals for an hour into an outer mesh at 3-km resolution and an inner mesh at 1-km resolution, which produced a realistic initial state for the storm. The subsequent 1-h forecasts from all experiments captured the evolution of the supercell structure reasonably well and captured some observed trends in surface observations.
While most V2 data sources were found to be beneficial to the analyses and subsequent forecasts, one dataset exhibited a negative impact on the forecasts. Special soundings were found to generally have a positive impact on the latter stages of the 1-h forecast. Specifically, the sounding data from the upstream inflow region resulted in a negative velocity increment in the inflow, which decreased the strength of the updraft in the storm. Both the decrease in updraft strength and the subsequent decrease in low-level vorticity stretching account for the decreased UH seen in the latter stages of the forecast, which is more consistent with the weaker mesocyclone suggested by the WDSS-II rotation track.
Furthermore, the MWR-05XP radial velocity data were found to increase the mean low-level vorticity in the analyses, consistent with the findings of Tanamachi et al. (2013). The increase in vertical vorticity ζ was at least partially a result of reducing vortex placement differences among the members. Additionally, while the MWR-05XP data are useful in increasing ζ aloft in a forecast, they are less useful in increasing forecast ζ near the surface. Collectively, these results are consistent with what one would expect from assimilating radar data, which provide kinematic information about the mesocyclone but convey little directly about the storm environment and therefore about the conditions that at least partially contribute to the maintenance of the mesocyclone.
The MM data were found to have a negative impact, through both direct and indirect means. Directly, the MM observations decrease vertical vorticity in the 2150 UTC analysis through covariances between observed υ and ensemble u and υ. This is likely because outflow in the model background is too strong, which in turn is because the contrast between the cold outflow and warm inflow is greater than observed. Indirectly, the MM observations also decrease the temperature of the inflow to the storm, which also decreases the low-level stretching of parcels from the inflow. We applied several forms of quality control to the MM data, including creating superobs and removing data that did not meet supplied quality control metrics. Markowski et al. (2012a,b) and Marquis et al. (2014) used similar quality control procedures for the same MM data in their studies. Additionally, Marquis et al. (2014) assimilated thermodynamic MM data and found that they had a positive impact on the analyses; however, they did not examine the forecasts from those analyses. Despite extensive efforts in quality controlling the data and in tuning the assimilation configurations, we were not able to obtain a positive impact from the MM data. This is not necessarily because the analyses were not fitting the observations (see Fig. 6), but because fitting the observations moves the ensemble state into one less favorable for sustaining a mesocyclone. While such results are not expected in a statistical sense, they can occur in individual cases, when model errors, background state errors, and errors from other observations work in complex ways against the realization of the benefit of one particular data type. We consider the current case to be one such example.
The computational requirements for these experiments were quite steep; each forecast step took approximately 10 min to run, and the EnSRF program took about 15 min. Thus, an optimal configuration would take 25 min for a 5-min cycle, much too slow for operational implementation. While the radar data composed the bulk of the assimilated data volume, the efficient parallelization for radar data (Y. Wang et al. 2013) meant that the conventional data assimilation required a disproportionate amount of time. Unfortunately from this perspective, sounding data had the greatest positive impact on the 1-h forecast, and the MWR-05XP data had a negligible impact after the first 10 min of the 1-h forecast. Additionally, we note that the sounding data contain mostly information on the environment, while the MWR-05XP data contain mostly information on the storm itself. Together, these results suggest that, for operational forecasting at 1-km grid spacing, it is important to invest computational time in assimilating data that describe the environment, rather than the storm itself.
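The timing argument above amounts to simple arithmetic, sketched below. The per-step times and cycle interval are the approximate values quoted in the text; the count of 12 cycles is an assumption based on assimilating every 5 min for 1 h, and the variable names are illustrative.

```python
# Approximate wall-clock costs quoted in the text (minutes).
forecast_step_min = 10.0
ensrf_step_min = 15.0
cycle_interval_min = 5.0

# Assumption: 12 cycles, from assimilating every 5 min over a 1-h window.
n_cycles = 12

wall_per_cycle_min = forecast_step_min + ensrf_step_min
# Ratio of wall-clock time to simulated time; values above 1 mean slower than real time.
slowdown_factor = wall_per_cycle_min / cycle_interval_min
total_wall_hours = n_cycles * wall_per_cycle_min / 60.0

print(f"{wall_per_cycle_min:.0f} min per cycle, "
      f"{slowdown_factor:.0f}x slower than real time, "
      f"{total_wall_hours:.0f} h for the assimilation window")
```

Under these assumptions the assimilation window alone would need several hours of wall-clock time, before the 1-h forecast is even started.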
In addition to the data used in the assimilation, the microphysics scheme used in the prediction model has been shown to have a strong influence on the structure and evolution of simulated storms (Dawson et al. 2012). Whereas a single-moment microphysics scheme, such as the Lin scheme, keeps a constant intercept parameter on the Marshall–Palmer distribution, double-moment schemes let those intercept parameters vary. Several studies show that reflectivity structure, cold pool intensity, and polarimetric signatures simulated with multimoment schemes are more consistent with observations (e.g., Dawson et al. 2010, 2014; Jung et al. 2012; Putnam et al. 2014). Thus, a future avenue for work will be to investigate any structure and evolution changes to the storm with different microphysics and whether the MM data have a more positive impact given a more complex and hopefully more accurate microphysics scheme. This could also lead to using polarimetric observations in the data assimilation, such as in Jung et al. (2008).
Finally, the phased-array MWR-05XP produced 6–9-s volume scans that were not fully utilized in these experiments. Future work could assimilate data more frequently than every 5 min, which would allow more MWR-05XP volumes to be utilized. The 4D EnSRF implementation by S. Wang et al. (2013), which assimilates frequent data collected over a time span in a single filter step, would be suitable. This approach allows one to relax the assumption that all observations are valid at the same time in each assimilation cycle, which reduces the timing errors of observations and reduces model adjustments at the beginning of each cycle.
Thanks go to Kevin Manross, Brandon Smith, and Lamont Bain for quality controlling the WSR-88D data used in these experiments. Paul Markowski and Yvette Richardson of the Pennsylvania State University and Jerry Straka of the University of Oklahoma collected the NSSL mobile mesonet data, and Chris Weiss of Texas Tech University collected the StickNet data used in these experiments. Josh Wurman provided the observed tornado track data as part of the NSF-sponsored Doppler on Wheels (DOW) project. Additionally, Robin Tanamachi and Yunheng Wang helped with troubleshooting problems with the experiments, and discussions with Nate Snook helped increase the quality of the research. Jim Marquis and one anonymous reviewer also increased the quality of the manuscript. This research has been primarily supported by the NOAA Warn-on-Forecast project under Grant DOC-NOAA NA08OAR4320904 and by NSF Grant AGS-0802888. The MWR-05XP has been supported by NSF Grants ATM-0637148 and AGS-0934307. Finally, thanks to the National Institute for Computational Sciences (NICS) and the Texas Advanced Computing Center (TACC) for the use of their supercomputers.