Explicit prediction of hail using numerical weather prediction models remains a significant challenge; microphysical uncertainties and errors are a significant contributor to this challenge. This study assesses the ability of storm-scale ensemble forecasts using single-moment Lin or double-moment Milbrandt and Yau microphysical schemes in predicting hail during a severe weather event over south-central Oklahoma on 10 May 2010. Radar and surface observations are assimilated using an ensemble Kalman filter (EnKF) at 5-min intervals. Three sets of ensemble forecasts, launched at 15-min intervals, are then produced from EnKF analyses at times ranging from 30 min prior to the first observed hail to the time of the first observed hail. Forty ensemble members are run at 500-m horizontal grid spacing in both EnKF assimilation cycles and subsequent forecasts. Hail forecasts are verified using radar-derived products including information from single- and dual-polarization radar data: maximum estimated size of hail (MESH), hydrometeor classification algorithm (HCA) output, and hail size discrimination algorithm (HSDA) output. Resulting hail forecasts show at most marginal skill, with the level of skill dependent on the forecast initialization time and microphysical scheme used. Forecasts using the double-moment scheme predict many small hailstones aloft, while the single-moment members predict larger hailstones. Near the surface, double-moment members predict larger hailstone sizes than their single-member counterparts. Hail in the forecasts is found to melt too quickly near the surface for members using either of the microphysics schemes examined. Analysis of microphysical budgets in both schemes indicates that both schemes suboptimally represent hail processes, adversely impacting the skill of surface hail forecasts.
Severe hail is, on average, directly responsible for over a billion dollars of damage in the United States annually (Jewell and Brimelow 2009). Many businesses and industries are vulnerable to hail, including insurance, agriculture, aviation, transportation, and auto sales. There have been multiple instances in which a single hailstorm has caused over $1.0 billion (U.S. dollars) in hail damage (Changnon and Burroughs 2003). Furthermore, the cost of damages resulting from severe hail is expected to increase in the future as a result of increasing population and rising property values.
Current methods of forecasting maximum hail size are limited in their ability to represent the surrounding environment. One commonly used hail size forecast model, HAILCAST (Jewell and Brimelow 2009), applies a one-dimensional model that uses proximity soundings to predict the growth of hailstones from initial hailstone embryos. Adams-Selin and Ziegler (2016) developed a version of HAILCAST that is implemented within convection allowing numerical weather prediction (NWP) model forecasts to provide high spatial and temporal resolution hail forecasts. HAILCAST does not account for microphysical and dynamic features found within convective storms that have nonnegligible impacts on hail growth. Forecasting hail using NWP models remains a relatively understudied subject in the field of meteorology, particularly when compared to efforts focused on the prediction of other weather hazards such as flooding and tornadoes. Even as NWP forecast skill at convective scales continues to increase, skillful explicit prediction of hail remains difficult, requiring the model to accurately represent the storm intensity, mode, and motion, as well as microphysical and other physical processes important for hailstone formation. The short time scale of hailstorms and the limitations of the parameterizations used to represent microphysical processes within these storms contribute to rapid model error growth in hail-related forecast fields. In this study, we directly address the challenges of explicit hail prediction by investigating the ability of high-resolution ensemble forecasts using varying microphysics (MP) schemes to predict hail at varying lead times for a group of supercell thunderstorms that occurred over Oklahoma on 10 May 2010.
Ensemble prediction has become a vital tool for convective-scale forecasting because it is helpful in accounting for the fast error growth present at convective scales (Lorenz 1969; Clark et al. 2009, 2010). A well-designed ensemble will generate an “envelope” of possible outcomes that contains the true atmospheric state (Kalnay 2002; Kalnay et al. 2006). Furthermore, the ensemble can be used to generate probabilistic forecasts and provide uncertainty information. Two of the major factors affecting the ability of the ensemble to generate skillful short-term ensemble convective-scale forecasts are the data assimilation (DA; i.e., initial conditions) and MP parameterizations employed by the forecast system.
To maximize the accuracy of short-term storm-scale ensemble forecasts, it is necessary to reduce errors in the initial conditions by incorporating observational information via DA. To perform this task, modern storm-scale NWP systems, both in operations and research, rely upon cycled DA systems that assimilate observations at regular intervals (e.g., Snyder and Zhang 2003; Dowell et al. 2004, 2011; Zhang et al. 2004; Tong and Xue 2005; Caya et al. 2005; Xue et al. 2006, 2010; Lei et al. 2008; Snook et al. 2011, 2015, 2016; Dawson et al. 2012; Jung et al. 2012a; Yussouf and Stensrud 2012; Yussouf et al. 2013; Putnam et al. 2014). While both variational and statistical techniques have been used, ensemble-based DA techniques, such as the ensemble Kalman filter (EnKF; Evensen 1994, 2003), are often chosen because they benefit from nonstatic spatial covariances carried by and derived from the ensemble. Flow-dependent covariances are particularly important at the storm scale, as they allow for the correction of errors in fields that, while not usually directly observed, strongly influence convective dynamics (e.g., vertical velocity, in-cloud temperature, pressure, and MP properties) (Tong and Xue 2005, 2008b). In this study, we use an EnKF DA system to assimilate radar and other observational data.
The parameterization of microphysical processes is complex, and is generally a large contributor to uncertainty within convective-scale models. In most research and operational applications, bulk MP schemes are used to parameterize cloud and precipitation processes because of their computational efficiency compared to spectral bin MP schemes. Most bulk MP schemes assume a three-parameter gamma distribution to describe the hydrometeor particle size distribution (PSD):
where , λ, and α are the intercept, slope, and shape parameters, respectively (Ulbrich 1983). MP schemes can vary in complexity depending on the method used to specify these three parameters. The simplest schemes, single-moment (SM) schemes, predict only the mass mixing ratio (referred to in this paper as the mixing ratio) of hydrometeors. More complex double-moment (DM) schemes typically predict the total number concentration in addition to the mixing ratio. A number of recent studies (e.g., Jung et al. 2010; Dawson et al. 2010; Jung et al. 2012a; Putnam et al. 2014; Johnson et al. 2016) have found that the additional degrees of freedom within the predicted PSD when using a DM MP scheme allow for a more realistic representation of features observed in convective storms (e.g., cold pool structure, hydrometeor size sorting, polarimetric signatures). In this study, we will compare the relative performance of a DM scheme and an SM scheme within the EnKF DA and ensemble forecasts for hail.
Objective verification of hail forecasts is also a major challenge. Hail reports contain size and population biases, making verification of hail forecasts using these datasets difficult. In one of the first real-data convective-scale explicit hail prediction studies, Snook et al. (2016, hereafter S16) used products derived from Weather Surveillance Radar-1988 Doppler (WSR-88D) observations as supplementary sources of information for verification of hail forecasts to limit the impact of surface hail report biases and to facilitate verification in sparsely populated areas. More recently, Luo et al. (2017) used radar-derived products to verify hail simulations using one-, two-, and three-moment versions of the Milbrandt and Yau (2005) MP scheme. The Milbrandt and Yau (2005) three-moment version was found to perform the best. One radar product that is often used operationally as a proxy for hail observations is the maximum estimated size of hail (MESH). MESH is computed using a vertical integration of radar reflectivity Z multiplied by a temperature-based weighting function to estimate the maximum hail size at the surface (Witt et al. 1998). MESH was also used in the study to evaluate hail prediction. The recently upgraded WSR-88D network observes several polarimetric fields, including differential reflectivity Zdr, copolar correlation coefficient hv, and differential phase Φdp in addition to radial velocity Vr and Z. These variables can be used to infer various properties of hydrometeors, including their size, shape, orientation, and phase (Kumjian and Ryzhkov 2008). When used as inputs for a fuzzy logic hydrometeor classification algorithm (HCA; Heinselman and Ryzhkov 2006; Park et al. 2009), polarimetric radar observations can be helpful in inferring the dominant hydrometeor species present, as well as in discriminating between regions of severe and nonsevere hail (Ryzhkov et al. 2013; Ortega et al. 2016).
In this paper, storm-scale ensemble forecasts with 500-m horizontal grid spacing are performed using single- and double-moment MP schemes for the case of a group of supercell thunderstorms occurring over southern Oklahoma on 10 May 2010. The goals of this study are 1) to evaluate the ability of ensemble forecasts with varying lead times to skillfully predict hail from these storms, 2) to evaluate the strengths and weaknesses of hail forecasts produced using different MP schemes, and 3) to investigate biases related to hail prediction in the MP schemes used in this study. These results can be used to further improve MP schemes and improve the viability of explicit hail prediction within a warn-on-forecast framework (Stensrud et al. 2009).
The remainder of the paper is organized as follows. Section 2 provides a brief overview of the case, the ensemble configuration, and the techniques used in this study to verify ensemble hail forecasts. In section 3, we examine the sensitivity of hail prediction to the choice of MP scheme and forecast lead time, as well as giving consideration to the biases present in the MP schemes used. Finally, an overall summary of the results and discussion of potential areas for future investigation are presented in section 4.
2. Data and methods
a. Case overview
On the afternoon of 10 May 2010, a severe weather outbreak occurred in Oklahoma, in which a line of discrete supercell thunderstorms developed over the central and south-central parts of the state, producing multiple tornadoes and severe hail (Fig. 1). These thunderstorms developed ahead of a dryline at approximately 2200 UTC (Fig. 1a) and moved very quickly (80–90 km h−1). For ease of reference, we will refer to southern storms by a letter designation in alphabetical order from north to south (see Fig. 1). The northernmost, left-splitting storm is labeled storm A. Storms B–D are right-moving supercell thunderstorms. At 2315 UTC (Figs. 1c,d), storm C produced an EF3 tornado; Storm Prediction Center storm reports indicate that storm D simultaneously produced hail of up to 4-in. (101.6 mm) diameter. For an observationally oriented discussion of this case, we refer the reader to Palmer et al. (2011) and Bodine et al. (2014).
b. Experiment setup
1) Prediction model and domain
The Advanced Regional Prediction System (ARPS; Xue et al. 2000, 2001) is used as the prediction model in this study. All experiments use the same 723 × 723 × 53 model grid, with a horizontal grid spacing of 500 m and a stretched vertical grid with a minimum vertical grid spacing of 50 m near the surface that increases to a maximum vertical spacing of 800 m at the model top (approximately 21 km above the surface). The 500-m horizontal grid spacing used in this study is consistent with Snook et al. (2016) and is necessary to resolve finescale structures associated with hail processes. Full model physics are used (Xue et al. 2000), including the NASA Goddard Space Flight Center short- and longwave radiation scheme; surface fluxes calculated from surface drag coefficients, surface temperature, and surface volumetric water; a two-layer soil model; and a 1.5-order subgrid-scale turbulent mixing and planetary boundary layer parameterization. The initial ensemble forecasts for the DA cycles are initialized at 2000 UTC (approximately 75 min prior to convective initiation), from a 40-member ARPS parent ensemble run at 4-km horizontal grid spacing (Jung et al. 2012b ,2013). The parent ensemble (Fig. 2a) is initialized at 1500 UTC and assimilates surface and radar observations hourly (1500–1700 UTC), the frequency of assimilation increases to every 10 min 1 h prior to the completion of DA at 1800 UTC. After 1800 UTC, a 6-h forecast is run until 0000 UTC 11 May 2010; the initial and external boundary conditions are interpolated from the parent ensemble forecast. External boundary conditions are obtained hourly from the parent ensemble (Fig. 2a), and linear time interpolation is used to interpolate between boundary conditions. Hydrometeor information is not included in the boundary conditions (“dry boundary conditions”) to minimize the occurrence of spurious convection initiating along the model boundaries.
2) Microphysics schemes
The Milbrandt and Yau (2005) DM MP scheme (hereafter MY2) and the Lin et al. (1983) scheme as modified by Tao and Simpson (1993) (hereafter LIN) are used in two different sets of ensemble forecast experiments, which will be described in detail below. Hail forecasts produced using the Milbrandt and Yau (2005) SM MP scheme (not shown) were found to be inferior to both the LIN and MY2 scheme forecasts and, thus, will not be discussed. The MY2 scheme has been used in a number of previous ARPS studies (e.g., Dawson et al. 2010; Jung et al. 2012a; Putnam et al. 2014; Dawson et al. 2015), including the recent hail prediction study of S16. The LIN scheme, which is implemented in the parent ensemble (Jung et al. 2012b), is one of the most widely used single-moment MP schemes (Snook et al. 2011, 2015; Putnam et al. 2014), has previously been compared to the MY2 scheme (Jung et al. 2012a), and includes a rimed ice category that is most similar to hail. The MY2 scheme includes predictive equations for hydrometeor number concentrations Ntx and mixing ratios qx for cloud water (Ntc, qc), cloud ice (Nti, qi), rain (Ntr, qr), snow (Nts, qs), graupel (Ntg, qg), and hail (Nth, qh), allowing the scheme to diagnose N0x and λx while assuming a fixed-shape parameter (which, in this study, is set to 0). The LIN scheme predicts the mixing ratios of five hydrometeor species: cloud water (qc), cloud ice (qi), rain (qr), snow (qs), and hail (qh), allowing the scheme to diagnose λx for precipitating species while using preset constant values for N0x. Unlike the MY2 scheme, the LIN scheme does not include a separate graupel category. The LIN hail category assumes a particle density of 913 kg m−3 and N0h = 4.4 × 104 m−4. To reduce cold pool intensity, the default N0r (8.0 × 106 m−4) is reduced to 2.0 × 106 m−4 following Snook and Xue (2008).
c. DA settings
Starting from 2000 UTC, a 30-min spinup ensemble forecast is performed. After this spinup period, cycled DA is performed (Fig. 2b) using the CAPS EnKF DA system (Tong and Xue 2005; Xue et al. 2006; Tong and Xue 2008a). This system employs a variant of the ensemble square root filter (EnSRF; Whitaker and Hamill 2002). Available S-band WSR-88D and Engineering Research Center for Collaborative and Adaptive Sensing of the Atmosphere (CASA; McLaughlin et al. 2009) X-band radar observations, soundings, profiler data, Oklahoma and West Texas Mesonet observations, along with ASOS and AWOS observations, are assimilated at 15-min intervals from 2030 to 2115 UTC. Starting from 2115 UTC, the approximate time of convective initiation (Fig. 2b), the DA interval is reduced to 5 min for the remainder of the DA period. During DA, the radar-indicated severe hail is first observed at 2150 UTC (there are too few surface-based reports to use direct hail size observations), and at 2155 UTC the storms produce multiple tornado reports (Fig. 2b). Storm D develops late within the DA period. The storm produces Z > 20 dBZ first at 2155 UTC, causing it to be undersampled by DA for ensembles launched at 2200 and 2215 UTC and completely unsampled by DA for ensembles launched at 2145 UTC.
In preliminary experiments, performing many frequent DA cycles caused time integration instability in some ensemble members because fast-moving storms on this day resulted in the repeated introduction of large, unbalanced adjustments to the model state variables. To alleviate this problem, the incremental analysis update (IAU) technique (Bloom et al. 1996) is used to incorporate the analysis increments into the forecast model. Incremental updates decrease with time during the 5-min DA cycles. IAU is particularly desirable for its filtering properties that reduce the impact of introducing large adjustments into a model. To maintain ensemble spread, a relaxation-to-prior-spread algorithm is applied following Whitaker and Hamill (2012) using a coefficient of 0.95, along with multiplicative covariance inflation in regions directly influenced by radar observations with an inflation factor of 1.10 (Tong and Xue 2005). From these continuous DA cycles, three separate ensemble forecast experiments are launched at 2145, 2200, and 2215 UTC. To refer to these forecast ensembles, we adopt a naming convention in which a forecast ensemble is denoted by the MP scheme used (MY2 or LIN) and the time T0 at which the forecast was launched (e.g., MY2_2145T0 or LIN_2215T0).
Other configurations of the DA system largely follow S16. Radar observation errors of 6.0 dBZ and 4.0 m s−1 are assumed for Z and Vr, respectively. The cutoff radius of the covariance localization function (Gaspari and Cohn 1999) is set to 3 km in the horizontal and vertical directions to be consistent with Jung et al. (2012a) and Snook et al. (2016). Radar observations are interpolated horizontally onto the model grid but are not interpolated vertically (Xue et al. 2006). The forward observation operator for DA uses the T-matrix method for raindrops and the Rayleigh scattering approximation for ice particle scattering amplitude calculations (Jung et al. 2008). Radar data from the four S-band WSR-88Ds closest to the storms of interest are assimilated; these radars include Twin Lakes, Oklahoma (KTLX); Fredrick, Oklahoma (KFDR); Dyess Air Force Base, Texas (KDYX); and Dallas–Fort Worth, Texas (KFWS). Data from four X-band radars operated by CASA (McLaughlin et al. 2009) are also assimilated. The Chickasha (KSAO), Rush Springs (KRSP), Cyril (KCYR), and Lawton (KLWE) CASA radars were located between KTLX and KFDR in southwest Oklahoma, adding more near-surface radar coverage in this area. Reflectivity data from the CASA radars were processed using automated attenuation correction; for CASA radars, only Z observations exceeding 25 dBZ are assimilated to avoid assimilation of spurious, fully attenuated observations. The CASA data preprocessing procedure follows Snook et al. (2011).
Observation errors for surface measurements are assumed to be 1.5 m s−1 for horizontal wind components u and υ, 1.5 K for air temperature T, 2.0 K for dewpoint Td, and 2.0 hPa for air pressure p. The horizontal covariance localization radius for surface observations is 300 km, and the vertical localization radius is 6 km. Assumed errors for sounding data are 1.2 K for T, 2.0 K for Td, and 0.6 hPa for p. For both soundings and wind profiler observations, an error of 2.5 m s−1 is assumed for u and υ. A horizontal localization radius of 800 km and a vertical localization radius of 6 km are used for these upper-air observations. The above settings follow Snook et al. (2015).
d. Verification procedure
For verification, the model predictions of Z are calculated using a T-matrix method to obtain scattering amplitudes for both rain and frozen model hydrometeor fields (Jung et al. 2010). This method is capable of replicating the Mie scattering caused by large hailstones, resulting in realistic estimates for radar variables from the model data. The direct use of the T-matrix method as the forward operator within DA for ice hydrometeor categories is too expensive and is, therefore, only done for forecast verification; Z is highly relevant in evaluating the skill of hail forecasts because it provides valuable information about storm mode, structure, intensity, and microphysical state.
Because the storms in this study occurred largely over rural areas, there are relatively few surface hail reports available, and the available reports likely do not fully represent the region over which hail actually fell due to population biases typical of severe weather reports (e.g., Wyatt and Witt 1997; Davis and Ladue 2004). To supplement surface reports, radar-derived products such as MESH and hail swaths indicated by a HCA are used as supplementary sources of observed hail information, as was done in S16. MESH uses a weighted vertical integration of Z in regions of Z > 40 dBZ above the freezing level H0 to estimate the maximum hail size:
where E is the weighted Z and W is a temperature-based weighting function described in Witt et al. (1998). Ortega et al. (2009, 2016) document that MESH has a large false alarm ratio, and a wide range of maximum hail sizes can be found within a single MESH value. However, MESH is a commonly used operational product, and in many cases it is the only dataset capable of providing high spatiotemporal resolution estimates of surface hail size, making it an important supplementary data source for local storm reports in less populated regions (Melick et al. 2014). Though polarimetric observations were not available from operational WSR-88D observations in 2010, available polarimetric radar observations from the experimental dual-polarization WSR-88D in Norman, Oklahoma (KOUN), are accessed to determine likely regions of hail using a variant of the HCA of Park et al. (2009) with modifications following S16. KOUN is situated 100–150 km away from the storms, and the lowest radar tilt samples the storms at approximately 1.5–3.0 km AGL. This HCA uses observed Z, Zdr, and ρhv as inputs for a fuzzy logic algorithm to distinguish among dominant observed hydrometeor types. Observed Vr is used in addition to these fields to distinguish ground clutter.
Ensemble forecasts produced using neighborhood ensemble probability (NEP; Schwartz et al. 2010, 2017) and probability-matched ensemble means (Ebert 2001) are verified in this study. NEP analyzes the fractional coverage of event occurrence over a given neighborhood. This method is used because, at the convective scales, models often perform poorly in terms of point-to-point verification (e.g., threat scores) as a result of common displacement errors, particularly for highly localized phenomena such as hail. NEP mitigates this issue by allowing for small displacement errors, producing a probabilistic forecast by verifying each point against a nearby neighborhood of points, determined using a radius of influence (2.5 km in this study, following S16). The probability-matched mean method produces a single best-guess forecast from the ensemble using the spatial distribution of the ensemble mean, but maintaining the frequency distribution of the full ensemble; this preserves extreme values that would otherwise be smoothed away when computing the ensemble mean.
Radar-derived MESH is used for objective verification in this study, as it is the only data source available with the spatial and temporal resolution necessary for surface hail size verification. The fractions skill score (FSS; Roberts and Lean 2008) is employed in this study to compare the fractional coverage of predicted surface hail to the fractional coverage of radar-derived MESH. This verification technique is particularly useful because it provides the scales at which the model predicts the spatial extent of the surface hail with acceptable skill. For this study a hail size threshold of 5 mm is used for verification; larger hail sizes (e.g., severe hail, significant severe hail) will not be verified because the sample size of these events is too small.
a. Ensemble prediction of general reflectivity structure
The forecast NEP for P(Z > 25 dBZ) at 2245 UTC is shown in Fig. 3 for the ensemble forecasts started at 2145, 2200, and 2215 UTC using the MY2 and LIN MP schemes, respectively. In each experiment, the storms in the ensemble forecasts move eastward more quickly than the observed storms. Storm motion bias is frequently documented in convective modeling studies (e.g., S16; VandenBerg et al. 2014; Yussouf et al. 2016), and is likely attributed to model error. As expected, ensembles with the longest lead times [i.e., those launched the earliest; MY2_2145T0 (Fig. 3a) and LIN_2145T0 (Fig. 3d)] contain larger regions of moderate probability values and fewer areas with probability near 1.0, reflecting the greater variability in storm placement and intensity within these ensembles than in ensembles launched at later times, though there is still strong agreement among the members of these ensembles in predicting the northern part of the storms in the subdomain. The MY2_2145T0, MY2_2200T0, and LIN_2145T0 results (Figs. 3a, 3b, and 3d, respectively) have lowered confidence in storm D; P(Z > 25 dBZ) is below 0.4 for these ensembles. Analysis of individual ensemble members indicates that the LIN scheme generally predicts smaller regions where Z exceeds 25 dBZ than the MY2 scheme. Both LIN_2145T0 (Fig. 3d) and LIN_2200T0 (Fig. 3e) predict the development of a new storm behind the primary line, displaced from a secondary line of observed storms farther to the west. The MY2 ensembles (Figs. 3a–c) and LIN_2215T0 (Fig. 3f) also predict this secondary line; however, P(Z > 25 dBZ) is less than 0.2. We can see that LIN_2215T0 (Fig. 3f) captures the spatial extent of the storms better than MY2_2215T0 (Fig. 3c). At longer lead times, the MY2 and LIN ensemble forecasts capture the spatial extent of the storms with similar levels of skill.
b. Ensemble prediction of MESH
Ensemble hail forecasts valid from 2215 to 2315 UTC are compared and verified during this study. During this time, the WSR-88D MESH swath (Fig. 4) for storm A remains nonsevere (MESH < 25 mm) and is oriented more to the northeast than the other swaths (which are oriented more to the east-northeast). Storm B, located just south of storm A, also produces a swath of nonsevere MESH before dissipating (Fig. 4). Both storms C and D produce long swaths of MESH that contain indications of severe hail (maximum predicted MESH exceeding 25 mm).
The MY2 and LIN ensembles predict MESH swaths (Fig. 5) near the observed MESH swaths (Fig. 4), but greatly overpredict the spatial extent of the MESH field. As noted earlier, the forecasted storms in all ensembles move more quickly than the observed storms (maximum observed storm motion 80–90 km h−1), causing the MESH swaths to extend too far to the north and east. Ensembles using both the MY2 and LIN schemes perform poorly in predicting MESH for storm D; the storm intensifies to maturity only after the end of the DA period, reducing the ability of the models to accurately forecast this storm. Only LIN_2200T0 predicts a MESH swath associated with storm D (Fig. 5e). Contrary to the WSR-88D MESH (Fig. 4), the ensemble forecasts predict storm B to produce the highest-intensity MESH, until after approximately 2250 UTC, when the ensembles with shorter lead times [MY2_2200T0 (Fig. 5b), MY2_2215T0 (Fig. 5c), LIN_2200T0 (Fig. 5e), and LIN_2215T0 (Fig. 5f)] predict that storm C intensifies. Because of the initially large hail in terms of simulated MESH for storm B predicted by both the MY2 (Figs. 5a–c) and LIN (Figs. 5d–f) ensembles, it is believed that the predicted storm B is similar to the observed storm C (Fig. 4) except that it is geographically displaced.
While LIN_2200T0 produces a more accurate forecast of MESH for storm D, the LIN ensembles consistently overpredict the intensity of MESH produced by all of the storms. In all three LIN ensembles (Figs. 5d–f), MESH exceeds 75 mm for both storms B and C, while the observed MESH swaths produced by these storms never exceed 50 mm. Maximum MESH values produced by the MY2 ensembles (Figs. 5a–c) are more similar to those in the observations. All three MY2 ensembles generally predict MESH values of less than 25 mm (nonsevere) for storms B and C, with only small regions of MESH exceeding 25 mm. The LIN ensembles also fail to capture the weakening of storm A later during the 2215–2315 UTC forecast period. In the observations, the left-splitting storm A stops producing MESH > 5 mm by 2305 UTC (see Fig. 4), while the LIN ensembles predict a swath of MESH for storm A continuing through 2315 UTC (Figs. 5d–f). In LIN_2215T0, storm A spuriously intensifies late in the forecast period, producing large MESH values near the eastern end of the 1-h forecast MESH swath (Fig. 5f). Though the MY2 ensembles (Figs. 5a–c) predict longer MESH swaths for storm A than the observations (Fig. 4), the intensity of the storm is better represented in these ensembles than in the LIN ensembles. Both MY2_2200T0 (Fig. 5b) and MY2_2215T0 (Fig. 5c) predict weakening in the MESH field of storm A, with maximum MESH values decreasing to only 5 mm by 2315 UTC.
Both MP schemes show limited skill in predicting the spatial coverage of MESH > 5 mm at scales beneficial to explicit short-term hail prediction. The ensemble average FSS (Roberts and Lean 2008) indicates that the LIN scheme (Figs. 6d,e) predicts the spatial coverage of MESH > 5 mm with more skill than the MY2 scheme (Figs. 6a–c). Both LIN_2215T0 (Fig. 6f) and LIN_2200T0 (Fig. 6e) have several ensemble members that predict the spatial coverage of MESH > 5 mm with skill at scales of less than 10 km. No ensemble members from MY2_2215T0 (Fig. 6c) and MY2_2200T0 (Fig. 6b) predict the spatial coverage of MESH > 5 mm with skill at scales less than 10 km. It is important to note that FSS remains unaffected by the LIN scheme’s overprediction bias in maximum hail size (Figs. 5e,f) because a low threshold of nonsevere hail (MESH > 5 mm) is analyzed. MESH values associated with severe hail (MESH > 25 mm) are too localized to be objectively analyzed for this case study. Reduced MESH forecast skill is likely due to an overprediction bias in the spatial area of MESH (Fig. 5); this is because the predicted storms move too quickly in both the MY2 and LIN schemes.
To better understand the differences in the predicted MESH swaths from the LIN and MY2 ensembles, we will now consider explicit predictions of hail from the different experiments. The total mass of frozen hydrometeor species (hail and graupel) is presented as a function of height for ensemble members 10 and 30 of MY2_2215T0 and LIN_2215T0 in Fig. 7. These two ensemble members are selected because member 10 predicts large quantities of hail, particularly in MY2_2215T0, while member 30 predicts more modest quantities of hail. The hydrometeor mass content for hydrometeor x (mx) is defined as
where qx is the mixing ratio of hydrometeor x and is the air density. The LIN scheme exhibits less variability between members in mh and, generally, predicts larger mh values above the freezing layer than the MY2 scheme for both members 10 and 30 (Figs. 7a,c). The MY2 scheme likely predicts smaller mh values aloft due in part to the fact that the scheme has two separate rimed ice categories (graupel and hail), while LIN represents all rimed ice in the form of dense ice precipitation as hail. The MY2 scheme predicts a substantial fraction of dense frozen precipitation to be graupel, with the mass of graupel mg exceeding mh in both members (Figs. 7a,c). The combined mass of frozen hydrometeors is overall much larger in the MY2 members than in the LIN members (Fig. 7), and in terms of vertical distribution, does not extend as far below the melting layer in MY2 as in the LIN members. Consistent with most other MY2 members, MY2 member 30 predicts almost no hail reaching the surface. This result is consistent with the findings of S16, who reported that very little hail reached the surface in their MY2 forecasts of supercell thunderstorms, even where hail was observed, leading to substantial underprediction of the near-surface hail mass content. In the MY2 members, graupel, which is smaller in diameter than hail in the simulations, contributes much less to the total Z than does hail, despite being the dominant hydrometeor species above the freezing layer in terms of mass (Figs. 7b,d). The component of Z from graupel (Zg) is approximately 30 dBZ over much of the vertical extent of the storm, while Zh exceeds 50 dBZ above the freezing layer. The MY2 scheme (Figs. 8a,b), which has been previously reported to predict many smaller hailstones (Johnson et al. 2016), predicts smaller hailstones aloft than does the LIN scheme (Figs. 8c,d). Small hailstones are a consequence of MY2 growth mechanisms, frozen raindrops significantly increase Nth in the freezing layer but do not proportionally modify mh; this decreases the mean mass diameter of hail aloft. The LIN scheme assumes a constant intercept parameter, and the mean mass diameter increases aloft as mh increases. A more detailed microphysical budget analysis of the two schemes will be presented in a later section. Reflectivity Z, which is proportional to the sixth moment of the PSD, is most sensitive to larger hailstones; consequently, LIN predicts larger Zh aloft than MY2 (Figs. 7b,d). Having an additional low-density rimed ice category (graupel) may also further reduce Z in MY2. MESH is a function of Z above the melting layer and is, thus, strongly sensitive to the distribution and intensity of the reflectivity field within the storm. Overall, the MY2 scheme predicts more realistic Z above the freezing layer than the LIN scheme (see Fig. 5). Differences between the hail PSDs predicted by the two schemes will be further analyzed in section 3c.
c. Explicit prediction of hail
1) Evaluation of hail mass content
We will now compare the predicted hail mass content mh against radar-indicated hail swaths defined using the HCA described earlier in section 2b, following the methods of S16. In both the LIN and MY2 ensembles, storms B and C are predicted to produce the largest mass of hail (Fig. 9). The LIN ensembles (Figs. 9d–f) predict mh exceeding 5.0 g m−3 in these storms; in contrast, MY2_2200T0 (Fig. 9b) and MY2_2215T0 (Fig. 9c) predict mh of less than 2.0 g m−3 for storms B and C, while MY2_2145T0 (Fig. 9a) predicts even smaller mh values and produces hail only for storm B in many members. The MY2 ensemble members predict mh of less than 0.01 g m−3 for storm A, which is located closest to the KOUN radar site. From the predicted MESH field (Figs. 5a–c), it is evident that the MY2 ensembles predict high Z associated with storm A above the freezing layer, but hail rapidly melts in the model below the freezing level (Figs. 7a,c), leading to very low mh for storm A at the height of the 0.5° KOUN radar tilt (approximately 1 km above the surface near storm A) in the MY2 ensembles. Both LIN_2200T0 (Fig. 9e) and LIN_2215T0 (Fig. 9f) predict hail at the height of the 0.5° KOUN radar tilt for storm A. We find that LIN_2145T0 (Fig. 9d), which generally poorly represents storm A, does not predict any hail associated with storm A. The swaths of hail predicted by the LIN ensembles (Figs. 9d–f) are larger in geographic extent than those predicted by the MY2 ensembles (Figs. 9a–c) and generally more intense. All LIN ensembles predict mh exceeding 0.03 g m−3 (the threshold for verification of the mh content used in S16) throughout much of the southern third of the subdomain; this region includes large areas where no surface hail was observed. In contrast, the MY2 scheme predicts more limited hail swaths with fewer false alarms.
An east–west vertical cross section of the mh content through the hail core of storm B at 2245 UTC for members 10, 20, 30, and 40 of MY2_2215T0 and LIN_2215T0 (Fig. 10) reveals that the largest mh values are collocated with the primary storm updraft. The LIN members (Figs. 10e–h) contain a large region of hail approximately 10 km wide within the updraft of storm B, with mh values exceeding 2.0 g m−3. The MY2 members mostly predict narrower, weaker updrafts for storm B, along with correspondingly lower amounts of hail, with mh rarely exceeding 1.0 g m−3 for storm B in MY2 members 20, 30, and 40 (Figs. 10b–d). MY2 member 10 (Fig. 10a), which is initialized with the strongest updraft for storm B of the selected members, is markedly different, exhibiting a robust updraft, with mh exceeding 2.5 g m−3. For all selected members, the MY2 scheme predicts smaller, more localized hail cores, typically less than 5 km wide. Hail in the MY2 members melts more quickly below the freezing layer than the LIN members. Only MY2 member 10 predicts hail (mh > 0.03 g m−3) that reaches the surface within the vertical cross section, while all LIN members predict hail that reaches the surface at this time. This is in agreement with the observations.
2) Evaluation of maximum hail size
The PSD for hail can be used to estimate the maximum hail diameter Dmax at any given point, as suggested by Milbrandt and Yau (2006a). The predicted PSD in the model is based upon the exponential distribution, with the shape parameter being set to 0 in Eq. (1) for this study. As hail diameter increases, the number concentration decreases. For very large diameters, the predicted hail becomes so sparse that it would not be physically observable (e.g., one hailstone per cubic kilometer). In S16, a cutoff value for number concentration of 10−4 m−3 mm−1 (equivalent of one hailstone in a cube with sides approximately 20 m in length) is used to denote Dmax. This choice of cutoff value is similar (though not identical) to that used by Milbrandt and Yau (2006a).
Hail PSDs for LIN and MY2 members 10 and 30 at varying heights in the lower troposphere are compared in Fig. 11, highlighting differences in the predicted hail PSD, particularly near and within the melting layer. The PSDs presented in Fig. 11 are calculated at varying heights, placed horizontally at the location of the mh maximum of storm B at 1000 m above mean sea level for each member. The lowest number concentration plotted in Fig. 11 is 10−4 m−3 mm−1, thus the x intercept of each PSD in Fig. 11 is Dmax.
Within the melting layer, the MY2 scheme predicts PSDs that contain relatively few but large hailstones, as evidenced by the lower slope of the PSDs at 4000 m (Figs. 11a,b). At 5000 m, 500 m above the freezing layer, the MY2 scheme predicts a large number of small hailstones in both member 10 (Fig. 11a) and member 30 (Fig. 11b). Within the melting layer at 4000 m, the sizes of the largest hailstones increase. At this height the MY2 scheme decreases Nth to preserve mean mass diameter due to melting, the scheme also predicts that hail accretes rain and cloud water, causing mh to increase. Accretional growth while hail is simultaneously melting causes the MY2 scheme to predict that the hail mean mass diameter increases in the melting layer; Milbrandt and Yau (2006a) documented similar results. An analysis of the treatment of hail within the MP schemes will be performed later in section 3d. Hailstones in MY2 member 10 (Fig. 11a) grow much larger than in MY2 member 30 (Fig. 11b), consistent with the much more robust storm updraft predicted in member 10. Because N0 is constant in the SM LIN scheme (N0h = 4 × 10−4 m−4), LIN members 10 (Fig. 11c) and 30 (Fig. 11d) exhibit PSDs with many small hailstones, even as hail melts below the freezing level. Because of its fixed N0 value, the LIN scheme decreases in mass through melting larger hailstones and decreasing hail mean mass diameter; this is unrealistic and is a disadvantage of SM schemes. Despite this disadvantage, the LIN scheme does predict a larger Dmax just below the freezing layer (Figs. 11c,d), with less variation between ensemble members than for the MY2 members (Figs. 11a,b).
Vertical cross sections of Dmax through the hail core of storm B in selected MY2 and LIN members are shown in Fig. 12. Because of the constraints on the SM scheme, the Dmax field in the LIN members (Figs. 12e–h) is directly proportional to the mh content (Figs. 10e–h). Large hail cores containing severe hail (Dmax > 25 mm) are predicted in the LIN members; these hail cores are collocated with the storm updraft. The largest Dmax value in each of the LIN members is around 45–50 mm and is consistently located near the freezing level (Figs. 12e–h). Because the MY2 scheme allows the intercept parameter to vary, Dmax is not directly proportional to mh for the MY2 members (Figs. 12a–d). In MY2 members 20, 30, and 40 (Figs. 12b–d), the extent of large hail in the hail core of storm B is less than in the corresponding LIN members, and the maximum Dmax is only slightly lower, at around 35–40 mm, despite the far lower mh values in the hail cores of the MY2 members compared to the LIN members (Figs. 10b–d and 10f–h). Member 10 of MY2_2215T0 (Fig. 12a) shows a marked difference in both the mh and Dmax fields compared to the other MY2_2215T0 members. Values of mh in the hail core of member 10 exceed 2.0 g m−3, which is comparable to LIN_2215T0 member 10, and much higher values of Dmax are present, up to 90 mm; the largest values of Dmax are predicted below the freezing layer (Fig. 12a), which is consistent with Fig. 11a. Values of Dmax indicate that hail remains largely nonsevere above the freezing layer in MY2 member 10. This is because above the freezing level there is an increase in Nth, distributing mh between more, but smaller, hailstones. In the other three selected MY2 ensemble members (Figs. 12b–d), the updrafts are weaker, and almost no hail with Dmax greater than 10 mm reaches the surface. It is important to note that although Dmax > 5 mm near the surface in these cross sections, members predict mh < 0.03 g m−3 at the surface (Fig. 10).
To produce a forecast estimate of maximum hail size at the surface, we consider swaths of maximum Dmax, presented as probability-matched mean values for each ensemble. The correlation between near-surface Dmax (SFC) and MESH is analyzed (Table 1) to better understand the relationship between these two fields. Model-simulated MESH is dependent upon simulated observations above the melting layer and implicitly accounts for melting within the algorithm (Witt et al. 1998); Dmax explicitly accounts for melting calculated within the MP scheme. MESH and Dmax should be highly correlated because both fields represent the maximum hail size at the surface; however, the correlation between the two fields is low (0.256–0.302) for MY2 and even lower (0.174–0.227) for LIN (Table 1). Low correlations between the Dmax and MESH indicate discrepancies in melting treatment. Analyzing the maximum Dmax within the lowest kilometer (1KM) of the atmosphere reduces the impact of near-surface melting, which can be excessive as noted earlier in our discussion of Figs. 10 and 12, as well as prior results (e.g., S16). Reducing the impacts of near-surface melting improves the overall correlation between the model-simulated MESH and Dmax for both the MY2 (0.438–0.461) and LIN (0.481–0.519) schemes (Table 1). Because of the improved correlation between the two fields, this study will analyze swaths of maximum Dmax within the lowest kilometer of the atmosphere (Fig. 13).
Both the MY2 and LIN ensembles produce very little near-surface hail for storms C and D. The MY2 (Figs. 13b,c) and LIN (Figs. 13e,f) ensemble forecasts were launched at 2200 UTC and later predict near-surface hail from storm C. The ensemble forecasts underestimate the spatial extent of hail when compared to the WSR-88D MESH (Fig. 4). Only LIN_2145T0 (Fig. 13d) predicts that storm A does not produce near-surface hail; the other ensemble forecasts (Figs. 13a–c and 13e–f) more closely resemble the WSR-88D MESH field (Fig. 4). Storm B produces the most extensive and largest surface hail in the ensemble forecasts. Initially, MY2_2200T0 (Fig. 13b) and MY2_2215T0 (Fig. 13c) overestimate the maximum hail size in the storm B hail swath, but Dmax quickly decreases to values close to those indicated by the observed MESH (Fig. 4). The initial overestimates of Dmax in the vicinity of storm B can largely be attributed to the addition of large hailstones near and below the freezing layer during EnKF DA. Storm B weakens during the forecast in most ensemble members, consequently producing increasingly smaller hailstones beneath the freezing level. Member 10, which is initialized with a more robust updraft, maintains the intensity of storm B during the forecast and produces large hailstones below the freezing layer during the forecast (Fig. 12a). The corresponding LIN ensembles, LIN_2200T0 (Fig. 13e) and LIN_2215T0 (Fig. 13f), do not initially overpredict Dmax for storm B, remaining relatively consistent with the WSR-88D MESH values for the full 2215–2315 UTC swath. The MY2 and LIN ensembles with the earliest initialization time, MY2_2145T0 (Fig. 13a) and LIN_2145T0 (Fig. 13d), produce similar hail-swath predictions that underestimate the spatial extent of hail near the surface compared to WSR-88D MESH, although the MY2 ensemble at this time better captures the hail swath associated with the left-splitting storm A (Fig. 13a).
While both MP schemes demonstrate the ability to explicitly predict the size of hail near the surface, the ensembles lack skill in representing the fractional coverage of Dmax > 5 mm at scales adequate for operational use (Fig. 14). The ensembles that were initialized the latest, MY2_2215T0 (Fig. 14c) and LIN_2215T0 (Fig. 14f), are capable of representing the fractional coverage of near-surface hail at scales of approximately 45 km. These ensembles assimilated more data than ensembles initialized at 2145 and 2200 UTC, leading to an overall improvement in the skill of the hail forecasts. For MY2_2145T0 (Fig. 14a), MY2_2200T0 (Fig. 14b), LIN_2145T0 (Fig. 14d), and LIN_2200T0 (Fig. 14e), we show limited skill in capturing the fractional coverage of near-surface hail at spatial scales of less than 50 km. It is important to note that radar-derived MESH (Fig. 4) is used to verify Dmax forecasts (Fig. 14); weak correlation between the two fields likely suppresses FSS values. The FSS of the simulated MESH verified against radar-derived MESH (Fig. 6) increases for all ensembles; this is because both fields contain similar biases in estimated surface hail size and spatial extent (Ortega et al. 2009, 2016). Additionally, when Dmax is calculated in the melting layer, the algorithm inherits biases associated with the MP scheme treatment of melting hail that degrade forecast skill (e.g., extreme hail growth in the MY2 scheme). An analysis of the microphysical treatment of hail is continued later in the section.
Though MESH can be used to provide an estimate of maximum hail size at the surface from Z and temperature information above the freezing level, the availability of dual-polarization radar observations provides additional information that can be utilized to verify surface hail size. To this end, we apply a hail size discrimination algorithm (HSDA; Ryzhkov et al. 2013) modified by Ortega et al. (2016) with confidence vectors equal to one (J. Krause 2016, personal communication) to verify model-predicted maximum hail size. The HSDA is run in addition to the HCA in regions where the HCA output identifies rain/hail. It includes additional membership functions for Z, Zdr, and that classify hail into one of three size categories: nonsevere (0–25 mm), severe (25–50 mm), and significant severe (>50 mm). Compared to the WSR-88D MESH field (Fig. 4), the HSDA (Fig. 15) output produces localized hail swaths with a larger maximum hail size. The HSDA output indicates multiple regions with hailstones larger than 50 mm in diameter, while the MESH field hardly contains values exceeding 50 mm. These significant severe swaths tend to be far away from KOUN, where the radar observes high above the ground near the freezing layer. Hailstones near the freezing layer are expected to be larger than those near the surface as they have undergone less melting.
For direct comparison to the HSDA (Fig. 15), we interpolate the predicted Dmax field to the KOUN 0.5° radar tilt (Fig. 16). The LIN_2200T0 (Fig. 16e) and LIN_2215T0 (Fig. 16f) ensembles predict that storm A produces nonsevere hail near the surface, similar to WSR-88D MESH. Hail associated with the left-splitting storm A is underpredicted or absent in LIN_2145T0 (Fig. 16d) and the MY2 ensembles (Figs. 16a–c). Ensembles MY2_2200T0 (Fig. 16b) and MY2_2215T0 (Fig. 16c) predict storm A to produce near-surface hail, but excessive melting limits the size and spatial extent of the hail. The MY2 ensembles (Figs. 16a–c) predict more localized swaths of severe hail than the LIN ensembles (Figs. 16d–f) for storm B at the elevation of the KOUN radar data. Both MY2_2200T0 (Fig. 16b) and MY2_2215T0 (Fig. 16c) predict relatively narrow swaths of hail exceeding 95 mm in diameter, resembling a swath of significant severe hail produced by storm C in the HSDA output (Fig. 15). Localized pockets of significant severe hail predicted by the MY2 ensembles more closely resemble severe hail regions in the HSDA output than the LIN ensembles, which predict wider swaths of severe hail no larger than 50 mm in diameter. The MY2 and LIN ensembles both predict spatially larger swaths of hail greater than 5 mm in diameter than the HSDA output.
At the elevation of the 0.5° KOUN tilt, the LIN ensembles (Figs. 16d–f) produce a larger region of hail in the southern portion of the subdomain than the MY2 ensembles (Figs. 16a–c) and much of this hail is associated with storm D (see Fig. 4). The MY2 ensembles produce limited nonsevere hail in this region, with Dmax not exceeding 20 mm in diameter (Figs. 16a–c). The LIN ensembles predict hail as large as 35 mm in diameter in this region, a result more in agreement with the HSDA output (Fig. 15) and WSR-88D MESH (Fig. 4), although much of the hail predicted by the LIN ensembles to the south of the observed storm D is spurious. The Dmax field interpolated to the elevation of the 0.5° KOUN tilt (Fig. 16) more closely resembles the WSR-88D MESH field (Fig. 4) in both spatial coverage and size of hail than the forecast MESH field (Fig. 5) for both the MY2 and LIN ensembles.
Some of the ensembles predict storms (particularly storm D) that do not produce hail, although WSR-88D-derived MESH (Fig. 4) indicates otherwise. It is important to note that each of the observed storms, including storm D, is present in most ensembles. A swath of NEP of P(Z > 35 dBZ) for the 2215–2315 UTC forecast period (Fig. 17) reveals that all ensembles, with the exception of LIN_2145T0 (Fig. 17d), capture the storm-splitting event that produces storm A, and storms B, C, and D. Given that the storms are well captured by most of the ensembles in terms of Z but not in terms of near-surface hail (i.e., Dmax), further investigation of the MP scheme’s treatment of hail beneath the freezing layer is warranted.
d. Evaluation of microphysical budgets
To identify the cause of biases observed within Dmax and the MESH forecasts, microphysical hail source and sink terms are examined using a microphysical budget analysis. These terms are shown (Fig. 18) in the form of accumulated totals within the updrafts (w > 5.0 m s−1) and downdrafts (w < −2.5 m s−1) of supercell thunderstorms in south-central Oklahoma at 2245 UTC. Terms are summed along a horizontal plane, and the resulting summed values are plotted as a function of model grid height to illustrate the model’s treatment of hail both above and below the 0°C isotherm.
The magnitude of mh within the storm is highly dependent upon the choice of MP scheme; the LIN scheme predicts larger mh values than the MY2 scheme (e.g., Figs. 9 and 10). Most mh in the LIN scheme is generated in the storm updrafts via accretion of cloud water, accretion of snow, and three-component accretion of rain (Fig. 18e); the accretion of cloud water and snow within the LIN scheme results in the generation of more mh than do all of the MY2 mh source terms combined (Fig. 18a). The LIN scheme predicts hail that accretes relatively little rain; this is because the accretion of rain and shedding water are treated as a single term during the wet growth of hail. Because of the large accretion of snow and cloud water terms during wet growth, the LIN scheme predicts hailstones that shed more water than accrete rain. Both the MY2 (Fig. 18b) and LIN (Fig. 18f) schemes predict mh growth is limited within storm downdrafts; these regions are typically unsaturated with respect to water and do not support the accretion of cloud water. In storm downdrafts modest mh growth is attributed to hail colliding with hydrometeors such as rain in the MY2 scheme (Fig. 18b) and snow in the LIN scheme (Fig. 18f).
Large accretional mh growth terms cause the LIN scheme to predict large mh values aloft, confined to storm updrafts (Figs. 10e–h). Because the LIN scheme is an SM scheme, the hail mean mass diameter is largest aloft (Figs. 8c,d), where mh is largest. MESH, which is dependent upon Z above the freezing layer, increases as a result of the large hail predicted to be aloft by the LIN scheme (Figs. 5d–f). Within the storm updraft the mh source and sink terms for the MY2 scheme (Fig. 18a) are smaller in magnitude than those in the LIN scheme (Fig. 18e); the three most significant mass growth terms in the scheme are the accretion of cloud water, the accretion of rain, and three-component accretion of rain. Of particular note is the continued accretion of rainwater by hail throughout the melting layer in both the storm updraft and downdraft regions; this is an unphysical behavior of the MY2 scheme (Figs. 18a,b). Above the melting layer in the storm updraft region, Nth quickly increases in the MY2 forecasts (Fig. 18c). This is due to three-component accretion of rain and the conversion of graupel to hail. A large increase in the Nth above the freezing layer causes the mean mass diameter of hail to decrease aloft within storm updrafts in the MY2 simulations (e.g., Fig. 8a); this pattern of behavior is the opposite of what occurs within the LIN scheme. The tendency of the MY2 scheme to predict small hail aloft is noted in Johnson et al. (2016); these small hailstones produce lower Z, resulting in lower predicted MESH values (Figs. 5a–c).
Vertical cross sections of hail mean mass diameter (Figs. 8a,b), Dmax (Fig. 12), and hail PSDs (Fig. 11) indicate that the MY2 scheme predicts rapid growth in hail size beneath the 0°C isotherm. Analysis of the MY2 total contribution to mh within the storm updraft (Fig. 19a) indicates that the scheme predicts a net growth in mh beneath the wet-bulb freezing level. Within this layer hail accretes cloud water and rain; however. almost no water is shed (Figs. 18a,b). The inability of hail to efficiently shed water in the melting layer significantly increases hail size in the MY2 simulations. Instead of being shed, accreted water freezes to hailstones and increases mh, even when hailstones undergo melting and Nth decreases (Fig. 19b). Increasing mh and decreasing Nth causes the mean mass diameter of hail to inflate in the upper melting layer (Figs. 8a,b). In extreme examples, Nth decreases quickly and mh increases, resulting in predicted Dmax > 95 mm (Fig. 12a). Within thunderstorm downdrafts hail does not increase in size as rapidly, and the total mh and Nth tendencies are much smaller and remain neutral or decreasing throughout the vertical extent of the downdraft (Fig. 19b).
PSDs throughout the hail core predicted by the LIN scheme (Figs. 11c,d) indicate that hail can grow modestly in size near the upper melting layer, primarily as a result of three-component accretion of rain, accretion of cloud water, and accretion of snow near the 0°C isotherm (Fig. 18e). Similar to the MY2 scheme, the LIN scheme predicts the hail to shed relatively little water beneath the 0°C isotherm; however, hail growth in the melting layer is limited. This is because the scheme does not allow accretional growth of hail beneath the wet-bulb freezing level (Figs. 18e,f), causing the melting term to be dominant beneath the 0°C isotherm (Figs. 19e,f). The LIN scheme also assumes a constant intercept parameter; thus, hail diameter is a monotonic function of mh.
4. Summary and discussion
In this study, we evaluate the ability of SM and DM MP schemes in an EnKF-initialized storm-scale ensemble forecasting system to explicitly predict hail in 0–90-min forecasts for both left- and right-moving supercells in south-central Oklahoma on 10 May 2010. The experiments assimilate Z and Vr from both operational WSR-88D and experimental CASA X-band radars, as well as incorporating surface observations using EnKF and 40 ensemble members. Ensemble forecasts are then launched at 15-min intervals starting approximately 30 min prior to the first indication of hail by the WSR-88D data in terms of the MESH. The DA and forecast experiments are performed using one of two MP schemes: the SM LIN and DM MY2 schemes. To supplement the sparse surface hail reports recorded on 10 May 2010, hail products derived from WSR-88D observations are used to verify hail forecasts (as in S16); these products include MESH and the output of a HCA. Ensemble forecasts are also compared to the output of a HSDA run on WSR-88D observations.
The forecast ensembles exhibit marginal to moderate skill in predicting Z produced by the supercell thunderstorms over southern and central Oklahoma on 10 May 2010. Forecasts were poorer for the ensembles that were launched earliest (the MY2 and LIN ensembles launched at 2145 UTC); these ensembles only predict the single strongest storm. Ensembles launched later (at 2200 and 2215 UTC) predict most of the hail-producing storms, though they still exhibit limited ability to predict storms that matured after the end of the DA period. The storms in all ensemble forecasts move to the north and east more quickly than do the observed storms. Similar biases in storm motion have been noted in prior studies, including S16. The previous literature suggests storm motion biases are likely attributable to model error (Yussouf et al. 2016). This study provides no additional information for why simulated storms move too quickly.
Both MP schemes predict the geographic extent and size of hail with marginal skill for 0–75-min time scales for forecasts launched when most of the storms of interest are mature or close to mature (i.e., at 2200 or 2215 UTC). The geographic extent of hail in the ensemble forecasts resembles that of the observations, but predicted hail size often differs from radar-derived hail sizes. Ensembles launched earlier, when the storms of interest were still developing (LIN_2145T0 and MY2_2145T0), exhibit low skill in predicting hail; these ensembles underestimate the intensity of the storms throughout the duration of the experiment. Because the forecast storms consistently moved faster than the observed storms, all experiments exhibit hail swaths extending farther to the east than in surface observations or radar-indicated hail fields.
The LIN scheme predicts spatially larger swaths of hail than the MY2 scheme as a result of the microphysical treatment of hail. The LIN ensembles often predict mh values nearly an order of magnitude larger than those predicted in corresponding MY2 ensembles because of the accretion of significant quantities of cloud water and snow by hail within the LIN scheme. Because SM MP schemes assume a constant intercept parameter, both Dmax and the hail mean mass diameter are proportional to mh in the LIN scheme. Since mh is largest above the melting layer in the LIN forecasts, the largest hail sizes are similarly located above the melting layer. We note that unlike the MY2 scheme, the LIN scheme does not have a separate graupel category. When mg and mh are combined in the MY2 forecasts, the combined total exceeds mh predicted in the LIN forecasts. Because the MY2 scheme predicts Nth to modify the intercept parameter, both Dmax and the hail mean mass diameter can be increased without modifying mh in the MY2 forecasts. This allows the MY2 scheme to predict large quantities of small hailstones aloft and fewer, larger hailstones near the surface. Occasionally, the MY2 scheme predicts unrealistically large near-surface hail sizes. This behavior can be attributed to the scheme allowing hail mass to increase via the accretion of cloud and rainwater in the melting level (where Nth simultaneously decreases because of melting and no accreted water is shed). Overall, in the ensemble forecasts produced in this study, the LIN scheme predicts wide swaths of severe hail (25 < Dmax < 50 mm), while the MY2 scheme predicts localized swaths of severe hail, including some significant severe hail (Dmax > 50 mm).
The WSR-88D-derived MESH is used operationally to assess the potential size of hail at the surface based on the observed Z and temperature aloft. When the MESH algorithm is applied to the ensemble forecasts, the LIN ensembles generally overpredict MESH both in terms of intensity and spatial extent. The LIN ensembles predict significant severe hail (MESH > 50 mm) in several storms, while MESH derived from WSR-88D observations does not exceed 50 mm. The widespread intense MESH in the LIN ensembles can be attributed in part to the larger hailstones predicted within the storms, producing high values of simulated Z (Z > 50 dBZ) and, thus, very large values of MESH. In contrast, the MY2 scheme favors smaller hail, which lowers the simulated Z. Including separate high- and low-density rimed ice categories in the MY2 scheme may additionally lower Z. Overall, the MY2-predicted MESH more closely resembles WSR-88D MESH than the LIN-predicted MESH. Accurate MP representation of different hydrometeor categories is an ongoing area of research; future work focusing on the characterization of hail and graupel has the potential to mitigate errors introduced into the ensemble by the MP scheme, allowing for more accurate prediction of the size and distribution of hydrometeors within storms.
Compared to proxy surface observations, including MESH derived from WSR-88D observations, and the output of a HSDA in close proximity to the radar performed on the lowest radar tilt, all ensembles generally underpredict the explicit (i.e., Dmax) size and spatial coverage of hail near the surface. In the LIN ensembles, the default value of hail N0 of the LIN scheme results in incorrect preferential melting of the largest hailstones, limiting the size of a hailstones near the surface. In contrast, in the MY2 forecasts, the scheme decreases the Nth to preserve the mean mass diameter during melting. This is a more physical approach. In some instances accretional growth within the melting layer significantly increases the maximum hail size within the MY2 simulations. Additionally the MY2 scheme predicts that hail melts too quickly, resulting in almost no hail reaching the surface. Also, even when temperatures exceed 0°C, the MY2 scheme predicts hail to grow in diameter beneath the freezing layer as a result of the accretion of rain and cloud water (wet growth). This process is most notable in storms with intense updrafts. Because there are few in situ observations of hail near the freezing level, verification of such processes is difficult; surface-based hail reports and radar observations offer only limited information on the melting of hail aloft. Currently, WSR-88D MESH is the best method for verifying predicted surface hail despite it being mainly dependent on reflectivity observations above the freezing layer. With the polarimetric upgrade of the operational WSR-88Ds, HCA and HSDA algorithms based on low-level polarimetric data have the potential to become preferred over MESH for hail verification. For this case study, surface hail reports are limited because most of the storms occurred in a sparsely populated rural portion of southern Oklahoma; future work will investigate hail-producing storms occurring over urban areas for which more widespread surface observations are available.
One potential way of further improving hail prediction is through the use of a three-moment MP scheme, such as that of Milbrandt and Yau (2006b). Introduction of a third moment allows the scheme to predict the shape parameter of the PSD. This may help to better represent hydrometeor sedimentation and mitigate the melting bias noted in this study, potentially allowing more realistic quantities of hail to reach the surface. Other recently developed schemes, such as those of Mansell et al. (2010) and Milbrandt and Morrison (2013), attempt to better represent the variety of rimed ice particles present in a storm. These MP parameterizations show promise in improving hail prediction. By explicitly predicting bulk density of graupel and/or hail, these schemes allow a greater range of possible particle characteristics. In another recently developed method, the predicted particle properties (P3) scheme of Morrison and Milbrandt (2015) and Morrison et al. (2015), the rimed ice category predicts the rimed ice number concentration, volume, mass, and mixing ratio, allowing for four degrees of freedom for describing rimed ice particles. The use of these new MP schemes may further improve short-term hail prediction.
This study finds that 0–90-min hail predictions are unreliable near the surface, regardless of the complexity of the MP scheme. The skill levels of surface hail size forecasts are intrinsically tied to the model’s ability to correctly represent microphysical processes of hail formation, growth, and decay; such forecasts will thus likely continue to show limited skill unless the treatment of hail, particularly during melting processes, is improved within the model. The forecast fields that perform best in this study are those not impacted by the effects of near-surface hail melting: mh and Dmax at the vertical location of the 0.5° radar tilt, which is relatively far from the radar site, as well as MESH. Thus, this study highlights the importance of improving the performance of MP schemes with regard to the treatment of the melting of hail. A more rigorous examination of the impact of using different rimed ice categorizations on hail prediction in future studies could help clarify the benefits and impacts of more recently developed MP schemes that allow for a spectrum of rimed ice characteristics, such as the P3 scheme. Further research on these problems is strongly desirable as this study and other recent studies have shown the potential of explicit hail prediction.
This works was primarily supported by NSF Grant AGS-1261776 as part of the Severe Hail, Analysis, Representation, and Prediction (SHARP) project. Computing was performed primarily on the XSEDE Stampede supercomputer at the University of Texas Advanced Computing Center (TACC). The authors thank Kevin Thomas for support in obtaining data; David Gagne, Cameron Homeyer, and Amy McGovern, who provided useful feedback on the project analysis. We also thank Ted Mansell and two anonymous reviewers, whose constructive comments helped to improve the manuscript.