Despite recent advances in storm-scale ensemble NWP, short-term (0–90 min) explicit forecasts of severe hail remain a major challenge as a result of the fast evolution and short time scales of hail-producing convective storms and the substantial uncertainty associated with the microphysical representation of hail. In this study, 0–90-min ensemble hail forecasts for the supercell storms of 20 May 2013 over central Oklahoma are examined and verified, with the goals of 1) evaluating ensemble forecast performance, 2) comparing the advantages and limitations of different forecast fields potentially suitable for the prediction of hail and severe hail in a Warn-on-Forecast setting, and 3) evaluating the use of dual-polarization radar observations for hail forecast validation. To address the challenges of hail prediction and to produce skillful forecasts, the ensemble uses a two-moment microphysics scheme that explicitly predicts a hail-like rimed-ice category and is run with a grid spacing of 500 m. Radar reflectivity factor and radial velocity, along with surface observations, are assimilated every 5 min for 1 h as the storms were developing to maturity, followed by a 90-min ensemble forecast. Several methods of hail prediction and hail forecast verification are then examined, including the prediction of the maximum hail size compared to Storm Prediction Center (SPC) and Meteorological Phenomena Identification Near the Ground (mPING) hail observations, and verification of model data against single- and dual-polarization radar-derived fields including hydrometeor classification algorithm (HCA) output and the maximum estimated size of hail (MESH). The 0–90-min ensemble hail predictions are found to be marginally to moderately skillful depending on the verification method used.
Severe hail is a major weather hazard that can cause injury and property damage. When severe hail strikes an urban area, the potential for injury is increased, and damage from a single event can exceed $1 billion. For example, the “Mayfest” hailstorm that struck Fort Worth, Texas, on 5 May 1995 resulted in 109 hail-related injuries and caused over $2 billion in damage (Edwards and Thompson 1998). It is estimated that hail causes over $1 billion in crop damage and $1 billion in property damage annually (Jewell and Brimelow 2009). Despite these impacts, direct prediction of hail using NWP ensembles has received relatively little attention compared to the prediction of other thunderstorm hazards (e.g., tornadoes, severe winds, and heavy rainfall).
Forecasting severe hail is a challenging problem for numerical weather prediction (NWP). Explicit prediction of hail requires the NWP model to predict the development, motion, and intensity of convective storms accurately; furthermore, hail in the model is closely tied to microphysical parameterization, which is a major source of model uncertainty and error at the convective scale. The predictability of hail is also limited by the rapid development and evolution of the convective storms that produce it; forecast errors grow quickly at the convective scale, necessitating an accurate estimation of the atmospheric state and information on the level of uncertainty in this estimate. Ensemble forecasting is needed to provide forecast uncertainty information.
Ensemble Kalman filter (EnKF; Evensen 1994, 2003) data assimilation (DA) techniques have been widely applied in recent years to produce skillful NWP analyses and forecasts ranging from global to convective scales (Houtekamer and Mitchell 1998; Hamill and Snyder 2000; Anderson 2001; Whitaker and Hamill 2002; Snyder and Zhang 2003; Dowell et al. 2004; Zhang et al. 2004; Tong and Xue 2005; Dirren et al. 2007; Tong and Xue 2008a; Snook et al. 2011; Dawson et al. 2012). EnKF methods employ an ensemble of forecasts to sample the error covariances of the background state; the resulting flow-dependent covariances and cross covariances allow state variables not directly observed to be retrieved during DA (e.g., Tong and Xue 2008a)—a feature that is particularly important at the storm scale, where radar data are vital, but observe only a very limited number of variables. Retrieval of wind, temperature, and microphysical fields can be achieved through EnKF DA of radar data (e.g., Dowell et al. 2004; Tong and Xue 2005; Tong 2006; Snook et al. 2011), and the quality of the resulting analyses can be further improved when conventional observations are also assimilated (Snook et al. 2015). Maximizing the accuracy of the initial state in this way is vital for subkilometer-scale NWP applications, such as hail prediction. EnKF-based ensembles have been shown to produce more skillful probabilistic forecasts at the global scale in operational and quasi-operational settings (e.g., Houtekamer et al. 2005; Hamill and Whitaker 2011). Quasi-operational forecast ensembles incorporating cycled EnKF DA are also being applied experimentally for real-time storm-scale NWP as part of the annual Center for Analysis and Prediction of Storms (CAPS) spring experiment (e.g., Xue et al. 2008; CAPS 2015) toward the goals of the “Warn on Forecast” vision (Stensrud et al. 2009).
Forecast verification is another challenging aspect of hail prediction, in large part because of the lack of high quality, high-resolution hail observation datasets. Most currently available in situ hail observation datasets, including storm reports collected and archived by the Storm Prediction Center (SPC) and crowdsourced reports submitted and archived via the National Severe Storms Laboratory (NSSL) Meteorological Phenomena Identification Near the Ground (mPING) smartphone application (Elmore et al. 2014), rely on reports from members of the public and, thus, may suffer from substantial biases. These biases include substantial underreporting of hail in sparsely populated regions (Cintineo et al. 2012), as well as a bias in reported hail size toward the sizes of common circular or spherical objects (e.g., dime-, quarter-, and baseball-sized hail), because such objects are often used as references in hail reports submitted by the public (Schaefer et al. 2004).
A cost-efficient method of addressing the limitations of observational hail data is the use of radar observations, particularly from dual-polarization radars, as a supplementary source of hail information. Radar data are already used operationally to estimate hail size and coverage via the maximum estimated size of hail (MESH) algorithm (Witt et al. 1998). The MESH algorithm uses a weighted integration of the reflectivity factor exceeding 40 dBZ above the melting layer to estimate the maximum size of hail occurring at the surface. While the application of MESH can provide high-resolution gridded estimates of hail coverage, biases and errors are possible as a result of the fact that MESH relies entirely upon observations in the upper levels of the atmosphere to estimate hail size at the surface. Less indirect observations of near-surface hail can be obtained using dual-polarization radar observations.
Between 2011 and 2013, the WSR-88D S-band radar network (Crum et al. 1993) used operationally by the NWS was upgraded with the capability to collect dual-polarization radar observations. Dual-polarization radars observe the atmosphere using both horizontally and vertically polarized radar beams. Observed fields obtained from these data, such as differential reflectivity Zdr, specific differential phase Kdp, and copolar correlation coefficient ρhv, allow us to infer storm properties such as the types of hydrometeors present, their relative sizes, and the presence of nonmeteorological scatterers such as insects or debris through the identification of polarimetric signatures (Kumjian and Ryzhkov 2008). Using dual-polarization observations, regions of hail can be identified either by manually examining these polarimetric signatures, or by applying an automated hydrometeor classification algorithm (HCA). In this way, dual-polarization radar observations can be used as a high quality supplementary data source to provide estimates of the geographic distribution of hail. HCAs (e.g., Vivekanandan et al. 1999; Heinselman and Ryzhkov 2006; Park et al. 2009; Lim et al. 2013; Bechini and Chandrasekar 2015) have been successfully applied in research and operational settings for the detection of hail, and have provided a basis for hail size discrimination using polarimetric radar observations (e.g., Ryzhkov et al. 2013).
In this study, we explore the ability of a storm-scale ensemble using subkilometer horizontal grid spacing to produce explicit 0–90-min ensemble hail forecasts directed toward model-based Warn-on-Forecast prediction of hail and severe hail. The goals of this study include 1) evaluating the performance of the ensemble in terms of hail forecast probability; 2) comparing the relative advantages, limitations, and biases of different hail-related model forecast fields suitable for the prediction of hail and severe hail in a Warn-on-Forecast setting; and 3) evaluating the use of dual-polarization radar observations for hail forecast validation. To accomplish these goals, the Advanced Regional Prediction System (ARPS; Xue et al. 2000, 2001) atmospheric model is used together with its ensemble Kalman filter DA system (Tong and Xue 2005; Xue et al. 2006; Tong and Xue 2008b) to assimilate WSR-88D radar reflectivity factor and radial velocity data and surface observations for the supercell storms that occurred over central Oklahoma during the afternoon of 20 May 2013. The ensemble analyses produced are used to initialize ensemble forecasts of the supercell storms. These forecasts are verified against WSR-88D observations, and their skill in predicting hail is assessed using several different methods.
Probabilistic forecasts of maximum hail size are produced explicitly from the number concentration and mass mixing ratio predicted by the model microphysics, as well as by applying the MESH algorithm used operationally by the National Weather Service (Witt et al. 1998) to simulated radar observations from the ensemble forecasts; these forecasts are verified against surface hail observations and MESH calculated from WSR-88D observations. The hail mass field within the ensemble is also verified against observed dual-polarization data via the application of a fuzzy-logic-based HCA (Park et al. 2009) to identify hail swaths in the observed data. Where possible, forecast skill is objectively assessed using metrics such as equitable threat score (ETS) (Schaefer 1990), the area under the relative operating characteristic curve (AUC), and reliability diagrams.
The remainder of this paper is organized as follows. In section 2, we provide a summary of the 20 May 2013 supercell case, introduce the ensemble DA and forecast systems used, and describe the experiments performed and the verification techniques used. Section 3 discusses the results of the ensemble forecasts, as well as the results and limitations of the hail forecast verification techniques examined. In section 4, overall hail prediction results are summarized, and the relative merit of the hail prediction methods and implications for future hail prediction and Warn-on-Forecast efforts are discussed.
2. Data and methods
On the afternoon of 20 May 2013, multiple supercell storms formed and rapidly intensified along a dryline over central and southwestern Oklahoma between 1830 and 1930 UTC. These storms moved to the northeast, passing through central Oklahoma during the subsequent hours, producing numerous reports of severe hail and several tornadoes, including the [enhanced Fujita (EF) scale] EF5 Newcastle–Moore tornado (Burgess et al. 2014). For more a detailed discussion of the storms of 20 May 2013, we refer the reader to Zhang et al. (2015). Ensemble analyses and forecasts are produced using the ARPS and its EnKF DA system (Xue et al. 2006; Tong and Xue 2008a).
The ensemble forecasts are performed using a horizontal grid spacing of 500 m on a grid consisting of 603 × 653 × 63 grid points that covers much of Oklahoma and portions of far northern Texas (Fig. 1). The model grid is stretched in the vertical, with a minimum vertical spacing of 50 m at the surface and an average vertical spacing of 425 m. Our EnKF DA and ensemble forecasting uses 40 members and is initialized at 1800 UTC on 20 May 2013 from the 40-member CAPS real-time storm-scale ensemble forecasts (SSEFs; Kong et al. 2014) based on version 3.4.1 of the ARW (Skamarock et al. 2008) model at a horizontal grid spacing of 4 km. The SSEFs also provide lateral boundary conditions for the ensemble on our 500-m grid. The ARPS model settings largely follow Snook et al. (2015): radiation is parameterized via the NASA Goddard Space Flight Center long- and shortwave radiation parameterization, a two-layer soil model with surface fluxes parameterized using predicted surface temperature and water content is applied, subgrid turbulence is parameterized using a 1.5-order turbulent kinetic energy (TKE) based scheme, and terrain data are interpolated from a USGS dataset with a resolution of 30 s of arc. References on these schemes can be found in Xue et al. (2001) and Snook et al. (2015). Microphysical processes are parameterized using the two-moment microphysics scheme of Milbrandt and Yau (2005), which we will henceforth refer to as MY2.
From the interpolated 1800 UTC SSEF forecasts, 30-min spinup ensemble forecasts are first performed. Storm-scale perturbations are added to the ensemble at 1800 UTC in order to enhance small-scale perturbations absent in the parent (4-km grid spacing) ensemble and improve the ensemble spread in the 500-m forecast experiment; they are smoothed Gaussian perturbations generated using a recursive filter with horizontal and vertical decorrelation scales of 6 km. The smoothed perturbations are applied to the horizontal wind (u, υ) and potential temperature θ fields with mean standard deviations of 0.5 m s−1 and 0.5 K, respectively. From 1830 to 1930 UTC, available radar and surface observations are assimilated at 5-min intervals using the ARPS EnKF system (Xue et al. 2006; Tong and Xue 2008b), which is based on the ensemble square root filter (EnSRF) algorithm of Whitaker and Hamill (2002). At the end of the assimilation period, 90-min ensemble forecasts are performed from 1930 to 2100 UTC.
During EnKF DA, observation errors for radar data are assumed to be 4.0 m s−1 for the radial velocity and 6.0 dBZ for the radar reflectivity factor (Snook et al. 2013); observation errors for surface observations are assumed to be 1.5 m s−1 for u and υ, 1.5 K for air temperature, 2.0 K for dewpoint, and 2.0 hPa for air pressure. The radar observation operators used follow Jung et al. (2008), and a Gaussian power-gain function (Wood and Brown 1997) is used in the vertical to provide beam pattern weighting in the operators (Xue et al. 2006). Covariance localization with a radius of 3 km in the vertical and horizontal is applied for radar data; for surface observations, localization radii of 6 km in the vertical and 300 km in the horizontal are used. To maintain ensemble spread during DA, relaxation-to-prior-spread covariance inflation (Whitaker and Hamill 2012) is applied, with an inflation coefficient of 0.95. These localization and covariance inflation settings were set based on a set of earlier sensitivity experiments and drew upon experience from our earlier studies. Radar data from five WSR-88D radar sites are assimilated: Oklahoma City, Oklahoma (KTLX); Vance Air Force Base, Oklahoma (KVNX); Frederick, Oklahoma (KFDR); Fort Worth, Texas (KFWS); and Dyess Air Force Base, Texas (KDYX). Together, these WSR-88D sites give radar coverage over the full domain, with the full line of storms observed by at least two radars during the assimilation period (Fig. 1). Surface observations from Oklahoma Mesonet sites within the domain are assimilated.
Hail within the model is closely linked to the microphysics parameterization; for this reason, the ability of the microphysics scheme to accurately capture processes related to hail formation and growth strongly impacts the quality of the resulting hail forecast. All experiments use the MY2 microphysical parameterization scheme, in which the hydrometeor particle size distribution (PSD) is represented by a gamma distribution function:
where N(D) is the number of drops/particles of a given diameter, N0 is the intercept parameter, α is the shape parameter, and λ is the slope parameter. In MY2, hydrometeor mixing ratios and number concentrations are predicted, allowing two unknowns, N0 and λ, to be retrieved, while α is held constant at 0. The variation of hydrometeor PSDs permitted by this two-moment microphysical scheme allows a better representation of microphysical processes, including size sorting and melting, than is possible using a single-moment scheme. This improved representation is important because size sorting and melting are the primary contributors to predicted dual-polarization radar signatures within the model (Dawson et al. 2014). In contrast, single-moment microphysics schemes are generally inadequate for realistically representing dual-polarization signatures (Dawson et al. 2010; Jung et al. 2010; Jung et al. 2012; Putnam et al. 2014). The results, analyses, and forecasts are presented in the next section.
In this section, the ensemble analyses and 0–90-min ensemble forecasts produced by the model are evaluated using several qualitative and quantitative verification methods, with a focus on model representation and prediction of hail. The ensemble analyses are qualitatively evaluated in terms of the predicted storm structure and radar reflectivity factor. Quantitative verification of predicted hail is performed using the methods summarized in the previous section. Maximum hail size predicted explicitly from the hail PSD within the model (using methods described below in section 3a), and via application of the MESH algorithm to radar reflectivity factor and temperature fields predicted by the model, are verified against MESH observations calculated from WSR-88D radar observations. Dual-polarization radar observations are also used for verification: hail swaths predicted by the ensemble, in terms of the hail mass field, are verified qualitatively against dual-polarization radar observations via application of a fuzzy-logic-based HCA (Park et al. 2009).
The final EnKF analysis at the end of the DA period (1930 UTC) is compared to radar observations in terms of the radar reflectivity factor in Figs. 2a and 2b. By this time, the line of supercell thunderstorms was relatively mature, with several cells possessing robust mesocyclones and radar presentation typical of supercell storms. The EnKF analysis accurately captures the developing line of supercell storms; the probability-matched (PM; Ebert 2001) ensemble mean analysis (Fig. 2b) closely matches the observed reflectivity (Fig. 2a). The skillful nature of the EnKF analysis makes it a suitable initial condition for the subsequent ensemble hail prediction forecasts. Some spurious light precipitation is present in the northeastern portion of the forecast domain at 1930 UTC (Fig. 2b), but this spurious precipitation quickly dissipates, and is almost entirely absent from the 15-min ensemble PM mean forecast at 1945 UTC (Fig. 2d), which still shows good agreement with observations (Fig. 2c) at that time.
a. Verification of forecasts of PSD-estimated maximum hail size
The forecast ensemble contains direct information about hail within the model from the output of the microphysical scheme. Using the gamma distribution assumed by the model [defined by (1)], we can directly obtain the hail PSD at each model grid point given the hail mixing ratio qh and the hail total number concentration nh. To translate the model PSD into a meaningful estimate of maximum hail size, we must consider the nature of the hail reports—typically, the size of the largest hailstones is reported. The PSD defined by (1) does not have a corresponding maximum diameter value; instead, it extends to arbitrarily large values of hail diameter (with extremely low probabilities). We address this by defining the maximum observable hail diameter Dmax, estimated from the hail PSD, as the diameter value for which m−3. This definition, which is similar to the definition proposed by Milbrandt and Yau (2006a), is chosen subjectively for delineating between PSD-defined hail concentrations that would be physically observable and those that would not.
A swath of maximum Dmax between 1930 and 2100 UTC, calculated using the PM ensemble mean at 5-min intervals, is presented in Fig. 3. SPC storm data and mPING hail reports received during the forecast period are indicated in Fig. 3 by orange dots. Though radar observations indicated several substantial hail swaths (which will be discussed in greater detail below), SPC and mPING hail reports were rather sparse. Only four reports of hail were received between 1930 and 2100 UTC, all of which were associated with the northernmost supercell, and all within the Oklahoma City metropolitan area (OKC metro). The forecast ensemble predicted two hail swaths in the vicinity of these reports (Fig. 3); the two most northwesterly reports fall directly along and within the track of a small but intense hail swath extending through the northwestern portion of the OKC metro area, while the other two reports fall along the southern edge of a larger swath of small to marginally severe hail crossing the southeastern portion of the OKC metro area and extending several tens of kilometers to the northeast. The forecast PM ensemble mean predicted another swath of hail to the south of the OKC metro area, though no SPC or mPING hail reports were received in the vicinity of this predicted hail swath.
From the ensemble, we can also derive the probability of Dmax exceeding a specified threshold. During severe weather operations, NWS forecasters issue severe thunderstorm warnings for storms likely to be producing hail 1 in. (25.4 mm) or greater in diameter at the surface. While there is no commonly accepted definition on the minimum diameter of hail, the smallest hail size commonly accepted by mPING or the NWS is 0.25 in. (6.35 mm), corresponding to pea-sized hail. With these guidelines in mind, we consider two thresholds: one to identify severe hail within the ensemble (25 mm) and one to identify areas where any hail is present (5 mm). The probability of Dmax exceeding (Fig. 4a) 5.0 mm and (Fig. 4b) 25.0 mm using a neighborhood ensemble probability (NEP; Schwartz et al. 2010) method are presented in Fig. 4. For NEP calculations, a neighborhood radius of 2.5 km is used. Severe reports exceeding the specified diameter are indicated by orange dots in Fig. 4.
The region of P(Dmax > 5 mm) > 0.4 (Fig. 4a) roughly corresponds to the region of Dmax > 5 mm in the PM ensemble mean (Fig. 3). Areas of lower probability extend primarily to the north and east, reflecting variation within the ensemble in the speed and direction of motion of the storms and in the length of time that hail was produced. Similar as for the PM mean (Fig. 3), the observed SPC and mPING hail reports fall near or within areas of high predicted probability of hail (Fig. 4a), though reports are absent in the region of high probability in the rural areas northeast of the OKC metro area. Two of the received hail reports were severe (reported hail diameter > 1 in.); these reports, which occurred in the southeastern part of the OKC metro area, are on the southern edge of a relatively low-probability region for severe hail probability [P(Dmax > 25 mm); Fig. 4b] within the ensemble. Overall, the probability of severe hail for this case was relatively modest, with the highest probabilities of around 0.3–0.6 occurring in two swaths over the OKC metro area. The proximity of the reported severe hail to these swaths, however, is an encouraging indication of model skill for explicit short-term severe hail prediction.
b. Radar-based ensemble hail forecast verification using MESH
As noted in our discussion of methods, a major limitation of hail reports is a substantial population bias (e.g., Wyatt and Witt 1997; Davis and LaDue 2004). Most hail reports are received in high-population regions, with substantial underreporting in rural areas. For this reason, it is unclear whether the large areas of hail predicted within the model to the north and east of the OKC metro area (Figs. 3 and 4a) are false detections, or whether hail occurred in those areas but was not reported. WSR-88D radar observations, which offer full volumetric coverage of the storms that occurred on 20 May 2013, can be used as an indirect source of observations to address this ambiguity. In this section, we consider two methods of radar-based hail verification: prediction and verification of MESH and verification of ensemble-predicted regions of hail against areas of hail identified by dual-polarization WSR-88D observations using an implementation of the fuzzy-logic-based HCA of Park et al. (2009).
MESH (Witt et al. 1998) is calculated using a weighted integration of the radar reflectivity factor exceeding 40 dBZ above the melting level to estimate the maximum size of hail occurring at the surface. Temperature and simulated reflectivity factor, calculated using the radar observation operator of Jung et al. (2008), are readily available from our forecast ensemble, making it straightforward to directly apply the MESH algorithm to produce gridded forecast MESH fields. For verification, we interpolate hourly MESH observations from the WSR-88D network to our model grid. A comparison between the WSR-88D observed hourly MESH swath from 1900 to 2100 UTC (Fig. 5a), the available verification period most closely corresponding to the 1930–2100 UTC forecast period, and the ensemble PM mean mesh swath during the forecast period (Fig. 5b) shows a relatively favorable comparison between the forecast ensemble and the observations. In the observed MESH field (Fig. 5a), three primary MESH swaths are present, corresponding to the Newcastle–Moore supercell over the OKC metro area and two other supercells to the south. These three primary swaths are also present in the PM mean forecast (Fig. 5b), in locations that match well with the observed swaths. The MESH forecasts show skill (ETS > 0) at the 5- and 25-mm thresholds, with ETS values of 0.28 and 0.12 at these thresholds, respectively.
Compared to the observations, the forecast PM mean MESH swaths have narrower cores of high MESH values and higher maximum values for the northern two storms (Fig. 5). Because storms in many ensemble members moved to the northeast faster than the observed storms, the forecast swaths extend farther northeast than the observed swaths; this behavior is most prominent in the northernmost storm. In contrast, the ensemble underpredicted the extent of MESH in the southernmost storm, located near the Oklahoma–Texas border (Fig. 5). While hail was present at upper levels in the southern storm in the ensemble, as we will discuss below, this storm, in many ensemble members, exhibited a relatively low radar reflectivity factor (<45 dBZ) within the updraft core in the upper portion of the storm (not shown). Because MESH is calculated using a weighted integration of the radar reflectivity factor exceeding 40 dBZ above the 0°C level, this led to low MESH values in the southern storm within the ensemble.
MESH, which uses the reflectivity factor aloft as a proxy for estimating hail size at the surface, is employed operationally to produce estimates of maximum hail size that may or may not agree with the microphysically predicted Dmax at the surface. When ensemble-predicted MESH (Fig. 5b) is compared to predicted Dmax at the surface (Fig. 3), however, the MESH estimate of maximum hail diameter at the surface exceeds the actual Dmax at the surface in most areas, and the hail swaths predicted by MESH cover a larger region for each of the hail-producing storms in this case. At least for this case, MESH, when applied to the ensemble, appears to have a high bias in predicted hail size, including prediction of nonsevere hail in areas near hail-producing storms where no actual hail was present in the microphysical fields at the surface (Figs. 3 and 5b). Unfortunately, because high-resolution observed hail size data are not available at the surface, it is not feasible to determine whether similar biases in MESH are also present in the observations, though prior studies have noted high biases in MESH calculated from WSR-88D observations (e.g., Ortega et al. 2009; Cintineo et al. 2012).
c. Verification of the ensemble hail forecast using dual-polarization radar observations
Dual-polarization radar observations provide a method for obtaining dense, high quality estimates of the spatial distribution of hail in the area observed by the radar; such observations are well suited for hail forecast verification. To estimate the hydrometeor type from observed dual-polarization radar data, we apply a variant of the HCA of Park et al. (2009), which uses fuzzy-logic-based classification to determine the dominant hydrometeor type from observed radar data (horizontally polarized radar reflectivity factor Z, differential reflectivity Zdr, and copolar correlation coefficient ρhv) and the model vertical temperature profile. The variant of the Park et al. (2009) HCA used in this study applies the membership functions and weights for Z, Zdr, and ρhv, as well as for standard deviations of Z and differential phase along radials. Confidence vectors are not used; despite their omission, however, HCA output using this configuration did not appear to suffer from any data quality issues. The radar data used in the HCA underwent preliminary quality control, including the rejection of radar observations, where ρhv was less than 0.85. As the model directly predicts hail, we compare observed HCA-generated swaths directly to ensemble-predicted swaths of hail defined using the hail mass field mh,
where qh is the hail mixing ratio and ρair is the air density.
HCA-indicated hail swaths from the lowest-elevation scans of KTLX and KFDR (the two nearest WSR-88D radars to the storms in this case study) between 1930 and 2100 UTC are shown in Fig. 6, with the elevation of the beam above mean sea level indicated. These swaths are used to verify ensemble forecasts of P(mh > 0.03 kg m−3) in Fig. 7, with HCA-indicated hail swaths marked by the contour and the probabilistic ensemble forecast shaded. In Fig. 7, the model mh field, calculated using (2), is interpolated to the same vertical level as the HCA-indicated hail swaths plotted in Fig. 6, resulting in mh on the model grid in the horizontal, but located at the elevation of the radar beam in the vertical. The HCA-indicated hail swaths, smoothed for visual clarity, are also plotted in Fig. 7 for comparison and verification. The threshold of 0.03 kg m−3 for mh used in Fig. 7 was chosen based on preliminary experiments using a wide range of mh thresholds; lower mh thresholds produced forecasts with a higher probability of detection, but resulted in substantial overprediction of the geographic extent of hail swaths, while higher thresholds improved forecast reliability but reduced the probability of detection (not shown). The threshold of 0.03 kg m−3 was found subjectively to produce the best balance between forecast reliability and the high probability of detection, at least for this case.
The ensemble forecast NEP swath for P(mh > 0.03 kg m−3) generally captures the three main hail-producing cells, producing relatively skillful forecasts over the portion of the domain observed by the radars, generating forecasts with AUC values of 0.904 for KTLX (Fig. 7a) and 0.899 for KFDR (Fig. 7b). For reference, a perfect forecast produces an AUC of 1.0, while a forecast with no skill produces an AUC of 0.5. The ensemble-predicted hail swaths also show fair to good forecast reliability (Fig. 8), although hail swath coverage is substantially overforecast when verified against the KFDR observations (Fig. 8b). In both the model and the observations, the geographic extent of the hail swaths is greater at higher altitudes (i.e., farther from the radar site). This pattern is more prominent in the model swaths than in the observed data; the model shows large hail swaths in the northern two storms several kilometers above the surface for the KFDR radar surface (Fig. 7b), but relatively limited hail from these storms reaching the near-surface region observed by KTLX (Fig. 7a). Likewise, the model predicts a hail swath close to the HCA-indicated hail swath for the southern storm along the Texas–Oklahoma border for KTLX (Fig. 7a), but almost no hail for this storm over the near-surface region indicated by the HCA in the KFDR observations (Fig. 7b).
When the ensemble-predicted MESH field (Fig. 5b) is compared to the predicted Dmax at the surface (Fig. 3), it is apparent that the MESH field consistently has greater geographic coverage than the Dmax field. MESH, which relies on the model state above the 0°C level, indicates the presence of abundant hail and graupel in the three hail-producing storms, with geographic coverage comparable to MESH calculated from WSR-88D observations (Fig. 5). However, the predicted PM mean Dmax field shows less extensive hail swaths in each of the three storms, with almost no hail predicted for the southernmost storm. While the intensity of the southernmost storm is underpredicted in all forecast fields we have considered thus far, the systematic difference between MESH and Dmax at the surface is also apparent in the northern storms where forecast MESH agreed well with MESH calculated from WSR-88D observations.
To help illuminate the reasons for the noted differences in the MESH and Dmax fields, we examine vertical cross sections of the hail and graupel mass fields taken through the core of the northern storm in an individual ensemble member (Fig. 9). While the cross sections in Fig. 9 are taken from a single member, they are typical of such cross sections within the ensemble. Both hail and graupel are present in the storm: hail is tightly concentrated, while graupel is more widespread within the upper levels of the storm. These results suggest that contributions to reflectivity factor from graupel are in part responsible for the larger geographic extent of MESH swaths compared to swaths of Dmax at the surface. We also note that while substantial hail mass is present in the mid- and upper levels of the storm, exceeding 4.0 kg m−3, the mass of hail reaching the surface is much lower (approximately 0.4–0.6 kg m−3), consistent with the pattern of low mh at lower altitudes noted above in our discussion of Fig. 7. These results are consistent with the findings of Johnson et al. (2016), who compared several two-moment microphysics schemes for simulations of an idealized supercell storm produced using the WRF Model and found that the size of the hail simulated with MY2 was relatively small, making hail in the MY2 scheme prone to melting quickly.
While gridded hail observations are not available at the surface, we can see a similar systematic difference between the geographic coverage of MESH (Fig. 5a) and hail swaths identified in the KTLX and KFDR radar data via the HCA (Fig. 6). The extent of the observed MESH swaths is greater than the extent of the HCA-indicated hail swaths near the surface (e.g., in areas less than 2 km above the surface in Fig. 6), but of similar geographic extent for HCA-based observed hail swaths farther aloft (e.g., in areas 3 km or more above the surface in Fig. 6). This suggests that the discrepancy between MESH and hail swath extent is present to some extent in the observations for this case and is not entirely due to model error (though model error, particularly resulting from the microphysical scheme and faster storm motion, is likely still a sizeable contributor). There are several possible sources of bias in the HCA classifications. For example, it is possible for the HCA to misclassify areas of very heavy rain as containing hail. It is also possible that the HCA-indicated swaths might miss some regions of small hail when they are mixed with intense rainfall, because the HCA identifies only the dominant hydrometeor category; in such a case (low concentrations of small hail mixed with heavy rainfall), the HCA might indicate only rain instead of rain mixed with hail.
4. Summary and discussion
In this study, we evaluated the ability of an EnKF-based storm-scale ensemble forecast (at 500-m horizontal grid spacing) to explicitly predict hail on a 0–90-min time scale for the supercell thunderstorms that occurred over Oklahoma on 20 May 2013. Radar radial velocity and reflectivity data, along with surface observations, were assimilated into a 40-member ensemble using the ARPS EnKF system; at the end of the data assimilation period, an ensemble of 90-min forecasts was launched. The ensemble analyses and forecasts generally captured the three major observed hail-producing supercells in terms of location, size, and evolution. Some modest errors were noted in the forecast; the predicted storms in the ensemble moved to the northeast somewhat more quickly than the observed storms, and the southernmost of the three main hail-producing storms weakened somewhat in the forecast ensemble, contrary to observations, late in the forecast period.
Ensemble-based hail forecasts were produced and verified using several methods, including the prediction of maximum hail diameter at the surface from the number concentration and mass mixing ratio predicted by the model microphysical scheme, prediction of maximum estimated size of hail (MESH) obtained by applying the operational MESH algorithm to the ensemble output, and direct prediction of hail mass within the ensemble. These predictions were verified against several different data sources, including observed hail reports at the surface in data collected by the SPC and mPING, MESH calculated from WSR-88D radar observations, and hail swaths obtained from WSR-88D dual-polarization observations using a fuzzy-logic-based hydrometeor classification algorithm.
The ensemble produced forecasts ranging from marginally skillful to highly skillful depending on the forecast and verification method chosen. Surface hail reports were relatively sparse from this case (only four reports were recorded in SPC and mPING data), but all occurred very near or within hail swaths predicted by the ensemble. Population bias in the human reports was likely a major factor in this case; the only surface hail reports received were located within the Oklahoma City metropolitan area, while radar-based hail observations showed large hail swaths in rural areas as well as hail over the metropolitan area. Of the verification methods examined, ensemble prediction of MESH showed the closest agreement with the observations (in this case, MESH calculated from WSR-88D observations interpolated to the model grid). The predicted MESH field correctly identified hail swaths in the three main hail-producing storms, and showed relatively good agreement with observations in terms of maximum hail size and swath extent, except for some underprediction of MESH in the southernmost storm; objective verification of the MESH forecast using the equitable threat score (ETS) indicated positive skill (ETS > 0) both for the prediction of all hail (>5 mm) and for the prediction of severe hail (>25 mm). Forecasts of hail mass, when verified against hail swaths identified in dual-polarization WSR-88D observations by the hydrometeor classification algorithm, similarly showed good skill, but revealed a systematic pattern of abundant hail present aloft, but not much hail actually reaching the surface. While the geographic extent of hail swaths observed at upper levels was generally larger than that near the surface, this pattern appeared overly strong in the model, with hail in some swaths (such as for the southernmost storm) melting almost entirely before reaching the surface. Microphysical representation of hail, particularly in multimoment microphysical schemes, is an ongoing area of research; it is hoped that model performance can be improved by future work focusing on the characterization and mitigation of model errors related to the microphysical treatment of hail.
Using dual-polarization WSR-88D observations as a supplemental source of information about the location and extent of hail within the storms using a hydrometeor classification algorithm (HCA) provided a valuable means of validating forecasts of the hail field within the ensemble. In particular, the full 3D coverage of the hail region estimates obtained from the HCA allowed for validation of the ensemble hail forecast even in regions of sparse surface hail observations, as well as aloft. We note, however, that such validation is subject to possible error in the HCA; this is a possible topic for future study. Further, since the HCA does not tell us the exact size or the amount of hail present, it should be used in combination with other sources of observations whenever possible.
One way that hail prediction may be improved is through better explicit prediction of the properties of rimed ice. For example, prediction of the maximum hail size at the surface may be improved by using a three-moment scheme (Milbrandt and Yau 2006b). Mansell et al. (2010) and Milbrandt and Morrison (2013) predict bulk graupel density to account for the wide variation in the density of rimed particles. More recently, the predicted particle properties (P3) scheme, which predicts the total ice mass, ice number, ice mass from rime growth, and bulk rime volume, was shown to simulate wide ranges of rimed-ice particle size distributions reasonably well for squall-line and orographic precipitation cases (Morrison and Milbrandt 2015; Morrison et al. 2015). Future studies using these more advanced microphysics schemes could offer new insights and advances in hail prediction.
Prediction of maximum hail diameter at the surface from the number concentration and mass mixing ratio predicted by the model microphysics produced hail swaths in close proximity to observed surface hail reports, though surface reports were relatively sparse and limited to the Oklahoma City metropolitan area. Because hail surface observations are often relatively sparse, as was true in this case, we note that it is difficult to quantify the advantage of ensemble hail predictions over “nowcasting” methods (e.g., extrapolating based upon the presence of a supercell and estimated storm motion), although we note that nowcasting does not provide quantitative prediction of the size and geographic distribution of hail. To better quantify the benefit of ensemble hail predictions, additional study will be needed, ideally including a variety of storms ranging from those not producing hail at the surface to those producing very large hail at the surface. It may also be beneficial in future studies to apply a hail size discrimination algorithm, such as that of Ryzhkov et al. (2013), to polarimetric radar observations as a supplemental source of information on hail size.
Overall, the results of this study demonstrate the promise of high-resolution, storm-resolving ensembles using state-of-the-art data assimilation schemes to skillfully predict hail on time scales of up to 90 min, as well as the utility of radar observations (especially dual-polarization radar observations) as a supplemental source of information for hail verification. We note, though, that additional studies (including storms of varying convective modes, in varying locations, and at varying times of the day and the year) will be necessary to evaluate the robustness of the results obtained for this case. Furthermore, while it is our hope that storm-scale ensembles of this type will someday become feasible using operational computing resources, the computational resources needed to perform the analyses and forecasts in this study were quite substantial, necessitating the use of significant supercomputing resources and precluding, at least for now, the use of such a method in real time. As such, we encourage future work aimed at improving the computational efficiency of storm-scale ensembles similar to that used in this study.
This work was primarily supported by NSF Grant AGS-1261776 as part of the Severe Hail Analysis, Representation, and Prediction (SHARP) project. Supplementary support was also supported by NSF Grants AGS-0802888 and AGS-1046171. Computing was performed primarily on the XSEDE Stampede supercomputer at the University of Texas Advanced Computing Center (TACC). The authors thank Kevin Thomas for support in obtaining data, Keith Brewster and David Gagne for assistance with radar data processing, and Tim Supinie for helpful discussions regarding postprocessing methods.