Evaluating Ensemble Kalman Filter Analyses of Severe Hailstorms on 8 May 2017 in Colorado: Effects of State Variable Updating and Multimoment Microphysics Schemes on State Variable Cross Covariances

Jonathan Labriola, School of Meteorology, and Center for Analysis and Prediction of Storms, University of Oklahoma, Norman, Oklahoma

Nathan Snook, Center for Analysis and Prediction of Storms, University of Oklahoma, Norman, Oklahoma

Youngsun Jung, Center for Analysis and Prediction of Storms, University of Oklahoma, Norman, Oklahoma

Ming Xue, School of Meteorology, and Center for Analysis and Prediction of Storms, University of Oklahoma, Norman, Oklahoma

Abstract

Ensemble Kalman filter (EnKF) analyses of the storms associated with the 8 May 2017 Colorado severe hail event using either the Milbrandt and Yau (MY) or the NSSL double-moment bulk microphysics scheme in the forecast model are evaluated. With each scheme, two experiments are conducted in which the reflectivity (Z) observations update in addition to dynamic and thermodynamic variables: 1) only the hydrometeor mixing ratios or 2) all microphysical variables. With fewer microphysical variables directly constrained by the Z observations, only updating hydrometeor mixing ratios causes the forecast error covariance structure to become unreliable, and results in larger errors in the analysis. Experiments that update all microphysical variables produce analyses with the lowest Z root-mean-square innovations; however, comparing the estimated hail size against hydrometeor classification algorithm output suggests that further constraint from observations is needed to more accurately estimate surface hail size. Ensemble correlation analyses are performed to determine the impact of hail growth assumptions in the MY and NSSL schemes on the forecast error covariance between microphysical and thermodynamic variables. In the MY scheme, Z is negatively correlated with updraft intensity because the strong updrafts produce abundant small hail aloft. The NSSL scheme predicts the growth of large hail aloft; consequently, Z is positively correlated with storm updraft intensity and hail state variables. Hail production processes are also shown to alter the background error covariance for liquid and frozen hydrometeor species. Results in this study suggest that EnKF analyses are sensitive to the choice of MP scheme (e.g., the treatment of hail growth processes).

Corresponding author: Jonathan Labriola, j.labriola@ou.edu

1. Introduction

Hail causes significant damage to both crops and personal property in the United States—often exceeding $1 billion (U.S. dollars) in damage annually (Changnon 2009). Damage from hail is particularly costly when urban areas are affected. Just since the late 1990s, multiple major metropolitan areas have sustained more than $1 billion in damage from a single hail event, including Dallas/Fort Worth, Texas; Denver, Colorado; Kansas City, and St. Louis, Missouri; and greater Chicago (Edwards and Thompson 1998; Changnon and Burroughs 2003; Changnon et al. 2009; NOAA National Centers for Environmental Information 2018). The continued expansion of these metropolitan areas is expected to further increase hail damage in the future (Rosencrants and Ashley 2015).

Increasing the warning lead time of severe hail events can potentially mitigate damage by allowing more time for preparation. Current warning lead times for severe hail are limited because the National Weather Service (NWS) issues warnings based upon the detection of severe hail (diameter > 1.0 in.). Generally, hail detections are based upon either surface-based reports or radar signatures that are indicative of severe hail. To extend severe hail warning lead times the NWS is investigating a warn-on-forecast framework (Stensrud et al. 2009, 2013), where instead of issuing warnings based upon the detection of severe hail, the NWS will issue warnings based upon high-resolution, frequently updated numerical weather prediction (NWP) model guidance. While convection-allowing forecasts can skillfully predict convective hazards (e.g., Kain et al. 2008, 2010; Clark et al. 2012; Dawson et al. 2012; Snook et al. 2016, 2019; Yussouf et al. 2015; Yussouf and Knopfmeier 2019; Johnson et al. 2015; Johnson and Wang 2017; Schwartz et al. 2015; Skinner et al. 2018; Dawson et al. 2017; Labriola et al. 2017, 2019a; Jones et al. 2016, 2019; Supinie et al. 2016, 2017; Gallo et al. 2019; Stratman et al. 2020), forecast skill is highly dependent upon errors introduced by the initial conditions and model physics.

Previous studies have shown, either through observing system simulation experiments (OSSEs) or real-data experiments (e.g., Snyder and Zhang 2003; Zhang et al. 2004, 2006; Dowell et al. 2004; Tong and Xue 2005; Caya et al. 2005; Xue et al. 2006; Snook et al. 2011; Dawson et al. 2012; Romine et al. 2013; Schwartz et al. 2015; Johnson and Wang 2017; Lawson et al. 2018; Stratman et al. 2020) that an ensemble Kalman filter (EnKF; Evensen 1994, 2003) can be successfully applied to convection-resolving forecasts. EnKF methods employ an ensemble of forecasts to sample error covariance. Flow-dependent error covariances derived from ensemble forecasts are used to correct errors in unobserved variables during data assimilation (DA) that strongly influence convective-scale dynamics such as updraft speed, in-cloud temperature, and microphysical properties (Tong and Xue 2005, hereafter TX05; Tong and Xue 2008a). This feature of EnKF is particularly useful when assimilating radar data (e.g., Snyder and Zhang 2003; Zhang et al. 2004; Dowell et al. 2004; Tong and Xue 2005; Caya et al. 2005; Dowell and Wicker 2009; Aksoy et al. 2009), which provide indirect but high spatial and temporal resolution observations.

Even when forecasts are initialized from relatively accurate initial conditions, forecast error will continue to increase with time due to model errors including those associated with subgrid-scale processes (Zhu and Navon 1999; Houtekamer et al. 2005; Zhang et al. 2006; Hawblitzel et al. 2007; Melhauser and Zhang 2012; Zhang et al. 2015). For convective-scale simulations, microphysical parameterizations are one of the largest sources of model error. These parameterizations represent fine, subgrid-scale processes that undergo rapid, nonlinear transformations at convective scales and often rely upon ad hoc procedures due to the limited understanding of microphysical processes.

Bulk microphysics schemes (referred to as MP schemes) are the most commonly used type of microphysical parameterization in convective modeling studies. MP schemes predict the evolution of an assumed hydrometeor particle size distribution (PSD) that typically follows a gamma distribution (Ulbrich 1983):

$$N_x(D) = N_{0x} D^{\alpha_x} e^{-\lambda_x D},$$

where D is the particle diameter, N0x is the intercept parameter, λx is the slope parameter, and αx is the shape parameter for hydrometeor species x. To diagnose the PSD, MP schemes predict certain p moments of the distribution:

$$M_x(p) = \int_0^{\infty} D^p N_x(D)\, dD = N_{0x} \frac{\Gamma(\alpha_x + p + 1)}{\lambda_x^{\alpha_x + p + 1}},$$

where Γ(n) is the gamma function. The zeroth, third, and sixth moments of a PSD are most commonly predicted by MP schemes and are proportional to hydrometeor number concentration (Ntx), mixing ratio (qx), and reflectivity (Zx), respectively. The number of predicted moments corresponds to the number of diagnosed PSD parameters. All undiagnosed parameters are typically assumed to be constant.
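
As an illustration of how a double-moment scheme can diagnose two PSD parameters from two predicted moments, the following minimal Python sketch evaluates the moment expression above and inverts it for the intercept and slope parameters given the zeroth and third moments. The function names and the sample values are hypothetical, the shape parameter is assumed fixed, and the constants that convert the third moment to a mixing ratio (particle density, air density) are deliberately left out.

```python
import math


def psd_moment(n0, lam, alpha, p):
    """p-th moment of N(D) = n0 * D**alpha * exp(-lam * D) over D in [0, inf)."""
    return n0 * math.gamma(alpha + p + 1.0) / lam ** (alpha + p + 1.0)


def diagnose_intercept_slope(m0, m3, alpha=0.0):
    """Recover (n0, lam) from the zeroth and third moments.

    Assumption for illustration: the shape parameter alpha is held constant,
    as is typical for a parameter that a scheme does not diagnose.
    """
    gamma_ratio = math.gamma(alpha + 4.0) / math.gamma(alpha + 1.0)
    lam = (gamma_ratio * m0 / m3) ** (1.0 / 3.0)
    n0 = m0 * lam ** (alpha + 1.0) / math.gamma(alpha + 1.0)
    return n0, lam


# Hypothetical hail-like PSD: n0 = 4e4 m^-4, lam = 500 m^-1, alpha = 0.
m0 = psd_moment(4.0e4, 500.0, 0.0, 0)  # proportional to number concentration
m3 = psd_moment(4.0e4, 500.0, 0.0, 3)  # proportional to mixing ratio
m6 = psd_moment(4.0e4, 500.0, 0.0, 6)  # proportional to reflectivity
n0_diag, lam_diag = diagnose_intercept_slope(m0, m3, alpha=0.0)
```

Because the sixth moment weights each particle by the sixth power of its diameter, two PSDs with the same third moment can have very different sixth moments, which is one way to see why Z alone leaves the PSD underdetermined.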

While single-moment MP schemes (e.g., Lin et al. 1983) are generally computationally efficient because they predict only hydrometeor mixing ratio, they are unable to accurately represent microphysical processes such as selective melting/evaporation and size sorting. Multimoment MP schemes (e.g., Ferrier 1994; Milbrandt and Yau 2005a; Morrison et al. 2005; Morrison and Grabowski 2008; Thompson et al. 2008; Mansell et al. 2010; Milbrandt and Morrison 2013; Morrison and Milbrandt 2015; Milbrandt and Morrison 2016; Lim and Hong 2010) can replicate these microphysical processes and improve the representation of storm structure. Double-moment schemes that predict both qx and Ntx (diagnose N0x, λx), can reproduce observed polarimetric radar signatures (Jung et al. 2012; Johnson et al. 2016; Putnam et al. 2014, 2017a,b); however, multiple studies (e.g., Milbrandt and Yau 2005b; Dawson et al. 2014; Johnson et al. 2016; Morrison et al. 2015) note these schemes suffer from excessive size sorting. Excessive size sorting can be mitigated using a correction mechanism (Thompson et al. 2008; Mansell 2010), a diagnostic shape parameter, or by using a triple-moment scheme (Milbrandt and Yau 2005b).

In some newer MP schemes, instead of predicting many static hydrometeor categories (e.g., Straka and Mansell 2005), the scheme predicts the evolution of particle characteristics such as density (Mansell et al. 2010; Milbrandt and Morrison 2013; Morrison and Milbrandt 2015; Morrison et al. 2015; Milbrandt and Morrison 2016). MP schemes generally predict the density of rimed ice; this is because ice particle density can vary substantially, and because ice particles can undergo large fluctuations in density during the riming process. Prognostic density equations allow the MP scheme to update rimed ice fall speeds and improve the representation of hail production processes (Labriola et al. 2019a).

Predicted microphysical variables can be used to diagnose surface hail size in a NWP forecast, either via simulated radar-derived hail products (e.g., Snook et al. 2016; Luo et al. 2017, 2018; Labriola et al. 2017) such as the maximum estimated size of hail (MESH; Witt et al. 1998a), or by using hail PSDs to diagnose the largest observable hail size (e.g., Snook et al. 2016; Labriola et al. 2017, 2019a,b; Luo et al. 2018; Gagne et al. 2019). Hail size forecasts not only provide useful information for forecasters but are also useful in evaluating MP scheme treatment of hail growth and decay processes. Due to the limited coverage and biased nature of surface hail reports (e.g., Sammler 1993; Witt et al. 1998b; Doswell et al. 2005), most hail size forecasts are evaluated against radar-derived hail products that serve as a proxy for hail size such as MESH or output from a hydrometeor classification algorithm (HCA; Heinselman and Ryzhkov 2006; Park et al. 2009; Ryzhkov et al. 2013; Ortega et al. 2016).

Over the continental United States, the Next Generation Weather Radar (NEXRAD; Crum et al. 1993) network provides full volumetric scans at a temporal resolution sufficient to observe the rapid evolution of convective storms. Reflectivity (Z) is proportional to the sixth moment of a hydrometeor PSD and is strongly influenced by larger particles within a radar volume. After a recent polarimetric upgrade to the NEXRAD system, radars now emit both horizontally and vertically polarized signals. Dual-polarization variables such as differential reflectivity (Zdr), copolar cross-correlation coefficient (ρhv), and specific differential phase (Kdp), together with the single-polarization products Z and radial velocity (Vr), provide information about hydrometeor size, orientation, phase, and shape (Doviak et al. 2000; Kumjian and Ryzhkov 2008). Due to the large amount of hydrometeor information that can be inferred from polarimetric radar products, several recent studies have used simulated or observed polarimetric variables (e.g., Jung et al. 2010a, 2012; Ryzhkov et al. 2011; Kumjian et al. 2014; Dawson et al. 2014; Putnam et al. 2014, 2017a,b, 2019; Johnson et al. 2016, 2018, 2019; Snook et al. 2016; Snyder et al. 2017a,b) to evaluate microphysical parameterizations.

Generally, Z is the primary observation of microphysical relevance assimilated during DA. Despite a limited number of observations, OSSEs have shown that an EnKF system can reasonably update single-moment MP scheme variables when assimilating Z and Vr (TX05). Multimoment schemes, which predict approximately twice as many microphysical variables as a single-moment scheme, are even more underconstrained by observations. Although Xue et al. (2010) (hereafter XJZ10) obtained a reasonably good analysis using a double-moment scheme in an OSSE, it was noted that Z alone may be insufficient to constrain the increased number of variables in multimoment schemes because many different microphysical configurations can correspond to the same Z. Additionally, the large number of predicted variables introduces error into the analysis because increasing the number of error-containing parameters decreases the accuracy of the estimation (Aksoy et al. 2006; Tong and Xue 2008b; Jung et al. 2010b).

Relatively few studies have considered developing an optimal EnKF-based framework for updating multimoment MP scheme variables during DA. Previously, to improve constraint of microphysical variables, XJZ10 assimilated both radar Z and Vr, and Labriola et al. (2019a) used an EnKF system to update only hydrometeor mixing ratio; however, both studies conclude that there is a need to further analyze the ability of an EnKF system to update multimoment scheme variables. This study evaluates EnKF analysis sensitivity to selective state variable updating and the microphysical assumptions made within two double-moment MP schemes used in the prediction model. Knowledge gained through this study provides information that can be used to optimize EnKF configurations when using such multimoment MP schemes. To this end, EnKF analyses are evaluated for the 8 May 2017 Colorado severe hail event. The rest of the paper is organized as follows: in section 2, a brief overview of the 8 May 2017 Colorado severe hail event is provided along with the experimental design, including the EnKF and prediction model settings. Verification statistics and ensemble correlation analyses are provided in section 3. Finally, section 4 contains a summary and discussion of the results.

2. Methods

a. Case overview

On 8 May 2017, multiple hail producing thunderstorms occurred over north-central Colorado. According to surface-based hail reports received by the Storm Prediction Center (SPC), these storms produced severe (diameter > 25 mm) and significant severe (diameter > 50 mm) hail starting at approximately 2000 UTC. One storm that impacted downtown Denver (Fig. 1) caused a particularly large amount of damage; due to heavy traffic volume during evening rush hour, a large number of cars sustained major hail damage during the storm (Fritz 2017). As of the date of this writing, an estimated $2.3 billion in insurance claims (NOAA National Centers for Environmental Information 2018) have been filed in Colorado for hail damage sustained during this severe weather event.

Fig. 1.

(a) An areal map of the Denver region. (b) Observed Z from the lowest radar tilt (0.5°) of KFTG (Denver) at 2040 UTC. (c) Merged HCA output between 2000 and 2040 UTC. The HCA is applied to the lowest radar tilts (0.5°) of the two closest radars: KFTG (Denver) and KCYS (Cheyenne). Purple contours represent urban boundaries, thin black lines represent major highways. All regions of the domain that are more than 2.5 km above mean sea level are shaded gray.

Citation: Monthly Weather Review 148, 6; 10.1175/MWR-D-19-0300.1

Forcing for large-scale ascent was moderate during this severe weather event. Atmospheric conditions were conducive for the development of supercell thunderstorms with moderate shear [0–6 km shear: 30–40 kt (1 kt ≈ 0.51 m s−1)] and instability (mixed-layer CAPE: 1000–1500 J kg−1) (Marsh 2017). On 8 May 2017, Colorado was located downstream of a trough associated with an upper-level low pressure system stationed over Baja California. Cyclonic vorticity advection from the trough paired with low-level convergence along a weak frontal boundary initiated multicellular storms to the south and east of Denver at approximately 1930 UTC. Additional, more isolated storms initiated along the Front Range of the Rocky Mountains via upslope flow; one of these storms later produced significant severe hail over Denver (Fig. 1c). For additional information regarding this severe weather event, we refer the reader to Labriola et al. (2019b).

b. Prediction model settings

All experiments use the Advanced Regional Prediction System (ARPS; Xue et al. 2000, 2001) NWP model on a domain of 483 × 443 × 53 grid points with a horizontal grid spacing of 500 m. The model grid is stretched in the vertical, with a minimum grid spacing of 50 m at the surface and an average vertical grid spacing of 425 m. Model physics include NASA Goddard Space Flight Center short- and longwave radiation parameterization; surface fluxes calculated from surface drag coefficients, surface temperature, and volumetric water; a two-layer soil model; and a 1.5-order turbulent mixing and planetary boundary layer parameterization (Xue et al. 2000, 2001).

Ensemble forecasts are run using either the Milbrandt and Yau (2005a) double-moment (MY) MP scheme or the NSSL double-moment variable density rimed ice MP scheme (Mansell et al. 2010). These MP schemes were selected for this study because they are commonly used multimoment MP schemes that have been used in several previous experiments to produce explicit hail forecasts (Milbrandt and Yau 2006; Snook et al. 2016; Luo et al. 2017, 2018; Labriola et al. 2017, 2019a,b). Following Labriola et al. (2019a), the minimum hydrometeor number concentration threshold is set to 10⁻⁸ m⁻³ for both schemes to avoid excessive removal of hail near the surface, which can result when using the MY default setting (10⁻³ m⁻³).

Initial and boundary conditions are interpolated from the Center for Analysis and Prediction of Storms (CAPS) EnKF storm-scale ensemble forecast (SSEF; Jung et al. 2018)—a 40-member ensemble of forecasts run using the Advanced Research version of the Weather Research and Forecasting Model (WRF-ARW; Skamarock et al. 2008). The SSEF uses 3-km horizontal grid spacing, and its model domain spans the contiguous United States. The SSEF was initialized via EnKF at 1800 UTC 8 May 2017 from the North American Mesoscale Forecast System (NAM; Environmental Modeling Center 2017) analysis, with initial perturbations introduced by the Short-Range Ensemble Forecast (SREF; Du et al. 2015). The SSEF is run until 0000 UTC 9 May 2017. At 1903 UTC the 500-m domain is interpolated from the SSEF, with boundary conditions obtained from the SSEF every 9 min; these times were selected because the SSEF outputs model data every 9 min. No microphysical information is provided along domain boundaries to avoid development of spurious convection. For additional information regarding the 2017 CAPS SSEF, we refer the reader to CAPS (2017).

c. DA and observation operator settings

After initialization, the 500-m ensemble forecasts undergo a ~60-min spinup period until 2000 UTC, when DA cycling begins (Fig. 2). The CAPS EnKF system (Xue et al. 2006; Tong and Xue 2008b), which is based upon the Whitaker and Hamill (2002) ensemble square-root filter (EnSRF) algorithm, assimilates all available observations every 5 min between 2000 and 2040 UTC (Fig. 2).
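
To make the serial EnSRF update concrete, the following is a minimal Python sketch of the Whitaker and Hamill (2002) update for a single scalar observation, with a per-variable flag that restricts which state variables the observation is allowed to change. It is not the CAPS EnKF code: the function name, the list-of-lists state layout, the `update_mask` and `loc` arguments, and the toy numbers are all assumptions made for illustration.

```python
import math


def ensrf_single_obs(ens, hx, y_obs, obs_var, update_mask, loc=None):
    """Serial EnSRF update for one scalar observation.

    ens         : list of members, each a list of state values
    hx          : forward-operator value H(x) for each member
    y_obs       : observed value
    obs_var     : observation error variance
    update_mask : per-variable booleans; False leaves that variable untouched
    loc         : optional per-variable localization weights in [0, 1]
    """
    n_mem, n_state = len(ens), len(ens[0])
    hx_mean = sum(hx) / n_mem
    hx_pert = [h - hx_mean for h in hx]
    hph = sum(p * p for p in hx_pert) / (n_mem - 1)
    # Square-root factor that reduces the gain applied to the perturbations.
    alpha = 1.0 / (1.0 + math.sqrt(obs_var / (hph + obs_var)))
    analysis = [list(member) for member in ens]
    for j in range(n_state):
        if not update_mask[j]:
            continue
        x_mean = sum(member[j] for member in ens) / n_mem
        x_pert = [member[j] - x_mean for member in ens]
        cov = sum(xp * hp for xp, hp in zip(x_pert, hx_pert)) / (n_mem - 1)
        weight = 1.0 if loc is None else loc[j]
        gain = weight * cov / (hph + obs_var)
        new_mean = x_mean + gain * (y_obs - hx_mean)
        for k in range(n_mem):
            analysis[k][j] = new_mean + x_pert[k] - alpha * gain * hx_pert[k]
    return analysis


# Toy example: two state variables, observation equal to the first variable.
members = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
updated = ensrf_single_obs(members, hx=[1.0, 2.0, 3.0], y_obs=4.0,
                           obs_var=1.0, update_mask=[True, False])
```

In this toy case the second variable is perfectly correlated with the observed quantity, yet it is left unchanged because its flag is False; this is the sense in which an observation can be prevented from updating selected state variables.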

Fig. 2.

The experiment configuration for ensembles MY-Q, MY-ALL, NSSL-Q, and NSSL-ALL. The blue region represents the approximate 1-h spinup period and the orange region represents the 40-min period when 5-min DA cycling occurs. Downward pointing arrows indicate where assimilation occurs.

Surface (ASOS and AWOS) observations and data from the Denver NEXRAD radar (KFTG) are assimilated during the experiment. Observation error settings follow Snook et al. (2016). For surface observations, errors are assumed to be 1.5 m s−1 for horizontal wind components u and υ, 2.0 K for potential temperature θ, and 2.0 K for dewpoint temperature Td. The covariance localization radius is set to 300 km in the horizontal and 6 km in the vertical. Surface observations update only select model state variables (i.e., u, υ, θ, qυ), as was done in Labriola et al. (2019a) because unreliable covariances led to the intensification of spurious updrafts during DA. KFTG observation errors are assumed to be 4.0 m s−1 for Vr and 6.0 dBZ for Z. These values are larger than typical instrument errors associated with NEXRAD radars (e.g., Doviak and Zrnić 1993; Ryzhkov et al. 2005), but for data assimilation purposes, the errors also include representativeness errors and other sources of uncertainty. Also, specifying relatively large observation errors can help alleviate underdispersion problems of the EnKF ensemble, and past sensitivity studies have shown better performance of the ensemble analyses and subsequent ensemble forecasts with such settings (e.g., Dowell et al. 2004; Snook et al. 2013). The Gaspari and Cohn (1999) covariance localization cutoff radius is set to 3 km in both the horizontal and vertical. KFTG data are interpolated horizontally to the model domain, but the height of the radar beam is preserved in the vertical (Xue et al. 2006); Z and Vr observations are preprocessed in the horizontal directions to 1-km grid spacing in regions of precipitation (Z > 5 dBZ) and to 2-km grid spacing in regions of clear air (Z < 5 dBZ). Given that the radar observations have an average resolution of about 1 km, such observation density preserves most useful information in the observations.

The relaxation-to-prior spread (Whitaker and Hamill 2012) algorithm is used to inflate the posterior ensemble spread to 95% of the prior spread for all dynamic and thermodynamic variables (i.e., u, υ, w, θ, p) and hydrometeor mixing ratio qx. Inflating all hydrometeor state variables (i.e., qx, Ntx, and υx) can potentially cause PSDs to become unbalanced, and cause hydrometeor properties to become unrealistic (e.g., hail is too dense). Following Labriola et al. (2019a), Ntx and υx are updated during inflation to preserve EnKF estimated mean mass diameter and density, respectively.

The radar observation operator used during this experiment, which is a variant of the Jung et al. (2010a) T-matrix method, uses lookup tables based upon PSD parameters to calculate the hydrometeor scattering amplitude; an in-depth discussion of this observation operator is provided in Putnam et al. (2019). Generally, MP schemes do not track meltwater; instead, the water fraction is diagnosed in the observation operator. The Jung et al. (2008a) melting model is used to create an ice–water hybrid mixture for both hail and graupel in the observation operator. The mixing ratio of the ice–water hybrid category is determined via a ratio between ice and rainwater mixing ratios, and the number concentration is updated to preserve particle mean mass diameter and prevent the excessive accumulation of rainwater on the surface of ice particles (Labriola et al. 2019a).

A total of four experiments are conducted during this study. Ensembles are run using either the MY or the NSSL scheme, and for each scheme the Z observations update, in addition to dynamic and thermodynamic variables (u, υ, w, θ, qυ), either 1) hydrometeor mixing ratios only or 2) all microphysical state variables (qx, Ntx, υx). To isolate the impact of assimilating Z observations, no other observations update microphysical state variables. Experiments are named after the MP scheme used and the microphysical variables that are updated (i.e., MY-Q, MY-ALL, NSSL-Q, NSSL-ALL). In addition to assimilating Z, Vr is used in all experiments to update u, υ, and w, along with water vapor mixing ratio (qυ). The Vr forward observation operator does not consider hydrometeor terminal velocities due to the large uncertainties in hydrometeor size distribution and fall speed assumptions and the small elevation angles of radar beams.

Analyses and forecasts during the DA period (2000–2040 UTC) are objectively and subjectively evaluated to determine which experiment produces the most accurate initial state estimate. This time period was selected for evaluation because radar-derived hail products (i.e., HCA output) indicate that multiple thunderstorms were producing significant severe surface hail, the largest HCA hail size bin. While storms caused more substantial damage after 2040 UTC when they impacted downtown Denver, they produced similarly large hailstones both during and after the DA period. Forecast evaluation after 2040 UTC is beyond the scope of this study; instead, we refer the reader to previous studies (e.g., Snook et al. 2016; Luo et al. 2018) that evaluate surface hail size forecasts.

d. Analysis evaluation procedure

Due to the nature of this severe weather event, estimated surface hail size is verified against radar-based hail size estimates. Surface-based hail reports are generally too sparse and unreliable to be used for verification; severe hail occurrences are underreported in rural areas and away from highways, as well as when a more severe weather event, such as a tornado, occurs nearby (Doswell et al. 2005; Witt et al. 1998b; Allen and Tippett 2015). For example, there are only 4 hail reports recorded by the SPC during the DA period (2000–2040 UTC) and all reports are in Denver. Radar observations during the same period (Fig. 1c) indicate multiple hail producing storms were present in the experiment domain. Further, hail sizes that correspond with familiar circular objects (e.g., quarters, golfballs) are overreported by the public (Sammler 1993), skewing reported hail sizes. Radar derived hail products such as MESH (Witt et al. 1998a) and HCA output (Heinselman and Ryzhkov 2006; Park et al. 2009; Ryzhkov et al. 2013; Ortega et al. 2016; Putnam et al. 2017b) have been shown to be superior to surface-based reports for hail size verification because they have a high spatial resolution and fewer systematic biases (Cintineo et al. 2012).

When working with unbiased observations, HCA output is shown to better discriminate hail size than the MESH algorithm (Ortega et al. 2016). Subjective comparisons also demonstrate the HCA output closely resembles surface-based hail reports (when available) both in terms of hail size and the location of the report. The Ortega et al. (2016) HCA applies a fuzzy logic algorithm to both single- (Z) and dual-polarization (Z, Zdr, ρhv) radar observations along with environmental information (i.e., temperature) to classify the dominant hail size for a grid point as either: nonsevere (≥5 mm), severe (≥25 mm), or significant severe (≥50 mm). Output from the Ortega et al. (2016) HCA (Fig. 1c) along with radar observed Z (Fig. 1b) is used to evaluate ensemble analyses and forecasts during the DA period.

In this study, the HCA is applied to NEXRAD radar observations from both Denver (KFTG) and Cheyenne (KCYS); these are the two closest radars to the hail-producing storms. Radar observations are interpolated to the 500-m model grid for verification, with a 9-point smoother applied to Zdr and ρhv to reduce noise. Environmental information is obtained from the ensemble forecasts. HCA output is considered within a 120-km radius of a given radar; this is the maximum distance Ortega et al. (2016) uses to diagnose surface hail size. HCA output from KFTG and KCYS is merged by taking the maximum detected hail size at each grid point and applying a smoothing filter that decreases hail size detections by one size bin (e.g., significant severe to severe) when the majority of grid points within a 1-km radius are smaller than the given HCA detection (Labriola et al. 2019a).
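
One possible reading of the merging and smoothing step is sketched below in Python. This is not the Labriola et al. (2019a) implementation: the integer category coding (0 = no hail, 1 = nonsevere, 2 = severe, 3 = significant severe), the exclusion of the center point from the neighbor count, the use of the unsmoothed merged field for all comparisons, and the treatment of domain edges are assumptions made for illustration, and the function name is hypothetical.

```python
def merge_hca(hca_a, hca_b, grid_dx_m=500.0, radius_m=1000.0):
    """Merge two HCA category grids and demote isolated detections by one bin."""
    n_rows, n_cols = len(hca_a), len(hca_a[0])
    # Step 1: keep the largest hail size category detected by either radar.
    merged = [[max(a, b) for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(hca_a, hca_b)]
    # Grid offsets (excluding the center) that fall within the search radius.
    reach = int(radius_m // grid_dx_m)
    offsets = [(di, dj)
               for di in range(-reach, reach + 1)
               for dj in range(-reach, reach + 1)
               if (di or dj) and (di * di + dj * dj) * grid_dx_m ** 2 <= radius_m ** 2]
    # Step 2: demote a detection when most nearby points have a smaller category.
    smoothed = [row[:] for row in merged]
    for i in range(n_rows):
        for j in range(n_cols):
            cat = merged[i][j]
            if cat == 0:
                continue
            neighbors = [merged[i + di][j + dj] for di, dj in offsets
                         if 0 <= i + di < n_rows and 0 <= j + dj < n_cols]
            smaller = sum(1 for value in neighbors if value < cat)
            if neighbors and 2 * smaller > len(neighbors):
                smoothed[i][j] = cat - 1
    return smoothed


# A single significant severe detection surrounded by no hail is demoted to severe.
empty = [[0] * 5 for _ in range(5)]
isolated = [row[:] for row in empty]
isolated[2][2] = 3
result = merge_hca(isolated, empty)
```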

NWP models do not explicitly predict surface hail size; instead, hail PSDs are frequently used to diagnose the maximum observable hail size at a grid point (e.g., Snook et al. 2016; Labriola et al. 2017, 2019a,b; Luo et al. 2018; Gagne et al. 2019). This study uses the Snook et al. (2016) maximum hail size algorithm, which defines the maximum hail size (Dmax) as the largest diameter for which the PSD predicts at least 1 hailstone in a 100 m × 100 m box located in the lowest meter of the atmosphere (Ntmin = 10⁻⁴ m⁻⁴). This criterion is similar to minimum number concentration thresholds defined for hail in previous studies (e.g., Milbrandt and Yau 2006; Gagne et al. 2019). Although the minimum number concentration threshold can be modified, the average Dmax value does not substantially change (approximately 1–2 mm) when the threshold is increased or decreased by an order of magnitude. The spatial coverage of analysis estimated Dmax at the lowest model height (~25 m AGL) is compared subjectively and objectively to the HCA output (Fig. 1c).
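
A minimal Python sketch of this kind of diagnostic is given below. It assumes, for illustration only, that Dmax is the largest diameter at which the gamma PSD N(D) still equals or exceeds the threshold of 10⁻⁴ m⁻⁴; the details of the Snook et al. (2016) algorithm may differ. The function name, the restriction to a nonnegative shape parameter, and the sample PSD values are hypothetical.

```python
import math


def max_hail_size(n0, lam, alpha=0.0, n_min=1.0e-4):
    """Largest D (m) with n0 * D**alpha * exp(-lam * D) >= n_min; assumes alpha >= 0."""
    if n0 <= 0.0 or lam <= 0.0:
        return 0.0
    log_min = math.log(n_min)

    def log_n(d):
        return math.log(n0) + alpha * math.log(d) - lam * d

    if alpha == 0.0:
        # Exponential PSD: the threshold crossing has a closed form.
        return max(0.0, (math.log(n0) - log_min) / lam)
    d_peak = alpha / lam  # N(D) is largest here and decreases beyond it
    if log_n(d_peak) < log_min:
        return 0.0  # the PSD never reaches the threshold
    lo, hi = d_peak, 2.0 * d_peak
    while log_n(hi) >= log_min:  # expand until the threshold is bracketed
        lo, hi = hi, 2.0 * hi
    for _ in range(200):  # bisection on the decreasing branch
        mid = 0.5 * (lo + hi)
        if log_n(mid) >= log_min:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical exponential hail PSD: n0 = 4e4 m^-4, lam = 500 m^-1.
dmax_m = max_hail_size(4.0e4, 500.0)
```

For an exponential PSD, Dmax depends on the threshold only through its logarithm, so changing the threshold by an order of magnitude shifts Dmax by ln(10)/λ; this is consistent with the weak threshold sensitivity noted in the text.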

3. Results

a. Evaluating ensemble forecast and analysis innovations

The root-mean-square innovation (RMSI) and ensemble spread of Z and Vr (Fig. 3) quantitatively evaluate EnKF analyses and forecasts. The RMSI is defined as

$$\mathrm{RMSI} = \sqrt{\langle d^2 \rangle},$$

where ⟨d²⟩ is the mean squared innovation or difference between the observation and model mapped to observation space; d is defined as

$$d = y^o - \overline{H(\mathbf{x})},$$

where yo is the observation, H is the forward operator that maps the model state vector to observation space, and x is either the model state forecast or analysis vector, respectively. Innovations are averaged over the KFTG volume in regions where either the observed or ensemble mean simulated Z exceeds 15 dBZ. This criterion includes both observed precipitation and spurious echoes, but eliminates potentially large regions of clear air and light precipitation from the statistics (Snook et al. 2011; Jung et al. 2012).
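
The RMSI calculation, including the 15-dBZ masking criterion, can be sketched in Python as follows. The function name, the flat-list data layout, and the sample values are assumptions made for illustration; the ensemble-mean forward-operator values are taken as precomputed inputs.

```python
import math


def rmsi(obs, hx_mean, obs_z, hx_mean_z, z_threshold=15.0):
    """Root-mean-square innovation over points where observed or simulated Z is large.

    obs        : observed values (Z or Vr) at each observation location
    hx_mean    : ensemble-mean H(x) for the same quantity and locations
    obs_z      : observed reflectivity (dBZ) at those locations
    hx_mean_z  : ensemble-mean simulated reflectivity (dBZ) at those locations
    """
    squared = [(y - h) ** 2
               for y, h, zo, zm in zip(obs, hx_mean, obs_z, hx_mean_z)
               if zo > z_threshold or zm > z_threshold]
    if not squared:
        return float("nan")  # no points satisfy the reflectivity criterion
    return math.sqrt(sum(squared) / len(squared))


# Toy reflectivity example: the third point is below 15 dBZ in both fields.
z_obs = [30.0, 40.0, 10.0]
z_sim = [33.0, 36.0, 5.0]
z_rmsi = rmsi(z_obs, z_sim, z_obs, z_sim)
```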

Fig. 3.

The ensemble RMSI (solid lines) and spread (dotted lines) for (a),(b) Z and (c),(d) Vr. The performance of ensembles (a),(c) MY-Q and (b),(d) NSSL-Q are marked with red lines, and (a),(c) MY-ALL and (b),(d) NSSL-ALL are marked with black lines. Statistics are calculated over the experiment domain from 2000 to 2040 UTC; calculations are limited to locations where the observed and/or model (ensemble mean) Z exceeds 15 dBZ.

Citation: Monthly Weather Review 148, 6; 10.1175/MWR-D-19-0300.1

Although innovations are relatively similar between NSSL-Q and NSSL-ALL (Figs. 3b,d), RMSIs differ substantially between MY-Q and MY-ALL for Z (Fig. 3a). The MY-Q and MY-ALL RMSIs for Vr are more similar (Fig. 3c); this is partly because only winds in the precipitation regions are included in the statistics, while differences in the outflow regions are not. MY-Q RMSIs for Z exhibit only a small reduction during DA cycling, suggesting that updating only one PSD parameter of a multimoment scheme may lead to an unreasonable PSD in MY-Q (Fig. 3a). Wind speed is influenced only indirectly by microphysical state variables through thermodynamic feedbacks and therefore, unlike Z, is less affected by updates to total number concentrations during assimilation. When only some of the PSD variables are updated, the RMSI for MY-Q Vr changes little from cycle to cycle (Fig. 3c). This behavior is not observed in NSSL-Q for reasons that will be discussed later in this section.

Analysis mean simulated Z (Fig. 4) and KFTG observations (Fig. 1b) are subjectively compared at the time of the final DA cycle (2040 UTC) to identify why the RMSI for MY-Q Z (Fig. 3a) is larger than that of any other ensemble (Figs. 3a,b). Although there are slight variations in simulated Z intensity, MY-ALL (Fig. 4b), NSSL-Q (Fig. 4c), and NSSL-ALL (Fig. 4d) analyses closely match observed Z (Fig. 1b). MY-Q (Fig. 4a) Z is somewhat different from observations (Fig. 1b); storm structure is less organized, and the MY-Q analysis frequently underestimates Z. Additionally, MY-Q predicts spuriously intense reflectivity in regions of relatively little observed precipitation, such as to the east of Denver (Fig. 4a). The poor MY-Q Z estimates, which increase the RMSI for MY-Q Z (Fig. 3a), are due in part to unreliable error covariances that develop as a consequence of unbalanced updates of hydrometeor mixing ratios and number concentrations.

Fig. 4.

Ensemble mean simulated Z interpolated to the lowest tilt (0.5°) of KFTG at the time of the final analysis (2040 UTC) for ensembles (a) MY-Q, (b) MY-ALL, (c) NSSL-Q, and (d) NSSL-ALL. A black square in each panel marks the subdomain analyzed in Fig. 6; the background map is the same as in Fig. 1.

Despite predicting weaker Z values in the convective core of the observed storms, MY-Q predicts the total mass and number of rain (Figs. 5a,c) and hail (Figs. 5b,d) to be larger than MY-ALL. This is partly because MY-Q predicts more spurious Z throughout the domain during DA (Fig. 4a); however, the ensemble also predicts observed storms to produce more precipitation. NSSL-Q (Fig. 4c) and NSSL-ALL (Fig. 4d) predict more intense Z than either MY ensemble (Figs. 4a,b) despite both experiments predicting substantially less hail in terms of both mass and number (Figs. 5b,d). This is because the NSSL scheme predicts storms to produce relatively large hail aloft that contributes to high Z values. Results agree with previous studies (e.g., Johnson et al. 2016, 2019), which note that the MY scheme predicts storms to produce a large number of small hailstones aloft, while the NSSL scheme predicts storms to produce fewer but larger hailstones. Differences in simulated Z between the experiments demonstrate that Z is not a monotonic function of hydrometeor mixing ratio but is sensitive to changes in the hydrometeor particle size distribution.
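The dependence of Z on the PSD rather than on mass alone can be illustrated with a simplified calculation. The sketch below is not the forward operator used in this study: it assumes an exponential PSD, spherical particles of constant density, and Rayleigh scattering, so that Z is taken as the sixth moment of the PSD with no dielectric or Mie correction. The function name and input values are hypothetical.

```python
import numpy as np

def sixth_moment_dbz(q, nt, rho_air=1.0, rho_particle=900.0):
    """Reflectivity factor (dBZ) from the sixth moment of an exponential PSD.

    q  : mixing ratio (kg kg^-1)
    nt : total number concentration (m^-3)
    """
    lam = (np.pi * rho_particle * nt / (rho_air * q)) ** (1.0 / 3.0)
    n0 = nt * lam
    # Integral of D^6 * N0 * exp(-lam * D) over all D equals 720 * N0 / lam^7.
    z_linear = 720.0 * n0 / lam ** 7           # m^6 m^-3
    return float(10.0 * np.log10(z_linear * 1.0e18))  # convert to mm^6 m^-3

# Hypothetical contrast: more mass in many small stones vs. less mass in few large ones.
many_small = sixth_moment_dbz(q=2.0e-3, nt=1000.0)
few_large = sixth_moment_dbz(q=1.0e-3, nt=100.0)
print(many_small, few_large)
```

Under these assumptions Z scales as q²/Nt, so halving the mass while reducing the number concentration tenfold still raises Z by about 4 dB.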

Fig. 5.

Ensemble mean (forecast and analysis) (a),(b) total mass and (c),(d) number of (a),(c) rain and (b),(d) hail between 2000 and 2040 UTC. Hydrometeor concentrations are summed over a volume that spans the experiment subdomain.

For an EnKF system to accurately estimate model state variables that are not directly observed, the filter must develop reliable multivariate covariances during the assimilation period. The cross covariance between state variables and observation priors is used to retrieve unobserved variables and modify storm structure. Previous studies, such as TX05 and XJZ10, analyzed forecast cross correlations between model state variables and an observation prior at an assumed observation location. Such correlation analyses demonstrate the expected impact of assimilating that observation on analyzed model state variables; a positive (negative) correlation between state variables and observation prior suggests the EnKF will adjust both variables in the same (opposite) direction. For instance, if the Z observation prior and vertical velocity are positively correlated the EnKF will strengthen the storm updraft (i.e., intensify the storm) when the Z prior is adjusted upward toward the observed Z value by the filter. Figure 1 of Snyder and Zhang (2003) illustrates this idea for a radial velocity observation and vertical velocity.
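The link between the sign of the cross covariance and the direction of the adjustment can be sketched with the textbook Kalman gain for a single scalar observation. This is a generic illustration under that assumption, not the filter implementation used in this study; the function name and the sample values are hypothetical.

```python
import numpy as np

def mean_increment(x_members, hx_members, y_obs, obs_err_var):
    """Ensemble-mean increment of one unobserved variable from one observation.

    x_members   : state variable (e.g., w) for each member, shape (n_members,)
    hx_members  : observation prior (e.g., Z) for each member, shape (n_members,)
    y_obs       : observed value
    obs_err_var : observation error variance
    """
    cov_x_hx = np.cov(x_members, hx_members, ddof=1)[0, 1]
    var_hx = np.var(hx_members, ddof=1)
    gain = cov_x_hx / (var_hx + obs_err_var)
    return float(gain * (y_obs - hx_members.mean()))

# Hypothetical four-member ensemble; the observed Z is above the prior mean.
z_prior = np.array([40.0, 45.0, 50.0, 55.0])
w_pos = np.array([8.0, 10.0, 12.0, 14.0])   # w positively correlated with Z
print(mean_increment(w_pos, z_prior, 55.0, 25.0))        # positive: updraft strengthened
print(mean_increment(w_pos[::-1], z_prior, 55.0, 25.0))  # negative: updraft weakened
```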

Examples of an ensemble correlation analysis are provided in Figs. 6a–h. The ensemble correlation analysis is performed over a subdomain containing spuriously strong Z values in MY-Q, highlighted in Fig. 4. The assumed Z observation prior (sampled at the location of the white star) and hydrometeor mixing ratios are interpolated to the lowest radar tilt of KFTG (0.5°) from the closest model grid points both above and below the radar beam prior to calculating correlation; this allows for comparison between simulated (Figs. 6j–m) and observed Z (Fig. 6i). In previous OSSEs (e.g., Zhang et al. 2004; Caya et al. 2005; TX05), analyzed state variable estimates are often poor during the first several DA cycles because the error covariance between model state variables and observation priors is unreliable. Ensemble covariances are typically examined when the state estimation and ensemble covariance become reasonably reliable in later cycles (e.g., TX05). In this study, correlation analyses are performed on the ensemble forecasts prior to the final DA cycle at 2040 UTC, when multivariate ensemble covariances are expected to be reasonably reliable (Fig. 2).
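A correlation field of this kind can be computed from the ensemble as sketched below. The sketch assumes the state variable has already been interpolated to a common two-dimensional surface for every member and omits the interpolation and smoothing steps; the function and array names are hypothetical.

```python
import numpy as np

def correlation_with_prior(field_members, prior_members):
    """Pearson correlation between a scalar observation prior and each grid point.

    field_members : state variable on a 2D surface, shape (n_members, ny, nx)
    prior_members : observation prior at one location, shape (n_members,)
    """
    f_anom = field_members - field_members.mean(axis=0)
    p_anom = prior_members - prior_members.mean()
    # Sum over members of the product of anomalies at every grid point.
    cov = np.tensordot(p_anom, f_anom, axes=(0, 0))
    denom = np.sqrt((f_anom ** 2).sum(axis=0) * (p_anom ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        # Points with no ensemble spread are assigned zero correlation.
        return np.where(denom > 0.0, cov / denom, 0.0)

# Hypothetical 4-member ensemble on a 2 x 2 grid.
prior = np.array([1.0, 2.0, 3.0, 4.0])
field = np.zeros((4, 2, 2))
field[:, 0, 0] = 2.0 * prior   # perfectly correlated
field[:, 0, 1] = -prior        # perfectly anticorrelated
field[:, 1, 0] = 5.0           # no spread
print(correlation_with_prior(field, prior))
```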

Fig. 6.

Forecast error correlations for ensembles (a),(e) MY-Q, (b),(f) MY-ALL, (c),(g) NSSL-Q, and (d),(h) NSSL-ALL. The location of the subdomain is shown in Fig. 4. Correlations are calculated between the (a)–(d) mass of rain qr or (e)–(h) mass of hail qh and an assumed Z observation (white star) interpolated to the lowest tilt (0.5°) of KFTG prior to the final assimilation cycle at 2040 UTC. Positive correlations (solid lines) and negative correlations (dashed lines) are plotted in increments of 0.15 between −0.3 and 0.3. Stronger correlations (0.3 and −0.3) are contoured with a thick black line; weaker correlations (0.15 and −0.15) are contoured with a thin black line. To reduce noise, the correlation field is smoothed using a 9-point filter. Color shading represents ensemble forecast mean (a)–(d) qr or (e)–(h) qh. Observed (i) Z and ensemble analysis mean Z for ensembles (j) MY-Q, (k) MY-ALL, (l) NSSL-Q, and (m) NSSL-ALL are provided at the same time and location.

MY-Q predicts high Z values where no organized convection is observed (Fig. 4a); such spurious precipitation develops within the model during the DA cycles near 2030 UTC and is not effectively suppressed by the filter in this case. A correlation analysis is conducted to understand why the MY-Q EnKF intensifies Z in a spurious storm to the east of Denver. Although MY-Q analyses predict the storm to produce large Z values, the updraft is weak and disorganized and remains so during DA. In the vicinity of the spurious storm, Z and qr in MY-Q are relatively uncorrelated (Fig. 6a); consequently, the EnKF is unable to decrease qr in the spurious storm when assimilating Z observations. In the same location, MY-Q Z and qh are negatively correlated (Fig. 6e); this relationship intensifies reflectivity in the spurious storm during DA (Fig. 6j) because the EnKF increases qh when decreasing spuriously large model-diagnosed Z. While experiment MY-Q exhibits the weakest correlations between Z and qr (Fig. 6a), the other ensembles exhibit mostly strong positive correlations (>0.3) throughout much of the subdomain (Figs. 6b–d). Differences in the forecast error covariance structure between MY-Q (Fig. 6a) and MY-ALL (Fig. 6b) suggest the importance of correctly updating all model state variables. MY-ALL, NSSL-Q, and NSSL-ALL analyses predict weaker Z values that better fit observations in the subdomain (Figs. 6k–m). This is in part because the background better fits observations (i.e., no spurious storm prior to assimilation) and the background error covariance produces a more optimal analysis.

Updates to microphysical state variables, such as qr, also impact thermodynamic state variables. For example, enhanced evaporational cooling attributed to an increase in precipitation during data assimilation (Figs. 5a,b) may cause MY-Q to predict more intense cold pools (Figs. 7a,b) and stronger (Figs. 7c,d) winds than MY-ALL. Winds modified by spurious storms also increase the RMSI for MY-Q Vr (Fig. 7c) more than for MY-ALL Vr (Fig. 7d).

Fig. 7.

Ensemble mean forecast (a),(b) temperature at the lowest model grid level above the surface and (c),(d) radial velocity at the lowest KFTG grid tilt for (a),(c) MY-Q and (b),(d) MY-ALL prior to the final assimilation cycle at 2040 UTC. (e) Observed KFTG radial velocity at the same time is provided. Thick black contours represent the 30 dBZ ensemble forecast mean Z smoothed using a 9-point filter. Background maps are the same as Fig. 1.

b. Evaluating surface hail size

For an analysis to skillfully estimate surface hail size, microphysical variables must be properly updated so that the analyzed hail PSD produces reasonable hail size estimates. Box-and-whisker plots (Fig. 8) compare the ensemble member analysis estimated areal surface coverage of nonsevere, severe, and significant severe hail to HCA output (Fig. 1c). Additional subjective comparisons are performed by comparing the probability-matched mean (Ebert 2001) of the ensemble analysis Dmax (Fig. 9) to HCA output (Fig. 1c). Unlike a simple ensemble mean, which typically smooths out extreme values (i.e., large hail sizes), the probability-matched mean maintains the frequency distribution of the ensemble to preserve extreme values.
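The idea behind the probability-matched mean can be sketched as follows: the spatial pattern comes from the ensemble mean, while the values come from the pooled distribution of all members. This is one common variant, written under the assumption that every n-th pooled value is retained (other implementations average within each block of n values); it is not necessarily the exact procedure used here, and the function name is hypothetical.

```python
import numpy as np

def probability_matched_mean(members):
    """Probability-matched mean of an ensemble, shape (n_members, ny, nx)."""
    n_members = members.shape[0]
    ens_mean = members.mean(axis=0)
    # Pool all member values, sort descending, keep every n-th value so that
    # one value remains per grid point.
    pooled = np.sort(members.ravel())[::-1][::n_members]
    # Rank grid points by ensemble mean (largest first) and assign the pooled
    # values in that order.
    order = np.argsort(ens_mean.ravel())[::-1]
    pm = np.empty(ens_mean.size, dtype=float)
    pm[order] = pooled
    return pm.reshape(ens_mean.shape)

# Hypothetical two-member Dmax fields (mm) on a 1 x 3 grid.
members = np.array([[[0.0, 10.0, 50.0]],
                    [[0.0, 30.0, 20.0]]])
print(members.mean(axis=0))               # simple mean peaks at 35 mm
print(probability_matched_mean(members))  # peak of 50 mm is retained
```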

Fig. 8.

Analysis estimated coverage of nonsevere (green), severe (blue), and significant severe (purple) hail according to Dmax between 2000 and 2040 UTC for ensembles (a) MY-Q, (b) MY-ALL, (c) NSSL-Q, and (d) NSSL-ALL. Hail coverage is only considered within the domain shown in Fig. 1 and where land elevation is less than 2.5 km above mean sea level. Actual coverage of nonsevere, severe, and significant severe hail based upon HCA output (Fig. 1c) is marked with a thick horizontal line of the corresponding color. Although the largest observed hailstone on this day was approximately 70 mm, the coverage of Dmax > 100 mm (yellow) is included to identify hail size overestimation biases.

Fig. 9.

The probability-matched mean of Dmax for ensemble analyses (a) MY-Q, (b) MY-ALL, (c) NSSL-Q, and (d) NSSL-ALL between 2000 and 2040 UTC. A black “×” in (a) and (b) marks the location where hail PSDs are sampled in Fig. 10. Horizontal dashed lines in (b) and (d) mark the locations where vertical cross sections are taken in Figs. 11–15.

All ensembles overestimate the spatial coverage of nonsevere hail (Fig. 8); in addition, MY-Q overestimates the coverage of severe and significant severe hail (Fig. 8a). In the probability-matched mean of the ensemble analyses, Dmax in MY-Q (Fig. 9a) predicts the largest hail (Dmax > 200 mm) in spurious convection located to the south and east of Denver. HCA output is unable to identify the maximum hail size because all hail greater than 50 mm in diameter is classified as “significant severe”; however, the largest hail report recorded by the SPC for this event is 70 mm, suggesting that MY-Q substantially overestimates surface hail size. Further, almost all MY-Q estimated severe and significant severe hail coverage occurs in spurious convection away from where HCA output indicates large hail to occur (Fig. 1c). Although NSSL-Q underestimates the coverage of significant severe hail (Fig. 8c), the analyses (Fig. 9c) exhibit some qualitative skill and much of the largest hail occurs in regions where significant severe hail is detected in HCA output (Fig. 1c). MY-ALL (Fig. 8b) and NSSL-ALL (Fig. 8d) underestimate the spatial coverage of severe and significant severe hail, and their probability-matched means (Figs. 9b,d) suggest the analyses rarely indicate hail exceeding 25 mm in diameter.

Hail size changes as the DA system modifies qh and Nth. The MY-Q filter, which does not modify Nth, frequently estimates large hailstones (Fig. 9a), primarily because the EnKF makes large adjustments to qh in regions where relatively little hail is present. When large quantities of added ice mass are shared among a relatively small number of hailstones, the average hailstone diameter greatly increases. To demonstrate this behavior, a PSD is sampled from a hailstorm southeast of Denver (“×” in Fig. 9a) using the MY-Q ensemble mean both before and after assimilation at 2040 UTC. At the MY-Q sampled grid point, qh increases by 5.3 × 10−2 g kg−1 (Table 1) during DA. Because qh increases but Nth remains constant, the slope of the analysis hail PSD becomes shallower than that of the forecast (Fig. 10a) and Dmax increases by approximately 11 mm (Table 1). In extreme instances, DA causes Dmax to exceed 200 mm in MY-Q (Fig. 9a). NSSL-Q estimates hail sizes (Fig. 9c) that more closely resemble HCA output (Fig. 1c). A more reliable multivariate covariance, due in part to fewer spurious storms in the NSSL-Q background forecasts compared to the MY-Q background forecasts (not pictured), prevents large updates to qh and limits the most extreme hail sizes.

Table 1.

MY-Q and MY-ALL forecast and analysis mean qh, Nth, and Dmax at 2040 UTC. The locations where the MY-Q and MY-ALL hail variables are sampled are marked with an “×” in Figs. 9a and 9b, respectively. Hail state variables diagnose the hail PSDs shown in Fig. 10.

Fig. 10.

Hail PSDs diagnosed from model level 15 (1.96 km above ground level) of (a) MY-Q and (b) MY-ALL ensemble mean forecasts and analyses at 2040 UTC. The grid point where the MY-Q and MY-ALL hail PSDs are diagnosed is marked with an “×” in Figs. 9a and 9b, respectively. The hail variables used to diagnose the PSDs are provided in Table 1. PSDs were selected to highlight examples of how the EnKF updates a hail PSD during assimilation.

Hail PSDs sampled from the MY-ALL ensemble forecast and analysis mean at 2040 UTC in a hailstorm to the southeast of Denver (“×” in Fig. 9b) demonstrate how the EnKF adjusts hail size through changes to both qh and Nth (Fig. 10b). During DA, the MY-ALL EnKF increases qh by approximately 2.08 × 10−1 g kg−1 and nearly triples Nth (Table 1). Although qh in MY-ALL increases more than in MY-Q (Table 1), Dmax remains relatively unchanged because of the increase in Nth. The slope of the MY-ALL analysis hail PSD is similar to that of the forecast PSD (Fig. 10b) and consequently Dmax increases by only 2 mm (Table 1). For this analyzed grid point (Fig. 10b), Dmax is classified as nonsevere, despite the HCA output (Fig. 1c) detecting significant severe hail in the surrounding region. Microphysical variables, such as Nth, need additional constraint to accurately estimate hail size; previous work by XJZ10 suggests that assimilating additional data improves initial condition estimates by further constraining microphysical variables.

Although MY-ALL and NSSL-ALL analyses estimate microphysical properties, such as Z, with a moderate to high level of skill (Figs. 3a,b), there are too few observations to constrain all microphysical variables associated with multimoment microphysics schemes and, as a result, the skill of hail size estimation is limited. To the west of Denver, where the HCA detects significant severe hail (Fig. 1c), MY-ALL (Fig. 9b) and NSSL-ALL (Fig. 9d) analyses underestimate hail size in part because the EnKF modifies moments of the hail PSD so that the slope of the PSD remains relatively constant (e.g., Fig. 10b). This causes hail mass to be split among more hailstones, limiting the ability to increase the mean hailstone diameter.
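The contrast between updating qh alone and updating qh together with Nth can be sketched with the mean-mass diameter. The sketch assumes spherical hailstones of constant density; the background values and the factor-of-three increments are hypothetical and are not the Table 1 values.

```python
import numpy as np

def mean_mass_diameter(q_h, nt_h, rho_air=1.0, rho_hail=900.0):
    """Diameter (m) of a sphere holding the average hailstone mass."""
    return float((6.0 * rho_air * q_h / (np.pi * rho_hail * nt_h)) ** (1.0 / 3.0))

# Hypothetical background state and increments, chosen only for illustration.
q_b, nt_b = 1.0e-4, 10.0                                   # kg kg^-1, m^-3
d_background = mean_mass_diameter(q_b, nt_b)
d_q_only = mean_mass_diameter(3.0 * q_b, nt_b)             # mass tripled, number fixed
d_q_and_nt = mean_mass_diameter(3.0 * q_b, 3.0 * nt_b)     # mass and number tripled

print(d_q_only / d_background)    # grows by the cube root of 3
print(d_q_and_nt / d_background)  # unchanged
```

When mass and number increase by the same factor, the added mass is shared among proportionally more hailstones and the mean size does not grow, which is the behavior described for the experiments that update all microphysical variables.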

c. Ensemble correlation analysis

Differences in the microphysical parameterizations cause hydrometeor properties (e.g., size, density, fall speed) to differ substantially between MP schemes. For example, Johnson et al. (2016) note that hail behavior differs between the MY and NSSL schemes primarily due to differences in hail production processes. Typically, the MY scheme produces many small hail particles aloft because the MY hail category is primarily composed of small, frozen raindrops that have been converted into hail particles (Johnson et al. 2016). The NSSL scheme produces fewer and larger hailstones because hail is produced from dense graupel that has undergone wet growth (Mansell et al. 2010). Simulated Z is intrinsically related to such microphysical assumptions (e.g., Jung et al. 2008a,b, 2012; Dawson et al. 2014; Putnam et al. 2014, 2017b; Johnson et al. 2016, 2018); these assumptions also determine the forecast error covariance (TX05; XJZ10).

The ensemble correlation between observation prior Z and model state variables (i.e., w, θ, qh, Nth, qr, Ntr) illustrates how different hail treatments in MP schemes impact the analysis increments. Ensemble correlation analyses are performed on the hailstorm located to the west of Denver over Interstate-70 (Fig. 1) prior to the final DA cycle at 2040 UTC. Vertical cross sections are taken through the updraft of the hailstorm (Figs. 9b,d) and coincide with where most hail growth processes occur. The correlation analysis is performed in the vertical to capture the variation of hail particle behavior both above and below the 0°C isotherm.

When an assumed observation is taken from the ice-phase dominant region of the MY-ALL updraft, w and Z become negatively correlated (Fig. 11a). This correlation pattern was first observed in XJZ10 and was attributed to microphysical assumptions made in the MY scheme. Stronger updrafts loft more rain and cloud water, leading to a positive correlation between w and rainwater throughout much of the updraft above the 0°C isotherm (i.e., supercooled water) (Figs. 12b,d). Supercooled raindrops are converted into hail within the MY scheme; hence, updraft intensity is also positively correlated with Nth (Fig. 12e) throughout much of the storm. Because raindrops are relatively small, the mean mass diameter of hail decreases in strong updrafts (e.g., Johnson et al. 2016; XJZ10), and thus Z is negatively correlated with w in MY-ALL forecasts. Weaker updrafts advect less supercooled liquid above the 0°C isotherm and lower the production rate of small hailstones; additionally, larger hailstones aloft fall through the weaker updrafts to enhance Z.

Fig. 11.

Vertical cross sections of MY-ALL ensemble forecast mean (a) w, (b) qr, (c) qh, (d) θ, (e) Ntr, and (f) Nth prior to the final assimilation cycle at 2040 UTC. Cross sections are taken through the Denver hailstorm, denoted by the east–west line in Fig. 9b. Forecast error correlations between an assumed Z observation taken from within the hail growth zone (white star) and state variables are plotted. The same plotting convention is used to contour correlation as in Fig. 6. The horizontal brown line represents the 0°C isotherm.

Fig. 12.

As in Fig. 11, but for (a) θ, (b) qr, (c) qh, (d) Ntr, and (e) Nth. Plotted correlations are between an assumed w observation in the hail growth zone and the plotted model state variables.

MY-ALL correlations determine how model state variables are modified during reflectivity assimilation. Similar to XJZ10, Z at the observation location is negatively correlated with θ in the updraft region (Fig. 11d) because w and θ are positively correlated in the updraft region (Fig. 12a). The hydrometeor state variables qr (Fig. 11b), Ntr (Fig. 11e), and Nth (Fig. 11f) are negatively correlated with Z between 4 and 8 km above MSL; this is where frozen raindrops are converted to hail. Unlike with Nth, Z exhibits mostly positive correlations with qh (Fig. 11c) above the 0°C isotherm, indicating a complex relationship between Z and the hail size spectrum.

Unlike MY-ALL, Z at the observation location is positively correlated with w (Fig. 13a) throughout much of the NSSL-ALL storm updraft; however, similar to MY-ALL, w is positively correlated with both rain and hail state variables (Figs. 14b–e). Strong updrafts in the NSSL-ALL storm loft liquid water above the 0°C isotherm. Instead of being converted into small hail particles, the supercooled liquid is accreted by graupel and hail, causing rimed ice particles to increase in size (Labriola et al. 2019a). It is also noted that Z and θ are positively correlated throughout the updraft in the NSSL-ALL storm (Fig. 13d) because, as in MY-ALL, w and θ are positively correlated in the updraft.

Fig. 13.

As in Fig. 11, but for NSSL-ALL (a) w, (b) qr, (c) qh, (d) θ, (e) Ntr, and (f) Nth. Plotted correlations are between an assumed Z observation in the hail growth zone and the plotted model state variables. Cross sections are taken through the Denver hailstorm, denoted by the east–west line in Fig. 9d.

Fig. 14.

As in Fig. 13, but for (a) θ, (b) qr, (c) qh, (d) Ntr, and (e) Nth. Plotted correlations are between an assumed w observation in the hail growth zone and the plotted model state variables.

Prognostic graupel and hail volume (υg and υh, respectively) equations allow the NSSL scheme to vary rimed ice density during the forecast and modify both particle sedimentation and hail production processes. In the storm updraft region, Z and υh are positively correlated (Fig. 15a). Hail accreting more liquid water in the stronger updraft increases hail volume and Z in the updraft (Figs. 15a,b). It is noted that hail density is a nonunique solution, because infinitely many (qh, υh) pairs can produce the same density value; correlations are therefore not indicative of modifications to hail density.

Fig. 15.

As in Fig. 13, but for υh. Plotted correlations are between υh and an assumed (a) Z observation or (b) w observation in the hail growth zone.

While this study primarily evaluates the ensemble background error correlation fields within strong thunderstorms at 2040 UTC, it is important to note that the background error covariance evolves with time through the DA cycles. Correlations between observation prior Z and w taken from within the updraft of a storm to the east of Denver are evaluated throughout the DA window (Fig. 16). This storm is selected because it is present in observations throughout the DA window. Prior to the first DA cycle, neither MY-ALL nor NSSL-ALL is confident in the location of organized convection, which causes correlations between w and Z to be noisy (Figs. 17a,d). Both ensembles predict storm updrafts to be weak (w < 15 m s−1) at 2020 UTC (Figs. 17b,e). As a result, the MY scheme predicts the storm to produce little hail. Instead, the weak updrafts loft precipitation above the 0°C isotherm and enhance Z; w and Z become negatively correlated with time as the storm intensifies and strong updrafts increase the number of small hailstones aloft. These correlation fields are similar to those observed in the Denver hailstorm (Fig. 11a). Unlike MY-ALL, w and Z are positively correlated in NSSL-ALL between 2020 and 2040 UTC (Figs. 17e,f) because the updrafts advect more precipitation above the 0°C isotherm and enhance the production of large hailstones. The evolving correlation fields demonstrate the sensitivity of the background error covariance to the microphysical assumptions made within MP schemes as well as other factors, including the quality of the state estimation. In this paper, we focus on the later time (2040 UTC), when the state estimation is more reliable.

Fig. 16.

KFTG observed Z at the lowest radar tilt (0.5°). Thick black lines denote the approximate locations where vertical cross sections are taken in Fig. 17.

Fig. 17.

Vertical cross section of the (a)–(c) MY-ALL and (d)–(f) NSSL-ALL ensemble mean w at (a),(d) 2000, (b),(e) 2020, and (c),(f) 2040 UTC. Cross sections are taken from within a storm to the east of Denver. Forecast error correlations between an assumed Z observation taken from within the hail growth zone (white star) and w are plotted. The same plotting convention is used to contour correlation as in Fig. 6. The horizontal brown line represents the 0°C isotherm.

Correlations between model state variables (i.e., w and θ) and Z have important consequences for the representation of analyzed storms. The correlation patterns observed in this study were also observed in other strong storms at 2040 UTC (e.g., Figs. 17c,f) and during earlier DA cycles (not shown). When MY-ALL underestimates (overestimates) Z, the filter will decrease (increase) w (Fig. 11a) and make the updraft air temperature cooler (warmer) (Fig. 11d). In effect, the EnKF in MY-ALL weakens the analyzed storm when simulated Z is too low. Both w (Fig. 13a) and θ (Fig. 13d) are positively correlated with Z in NSSL-ALL and cause the filter to intensify the storm under similar circumstances. Z is generally proportional to the sixth power of hailstone diameter, although the relationship becomes more complex for a large or wet hailstone due to the Mie scattering effect. In addition, numerous observational studies (e.g., Heymsfield 1983; Nelson 1983; Ziegler et al. 1983; Foote 1984; Dennis and Kumjian 2017) have suggested that hail size is not a monotonic function of updraft strength, but is also influenced by vertical wind shear, environmental moisture, and updraft volume. More observations are needed to determine which microphysics scheme produces an analysis increment that more closely reflects reality.

4. Summary and further discussion

Newer multimoment bulk microphysics schemes (MP schemes) are increasing in complexity and predicting more state variables in order to improve the representation of microphysical processes (e.g., riming, sedimentation). As the number of degrees of freedom within an MP scheme increases, initial state estimation using an EnKF becomes more challenging, in part because the large number of microphysical state variables predicted by an MP scheme are insufficiently constrained by the limited number of observations that can infer microphysical properties [e.g., radar reflectivity (Z)].

In this study, a cycled EnKF framework is used to update the microphysical properties of hail-producing storms for the 8 May 2017 Colorado severe hail event. Four ensemble forecast experiments are conducted using either the Milbrandt and Yau (2005a) double-moment (MY) scheme or the NSSL double-moment variable density rimed ice scheme (Mansell et al. 2010). An EnKF is used to update either only hydrometeor mixing ratios (MY-Q, NSSL-Q) or all microphysical state variables (MY-ALL, NSSL-ALL), in addition to dynamic and thermodynamic information. While most previous studies use observed Z to update all microphysical variables, this study examines whether updating a limited number of particle size distribution moments (i.e., mass mixing ratio) provides sufficient constraint for an ensemble system run using a multimoment scheme. The model forecasts and analyses are evaluated against observed Z in addition to output from a hydrometeor classification algorithm (HCA) to determine which configuration produces the most realistic state variable estimates related to hail.

For ensembles that update only hydrometeor mixing ratio, in particular MY-Q, the forecast error covariance is often unreliable and limits the accuracy of state variable estimates. We suspect that the forecast error covariance is unreliable because updating a limited number of microphysical variables introduces large imbalances into the ensemble prediction system during assimilation. For example, negative correlations between Z and hydrometeor mixing ratios in experiment MY-Q cause spurious radar echoes to be enhanced during assimilation, even if clear-air reflectivity data are assimilated. Further, due to the poor multivariate error covariance structure, the EnKF in MY-Q was unable to replicate Z intensity or structure within mature storms. Generally, the forecast error covariance of MY-ALL and the NSSL ensembles (NSSL-Q and NSSL-ALL) is more reliable than that of MY-Q, allowing the ensemble analyses to estimate Z and radial velocity (Vr) with less error.

Comparison (and verification) of analyzed surface hail size against HCA output provides insight into how the different EnKF configurations update hail particle size distributions. Generally, ensembles that update only mixing ratio (MY-Q, NSSL-Q) estimate large surface hail sizes because large quantities of hail mass can be distributed among a relatively small number of hailstones. Ensembles that update all microphysical variables (MY-ALL and NSSL-ALL) tend to underestimate surface hail size because these ensembles typically predict larger hail number concentrations and cause hail mass to be split between many small hailstones. Results suggest the need to assimilate additional observations of microphysical relevance in order to better constrain the increased number of state variables (e.g., number concentration).

Hail production and growth assumptions made by MP schemes substantially influence the forecast error covariance. The MY scheme generates hail primarily from small frozen raindrops; this process increases the number of small hailstones above the 0°C isotherm and causes Z to be negatively correlated with air temperature and updraft strength. Due to differences in hail production processes, the opposite correlation patterns are observed in the NSSL scheme forecasts for intense hailstorms. Hail growth assumptions also influence correlation patterns for hydrometeor state variables. For this study the NSSL-ALL experiment favors more positive correlations between Z and hail variables (mass and number) than the MY-ALL experiment. While this study provides insight into the complexities of updating microphysical variables via an EnKF, it is noted that only a limited number of the available schemes are evaluated. A large number of multimoment schemes (e.g., Thompson et al. 2008; Morrison et al. 2005, 2009; Lim and Hong 2010; Morrison and Milbrandt 2015; Morrison et al. 2015; Milbrandt and Morrison 2016) are used in weather prediction systems; however, relatively few studies have analyzed how underlying microphysical assumptions made within these schemes impact multivariate ensemble background error covariances and state variable updates within ensemble DA. We have also shown that the multivariate ensemble covariances can be sensitive to the quality of storm analysis and possibly also the storm intensity and morphology.

Because many combinations of state variables can produce a given Z value, microphysical state variables will remain insufficiently constrained by observed reflectivity alone. The assimilation of polarimetric observations has been shown to provide additional constraint in observing system simulation experiments (OSSEs) conducted by Jung et al. (2008b, 2010b) and in a real case study conducted by Putnam et al. (2019); however, assimilating polarimetric variables remains nontrivial. Although MP schemes can replicate basic polarimetric signatures (e.g., Johnson et al. 2016, 2019; Putnam et al. 2017a,b, 2019), the NWP output from these schemes is often biased in intensity and coverage. To benefit from the assimilation of polarimetric observations, more effort is needed to improve the representation of the microphysical processes that generate polarimetric signatures and to find configurations that maximize the impact of polarimetric data. Improving the representation of microphysical processes has the potential not only to improve EnKF estimates, but also to mitigate misrepresentations of subgrid-scale processes and reduce model errors.
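The non-uniqueness of Z can be made concrete with a deliberately simplified sketch. It assumes Rayleigh scattering by spheres with an exponential size distribution and omits the dielectric factor, so that Z = 720 (ρ_a q)^2 / (π^2 ρ_x^2 N_t). This is not the reflectivity operator used in this study, and hail frequently violates the Rayleigh assumption; the function names and input values are hypothetical.

```python
import math


def rayleigh_dbz(q, n_t, rho_air=1.0, rho_x=900.0):
    """Reflectivity factor (dBZ) for an exponential PSD of spheres.

    q     : mixing ratio (kg/kg); n_t : number concentration (m^-3)
    rho_x : assumed particle bulk density (kg m^-3)
    Simplified: Rayleigh regime, no dielectric factor.
    """
    w = rho_air * q  # mass content (kg m^-3)
    z_si = 720.0 * w ** 2 / (math.pi ** 2 * rho_x ** 2 * n_t)  # m^6 m^-3
    return 10.0 * math.log10(z_si * 1.0e18)  # convert to mm^6 m^-3


def mean_mass_diameter_mm(q, n_t, rho_air=1.0, rho_x=900.0):
    """Mean-mass diameter (mm) for the same mass and number."""
    return 1.0e3 * (6.0 * rho_air * q / (math.pi * rho_x * n_t)) ** (1.0 / 3.0)


# Scaling mass by k and number by k^2 leaves Z unchanged in this model.
for k in (1.0, 2.0, 4.0):
    q, n_t = k * 1.0e-3, k ** 2 * 10.0
    print(f"q = {q:.0e} kg/kg, N = {n_t:5.0f} m^-3: "
          f"Z = {rayleigh_dbz(q, n_t):.2f} dBZ, "
          f"D = {mean_mass_diameter_mm(q, n_t):.1f} mm")
```

Under these assumptions Z depends only on the ratio q^2 / N_t, so a single reflectivity observation cannot distinguish a few large hailstones from many smaller ones carrying more total mass.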

Acknowledgments

This work was primarily supported by NSF Grant AGS-1261776 as part of the Severe Hail, Analysis, Representation, and Prediction (SHARP) project. Supplemental support was provided by NOAA Grant NA16OAR4590239. The third and fourth authors are also supported by the NOAA Warn-on-Forecast (WoF) and Spectrum Efficient National Surveillance Radar (SENSR) Grant NA16OAR4320115. The SHARP team includes David Gagne, Amy McGovern, and Amanda Burke, in addition to the coauthors. Computing was performed primarily on the XSEDE Stampede2 supercomputer at the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. The authors thank Kevin Thomas for support in obtaining data and Marcus Johnson for useful feedback on the project.

REFERENCES

  • Aksoy, A., F. Zhang, and J. W. Nielsen-Gammon, 2006: Ensemble-based simultaneous state and parameter estimation with MM5. Geophys. Res. Lett., 33, L12801, https://doi.org/10.1029/2006GL026186.

  • Aksoy, A., D. C. Dowell, and C. Snyder, 2009: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part I: Storm-scale analyses. Mon. Wea. Rev., 137, 1805–1824, https://doi.org/10.1175/2008MWR2691.1.

  • Allen, J. T., and M. K. Tippett, 2015: The characteristics of United States hail reports: 1955–2014. Electron. J. Severe Storms Meteor., 10 (3), https://ejssm.org/ojs/index.php/ejssm/article/viewArticle/149.

  • CAPS, 2017: CAPS spring forecast experiment program plan. Center for Analysis and Prediction of Storms, accessed 14 July 2018, http://www.caps.ou.edu/~fkong/sub_atm/spring17.html.

  • Caya, A., J. Sun, and C. Snyder, 2005: A comparison between the 4DVAR and the ensemble Kalman filter techniques for radar data assimilation. Mon. Wea. Rev., 133, 3081–3094, https://doi.org/10.1175/MWR3021.1.

  • Changnon, S. A., 2009: Increasing major hail losses in the U.S. Climatic Change, 96, 161–166, https://doi.org/10.1007/s10584-009-9597-z.

  • Changnon, S. A., and J. Burroughs, 2003: The tristate hailstorm: The most costly on record. Mon. Wea. Rev., 131, 1734–1739, https://doi.org/10.1175//2549.1.

  • Changnon, S. A., D. Changnon, and S. D. Hilberg, 2009: Hailstorms across the nation: An atlas about hail and its damages. Illinois State Water Survey Contract Rep. 2009-12, 92 pp.

  • Cintineo, J. L., T. M. Smith, V. Lakshmanan, H. E. Brooks, and K. L. Ortega, 2012: An objective high-resolution hail climatology of the contiguous United States. Wea. Forecasting, 27, 1235–1248, https://doi.org/10.1175/WAF-D-11-00151.1.

  • Clark, A. J., J. S. Kain, P. T. Marsh, J. Correia, M. Xue, and F. Kong, 2012: Forecasting tornado pathlengths using a three-dimensional object identification algorithm applied to convection-allowing forecasts. Wea. Forecasting, 27, 1090–1113, https://doi.org/10.1175/WAF-D-11-00147.1.

  • Crum, T. D., R. L. Alberty, and D. W. Burgess, 1993: Recording, archiving, and using WSR-88D data. Bull. Amer. Meteor. Soc., 74, 645–653, https://doi.org/10.1175/1520-0477(1993)074<0645:RAAUWD>2.0.CO;2.

  • Dawson, D. T., II, L. J. Wicker, E. R. Mansell, and R. L. Tanamachi, 2012: Impact of the environmental low-level wind profile on ensemble forecasts of the 4 May 2007 Greensburg, Kansas, tornadic storm and associated mesocyclones. Mon. Wea. Rev., 140, 696–716, https://doi.org/10.1175/MWR-D-11-00008.1.

  • Dawson, D. T., II, E. R. Mansell, Y. Jung, L. J. Wicker, M. R. Kumjian, and M. Xue, 2014: Low-level ZDR signatures in supercell forward flanks: The role of size sorting and melting of hail. J. Atmos. Sci., 71, 276–299, https://doi.org/10.1175/JAS-D-13-0118.1.

  • Dawson, L. C., G. S. Romine, R. J. Trapp, and M. E. Baldwin, 2017: Verifying supercellular rotation in a convection-permitting ensemble forecasting system with radar-derived rotation track data. Wea. Forecasting, 32, 781–795, https://doi.org/10.1175/WAF-D-16-0121.1.

  • Dennis, E. J., and M. R. Kumjian, 2017: The impact of vertical wind shear on hail growth in simulated supercells. J. Atmos. Sci., 74, 641–663, https://doi.org/10.1175/JAS-D-16-0066.1.

  • Doswell, C. A., H. E. Brooks, and M. P. Kay, 2005: Climatological estimates of daily local nontornadic severe thunderstorm probability for the United States. Wea. Forecasting, 20, 577–595, https://doi.org/10.1175/WAF866.1.

  • Doviak, R. J., and D. S. Zrnić, 1993: Doppler Radar and Weather Observations. 2nd ed. Academic Press, 562 pp.

  • Doviak, R. J., V. Bringi, A. Ryzhkov, A. Zahrai, and D. Zrnić, 2000: Considerations for polarimetric upgrades to operational WSR-88D radars. J. Atmos. Oceanic Technol., 17, 257–278, https://doi.org/10.1175/1520-0426(2000)017<0257:CFPUTO>2.0.CO;2.

  • Dowell, D. C., and L. J. Wicker, 2009: Additive noise for storm-scale ensemble data assimilation. J. Atmos. Oceanic Technol., 26, 911–927, https://doi.org/10.1175/2008JTECHA1156.1.

  • Dowell, D. C., F. Zhang, L. J. Wicker, C. Snyder, and N. A. Crook, 2004: Wind and temperature retrievals in the 17 May 1981 Arcadia, Oklahoma, supercell: Ensemble Kalman filter experiments. Mon. Wea. Rev., 132, 1982–2005, https://doi.org/10.1175/1520-0493(2004)132<1982:WATRIT>2.0.CO;2.

  • Du, J., G. Dimego, B. Zhou, D. Jovic, B. Ferrier, and B. Yang, 2015: Regional ensemble forecast systems at NCEP. 27th Conf. on Weather Analysis and Forecasting/23rd Conf. on Numerical Weather Prediction, Chicago, IL, Amer. Meteor. Soc., 2A.5, https://ams.confex.com/ams/27WAF23NWP/webprogram/Manuscript/Paper273421/NWP2015_NCEP_RegionalEnsembles_paper.pdf.

  • Ebert, E. E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. Mon. Wea. Rev., 129, 2461–2480, https://doi.org/10.1175/1520-0493(2001)129<2461:AOAPMS>2.0.CO;2.

  • Edwards, R., and R. L. Thompson, 1998