## 1. Introduction

A major focus in recent convective-scale numerical weather prediction (NWP) research has been improving both the forecast initial conditions and the microphysics parameterizations that are important for convective-scale predictions; both areas address major challenges identified for the Warn-on-Forecast paradigm by Stensrud et al. (2013). Data assimilation (DA), which is an indispensable part of convective-scale NWP, aims to improve the forecast initial condition by optimally combining available observations and a background model state to produce the best possible estimate of the atmospheric state. One popular DA method for convective-scale NWP is the ensemble Kalman filter (EnKF; Evensen 1994, 2003), which uses an ensemble of forecasts to estimate the background error covariance. The application of EnKF methods for the assimilation of radar observations has produced successful results for a variety of real storm cases (e.g., Dowell et al. 2004; Dowell and Wicker 2009; Lei et al. 2009; Aksoy et al. 2009, 2010; Dowell et al. 2011; Snook et al. 2011; Dawson et al. 2012; Jung et al. 2012; Snook et al. 2012; Yussouf et al. 2013; Tanamachi et al. 2013; Putnam et al. 2014, hereafter P14; Wheatley et al. 2014; Snook et al. 2015; Yussouf et al. 2015).

[…] *x* with diameter *D* in a unit volume; and […] *q*_{x}, while specifying the intercept and shape parameters; double-moment (DM) schemes also predict the zeroth moment, the total number concentration *N*_{tx}, so that both the slope and intercept parameters can be updated; triple-moment (TM) schemes predict the additional sixth moment of the distribution, often called radar reflectivity factor *z*, and effectively allow the slope, intercept, and shape parameters of the gamma distribution to vary independently. The shape parameter is specified as a constant or diagnosed value in DM schemes.
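The relationship between the predicted moments and the distribution parameters can be made concrete with a short worked form. The block below is a sketch that assumes the three-parameter gamma distribution commonly used in bulk schemes; the symbols *N*_{0x} (intercept), Λ_{x} (slope), α_{x} (shape), ρ_{x} (hydrometeor density), and ρ_{a} (air density) are notational assumptions, and the mixing ratio expression further assumes constant-density spheres.

$$
N_x(D) = N_{0x}\, D^{\alpha_x}\, e^{-\Lambda_x D},
\qquad
M_x(k) = \int_0^{\infty} D^{k} N_x(D)\, dD
       = N_{0x}\, \frac{\Gamma(\alpha_x + k + 1)}{\Lambda_x^{\alpha_x + k + 1}}
$$

$$
N_{tx} = M_x(0),
\qquad
q_x = \frac{\pi \rho_x}{6 \rho_a}\, M_x(3),
\qquad
z_x = M_x(6)
$$

With one predicted moment only one of the three parameters can be recovered, with two predicted moments two parameters, and with three predicted moments all three, which is why the number of free distribution parameters tracks the number of moments.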

The use of DM schemes for EnKF-based convective-scale NWP has been shown to improve storm structure and evolution during the analysis cycles as well as forecasts for both supercell and mesoscale convective system (MCS) cases. Dawson et al. (2015) showed that DM and TM schemes produced better predictions of a supercell storm than an SM scheme. Xue et al. (2010) first successfully applied EnKF to the estimation of model states associated with a DM scheme using simulated radar observations of a supercell, while Jung et al. (2012) first successfully used a DM scheme for EnKF radar DA for a real supercell storm. For the 8 May 2003 Moore, Oklahoma, supercell, Yussouf et al. (2013) found that both a fully DM scheme (which predicts the total number concentration for graupel, *N*_{tg}) as well as a semi-DM scheme (which diagnoses the intercept parameter for graupel, *N*_{0g}) produced more small graupel than an SM scheme; this graupel was advected farther downwind, forming a broader forward flank downdraft (FFD), in agreement with observations. For MCS cases, P14, and subsequently Wheatley et al. (2014), found that DM MP schemes improved the development of trailing stratiform precipitation compared to an SM scheme. A dramatic increase in the formation and detrainment of snow and ice from the leading convective towers rearward over the stratiform region resulted in much broader stratiform coverage.

Recently, simulated dual-polarization (dual-pol) radar variables have been used to evaluate microphysical states estimated through data assimilation and predicted by convective-scale models for real cases, by comparing these variables to observations (Jung et al. 2012; Li and Mecikalski 2012; Dawson et al. 2014; P14; Posselt et al. 2015; Putnam et al. 2017). The dual-pol variables contain additional information on PSDs over reflectivity *Z*, specifically information about the size, content, and diversity of hydrometeors present in the radar volume. For example, differential reflectivity (*Z*_{DR}) values are dependent on the horizontal-to-vertical axis ratio of hydrometeors; values are higher for large, oblate raindrops and low for dry, tumbling hail (Bringi and Chandrasekar 2001). Additionally, specific differential phase (*K*_{DP}) is sensitive to the amount of liquid water with which the radar pulse interacts.

Dynamical and microphysical processes can lead to significant variation in hydrometeor PSDs over small spatial scales. For example, the size sorting of hydrometeors associated with storm-relative wind shear in the forward flank of supercells leads to a significant increase in the number of large raindrops in low-level rain PSDs that can be identified by an increase in *Z*_{DR} values known as the *Z*_{DR} arc (Kumjian and Ryzhkov 2008, 2012; Dawson et al. 2014). This signature is indistinguishable in the observed *Z* pattern. Jung et al. (2012), in an EnKF data assimilation study of a supercell storm that occurred on 29 May 2004 in central Oklahoma, showed that using a DM MP scheme (Milbrandt and Yau 2005b) allowed the model to replicate observed dual-pol signatures such as the *Z*_{DR} arc. P14 found that simulated *Z*_{DR} patterns in the final EnKF analysis of an MCS produced using a DM scheme better represented the distribution of large, oblate raindrops in the leading convective line and small- to medium-sized raindrops in the trailing stratiform region compared to an analysis produced using an SM scheme. The SM analysis failed to capture this distinction, overestimating raindrop size in the stratiform region.

P14, which considered DM schemes and simulated dual-pol variables, focused on the final EnKF analyses of the experiments and on deterministic forecasts of simulated *Z*. P14 paid particular attention to the improvement in the microphysical and dynamical aspects of the MCS when using the DM scheme, such as the hydrometeor distributions and cold pool, and did not consider forecasts of dual-pol variables in depth. The current study expands upon P14 by performing and examining ensemble forecasts of the 8–9 May 2007 MCS case in terms of both *Z* and dual-pol radar variables.

Ensemble forecasts offer additional benefits compared to deterministic forecasts, including the ability to produce probabilistic forecasts that account for uncertainties in the initial condition and prediction model (including microphysics). Ensemble forecasts are integral to the Warn-on-Forecast vision outlined in Stensrud et al. (2009), providing the basis for operational probabilistic prediction of hazards associated with severe convection. EnKF methods inherently provide an ensemble of analyses suitable for initializing ensemble forecasts (Kalnay 2002). Analyses from well-tuned EnKF systems represent the flow-dependent background error that properly characterizes the analysis uncertainty (Kalnay et al. 2006). EnKF-initialized ensemble forecasts have been used to produce convective-scale probabilistic forecasts in several recent studies. For tornadic storms, probabilistic forecasts have focused on low-level vorticity; Dawson et al. (2012) and Yussouf et al. (2013, 2015, 2016) showed that the ensemble probability of vorticity exceeding certain thresholds predicted the observed damage paths of tornadoes well in supercell cases, while Snook et al. (2012, 2015) obtained similarly successful results for an MCS case. Snook et al. (2012, 2015) also demonstrated the benefits of using multiple MP schemes in EnKF ensembles for probabilistic forecasts of *Z*, while Yussouf et al. (2016) showed assimilating radar data using EnKF produced significantly improved probabilistic quantitative precipitation forecasts.

In previous convective-scale EnKF studies using DM MP schemes, little attention has been given to probabilistic prediction of simulated radar variables or quantitative probabilistic forecast skill scores of simulated radar variables. In particular, probabilistic forecasting of simulated dual-pol variables has never been reported in the formal literature as far as we know. Although Snook et al. (2012, 2015) examined probabilistic prediction of *Z*, the studies were limited to the use of SM MP schemes, and they did not examine any of the dual-pol variables either. Dawson et al. (2012), Yussouf et al. (2013), and Wheatley et al. (2014) conducted ensemble forecasts using DM MP schemes, but they only examined individual member or ensemble mean forecasts, not probabilistic forecasts of *Z*. The more recent studies of Yussouf et al. (2015, 2016) showed that probabilistic forecasts of *Z* exceeding 40 dB*Z* based on the semi-DM Thompson (Thompson et al. 2004; 2008) scheme for two tornadic supercell cases matched the locations of observed supercells well. However, no quantitative probabilistic forecast skill scores for *Z* were presented. Additionally, these preceding studies did not directly compare simulated radar variables on the elevation levels where observed data were taken, but such comparisons are more intuitive for operational forecasting purposes and therefore should be performed first. Putnam et al. (2017) simulated dual-pol variables from the Center for Analysis and Prediction of Storms (CAPS) storm-scale ensemble forecasts as part of the Hazardous Weather Testbed Spring Experiment (Kong 2013) for several members that differed only in the use of MP schemes. The study emphasized the differences among the different MP schemes in their ability to simulate dual-pol radar signatures, but ensemble probabilistic forecasting of dual-pol radar variables was not investigated.

In this study, we examine two ensemble forecasts of an MCS produced using either mixed SM MP schemes or a DM MP scheme during both the EnKF DA and subsequent forecasts. We evaluate the simulated dual-pol variables both qualitatively and quantitatively. Neighborhood probabilities are calculated for both *Z* and the dual-pol variables from ensemble forecasts with both perturbed initial conditions and microphysics perturbations, and the probabilistic forecasting performance of the two ensembles is compared. Probabilistic forecasts of the dual-pol variables include additional physical meaning beyond what *Z* can show, including the connection between *K*_{DP} and rainfall rate, and the uncertainty such forecasts may contain. As pointed out earlier, probabilistic forecasts of dual-pol radar variables have never been examined before.

The remainder of this paper is organized as follows: section 2 reviews the 8–9 May 2007 MCS case and the experiment design, and briefly summarizes the methods used in the SM and DM ensemble forecasts. In section 3, we assess the skills of the ensemble probabilistic forecasts obtained with the SM and DM schemes. Finally, section 4 summarizes the findings. The challenges associated with probabilistic forecasting and evaluation of highly localized dual-pol signatures are also discussed and some suggestions for future research are given.

## 2. Experimental case and method

The model, EnKF settings, and data sources used in this study are all inherited from P14. Two experiments are conducted using an SM and DM MP scheme, respectively, in which ensemble forecasts are initialized from the final EnKF analyses for the 8–9 May 2007 MCS. The SM ensemble (EXP_S) and the DM ensemble (EXP_D) use the same configuration during the EnKF analysis period as the corresponding control experiments EXP_S_M_3_5/EXP_S and EXP_D_M_3_5/EXP_D from P14. A summary of the case and experiment settings is provided below.

### a. System overview

On 8 May 2007, an MCS developed in western Texas and moved to the northeast into southwestern and central Oklahoma during the evening hours (approximately 0000–0500 UTC 9 May). During the day on 8 May, a positively tilted upper-level trough and seasonably warm, moist air at the surface led to the development of widespread convection over western Texas. The cool outflow from these storms helped to initiate additional convection and contributed to upscale growth over time as the storms became organized into a convective line. Ahead of the line, isolated supercell storms developed in northwest Texas and southwest Oklahoma. The developing MCS interacted with two of these storms, leading to the development and maintenance of a line end vortex (LEV) near the northern end of the MCS (P14; Schenkman et al. 2011). During the 0100–0500 UTC 9 May timeframe the system remained in the asymmetric stage of MCS development, with a broad area of leading stratiform precipitation, an intense leading convective line, and a trailing region of stratiform precipitation [Fig. 1, with term definitions based on Fritsch and Forbes (2001)]. Widespread heavy rain was observed with this MCS, and four tornadoes were reported near the LEV (NWS 2012). For a more detailed discussion of the development, structure, and impacts of this MCS, we refer the reader to P14, Schenkman et al. (2011), and Snook et al. (2011).

### b. Forecast model settings

The forecast model used is the Advanced Regional Prediction System (ARPS; Xue et al. 2000, 2001, 2003). ARPS is a fully compressible, nonhydrostatic, three-dimensional atmospheric model suitable for convective-scale simulation and prediction. ARPS predicts the three-dimensional wind components (*u*, *υ*, *w*), pressure *p*, potential temperature *θ*, water vapor mixing ratio *q*_{υ}, as well as the mixing ratios for cloud water *q*_{c}, rain *q*_{r}, snow *q*_{s}, cloud ice *q*_{i}, and graupel-like rimed ice *q*_{g} and/or hail-like rimed ice *q*_{h}, depending on the SM MP scheme used. For a DM MP scheme, the model also predicts the hydrometeor number concentrations (*N*_{tx}, where *x* refers to individual hydrometeor species). Additional parameterizations used include NASA Goddard Space Flight Center longwave and shortwave radiation, 1.5-order turbulent kinetic energy (TKE)-based subgrid-scale turbulence closure and convective boundary layer parameterization schemes, and a two-layer land surface/soil-vegetation model. More details on the model physics can be found in Xue et al. (2001). The model domain used consists of 259 × 259 grid points in the horizontal with a 2-km horizontal grid spacing and a stretched vertical grid using 53 vertical grid points with a minimum grid spacing of 100 m and average grid spacing of 500 m. The model terrain is interpolated to the 2-km grid from a 30-arcsecond high-resolution USGS dataset.

The full experiment consists of a 1-h spinup period, 1-h data assimilation period, and a 3-h ensemble forecast. During the spinup period, a 1-h deterministic forecast on the 2-km model grid is initialized from the NCEP North American Mesoscale Forecast System (NAM) analysis at 0000 UTC. The 3-h NAM forecast from 0000 UTC valid at 0300 UTC and the NAM analysis at 0600 UTC provide lateral boundary conditions during the forecast. At 0100 UTC, smoothed, random perturbations are added to the 1-h spinup forecast (Tong and Xue 2008; Snook et al. 2011) to initialize a 40-member ensemble for performing the EnKF data assimilation cycles. The first assimilation is performed at 0105 UTC and the last at 0200 UTC, with an assimilation cycle length of 5 min. Only radar data are assimilated. Further details on the data assimilation are given below. Following the assimilation period, the final ensemble analyses are used to initialize 3-h ensemble forecasts from 0200 to 0500 UTC.

### c. Data sources

As in Snook et al. (2011) and P14, Level-II *Z* and radial velocity (*V*_{r}) data from five WSR-88D S-band radars in Oklahoma and Texas are assimilated. These include KTLX (Twin Lakes, Oklahoma City, Oklahoma), KVNX (Vance Air Force Base, Oklahoma), KAMA (Amarillo, Texas), KLBB (Lubbock, Texas), and KDYX (Abilene, Texas). Together, these five radar sites provide full coverage of the MCS during the DA period. KFDR (Frederick, Oklahoma) is also located near the MCS, but Level-II data from KFDR are unavailable during the assimilation window. Both *Z* and *V*_{r} data are also assimilated from four experimental X-band radars maintained by the Engineering Research Center (ERC) for Collaborative and Adaptive Sensing of the Atmosphere (CASA; McLaughlin et al. 2009) in southwestern Oklahoma. These radars, KCYR (Cyril, Oklahoma), KSAO (Chickasha, Oklahoma), KLWE (Lawton, Oklahoma), and KRSP (Rush Springs, Oklahoma), provide additional low-level radar coverage over a portion of the MCS near the LEV. The National Severe Storms Laboratory’s dual-pol S-band radar KOUN (Norman, Oklahoma) is used for verification. The locations of radars used in this study are marked in Fig. 1.

Radar observations are interpolated to the model grid horizontally, but are left at the height of the radar elevation scan in the vertical, following Xue et al. (2006). The observations are interpolated to the time of each assimilation cycle using the previous and subsequent volume scans. Quality control procedures, including despeckling, ground clutter removal, and velocity dealiasing, are applied to the radar data prior to assimilation. For the CASA X-band *Z* observations, attenuation correction is performed before the data are assimilated (Chandrasekar et al. 2004). The data quality control procedure used follows P14. Specifically, for KOUN, dual-pol variables are removed when *ρ*_{HV} < 0.8, which corresponds to nonmeteorological echoes. Here *K*_{DP} is calculated by first unfolding and then smoothing the differential phase (Φ_{DP}) data using an averaging window with 9 gates when *Z* > 40 dB*Z* and 25 gates when *Z* < 40 dB*Z*. The least squares fit method of Ryzhkov and Zrnic (1996) is then used to calculate *K*_{DP} using the same threshold to determine the number of gates.
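The two-step *K*_{DP} procedure described above (windowed smoothing of Φ_{DP}, then a least squares slope over the same *Z*-dependent window) can be sketched as follows. This is an illustrative sketch, not the processing code used in the study: the function names are hypothetical, Φ_{DP} is assumed to be already unfolded, gates with *Z* exactly equal to 40 dB*Z* are assigned the wider window, and *K*_{DP} is taken as half the range derivative of the two-way differential phase.

```python
import numpy as np

def window_gates(z_dbz, z_split=40.0, n_heavy=9, n_light=25):
    """Number of gates per window: 9 where Z > 40 dBZ, 25 otherwise (assumed tie rule)."""
    return np.where(np.asarray(z_dbz, float) > z_split, n_heavy, n_light)

def estimate_kdp(phi_dp_deg, z_dbz, gate_km):
    """Estimate K_DP (deg/km) along one ray from unfolded Phi_DP (deg)."""
    phi = np.asarray(phi_dp_deg, float)
    n = phi.size
    half = window_gates(z_dbz) // 2          # half-width of the centered window
    rng = np.arange(n) * gate_km             # range of each gate (km)

    # Step 1: centered running mean of Phi_DP; gates without a full window stay NaN.
    smooth = np.full(n, np.nan)
    for i in range(n):
        lo, hi = i - half[i], i + half[i] + 1
        if lo >= 0 and hi <= n:
            smooth[i] = phi[lo:hi].mean()

    # Step 2: least squares linear fit over the same window; K_DP is half the slope.
    kdp = np.full(n, np.nan)
    for i in range(n):
        lo, hi = i - half[i], i + half[i] + 1
        if lo >= 0 and hi <= n and np.all(np.isfinite(smooth[lo:hi])):
            slope = np.polyfit(rng[lo:hi], smooth[lo:hi], 1)[0]
            kdp[i] = 0.5 * slope
    return kdp
```

The wider window in light precipitation trades range resolution for noise suppression, since Φ_{DP} increases slowly there and its gate-to-gate noise would otherwise dominate the slope estimate.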

### d. Ensemble Kalman filter settings

The EnKF algorithm used is an implementation of the ensemble square root filter (EnSRF) of Whitaker and Hamill (2002). As mentioned earlier, the ensemble is first initialized at 0100 UTC by adding random, smoothed, Gaussian perturbations to the 1-h spinup forecast. Perturbations with a standard deviation of 2 m s^{−1} are added to *u*, *υ*, and *w*, and perturbations with a standard deviation of 2 K are added to *θ* (using positive values only), across the entire model domain. Additional perturbations with a standard deviation of 0.001 kg kg^{−1} are added to the hydrometeor mixing ratios and water vapor, but they are confined to regions of precipitation where *Z* is greater than 5 dB*Z*. The perturbations are smoothed following Tong and Xue (2008), using a horizontal correlation length scale of 8 km and a vertical scale of 5 km.

Processed *Z* and *V*_{r} data from the nine radars are assimilated every 5 min between 0105 and 0200 UTC. This includes clear-air *Z* data from the WSR-88Ds, which Tong and Xue (2005) have shown helps to suppress development of spurious convection. Clear-air data from the CASA network are not used because of uncertainties associated with the X-band attenuation (*Z* values similar to those associated with clear air may be due to a completely attenuated signal). Assimilation of *V*_{r} is limited to regions where *Z* > 20 dB*Z*. The radar observation operator used is that of Jung et al. (2008), which is different from that used in Snook et al. (2011) and the same as that in P14. A horizontal and vertical covariance localization radius of 6 km is used for both *Z* and *V*_{r} based on the correlation function of Gaspari and Cohn (1999).
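For reference, the fifth-order piecewise rational correlation function of Gaspari and Cohn (1999) used for covariance localization can be evaluated as in the sketch below. The function name is hypothetical, and the sketch assumes that the 6-km localization radius is the cutoff distance at which the weight reaches zero (so the half-width parameter is 3 km); the text does not state which convention is used.

```python
import numpy as np

def gaspari_cohn(dist_km, cutoff_km=6.0):
    """Gaspari-Cohn localization weight; zero at and beyond cutoff_km (assumed convention)."""
    c = cutoff_km / 2.0                                   # half-width parameter
    z = np.abs(np.atleast_1d(np.asarray(dist_km, float))) / c
    w = np.zeros_like(z)

    near = z <= 1.0                                       # 0 <= z <= 1 branch
    zn = z[near]
    w[near] = (-0.25 * zn**5 + 0.5 * zn**4 + 0.625 * zn**3
               - (5.0 / 3.0) * zn**2 + 1.0)

    far = (z > 1.0) & (z < 2.0)                           # 1 < z < 2 branch
    zf = z[far]
    w[far] = (zf**5 / 12.0 - 0.5 * zf**4 + 0.625 * zf**3
              + (5.0 / 3.0) * zf**2 - 5.0 * zf + 4.0 - 2.0 / (3.0 * zf))
    return w
```

Each ensemble-estimated covariance between an observation and a model state variable is multiplied by this weight, so observations have no influence on grid points beyond the cutoff distance.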

The observation error and covariance inflation methods used are the same as in P14. They were chosen based on preliminary experiments using various configurations. Radar observation error values of 5 dB*Z* for *Z* and 3 m s^{−1} for *V*_{r} are used. Multiplicative inflation (Anderson 2001) with a factor of 1.25 is applied to the prior ensemble for grid points where *Z* > 20 dB*Z* in order to maintain ensemble spread and produce a closer to optimal consistency ratio value (Dowell et al. 2004) throughout the assimilation period than could be achieved using lower values of observation error and other covariance inflation methods such as additive noise (Dowell and Wicker 2009) and relaxation to prior ensemble (Zhang et al. 2004).
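A minimal sketch of masked multiplicative inflation is given below. It assumes that the factor of 1.25 scales each member's deviation from the prior ensemble mean, and that the caller supplies the reflectivity field used for the 20-dB*Z* mask; the text does not state which reflectivity field defines the mask, and the function and argument names are hypothetical.

```python
import numpy as np

def inflate_prior(ens, z_dbz, factor=1.25, z_min_dbz=20.0):
    """Scale prior perturbations by `factor` where z_dbz > z_min_dbz.

    ens   : array (n_members, ...) holding one state variable
    z_dbz : array broadcastable to ens[0]; reflectivity used for the mask (assumption)
    """
    ens = np.asarray(ens, float)
    mean = ens.mean(axis=0)                               # prior ensemble mean
    scale = np.where(np.asarray(z_dbz, float) > z_min_dbz, factor, 1.0)
    return mean + scale * (ens - mean)                    # mean is unchanged
```

Because only the deviations are scaled, the ensemble mean is preserved while the ensemble variance at masked grid points increases by the square of the factor.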

### e. Microphysics schemes used and their configurations

The two control experiments differ solely in terms of the microphysics scheme used. EXP_S uses a combination of three different SM MP schemes during both the assimilation period and the forecast. Using multiple MP schemes within the ensemble was shown to increase ensemble spread and improve root-mean-square innovation (RMSI) during the assimilation period by Snook et al. (2011). Of the 40 ensemble members, 16 use the Lin scheme (Lin et al. 1983), 16 use the WRF single-moment 6-class microphysics scheme (WSM6; Hong and Lim 2006), and 8 use the simplified NWP scheme (NEM) of Schultz (1995). Fewer NEM members are included because NEM member forecasts did not tend to perform as well as members using the other SM schemes. The intercept parameter used for rain (*N*_{0r}) is reduced by a factor of 10 from the typical value of 8 × 10^{6} m^{−4} to 8 × 10^{5} m^{−4}, following Snook and Xue (2008), who found that the reduced *N*_{0r} value led to a lower and more realistic evaporation rate and associated surface cold pool intensity.

The DM experiment, EXP_D, uses the Milbrandt and Yau (MY2; Milbrandt and Yau 2005b) scheme. During the assimilation period, the shape parameters *α* for rain and hail vary inversely between 0.0 and 2.0 in 0.05 increments for each member to increase ensemble spread. All other hydrometeor categories use *α* = 0.0; furthermore, *α* is set to 0.0 for all categories in the forecasts after 0200 UTC. As in Snook et al. (2011) and P14, the graupel, or low-density rimed ice, category of the MY2 scheme is turned off to more closely resemble the majority of members in EXP_S, which exclusively predict a high-density, or hail-like, rimed ice category.

## 3. Results of experiments

In this section, ensemble forecast results from EXP_S and EXP_D are presented. The results are divided into two parts: 1) an evaluation of the overall forecast quality of the complete MCS using *Z* mosaics and 2) verification of simulated dual-pol variables against KOUN observations. Evaluations include qualitative discussion of system structure and feature placement, evaluation of probabilistic forecasts, and quantitative verification. We also discuss methods and challenges as they relate to dual-pol variables.

### a. Ensemble forecasts of radar reflectivity

#### 1) Qualitative evaluation of reflectivity mosaics

Ensemble forecasts of the MCS are evaluated at 1-, 2-, and 3-h forecast times by verifying the probability matched ensemble mean (PMEM; Ebert 2001) forecasts of *Z* from EXP_S and EXP_D against mosaics of observed *Z* plotted at model level 10, which is approximately 2 km above ground level (AGL) (Fig. 2). Model level 10 is the lowest level where complete radar coverage of the MCS is available without gaps between radars. The mosaics of observed *Z* are created by combining observations from the five WSR-88Ds used during assimilation, with observations interpolated to the model grid as discussed above in section 2c. Where multiple radars observe a specific grid point, the maximum value of *Z* is used in the mosaics. The larger values are used because they are less likely to have been subject to resolution smearing and attenuation effects, although the latter is usually rather small. The PMEM is used instead of a regular ensemble mean because *Z* can vary greatly over small distances, leading to underprediction of intensity and overprediction of areal coverage when ensemble members with even slightly displaced convective features are averaged. The PMEM ranks all *Z* values in the domain from highest to lowest for both the ensemble mean and the full ensemble, then reassigns values from the full ensemble probability density function of *Z* to the grid location with the same rank in the ensemble mean; this process helps mitigate the aforementioned biases introduced by taking the ensemble mean (Ebert 2001; Clark et al. 2009).
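The rank-and-reassign step of the PMEM can be sketched as follows. This is one common way to implement probability matching rather than the exact code used in the study: the function name is hypothetical, and subsampling the pooled values by keeping every *n*th value (for *n* members), so that the number of values equals the number of grid points, is an assumption about a detail the text does not spell out.

```python
import numpy as np

def probability_matched_mean(ens):
    """Probability matched ensemble mean for ens of shape (n_members, ny, nx)."""
    ens = np.asarray(ens, float)
    n_members = ens.shape[0]
    mean = ens.mean(axis=0)

    # Pool every value from every member and sort from highest to lowest.
    pooled = np.sort(ens.ravel())[::-1]
    # Keep every n-th value so one value remains per grid point (assumed subsampling).
    sampled = pooled[::n_members]

    # Rank grid points of the ensemble mean from highest to lowest.
    order = np.argsort(mean.ravel())[::-1]

    # Assign the k-th largest pooled value to the location of the k-th largest mean.
    out = np.empty(mean.size)
    out[order] = sampled
    return out.reshape(mean.shape)
```

The result keeps the spatial pattern of the ensemble mean but takes its amplitude distribution from the individual members, which restores peak intensities that averaging smooths away.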

Fig. 2. Mosaics of observed reflectivity (dB*Z*) as in Fig. 1 from (a)–(c) 0300–0500 UTC as well as probability matched ensemble mean reflectivity for (d) EXP_S and (g) EXP_D at 0300 UTC/1-h forecast; (e),(h) 0400 UTC/2-h forecast; and (f),(i) 0500 UTC/3-h forecast.

Citation: Monthly Weather Review 145, 6; 10.1175/MWR-D-16-0162.1

Unlike in P14, the simulated radar variables in the results in this manuscript use a different, more complex observation operator than was used for EnKF DA. This operator, outlined in Jung et al. (2010), uses a lookup table of scattering amplitudes for all hydrometeors calculated using the T-matrix method (Vivekanandan et al. 1991; Bringi and Chandrasekar 2001). This operator enables us to take into account Mie scattering for large ice particles, such as hail or graupel, and to use a new axis ratio for rain revised based on observations. The simpler operator used during EnKF DA, based on Jung et al. (2008), uses a fitted approximation to the T-matrix values for rain, and uses the Rayleigh approximation for ice species. The simpler operator is used during DA to reduce computational expense, while the more advanced operator is used for forecast verification because it allows for a more realistic comparison to observations. Specifically, this has a noticeable effect on *Z*_{DR}, reducing maximum values by more than 0.5 dB, which is beyond the estimated uncertainty of observed *Z*_{DR} of approximately 0.1–0.3 dB (Ryzhkov et al. 2005; Doviak and Zrnic 1993). The new operator also, more correctly, simulates lower *Z* values for both dry and wet hail beyond the typical uncertainty for *Z* observations, which is approximately 1–2 dB*Z*.

The PMEM of *Z* in EXP_S contains a region of anomalously high *Z* (>55 dB*Z*) centered near the LEV (see Fig. 1), and there is little distinction between regions of stratiform and convective precipitation (Figs. 2d–f). The intensity of the trailing stratiform precipitation is also overforecast. On the other hand, the PMEM of *Z* in EXP_D (Figs. 2g–i) contains broader precipitation coverage in the leading stratiform region and a convective line with greater southern extent, though it does overforecast *Z* intensity in the leading stratiform region. The ensemble spread of *Z* is lower in EXP_D than in EXP_S (not shown); only one MP scheme is used in EXP_D, leading to closer agreement among members and higher ensemble mean values (Snook et al. 2012). These results are similar to those obtained in deterministic forecasts of this case in P14, where the authors found the size sorting of smaller raindrops rearward in the leading convective line when using a DM scheme (absent in EXP_S) led to greater evaporative cooling and a stronger cold pool that helped maintain a more realistic MCS structure. They also found that the cold pool in EXP_S is disorganized, contributing to the development of spurious convection near the LEV. It should be noted that neither EXP_S nor EXP_D predicts the small clusters of storms that develop in the southeast and southwest portions of the domain in the observations, likely in part because this convection developed mostly after the DA period.

#### 2) Probabilistic forecasts of reflectivity

Uncertainty within the ensemble forecast due to, for example, initial condition and model errors, can be considered by producing probabilistic forecasts of *Z* from the forecast ensemble. High-resolution, convection-permitting NWP forecasts are particularly sensitive to timing and location errors as forecast lead time increases due to the small spatial and temporal scales of convective storms (Lorenz 1969; Roberts 2008). To account for this sensitivity, we use the neighborhood ensemble probability (NEP) method (Ebert 2008; Roberts and Lean 2008; Schwartz et al. 2009), which, at each model grid point, produces a probabilistic forecast using a collection of nearby points in all ensemble members rather than relying solely on data from that single grid point in each member. In this way, NEP accounts for spatial uncertainty as well as uncertainty conferred by the ensemble. Appropriate specification of the neighborhood is important; in this study we use a circular neighborhood with a radius of 5 km, which is appropriate for the grid spacing used and convective features predicted (Snook et al. 2012, 2015). NEP is calculated for *P*(*Z* > 20 dB*Z*) (Fig. 3) and *P*(*Z* > 40 dB*Z*) (Fig. 4) at the same vertical level 10 as in Fig. 2. The 20-dB*Z* threshold is used to consider overall precipitation coverage in the MCS, including the stratiform regions, while the 40-dB*Z* threshold is chosen to focus on areas of heavy, convective precipitation. In Figs. 3 and 4, the observed *Z* contours for the corresponding threshold are also plotted.
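The NEP calculation described above can be sketched as follows for a single model level. The sketch assumes the 2-km horizontal grid spacing given in section 2b, counts a grid point as inside the neighborhood when its center lies within the 5-km radius, and normalizes by the number of in-domain neighbors near the lateral boundaries; the boundary treatment and the function name are assumptions.

```python
import numpy as np

def neighborhood_ensemble_probability(ens, threshold, radius_km=5.0, dx_km=2.0):
    """NEP of ens > threshold for ens of shape (n_members, ny, nx)."""
    exceed = (np.asarray(ens, float) > threshold).astype(float)
    member_frac = exceed.mean(axis=0)        # fraction of members exceeding, per point
    ny, nx = member_frac.shape
    r = int(np.floor(radius_km / dx_km))     # largest grid offset that can be in range

    total = np.zeros((ny, nx))
    count = np.zeros((ny, nx))
    for dj in range(-r, r + 1):
        for di in range(-r, r + 1):
            if (dj * dx_km) ** 2 + (di * dx_km) ** 2 > radius_km ** 2:
                continue                     # offset lies outside the circle
            # Destination points whose neighbor at offset (dj, di) is inside the domain.
            j0, j1 = max(0, -dj), min(ny, ny - dj)
            i0, i1 = max(0, -di), min(nx, nx - di)
            total[j0:j1, i0:i1] += member_frac[j0 + dj:j1 + dj, i0 + di:i1 + di]
            count[j0:j1, i0:i1] += 1.0
    return total / count                     # mean over neighborhood and members
```

With these settings each interior grid point draws on 21 neighboring points in each of the 40 members, so a feature displaced by a grid point or two in some members still contributes to the probability at the observed location.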

Fig. 3. Neighborhood ensemble probability of reflectivity exceeding 20 dB*Z* using a 5-km radius at about 2 km above ground level (AGL) for EXP_S at (a) 1-, (b) 2-, and (c) 3-h forecast times and (d)–(f) EXP_D. The thick black line outlines observed reflectivity exceeding 20 dB*Z*.

Fig. 4. Neighborhood ensemble probability of reflectivity exceeding 40 dB*Z* using a 5-km radius at about 2 km above ground level (AGL) for EXP_S at (a) 1-, (b) 2-, and (c) 3-h forecast times and (d)–(f) EXP_D. The thick black line outlines observed reflectivity exceeding 40 dB*Z*.

As was noted in the PMEM forecasts, the NEP forecasts of *Z* for EXP_D exhibit improved precipitation structure and feature placement compared to EXP_S. At the 20-dB*Z* threshold, the region of high *P*(*Z* > 20 dB*Z*) in EXP_D (Figs. 3d–f) closely matches the observed region of *Z* > 20 dB*Z*, particularly in the leading stratiform region and leading convective line. In particular, EXP_D predicts a broad area of very high probability (>0.9) that closely matches the observed leading stratiform region in terms of position, shape, and motion throughout the forecast period. In contrast, EXP_S (Figs. 3a–c) exhibits high probability (>0.8) for only about half of the observed region of *Z* > 20 dB*Z* during the first 2 h of the forecast, and even less in the 3-h forecast. EXP_S also has a substantial region of moderately high probabilities (up to 0.8) to the west of the MCS where no precipitation is observed. Considering the individual SM microphysics schemes within EXP_S, the Lin members exhibit the best agreement with observations in terms of forecast coverage and intensity of *Z*; WSM6 members generally overforecast the extent of the trailing stratiform region, while NEM members underforecast the extent of both the trailing and leading stratiform regions (not shown). These results are consistent with those of Snook et al. (2012), which, using a similar ensemble with the same MP schemes, found that the RMS innovation of Lin members during the forecast period was lower than that of WSM6 and NEM members. In both EXP_D and EXP_S, low probabilities are predicted for the trailing stratiform precipitation region; overall, this region is the most poorly forecast portion of the MCS.

Although overall precipitation coverage (*Z* > 20 dB*Z*) is generally good for both experiments, the *P*(*Z* > 40 dB*Z*) associated with heavier, convective precipitation exhibits greater error. For the leading convective line, EXP_S has only a small overlap of low probabilities (0.05–0.2) with the observed 40-dB*Z* region in the 1-h forecast (Fig. 4a); EXP_D has greater overlap throughout the forecast period (Figs. 4d–f), but the predicted probabilities remain low. The convective line has a width of only a few kilometers and, compared to the stratiform regions, is more susceptible to spatial error as forecast lead time increases, even when a 5-km neighborhood is considered. EXP_D also has higher probabilities for the convection near the LEV on the north end of the MCS. However, there are areas of high probability in the stratiform region as well, where EXP_D overforecasts *Z* intensity. The overforecast in intensity arises in part because the fields plotted in Figs. 2–4 lie near the bottom of the model melting layer, where *Z* increases due to the presence of large and oblate water-coated ice hydrometeors. Certain MP schemes have shown a tendency to underestimate melting over a deep layer below the 0°C isotherm in the model, compared to observations, due to overestimated evaporative cooling, which occurs in this case (not shown); this behavior has also been noted in previous studies (Jung et al. 2008, 2010; Johnson et al. 2016). This issue is difficult to avoid because radar coverage below this level is incomplete and we want to evaluate the model results without any gaps in the observations. A modified melting model in the radar simulator that includes temperature information to help account for the delay in the model MP scheme is considered for future work. Previous studies have also shown that DM MP schemes can overestimate *Z* values compared to observations due to excessive size sorting (Kumjian and Ryzhkov 2012).

#### 3) Quantitative evaluation of reflectivity forecasts

Qualitative evaluations based on the PMEM (Fig. 2) show quite skillful forecasts in terms of *Z*, but there are still apparent spatial errors that would adversely affect quantitative skill scores. The NEP of *Z* > 40 dB*Z* used to identify the leading convective line indicated how small spatial errors can lead to lower *Z* probabilities. When considering features with small spatial scales, scores such as the equitable threat score, which consider hits, misses, and false alarms in a deterministic point-by-point framework, are susceptible to a “double penalty”: a forecast with even a modest spatial displacement of a feature not only misses the observed feature but also produces a false alarm because the forecast feature is not coincident with any observed feature (Ebert and McBride 2000; Rossa et al. 2008; Mittermaier et al. 2013). Therefore, quantitative measures that consider the probability of an event within a neighborhood are used.
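
The double penalty can be illustrated with a toy example. The sketch below is not taken from the study: the one-dimensional 20-point grid, the 3-point-wide "convective line", and the function names are all assumptions chosen for illustration. It computes the standard contingency-table counts and the equitable threat score for a forecast whose only error is a 4-point displacement.

```python
import numpy as np

def contingency(forecast, observed):
    """Return (hits, misses, false alarms, correct negatives) for boolean fields."""
    f = np.asarray(forecast, dtype=bool)
    o = np.asarray(observed, dtype=bool)
    hits = int(np.sum(f & o))
    misses = int(np.sum(~f & o))
    false_alarms = int(np.sum(f & ~o))
    correct_negatives = int(np.sum(~f & ~o))
    return hits, misses, false_alarms, correct_negatives

def equitable_threat_score(forecast, observed):
    """Equitable threat score, with hits expected by chance removed."""
    h, m, fa, cn = contingency(forecast, observed)
    n = h + m + fa + cn
    h_random = (h + m) * (h + fa) / n
    denom = h + m + fa - h_random
    return (h - h_random) / denom if denom != 0 else float("nan")

# Hypothetical observed feature: 3 points wide on a 20-point grid.
observed = np.zeros(20, dtype=bool)
observed[8:11] = True

# Forecast with the correct shape and intensity, displaced by 4 points.
displaced = np.roll(observed, 4)

# Forecast that predicts no event anywhere.
null_forecast = np.zeros(20, dtype=bool)

print(contingency(displaced, observed))
print(equitable_threat_score(displaced, observed))
print(equitable_threat_score(null_forecast, observed))
```

In this toy case the displaced forecast collects three misses and three false alarms with no hits, so its point-by-point score falls below that of a forecast containing no feature at all, which is the behavior that motivates the neighborhood-based measures.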

The first metric considered is the area under the relative operating characteristic (ROC) curve (AUC; Mason 1982; Mason and Graham 1999) used to verify neighborhood forecast probability. The AUC is a summary score that compares the probability of detection and the probability of false detection for a given event over a range of probability thresholds; in this case the event is that *Z* exceeds a given threshold and the NEP is calculated as a fraction of grid points within a 5-km radius neighborhood where the event occurs among all ensemble members. Possible AUC values range from 0.0 to 1.0, with 1.0 indicating a perfect forecast (no false alarms or misses). AUC values of 0.5 or below indicate that the forecast has no useful skill. AUC is calculated for EXP_S and EXP_D using *Z* thresholds ranging from 10 to 50 dB*Z* for 1-, 2-, and 3-h forecast times (Fig. 5), and a bootstrap procedure is used to resample the ensemble 1000 times to determine the 5th–95th percentile range, which is shaded. Background shading is included to indicate the areas of useful forecast skill (green; AUC > 0.7), low skill (yellow; 0.5 < AUC < 0.7), and no skill (red; AUC < 0.5). Calculations are performed over the full experiment domain (Figs. 5a–c) as well as an Oklahoma subdomain positioned to cover the leading stratiform region and leading convective line, where both forecasts performed better compared to the trailing line (Figs. 5d–f).
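
As a rough illustration of the two quantities described above, the following sketch computes a neighborhood ensemble probability and a ROC area with NumPy. It is a minimal sketch under stated assumptions, not the code used in the study: the function names, the expression of the radius in grid points rather than kilometers, the handling of domain edges (only in-domain points are counted), the eleven probability thresholds, and the trapezoidal integration are all illustrative choices.

```python
import numpy as np

def neighborhood_ensemble_probability(ens, threshold, radius_pts):
    """NEP on a 2D grid.

    ens: array of shape (n_members, ny, nx).
    Returns the fraction of (member, grid point) pairs within a circular
    neighborhood of each point for which ens > threshold.
    """
    # Fraction of members exceeding the threshold at each grid point.
    member_frac = (np.asarray(ens) > threshold).mean(axis=0)
    ny, nx = member_frac.shape
    r = int(radius_pts)
    padded = np.pad(member_frac, r, mode="constant", constant_values=0.0)
    valid = np.pad(np.ones((ny, nx)), r, mode="constant", constant_values=0.0)
    total = np.zeros((ny, nx))
    count = np.zeros((ny, nx))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx > radius_pts ** 2:
                continue  # outside the circular neighborhood
            total += padded[r + dy:r + dy + ny, r + dx:r + dx + nx]
            count += valid[r + dy:r + dy + ny, r + dx:r + dx + nx]
    return total / count

def roc_auc(prob, observed_event, prob_thresholds=None):
    """Area under the ROC curve by trapezoidal integration."""
    if prob_thresholds is None:
        prob_thresholds = np.linspace(0.0, 1.0, 11)
    p = np.ravel(prob)
    o = np.ravel(observed_event).astype(bool)
    n_yes, n_no = int(o.sum()), int((~o).sum())
    if n_yes == 0 or n_no == 0:
        return float("nan")  # POD or POFD is undefined
    pod, pofd = [], []
    for t in prob_thresholds:
        yes = p >= t  # forecast "event" where probability reaches t
        pod.append(np.sum(yes & o) / n_yes)
        pofd.append(np.sum(yes & ~o) / n_no)
    # Close the curve at the "never forecast the event" corner.
    pod.append(0.0)
    pofd.append(0.0)
    x = np.array(pofd[::-1])
    y = np.array(pod[::-1])
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
```

Under these assumptions, a forecast whose probabilities equal the observed event field gives an AUC of 1.0, and a spatially constant probability gives 0.5, consistent with the interpretation of the score given above.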

Area under the relative operating characteristic curve (AUC) for reflectivity for EXP_S (red line and shading) and EXP_D (blue line and shading) at (a) 1-, (b) 2-, and (c) 3-h forecast times at about 2 km above ground level (AGL) for the full experiment domain and also (d)–(f) a subdomain covering Oklahoma. The green shading indicates useful forecast skill, the yellow shading indicates low skill, and the red shading indicates no skill.


Both experiments generally produce high AUC values, except at the very highest *Z* thresholds, which are associated with intense convective precipitation; confidence in AUC at these thresholds is low, however, because the sample of points with *Z* exceeding these thresholds is quite small and the regions in question are very small in spatial extent. As expected, AUC also decreases with increasing forecast time. In general, EXP_D shows improved skill over EXP_S, especially for moderate *Z* thresholds representing the stratiform region in the later hours.
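
The shaded 5th–95th percentile ranges in Fig. 5 come from resampling the ensemble 1000 times. A minimal sketch of such a bootstrap is given below, assuming that resampling is done over ensemble members with replacement; the function name, the generic `score_fn` callback, and the fixed random seed are hypothetical and are used only for illustration.

```python
import numpy as np

def bootstrap_score_range(ens, score_fn, n_resamples=1000,
                          percentiles=(5.0, 95.0), seed=0):
    """Percentile range of a score under resampling of ensemble members.

    ens: array of shape (n_members, ...).
    score_fn: callable that maps a resampled ensemble to a scalar score.
    """
    rng = np.random.default_rng(seed)
    n_members = ens.shape[0]
    scores = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw n_members member indices with replacement.
        idx = rng.integers(0, n_members, size=n_members)
        scores[i] = score_fn(ens[idx])
    low, high = np.percentile(scores, percentiles)
    return float(low), float(high)
```

In practice `score_fn` would wrap the full verification step (for example, computing the NEP from the resampled members and then the AUC against observations); a wide range at a given threshold indicates that the score depends strongly on which members are included.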

The AUC increases overall for both experiments when calculations are limited to the Oklahoma subdomain (Figs. 5d–f). In the 1-h forecast, AUC is similar in EXP_S and EXP_D, but in the 2- and 3-h forecasts, EXP_D outperforms EXP_S in terms of AUC at nearly all thresholds. In particular, EXP_D has an AUC value over 0.9 for thresholds of 20–25 dB*Z* throughout the forecast period over the Oklahoma subdomain (Figs. 5d–f), indicating a highly skillful forecast of general precipitation coverage of the leading stratiform region. EXP_D also exhibits useful skill (AUC > 0.7) for higher *Z* thresholds representing convective precipitation throughout the forecast period over the Oklahoma subdomain, suggesting that the poorer scores over the full domain are partially due to the overly quick dissipation of the trailing convective line and the newly developed convection in the southern portion of the domain, while the leading convective line is generally well forecast.

Reliability and sharpness diagrams are examined next. A probabilistic forecast is considered reliable when the probability of an event forecast to occur closely corresponds to the rate at which the event actually occurs (Brown 2001). Reliability diagrams are calculated for *P*(*Z* > 20 dB*Z*) using a 5-km radius neighborhood at 1-, 2-, and 3-h forecast times (Fig. 6). In these reliability diagrams, perfect reliability is indicated by the one-to-one diagonal line, and the shaded region indicates a skillful forecast (i.e., where the reliability contributes positively to the Brier skill score). Areas where the calculated reliability lies above the diagonal indicate that *Z* is underforecast (forecast probability is lower than the observed frequency); conversely, areas below the diagonal indicate that *Z* is overforecast (forecast probability is higher than the observed frequency). Sharpness diagrams, which are histograms of the calculated probability values, are shown in Fig. 7. An ideal forecast will have many values near 1.0 or 0.0, distinguishing sharply between events and nonevents. Calculations are again performed over both the full domain and Oklahoma subdomain.
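
The quantities plotted in reliability and sharpness diagrams can be sketched as follows. This is an illustrative implementation only: the function name, the use of 10 equal-width probability bins, and the treatment of empty bins (returned as NaN) are assumptions, not details taken from the study.

```python
import numpy as np

def reliability_and_sharpness(prob, observed_event, n_bins=10):
    """Binned statistics for reliability and sharpness diagrams.

    Returns (mean forecast probability, observed frequency, count) per bin.
    The counts per bin are the sharpness histogram.
    """
    p = np.ravel(prob).astype(float)
    o = np.ravel(observed_event).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each probability to a bin; p == 1.0 goes into the last bin.
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sum_p = np.bincount(idx, weights=p, minlength=n_bins)
    sum_o = np.bincount(idx, weights=o, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_prob = sum_p / counts   # x axis of the reliability diagram
        obs_freq = sum_o / counts    # y axis of the reliability diagram
    return mean_prob, obs_freq, counts
```

A bin in which `obs_freq` is smaller than `mean_prob` lies below the diagonal, which corresponds to overforecasting in the sense described above; a sharp forecast concentrates `counts` in the bins nearest 0.0 and 1.0.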

Reliability diagrams calculated for reflectivity exceeding 20 dB*Z* for EXP_S (red line) and EXP_D (blue line) at (a) 1-, (b) 2-, and (c) 3-h forecast times at about 2 km above ground level (AGL) for the full experiment domain and also (d)–(f) a subdomain covering Oklahoma. The blue shading indicates whether the forecasts have skill.


Sharpness diagrams calculated for reflectivity exceeding 20 dB*Z* for EXP_S (red) at (a) 1-, (b) 2-, and (c) 3-h forecast times and (d)–(f) EXP_D (blue) at about 2 km above ground level (AGL) for the full experiment domain and also (g)–(l) a subdomain covering Oklahoma.


Overall, there is not much difference in the reliability of EXP_S and EXP_D either for the full domain or the Oklahoma subdomain. For the 1-h forecast time (Figs. 6a,d), both forecasts show good reliability, with the region of *Z* > 20 dB*Z* slightly underforecast in EXP_S and slightly overforecast in EXP_D. For the 2- and 3-h forecast times (Figs. 6b,c,e,f), precipitation coverage is generally overforecast in both experiments. EXP_D does show greater sharpness than EXP_S, particularly over the Oklahoma subdomain (Figs. 7j–l). Both experiments have a large number of probabilities of 0.0 that represent the large areas where precipitation is not observed, but EXP_D has a much higher number of points with probabilities close to 1.0 where the ensemble predicts precipitation with very high confidence. As indicated by the AUC (Fig. 5) and the qualitative evaluation of NEP forecasts (Fig. 3), this region of very high confidence agrees well with observations in EXP_D, outperforming EXP_S. Note that the mixed MP scheme configuration of EXP_S may reduce sharpness relative to EXP_D because of higher spread in the ensemble. However, Snook et al. (2012) showed that the use of multiple SM MP schemes within the EnKF DA cycles as well as subsequent forecasts actually increased the sharpness of the forecast of mesovortices compared to an ensemble using only the SM LIN MP scheme; this is believed to be due to improved ensemble mean analyses and forecasts.

### b. Ensemble forecasts of polarimetric variables

#### 1) Qualitative evaluation of predicted polarimetric variables

The PMEM is calculated as in Fig. 2 for simulated *Z*, *Z*_{DR}, and *K*_{DP} as though the ensemble forecasts of EXP_S and EXP_D were observed by KOUN at 1-h (Fig. 8), 2-h (Fig. 9), and 3-h (Fig. 10) forecast times; KOUN observations at the corresponding times are provided for comparison. The simulated fields are shown at the 0.5° elevation; the lowest elevation is chosen because dual-pol radar signatures tend to be strongest at low levels, where size sorting effects (Dawson et al. 2014) and rainwater species dominate, and because the lower elevation is less affected by the melting layer. The difference in *Z* between the forecasts over the KOUN observing region is similar to that in the PMEM mosaics considered earlier (Fig. 2); EXP_D exhibits improved representation of the leading convective line and better coverage of the stratiform region compared to EXP_S, though it somewhat overestimates intensity because the model melting layer is low relative to the 0°C isotherm and because of the excessive size sorting seen in DM MP schemes.
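
For readers unfamiliar with probability matching, the sketch below shows one common formulation of a probability-matched ensemble mean: the spatial pattern is taken from the simple ensemble mean, while the values are taken from the pooled distribution of all member values. It is an illustrative sketch and may differ in detail from the procedure used for the figures; the function name and the choice of keeping every `n_members`-th pooled value are assumptions.

```python
import numpy as np

def probability_matched_mean(ens):
    """Probability-matched mean of an ensemble of shape (n_members, ...)."""
    ens = np.asarray(ens, dtype=float)
    n_members = ens.shape[0]
    field_shape = ens.shape[1:]
    mean_flat = ens.mean(axis=0).ravel()
    # Pool all member values, sort from largest to smallest, and keep
    # every n_members-th value so that one value remains per grid point.
    pooled = np.sort(ens.ravel())[::-1]
    matched_values = pooled[::n_members]
    # Rank grid points of the ensemble mean from largest to smallest and
    # assign the matched values in that order.
    order = np.argsort(mean_flat)[::-1]
    pm = np.empty_like(mean_flat)
    pm[order] = matched_values
    return pm.reshape(field_shape)

# Two hypothetical members on a 1 x 3 grid.
members = np.array([[[0.0, 10.0, 50.0]],
                    [[0.0, 30.0, 10.0]]])
print(members.mean(axis=0))
print(probability_matched_mean(members))
```

In this small example the simple mean peaks at 30, whereas the probability-matched field retains the pooled maximum of 50 at the location where the mean is largest; the spatial pattern, however, is still that of the mean, which is why fine-scale structure can be smeared.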

(a) Observed reflectivity (dB*Z*) and ensemble probability matched mean reflectivity from (b) EXP_S and (c) EXP_D at 0300 UTC/1-h forecast at a 0.5° tilt from KOUN, as well as (d)–(f) differential reflectivity (dB) and (g)–(i) specific differential phase (° km^{−1}).


As in Fig. 8, but at 0400 UTC with 2-h forecast results.


As in Fig. 8, but at 0500 UTC with 3-h forecast results.


There are two notable differences between EXP_D and EXP_S in terms of their forecast dual-pol fields. First, the areal coverage of high *Z*_{DR} values (*Z*_{DR} > 2.3 dB), a threshold that distinguishes the convective region from the stratiform region in the observations, is overforecast in EXP_S. The highest *Z*_{DR} values predicted by EXP_S are coincident with the poorly organized region of intense convection within the system due to the monotonic relationship between *Z* and *Z*_{DR} (e.g., Fig. 8e). The *Z*_{DR} values in EXP_D (Fig. 8f), while slightly higher than the observations (Fig. 8d), still show a similar general distribution of high and low *Z*_{DR} regions compared to the observations, indicating a distinct difference in maximum raindrop size between the convective and stratiform regions that is maintained throughout the entire forecast period (Figs. 9f and 10f). P14 found that these MCS features were maintained by an improved cold pool due to increased evaporative cooling from the advection of small raindrops rearward by the DM scheme.

The second notable difference in the forecast dual-pol fields of EXP_S and EXP_D is that the *K*_{DP} values in EXP_S are unrealistically high compared to the observations, with values peaking at nearly 10° km^{−1} (Figs. 8h–10h). This suggests that EXP_S greatly overforecasts the liquid water content of the convective precipitation. By comparison, *K*_{DP} in EXP_D is much closer to the observations throughout the forecast (Figs. 8i–10i). In fact, the *q*_{r} values near the surface in EXP_S associated with the maximum values of simulated *K*_{DP} are, on average, twice as high as those in EXP_D (not shown). Rain development in the stratiform region of the MCS is heavily dependent on the transport of frozen hydrometeors in the mid and upper levels of the MCS from the convective to the stratiform region. There is very little hydrometeor transport from the convective line to stratiform region in the SM case (P14), and therefore there is a higher precipitation rate in the convective line. The improved development and maintenance of the MCS when using the DM scheme leads to an improved representation of the *K*_{DP} fields in EXP_D relative to the observations compared to EXP_S.

The patterns in the dual-pol variables that reflect microphysical processes can be subtle; one such pattern is increased *Z*_{DR} along the leading convective line due to size sorting. Though the PMEM helps to alleviate some of the biases introduced by taking an ensemble mean, it can smear such high-detail patterns. For this reason, the best individual ensemble member from each experiment is examined in order to bring to light distinct pattern differences within the predicted dual-pol fields (Fig. 11). The 2-h forecasts of EXP_S member 14 and EXP_D member 39 are chosen based upon a qualitative examination of the ensemble members that considers placement of system features, *Z*_{DR} patterns, and overall *Z*_{DR} value range. The best EXP_S member contains precipitation extending southeastward where the observations have the leading convective line, but the intensity and extent are rather limited compared to the best EXP_D member. As expected, areas of high *Z*_{DR} coincide with areas of high *Z* in the EXP_S member. In the EXP_D member, however, high *Z*_{DR} values are located along the eastern/leading edge of the leading convective line. This *Z*_{DR} pattern is indicative of the size sorting of raindrops within the convective line, with smaller raindrops being advected rearward in the line while larger raindrops remain.

(a) Observed reflectivity (dB*Z*) and simulated reflectivity from (b) EXP_S member 14 and (c) EXP_D member 39 at 0400 UTC/2-h forecast at a 0.5° tilt from KOUN, as well as (d) observed and (e),(f) simulated differential reflectivity (dB).


#### 2) Probabilistic forecasts of polarimetric variables

In section 3a(2), probabilistic forecasts were used to evaluate the ensemble forecast coverage of stratiform and convective precipitation, based on *Z* thresholds of 20 and 40 dB*Z*, respectively. A distinct variation in the *Z*_{DR} values also occurs, with *Z*_{DR} increasing where larger raindrops are present along the leading edge of the convective line. To evaluate how well the two experiments forecast the high *Z*_{DR} signatures, the probability of *Z*_{DR} > 2.3 dB at the 1-, 2-, and 3-h forecast times is calculated (Fig. 12). The threshold of *Z*_{DR} = 2.3 dB is chosen based on the observed values that distinguish between the convective and stratiform precipitation regions in this case (Figs. 8d, 9d, and 10d), and the observed *Z*_{DR} = 2.3 dB contour is shown as a thick black line. EXP_S has a broad expanse of relatively high probability of *Z*_{DR} > 2.3 dB over the stratiform region, a result consistent with the overall pattern of *Z*_{DR} in Figs. 8e, 9e, and 10e. This region of high *P*(*Z*_{DR} > 2.3 dB) is significantly displaced from the observed leading convective line. In EXP_D, there is some overlap of low-to-moderate probabilities of *Z*_{DR} > 2.3 dB with the observed 2.3-dB contour in the 1-h forecast, and some overlap of low probabilities at the 2- and 3-h forecasts. Though the regions of moderate *P*(*Z*_{DR} > 2.3 dB) in EXP_D do not exactly match the observed region of high *Z*_{DR}, the geographic distribution of higher probability follows a north-northwest to south-southeast orientation, similar to that of the observed leading convective line. The areal coverage of the probabilities is improved in EXP_D compared to the more circular pattern found in EXP_S. Moderate probability of an arc of larger raindrops relatively near the observed leading convective line and the existence of the leading convective updrafts in the MCS within the ensemble forecast are better depicted by EXP_D and suggest potential for further improvement.

Neighborhood ensemble probability of differential reflectivity exceeding 2.3 dB using a 5-km radius at a 0.5° tilt from KOUN for EXP_S at (a) 1-, (b) 2-, and (c) 3-h forecast times and for (d)–(f) EXP_D. The thick black line outlines observed differential reflectivity exceeding 2.3 dB.


#### 3) Quantitative verification of polarimetric variables

The concerns about how small spatial errors can affect quantitative skill scores of *Z*, discussed in section 3a(3), are even greater when considering skill scores for predicted dual-pol variables. Dual-pol signatures follow patterns associated with microphysical processes that occur at very small scales, such as the size sorting of raindrops along the leading convective line. With this potential limitation in mind, the AUC is calculated for *Z*_{DR} (0.0–2.7 dB) and *K*_{DP} (0.0°–1.5° km^{−1}) for the 1-, 2-, and 3-h forecasts (Fig. 13) using a 5-km neighborhood radius as was done in Fig. 5 for *Z*. Both experiments have similar, skillful AUC values for predicting *Z*_{DR} at thresholds of 0.0–1.0 dB (Figs. 13a–c). For higher thresholds, the AUC for EXP_S indicates very poor skill, while EXP_D still produces a skillful forecast. AUC for *Z*_{DR} is better in EXP_D due to the lower *Z*_{DR} values throughout the leading stratiform region, which agree much more closely with observations than the forecast of EXP_S. The *Z*_{DR} associated with the leading convective line also has a good overlap with observed values in EXP_D. EXP_S outperforms EXP_D for the considered thresholds of *K*_{DP} due to erroneous broader coverage in EXP_S that overlaps the observations and the displacement error in EXP_D. Additionally, the significant high bias in *K*_{DP} in EXP_S is not accounted for at these thresholds, which are chosen based on observed values; the AUC threshold limit is set to 1.5° km^{−1} because few observations exceed this value. The *K*_{DP} forecast of EXP_S is qualitatively poorer than that of EXP_D, but limitations in the quantitative scores used lead to misleading results.

Area under the relative operating characteristic curve (AUC) for differential reflectivity (dB) for EXP_S (red line and shading) and EXP_D (blue line and shading) at (a) 1-, (b) 2-, and (c) 3-h forecast times at a 0.5° tilt as well as for (d)–(f) specific differential phase (° km^{−1}). The green shading indicates useful forecast skill, the yellow shading indicates low skill, and the red shading indicates no skill.


Because of the large impact of spatial error on the quantitative skill scores for the dual-pol variables, other quantitative methods of evaluation not reliant on location are useful. Domain-wide histograms of the simulated dual-pol variables can be used to identify significant biases in the forecast. Histograms of the simulated values from all members of EXP_S and EXP_D as well as the observed values are plotted in Fig. 14. The values from EXP_S and EXP_D are normalized by the size of the ensemble for comparison to the observations. For observed *Z*, values associated with the widespread stratiform precipitation lead to a peak between about 30 and 35 dB*Z* throughout the experiment period (Figs. 14a–c). The EXP_D ensemble forecast *Z* values match the observed distribution in this range better than EXP_S during the first two forecast hours. Both experiments overforecast the geographic extent of the convective precipitation and the intensity of *Z*, leading to a higher number of *Z* > 50 dB*Z* values compared to the observations. The intensity overforecast arises in part because the model melting layer is displaced to lower heights, as ice particles survive for several kilometers in depth below the freezing level, and in part because of excessive size sorting in EXP_D; this high bias is nevertheless slightly greater in EXP_S than in EXP_D in the 1- and 2-h forecasts.
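
The normalization described above can be sketched as follows. The function name and the bin edges are hypothetical; the only idea taken from the text is that counts pooled over all members are divided by the ensemble size so that they are comparable to counts from a single observed field.

```python
import numpy as np

def ensemble_histogram(ens_values, bin_edges):
    """Histogram of values pooled over all members, per ensemble member.

    ens_values: array of shape (n_members, ...); non-finite values
    (e.g., missing data) are ignored.
    """
    ens = np.asarray(ens_values, dtype=float)
    n_members = ens.shape[0]
    counts, _ = np.histogram(ens[np.isfinite(ens)], bins=bin_edges)
    return counts / n_members

# Hypothetical usage with 5-dBZ-wide reflectivity bins.
edges = np.arange(30.0, 60.0, 5.0)
observed = np.array([31.0, 33.0, 47.0, 52.0])
ensemble = np.stack([observed] * 4)  # 4 members identical to the observations
obs_counts, _ = np.histogram(observed, bins=edges)
print(obs_counts)
print(ensemble_histogram(ensemble, edges))
```

With this normalization, an ensemble whose members all reproduce the observed field yields the same histogram as the observations, so any departure between the curves reflects a difference in the forecast value distribution rather than in sample size.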

Histograms of observed KOUN (black) and simulated reflectivity (dB*Z*) values from EXP_S (red) and EXP_D (blue) at (a) 0300, (b) 0400, and (c) 0500 UTC at a 0.5° tilt as well as (d)–(f) observed and simulated differential reflectivity (dB) values and (g)–(i) observed and simulated specific differential phase (° km^{−1}) values. The values in EXP_S and EXP_D are normalized by the size of each ensemble.


Differences between EXP_S and EXP_D are readily apparent in histograms of the predicted dual-pol values (Figs. 14d–i). Observed *Z*_{DR} values (Figs. 14d–f) peak at about 1.0–1.5 dB due to the broad coverage of moderate-sized raindrops in the leading stratiform region. EXP_D overforecasts the coverage of the leading stratiform precipitation, leading to an overall high bias in the *Z*_{DR} histogram, and slightly overforecasts the location of the histogram peak in *Z*_{DR} values, but the overall histogram pattern is similar to that of the observations. EXP_S, on the other hand, has a uniform distribution of *Z*_{DR} values throughout the forecast period, with no evidence of the peak seen in the observations and in EXP_D, due to the lack of broad coverage of stratiform precipitation in EXP_S. EXP_S also has a larger number of very high values (*Z*_{DR} > 3.0 dB) resulting from the unorganized region of intense convection in the center of the system. Relatively little bias is noted in the *K*_{DP} histogram for EXP_D, with values similar to the observations (Figs. 14g–i). EXP_S overforecasts the total coverage of nonzero *K*_{DP} values, again suggesting a high bias in liquid water content overall compared to the observations. This substantial high bias in liquid water content in convective precipitation skews EXP_S toward high values, with grid volumes exhibiting *K*_{DP} > 3.0° km^{−1}, particularly in the 1-h forecast (Fig. 14g).

## 4. Summary and conclusions

Ensemble forecasts initialized from cycled EnKF ensemble analyses are produced for a mesoscale convective system (MCS) that occurred over Oklahoma and northern Texas on 8–9 May 2007 using single-moment (SM; Lin et al. 1983) and double-moment (DM; Milbrandt and Yau 2005b) microphysics schemes. Qualitative and quantitative probabilistic methods are used to examine the MCS structure and precipitation distribution for the SM (EXP_S) and DM (EXP_D) experiments. Additionally, predicted dual-polarization (dual-pol) radar variables and their probabilistic forecasts are evaluated against available dual-pol radar observations and discussed in connection with model-predicted microphysical states and structures. The current study expands on the work of Putnam et al. (2014), which focused on the EnKF data assimilation and the deterministic forecasting aspects of the same two experiments that used SM and DM microphysics schemes, respectively. This paper focuses on ensemble probabilistic forecasting of reflectivity (*Z*) and the simulated dual-pol radar variables associated with the 8–9 May 2007 MCS.

Both qualitative and quantitative evaluations of the probabilistic forecasts show that EXP_D predicts the MCS with high confidence. EXP_D predicts the overall precipitation coverage of the system (considering a threshold region of *Z* > 20 dB*Z*) with very high probabilities throughout the forecast period, particularly for the stratiform precipitation region. EXP_S predicts similarly high probabilities for approximately half of this region and includes a large area of moderate probability of *Z* > 20 dB*Z* outside of the observed region. EXP_D has higher forecast skill, measured in terms of the area under the relative operating characteristic curve (AUC), for 2- and 3-h forecasts of the stratiform precipitation and leading convective line comprising the northern portion of the MCS. EXP_D also provides ensemble forecasts with greater sharpness, where the highest precipitation probabilities match regions of observed precipitation at a higher frequency.

EXP_D better represents the microphysics-related features in the MCS throughout the forecast period. This is most notable in the *Z*_{DR} values, where EXP_D maintains throughout the forecast period a clear distinction between the convective and stratiform precipitation regions, similar to that seen in the final EnKF analysis in Putnam et al. (2014). Additionally, EXP_D implies more realistic liquid water content in the convective region than EXP_S; in EXP_S, unrealistically high *K*_{DP} values suggest that the liquid water content has been overforecast, which is associated with the unorganized system structure and precipitation development in its forecasts.

Producing meaningful probabilistic forecasts of dual-pol variables proves challenging. Dual-pol signatures are often produced by physical processes within convective systems that have very small spatial scales, often less than a few kilometers. These small-scale structures are smeared when probabilistic forecasts are generated using neighborhood methods or a probability-matched ensemble mean. When individual ensemble members are examined, though, EXP_D maintains better quality in the predicted dual-pol fields than EXP_S, similar to the final EnKF analysis noted in Putnam et al. (2014). There is a notable arc of high *Z*_{DR} along the leading convective line in the MCS, resulting from size-sorting processes that are not represented in EXP_S, where *Z*_{DR} shows a monotonic relationship with *Z*. Probabilistic forecasts of *Z*_{DR} for EXP_D, while not particularly accurate in matching the location of the observations, still indicate the presence of an arc of large raindrops along the leading convective line and the potential for more improvement in the future. The EXP_D *Z*_{DR} forecasts also show higher skill than those of EXP_S based on AUC calculations. For *K*_{DP}, the EXP_S forecasts show higher skill; however, this is due to spatial displacement in EXP_D and significant erroneous coverage of *K*_{DP} in EXP_S. The low probabilities and spatial displacement errors associated with both of these variables indicate how uncertain the forecast of intense convective updrafts and excessive rainfall can be. Understanding and increasing the skill in predicting intense rainfall rates has the greatest broader impact on forecasting potential flash flood events.
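For readers unfamiliar with the probability-matched ensemble mean mentioned above, the following is a minimal sketch of one common formulation in the spirit of Ebert (2001): the spatial pattern is taken from the simple ensemble mean, while the amplitudes are drawn from the pooled distribution of all member values. The function name and the choice of which value to keep from each group of pooled values are assumptions made here, not details taken from the study.

```python
import numpy as np

def probability_matched_mean(members):
    """Probability-matched mean of an ensemble with shape (n_members, ...grid...)."""
    members = np.asarray(members, dtype=float)
    n_mem = members.shape[0]
    grid_shape = members.shape[1:]

    # Spatial pattern: rank grid points by the simple ensemble mean.
    mean_flat = members.mean(axis=0).ravel()
    order = np.argsort(mean_flat)[::-1]          # highest mean first

    # Amplitudes: pool every member value, sort descending, and keep one
    # value per grid point (the middle of each group of n_mem values; assumed).
    pooled = np.sort(members.ravel())[::-1]
    amplitudes = pooled[n_mem // 2::n_mem]

    # Assign the largest retained amplitude to the point with the highest mean, and so on.
    pm = np.empty_like(mean_flat)
    pm[order] = amplitudes
    return pm.reshape(grid_shape)

# Hypothetical two-member, three-point example with a displaced maximum.
members = np.array([[0.0, 40.0, 10.0],
                    [0.0, 10.0, 30.0]])
simple_mean = members.mean(axis=0)               # [0, 25, 20]: peak amplitude is damped
pm_mean = probability_matched_mean(members)      # [0, 30, 10]: peak amplitude is restored
```

The example shows why the technique is attractive for reflectivity and precipitation, and also why it cannot recover small displaced features: the location of the maximum is still dictated by the smoothed ensemble mean.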

When evaluating a DM ensemble, the increased computational cost of a DM scheme over an SM scheme should also be considered. Previous studies have shown that DM schemes can increase computation time by 10%–30% depending on the scheme used (Morrison et al. 2005; Milbrandt and McTaggart-Cowan 2008; Morrison and Gettelman 2008; Lim and Hong 2010). However, future increases in available computing resources will make the operational use of DM schemes increasingly feasible, and current operationally oriented research projects, such as the 2016 Storm Scale Ensemble Forecasts (SSEF) produced by the Center for Analysis and Prediction of Storms (CAPS; Kong 2016) as part of the NOAA Hazardous Weather Testbed Spring Experiment, are already using DM MP schemes successfully in real time.

This is the first study to consider explicit ensemble-based probabilistic forecasting of simulated dual-pol radar variables, and it highlights several challenges for future work. Even on high-resolution grids capable of resolving microphysical patterns that occur on small spatial scales, quantitative verification scores for dual-pol signatures, which usually have very small spatial scales (even compared to convective storms), suffer from a double penalty: forecasts of precipitation variables not only miss the location of the observations (a “miss”), but also occur in a nearby location where the event was not observed (a “false alarm”). This was also noted in a recent study evaluating storm-scale forecasts using different DM MP schemes (Putnam et al. 2017). Some probabilistic neighborhood-based metrics are used in this case to help account for spatial errors, but the distance and orientation of patterns in the simulated variables still present a challenge when using such methods. AUC scores used to verify neighborhood forecast probability for dual-pol variables become poorer as the threshold considered increases, particularly for *K*_{DP}, despite the 5-km neighborhood radius used, because of both the small spatial scale of the patterns being considered and the discrepancy between the ranges of forecast and observed values. Although a larger neighborhood may further alleviate the effect of spatial error, the probabilistic forecasts produced using progressively larger neighborhood radii become increasingly smoothed, losing the resolution necessary to capture small-scale features and negating their intended purpose. Additional methods of quantitatively evaluating dual-pol variables include histograms, which can provide information on general biases without considering spatial error. In such histograms produced for this case, high biases in the number of large drops and overall liquid water content, as suggested by high biases in predicted *K*_{DP} values, are identified in EXP_S, likely due to the representation of convective precipitation within EXP_S.
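The double penalty described above can be made concrete with a small synthetic example. The sketch below is illustrative only: the 20 × 20 grid, the 2 × 2 feature, the three-gridpoint displacement, and the function name are all hypothetical and are not taken from the experiments.

```python
import numpy as np

def contingency_counts(fcst_event, obs_event):
    """Point-by-point hits, misses, and false alarms for binary event fields."""
    f = np.asarray(fcst_event, dtype=bool)
    o = np.asarray(obs_event, dtype=bool)
    hits = int(np.sum(f & o))
    misses = int(np.sum(~f & o))
    false_alarms = int(np.sum(f & ~o))
    return hits, misses, false_alarms

# Observed small-scale feature (e.g., a threshold exceedance) on a synthetic grid.
obs = np.zeros((20, 20), dtype=bool)
obs[8:10, 8:10] = True

# Forecast A: the same feature displaced three grid points along the second axis.
fcst_displaced = np.roll(obs, shift=3, axis=1)
# Forecast B: the feature is not forecast at all.
fcst_omitted = np.zeros_like(obs)

displaced = contingency_counts(fcst_displaced, obs)   # (0, 4, 4): misses AND false alarms
omitted = contingency_counts(fcst_omitted, obs)       # (0, 4, 0): misses only

# A histogram-style comparison ignores location: the value distributions match exactly.
same_distribution = np.array_equal(np.sort(fcst_displaced.ravel()), np.sort(obs.ravel()))
```

Under point-by-point matching, the displaced forecast scores worse than the forecast that omits the feature altogether, even though it contains the correct structure. The last line shows why a histogram comparison is insensitive to this displacement and therefore isolates bias from spatial error.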

Possible future quantitative verification methods for dual-pol fields include object-based methods (e.g., Davis et al. 2006; Johnson et al. 2013; Zhu et al. 2015) that match similar storm features in observations to those in the forecasts to better compare dual-pol variable patterns. Additionally, following Sobash et al. (2011) for updraft helicity and Snook et al. (2016) for tornado prediction, the probabilities in the ensemble forecasts could be defined in terms of whether an event occurred within a radius of a given location, rather than in terms of a fixed-radius neighborhood. This method may produce higher probabilities for dual-pol signatures, which are rare events and subject to significant displacement error that cannot be accounted for with only a 5-km radius neighborhood. These and other forecast evaluation methods for dual-pol fields remain a promising area for future research. The methods used in this paper can be applied to storm-scale ensemble forecasts, such as the Storm Scale Ensemble Forecasts (SSEF) run as part of the NOAA Hazardous Weather Testbed Spring Experiment (e.g., Kong 2016), to evaluate similar issues over multiple cases. Putnam et al. (2017) represents the first effort in that direction. More studies evaluating and improving microphysics parameterizations and the dual-pol radar simulators are also needed (e.g., Johnson et al. 2016; Putnam et al. 2017).
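The difference between the two probability definitions discussed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the formulation of Sobash et al. (2011) or of this study: the radius is given in grid points rather than kilometers, points outside the domain are treated as non-events, and the function names are hypothetical.

```python
import numpy as np

def _disk_offsets(radius_pts):
    """Grid offsets (dj, di) lying within a circular neighborhood."""
    r = int(np.floor(radius_pts))
    return [(dj, di) for dj in range(-r, r + 1) for di in range(-r, r + 1)
            if dj * dj + di * di <= radius_pts ** 2]

def _shifted_stack(exceed, radius_pts):
    """Stack of the exceedance field as seen from every offset in the disk."""
    exceed = np.asarray(exceed, dtype=bool)
    n_mem, ny, nx = exceed.shape
    r = int(np.floor(radius_pts))
    # Zero padding: points outside the domain count as non-events (assumed).
    padded = np.pad(exceed, ((0, 0), (r, r), (r, r)), mode="constant",
                    constant_values=False)
    return np.stack([padded[:, r + dj:r + dj + ny, r + di:r + di + nx]
                     for dj, di in _disk_offsets(radius_pts)])

def neighborhood_fraction_probability(exceed, radius_pts):
    """Fraction of (member, neighbor) pairs that exceed the threshold."""
    return _shifted_stack(exceed, radius_pts).mean(axis=(0, 1))

def within_radius_probability(exceed, radius_pts):
    """Fraction of members with at least one exceedance within the radius."""
    return _shifted_stack(exceed, radius_pts).any(axis=0).mean(axis=0)

# Two hypothetical members, each with a single-point event at adjacent locations.
exceed = np.zeros((2, 7, 7), dtype=bool)
exceed[0, 3, 3] = True
exceed[1, 3, 4] = True

frac = neighborhood_fraction_probability(exceed, 1)   # frac[3, 3] is 0.2
within = within_radius_probability(exceed, 1)         # within[3, 3] is 1.0
```

For a small, rare feature, the fraction-based probability stays low even when every member places the feature close to the point in question, whereas the "event within a radius" definition reaches 1.0. This is the sense in which the latter may produce higher probabilities for dual-pol signatures.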

## Acknowledgments

This work was primarily supported by NSF Grant AGS-1046171. The first and fifth authors were also supported by NOAA Grant NA11OAR4320072. The second and fourth authors were also supported by NSF Grant AGS-1261776. The third author is also partially supported by a research grant of “Development of a Polarimetric Radar Data Simulator for Local Forecasting Model” by the Korea Meteorological Administration. Computing was performed primarily on the Kraken system at the National Institute for Computational Sciences, part of the XSEDE resources. Critiques from Dr. Jason Milbrandt as well as from two additional anonymous reviewers helped to improve the original manuscript.

## REFERENCES

Aksoy, A., D. C. Dowell, and C. Snyder, 2009: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part I: Storm-scale analysis. *Mon. Wea. Rev.*, **137**, 1805–1824, doi:10.1175/2008MWR2691.1.

Aksoy, A., D. C. Dowell, and C. Snyder, 2010: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part II: Short-range ensemble forecasts. *Mon. Wea. Rev.*, **138**, 1273–1292, doi:10.1175/2009MWR3086.1.

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903, doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.

Bringi, V. N., and V. Chandrasekar, 2001: *Polarimetric Doppler Weather Radar*. Cambridge, 636 pp.

Brown, B. G., 2001: Verification of precipitation forecasts: A survey of methodology. Part II: Verification of probability forecasts at points. National Center for Atmospheric Research, 20 pp.

Chandrasekar, V., S. Lim, N. Bharadwaj, W. Li, D. McLaughlin, V. N. Bringi, and E. Gorgucci, 2004: Principles of networked weather radar operation at attenuating frequencies. *Proc. Third European Conf. on Radar Meteorology and Hydrology*, Visby, Sweden, SMHI, 109–114. [Available online at http://www.copernicus.org/erad/2004/online/ERAD04_P_109.pdf.]

Clark, A. J., W. A. Gallus Jr., M. Xue, and F. Kong, 2009: A comparison of precipitation forecast skill between small convection-permitting and large convection-parameterizing ensembles. *Wea. Forecasting*, **24**, 1121–1140, doi:10.1175/2009WAF2222222.1.

Davis, C., B. Brown, and R. Bullock, 2006: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems. *Mon. Wea. Rev.*, **134**, 1785–1795, doi:10.1175/MWR3146.1.

Dawson, D. T., II, L. J. Wicker, E. R. Mansell, and R. L. Tanamachi, 2012: Impact of the environmental low-level wind profile on ensemble forecasts of the 4 May 2007 Greensburg, Kansas, tornadic storm and associated mesocyclones. *Mon. Wea. Rev.*, **140**, 696–716, doi:10.1175/MWR-D-11-00008.1.

Dawson, D. T., II, E. R. Mansell, Y. Jung, L. J. Wicker, M. R. Kumjian, and M. Xue, 2014: Low-level ZDR signatures in supercell forward flanks: The role of size sorting and melting of hail. *J. Atmos. Sci.*, **71**, 276–299, doi:10.1175/JAS-D-13-0118.1.

Dawson, D. T., II, M. Xue, J. A. Milbrandt, and A. Shapiro, 2015: Sensitivity of real-data simulations of the 3 May 1999 Oklahoma City tornadic supercell and associated tornadoes to multimoment microphysics. Part I: Storm- and tornado-scale numerical forecasts. *Mon. Wea. Rev.*, **143**, 2241–2265, doi:10.1175/MWR-D-14-00279.1.

Doviak, R., and D. Zrnic, 1993: *Doppler Radar and Weather Observations*. 2nd ed. Academic Press, 562 pp.

Dowell, D. C., and L. J. Wicker, 2009: Additive noise for storm-scale ensemble data assimilation. *J. Atmos. Oceanic Technol.*, **26**, 911–927, doi:10.1175/2008JTECHA1156.1.

Dowell, D. C., F. Zhang, L. J. Wicker, C. Snyder, and N. A. Crook, 2004: Wind and temperature retrievals in the 17 May 1981 Arcadia, Oklahoma, supercell: Ensemble Kalman filter experiments. *Mon. Wea. Rev.*, **132**, 1982–2005, doi:10.1175/1520-0493(2004)132<1982:WATRIT>2.0.CO;2.

Dowell, D. C., L. J. Wicker, and C. Snyder, 2011: Ensemble Kalman filter assimilation of radar observations of the 8 May 2003 Oklahoma City supercell: Influence of reflectivity observations on storm-scale analysis. *Mon. Wea. Rev.*, **139**, 272–294, doi:10.1175/2010MWR3438.1.

Ebert, E. E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. *Mon. Wea. Rev.*, **129**, 2461–2480, doi:10.1175/1520-0493(2001)129<2461:AOAPMS>2.0.CO;2.

Ebert, E. E., 2008: Fuzzy verification of high-resolution gridded forecasts: A review and proposed framework. *Meteor. Appl.*, **15**, 51–64, doi:10.1002/met.25.

Ebert, E. E., and J. L. McBride, 2000: Verification of precipitation in weather systems: Determination of systematic errors. *J. Hydrol.*, **239**, 179–202, doi:10.1016/S0022-1694(00)00343-7.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99**, 10 143–10 162, doi:10.1029/94JC00572.

Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. *Ocean Dyn.*, **53**, 343–367, doi:10.1007/s10236-003-0036-9.

Fritsch, J. M., and G. S. Forbes, 2001: Mesoscale convective systems. *Severe Convective Storms, Meteor. Monogr.*, No. 50, Amer. Meteor. Soc., 323–358.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757, doi:10.1002/qj.49712555417.

Hong, S.-Y., and J.-O. J. Lim, 2006: The WRF single-moment 6-class microphysics scheme (WSM6). *J. Korean Meteor. Soc.*, **42**, 129–151.

Johnson, A., X. Wang, F. Kong, and M. Xue, 2013: Object-based evaluation of the impact of horizontal grid spacing on convection-allowing forecasts. *Mon. Wea. Rev.*, **141**, 3413–3425, doi:10.1175/MWR-D-13-00027.1.

Johnson, M., Y. Jung, D. Dawson, and M. Xue, 2016: Comparison of simulated polarimetric signatures in idealized supercell storms using two-moment bulk microphysics schemes in WRF. *Mon. Wea. Rev.*, **144**, 971–996, doi:10.1175/MWR-D-15-0233.1.

Jung, Y., G. Zhang, and M. Xue, 2008: Assimilation of simulated polarimetric radar data for a convective storm using ensemble Kalman filter. Part I: Observation operators for reflectivity and polarimetric variables. *Mon. Wea. Rev.*, **136**, 2228–2245, doi:10.1175/2007MWR2083.1.

Jung, Y., M. Xue, and G. Zhang, 2010: Simulations of polarimetric radar signatures of supercell storm using a two-moment bulk microphysics scheme. *J. Appl. Meteor. Climatol.*, **49**, 146–163, doi:10.1175/2009JAMC2178.1.

Jung, Y., M. Xue, and M. Tong, 2012: Ensemble Kalman filter analyses of the 29–30 May 2004 Oklahoma tornadic thunderstorm using one- and two-moment bulk microphysics schemes, with verification against polarimetric data. *Mon. Wea. Rev.*, **140**, 1457–1475, doi:10.1175/MWR-D-11-00032.1.

Kalnay, E., 2002: *Atmospheric Modeling, Data Assimilation, and Predictability*. Cambridge University Press, 341 pp.

Kalnay, E., B. Hunt, E. Ott, and I. Szunyogh, 2006: Ensemble forecasting and data assimilation: Two problems with the same solution? *Predictability of Weather and Climate*, T. Palmer and R. Hagedorn, Eds., Cambridge University Press, 734 pp.

Khain, A., A. Pokrovsky, M. Pinsky, A. Seifert, and V. Phillips, 2004: Simulation of effects of atmospheric aerosols on deep turbulent convective clouds using a spectral microphysics mixed-phase cumulus cloud model. Part I: Model description and possible applications. *J. Atmos. Sci.*, **61**, 2963–2982, doi:10.1175/JAS-3350.1.

Kong, F., 2013: 2013 CAPS spring forecast experiment program plan, NOAA, 24 pp. [Available online at https://hwt.nssl.noaa.gov/Spring_2013/SpringProgram2013_Plan-v5.pdf.]

Kong, F., 2016: 2016 CAPS spring forecast experiment program plan, NOAA, 29 pp. [Available online at http://forecast.caps.ou.edu/SpringProgram2016_Plan-CAPS.pdf.]

Kumjian, M. R., and A. V. Ryzhkov, 2008: Polarimetric signatures in supercell thunderstorms. *J. Appl. Meteor. Climatol.*, **47**, 1940–1961, doi:10.1175/2007JAMC1874.1.

Kumjian, M. R., and A. V. Ryzhkov, 2012: The impact of size sorting on the polarimetric radar variables. *J. Atmos. Sci.*, **69**, 2042–2060, doi:10.1175/JAS-D-11-0125.1.

Lei, T., M. Xue, and T. Yu, 2009: Multi-scale analysis and prediction of the 8 May 2003 Oklahoma City tornadic supercell storm assimilating radar and surface network data using EnKF. *13th Conf. on Integrated Observing and Assimilation Systems for Atmosphere, Oceans, and Land Surface (IOAS-AOLS)*, Phoenix, AZ, Amer. Meteor. Soc., 6.4. [Available online at https://ams.confex.com/ams/89annual/techprogram/paper_150404.htm.]

Li, X., and J. R. Mecikalski, 2012: Impact of the dual-polarization Doppler radar data on two convective storms with a warm-rain radar forward operator. *Mon. Wea. Rev.*, **140**, 2147–2167, doi:10.1175/MWR-D-11-00090.1.

Lim, K.-S. S., and S.-Y. Hong, 2010: Development of an effective double-moment cloud microphysics scheme with prognostic cloud condensation nuclei (CCN) for weather and climate models. *Mon. Wea. Rev.*, **138**, 1587–1612, doi:10.1175/2009MWR2968.1.

Lin, Y.-L., R. D. Farley, and H. D. Orville, 1983: Bulk parameterization of the snow field in a cloud model. *J. Climate Appl. Meteor.*, **22**, 1065–1092, doi:10.1175/1520-0450(1983)022<1065:BPOTSF>2.0.CO;2.

Lorenz, E. N., 1969: Atmospheric predictability as revealed by naturally occurring analogues. *J. Atmos. Sci.*, **26**, 636–646, doi:10.1175/1520-0469(1969)26<636:APARBN>2.0.CO;2.

Mason, I. B., 1982: A model for the assessment of weather forecasts. *Aust. Meteor. Mag.*, **30**, 291–303.

Mason, S. J., and N. E. Graham, 1999: Conditional probabilities, relative operating characteristics, and relative operating levels. *Wea. Forecasting*, **14**, 713–725, doi:10.1175/1520-0434(1999)014<0713:CPROCA>2.0.CO;2.

McLaughlin, D., and Coauthors, 2009: Short-wavelength technology and the potential for distributed networks of small radar systems. *Bull. Amer. Meteor. Soc.*, **90**, 1797–1817, doi:10.1175/2009BAMS2507.1.

Milbrandt, J. A., and M. K. Yau, 2005a: A multimoment bulk microphysics parameterization. Part I: Analysis of the role of the spectral shape parameter. *J. Atmos. Sci.*, **62**, 3051–3064, doi:10.1175/JAS3534.1.

Milbrandt, J. A., and M. K. Yau, 2005b: A multimoment bulk microphysics parameterization. Part II: A proposed three-moment closure and scheme description. *J. Atmos. Sci.*, **62**, 3065–3081, doi:10.1175/JAS3535.1.

Milbrandt, J. A., and R. McTaggart-Cowan, 2008: An efficient semi-double-moment bulk microphysics scheme. *15th Int. Conf. on Clouds and Precipitation*, Cancun, Mexico, International Commission on Clouds and Precipitation (ICCP), P01. [Available online at http://cabernet.atmosfcu.unam.mx/ICCP-2008/abstracts/Program_on_line/Poster_01/Milbrandt_extended.pdf.]

Mittermaier, M., N. Roberts, and S. A. Thompson, 2013: A long-term assessment of precipitation forecast skill using the Fractions Skill Score. *Meteor. Appl.*, **20**, 176–186, doi:10.1002/met.296.

Morrison, H., and A. Gettelman, 2008: A new two-moment bulk stratiform cloud microphysics scheme in the Community Atmosphere Model, version 3 (CAM3). Part I: Description and numerical tests. *J. Climate*, **21**, 3642–3659, doi:10.1175/2008JCLI2105.1.

Morrison, H., J. A. Curry, M. D. Shupe, and P. Zuidema, 2005: A new double-moment microphysics parameterization for application in cloud and climate models. Part II: Single-column modeling of arctic clouds. *J. Atmos. Sci.*, **62**, 1678–1693, doi:10.1175/JAS3447.1.

NWS, 2012: Storm data and unusual weather phenomena—May 2007. NOAA, 6 pp. [Available online at http://www.weather.gov/media/lot/stormdata/pdf/may2007.pdf.]

Posselt, D. J., X. Li, S. A. Tushaus, and J. R. Mecikalski, 2015: Assimilation of dual-polarization radar observations in mixed- and ice-phase regions of convective storms: Information content and forward model errors. *Mon. Wea. Rev.*, **143**, 2611–2636, doi:10.1175/MWR-D-14-00347.1.

Putnam, B. J., M. Xue, Y. Jung, N. Snook, and G. Zhang, 2014: The analysis and prediction of microphysical states and polarimetric radar variables in a mesoscale convective system using double-moment microphysics, multinetwork radar data, and the ensemble Kalman filter. *Mon. Wea. Rev.*, **142**, 141–162, doi:10.1175/MWR-D-13-00042.1.

Putnam, B. J., M. Xue, Y. Jung, G. Zhang, and F. Kong, 2017: Simulation of polarimetric radar variables from 2013 CAPS spring experiment storm-scale ensemble forecasts and evaluation of microphysics schemes. *Mon. Wea. Rev.*, **145**, 49–73, doi:10.1175/MWR-D-15-0415.1.

Roberts, N., 2008: Assessing the spatial and temporal variation in the skill of precipitation forecasts from an NWP model. *Meteor. Appl.*, **15**, 163–169, doi:10.1002/met.57.

Roberts, N., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. *Mon. Wea. Rev.*, **136**, 78–97, doi:10.1175/2007MWR2123.1.

Rossa, A., P. Nurmi, and E. Ebert, 2008: Overview of methods for the verification of quantitative precipitation forecasts. *Precipitation: Advances in Measurement, Estimation and Prediction*, S. Michaelides, Ed., Springer, 419–452.

Ryzhkov, A. V., and D. S. Zrnić, 1996: Assessment of rainfall measurement that uses specific differential phase. *J. Appl. Meteor.*, **35**, 2080–2090, doi:10.1175/1520-0450(1996)035<2080:AORMTU>2.0.CO;2.

Ryzhkov, A. V., S. E. Giangrande, V. M. Melnikov, and T. J. Schuur, 2005: Calibration issues of dual-polarization radar measurements. *J. Atmos. Oceanic Technol.*, **22**, 1138–1155, doi:10.1175/JTECH1772.1.

Schenkman, A., M. Xue, A. Shapiro, K. Brewster, and J. Gao, 2011: The analysis and prediction of the 8–9 May 2007 Oklahoma tornadic mesoscale convective system by assimilating WSR-88D and CASA radar data using 3DVAR. *Mon. Wea. Rev.*, **139**, 224–246, doi:10.1175/2010MWR3336.1.

Schultz, P., 1995: An explicit cloud physics parameterization for operational numerical weather prediction. *Mon. Wea. Rev.*, **123**, 3331–3343, doi:10.1175/1520-0493(1995)123<3331:AECPPF>2.0.CO;2.

Schwartz, C. S., and Coauthors, 2009: Optimizing probabilistic high resolution ensemble guidance for hydrologic prediction. *23rd Conf. on Hydrology*, Phoenix, AZ, Amer. Meteor. Soc., 9.4. [Available online at https://ams.confex.com/ams/89annual/techprogram/paper_147171.htm.]

Snook, N., and M. Xue, 2008: Effects of microphysical drop size distribution on tornadogenesis in supercell thunderstorms. *Geophys. Res. Lett.*, **35**, L24803, doi:10.1029/2008GL035866.

Snook, N., M. Xue, and Y. Jung, 2011: Analysis of a tornadic mesoscale convective vortex based on ensemble Kalman filter assimilation of CASA X-band and WSR-88D radar data. *Mon. Wea. Rev.*, **139**, 3446–3468, doi:10.1175/MWR-D-10-05053.1.

Snook, N., M. Xue, and Y. Jung, 2012: Ensemble probabilistic forecasts of a tornadic mesoscale convective system from ensemble Kalman filter analyses using WSR-88D and CASA radar data. *Mon. Wea. Rev.*, **140**, 2126–2146, doi:10.1175/MWR-D-11-00117.1.

Snook, N., M. Xue, and Y. Jung, 2015: Multiscale EnKF assimilation of radar and conventional observations and ensemble forecasting for a tornadic mesoscale convective system. *Mon. Wea. Rev.*, **143**, 1035–1057, doi:10.1175/MWR-D-13-00262.1.

Snook, N., M. Xue, and Y. Jung, 2016: Ensemble and probabilistic prediction of the 20 May 2013 Newcastle–Moore EF5 tornado. *28th Conf. on Severe Local Storms*, Portland, OR, Amer. Meteor. Soc., 16B.6. [Available online at https://ams.confex.com/ams/28SLS/webprogram/Paper301853.html.]

Sobash, R. A., J. S. Kain, D. R. Bright, A. R. Dean, M. C. Coniglio, and S. J. Weiss, 2011: Probabilistic forecast guidance for severe thunderstorms based on the identification of extreme phenomena in convection-allowing model forecasts. *Wea. Forecasting*, **26**, 714–728, doi:10.1175/WAF-D-10-05046.1.

Stensrud, D. J., and Coauthors, 2009: Convective-scale Warn-on-Forecast System: A vision for 2020. *Bull. Amer. Meteor. Soc.*, **90**, 1487–1499, doi:10.1175/2009BAMS2795.1.

Stensrud, D. J., and Coauthors, 2013: Progress and challenges with Warn-on-Forecast. *Atmos. Res.*, **123**, 2–16, doi:10.1016/j.atmosres.2012.04.004.

Tanamachi, R. L., L. J. Wicker, D. C. Dowell, H. B. Bluestein, D. T. Dawson II, and M. Xue, 2013: EnKF assimilation of high-resolution, mobile Doppler radar data of the 4 May 2007 Greensburg, Kansas, supercell into a numerical cloud model. *Mon. Wea. Rev.*, **141**, 625–648, doi:10.1175/MWR-D-12-00099.1.

Thompson, G., R. M. Rasmussen, and K. Manning, 2004: Explicit forecasts of winter precipitation using an improved bulk microphysics scheme. Part I: Description and sensitivity analysis. *Mon. Wea. Rev.*, **132**, 519–542, doi:10.1175/1520-0493(2004)132<0519:EFOWPU>2.0.CO;2.

Thompson, G., P. R. Field, R. M. Rasmussen, and W. D. Hall, 2008: Explicit forecasts of winter precipitation using an improved bulk microphysics scheme. Part II: Implementation of a new snow parameterization. *Mon. Wea. Rev.*, **136**, 5095–5115, doi:10.1175/2008MWR2387.1.

Tong, M., and M. Xue, 2005: Ensemble Kalman filter assimilation of Doppler radar data with a compressible nonhydrostatic model: OSS experiments. *Mon. Wea. Rev.*, **133**, 1789–1807, doi:10.1175/MWR2898.1.

Tong, M., and M. Xue, 2008: Simultaneous estimation of microphysical parameters and atmospheric state with radar data and ensemble square-root Kalman filter. Part I: Sensitivity analysis and parameter identifiability. *Mon. Wea. Rev.*, **136**, 1630–1648, doi:10.1175/2007MWR2070.1.

Ulbrich, C. W., 1983: Natural variations in the analytical form of the raindrop size distributions. *J. Climate Appl. Meteor.*, **22**, 1764–1775, doi:10.1175/1520-0450(1983)022<1764:NVITAF>2.0.CO;2.

Vivekanandan, J., W. M. Adams, and V. N. Bringi, 1991: Rigorous approach to polarimetric radar modeling of hydrometeor orientation distributions. *J. Appl. Meteor.*, **30**, 1053–1063, doi:10.1175/1520-0450(1991)030<1053:RATPRM>2.0.CO;2.

Wheatley, D. M., N. Yussouf, and D. J. Stensrud, 2014: Ensemble Kalman filter analyses and forecasts of a severe mesoscale convective system using different choices of microphysics schemes. *Mon. Wea. Rev.*, **142**, 3243–3263, doi:10.1175/MWR-D-13-00260.1.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924, doi:10.1175/1520-0493(2002)130<1913:EDAWPO>2.0.CO;2.

Xue, M., K. K. Droegemeier, and V. Wong, 2000: The Advanced Regional Prediction System (ARPS)—A multiscale nonhydrostatic atmospheric simulation and prediction tool. Part I: Model dynamics and verification. *Meteor. Atmos. Phys.*, **75**, 161–193, doi:10.1007/s007030070003.

Xue, M., and Coauthors, 2001: The Advanced Regional Prediction System (ARPS)—A multiscale nonhydrostatic atmospheric simulation and prediction tool. Part II: Model physics and applications. *Meteor. Atmos. Phys.*, **76**, 143–165, doi:10.1007/s007030170027.

Xue, M., D.-H. Wang, J.-D. Gao, K. Brewster, and K. K. Droegemeier, 2003: The Advanced Regional Prediction System (ARPS), storm-scale numerical weather prediction and data assimilation. *Meteor. Atmos. Phys.*, **82**, 139–170, doi:10.1007/s00703-001-0595-6.

Xue, M., M. Tong, and K. K. Droegemeier, 2006: An OSSE framework based on the ensemble square-root Kalman filter for evaluating impact of data from radar networks on thunderstorm analysis and forecast. *J. Atmos. Oceanic Technol.*, **23**, 46–66, doi:10.1175/JTECH1835.1.

Xue, M., Y. Jung, and G. Zhang, 2010: State estimation of convective storms with a two-moment microphysics scheme and an ensemble Kalman filter: Experiments with simulated radar data. *Quart. J. Roy. Meteor. Soc.*, **136**, 685–700, doi:10.1002/qj.593.

Yussouf, N., E. R. Mansell, L. J. Wicker, D. M. Wheatley, and D. J. Stensrud, 2013: The ensemble Kalman filter analyses and forecasts of the 8 May 2003 Oklahoma City tornado supercell storm using single- and double-moment microphysics schemes. *Mon. Wea. Rev.*, **141**, 3388–3412, doi:10.1175/MWR-D-12-00237.1.

Yussouf, N., D. C. Dowell, L. J. Wicker, K. H. Knopfmeier, and D. M. Wheatley, 2015: Storm-scale data assimilation and ensemble forecasts for the 27 April 2011 severe weather outbreak in Alabama. *Mon. Wea. Rev.*, **143**, 3044–3066, doi:10.1175/MWR-D-14-00268.1.

Yussouf, N., J. Kain, and A. Clark, 2016: Short-term probabilistic forecasts of the 31 May 2013 Oklahoma tornado and flash flood event using a continuous-update-cycle storm-scale ensemble system. *Wea. Forecasting*, **31**, 957–983, doi:10.1175/WAF-D-15-0160.1.

Zhang, F., C. Snyder, and J. Sun, 2004: Impacts of initial estimate and observations on the convective-scale data assimilation with an ensemble Kalman filter. *Mon. Wea. Rev.*, **132**, 1238–1253, doi:10.1175/1520-0493(2004)132<1238:IOIEAO>2.0.CO;2.

Zhu, K. F., Y. Yang, and M. Xue, 2015: Percentile-based neighborhood precipitation verification and its application to a landfalling tropical storm case with radar data assimilation. *Adv. Atmos. Sci.*, **32**, 1449–1459, doi:10.1007/s00376-015-5023-9.