## 1. Introduction

Substantial progress has been made in numerical weather prediction (NWP) over the last few decades that has allowed for a more accurate representation and prediction of deep moist convection by atmospheric models. Convective systems can now be explicitly modeled, and radar reflectivity can now be calculated by the model and used as model output (Koch et al. 2005). This has proven to be a beneficial forecasting tool in the operational meteorology community (Kain et al. 2006; Weisman et al. 2008). Another very important advancement in NWP is the use of ensembles. Ensemble forecasting is a probabilistic technique designed to account for unavoidable errors encountered in NWP (e.g., in the initial conditions, or model physics), aiming to quantify the inherent uncertainties involved with atmospheric prediction (Leutbecher and Palmer 2008). By using an ensemble of forecasts that are all slightly altered by perturbed initial conditions, for example, one can simulate a wide range of possible solutions. In this way, ensemble systems provide useful forecast guidance in the form of spread and probabilities of exceeding certain thresholds for forecast parameters of interest (e.g., probability of exceeding high wind criteria). This is something that cannot be done through deterministic forecasting techniques. Kalnay (2003) also points out that, on average, the ensemble mean should outperform any single deterministic member, and in the case of an ensemble based on a Gaussian error distribution, represents the most likely forecast. In turn, ensemble forecasting has been shown to be an effective forecasting tool in operational environments, even on the mesoscale (e.g., Hou et al. 2001; Grimit and Mass 2002). One particular mesoscale ensemble technique that is growing increasingly popular is the ensemble Kalman filter (EnKF; Evensen 1994), which is both a data assimilation and a forecasting system. 
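The probabilistic guidance described above can be illustrated with a minimal sketch (the member wind-speed values and the 25 m s^{−1} threshold are invented for illustration):

```python
import numpy as np

# Hypothetical 10-m wind speed forecasts (m/s) from a small ensemble,
# all valid at the same time and location.
members = np.array([18.2, 24.7, 26.1, 21.5, 29.3, 23.8, 25.9, 19.4])

# Ensemble mean: on average, expected to outperform any single member.
ens_mean = members.mean()

# Spread (standard deviation) quantifies forecast uncertainty.
spread = members.std(ddof=1)

# Probability of exceeding a high-wind criterion (here 25 m/s),
# estimated as the fraction of members above the threshold.
p_exceed = (members > 25.0).mean()

print(f"mean={ens_mean:.1f} m/s, spread={spread:.1f} m/s, P(>25)={p_exceed:.2f}")
```

A single deterministic forecast provides only one of these numbers (a point value); the spread and exceedance probability are available only from the ensemble.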
Various studies have worked to fine-tune EnKF performance in a variety of model environments (e.g., Houtekamer and Mitchell 1998; Mitchell et al. 2002; Houtekamer et al. 2005; Dirren et al. 2007) and have achieved success in implementing EnKFs in mesoscale NWP models (e.g., Zhang et al. 2006; Fujita et al. 2007; Meng and Zhang 2008a,b; Torn and Hakim 2008).

Although the development of ensemble techniques has been beneficial to the forecasting community, communicating probabilistic forecasts to the public can be problematic (e.g., Morss et al. 2008; Joslyn et al. 2009; Mass et al. 2009), as studies have shown that users sometimes prefer deterministic forecasts over probabilistic information. Along these lines, it is still useful within an ensemble forecasting system to produce a “best guess” forecast. As mentioned previously, Kalnay (2003) shows that the most mathematically tractable technique (given certain assumptions) to produce such a forecast is to use the ensemble mean, as it is the best linear unbiased estimator and statistically outperforms any single ensemble member (Leith 1974; Thompson 1977; Fritsch et al. 2000; Baars and Mass 2005). However, as discussed by Ancell (2013), the ensemble mean may exhibit unrealistic behavior as it can diverge from the model attractor, a consequence of nonlinearity within the evolving ensemble perturbations. Ancell (2013) quantified the degree of nonlinearity at synoptic scales by examining the sea level pressure of midlatitude cyclones impacting the west coast of North America. This was done by examining the divergence of the mean of an ensemble (referred to as the “mean” hereafter) and a deterministic forecast initialized from the mean analysis (referred to as the “control” hereafter). The mean and control are identical at the time of forecast initialization, and would remain so in the absence of nonlinearity (Ancell 2013). Thus, it is through examination of the divergence of the mean and control that we are able to quantify the degree of nonlinearity.

At synoptic scales, Ancell (2013) found that the mean and control diverge at a roughly constant rate, and the difference between the two becomes significant at forecast times of around 12–24 h, which roughly agrees with previous work by Gilmour et al. (2001) and Ancell and Mass (2006). While Ancell (2013) found that the mean produced, on average, the smallest errors in cyclone position, significant errors existed in specific cases for surface wind speed (up to 10 m s^{−1}) and 6-h precipitation accumulation (up to 25 mm), both of which are important meteorological impacts of midlatitude cyclones. At convective scales, it is suspected that this rate of divergence is much larger and begins at earlier forecast times since errors and the subsequent nonlinearity grow more rapidly (Hohenegger and Schär 2007). In turn, mean forecasts of convection likely lose their usefulness as a realistic possible outcome in perhaps only a few hours. The first goal of this study is to compare nonlinearity (through differences between the mean and control) at convective scales to that at synoptic scales for two different severe storm situations using an object-oriented metric [as opposed to the simple sea level pressure metric used in Ancell (2013)]. The results may help us understand how quickly the ensemble mean loses its usefulness as a plausible and accurate forecast for high-impact events at different scales.
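The mean–control divergence mechanism can be demonstrated with a toy nonlinear system (a sketch only, not the WRF configuration used in this study): integrate an ensemble of perturbed initial conditions through the classic Lorenz three-variable equations, and compare the mean of the ensemble forecasts with a single forecast launched from the mean initial condition. The two are identical at t = 0 and separate as nonlinearity acts on the perturbations.

```python
import numpy as np

def lorenz63_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz three-variable system."""
    x, y, z = state[..., 0], state[..., 1], state[..., 2]
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.stack([dx, dy, dz], axis=-1)

rng = np.random.default_rng(0)
ens = np.array([1.0, 1.0, 1.0]) + 0.1 * rng.standard_normal((50, 3))  # 50 members
ctrl = ens.mean(axis=0)          # "control": forecast from the mean analysis

div = []
for _ in range(1000):            # 5 nondimensional time units
    ens = lorenz63_step(ens)
    ctrl = lorenz63_step(ctrl)
    div.append(np.linalg.norm(ens.mean(axis=0) - ctrl))

# If perturbation evolution were linear, div would stay zero; instead it grows.
print(f"divergence at start: {div[0]:.2e}, at end: {div[-1]:.2e}")
```

The same logic underlies the convective-scale expectation above: the faster the error growth, the sooner the mean departs from a trajectory the model itself could produce.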

The fact that the mean has been shown to become unrealistic at some point in the forecast window motivates developing a technique to produce a deterministic forecast within an ensemble system that is both accurate and realistic. Techniques that take advantage of the probabilistic information in the ensemble, such as the probability matching technique (Ebert 2001) and the modified probability matching technique (Fang and Kuo 2013), have been developed to improve the performance of the mean and the value of the probabilistic guidance. Schwartz et al. (2014) found that the control produced forecast errors of convective rainfall rates similar to those of a probability-matched mean. Ancell (2013) explored a “best member” technique on synoptic scales, evaluating 1) the control, 2) the single member closest to the mean averaged over the entire forecast, and 3) a forecast patched together from the different members that were closest to the mean at each 6-h forecast increment. It was found that the “patched together” forecast produced the most accurate product even though it is not a forecast continuously evolved with the model. In turn, this forecast seemed to show promise as an operational product that can provide both realistic and accurate predictions of high-impact synoptic-scale phenomena. Schwartz et al. (2014) examined convective rainfall rates using two patching techniques and found that while these forecasts did often display continuity, some of the evolution of convective features seemed unrealistic. It is still not completely clear, though, how such best-member techniques manifest themselves with regard to high-impact convective elements; if understood, this could lead to improved operational ensemble products designed to produce best-guess forecasts of severe thunderstorms [much like Ancell (2013) attempted for midlatitude cyclones].
The second purpose of this study is to investigate the nature of the best-member techniques developed in Ancell (2013) using an object-oriented, convective-scale metric, also for two thunderstorm events. Section 2 will present the methodology, section 3 will provide the results and discussion, and section 4 will summarize the findings.

## 2. Methodology

### a. Forecast model

The model used in this study is the Advanced Research WRF (ARW) model, version 3.3 (Skamarock et al. 2008), with three domains (Fig. 1; referred to as domain 1, domain 2, and domain 3 hereafter). The horizontal grid resolutions for domains 1 (outermost domain), 2, and 3 (innermost domain) are 36, 12, and 4 km, respectively. All domains have 38 vertical levels. Domain 1 serves only as a parent domain for the purpose of supplying probabilistic boundary conditions to the nests. Domain 1’s boundary conditions are created through the fixed covariance perturbation method of Torn et al. (2006), and are perturbed about analyses/forecasts from the Global Forecast System (GFS). The lateral boundary conditions for domains 2 and 3 are provided by domains 1 and 2, respectively. The following physics options are used: the Thompson microphysics scheme (Thompson et al. 2008), the Rapid Radiative Transfer Model (RRTM) for longwave radiation (Mlawer et al. 1997), the Dudhia shortwave radiation scheme (Dudhia 1989), the MM5 Monin–Obukhov surface layer physics scheme (Paulson 1970; Dyer and Hicks 1970; Webb 1970; Zhang and Anthes 1982; Beljaars 1995), the unified Noah land surface model (Chen and Dudhia 2001; Tewari et al. 2004), the Yonsei University (YSU) PBL scheme (Hong et al. 2006), and the Kain–Fritsch cumulus parameterization on domains 1 and 2 only (Kain and Fritsch 1993; Kain 2004). The same physics schemes and parameterizations are used for every ensemble member.
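The physics configuration above corresponds roughly to the following `&physics` fragment of a WRF `namelist.input` (a sketch for orientation only, not the authors' actual namelist; option indices follow the standard ARW V3 registry, with one value per domain):

```fortran
&physics
 mp_physics         = 8,  8,  8,   ! Thompson microphysics
 ra_lw_physics      = 1,  1,  1,   ! RRTM longwave radiation
 ra_sw_physics      = 1,  1,  1,   ! Dudhia shortwave radiation
 sf_sfclay_physics  = 1,  1,  1,   ! MM5 Monin-Obukhov surface layer
 sf_surface_physics = 2,  2,  2,   ! unified Noah land surface model
 bl_pbl_physics     = 1,  1,  1,   ! YSU PBL scheme
 cu_physics         = 1,  1,  0,   ! Kain-Fritsch on domains 1 and 2 only
/
```

Note that `cu_physics = 0` on the innermost column reflects the text: at 4-km grid spacing convection is treated explicitly, so no cumulus parameterization is applied on domain 3.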

### b. Data assimilation system

The data assimilation system used in this work is that of the Data Assimilation Research Testbed (DART; Anderson et al. 2009). A 50-member ensemble adjustment Kalman filter (EAKF; Anderson 2001) is used. The following observations are assimilated every 6 h: *u*- and *υ*-wind components and temperature from aircraft observations; *u*- and *υ*-wind components from satellite data; surface *u*- and *υ*-wind components, surface temperature, and altimeter from marine observations; 10-m *u*- and *υ*-wind components, 2-m surface temperature, and surface pressure from METAR observations; and surface *u*- and *υ*-wind components, surface temperature, and surface altimeter from mesonet observations (on domains 2 and 3 only). Radiosonde observations of temperature and *u*- and *υ*-wind components are also assimilated every 12 h. Moisture observations were evaluated, rather than assimilated. Both localization and inflation are used to account for the small sample size (Anderson and Anderson 1999). The localization radii (Gaspari and Cohn 1999) used here are as follows: for domain 1, 1800 km in the horizontal and 1.5 km in the vertical; and for domains 2 and 3, 600 km in the horizontal and 2.5 km in the vertical. These are tuned values that were in place at the time of the experiments for the Texas Tech University real-time WRF EnKF, which uses the same domains as in this study. A spatially varying adaptive inflation method is used (Anderson 2009), and only the background covariance field is inflated.
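Covariance localization tapers the ensemble-estimated covariances to zero beyond a specified distance. A minimal sketch of the Gaspari and Cohn (1999) fifth-order piecewise rational function, the compactly supported correlation function commonly used for this purpose, is below; note that how the radii quoted above map onto the half-width `c` depends on the assimilation system's convention, so treat the parameters as illustrative:

```python
def gaspari_cohn(r, c):
    """Gaspari-Cohn (1999) fifth-order piecewise rational taper.

    r : separation distance between a grid point and an observation
    c : half-width of the taper; the weight reaches exactly zero at r = 2c
    Returns a localization weight in [0, 1] that multiplies the covariance.
    """
    z = abs(r) / c
    if z <= 1.0:
        # -z^5/4 + z^4/2 + 5z^3/8 - 5z^2/3 + 1, in Horner form
        return (((-0.25 * z + 0.5) * z + 0.625) * z - 5.0 / 3.0) * z**2 + 1.0
    if z <= 2.0:
        # z^5/12 - z^4/2 + 5z^3/8 + 5z^2/3 - 5z + 4 - 2/(3z)
        return ((((z / 12.0 - 0.5) * z + 0.625) * z + 5.0 / 3.0) * z - 5.0) * z \
               + 4.0 - 2.0 / (3.0 * z)
    return 0.0
```

The function equals 1 at zero separation, decreases smoothly, and is identically zero beyond 2c, which is what confines each observation's influence to its localization radius.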

### c. Experimental setup

Two convective events are chosen for examination. The dates for these events are 24–25 May 2011 and 15 April 2012. Further details about each event are presented in section 2d. For each event, the 6-h interval (0000, 0600, 1200, and 1800 UTC) that immediately precedes convective initiation (CI) in radar observations is designated as the last forecast to be initialized before CI occurs. Convective initiation occurred at approximately 1900 UTC 24 May 2011 and 0300 UTC 15 April 2012, so the “last forecast” for each event is 1800 UTC 24 May 2011 and 0000 UTC 15 April 2012, respectively. At 84 h prior to the last forecast, a model spinup period 48 h in length is initialized. This spinup period allows the flow-dependent covariances necessary for the EnKF to establish themselves, and is executed on a 6-h assimilation cycle on domain 1 only. After the spinup period, the 6-h assimilation cycle is continued up to and including the last forecast, and 36-h extended forecasts are run from each analysis. This results in seven total 36-h forecasts leading up to the convective event, each with a successively shorter lead time. The first of these seven forecasts does not actually overlap with the convective events of interest since it is initialized more than 36 h prior to CI; therefore, it is not considered in any analyses. The 36-h forecasts are run on all three domains. At the start of the spinup period, initial conditions are created on domain 1 through perturbations about the GFS, using perturbations from the climatological covariances within the National Center for Atmospheric Research (NCAR) WRF-Var system (Barker et al. 2005). For all other initialization times during both the spinup period and each 36-h forecast, initial conditions are generated through the EnKF procedure. Initial conditions for domains 2 and 3 are produced at the end of the spinup period through the WRF nest-down process.
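The cycling timeline can be sketched as a sanity check of the counts above, using the May 2011 dates given in the text:

```python
from datetime import datetime, timedelta

last_forecast = datetime(2011, 5, 24, 18)            # 1800 UTC 24 May 2011
spinup_start = last_forecast - timedelta(hours=84)   # spinup initialized 84 h prior
spinup_end = spinup_start + timedelta(hours=48)      # 48-h spinup on 6-h cycles

# The 6-h assimilation cycle continues up to and including the last forecast;
# a 36-h extended forecast is launched from each post-spinup analysis.
init_times = []
t = spinup_end
while t <= last_forecast:
    init_times.append(t)
    t += timedelta(hours=6)

print(len(init_times))                 # seven extended forecasts
print(init_times[0], init_times[-1])   # first and last initialization times
```

The first of these initializations (0600 UTC 23 May) is the forecast that ends before CI and is therefore excluded from the analyses.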

### d. Convective cases

The first convective event chosen for this study is the Oklahoma tornado outbreak of 24 May 2011 (hereafter referred to as the May 2011 event). A 500-hPa trough was located over the Four Corners region at 1200 UTC 24 May 2011. At the same time, a developing surface low was located over the Oklahoma Panhandle, with an attendant dryline extending southward into western Texas. By 1800 UTC, the dryline had propagated into far western Oklahoma, and storms began initiating along the dryline in southwestern Oklahoma soon after. By 2100 UTC, a line of supercell thunderstorms was located ahead of the dryline, extending from extreme southern Kansas through north-central and southwest Oklahoma and into north-central Texas. The line of supercells continued to grow upscale and propagate eastward across central Oklahoma through the late afternoon and early evening hours, evolving into a mesoscale convective system (MCS), which persisted into western portions of Arkansas and Missouri before dissipating after 0600 UTC. Figure 2 shows the 500-hPa and surface ensemble mean 6-h forecasts valid at 0000 UTC 25 May 2011, along with the observed radar reflectivity at the same time, to summarize the governing synoptic conditions for this event. The 6-h forecasts are used here in lieu of the ensemble mean analysis because of the physical imbalance associated with data assimilation at the initial time, which can make the synoptic setup less clear.

The second event is a severe-weather-producing squall line that moved through western and central Texas on 14–15 April 2012 (hereafter referred to as the April 2012 event). At 0000 UTC 15 April 2012, a 500-hPa trough was located over Arizona. A broad area of surface low pressure was located over southwest Nebraska and eastern Colorado. A retreating dryline stretched from eastern Kansas southward into the eastern Texas Panhandle, and an advancing cold front was approaching the New Mexico–Texas border. Prior to 0300 UTC, a squall line began developing along the cold front–dryline intersection in the Texas Panhandle and west-central Texas. The squall line rapidly intensified and moved eastward during the overnight hours. While northern portions of the squall line began weakening during the overnight hours, the line maintained its intensity in western Oklahoma and western Texas and continued to expand southward. The squall line continued to propagate eastward through the morning hours of 15 April at varying levels of intensity. Figure 3 displays the synoptic setup in a similar fashion as Fig. 2, but valid at 0600 UTC 15 April 2012.

### e. Verification using MODE

The model and observation field chosen for verification in this study is composite radar reflectivity. Observed NEXRAD Level III short-range composite reflectivity observations from individual WSR-88D stations located within or in very close proximity to domain 3 were obtained through the National Climatic Data Center website (NOAA/National Climatic Data Center 2013). For each station, observations were collected as close to forecast valid times as possible (at the top of each hour). No radar observations with a valid time more than 5 min before or after a forecast valid time were used. In cases where two observations were temporally equidistant from the forecast valid time, the latter was used. The observations were interpolated to the same grid projection as domain 3 using the General Meteorology Package (GEMPAK) software (desJardins et al. 1991) and the copygb utility (Developmental Testbed Center 2008).

In this study, we aim to measure the skill of convective forecasts within specific ensemble products through model-simulated reflectivity. While calculating traditional metrics such as mean absolute error is an option, we feel an object-oriented approach is more appropriate because it incorporates important forecast characteristics, such as timing, spatial coverage, and the shape of reflectivity patterns, into our verification procedure. In addition, such an examination goes beyond Ancell (2013) to more appropriately assess nonlinearity and best-member techniques for severe convective forecasts. However, many difficulties arise when attempting to verify meteorological “objects” such as deep moist convection represented by radar reflectivity, particularly when a number of potential objects exist simultaneously that differ in the characteristics mentioned above. While subjective comparisons can be made, such comparisons may vary from forecaster to forecaster. This necessitates the use of an objective technique if we are to test our hypotheses regarding the nonlinear evolution of convective elements. The approach chosen here is NCAR’s Model Evaluation Tools (MET), which contains the Method for Object-based Diagnostic Evaluation (MODE; Developmental Testbed Center 2008).

MODE is a tool that is designed to objectively make comparisons of objects (i.e., thunderstorms) between model forecasts and observations that a human analyst would subjectively make (Brown et al. 2007). MODE accomplishes this by identifying objects in both forecast and observation fields, calculating attributes of each object, merging objects in the same field, matching objects in one field with those in the other, and producing a statistical analysis of object attributes. It should be noted that while MODE is to be considered an objective verification method, the user does have some discretion in choosing certain values in the process.

The first step for MODE is object identification, which is itself a four-step process. First, a field of raw data is selected, such as precipitation accumulation or radar reflectivity. Second, the raw data are smoothed through convolution with a simple circular filter. Third, a threshold value is applied to the convolved field, creating a masked field. Finally, the raw data are restored inside the masked objects, which completes object identification.
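The four-step identification procedure can be sketched in a few lines (a simplified stand-in for MODE's actual implementation, using a flat circular kernel and hypothetical values for the radius and threshold):

```python
import numpy as np
from scipy.ndimage import convolve

def identify_objects(raw, radius, threshold):
    """Simplified MODE-style object identification.

    1. Start from a raw field (e.g., composite reflectivity, dBZ).
    2. Convolve it with a simple circular (disk) filter.
    3. Threshold the convolved field to create a masked field.
    4. Restore the raw data inside the masked objects.
    """
    # Circular filter of the given radius (grid units), normalized to sum to 1.
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = (x**2 + y**2 <= radius**2).astype(float)
    kernel /= kernel.sum()

    convolved = convolve(raw, kernel, mode="constant", cval=0.0)
    mask = convolved >= threshold
    return np.where(mask, raw, 0.0), mask

# Toy example: a single 40-dBZ "storm" on a quiet background.
field = np.zeros((40, 40))
field[15:25, 15:25] = 40.0
restored, mask = identify_objects(field, radius=3, threshold=20.0)
```

The convolution step is also why a narrow feature can be lost entirely: averaging over the disk dilutes a thin high-reflectivity line below the threshold, a behavior that becomes relevant in the squall-line discussion below.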

Once objects are identified, a variety of attributes are calculated for both single objects and comparisons of two or more objects (Brown et al. 2007; Developmental Testbed Center 2008). After object attributes are evaluated, MODE uses a fuzzy logic engine to merge objects within a field and match objects in one field with those in another. The fuzzy logic engine uses interest maps, confidence maps, and weights to calculate a total interest value. Details of how these maps and weights are chosen and applied can be found in the MET Users’ Guide (Developmental Testbed Center 2008). A threshold value for the total interest function, whose value is between 0 and 1, is selected by the user. This threshold determines which objects are merged within one field, and matched across different fields. Once objects have been defined, merged, and matched, statistical analyses are performed on the object attributes.
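The total interest computation reduces to a weighted, confidence-scaled average of the per-attribute interest values. A sketch follows; the attribute names, weights, and values are illustrative, not the settings used in this study:

```python
def total_interest(attributes):
    """Weighted total interest, as in MODE's fuzzy logic engine.

    attributes: list of (weight, confidence, interest) triples, one per
    paired-object attribute (e.g., centroid distance, area ratio).
    Confidence and interest values each lie in [0, 1].
    """
    num = sum(w * c + 0.0 and 0.0 or w * c * i for w, c, i in attributes)
    den = sum(w * c for w, c, i in attributes)
    return num / den if den > 0 else 0.0

# Illustrative attribute values for one forecast-observation object pair.
pair = [
    (2.0, 1.0, 0.8),   # centroid distance: heavily weighted, high interest
    (1.0, 1.0, 0.5),   # area ratio
    (1.0, 0.5, 0.9),   # angle difference: low confidence for round objects
]
matched = total_interest(pair) >= 0.6   # compare against the matching threshold
```

Objects whose total interest exceeds the user-selected threshold are merged (within a field) or matched (across fields).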

For the identification of objects, convolution radii of 16 and 12 km and convolution thresholds of 25 and 20 dB*Z* are used for the May 2011 and April 2012 events, respectively. The use of different criteria for each event reflects the dominant convective mode and its inherent radar characteristics, as well as the desire to produce objects that closely resemble the characteristic radar reflectivity outlines of the convection. Originally, a uniform convolution radius and threshold were desired for both events. However, the tendency of a squall line’s minor axis to be extremely narrow made it difficult for MODE to identify the squall line at the convolution radius and threshold used for the May 2011 event. It appears that with the criteria of the first event, the convolution function filtered out the squall line, necessitating the decrease of the two values. Conversely, when the smaller criteria of the April 2012 event were attempted for the May 2011 event, the objects defined by MODE were much too large, and included anvil and broad stratiform rain regions.

Interest maps were prescribed for individual object attributes for merging and matching. In general, the requirements for objects in a single field to be merged are more restrictive than those for objects in different fields to be matched. This is intended to keep isolated areas of convection within the same field apart from each other or from larger complexes of convection. The restrictions need to be relaxed to allow areas of convection that may be somewhat displaced between different fields to be matched, which becomes increasingly important at longer forecast lead times. Weights were then chosen for each paired-object attribute. More weight is given to attributes such as centroid distance, boundary distance, convex hull distance, and intersection area ratio. A total interest threshold of 0.7 is assigned to merging, and 0.6 to matching, consistent with the requirement that merging be more restrictive than matching. For reference, the default total interest threshold is 0.7 (Developmental Testbed Center 2008). For a graphical example of how matched objects compare with one another, see Fig. 4.

One feature of MODE that holds important relevance when dealing with convection is its ability not only to match objects in separate fields, but also to create and match “clusters” of objects in each field. For example, if two separate objects in one field are matched with the same object in another, the two separate objects are combined to form a cluster. Most of the data analyses in this project involve matched clusters of objects. On occasion, an object is not matched with another object or cluster of objects in the opposite field, but is still in the vicinity of the ongoing convection and should arguably be considered in the analyses. In such cases, the authors used their discretion in determining which matched objects and matched clusters should be included. It is acknowledged that this problem could be dealt with by changing MODE parameters to produce more desirable results. However, as previously noted, consistency of MODE parameters was preferred. Furthermore, there are several instances in each event where at least one of the fields contains areas of marginally severe or nonsevere convection that are displaced from the main area of focus. Objects and clusters of objects in such regions of weak convection were not considered in any of the analyses (see Fig. 5 for an example).

A similar approach to that of Ancell (2013) will be applied here in evaluating the degree of nonlinearity of convective forecasts. Two paired-object attributes in MODE, centroid distance (CD) and intersection area ratio (IAR), are used to evaluate the divergence of the mean and control. Because each paired-object attribute is a comparison between the mean and control, the divergence between the two can be shown by simply examining the growth of CD [similar to Fig. 2a in Ancell (2013)] and the decay of IAR (lower values of IAR correspond to greater model divergence) with increasing forecast lead time or model valid time. This divergence is examined at forecast times after which CI has occurred in both the mean and control such that MODE is able to identify objects in both fields.

Since the mean and control may, at any given forecast hour, contain more than one pair of objects that are of interest, a technique is developed to provide a single value each for CD and IAR per forecast hour. The CD at each forecast hour is simply the average of all CDs considered. The IAR at each forecast hour is determined by taking the sum of the intersection areas of all pairs of objects considered and dividing it by the sum of the union areas of all pairs of objects considered. This prevents smaller objects from having too large an influence on the singular value, which would likely occur if the IARs were simply averaged. This procedure is applied throughout the rest of the analyses.
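The aggregation described above can be sketched as follows (object pairs are represented simply by their intersection and union areas; the example numbers are invented):

```python
def aggregate_cd(cds):
    """Singular CD for a forecast hour: average of all pair CDs considered."""
    return sum(cds) / len(cds)

def aggregate_iar(pairs):
    """Singular IAR: total intersection area over total union area.

    pairs: list of (intersection_area, union_area) for each object pair.
    Weighting by area prevents small object pairs from exerting too large
    an influence, which a plain average of per-pair IARs would allow.
    """
    total_intersection = sum(inter for inter, union in pairs)
    total_union = sum(union for inter, union in pairs)
    return total_intersection / total_union

# A large, well-matched pair and a tiny, poorly matched pair (grid squares).
pairs = [(800.0, 1000.0), (1.0, 10.0)]
print(aggregate_iar(pairs))                # area-weighted: 801/1010 ~ 0.79
print(sum(i / u for i, u in pairs) / 2)    # plain average of IARs: ~ 0.45
```

The tiny pair (IAR = 0.1) drags the plain average far down, while the area-weighted value stays close to that of the dominant object pair.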

Both the mean and control are evaluated against observations in the same manner as they are against each other. Again, CD and IAR are the paired-object attributes of interest, with the better forecast being the one with smaller CD and larger IAR. Since MODE cannot match objects in one field with those in another if one field does not contain any objects, no comparisons using MODE are made until both forecasts (mean and control) and the observations contain convection.

To identify the ensemble members closest to the mean, each member is compared with the mean using CD, IAR, and a combined normalized metric (referred to as the NORM hereafter). The normalized CD is defined as

CD_{norm}(*m*, *h*) = CD(*m*, *h*)/CD_{max},

where CD(*m*, *h*) is the CD of each member *m* from the mean at each forecast hour *h*, and CD_{max} is the maximum CD out of the set of members containing convection (*M*) for all forecast hours *H*, rounded up to the nearest grid unit. This results in a normalized CD (CD_{norm}) for each member at each forecast hour. The possible values for CD_{norm} range from 0 to 1, with smaller values equating to smaller error. The normalized IAR is defined as

IAR_{norm}(*m*, *h*) = [1 − IAR(*m*, *h*)]/(1 − IAR)_{max},

where IAR(*m*, *h*) is the intersection area ratio of each member *m* at each forecast hour *h*, and (1 − IAR)_{max} is the maximum value of (1 − IAR) for all members *M* at all forecast hours *H*, rounded up to the nearest 0.1. This creates a normalized IAR (IAR_{norm}) with possible values of 0–1. In this case, lower values of the normalized IAR correspond to low error, rather than higher values as before. The NORM is simply obtained by taking the average of CD_{norm}(*m*, *h*) and IAR_{norm}(*m*, *h*).
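The normalization and NORM computation can be sketched as follows, assuming CD and IAR are stored as member-by-hour arrays (the example values are invented):

```python
import math
import numpy as np

def compute_norm(cd, iar):
    """Combine CD and IAR into the NORM for each member and forecast hour.

    cd, iar : arrays of shape (members, hours) holding the centroid
    distance (grid units) and intersection area ratio of each member
    versus the ensemble mean. Lower NORM indicates a closer member.
    """
    cd_max = math.ceil(cd.max())                       # rounded up to nearest grid unit
    cd_norm = cd / cd_max                              # in [0, 1]; smaller = smaller error

    one_minus = 1.0 - iar
    om_max = math.ceil(one_minus.max() * 10.0) / 10.0  # rounded up to nearest 0.1
    iar_norm = one_minus / om_max                      # in [0, 1]; smaller = smaller error

    return 0.5 * (cd_norm + iar_norm)

# Two members, two forecast hours.
cd = np.array([[10.0, 22.0], [4.0, 30.5]])
iar = np.array([[0.5, 0.2], [0.7, 0.15]])
norm = compute_norm(cd, iar)
```

Because CD and (1 − IAR) are each scaled by their rounded maxima, the two attributes contribute on comparable 0–1 scales before being averaged.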

The member that contains the smallest error for each measurement averaged over the entire forecast is selected as the “single closest member” (referred to as such hereafter). Also, the closest members at each forecast hour are “patched” together to produce a “patched closest member” (referred to as such hereafter). In other words, there will be six total closest member forecasts to consider: a single closest member and patched closest member each determined by CD, IAR, and the NORM.
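Selecting the single and patched closest members then amounts to an argmin over a member-by-hour error array (a sketch; `err` could hold CD, the normalized IAR, or the NORM, and the values below are invented):

```python
import numpy as np

# Hypothetical error values (members x forecast hours); lower = closer to the mean.
err = np.array([
    [0.2, 0.6, 0.5],
    [0.4, 0.3, 0.7],
    [0.5, 0.4, 0.1],
])

# Single closest member: smallest error averaged over the entire forecast.
single = err.mean(axis=1).argmin()

# Patched closest member: the closest member at each forecast hour, stitched
# together (hence not a forecast continuously evolved with the model).
patched = err.argmin(axis=0)

print(single)    # one member index for the whole forecast
print(patched)   # one member index per forecast hour
```

Applying this selection to each of the three measurements (CD, IAR, NORM) yields the six closest-member forecasts described above.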

## 3. Results and discussion

### a. Divergence of the mean and control

The divergence of the mean and the control is examined starting at the first forecast hour when both the mean and control contain convection identifiable by MODE (referred to as model CI hereafter) and ending with the last forecast hour when both the mean and control contain sustained convection detectable by MODE and associated with the event of interest. Again, this divergence reveals the degree of nonlinear perturbation evolution as measured through the chosen object-oriented forecast metrics. For the May 2011 case, the model CI and final convective times are 2300 UTC 24 May and 0800 UTC 25 May, respectively. Figure 6 shows the difference between the mean and control in CD (Fig. 6a) and IAR (Fig. 6b) for the five forecasts initialized prior to CI for the May 2011 event. While Ancell (2013) found a somewhat smooth and linear divergence of the mean and control at the synoptic scale, the results here are less smooth, probably because they reflect a single case whereas Ancell (2013) averaged over many events. A clear pattern can be seen within the last two forecasts (the two closest in time to model CI). Between 3 and 6 h after model CI, a rapid increase in the difference between the mean and control can be seen in both CD (Fig. 6a) and IAR (Fig. 6b). The average CD between the mean and control for the final two forecasts grew from 9.4 grid units (~38 km) at 3 h after model CI to 27 grid units (~108 km) at 6 h past model CI, while the average IAR decreased from 0.5 to 0.26 over the same time frame. After around 6 h, this divergence ceases and the difference levels off for CD at values around 30 grid units (~120 km), but continues to grow for IAR.

The continued growth of divergence seen in IAR but not in CD can be attributed to the fact that areas of convection began to dissipate during these hours. Storm dissipation implies a continual decrease in the areal coverage of convection and, in turn, in the intersection and union areas of the mean and control. Even though both are decreasing, if there are significant differences between the two fields (as has been shown), the intersection area and, therefore, the IAR will approach and eventually reach zero. The area covered by convection may shrink, but the CD between the convective systems remains roughly the same. Therefore, CD is likely a more accurate proxy for error growth after peak divergence.

It is surmised that at forecast times beyond around 6 h after model CI, error saturation, or the growth of model errors being bounded by the model attractor, at convective scales limits the growth of the divergence between the mean and control. Note that once error saturation occurs at any scale (e.g., synoptic, mesoscale, convective), it precludes error growth at all scales. The most probable reason that the first three forecasts displayed in Fig. 6 do not show the same explosive divergence as the last two is that they are showing saturated divergence from the synoptic-scale influence. For these forecasts, model CI occurs within or after the time frame identified by Ancell (2013) when nonlinearity becomes significant at synoptic scales (12–24 h). Therefore, the divergence is already saturated prior to model CI for the earlier forecasts.

Figure 7 is similar to Fig. 6, but for the April 2012 event. For this event, the forecast hours at which the divergence of the mean and control is evaluated range from 0500 to 1400 UTC 15 April. These results appear even less smooth than those for the May 2011 case, though some similarities do emerge. As with the May 2011 case, the divergence of the mean and control for the final forecasts leading up to model CI shows rapid divergence (and growth of nonlinearity) just after model CI before leveling off. But contrary to the previous event, the rapid divergence occurs earlier in the convective event, around 1–3 h after model CI. The average CD between the mean and control for the final three forecasts increases from 18.8 grid units (~75 km) at 1 h after model CI to 50.9 grid units (~204 km) at 3 h, while the average IAR decreases from 0.29 to 0.18 over the same time frame. Both metrics contain larger overall differences between the mean and control than seen in the previous event. After peak divergence between the mean and control is reached, this divergence is less well behaved, with multiple periods of weak to moderate divergence and convergence, further suggesting that the error has saturated at the convective scale. Toward the end of the event, the average IAR of the three forecasts initialized before model CI (Fig. 7b) decreases, which again likely stems from dissipating convection.

Some interesting features emerged while examining the effects of forecast lead time on the evolution of the divergence between the mean and control. Figure 8 shows forecast lead time plotted against CD for the May 2011 event. Each individual solid line represents a fixed valid time (e.g., 2300 UTC), and the horizontal axis is increasing forecast lead time. So, for example, the 2300 UTC line has lead times of 5, 11, 17 h, and so on, since forecasts are initialized every 6 h. The 2300 UTC line is set in bold for emphasis, as it is the first valid time by which model CI had occurred for all forecasts. Since 2300 UTC is early in the convective event of interest, nonlinear effects due to convection should be minimal, meaning any model divergence at this time should be due solely to synoptic-scale influences. When the CD between the mean and control at 2300 UTC is compared with the synoptic-scale divergence identified by Ancell [Fig. 2a in Ancell (2013)], a striking similarity in the pattern of model divergence is revealed. This similarity suggests that the mean–control divergence exhibited at synoptic scales can be reproduced and isolated using a metric designed for convective objects. Furthermore, Fig. 8 simultaneously shows and isolates the convective-scale divergence of the mean and control using CD. The dashed line represents the CD for the forecast initialized just prior to CI (1800 UTC 24 May), and is the same as the black line in Fig. 6a. In turn, Fig. 8 successfully compares mean–control divergence at synoptic and convective scales, showing the slow synoptic-scale divergence alongside the substantially more rapid divergence at the convective scale.

The growth rate of the difference between the mean and control at the convective scale ranges from around 7 grid units per hour (~28 km h^{−1}) to 9.5 grid units per hour (~38 km h^{−1}), which is 4–9 times as large as that on the synoptic scale (approximately 1–2 grid units per hour, or 4–8 km h^{−1}). Thus, it is evident that the divergence occurs much more quickly at convective scales. Also of note is that at synoptic scales, model divergence continues to grow linearly, while at convective scales it appears to reach a peak, after which the mean–control difference fluctuates. With only two convective events studied, it appears that this peak occurs no later than 6 h after convection initiates within the model.

Figure 9 is the same as Fig. 8, but for the April 2012 event. Here we were not able to reproduce the synoptic-scale model divergence shown by Ancell (2013). It is presumed that because error saturation at convective scales occurred sooner for this event (1–3 vs 3–6 h after model CI for the May 2011 event), the synoptic-scale divergence is not recoverable. It is intriguing that every valid time (every solid line plotted) exhibits a peak divergence followed by rather steep convergence of the mean and control. This can be at least partially explained by the fact that the forecast initialized two forecast cycles prior to CI produced an exceptionally divergent mean and control (not shown). Other peaks and valleys in Fig. 9 are likewise probably attributable to the forecast quality of the mean and control.

The results here show that nonlinearity can become an important factor in forecasts of convective processes within the first 6 h after CI. The growth rate of the divergence between the mean and control is explosive for a few hours and then stabilizes. In contrast, the growth of this divergence due to nonlinearity at synoptic scales is roughly linear, and significant divergence does not emerge until at least 12 h into a forecast. The convective-scale divergence between the mean and control was greater for the April 2012 event than for the May 2011 event, but the growth of divergence for forecasts initialized closer to CI presents a similar structure in both cases. Furthermore, error saturation due to nonlinear effects at convective scales occurs slightly sooner for the April 2012 event (1–3 h after model CI) than for the May 2011 event (3–6 h after model CI).

### b. Model evaluation

Once CI occurred in both the mean and control, the simulated composite radar reflectivity calculated by each was compared with observed composite reflectivity (Figs. 10 and 2c). Figure 11 shows the mean and control evaluated against observations for the five forecast cycles preceding CI for the May 2011 event using CD (Fig. 11a) and IAR (Fig. 11b), averaged across each forecast hour within a given forecast cycle. For every forecast, the control significantly outperforms the mean when averaged across an entire forecast cycle. The CD and IAR were also calculated at each model valid time, averaged across different forecast cycles (Fig. 12). These results agree that the control is superior for this event. Furthermore, for CD there is not a single forecast hour at which the mean is better than the control, while using IAR as the evaluation metric produces only two forecast hours of one forecast cycle where the mean is better (not shown). It is clear that for this particular event, the control outperformed the mean at both short and long lead times.

The results for the April 2012 event were not as clear-cut as those for the May 2011 event (Figs. 13 and 3c). Applying the same averaging techniques as in the previous case, the control was indeed the better forecast for all forecast cycles except the one immediately preceding CI (Fig. 14). Additionally, for the forecast initialized 24 h prior to that one (0000 UTC 14 April 2012), IAR indicates that the mean is the better forecast, while CD indicates the contrary, though both margins are slim. Looking at how the mean and control perform at each valid time proves more telling: the control produces a significantly improved forecast when CD is used, and the same holds when IAR is employed, except at the second-earliest valid time, where the mean holds a very slight edge (Fig. 15).

Since the mean of an ensemble is known to be biased toward larger areal coverage, especially at longer forecast lead times, consideration was given to the possible implications this could have on the results. It was confirmed that the mean does exhibit an area bias relative to the control, although the bias is rather small; it increases only near the end of the May 2011 event and remains fairly steady near the end of the April 2012 event (Fig. 16). The larger bias right at the beginning of the April 2012 event is likely due to the control being slow to initialize convection. Nevertheless, even with this area bias in the mean, the control still consistently had a larger IAR with the observations than the mean, including at longer lead times (see Figs. 12b and 15b). This further shows that the control was often a superior forecast, and that the ensemble spread conveyed through the spatial coverage of convection depicted by the mean is appropriate.

The control’s outperformance of the mean for nearly every forecast is a somewhat unanticipated result. Ancell (2013) showed that the mean was, on average, a better forecast than the control for synoptic-scale midlatitude cyclones traversing the Pacific Northwest region. On the other hand, Schwartz et al. (2014) found that the control outperformed a simple ensemble mean for convective rainfall rates. It should be noted, though, that a probability matching technique was not employed in this study because only geometric properties of convection were evaluated, as opposed to rainfall rates or radar reflectivity, which a probability matching technique would theoretically improve (Ebert 2001). Given the limited amount of data examined in this study, there is not enough evidence to conclude that the control will, on average, outperform the mean for forecasts of convection.

### c. Best-member techniques

The mean was better than the control for just one forecast (0000 UTC 15 April 2012), so the closest members were determined for that forecast alone. Since the goal of best-member techniques is to produce a realistic forecast with error statistics similar to those of the mean, it would be pointless to perform such an operation when a different forecast (the control) that is both statistically superior and realistic is readily available. Nevertheless, it is still of interest how the closest members perform, and whether there is a pattern as to which members are closest to the mean [as in Ancell (2013), where different members were found to be closest to the mean at different times]. This would be particularly important if the mean were found to be better on average over a large number of cases. In any event, examining how long members that are closest to the mean at one time remain closest, and how often the closest member changes throughout a forecast window, may reveal crucial insights into how to develop best-member techniques in the presence of large nonlinearity.

The single and patched closest members at each forecast hour were determined by CD, IAR, and the NORM, and are displayed in Table 1. The single closest member was the same member (member 1) for each metric. The CD produced a different closest member at each forecast hour, suggesting that a patched method should be pursued. However, IAR produced the same closest member, again member 1, at every forecast hour, and member 1 was also the closest member at 9 of the 11 forecast hours using the NORM. This suggests that IAR provides a stronger signal in the selection of closest members. It should be noted that even though member 1 was never evaluated as the closest member at any forecast hour using CD, it was a high-ranking member (top 3) near the beginning and end of the forecast period (not shown). Therefore, it appears that in this case it would be most beneficial to use a single member as opposed to a patched-together forecast consisting of different members. The patched forecast determined using CD did appear realistic, with only a few minor discrepancies in storm motion and evolution from hour to hour. Even though these best-member techniques were applicable to only one forecast in this study, it is important to know which member(s) are closest to the mean, whether the closest member changes from forecast hour to forecast hour, and which method of determining closest member(s) works best, in case there are more instances of the mean outperforming the control for forecasts of convection.

Table 1. List of closest members at each forecast hour (0400–1400 UTC 15 Apr 2012) and the average closest member for the entire forecast (Avg) determined by centroid distance (CD), intersection area ratio (IAR), and a normalized average of the two (NORM).
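The closest-member selection described above can be sketched in code. The exact normalization behind the NORM is not given here, so the sketch below assumes a simple min–max scaling of each metric, oriented so that smaller values mean closer to the mean, before averaging; all member scores are invented for illustration.

```python
import numpy as np

def norm_score(cd, iar):
    """A sketch of the NORM: min-max scale each metric to [0, 1], orient both
    so that smaller = closer to the mean, then average.  (An assumed
    normalization, not necessarily the paper's exact formulation.)"""
    cd_n = (cd - cd.min()) / (cd.max() - cd.min())
    iar_n = 1.0 - (iar - iar.min()) / (iar.max() - iar.min())
    return 0.5 * (cd_n + iar_n)

# Invented per-member scores vs. the mean at one forecast hour:
# lower CD = closer; higher IAR = closer.
cd = np.array([12.0, 30.5, 22.1, 45.0])     # grid units, members 1-4
iar = np.array([0.62, 0.25, 0.40, 0.10])

closest = int(np.argmin(norm_score(cd, iar))) + 1   # single closest: member 1

# "Patched" closest member: repeat the selection at each forecast hour and
# stitch the hourly winners together (two illustrative hours here).
hourly = [(cd, iar), (cd[::-1], iar[::-1])]
patched = [int(np.argmin(norm_score(c, i))) + 1 for c, i in hourly]
# patched -> [1, 4]
```

Averaging the scores across all forecast hours before taking the argmin would instead yield the single closest member for the whole forecast.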

## 4. Summary and conclusions

A high-resolution WRF Model EnKF was used to examine the nonlinear behavior of an ensemble within the framework of an object-oriented evaluation of severe convection. This work was motivated by Ancell (2013), who found that the ensemble mean, which is usually viewed as the most-likely forecast, can exhibit meteorologically unrealistic behavior once nonlinear ensemble perturbation evolution becomes substantial. Such unrealistic behavior was found on synoptic scales on the order of a day in Ancell (2013), motivating a technique to determine a “closest member” that is both realistic and possesses the statistical benefits of the mean. This paper focuses on examining this behavior on storm scales since convective forecasts are critically important to the protection of life and property, yet have been shown to exhibit more pronounced nonlinear behavior than on synoptic scales (indicating the need to appropriately identify a best-guess forecast within a convective-scale ensemble).

This work has taken a similar approach to Ancell (2013) in that it estimates the degree of nonlinearity within an ensemble for two severe convective events by examining the divergence between the mean of the ensemble (mean) and the deterministic forecast initialized from the mean analysis (control). Unlike Ancell (2013), however, the evaluation here is based on the object-based verification software tool MODE, used to objectively verify composite radar reflectivity both simulated by the model and observed by radar. The two convective cases examined involved an outbreak of supercells in Oklahoma (24 May 2011) and a squall line in Texas (14–15 April 2012). It was found that significant divergence between the mean and control due to nonlinearity at convective scales occurs no later than 6 h into a given convective event, and may occur as early as 1 h. These results were found through the use of two metrics: centroid distance (CD; the distance between the geometric centers of two areas of convection in different fields) and intersection area ratio (IAR; the ratio of the intersection area to the union area of two regions of convection in different fields). This validates the hypothesis that nonlinearity grows substantially faster on convective scales than on synoptic scales, at least within the framework of the metrics used here; on synoptic scales it may take up to 1 day for nonlinearity to become prominent. It is important to know when such nonlinearity occurs because it can force the mean off the model attractor, producing unrealistic forecast features, such as incorrectly smoothed-out pressure fields, that provide poor forecasting guidance for associated meteorological variables. This essentially reveals the mean to be a poor choice for a best-guess forecast once the probability of unrealistic behavior is large.

For each of the two events, the five forecasts preceding CI, initialized 6 h apart, were further studied with regard to the MODE attributes CD and IAR. In 9 of the 10 forecasts examined, the control produced a more accurate forecast than the mean. This somewhat contradicts the findings obtained when a similar procedure was applied to central pressure errors of synoptic-scale midlatitude cyclones traversing the Pacific Northwest region (Ancell 2013). Since the ensemble mean has been shown more generally to be a statistically optimal forecast on average (Kalnay 2003), a possible and perhaps likely explanation of this outcome is that only two cases are examined here, whereas over 100 cases were composited in Ancell (2013). In any case, the single forecast in which the mean performed better provides a test case for examining how individual ensemble members evolve in relation to the mean, toward developing a closest-member technique on convective scales.

Even though the mean was only better for one forecast, an attempt was made to seek a closest member, or group of closest members that might produce similar forecast errors to those of the mean while still being a realistic representation of the atmosphere. The closest single member averaged across the entire forecast (single closest member) and a combination of the members closest to the mean at each forecast hour (patched closest member) were determined using CD, IAR, and a normalized average of the two (NORM), for a total of six best-member forecasts. The single closest member was found to be the same member for all three evaluation methods. Furthermore, using IAR produced the same closest member as the single closest member at every forecast hour, implying that the spatial coverage of convection in the single closest member showed great resemblance to the mean.

There are several limiting factors to this study. Studying only two convective events limits the ability to establish that the results presented in this paper apply to a broad spectrum of severe convective events. Using the objective analysis tool MODE as a surrogate for subjective analysis proved promising, though it may produce misleading results if used without caution. MODE can only compare “objects” (i.e., thunderstorms) if they exist in both fields being evaluated. Furthermore, if two objects exist in one field and one in the other, one of the objects in the first field may compare well with the one object in the other field, while the second object in the first field is matched with nothing and goes unaccounted for in the output statistics. This gives the false impression of a successful forecast when, in fact, a large area of convection was completely absent from one of the fields. Another potential issue is that adjusting the user-controlled parameters in MODE can reconfigure the shape of objects and, more importantly, how objects are grouped together. The goal was to produce objects with shapes and sizes similar to the modes of convection being studied, which necessitated the use of different parameters for each event. It is also acknowledged that the definition of “similar shapes and sizes” may vary from person to person. As long as the same parameters are employed consistently (i.e., throughout a particular event), the results should not be jeopardized. Additionally, the results produced in this paper apply only to the two MODE metrics employed in the data analyses; different metrics, both within and outside MODE, could potentially produce different results.

In the future, more convective events need to be examined, of both similar and different modes of convection than the ones studied here. Correspondingly, examination of the transition times between convective modes is desired; the two events chosen for this study did not lend themselves well to that type of evaluation. In addition, further investigation into MODE is needed to discover whether there are more efficient ways of assessing deep moist convection. Object attributes other than CD and IAR, perhaps boundary or convex hull distance, ought to be considered. Another area of future exploration is the examination of model divergence using the values of radar reflectivity (dB*Z*), or perhaps precipitation accumulation, in contrast to the present study, which considered only spatial properties of convection. Any such research should consider the use of a probability matching technique (e.g., Ebert 2001; Fang and Kuo 2013) in order to capture the small-scale subtleties the ensemble averaging process tends to smooth out. Other methods of verification should also be considered. Since the difference between the mean and control could be used to statistically represent ensemble spread, and spread growth should be flow dependent, the noise-to-signal ratio proposed by Fang and Kuo (2015) may be useful for determining the flow signal. Even if a multitude of cases were to show that the control is a better forecast for deep moist convection than the mean, ensemble forecasting is still necessary to address the uncertainty. Though the control may lie on a model attractor, it may not be the “real” attractor (i.e., that of a perfect atmospheric model).

## Acknowledgments

The authors thank fellow faculty members and graduate students of the Atmospheric Science Group at Texas Tech University for their support; John Halley Gotway and Michael James from the University Corporation for Atmospheric Research; and the High Performance Computing Center staff at Texas Tech University for assisting with software and computing issues. The authors also wish to thank two anonymous reviewers who provided a number of comments and suggestions that improved the article. This work was supported by NOAA CSTAR Grant NWS NA11NWS4680001.

## REFERENCES

Ancell, B. C., 2013: Nonlinear characteristics of ensemble perturbation evolution and their application to forecasting high-impact events. *Wea. Forecasting*, **28**, 1353–1365, doi:10.1175/WAF-D-12-00090.1.

Ancell, B. C., and C. F. Mass, 2006: Structure, growth rates, and tangent linear accuracy of adjoint sensitivities with respect to horizontal and vertical resolutions. *Mon. Wea. Rev.*, **134**, 2971–2988, doi:10.1175/MWR3227.1.

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903, doi:10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.

Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. *Tellus*, **61A**, 72–83, doi:10.1111/j.1600-0870.2008.00361.x.

Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. *Mon. Wea. Rev.*, **127**, 2741–2758, doi:10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.

Anderson, J. L., T. Hoar, K. Raeder, H. Liu, N. Collins, R. Torn, and A. Avellano, 2009: The Data Assimilation Research Testbed: A community facility. *Bull. Amer. Meteor. Soc.*, **90**, 1283–1296, doi:10.1175/2009BAMS2618.1.

Baars, J. A., and C. F. Mass, 2005: Performance of National Weather Service forecasts compared to operational, consensus, and weighted model output statistics. *Wea. Forecasting*, **20**, 1034–1047, doi:10.1175/WAF896.1.

Barker, D. M., M. S. Lee, Y.-R. Guo, W. Huang, S. Rizvi, and Q. Xiao, 2005: WRF-Var—A unified 3/4D-Var variational data assimilation system for WRF. *Sixth WRF/15th MM5 Users’ Workshop*, Boulder, CO, NCAR, 17 pp. [Available online at http://www.mmm.ucar.edu/wrf/users/workshop/WS2005/presentations/session10/1-Barker.pdf.]

Beljaars, A. C. M., 1995: The parameterization of surface fluxes in large-scale models under free convection. *Quart. J. Roy. Meteor. Soc.*, **121**, 255–270, doi:10.1002/qj.49712152203.

Brown, B. G., R. Bullock, J. Halley Gotway, D. Ahijevych, C. Davis, E. Gilleland, and L. Holland, 2007: Application of the MODE object-based verification tool for the evaluation of model precipitation fields. *22nd Conf. on Weather Analysis and Forecasting/18th Conf. on Numerical Weather Prediction*, Park City, UT, Amer. Meteor. Soc., 10A.2. [Available online at http://ams.confex.com/ams/pdfpapers/124856.pdf.]

Chen, F., and J. Dudhia, 2001: Coupling an advanced land surface/hydrology model with the Penn State/NCAR MM5 modeling system. Part I: Model description and implementation. *Mon. Wea. Rev.*, **129**, 569–585, doi:10.1175/1520-0493(2001)129<0569:CAALSH>2.0.CO;2.

desJardins, M. L., K. F. Brill, and S. S. Schotz, 1991: Use of GEMPAK on Unix workstations. Preprints, *Seventh Int. Conf. on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology*, New Orleans, LA, Amer. Meteor. Soc., 449–453.

Developmental Testbed Center, 2008: MET, version 1.1: Model Evaluation Tools users’ guide. Developmental Testbed Center, 168 pp. [Available online at http://www.dtcenter.org/met/users/docs/overview.php.]

Dirren, S., R. D. Torn, and G. J. Hakim, 2007: A data assimilation case study using a limited-area ensemble Kalman filter. *Mon. Wea. Rev.*, **135**, 1455–1473, doi:10.1175/MWR3358.1.

Dudhia, J., 1989: Numerical study of convection observed during the Winter Monsoon Experiment using a mesoscale two-dimensional model. *J. Atmos. Sci.*, **46**, 3077–3107, doi:10.1175/1520-0469(1989)046<3077:NSOCOD>2.0.CO;2.

Dyer, A. J., and B. B. Hicks, 1970: Flux-gradient relationships in the constant flux layer. *Quart. J. Roy. Meteor. Soc.*, **96**, 715–721, doi:10.1002/qj.49709641012.

Ebert, E. E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. *Mon. Wea. Rev.*, **129**, 2461–2480, doi:10.1175/1520-0493(2001)129<2461:AOAPMS>2.0.CO;2.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99**, 10 143–10 162, doi:10.1029/94JC00572.

Fang, X., and Y.-H. Kuo, 2013: Improving ensemble-based quantitative precipitation forecasts for topography-enhanced typhoon heavy rainfall over Taiwan with a modified probability-matching technique. *Mon. Wea. Rev.*, **141**, 3908–3932, doi:10.1175/MWR-D-13-00012.1.

Fang, X., and Y.-H. Kuo, 2015: A new generic method for quantifying the scale predictability of the fractal atmosphere: Applications to model verification. *J. Atmos. Sci.*, **72**, 1667–1688, doi:10.1175/JAS-D-14-0112.1.

Fritsch, J. M., J. Hilliker, J. Ross, and R. L. Vislocky, 2000: Model consensus. *Wea. Forecasting*, **15**, 571–582, doi:10.1175/1520-0434(2000)015<0571:MC>2.0.CO;2.

Fujita, T., D. J. Stensrud, and D. C. Dowell, 2007: Surface data assimilation using an ensemble Kalman filter approach with initial condition and model physics uncertainties. *Mon. Wea. Rev.*, **135**, 1846–1868, doi:10.1175/MWR3391.1.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757, doi:10.1002/qj.49712555417.

Gilmour, I., L. A. Smith, and R. Buizza, 2001: Linear regime duration: Is 24 hours a long time in synoptic weather forecasting? *J. Atmos. Sci.*, **58**, 3525–3539, doi:10.1175/1520-0469(2001)058<3525:LRDIHA>2.0.CO;2.

Grimit, E. P., and C. F. Mass, 2002: Initial results of a mesoscale short-range ensemble forecasting system over the Pacific Northwest. *Wea. Forecasting*, **17**, 192–205, doi:10.1175/1520-0434(2002)017<0192:IROAMS>2.0.CO;2.

Hohenegger, C., and C. Schär, 2007: Atmospheric predictability at synoptic versus cloud-resolving scales. *Bull. Amer. Meteor. Soc.*, **88**, 1783–1793, doi:10.1175/BAMS-88-11-1783.

Hong, S.-Y., Y. Noh, and J. Dudhia, 2006: A new vertical diffusion package with an explicit treatment of entrainment processes. *Mon. Wea. Rev.*, **134**, 2318–2341, doi:10.1175/MWR3199.1.

Hou, D., E. Kalnay, and K. K. Droegemeier, 2001: Objective verification of the SAMEX ’98 ensemble forecasts. *Mon. Wea. Rev.*, **129**, 73–91, doi:10.1175/1520-0493(2001)129<0073:OVOTSE>2.0.CO;2.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811, doi:10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.

Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. *Mon. Wea. Rev.*, **133**, 604–620, doi:10.1175/MWR-2864.1.

Joslyn, S., L. Nadav-Greenberg, and R. M. Nichols, 2009: Probability of precipitation: Assessment and enhancement of end-user understanding. *Bull. Amer. Meteor. Soc.*, **90**, 185–193, doi:10.1175/2008BAMS2509.1.

Kain, J. S., 2004: The Kain–Fritsch convective parameterization: An update. *J. Appl. Meteor.*, **43**, 170–181, doi:10.1175/1520-0450(2004)043<0170:TKCPAU>2.0.CO;2.

Kain, J. S., and J. M. Fritsch, 1993: Convective parameterization for mesoscale models: The Kain–Fritsch scheme. *The Representation of Cumulus Convection in Numerical Models, Meteor. Monogr.*, No. 24, Amer. Meteor. Soc., 165–170.

Kain, J. S., S. J. Weiss, J. J. Levit, M. E. Baldwin, and D. R. Bright, 2006: Examination of convection-allowing configurations of the WRF model for the prediction of severe convective weather: The SPC/NSSL Spring Program 2004. *Wea. Forecasting*, **21**, 167–181, doi:10.1175/WAF906.1.

Kalnay, E., 2003: *Atmospheric Modeling, Data Assimilation and Predictability.* Cambridge University Press, 363 pp.

Koch, S. E., B. Ferrier, M. T. Stoelinga, E. Szoke, S. J. Weiss, and J. S. Kain, 2005: The use of simulated radar reflectivity fields in the diagnosis of mesoscale phenomena from high-resolution WRF model forecasts. *12th Conf. on Mesoscale Processes*, Albuquerque, NM, Amer. Meteor. Soc., J4J.7. [Available online at http://ams.confex.com/ams/pdfpapers/97032.pdf.]

Leith, C. E., 1974: Theoretical skill of Monte Carlo forecasts. *Mon. Wea. Rev.*, **102**, 409–418, doi:10.1175/1520-0493(1974)102<0409:TSOMCF>2.0.CO;2.

Leutbecher, M., and T. N. Palmer, 2008: Ensemble forecasting. *J. Comput. Phys.*, **227**, 3515–3539, doi:10.1016/j.jcp.2007.02.014.

Mass, C. F., and Coauthors, 2009: PROBCAST: A Web-based portal to mesoscale probabilistic forecasts. *Bull. Amer. Meteor. Soc.*, **90**, 1009–1014, doi:10.1175/2009BAMS2775.1.

Meng, Z., and F. Zhang, 2008a: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part III: Comparison with 3DVAR in a real-data case study. *Mon. Wea. Rev.*, **136**, 522–540, doi:10.1175/2007MWR2106.1.

Meng, Z., and F. Zhang, 2008b: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part IV: Comparison with 3DVAR in a month-long experiment. *Mon. Wea. Rev.*, **136**, 3671–3682, doi:10.1175/2008MWR2270.1.

Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-error representation in an ensemble Kalman filter. *Mon. Wea. Rev.*, **130**, 2791–2808, doi:10.1175/1520-0493(2002)130<2791:ESBAME>2.0.CO;2.

Mlawer, E. J., S. J. Taubman, P. D. Brown, M. J. Iacono, and S. A. Clough, 1997: Radiative transfer for inhomogeneous atmospheres: RRTM, a validated correlated-k model for the longwave. *J. Geophys. Res.*, **102**, 16 663–16 682, doi:10.1029/97JD00237.

Morss, R. E., J. L. Demuth, and J. K. Lazo, 2008: Communicating uncertainty in weather forecasts: A survey of the U.S. public. *Wea. Forecasting*, **23**, 974–991, doi:10.1175/2008WAF2007088.1.

NOAA/National Climatic Data Center, 2013: NEXRAD data archive, inventory and access. NOAA/National Climatic Data Center, accessed June 2013. [Available online at http://www.ncdc.noaa.gov/nexradinv/.]

Paulson, C. A., 1970: The mathematical representation of wind speed and temperature profiles in the unstable atmospheric surface layer. *J. Appl. Meteor.*, **9**, 857–861, doi:10.1175/1520-0450(1970)009<0857:TMROWS>2.0.CO;2.

Schwartz, C. S., G. S. Romine, K. R. Smith, and M. L. Weisman, 2014: Characterizing and optimizing precipitation forecasts from a convection-permitting ensemble initialized by a mesoscale ensemble Kalman filter. *Wea. Forecasting*, **29**, 1295–1318, doi:10.1175/WAF-D-13-00145.1.

Skamarock, W. C., and Coauthors, 2008: A description of the Advanced Research WRF version 3. NCAR Tech. Note NCAR/TN-475+STR, 113 pp. [Available online at http://www.mmm.ucar.edu/wrf/users/docs/arw_v3_bw.pdf.]

Tewari, M., and Coauthors, 2004: Implementation and verification of the unified Noah land surface model in the WRF model. *20th Conf. on Weather Analysis and Forecasting/16th Conf. on Numerical Weather Prediction*, Seattle, WA, Amer. Meteor. Soc., 14.2a. [Available online at https://ams.confex.com/ams/84Annual/techprogram/paper_69061.htm.]

Thompson, G., P. R. Field, R. M. Rasmussen, and W. D. Hall, 2008: Explicit forecasts of winter precipitation using an improved bulk microphysics scheme. Part II: Implementation of a new snow parameterization. *Mon. Wea. Rev.*, **136**, 5095–5115, doi:10.1175/2008MWR2387.1.

Thompson, P. D., 1977: How to improve accuracy by combining independent forecasts. *Mon. Wea. Rev.*, **105**, 228–229, doi:10.1175/1520-0493(1977)105<0228:HTIABC>2.0.CO;2.

Torn, R. D., and G. J. Hakim, 2008: Performance characteristics of a pseudo-operational ensemble Kalman filter. *Mon. Wea. Rev.*, **136**, 3947–3963, doi:10.1175/2008MWR2443.1.

Torn, R. D., G. J. Hakim, and C. Snyder, 2006: Boundary conditions for limited-area ensemble Kalman filters. *Mon. Wea. Rev.*, **134**, 2490–2502, doi:10.1175/MWR3187.1.

Webb, E. K., 1970: Profile relationships: The log-linear range, and extension to strong stability. *Quart. J. Roy. Meteor. Soc.*, **96**, 67–90, doi:10.1002/qj.49709640708.

Weisman, M. L., C. Davis, W. Wang, K. W. Manning, and J. B. Klemp, 2008: Experiences with 0–36-h explicit convective forecasts with the WRF-ARW model. *Wea. Forecasting*, **23**, 407–437, doi:10.1175/2007WAF2007005.1.

Zhang, D.-L., and R. A. Anthes, 1982: A high-resolution model of the planetary boundary layer—Sensitivity tests and comparisons with SESAME-79 data. *J. Appl. Meteor.*, **21**, 1594–1609, doi:10.1175/1520-0450(1982)021<1594:AHRMOT>2.0.CO;2.

Zhang, F., Z. Meng, and A. Aksoy, 2006: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part I: Perfect model experiments. *Mon. Wea. Rev.*, **134**, 722–736, doi:10.1175/MWR3101.1.