## 1. Introduction

Successful forecasting of convective storms remains one of the most challenging problems of numerical weather prediction (NWP) today. Several factors contribute to the challenge. First and foremost, limited three-dimensional observational coverage at convective scales (resolutions of ∼1 km or less) imposes difficulties in constructing reliable initial analyses of storm environments and structures. At these scales, the only operationally available source of information is Doppler radar observations, the assimilation of which has proven very challenging as they are nonlinearly related to the model state, which itself is a product of highly nonlinear dynamical–microphysical processes. Further complicating this procedure is the question of how best to carry out data assimilation for a distinctly multiscale problem. Finally, large systematic model errors, including those associated with precipitation microphysics and resolution (e.g., Gilmore et al. 2004; Bryan et al. 2003), may lead to rapid divergence of forecast states from observed ones.

The goal of the present study is to evaluate the quality of convective-storm forecasts of up to 30 min beginning from analyses produced by an ensemble Kalman filter (EnKF) assimilating real Doppler radar observations. This is a companion paper to Aksoy et al. (2009, hereafter Part I), which compared three cases with supercell, multicell, and linear characteristics and focused on how storm-scale analyses depend on the various aspects of the EnKF system and Doppler radar observations. Part I demonstrated the robustness of the EnKF in producing realistic storm-scale analyses of comparable quality for the three cases of interest, as measured by the fit of the background (typically, a forecast of just 2-min lead time) to observed radar fields of equivalent reflectivity factor (hereafter “reflectivity” for brevity), no-precipitation observations, and radial velocity. Here, we consider the behavior of longer forecasts as a further and potentially more stringent test of the EnKF analyses.

The EnKF utilizes an ensemble of forecasts to estimate the covariances required for statistical data assimilation (Evensen 1994; Houtekamer and Mitchell 1998). Part I reviews in detail the EnKF and its applications to convective scales. Meanwhile, with increases in computing capabilities, research related to explicit short-range numerical prediction of convective storms has expanded rapidly. Although a range of nowcasting techniques also exists for short-range prediction (e.g., Wilson et al. 2004; Pierce et al. 2004), our interest in this paper is strictly numerical prediction.

Explicit dynamical prediction of convection, initialized using radar observations, has mostly relied on either retrieval techniques or variational data assimilation. In this regard, various techniques have been applied to a range of convective systems including supercells (Hu et al. 2006a,b; Sun 2005; Weygandt et al. 2002), squall lines (Sun and Zhang 2008; Xiao and Sun 2007; Zhao et al. 2006), and mesoscale convective systems (Dawson and Xue 2006). Except for Zhao et al. (2006) and Dawson and Xue (2006), all of the studies mentioned utilize both Doppler radial velocity and reflectivity observations and discuss the relative contributions of these two types of observations to forecast quality. The common emerging theme, albeit partially due to the mostly ad hoc ways of assimilating reflectivity in these studies, is that radial velocity observations have the greatest impact on analysis states and are needed to initialize the three-dimensional wind structure, while reflectivity observations have weaker overall impact and primarily influence microphysical and thermodynamic fields. Nevertheless, all studies report positive results with radar observations and improved forecast quality in a deterministic-forecast framework.

Ensemble-based data assimilation and forecasting at convective scales is a much less explored area of research in NWP. Although the distinction between the nature of large-scale and convective-scale atmospheric motions was acknowledged as early as 1992, when ensemble, as opposed to deterministic, forecasting was suggested for the convective scales (e.g., Brooks et al. 1992), even a decade ago the focus of short-term numerical forecasting remained almost entirely deterministic (Wilson et al. 1998). Since then, there has been increasing acceptance that mesoscale and especially convective-scale forecasts should be probabilistic even at lead times of an hour or two. Stensrud et al. (1999) and later Mass et al. (2002) drew attention to the potential of and need for ensemble forecasting at mesoscales. Later, Elmore et al. (2002a,b, 2003) used a cloud-resolving model in an ensemble configuration to investigate probabilistic distributions of convective-storm lifetimes over multiple days with convection, albeit focusing on bulk properties of convection rather than on explicit prediction of individual storm structures and evolution. Nevertheless, their argument that difficulties in estimating storm environments and convective initiation make it very difficult for cloud-scale models to explicitly forecast thunderstorm behavior is still valid today, considering that uncertainties continue to exist in storm environments and potential locations of convective initiation. In addition, our models are far from perfect, compounding the effects of errors in the initial state. More recently, Kong et al. (2006, 2007) explored the concept of explicit ensemble-based prediction of convection for a tornadic supercell case, using a nested-domain configuration with the highest resolution of 3 km and initializing a 5-member ensemble at this resolution through assimilation of Doppler radial velocity and reflectivity observations.
They concluded that ensemble forecasts of explicitly resolved convection have the potential for greater skill and more operational value than a single deterministic forecast. Furthermore, they found that forecasts are improved with the assimilation of observations including Doppler radar data. Zhang et al. (2009) explored similar issues of radar data assimilation and ensemble forecasting in the context of tropical cyclones, employing an EnKF and assimilating Doppler radial velocity observations only, with promising success. Finally, a recent study by Stensrud and Gao (2010) also explored the value of ensemble forecasting at convective scales. They found, for a supercell case, that while their ensembles did not exhibit variability in the *mode* of convection (i.e., nearly all ensemble members produced supercell storms), other features such as storm track, intensity, cold pool characteristics, mesocyclones, and updraft speed were noticeably different among ensemble members.

The present study emphasizes the use of a relatively sophisticated statistical data-assimilation scheme that takes advantage of flow-dependent background covariances, at the expense of resolution and domain size, and thus complements the previous work of Kong et al. (2006, 2007). In doing so, we seek to contribute toward bridging the gap between ensemble-based analyses of storm-scale structures and ensemble forecasts thereof, through a systematic investigation of the impact of the EnKF on forecast quality for a variety of cases with different convective characteristics.

Our focus is on the same three convective cases that we investigated in Part I: a supercell in Oklahoma on 11 April 2005, a convective line in Kansas on 15 June 2002, and a multicell storm in Oklahoma on 8 May 2005. We employ the Weather Research and Forecasting (WRF) model (Skamarock et al. 2005), configured as a cloud model, with open boundary conditions, no terrain, and the influence of larger scales represented only through the specification of an environmental sounding at the inflow boundaries. Initial ensemble states are obtained through 60-min assimilation of radial velocity, reflectivity, and no-precipitation observations (i.e., observations of reflectivity less than 5 dB*Z*) by the EnKF. The EnKF is implemented through the Data Assimilation Research Testbed (DART; more information on DART and its freely available source code can be obtained online at www.image.ucar.edu/DAReS/DART). Ensemble forecasts, with 50 ensemble members, are carried out using the identical configuration of WRF that was employed during the assimilation of observations.

The structure of the present article is as follows: section 2 briefly describes relevant aspects of the experimental setup. Several diagnostic quantities used to evaluate the ensemble forecasts are explained in section 3. The results are presented in section 4. Section 5 closes with a summary and discussion of our results.

## 2. A brief summary of experimental setup

This study focuses on three cases with varying convective characteristics. All cases are observed by nearby operational Weather Surveillance Radar-1988 Doppler (WSR-88D) radars. The 11 April 2005 case is a supercell case over Oklahoma, observed by the KTLX radar. Observations between 2220 and 2320 UTC are assimilated, and the ensemble forecasts are initialized at 2320 UTC. The 15 June 2002 convective line case over Kansas is observed by the KGLD radar. Observations between 1758 and 1858 UTC are assimilated, and the ensemble forecasts are initialized at 1858 UTC. The 8 May 2005 multicell case, again over Oklahoma, is also observed by the KTLX radar. Observations between 2058 and 2158 UTC are assimilated, and the ensemble forecasts are initialized at 2158 UTC.

The numerical model is version 2.1 of the Advanced Research WRF (Skamarock et al. 2005), which is nonhydrostatic and employs a mass coordinate. We configure WRF as a simplified cloud model with open lateral boundary conditions and no boundary layer or land surface parameterizations. All experiments employ flat terrain, a 6-hydrometeor ice microphysics parameterization (Lin et al. 1983), 2-km horizontal and (nominal) 500-m vertical resolution, and a model top at 18 km.

All experiments are carried out with 50 ensemble members. Initial ensemble conditions are obtained through the assimilation of Doppler radial velocity, reflectivity, and no-precipitation observations (i.e., observations of reflectivity less than 5 dB*Z*) over the preceding 60-min period as described in Part I. The assimilation scheme uses the parallel EnKF algorithm of DART (more specifically, the ensemble adjustment Kalman filter, or EAKF; Anderson and Collins 2007). All forecasts are initialized from 60-min EnKF analyses.

A summary of the experimental setup is given in Table 1. For further discussion of the processing of the Doppler radar observations and other details, please refer to Part I, section 2.

## 3. Verification and diagnostic techniques for ensemble forecasts of convective storms

### a. Mean innovation, RMS innovation, and spread ratio R

The innovation associated with the *i*th observation is

*d _{i}^{f}* = *y _{i}* − *H _{i}*(**x̄**^{f}),

where the subscript *i* indicates quantities related to the *i*th observation valid at a given time *t*, *y* is the observed variable, **x** is the state vector of the forecast model (i.e., all model variables at all grid points concatenated into a vector), the superscript *f* indicates a quantity based on a forecast from the time of the most recent analysis to the current forecast time *t*, an overbar denotes the ensemble mean, and *H* is the observation operator that maps the state vector onto the observations. Similar to Part I, the innovations for radial velocity and reflectivity observations will be considered separately. The innovation-based statistics to be used are the mean innovation, 〈*d ^{f}*〉, the root-mean-square (rms) innovation, 〈*r ^{f}*〉, and the ratio, *R*, of the actual forecast ensemble spread to the “optimal” ensemble spread, each of which is defined in Part I.

For brevity and for consistency with Part I in what follows, we will refer to all observation-minus-forecast differences as “innovations,” regardless of the forecast lead time.

The same statistics will also be computed for individual ensemble members, in which case the subscript *n* denotes an individual ensemble member. For long-range forecasts, plotting verification statistics across an entire ensemble will enable us to visualize such potential situations as outlier members or multimodal ensemble distributions.
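These statistics can be sketched in a few lines of NumPy. This is our own illustrative code, not the DART implementation; in particular, the spread-ratio form below follows the common consistency argument that the squared rms innovation of a well-calibrated ensemble should equal the ensemble variance plus the observation-error variance, whereas the exact form used in this study is given in Part I.

```python
import numpy as np

def innovation_stats(y_obs, hx_ens, obs_err_var):
    """Mean innovation <d^f>, rms innovation <r^f>, and spread ratio R
    at one verification time.

    y_obs       : (n_obs,) observed values
    hx_ens      : (n_ens, n_obs) forward-operator values H(x^f), one row
                  per ensemble member
    obs_err_var : observation-error variance (scalar)
    """
    hx_mean = hx_ens.mean(axis=0)            # ensemble mean in observation space
    d = y_obs - hx_mean                      # innovations d_i^f
    mean_innov = d.mean()                    # <d^f>
    rms_innov = np.sqrt((d ** 2).mean())     # <r^f>
    # Consistency-based spread ratio: for a well-calibrated ensemble,
    # <r^f>^2 ~ (ensemble variance + observation-error variance),
    # so R below should be near unity.
    ens_var = hx_ens.var(axis=0, ddof=1).mean()
    R = np.sqrt(ens_var + obs_err_var) / rms_innov
    return mean_innov, rms_innov, R
```

Per-member statistics follow by passing a single member's row in place of the ensemble mean.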

### b. Equitable threat score

The equitable threat score (ETS) is defined as

ETS = (*A _{h}* − *A _{ref}*)/(*A _{h}* + *A _{fa}* + *A _{m}* − *A _{ref}*),

where *A _{h}*, *A _{fa}*, and *A _{m}* represent the numbers of hits (correctly forecast precipitation points), false alarms (forecast precipitation points that did not occur), and misses (precipitation points that were not forecast), respectively, while *A _{ref}* = (*A _{h}* + *A _{fa}*)(*A _{h}* + *A _{m}*)/*N _{F}*, where *N _{F}* is the total number of forecast points, or *N _{F}* = *A _{h}* + *A _{fa}* + *A _{m}* + *A _{cr}*, with *A _{cr}* being the number of correct rejections (correctly forecast no-precipitation points). The ETS is computed for the event that reflectivity exceeds 15 dB*Z* using reflectivity observations from all available scan surfaces.
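The contingency-table arithmetic above translates directly into code (an illustrative sketch in our own notation):

```python
def equitable_threat_score(a_h, a_fa, a_m, a_cr):
    """ETS from hits, false alarms, misses, and correct rejections."""
    n_f = a_h + a_fa + a_m + a_cr                 # total forecast points N_F
    a_ref = (a_h + a_fa) * (a_h + a_m) / n_f      # hits expected by chance
    return (a_h - a_ref) / (a_h + a_fa + a_m - a_ref)
```

A perfect forecast yields ETS = 1, while a forecast no better than chance yields ETS = 0; correcting by *A _{ref}* is what makes the score "equitable" with respect to random hits.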

### c. Reflectivity correlation coefficient

The reflectivity correlation coefficient, *r _{c}*, is another commonly used measure of forecast skill that operates on pairs of points in the forecast and observed fields:

*r _{c}^{f}* = 〈(*H _{i}* − 〈*H*〉)(*y _{i}* − 〈*y*〉)〉 / [〈(*H _{i}* − 〈*H*〉)^{2}〉〈(*y _{i}* − 〈*y*〉)^{2}〉]^{1/2},

where *H _{i}* is shorthand for *H _{i}*(**x̄**^{f}) and, as before, quantities in brackets denote averages, here over all *M _{R}* reflectivity observation points; the subscript *R* symbolizes that only observations and forward operators for reflectivity are considered. Notice that this statistic bears close resemblance to the centered anomaly correlation (Wilks 2006).
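As a concrete illustration (our own sketch, not the study's code), the statistic is the centered correlation between forecast and observed reflectivity over the observation points; note that adding a constant bias to the forecast leaves it unchanged, which is the bias-insensitivity discussed later:

```python
import numpy as np

def reflectivity_correlation(z_fcst, z_obs):
    """Correlation of forecast vs. observed reflectivity over the M_R
    reflectivity observation points (both inputs are 1D arrays)."""
    zf = z_fcst - z_fcst.mean()   # deviations from the average over M_R
    zo = z_obs - z_obs.mean()
    return float((zf * zo).sum() / np.sqrt((zf ** 2).sum() * (zo ** 2).sum()))
```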

### d. Number of reflectivity observations exceeding 40 dB*Z*

Besides metrics that concentrate strictly on comparison with observed quantities, evaluation of forecasts will also be made in terms of how smooth the transitions are from analysis states into forecast states. One such metric is *N _{R}*, the total number of points at which reflectivity exceeds 40 dB*Z* at a particular time. Since reflectivity of 40 dB*Z* or greater generally appears only in strong, long-lived cells with significant updrafts, we take this metric as a measure of the strength of convection in a forecast. When observed storms are near the observing radar platform (which is mostly the case for all three of our cases), this number can be safely estimated from observed reflectivity and no-precipitation points. Another advantage of computing this metric at observation locations is that one can perform the same computation for observed reflectivity and obtain a direct comparison between forecast and observed convective features.
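Computed at observation locations, *N _{R}* is a simple threshold count (our own illustrative sketch; the 40-dB*Z* threshold is the one stated above):

```python
import numpy as np

def n_r(reflectivity, threshold_dbz=40.0):
    """Number of points where reflectivity exceeds the threshold.
    Applying this identically to forecast and to observed reflectivity
    at the same locations yields directly comparable counts."""
    return int(np.count_nonzero(np.asarray(reflectivity) > threshold_dbz))
```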

### e. Normalized volume-integrated updraft speed

The normalized volume-integrated updraft speed is denoted *w _{I}* and is computed as

*w _{I}* = (1/*V _{W}*) Σ *w _{j}* Δ*V _{j}*.

Here, the subscript *j* indicates quantities related to the *j*th model grid box valid at a given time *t*, the summation is carried out over *M _{W}*, which is the set of all grid points with *w* exceeding a threshold, Δ*V _{j}* is the volume of a given grid box, and *V _{W}* is the total volume over *M _{W}*. (In this study, we use a 3 m s^{−1} threshold to compute *w _{I}* for all cases.) As such, *w _{I}* is an integrated measure of the convective activity within the model domain.
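The computation can be sketched as follows (our own code; the grid-box volumes and the 3 m s^{−1} threshold are as defined above):

```python
import numpy as np

def w_integrated(w, dv, threshold=3.0):
    """Normalized volume-integrated updraft speed w_I.

    w  : updraft speed at each grid box (array, any shape)
    dv : matching grid-box volumes
    """
    mask = w > threshold            # the set M_W
    v_w = dv[mask].sum()            # total volume V_W over M_W
    if v_w == 0.0:
        return 0.0                  # no updrafts above the threshold
    return float((w[mask] * dv[mask]).sum() / v_w)
```

Because the sum is normalized by *V _{W}*, the result is a volume-weighted mean updraft speed over the convectively active part of the domain.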

_{I}### f. Height of maximum reflectivity and maximum height of 40-dBZ reflectivity

To analyze the vertical structure of convection in the forecasts, several height-based diagnostics will be computed: the height of maximum reflectivity, *H*^{refmax}, will represent the height (kilometers above ground level) of the maximum forecast reflectivity found within the entire model domain at any given time. The maximum height of 40-dB*Z* reflectivity, *H* _{max}^{ref40}, will be computed as the maximum height (kilometers above ground level) reached by forecast 40-dB*Z* reflectivity within the entire model domain at any given time.
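Both diagnostics reduce to simple reductions over the three-dimensional reflectivity field (an illustrative sketch; the (nz, ny, nx) array layout is our assumption, not stated in the study):

```python
import numpy as np

def height_diagnostics(z_refl, heights_km, dbz_threshold=40.0):
    """H^refmax and H_max^ref40 for one forecast time.

    z_refl     : (nz, ny, nx) forecast reflectivity (dBZ)
    heights_km : (nz,) level heights above ground level (km)
    """
    # Height of the domain-wide reflectivity maximum
    k_max = np.unravel_index(np.argmax(z_refl), z_refl.shape)[0]
    h_refmax = float(heights_km[k_max])
    # Highest level at which 40-dBZ reflectivity is found
    levels = np.where((z_refl >= dbz_threshold).any(axis=(1, 2)))[0]
    h_max_ref40 = float(heights_km[levels.max()]) if levels.size else float("nan")
    return h_refmax, h_max_ref40
```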

As was the case for the observation-space statistics, *N _{R}* and *w _{I}* will be calculated for both ensemble means and individual ensemble members to demonstrate the range of possibilities in our verifications. At lead times that are sufficiently long compared to the time scale for error growth, while the ensemble mean continues to be the best (minimum variance) estimator for the state, individual ensemble member states that typically represent discrete storm entities inevitably diverge to the point that the ensemble mean begins to resemble the underlying system’s climatological mean rather than a particular realization at given verification times. To give a practical example, if ensemble members propagated individual cells differently, then at sufficiently long lead times the mean updraft would appear to decay even though each member’s updraft was steady. Furthermore, plotting individual ensemble members’ verification metrics also enables us to diagnose situations such as the existence of outlier members and skewed or multimodal distributions in ensembles, which are presumably more likely to occur in forecasts of longer lead times.

A summary of all observation-space and model-space diagnostic statistics can be found in Table 2.

## 4. Results

We will present our analysis of forecast results in the following order: We will first focus on the visualization of 30-min forecast fields of selected ensemble members to investigate consistency and physical realism in the forecasts. Following this, diagnostics that are computed in observation and model spaces will be discussed.

### a. Comparison of 30-min ensemble forecast fields

We first turn our attention to the horizontal distribution of some key model fields valid at their respective 30-min forecast times. Figures 1–3 summarize the results for the 11 April 2005 supercell, 15 June 2002 convective line, and 8 May 2005 multicell cases, respectively. Forecast model fields chosen for this purpose are reflectivity (dB*Z*, top rows in each of Figs. 1–3), vertical velocity (m s^{−1}, middle rows), and surface (lowest model level) perturbation temperature (K, bottom rows). In addition, respective distributions of observed reflectivity at their lowest available scan surfaces are shown in the top-left panels of Figs. 1–3. To facilitate comparison, forecast reflectivity fields are plotted on the same scan surfaces as their respective observations. Meanwhile, model levels for vertical velocity are chosen to roughly coincide with the respective radar scan surfaces.

For reasons mentioned earlier, we focus exclusively on individual member states in this section. As comparisons of all ensemble members for various state variables (not shown) revealed no dramatic differences in terms of the details of convective behavior, for brevity, only two ensemble members are shown in Figs. 1–3 as an indication of the variability within the forecast ensembles. For each case, one of the two members is termed the “good” member, representing the member state that most resembled the observed reflectivity distribution at its respective forecast time. Conversely, the “bad” member represents the state that deviated most from observed reflectivity. The selection of members in this way is mostly subjective, although quantitative norms such as total energy and total hydrometeor mixing ratio revealed qualitatively similar dispersion within the forecast ensembles (not shown).

Comparison of Figs. 1–3 reveals that, at least in a qualitative sense, evolutions of reflectivity remain consistent with observations in all cases. Observed convective modes are generally well maintained in the forecasts of the convective line and multicell cases through 30 min, while all members in the supercell case exhibit noticeable decay, albeit at different rates. Evolution of forecast reflectivity further suggests that different members diverge on time scales of *O*(10 min). The largest (fastest) divergence among members occurs in the multicell case and the smallest occurs in the convective line case. Qualitatively, these results suggest that the cases may exhibit different predictability limits.

In the 11 April 2005 supercell case, we see that all members exhibit decay as is exemplified by the 30-min forecasts of reflectivity (i.e., less than 40-dB*Z* reflectivity everywhere in the 1.5° scan), downdrafts in storm cores, and surface warm pools in both members shown. There are also considerable propagation differences among members. However, this decay is not believed to be purely an artifact of the forecasts: as was pointed out in Part I (see section 4g and Figs. 13a,b in Part I), the 60-min analysis mean state also exhibited warm temperature perturbations at the surface, despite strong updrafts and relatively high reflectivity above. In coming sections, we will show evidence that, in actuality, this storm exhibits complex time evolution throughout the 60-min analysis period and speculate that the combination of an unfavorable mean sounding and the observed decay during the first phase of the storm’s life cycle may have led to the inconsistencies in the analysis fields and ultimately resulted in the decay we see during its forecast.

For the 15 June 2002 convective line case, all members (not shown) predict a linear structure out to 30 min. The two members shown are remarkably similar in overall reflectivity characteristics; their differences lie mostly in the details of propagation speeds and strengths. One important note is that both members exhibit systematically lower maximum reflectivity, by ∼10 dB*Z*, in convective cores and underpredict the areal extent of the anvil region (Figs. 2a–c). Similarly, comparable maximum updraft speeds are maintained in the two members (Figs. 2d,e). Nevertheless, member 43 exhibits much less consistent linear organization than the good member 5 and concentrates convection toward the southwest edge of the storm. This observation is also confirmed by the surface perturbation temperature fields (Figs. 2f,g); while the maximum values and the overall distributions are similar in both members, member 5 has a more concentrated and faster-propagating surface gust front on its southwest edge. Overall, differences between the two members are small and suggest an underdispersed ensemble. This will be discussed in more detail in the coming sections with further evidence.

Finally, for the 8 May 2005 multicell case, the general multicellular character of convection persists in all members. This case exhibits the greatest differences among members; variations in the number, location, and intensity of cells are pronounced. Some members are more progressive in the way they represent the gust front along the eastern edge of the complex and produce stronger convection along this line. There also appears to be a reemergence of some spurious convective activity, akin to our results in Part I when we did not assimilate the no-precipitation observations (cf. Fig. 10a of Part I, which exhibits spurious convection in the 60-min analysis mean along a generally north–south line ∼40 km east of the observed convective region). This will be supported with further diagnostics, and results will be compared to Part I later.

### b. Forecast performance with respect to observations

We now turn our attention to some of the observation-based quantitative measures to evaluate forecast quality. This discussion spans Figs. 4–9 and will be mostly based on the forecast parts of the experiments, which correspond to 60–90 min in the respective figures.

Time evolutions of the rms innovation, 〈*r ^{f}*〉, mean innovation, 〈*d ^{f}*〉, and spread ratio, *R*, for observed reflectivity are shown in Fig. 4. Figure 5 shows the same quantities for observed radial velocity. From the time evolution of errors, a quick deterioration of forecast quality, as measured by the growth of rms and mean innovations, is evident for all three cases. Error doubling for both reflectivity and radial velocity appears to occur on time scales of *O*(12–18 min) for all cases. Furthermore, in most cases errors during the first 12–18 min of forecast remain comparable in absolute magnitude to their analysis short-term-forecast counterparts.

The evolution of both the rms and mean innovations of radial velocity differs between the convective line case and the other two cases (Figs. 5a–f). First, the rms innovations are noticeably larger at the 60-min analysis time and grow much faster during the free forecast for the convective line case. Furthermore, the mean innovation for the supercell and multicell cases remains consistently low within values of 1–2 m s^{−1}, comparable in magnitude to the observation error of 2 m s^{−1}, as the forecasts progress, suggesting that no significant biases exist for these two cases in their wind predictions. However, we see a more consistent increase in negative radial velocity bias for the convective line case, nearing −3 m s^{−1} by 30 min of forecast. This aspect of the convective line case will be analyzed in more detail later.

Meanwhile, time evolutions of spread ratios suggest a steady increase of ensemble spread, albeit at varying rates, for all cases. Nevertheless, at least during the first ∼20 min of forecasts for all cases, spread ratios remain below unity, indicating that ensemble variance remains insufficient. This is a continuation of the same issue that was much more severe during respective analysis periods, as discussed in Part I.

A return of spurious convection in the forecast of the 8 May 2005 multicell case was suggested in section 4a. To analyze this in more detail, Fig. 6 shows, as a function of time, the number of observation locations where spurious precipitation is forecast (i.e., false-alarm grid points with forecast reflectivity of 10 dB*Z* or higher but collocated no-precipitation observations). For easy comparison, respective statistics from the analysis experiment without the assimilation of no-precipitation observations are also shown (these directly correspond to the “best” values from Fig. 11 of Part I). We see that, as soon as the assimilation of no-precipitation observations stops and the free forecast starts, the number of locations with spurious precipitation begins to grow and, by the end of the 30-min forecast, reaches levels comparable to those valid at the 30th minute of analysis without no-precipitation observations. Thus, while the assimilation of no-precipitation observations successfully suppressed spurious convection during analysis and has positive impact on short forecasts, the rate of increase of spurious cells is very similar in analysis with (Fig. 6) and without (Fig. 11 of Part I) no-precipitation observations. This indicates that the occurrence of spurious cells in the multicell case is likely controlled by factors that are not influenced by assimilation of no-precipitation observations, such as errors in the environmental sounding and the model error.

Vertical profiles of the rms innovation, 〈*r ^{f}*〉, mean innovation, 〈*d ^{f}*〉, and number of observations assimilated at the 60-min analysis and 18- and 30-min forecast times are shown for observed reflectivity in Fig. 7 (for each given plot time, statistics are computed over the preceding 6-min interval so as to roughly include all observations within a full radar scan). Figure 8 shows the same quantities for observed radial velocity. In both figures, observations are binned in 1-km height ranges starting from ground level.

Vertical reflectivity error profiles (Fig. 7) reveal that, while the mean innovation remains consistently smaller in magnitude than the rms innovation at the 60-min analysis times for all cases, it grows to become comparable to the rms innovation by the 30-min forecast times throughout most of the depth of the observations, with the largest deviations at higher levels. Thus, underprediction of reflectivity is systematic and tends to dominate errors as forecasts progress. Also, the larger deviations at higher levels are another indication that, at least for the convective line and multicell cases, systematic underprediction of the anvil regions plays a significant role. It is also interesting to note that, by the 30-min forecast time, random errors as measured by the rms innovation appear to saturate at a 10–12-dB*Z* level consistently throughout the depth of the observations for all cases.

Meanwhile, vertical error profiles of radial velocity (Fig. 8) exhibit a distinctly different behavior for the convective line case. As mentioned previously, the mean radial-velocity innovation is significant and grows over time to −3 m s^{−1}. Consistent with that finding, the vertical profile also shows a noticeable region of negative bias at mid- to upper levels that grows in extent and magnitude and becomes as high as −10 m s^{−1} at ∼10 km by the 30-min forecast time (Figs. 8d–f). The rms innovation too is very large (cf. observation error) at similar heights and reaches magnitudes as high as 16 m s^{−1}.

Considering the large upper-level reflectivity mean innovation values observed for this case in Figs. 7d–f, we conjecture that part of the bias arises from the fact that the depth of the forecast convection is underpredicted compared to that observed. To illustrate this, Fig. 9 shows selected observed and forecast fields, valid at the 30-min forecast time, plotted comparatively on the 5.3° and 2.4° scan surfaces (corresponding to above-ground-level heights of ∼10–11 and 4–5 km, respectively, near the observed storm). To start with, comparing observed and forecast reflectivity (Figs. 9a–d), we see that, while at lower levels there is a relatively good representation of the storm in the mean forecast, at upper levels convective cores in the mean forecast penetrate only to 10–11 km in height, unlike the observed storm. (This is also confirmed by the very weak updraft and temperature perturbations in the forecast in the same region, not shown.) Furthermore, Fig. 9j shows that observed-minus-forecast radial velocity differences at upper levels are dominated by widespread negative biases coincident spatially with the strong divergent pattern in the observations, suggesting that upper-level outflow is present in the observations but missing in the forecast. No such spatially consistent negative biases are discernible at lower levels (Fig. 9i): errors there appear more random in nature and are mostly concentrated along the edges of the convective region, suggesting that they arise from displacement of the convective region. One potential cause of the systematic upper-level differences from observations is that the sounding is not representative of the mature storm environment.
While this is difficult to demonstrate directly, simulations initialized with the same bubbles but with no data assimilation (not shown) show that the sounding for the convective line case does not support long-lived convection, indicating that forecasts from this sounding will tend to underestimate the intensity and longevity of convection. This is indeed consistent with the storm being too shallow in our forecasts.

ETS plots (Figs. 10a–c) show that the 0.6–0.7 levels achieved toward the end of the analysis periods are quickly lost during the forecasts in all three cases, consistent with the steady growth of rms innovations. Nevertheless, values above 0.4 are retained out to 12–18 min of forecast. The worst-performing case is 15 June 2002, which also shows the least ensemble variability. As the ETS is only influenced by the pointwise verification of whether precipitation of a given intensity was correctly forecast, it is sensitive to errors in the location and areal extent of convection. Recalling that our ensemble members in all cases typically diverged noticeably in terms of cell propagation (which leads to location errors) and exhibited systematic underprediction of the anvil regions of observed storms (which leads to areal coverage errors), the forecast performance in terms of ETS is not surprising. For the convective line case, these general errors are compounded by the underprediction of storm depth, and thus the forecast quality as measured by ETS is the worst among the three cases.

The analysis and forecast performances are generally better with regard to the reflectivity correlation coefficient (*r _{c}^{f}*; Figs. 10d–f). Levels of 0.8–0.9 are reached during analysis, and many ensemble members retain a score of 0.5 or higher for the first 18 min of forecasts in all cases. Unlike the ETS or innovation statistics, this metric does not compare predicted and observed reflectivity in absolute terms, but rather in terms of their spatial deviations from domain-wide averages. This statistic is not influenced by systematic biases (Wilks 2006) and is much less sensitive to location and coverage errors. As most reflectivity errors encountered in our cases fall into one of these categories, the relatively better performance with respect to *r _{c}^{f}* is to be expected and indicates good spatial representativeness of observed reflectivity by the forecast model.
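Because it correlates deviations from the respective domain means, this statistic is by construction insensitive to an additive bias. A minimal sketch (function and argument names are illustrative assumptions):

```python
import numpy as np

def reflectivity_correlation(fcst_dbz, obs_dbz):
    """Correlation of forecast and observed reflectivity deviations
    from their domain-wide means (a Pearson correlation)."""
    f = fcst_dbz - fcst_dbz.mean()      # forecast anomalies
    o = obs_dbz - obs_dbz.mean()        # observed anomalies
    return float(np.sum(f * o) / np.sqrt(np.sum(f * f) * np.sum(o * o)))
```

A forecast that is everywhere 5 dB too high but spatially correct still scores 1.0, illustrating the bias insensitivity noted above.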

### c. Forecast performance in the model space

While observation-space diagnostics provide direct measures of forecast quality, such analysis is limited to those aspects of the model state that project on the available observations. Therefore, we now extend our analysis to statistics that are directly based on model fields themselves, with the general goal of diagnosing consistency of the evolution of model-predicted convection from the analysis into the forecast.

First, two statistics that reflect the overall intensity of convection are presented in Fig. 11: the number of reflectivity observations (*N _{R}*) and the volume-integrated updraft speed (*w _{I}*). One additional advantage of diagnosing *N _{R}* is that a direct comparison to observed *N _{R}* can be made. Recalling that we use a 40-dB*Z* threshold to compute *N _{R}*, the general underestimation of *N _{R}* in the mean forecasts compared to observations (Figs. 11a–c) during analysis is consistent with our previous finding of a tendency for underprediction of strong precipitation. Nevertheless, transitions from analysis to forecast are smooth. During forecasts, distinct differences arise among cases. For the supercell case, we see that all members predict observed *N _{R}* well for 12–18 min but then quickly decay. For the convective line case, systematic underprediction is apparent in all members, but the time evolution remains consistent with observations. For the multicell case, *N _{R}* for the members themselves is in relatively good agreement with observations. However, because of large location differences of predicted cells among members, the ensemble-mean signal is very weak and incorrectly suggests weakening.
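A minimal sketch of these two intensity diagnostics for a single model volume might look as follows; the argument names, the updraft criterion (*w* > 0), and the normalization of *w _{I}* by updraft volume (so that its units remain m s^{−1}) are our assumptions rather than the paper's exact definitions:

```python
import numpy as np

def intensity_diagnostics(refl_dbz, w, dbz_threshold=40.0):
    """Sketch of the two Fig. 11 intensity diagnostics.

    N_R: number of grid points at or above the reflectivity threshold.
    w_I: updraft speed averaged over updraft points (assumed
         normalization, keeping units of m/s).
    """
    n_r = int(np.sum(refl_dbz >= dbz_threshold))
    updraft = w[w > 0.0]                # updraft points only
    w_i = float(updraft.mean()) if updraft.size else 0.0
    return n_r, w_i
```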

Meanwhile, except for the supercell case, analysis-to-forecast transitions are also relatively smooth with respect to updraft speed, as measured by *w _{I}* (Figs. 11d–f). This should be a more stringent metric of consistency, because any analysis-induced imbalances within analysis members should project more quickly onto updraft speed through dynamical processes. Therefore, we find the generally smooth transitions of updraft speed from analysis to forecast in two cases encouraging. During forecasts, differences among members are again apparent and can be interpreted in parallel to our findings for *N _{R}*. For the supercell case, the decay in all members from the onset of forecasts is very distinct. This suggests that, dynamically, the analysis storm cannot be maintained in the forecast ensemble and decay ensues as soon as analysis is terminated. The apparent lag of the decay in the microphysical fields, as indicated by *N _{R}*, is due to the time it takes for hydrometeors to fall and reach the ground. To illustrate this, Fig. 12 shows the height of maximum reflectivity, *H*^{refmax}, and the maximum height of 40-dB*Z* reflectivity as a function of time. We see that for the supercell case (Figs. 12a,d), both metrics indicate an initial lifting of hydrometeors in the forecast for about 6 min and then a steady fall.

For the convective line case, *w _{I}* remains steady at ∼6 m s^{−1} among all ensemble members (Fig. 11e), suggesting that reasonable convective intensity is maintained in the forecast. However, as can be confirmed from Figs. 2g,h, the updraft region is a narrow line along the gust front and thus is very sensitive to displacement errors. Therefore, as members diverge throughout the forecast, the mean signal becomes weaker and the mean *w _{I}* disconnects from the member signal. A similar observation can also be made for the multicell case (Fig. 11f). However, in this situation, the separation of the member and mean signals is much more rapid. Previously, we argued that the assimilation of no-precipitation observations had not suppressed spurious convection, which was observed in the overall increase of its areal coverage over time. Given that the growth of *N _{R}* in the members is inconsistent with the observed steady trend, we believe that the overall growth of *w _{I}* in the members, too, is likely an artifact of the errors in the initial sounding.

To analyze bulk properties of how forecast convection behaves in the vertical, the height of maximum reflectivity (*H*^{refmax}) and the maximum height of 40-dB*Z* reflectivity (*H _{max}^{ref40}*) are plotted as a function of time in Fig. 12 for all cases. First and foremost, a distinct underprediction of *H*^{refmax} is apparent: in all three cases, we see a tendency of the model to concentrate maximum reflectivity at the surface. Similarly, *H _{max}^{ref40}* is also consistently underpredicted in all cases, especially during forecasts. The underprediction is most pronounced for the convective line case, confirming more substantively the evidence presented in Fig. 9 for the lower vertical extent of convection in the model compared to observations. Both of these results are consistent with our previous finding that the most intense parts of convection tend to be underpredicted by the model. Meanwhile, the relative levels of *H _{max}^{ref40}* among cases are consistent with their observed counterparts: the convective line case exhibits generally higher levels of this statistic compared to the other two cases, similar to observations.
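The two vertical-structure diagnostics above can be sketched for a single reflectivity volume as follows; the array shapes, names, and the domain-wide definition of the 40-dB*Z* height are our assumptions:

```python
import numpy as np

def vertical_structure_diagnostics(refl_dbz, heights, dbz_threshold=40.0):
    """Height of the column-maximum reflectivity (H^refmax, per column)
    and the maximum height reaching the 40-dBZ threshold (H_max^ref40,
    domain-wide). refl_dbz: (nz, ny, nx); heights: (nz,) level heights.
    Illustrative sketch only."""
    # Height of the maximum reflectivity in each column
    h_refmax = heights[np.argmax(refl_dbz, axis=0)]
    # Highest model level anywhere in the domain exceeding the threshold
    exceed_by_level = np.any(refl_dbz >= dbz_threshold, axis=(1, 2))
    h_max_ref40 = float(heights[exceed_by_level].max()) if exceed_by_level.any() else np.nan
    return h_refmax, h_max_ref40
```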

Overall, we see that the physical realism and consistency of the forecast model vary considerably, depending on what aspect of convection one focuses on. In certain ways, encouraging consistencies are observed both in dynamical aspects (i.e., updraft response) and in connections between dynamical and microphysical characteristics. At the same time, the model and/or the simple representation of the mesoscale environment within the model exhibit clear deficiencies, such as the overall underprediction of reflectivity and anvil regions. Among the three cases diagnosed, the most successful forecast appears to be that of the 8 May 2005 multicell case, albeit with an overly energetic sounding that tends to produce too much convection. The forecast of the 15 June 2002 convective line, too, is realistic in many aspects. Despite an initial sounding that does not support convection when the model is initialized through idealized means (i.e., a few warm bubbles added at a single time), after a 60-min period of assimilation of radar observations the forecast exhibits good fidelity and maintains a realistic linear structure in all ensemble members out to 30 min. The case that remains most negatively influenced by the initial sounding even after 60 min of data assimilation is the 11 April 2005 supercell case. For this case, we have seen that data assimilation produces an analysis storm with mixed characteristics (i.e., realistic precipitation and updraft fields but an anomalous warm pool; see Part I), which is not maintained in any of the ensemble members during the forecast and thus dissipates.

## 5. Summary and discussion

This paper presents results for the assimilation of Doppler radar observations using an EnKF and a storm-scale numerical model. While Part I focused on how storm-scale analyses depend on the various aspects of the EnKF system and Doppler radar observations, here we turn our attention to storm-scale ensemble forecasts initialized from such EnKF analyses. Three independent cases with varied convective behavior (a supercell, a convective line, and a multicellular system) are considered. The Weather Research and Forecasting (WRF) model, configured as a simplified cloud model with 2-km resolution, provides forecasts. The ensemble filtering algorithm is implemented through the Data Assimilation Research Testbed (DART). All cases use the same configuration of the modeling and assimilation systems for both analyses and forecasts. All experiments use 50 ensemble members and are carried out to forecast lead times of 30 min.

In this cloud-model configuration, the model is forced by a prescribed environmental profile of temperature, moisture, and horizontal winds at the lateral boundaries. To introduce uncertainty in the mesoscale forcing, the profiles of horizontal winds are perturbed across ensemble members.

In general, we find that all cases remain under the influence of their environmental soundings to varying degrees: the multicell case produces the most successful forecast. While the environmental sounding appears to produce more widespread convection than observed, it nevertheless results in a realistic mode of convection. The assimilation of no-precipitation observations successfully suppresses spurious convection in the analyses and in forecasts of short lead times, but a return of spurious convection is observed by the 30-min forecast with errors comparable in magnitude to the case with no assimilation of no-precipitation observations at similar lead times. This indicates that the occurrence of spurious cells in the multicell case is more likely controlled by such factors as the uncertainties in the mesoscale environment or model error. The mesoscale environment, however, is not impacted by the assimilation of no-precipitation observations in clear-air regions (as deduced from model soundings in clear-air regions; not shown). This emphasizes the need for a more comprehensive, multiscale data assimilation effort that both assimilates no-precipitation observations at the storm scale, so that spurious convection is suppressed in analyses, and updates the environment at the mesoscale, so that residual small perturbations do not lead to further growth of spurious convection in forecasts.

Meanwhile, the convective line case is influenced by a sounding that does not support convection as deep as the observed structure, especially when simulations are carried out without data assimilation (i.e., free forecasts from "warm bubble" initial conditions). Nevertheless, the cycling of Doppler radar observations for a period of 60 min leads to a very realistic structure in the analyzed storm (as shown in Part I), and even the 30-min forecasts maintain reasonable structure in all ensemble members. The underprediction of the vertical extent of convection is potentially caused by upper-level errors in the initial sounding. We conclude that radar data assimilation has a positive impact, initializing convection and mitigating the negative influence of the imperfect sounding, whereas idealized initialization methods (i.e., pure bubble states) failed to sustain convection.

Finally, with the caveat that the same data assimilation configuration is used as in the other two cases, we found that the supercell case appears to be the most challenging to forecast. Similar to the convective line case, the environmental sounding for the supercell case does not support convection when initialized from an idealized bubble state. Furthermore, the storm undergoes a complex life cycle during the 60-min analysis period, and a coherent storm structure cannot be established even after 60 min of data assimilation (while a reasonable reflectivity structure is obtained, an anomalous warm pool persists underneath the storm). This leads to an eventual decay of the storm in all ensemble members through the 30-min forecast period.

Our results emphasize the importance of incorporating mesoscale information, both in terms of the environment and its uncertainty, for successful convective-scale data assimilation and forecasting. Our simple method of perturbing the mean environmental sounding appeared to have some positive impact on ensemble spread, and radar data assimilation helped modify the mean environments to varying degrees. Nevertheless, we conclude that there is a clear need for more advanced mesoscale data assimilation to obtain more realistic, and preferably three-dimensional, environmental structures and uncertainty characteristics. Likewise, the prospects of incorporating errors not just in kinematic environmental variables but also in thermodynamic ones should be explored. Our findings with regard to the importance of the mesoscale environment and its uncertainty also appear to be supported by a recent study by Stensrud and Gao (2010), where the impacts of incorporating three-dimensional environmental initial conditions are explored. For a supercell case, they conclude that ensembles initialized with realistic three-dimensional environments compare better with observations than ensemble forecasts initialized with vertical soundings representative of the environmental conditions near the storm.

Furthermore, it is likely that model deficiencies are important contributors to forecast error in our experiments. Forecast errors that were common in all storms are the systematic underpredictions of strong precipitation and of anvil regions of storms. We speculate that the limited model resolution and deficiencies in the microphysics scheme are the principal causes of these errors.

We should also caution that our choice of a cloud model to perform ensemble analyses and forecasts clearly limits the extent to which realistic storm structures can be represented. One major drawback of such an implementation should be expected in the boundary layer, as our model configuration with no PBL parameterization will inevitably limit the realism with which critical elements of storm structure such as low-level inflows and gust fronts can be simulated. Nevertheless, realistic structures of acceptable quality (as assessed subjectively and quantitatively by the mean and rms distances to observations) were attained in final analyses and maintained in forecasts for up to 10–15 min, except in the supercell case. We believe that these model limitations could have contributed to the dissipation of the supercell in this case. Furthermore, while we attempted to minimize such effects a priori by choosing geographically isolated cases with minimal influence from the synoptic scale and mesoscale, we nevertheless observed that our cases were noticeably influenced by the mean environments that we imposed on the model. Based on our findings demonstrating an overall positive impact of radar observations on storm structures and evolution in analyses and short-term forecasts, we expect that our results would likely be improved by using lateral boundary conditions that allow for more realistic three-dimensional, mesoscale variations and that use full physics.

Finally, our results also provide rough estimates of the predictability of convective-scale flows. In all the cases, the ensemble spread and the rms fit to observations increase with a doubling time on the order of 10 min. There are also, however, distinct differences among the cases in how the members diverge from one another. For example, we see a much more prominent diversity in the location and intensity of convection among ensemble members within the multicell case compared to the convective line case, as well as quantitatively more rapid growth of ensemble spread. This suggests that convective predictability itself, presumably controlled mostly by cloud-scale microphysical nonlinear processes, likely exhibits a probabilistic distribution dictated by the mode of convection, which in turn is mostly a function of the mesoscale flow. Future studies of convective predictability should take into account such potential feedback from the larger scales for a more complete understanding of how errors behave at the convective scales.
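As a rough illustration of how such a doubling time can be estimated from an ensemble-spread time series, the sketch below fits exponential growth, spread(t) ≈ A exp(bt), in log space; the function and its arguments are illustrative assumptions, not the diagnostic actually used here:

```python
import numpy as np

def spread_doubling_time(times_min, spread):
    """Estimate an error-doubling time (minutes) via a log-linear
    least-squares fit to an ensemble-spread time series.
    Rough diagnostic sketch under exponential-growth assumptions."""
    # Fit log(spread) = b * t + log(A); slope b is the growth rate
    b, _ = np.polyfit(np.asarray(times_min, dtype=float),
                      np.log(np.asarray(spread, dtype=float)), 1)
    return float(np.log(2.0) / b)
```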

## Acknowledgments

We are indebted to Alain Caya for the original observation operators and other capabilities related to the assimilation of Doppler radar observations with WRF and DART, and to Jeff Anderson for his leadership of DART development. This work was supported by NSF Grant 0205655 and by the U.S. Weather Research Program.

## REFERENCES

Aksoy, A., D. C. Dowell, and C. Snyder, 2009: A multicase comparative assessment of the ensemble Kalman filter for assimilation of radar observations. Part I: Storm-scale analyses. *Mon. Wea. Rev.*, **137**, 1805–1824.

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903.

Anderson, J. L., and N. Collins, 2007: Scalable implementations of ensemble filter algorithms for data assimilation. *J. Atmos. Oceanic Technol.*, **24**, 1452–1463.

Brooks, H. E., C. A. Doswell III, and R. A. Maddox, 1992: On the use of mesoscale and cloud-scale models in operational forecasting. *Wea. Forecasting*, **7**, 120–132.

Bryan, G. H., J. C. Wyngaard, and J. M. Fritsch, 2003: Resolution requirements for the simulation of deep moist convection. *Mon. Wea. Rev.*, **131**, 2394–2416.

Dawson, D. T., and M. Xue, 2006: Numerical forecasts of the 15–16 June 2002 Southern Plains mesoscale convective system: Impact of mesoscale data and cloud analysis. *Mon. Wea. Rev.*, **134**, 1607–1629.

Elmore, K. L., D. J. Stensrud, and K. C. Crawford, 2002a: Explicit cloud-scale models for operational forecasts: A note of caution. *Wea. Forecasting*, **17**, 873–884.

Elmore, K. L., D. J. Stensrud, and K. C. Crawford, 2002b: Ensemble cloud model applications to forecasting thunderstorms. *Wea. Forecasting*, **17**, 363–383.

Elmore, K. L., S. J. Weiss, and P. C. Banacos, 2003: Operational ensemble cloud model forecasts: Some preliminary results. *Wea. Forecasting*, **18**, 953–964.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99** (C5), 10143–10162.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757.

Gilmore, M. S., J. M. Straka, and E. N. Rasmussen, 2004: Precipitation uncertainty due to variations in precipitation particle parameters within a simple microphysics scheme. *Mon. Wea. Rev.*, **132**, 2610–2627.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Hu, M., M. Xue, and K. Brewster, 2006a: 3DVAR and cloud analysis with WSR-88D level-II data for the prediction of the Fort Worth, Texas, tornadic thunderstorms. Part I: Cloud analysis and its impact. *Mon. Wea. Rev.*, **134**, 675–698.

Hu, M., M. Xue, J. Gao, and K. Brewster, 2006b: 3DVAR and cloud analysis with WSR-88D level-II data for the prediction of the Fort Worth, Texas, tornadic thunderstorms. Part II: Impact of radial velocity analysis via 3DVAR. *Mon. Wea. Rev.*, **134**, 699–721.

Kong, F., K. K. Droegemeier, and N. L. Hickmon, 2006: Multiresolution ensemble forecasts of an observed tornadic thunderstorm system. Part I: Comparison of coarse- and fine-grid experiments. *Mon. Wea. Rev.*, **134**, 807–833.

Kong, F., K. K. Droegemeier, and N. L. Hickmon, 2007: Multiresolution ensemble forecasts of an observed tornadic thunderstorm system. Part II: Storm-scale experiments. *Mon. Wea. Rev.*, **135**, 759–782.

Lin, Y.-L., R. D. Farley, and H. D. Orville, 1983: Bulk parameterization of the snow field in a cloud model. *J. Climate Appl. Meteor.*, **22**, 1065–1092.

Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? *Bull. Amer. Meteor. Soc.*, **83**, 407–430.

Miller, L. J., C. G. Mohr, and A. J. Weinheimer, 1986: The simple rectification to Cartesian space of folded radial velocities from Doppler radar sampling. *J. Atmos. Oceanic Technol.*, **3**, 162–174.

Pierce, C. E., and Coauthors, 2004: The nowcasting of precipitation during Sydney 2000: An appraisal of the QPF algorithms. *Wea. Forecasting*, **19**, 7–21.

Skamarock, W. C., J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers, 2005: A description of the Advanced Research WRF version 2. NCAR Tech. Note NCAR/TN-468+STR, 88 pp.

Stensrud, D. J., and J. Gao, 2010: Importance of horizontally inhomogeneous environmental initial conditions to ensemble storm-scale radar data assimilation and very short-range forecasts. *Mon. Wea. Rev.*, **138**, 1250–1272.

Stensrud, D. J., H. E. Brooks, J. Du, M. S. Tracton, and E. Rogers, 1999: Using ensembles for short-range forecasting. *Mon. Wea. Rev.*, **127**, 433–446.

Sun, J., 2005: Initialization and numerical forecasting of a supercell storm observed during STEPS. *Mon. Wea. Rev.*, **133**, 793–813.

Sun, J., and Y. Zhang, 2008: Analysis and prediction of a squall line observed during IHOP using multiple WSR-88D observations. *Mon. Wea. Rev.*, **136**, 2364–2388.

Weygandt, S. S., A. Shapiro, and K. K. Droegemeier, 2002: Retrieval of model initial fields from single-Doppler observations of a supercell thunderstorm. Part II: Thermodynamic retrieval and numerical prediction. *Mon. Wea. Rev.*, **130**, 454–476.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924.

Wilks, D. S., 2006: *Statistical Methods in the Atmospheric Sciences*. 2nd ed. Academic Press, 467 pp.

Wilson, J. W., N. A. Crook, C. K. Mueller, J. Sun, and M. Dixon, 1998: Nowcasting thunderstorms: A status report. *Bull. Amer. Meteor. Soc.*, **79**, 2079–2099.

Wilson, J. W., E. E. Ebert, T. R. Saxen, R. D. Roberts, C. K. Mueller, M. Sleigh, C. E. Pierce, and A. Seed, 2004: Sydney 2000 forecast demonstration project: Convective storm nowcasting. *Wea. Forecasting*, **19**, 131–150.

Xiao, Q., and J. Sun, 2007: Multiple-radar data assimilation and short-range quantitative precipitation forecasting of a squall line observed during IHOP_2002. *Mon. Wea. Rev.*, **135**, 3381–3404.

Zhang, F., Y. Weng, J. A. Sippel, Z. Meng, and C. H. Bishop, 2009: Cloud-resolving hurricane initialization and prediction through assimilation of Doppler radar observations with an ensemble Kalman filter. *Mon. Wea. Rev.*, **137**, 2105–2125.

Zhao, Q., J. Cook, Q. Xu, and P. R. Harasti, 2006: Using radar wind observations to improve mesoscale numerical weather prediction. *Wea. Forecasting*, **21**, 502–522.

Summary of experimental setup.

Summary of diagnostic statistics used in the study.


* The National Center for Atmospheric Research is sponsored by the National Science Foundation.