## 1. Introduction

The accuracy of numerical weather prediction (NWP) is subject to two factors: errors in the initial conditions and deficiencies of the NWP model. A considerable amount of research has focused on developing more advanced techniques to minimize the errors in the initial conditions (Le Dimet and Talagrand 1986; Courtier and Talagrand 1987; Evensen 1994, 2003; Evensen and van Leeuwen 1996; Burgers et al. 1998; Houtekamer and Mitchell 1998; Anderson 2001; Bishop et al. 2001; Whitaker and Hamill 2002; Tippett et al. 2003; Gao and Xue 2008; Liu et al. 2007). Among these studies, the ensemble Kalman filter (EnKF) techniques are thought to be attractive because of their ability to make effective use of prediction models and to deal with complex and highly nonlinear processes in the assimilation process. Previous studies using the EnKF method have achieved encouraging levels of success for applications at large scales through the convective scale (e.g., Houtekamer et al. 2005; Whitaker et al. 2004; Snyder and Zhang 2003; Tong and Xue 2005, hereinafter TX05; Xue et al. 2006, hereinafter XTD06).

On the other hand, the deficiencies in the NWP models, which are commonly and broadly referred to as model error, have received less attention until recently, because the characteristics of model error are little known and its statistical properties are poorly understood (Dee 1995; Houtekamer et al. 2005). Model error can arise from many sources such as insufficient resolution in time and/or space, misrepresentation of the physical and subgrid-scale processes, and the use of nonphysical model boundaries and/or external forcing.

Certain EnKF studies have shown that model error can dominate the error growth in data assimilation cycles and must be parameterized to prevent the filter from diverging from the true state (Houtekamer et al. 2005). One way to account for model error within the EnKF system is to add the so-called additive error to the model state by assuming an error model (Lawson and Hansen 2005). Houtekamer et al. (2005) used additive errors by assuming a model error covariance that has the same functional form as the forecast error covariance used in a three-dimensional variational data assimilation (3DVAR) system. Their experiments using a global model showed that the added model errors increased the ensemble spread to the level of ensemble mean error. Hamill and Whitaker (2005) performed several experiments to account for the model error due to unresolved scales using a global spectral model. They compared the two most popular methods for parameterizing model error: covariance inflation and additive error models. Additive error was randomly sampled from the time series of the difference between two runs at different resolutions. Their results, obtained at the global scale, show that the additive error model outperformed the covariance inflation method and produced more accurate analyses. The ability of the additive error approach to increase the space spanned by the existing ensemble perturbations is an advantage, but the added errors are usually flow independent and therefore inconsistent with the actual flow.
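The contrast between the two approaches can be illustrated with a minimal sketch. It assumes a hypothetical two-variable state whose 80 ensemble perturbations all lie along a single direction, and it uses Gaussian white noise as a stand-in for the additive error (the cited studies sample the additive error differently). All function names and numerical values are illustrative only.

```python
import random
import statistics


def ensemble_mean(members):
    """Component-wise mean of a list of state vectors."""
    n = len(members)
    return [sum(m[i] for m in members) / n for i in range(len(members[0]))]


def inflate(members, factor):
    """Covariance inflation: scale every deviation from the ensemble mean."""
    mean = ensemble_mean(members)
    return [[mu + factor * (x - mu) for x, mu in zip(m, mean)] for m in members]


def add_random_error(members, error_sd, rng):
    """Additive error (assumed Gaussian here): independent noise on every component."""
    return [[x + rng.gauss(0.0, error_sd) for x in m] for m in members]


def spread_along(members, direction):
    """Standard deviation of the ensemble projected onto a unit vector."""
    projections = [sum(x * d for x, d in zip(m, direction)) for m in members]
    return statistics.pstdev(projections)


rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(80)]
members = [[5.0 + b, 5.0 + b] for b in base]  # perturbations lie along (1, 1) only

inflated = inflate(members, 1.2)
perturbed = add_random_error(members, 0.5, rng)

along = [2 ** -0.5, 2 ** -0.5]      # direction already spanned by the ensemble
across = [2 ** -0.5, -(2 ** -0.5)]  # direction not spanned by the ensemble

print("spread along :", spread_along(members, along), spread_along(inflated, along))
print("spread across:", spread_along(inflated, across), spread_along(perturbed, across))
```

In this toy setting, inflation enlarges the spread only in the direction the ensemble already spans, whereas the additive noise introduces spread in the orthogonal direction as well.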

Difficulties can arise when we attempt to apply these methods to the convective scale, where model error is very flow and situation dependent. For this reason, the estimation of tunable model parameters, which often have a profound impact on the forecast, using the data assimilation scheme appears to be an attractive alternative or addition to the aforementioned methods for dealing with convective-scale model error. Early work using adjoint-based parameter estimation can be found in fields such as hydrology, which solves the problem of aquifer identification (e.g., Yakowitz and Duckstein 1980). In meteorology, such studies include the estimation of nudging coefficients using the four-dimensional variational data assimilation (4DVAR) method (Zou et al. 1992), statistical model error parameters using a maximum-likelihood method (Dee and da Silva 1999), and the estimation of a wind stress coefficient using the extended Kalman filter method (Hao and Ghil 1995). The relative importance of optimal parameter values versus optimal initial conditions of state is discussed by Zhu and Navon (1999) using a 4DVAR system of a full-physics global spectral model. Their results show that the impacts of optimal parameters on the forecast persist even after the impacts of the optimal initial conditions have been lost. A comprehensive review of parameter estimation studies in meteorology and oceanography up to the mid-1990s can be found in Navon (1997).

Anderson (2001) proposed using EnKF for the simultaneous estimation of parameters and state. Several studies have since shown that EnKF is capable of successfully estimating parameters through the data assimilation process and may therefore help improve the subsequent forecast (Annan et al. 2005a,b; Annan and Hargreaves 2004; Hacker and Snyder 2005; Aksoy et al. 2006a,b). More recently, Tong and Xue (2008a,b, hereinafter TX08a and TX08b, respectively) applied the EnKF method to the estimation of fundamental microphysical parameters in a storm-scale model. In TX08a, parameter identifiability is addressed through an investigation of correlation fields and a detailed sensitivity analysis. TX08b performed simultaneous estimation of up to five microphysical parameters using simulated radar data and found, as in Aksoy et al. (2006b), that a single imperfect parameter can be successfully estimated while the accuracy of estimation declines as the number of error-containing parameters increases. Another common conclusion of both studies is that the parameter estimation is beneficial in reducing errors in both estimated parameters and state. The studies also indicate that the parameter estimates are sensitive to the filter configuration and significant nonlinearities exist between model parameters and state variables, so that an attempt to improve one parameter may influence the estimates of other parameters.

The matter of simultaneous parameter and state estimation is further complicated when the very same parameters to be estimated are involved in the forward observation operators that link the model state to the observations. In past studies, either the parameters to be estimated were not involved in the observation operators, or the observation operators were assumed to be perfect. In the case of radar reflectivity–related observations, the model microphysical parameters also appear in the observation operators. While TX08b estimates the microphysical particle size distribution (PSD) parameters from simulated reflectivity data, the observation operators were assumed to be perfect (i.e., correct parameter values were used in the operators). In that study, difficulties were encountered when estimating multiple PSD parameters and this arose from the fact that the responses to errors in different parameters compensate each other in terms of the observed radar reflectivity, causing solution nonuniqueness. This result suggests that additional constraints provided by polarimetric radar measurements may help improve the well posedness of the problem (Jung et al. 2008a,b, hereinafter JZX08 and JXZS08, respectively). JXZS08 showed the positive impacts of directly assimilating simulated polarimetric variables on state estimation in a perfect-model scenario.

In this paper, extending the earlier studies of TX08a and TX08b that performed simultaneous PSD parameter and state estimation from reflectivity only and assuming perfect observation operators, and the studies of JZX08 and JXZS08 that assimilated simulated polarimetric radar data with a perfect model, we perform simultaneous state and parameter estimation from simulated polarimetric radar data whose observation operators also contain PSD parameter error. We attempt to quantitatively assess how additional polarimetric data can improve the parameter and state estimation using the EnKF approach. The forecast model, the EnKF assimilation system, and the design of the observing system simulation experiments (OSSEs) are first described in section 2, which also includes a discussion of the characteristics of the parameters to be estimated. Section 3 discusses the results of the sensitivity analysis and section 4 examines the impacts of polarimetric radar data on the parameter and state estimation. Throughout this paper, only simulated radar data are used. A summary and conclusions are given in section 5.

## 2. Model and experimental design

### a. Forecast model and filter configuration

Similar to the OSSE studies of TX08a,b, JZX08, and JXZS08, a truth simulation is created using the Advanced Regional Prediction System (ARPS; Xue et al. 2000, 2001, 2003) for a supercell storm. ARPS is a fully compressible and nonhydrostatic atmospheric prediction model; its prognostic variables include three velocity components (*u*, *υ*, and *w*), the potential temperature (*θ*), the pressure (*p*), and the mixing ratios of water vapor, cloud water, rainwater, cloud ice, snow aggregate, and hail (*q*_{υ}, *q*_{c}, *q*_{r}, *q*_{i}, *q*_{s}, and *q*_{h}, respectively) with the Lin et al. (1983, hereinafter LFO83) ice microphysics scheme. The turbulence kinetic energy is another prognostic variable used by the 1.5-order subgrid-scale turbulence closure scheme. The ARPS model is also used for the sensitivity analysis and in the state and parameter estimation.

The configurations of the forecast model and assimilation system used here are very similar to those used in Tong and Xue (2005, 2008a,b), except for one major modification: the forward observation operator for reflectivity uses the approach developed in JZX08 instead. The capabilities of assimilating polarimetric data were developed in JZX08 and JXZS08, although the data are used for parameter estimation here. The size of the ensemble is 80 and no covariance inflation is applied. The effects of terminal velocity are assumed to have been removed from the radial velocity data in this study.

The sounding of the 20 May 1977 Del City, Oklahoma, supercell storm (Ray et al. 1981) is used for the truth storm simulation. The CAPE for this sounding is 3300 J kg^{−1}. The grid spacing is set to 2 km horizontally and 0.5 km vertically. The dimension of the model domain is 64 × 64 × 16 km^{3} and a virtual polarimetric Weather Surveillance Radar-1988 Doppler (WSR-88D) is located at the southwest corner of the domain. The storm is initiated by a 4-K ellipsoidal thermal bubble with a 10-km horizontal radius and a 1.5-km vertical radius centered at *x* = 48 km, *y* = 16 km, and *z* = 1.4 km. The time step for the model integration is 6 s with 3 s for the acoustically active model equation terms. These configurations are essentially the same as those used in TX05, TX08a, and JZX08.

The ensemble square root filter (EnSRF) proposed by Whitaker and Hamill (2002) is employed, in which the observations are serially assimilated. With this EnSRF, all observations are assumed to be uncorrelated. This appears to be reasonable as JXZS08 has shown that the error correlation between properly simulated *Z _{H}* and polarimetric data is insignificant. More detailed information on the filter implementation can be found in XTD06 and TX08a.
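For illustration, the serial square root update for one scalar observation can be sketched as follows. This is a simplified sketch and not the ARPS implementation: covariance localization is omitted, the ensemble mean of the observation priors is used in place of the operator applied to the mean state, and the function name `ensrf_update` and all numerical values are hypothetical. The reduced gain factor `alpha` follows the form given by Whitaker and Hamill (2002).

```python
import math
import random
import statistics


def ensrf_update(members, h, y_obs, obs_var):
    """Update an ensemble of state vectors with one scalar observation.

    members : list of state vectors (lists of floats)
    h       : function mapping a state vector to the observed quantity
    """
    n = len(members)
    dim = len(members[0])
    mean = [sum(m[i] for m in members) / n for i in range(dim)]
    perts = [[m[i] - mean[i] for i in range(dim)] for m in members]

    # Observation priors and their perturbations
    hx = [h(m) for m in members]
    hx_mean = sum(hx) / n
    hx_pert = [v - hx_mean for v in hx]
    hx_var = sum(p * p for p in hx_pert) / (n - 1)

    # Ensemble covariance between each state variable and the observation prior
    cov = [sum(perts[k][i] * hx_pert[k] for k in range(n)) / (n - 1)
           for i in range(dim)]
    gain = [c / (hx_var + obs_var) for c in cov]

    # Reduced gain for the perturbations, so that no perturbed observations are needed
    alpha = 1.0 / (1.0 + math.sqrt(obs_var / (hx_var + obs_var)))

    innovation = y_obs - hx_mean
    new_mean = [mean[i] + gain[i] * innovation for i in range(dim)]
    return [[new_mean[i] + perts[k][i] - alpha * gain[i] * hx_pert[k]
             for i in range(dim)] for k in range(n)]


rng = random.Random(1)
prior = [[rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)] for _ in range(80)]
posterior = ensrf_update(prior, lambda m: m[0], y_obs=1.5, obs_var=1.0)

prior_var = statistics.variance([m[0] for m in prior])
post_var = statistics.variance([m[0] for m in posterior])
print(prior_var, post_var)
```

Serial assimilation then amounts to calling the function once per observation, each call using the ensemble returned by the previous one, which is consistent only when the observation errors are uncorrelated.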

Following TX08a and TX08b, spatially smoothed stochastic perturbations with standard deviations of 2 m s^{−1} for the velocity components (*u*, *υ*, and *w*), 2 K for the potential temperature (*θ*), and 0.6 g kg^{−1} for the mixing ratios of the hydrometeors (*q*_{υ}, *q*_{c}, *q*_{r}, *q*_{i}, *q*_{s}, and *q*_{h}) are added to the initially horizontally homogeneous first guess defined by the Del City sounding to initialize the ensemble members at *t* = 20 min of model time. The perturbations are added at the grid points located within 6 km horizontally and 3 km vertically of the observed reflectivity. As in the previous studies of TX08a,b, the pressure is not perturbed. The covariance localization radius is set to 6 km.

An 80-min assimilation window is used with the first analysis at 25 min of model time and the last at 100 min. Radar volume scan data are available and assimilated every 5 min. Reflectivity data from the entire domain, including the nonprecipitating regions, are assimilated and used to update all of the state variables while the radial velocity data, from regions where the reflectivity is greater than 10 dB*Z*, are used to update the wind variables (*u*, *υ*, and *w*) only. It is found in our experiments that updating the thermodynamic and microphysical variables using radial velocity does not further improve the analysis, and this was mainly due to some degradation during the earlier assimilation cycles when reliable covariance between those variables and the radial velocity was not established. While for the same reason, the covariance between the reflectivity and the velocity variables would also be unreliable during the early cycles, the degradation effects using reflectivity to update the velocity variables were found to be smaller. Given the difficulty in determining the optimal delay time for the cross-updating, we chose to use the current simpler settings.

### b. Simulation of observations

Detailed information on the forward observation operators that link model state variables with the polarimetric radar variables can be found in JZX08; these operators are used to generate error-free observations. The error models described in Xue et al. (2007) and JXZS08 are used to generate simulated observation errors with slightly different error statistics. In this study, we assume that a basic quality control process has been applied to the observations prior to the assimilation. The effect is achieved by limiting the modeled reflectivity error samples to within 5 times their standard deviation, which corresponds to 10 dB*Z* (larger error samples are dropped). To accommodate this change while keeping the error standard deviations (SDs) at a level similar to that in JXZS08, the correlated and uncorrelated parts of the error for the reflectivity are increased to 40% and 2.7% of the truth reflectivity, respectively. The resultant error distribution is similar to that in Xue et al. (2007, solid line in their Fig. 1) except for a shorter tail on the negative end (not shown). Therefore, the effective error SDs of the simulated observations are 1 m s^{−1} for *V*_{r}, about 2 dB*Z* for the reflectivity at the horizontal polarization (*Z*_{H}), close to 0.2 dB for the differential reflectivity (*Z*_{DR}), and 0.5° km^{−1} for the specific differential phase (*K*_{DP}). The same SDs are specified in the filter for the corresponding observations. The reflectivity difference *Z*_{DP} is not examined here since it exhibits the highest correlation to *Z*_{H} among the polarimetric variables (JXZS08); hence, it is believed to contain the least independent information.

### c. Parameters to estimate

The LFO83 scheme used in the ARPS model is a single-moment five-class (cloud water, cloud ice, rainwater, snow, and hail) bulk microphysics scheme, in which the PSD is described by an exponential function with a fixed-intercept parameter. The water amount of the hydrometeors in each category is represented by the corresponding mixing ratio, and it changes through interactions with the other categories. Such interactions include condensation or deposition, collection, breakup, freezing, evaporation or sublimation, melting, and precipitation sedimentation. PSD-related parameters including the bulk density and intercept parameter of the PSD of each category explicitly appear in the equations for the microphysical processes and can greatly influence the magnitude and relative importance of those processes. Briefly, the intercept parameter is the product of the total number concentration and the slope parameter of the exponential distribution [see Eqs. (1)–(6) in LFO83]. Significant uncertainties exist because these parameters, which vary significantly both in time and space in nature, are usually predefined as constants in single-moment microphysics schemes. TX08a demonstrated through sensitivity analysis that the errors in the intercept parameters and the bulk densities considerably influence the storm evolution. In this study, the same set of parameters is selected for parameter estimation under the assumption of imperfect observation operators; these parameters are the intercept parameters for rain (*n*_{0R}), snow (*n*_{0S}), and hail (*n*_{0H}), and the bulk densities for snow (*ρ*_{S}) and hail (*ρ*_{H}).

### d. Parameter estimation procedure

The parameters to be estimated are given (incorrect) first-guess values at the beginning of the assimilation cycles; they are then perturbed for each of the ensemble members to form an ensemble of parameter values. Their values are updated during the EnKF assimilation cycles. Updating these parameters in the early assimilation cycles, when the errors in the estimated state are still very large, is found to hurt rather than help parameter estimation; the estimated parameter values easily drift away from the truth, because the covariance between the parameters and the observations at this early stage is very unreliable. Since the success of the parameter estimation and the convergence rate depend on the filter performance of the previous assimilation cycles and the error is cumulative, larger error in the early cycles can significantly slow down the parameter estimation process (TX08a). As the error in the estimated state can usually be significantly reduced in the first two to three cycles, we delay the parameter estimation until 40 min of model time has elapsed, that is, until the time of the fourth EnKF analysis. During the assimilation period, parameter values estimated in the previous assimilation cycle are used in the forecast model as well as the observation operators of the following cycle. To prevent the collapse of the parameter variance because of the lack of dynamic error growth in the parameters, a covariance inflation procedure following Aksoy et al. (2006b) and TX08b is applied, which restores the parameter spread to a predefined minimum value after each analysis cycle, when the prior parameter spread is smaller than this. For the logarithmically transformed intercept parameters, this predefined minimum spread is set to 1 m^{−4}; for logarithmically transformed snow and hail densities, it is set to 0.5 kg m^{−3}.
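The spread restoration step can be sketched as follows, under the assumption that it amounts to rescaling the deviations of the log-transformed parameter ensemble about its mean whenever the spread falls below the minimum. The function name `restore_min_spread` and the sample values are hypothetical and are not taken from the cited studies.

```python
import statistics


def restore_min_spread(values, min_spread):
    """Rescale deviations about the ensemble mean so that the sample standard
    deviation is at least min_spread; the ensemble mean is left unchanged."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    if spread >= min_spread or spread == 0.0:
        return list(values)  # nothing to restore (or no deviations to rescale)
    factor = min_spread / spread
    return [mean + factor * (v - mean) for v in values]


# Hypothetical log-transformed intercept parameter values after an analysis
log_values = [68.6, 68.9, 69.1, 69.4]
restored = restore_min_spread(log_values, min_spread=1.0)

print(statistics.stdev(log_values), statistics.stdev(restored))
```

Because only the deviations are rescaled, the ensemble mean estimate of the parameter is not altered by the restoration.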

### e. Design of parameter estimation experiments

We first perform five sets of single-parameter estimation experiments that examine the capability of the EnKF when only a single parameter contains error. We then perform a set of experiments in which five parameters are unknown. However, our main focus is on the improvement that can be obtained by using additional polarimetric data. Following TX08b, the radial velocity is not used in the parameter estimation due to its small response to the change in parameter values as well as the fact that it is not a direct function of hydrometeors. The radial velocity data are used for state estimation, however.

In the single-parameter estimation experiments, one of the five parameters starts with an incorrect first-guess value while in the five-parameter experiments all five parameters start out incorrectly. In the experiments where the parameter error is involved in the observation operators, the forecast and the analysis trajectory are found to be very sensitive to the initial perturbations of the parameters. To increase the robustness of our estimation, we perform five parallel experiments that only differ in the sampling of the initial parameter perturbations; the same was also done in Aksoy et al. (2006b).

As in TX08b, we sample the random perturbations in the log domain [with 10 log(*x*) transform], which avoids negative values of the intercept parameters and bulk densities. With this procedure, unrealistically small or large parameter values can occur occasionally, causing forecast instability. Such experiments were rerun using reduced large and small time step sizes of 2 and 0.5 s, respectively. Table 1 lists the true and first-guess values of the parameters. Because the Gaussian random perturbations are sampled in the log space, the ensemble mean of the parameters after they have been converted back to their original space is usually not the same as the ensemble mean in the log space. As in TX08b, the parameter estimation is performed in the log space of the parameters while the ensemble prediction uses their values in the original scale.
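A short sketch illustrates why the two ensemble means differ. It assumes that the transform is 10 log_{10}(*x*) and uses a hypothetical perturbation SD of 2 in log space; the first-guess value of 3 × 10^{6} m^{−4} is one of the rain intercept guesses used in this study, while the helper names are illustrative only.

```python
import math
import random
import statistics


def to_log(x):
    """Assumed transform: 10 * log10(x)."""
    return 10.0 * math.log10(x)


def from_log(v):
    """Inverse of the assumed transform."""
    return 10.0 ** (v / 10.0)


rng = random.Random(42)
first_guess = 3.0e6  # m^-4, first guess of the rain intercept parameter
log_sd = 2.0         # hypothetical perturbation SD in log space

# Gaussian perturbations are drawn in log space, so every member stays positive
log_members = [to_log(first_guess) + rng.gauss(0.0, log_sd) for _ in range(80)]
linear_members = [from_log(v) for v in log_members]

mean_of_log = statistics.mean(log_members)     # mean used in the estimation
linear_mean = statistics.mean(linear_members)  # mean in the original space
back_transformed = from_log(mean_of_log)       # geometric mean of the members

print(linear_mean, back_transformed)
```

The back-transformed log-space mean is the geometric mean of the members, which cannot exceed their arithmetic mean, so the linear-space mean is biased high relative to it whenever the members differ.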

Within the first few cycles, when the error covariance is still poor, the errors in the estimated parameters often grow to be so large as to prevent successful estimation in later cycles and can cause instability in the model integration. To avoid this problem, we constrain the parameters within their respective lower and upper bounds, which are the same bounds used in the sensitivity experiments (see Table 1).

A data selection procedure developed by TX08b is used here. At each analysis time, 30 observations are chosen based on the correlation between the estimated parameter and the prior estimate (model version) of the *Z*_{H}, *Z*_{DR}, and *K*_{DP} observations, when only one of the observation types is used for parameter estimation. When more than one of the observation types is used, 15 observations from each dataset are chosen based on their correlation. Therefore, 30 total observations are used when one or two data types are used and 45 observations are employed when all three types are used in the parameter estimation. For polarimetric variables, data thresholding is found to be necessary, as in JXZS08. For *Z*_{DR} and *K*_{DP}, the thresholds are 0.05 dB and 0.05° km^{−1}, respectively; data values lower than the thresholds are discarded. These are lower than those used in JXZS08 to allow for the use of more observations. Even though the data subjected to smaller thresholds tend to be noisier, the information they contain on the microphysics, especially in regions where polarimetric signatures are weak, can still be helpful.
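The selection and thresholding steps described above can be sketched as follows. The sketch assumes that observations are ranked by the absolute value of the ensemble correlation between the parameter and the observation prior (the ranking criterion beyond "correlation" is an assumption here), and the function names and toy values are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length samples (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_observations(param_ens, obs_prior_ens, n_select):
    """Return indices of the n_select observations whose ensemble priors are
    most strongly correlated (in absolute value) with the parameter ensemble."""
    ranked = sorted(range(len(obs_prior_ens)),
                    key=lambda j: abs(correlation(param_ens, obs_prior_ens[j])),
                    reverse=True)
    return ranked[:n_select]


def keep_above_threshold(obs_values, threshold):
    """Indices of observations whose values are not lower than the threshold."""
    return [j for j, v in enumerate(obs_values) if v >= threshold]


# Toy example: 5 ensemble members, 4 candidate observations
param = [1.0, 2.0, 3.0, 4.0, 5.0]
obs_priors = [
    [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with the parameter
    [5.0, 3.0, 4.0, 1.0, 2.0],   # correlation of -0.8
    [3.0, 3.0, 3.0, 3.0, 3.0],   # no ensemble spread, treated as uncorrelated
    [10.0, 8.0, 6.0, 4.0, 2.0],  # perfectly anticorrelated
]
selected = select_observations(param, obs_priors, n_select=2)
usable = keep_above_threshold([0.01, 0.05, 0.30, 0.04], threshold=0.05)
print(selected, usable)
```

In the configuration described in the text, `n_select` would be 30 when a single observation type is used and 15 per type otherwise.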

## 3. Sensitivity analysis

### a. Response function

Before we performed the parameter estimation, we first carried out a set of sensitivity experiments to examine if the model output, in the form of polarimetric variables, was sensitive to the PSD parameters to be estimated. This issue is ultimately related to the identifiability of each parameter with given observations (Yakowitz and Duckstein 1980; TX08a).

Table 1 lists the uncertainty ranges and initial guesses used in our sensitivity and parameter estimation experiments, respectively; these values were also used in TX08a,b. These choices are based on observed ranges of values although they are not necessarily all-encompassing (Joss and Waldvogel 1969; Houze et al. 1979; Mitchell 1988; Gunn and Marshall 1958; Gilmore et al. 2004; Pruppacher and Klett 1978; Brandes et al. 2007).

The response function *J* is as defined in TX08a [Eq. (1)], where *p* denotes the parameter and the superscript *s* is either *w* for an incorrect value or *t* for a true value. With *p*^{t}, the correct parameter value is used in the observation operator. Here, *y*_{m}^{o} denotes the *m*th observation (limited to regions where the reflectivity is greater than 0 dB*Z*) and *y*_{m}(*p*) is a prior estimate based on the model forecast. The observations consist of *Z*_{H}, *Z*_{DR}, and/or *K*_{DP}; *σ*_{y} is the SD of the observation error.

The response functions for each type of observation are averaged over the 16 cycles for each incorrect value of a given PSD parameter. Since we are interested in the change in the model response to the error in the parameter, we compute the response function difference (RFD), RFD^{s} = *J*_{y}(*p*^{s}) − *J*_{y,c}(*p*^{t}). RFD^{t} is the same as Δ*J*_{y} in TX08a, where the true parameter value is used in the response function calculation. Here, *J*_{y,c}(*p*^{t}) is the response function calculated from the forecasts of the control experiment with the truth parameter value.

The difference between RFD^{t} and RFD^{w} presents some hints about the amount of error that can be attributed to the error in the observation operator. In the Kalman filter update equation, **x**^{a} = **x**^{b} + **K**[**y** − *H*(**x**^{b})], the amount of correction made to the analysis background is proportional to the observation innovation **y** − *H*(**x**^{b}), which is the quantity in the square brackets in Eq. (1). When the observation operator *H* involves error, the amount of correction can be over- or underestimated, leading to additional errors in the final estimate. Here, **x**^{b} is the background state vector (usually the forecast from the previous cycle), **x**^{a} is the analyzed state vector, and **K** is the Kalman gain. Therefore, RFD^{t} represents the total root-mean-square (RMS) difference between the forecast and the observations (relative to the total RMS difference between the forecast and the observations in the control experiment) if the forecast is projected into the observations without error while RFD^{w} represents the total RMS difference the filter would see in the presence of both forecast and observation operator error. When RFD^{w} is larger than RFD^{t}, the observation operator error acts to amplify the total error when measured against the particular observation.

Another practical significance of the sensitivity analysis is its ability to rank the relative importance of model parameters so that more important ones can be chosen for estimation. A higher sensitivity implies that the parameter in question has a greater impact on the forecast than that with a smaller sensitivity (Navon 1997).
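The effect of an erroneous observation operator on the update equation discussed in this subsection can be made concrete with a scalar toy problem. The sketch assumes a linear operator *H*(*x*) = *hx* and hypothetical values for the truth, background, and error variances; it is not derived from the experiments reported here.

```python
def scalar_analysis(x_b, p_b, y, h, r):
    """Scalar Kalman update x_a = x_b + K [y - H(x_b)] with H(x) = h * x.

    Returns the analysis and the observation innovation."""
    gain = p_b * h / (h * h * p_b + r)
    innovation = y - h * x_b
    return x_b + gain * innovation, innovation


x_true = 2.0
h_true = 2.0         # operator coefficient with the correct parameter value
h_wrong = 3.0        # operator coefficient with an incorrect parameter value
y = h_true * x_true  # error-free observation of the truth
x_b, p_b, r = 1.5, 1.0, 1.0

xa_true_op, innov_true_op = scalar_analysis(x_b, p_b, y, h_true, r)
xa_wrong_op, innov_wrong_op = scalar_analysis(x_b, p_b, y, h_wrong, r)

print(innov_true_op, xa_true_op)    # innovation and analysis with the correct operator
print(innov_wrong_op, xa_wrong_op)  # innovation and analysis with the erroneous operator
```

With the correct operator the innovation is positive and the analysis moves toward the truth, whereas with the erroneous operator the innovation changes sign and the analysis moves away from it, which is the kind of miscorrection that a large gap between RFD^{w} and RFD^{t} signals.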

### b. Results of sensitivity experiments

Before discussing the response function, we first examine the sensitivity of the simulated radar measurements to the parameters. For illustration purposes, we sample the observation points near the convective core at three different levels (0.4, 2.1, and 10.7 km). The *Z*_{H}, *Z*_{DR}, and *K*_{DP} values calculated using the default and the upper and lower bound values of the PSD parameters are listed in Table 2. Radar measurements are more sensitive to *n*_{0R} than to *n*_{0H} and *n*_{0S}, considering its relatively narrow uncertainty range as compared with those of *n*_{0H} and *n*_{0S}. In addition, *Z*_{H} changes by more than 10 dB*Z* when *n*_{0R} increases from 3 × 10^{6} to 8 × 10^{7} m^{−4}, while *Z*_{DR} and *K*_{DP} decrease by almost 2 dB and 2° km^{−1}, respectively. The differences should be larger when the rainwater mixing ratio is larger than that sampled here.

Generally, *Z*_{H} is more sensitive to the PSD parameters of snow and hail than are either *Z*_{DR} or *K*_{DP} (Table 2). In our sensitivity experiments, *n*_{0H} ranges from 4 × 10^{2} to 4 × 10^{4} m^{−4}. The corresponding difference in *Z*_{H} is nearly 30 dB*Z*, while *Z*_{DR} varies by only about 0.2 dB for the 2.1-km sample. While *K*_{DP} is insensitive to *n*_{0H}, it varies between 2.06 and 1.45° km^{−1} for the uncertainty range of *ρ*_{H} given in Table 2. When only dry snow and dry hail coexist at 10.7-km altitude, *Z*_{DR} changes very little with respect to the change in *n*_{0S} while the change in *Z*_{H} is larger than 10 dB*Z*. However, these values can vary in wider ranges depending on the absolute and relative amounts of hydrometeors at each location.

From a response function point of view, a necessary condition for a parameter to be identifiable is that the response function has a unique minimum within the parameter bounds and is sensitive to the parameter (TX08a). To investigate the parameter identifiability with polarimetric radar data, we plot RFD^{t} and RFD^{w} against the deviation of the parameter values from their truth in Fig. 1.

With respect to reflectivity observations, both the RFD^{t} and RFD^{w} curves are concave with their minima located at or near the zero deviation points of individual parameters (Figs. 1a and 1b); it is therefore very likely that the truth value can be found by using reflectivity observations when only one of the parameters has error. For *Z*_{DR}, the RFD w.r.t. *n*_{0S} exhibits very small sensitivity for positive deviations, indicating potential difficulty in estimating *n*_{0S} in that range. The RFDs of *n*_{0R} and *n*_{0H} have clear concave shapes with their minima at zero deviation (Figs. 1c and 1d), while the bulk densities, *ρ*_{S} and *ρ*_{H}, show rather small sensitivities. The RFDs w.r.t. *K*_{DP} are even smaller (Figs. 1d and 1f) for all parameters and no unique minimum is apparent for *n*_{0S} and *ρ*_{S} due to the lack of sensitivity w.r.t. positive deviations.

The parameter identification problem is more complex in the presence of observation operator errors. When the PSD parameters are involved in the observation operators, incorrect parameter values result in under- or overcorrection to the parameter, which can lead to larger analysis errors. In other words, a large difference between RFD^{w} and RFD^{t} indicates a large impact of the parameter error through the observation operator. Generally, these differences are moderate for moderate sensitivity and very small when the overall sensitivity is small, but they can be very large when the total sensitivity is large (e.g., *n*_{0H} and *ρ*_{S} for *Z*_{H}; see Fig. 1).

The problem becomes even more complicated when multiple observation datasets are used due to complex nonlinear interactions within the filter. For example, the RFD for *K*_{DP} might be too small for successful estimation of *n*_{0H} while the estimation of *n*_{0H} using *Z*_{H} might also be challenging due to the large difference between RFD^{w} and RFD^{t}. However, when *K*_{DP} and *Z*_{H} are used together, the estimation can be successful, as we will see in section 4. While the sensitivity results are not sufficient to determine if certain parameters can be estimated successfully, they can still provide useful guidance for interpreting the estimation results.

## 4. Results of parameter estimation

### a. Results of single-parameter estimation

Here, *p*_{i,k} is the ensemble mean of the *i*th parameter in linear space for the *k*th experiment out of a total of *N*.

Figure 2 shows the NAEs of estimated parameters from single-parameter experiments. These errors are averaged over the five parallel experiments for each of three different initial guesses. The experiment names are made up of the parameter name, and the coefficient and exponent of the initial guess of the intercept parameter or the first two digits of the bulk density shown in Table 1. Observations used in the parameter estimation are indicated after an underscore (_). For example, experiment N0r36_ZhKdp estimates *n*_{0R} from an initial guess of 3 × 10^{6} m^{−4} using both *Z*_{H} and *K*_{DP} data. In most cases, the reflectivity data alone can reduce the initial parameter errors (thick solid gray) but the results are not as good as those of TX08b obtained with perfect observation operators. As observed in TX08b, the parameter value can depart far from the truth in the first one or two cycles (e.g., Figs. 2a, 2b, 2d, 2e, 2g, 2h, and 2k) and oscillates around its truth value in log space. The error in the final estimate is larger than the initial error in such experiments as N0h43, N0h45, and Rhos05. Generally, an increase in the NAE is observed in the later cycles of the intercept parameters (e.g., Figs. 2a–e) while the bulk densities converge to their truth values (except for Rhos05). These results are quite different from those of TX08b, where all parameters eventually converge to their truth values in their single-parameter experiments that use only reflectivity data.

Figure 2 shows that the estimation of the intercept parameters is generally improved when *K*_{DP} is used in addition to *Z _{H}* (solid black curves in Fig. 2). For *n*_{0R}, the NAEs stay lower than those of experiments using *Z _{H}* alone (thick solid gray) at most times (Figs. 2a–c). Figure 3 shows the ensemble mean analysis RMSEs of the state variables from experiments N0r_Zh (thick solid gray), N0r_ZhKdp (solid black), and N0r_Zdr (dashed black). They are averaged over 15 experiments that start from three initial guesses (corresponding to Figs. 2a–c), with each initial guess having five parallel experiments with different initial ensemble parameter perturbations. In this case, the benefits of *K*_{DP} to the estimation of state are rather small because the state obtained with *Z _{H}* alone is already rather good. The overall RMSE levels of the state are lower than those in Figs. 4 and 5, which are for experiments estimating *n*_{0S} and *n*_{0H}, respectively. In Figs. 2d–f, the NAE of *n*_{0S} experiences a clear reduction in the later cycles when *K*_{DP} is used in addition to *Z _{H}*, and the estimated *q _{s}* (Fig. 4e) and *q _{i}* (not shown) are improved in response. The positive impacts of *K*_{DP} on the estimation of *n*_{0H} may not be apparent from Fig. 2. However, significant improvement is obtained in the state estimation (Fig. 5). It is believed that the smaller variability of the NAEs during the assimilation cycles (Figs. 2g–i), and the significantly smaller NAEs compared to that of N0h45_Zh (Fig. 2h), contribute to the large improvement in the analysis of the state. Additional use of *K*_{DP} in the estimation of bulk densities yields slightly smaller errors in the parameter estimation but exhibits little impact on the state estimation (not shown).

The largest benefit of the polarimetric data is obtained in the estimation of *n*_{0H} when *Z*_{DR} is used alone without reflectivity data in the parameter estimation. The NAEs exhibit a steady trend of reduction in general, with the exception of the large deviation found in the early assimilation cycles in N0h43_Zdr (black dashed), while the NAEs of N0h_Zh (thick solid gray) show large oscillations with time (Figs. 2g–i). The estimation of all of the state variables, including microphysical variables as well as dynamic and thermodynamic variables, is significantly improved as the parameter estimation improves (Fig. 5). However, the use of *Z*_{DR} alone in the parameter estimation has a negative impact on both the state and parameter estimation for the other four parameters (Figs. 2, 3d, and 4).

The reason why *Z*_{DR} outperforms *Z _{H}* in the estimation of *n*_{0H} may be explained by the sensitivity analysis. In section 3, it is found that the difference between *Z*_{DR} have similar shapes and magnitudes (solid lines with squares in Figs. 1c and 1d). As discussed in section 3, the amount of correction made to the forecast is proportional to the difference between the observations and the forecast projected to the observation space using the observation operator. Therefore, a large (RFD^{w} − RFD^{t}) implies that the analysis may deteriorate due to the large uncertainty in the observation operators and hence in the observed quantities themselves.

Similar to TX08a, we examine the error correlations to help us understand the filter behavior for the parameter estimation of *n*_{0H}. This is because the adjustment to the parameters is accomplished based on error covariance, the dimensional version of the correlation in the filter. Figure 6 shows the time series of the correlation coefficient between parameter *n*_{0H} and the prior estimates of *Z _{H}* for one of the five parallel experiments named N0h46_Zh (dotted), and that between *n*_{0H} and *Z*_{DR} of the corresponding experiment N0h46_Zdr. The coefficients are averaged over the 30 observations used in the parameter estimation. The correlation coefficient is calculated from the parameter ensemble and the model version (prior estimate) of the observations (*Z _{H}* or *Z*_{DR}) from the forecast ensemble. The correlation coefficient in experiment N0h46_Zdr keeps increasing during its early cycles and stays high during the rest of the cycles. On the other hand, the correlation coefficient in N0h46_Zh drops rapidly in the first two cycles. It bounces back in the next two cycles but oscillates during the remaining cycles and stays lower than that of N0h46_Zdr. Since nonlinear feedback exists between the parameter and state estimations during the assimilation cycles, large error in the parameter estimation due to weak correlation leads to poor state estimation and slow convergence or even parameter estimation failure.
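As an illustration of the quantity plotted in Fig. 6, the following minimal Python sketch computes the sample correlation between a parameter ensemble and the prior (forecast) estimates of a set of observations, and then averages over the observations. The function names, the array layout, and the use of a plain arithmetic mean over the observations are assumptions made for illustration; they are not taken from the filter code used in this study.

```python
import numpy as np


def ensemble_correlation(param_ens, obs_prior_ens):
    """Correlation between one parameter and each prior observation estimate.

    param_ens     : shape (n_members,), parameter value held by each member.
    obs_prior_ens : shape (n_members, n_obs), model version of each
                    observation computed from each forecast member.
    Returns an array of shape (n_obs,).
    """
    p = np.asarray(param_ens, dtype=float)
    y = np.asarray(obs_prior_ens, dtype=float)
    p_anom = p - p.mean()
    y_anom = y - y.mean(axis=0)
    # Sample covariance between the parameter and each observation prior.
    cov = p_anom @ y_anom / (p.size - 1)
    # Normalize by the two ensemble standard deviations.
    return cov / (p.std(ddof=1) * y.std(axis=0, ddof=1))


def mean_correlation(param_ens, obs_prior_ens):
    """Average over all observations (a plain mean is assumed here)."""
    return float(np.mean(ensemble_correlation(param_ens, obs_prior_ens)))
```

Multiplying each correlation by the two ensemble standard deviations recovers the covariance on which the parameter adjustment is based, so a weak correlation directly translates into a small and noisy adjustment.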

Even though *Z*_{DR} for ice hydrometeors is independent of the intercept parameter for single hydrometeor types under the Rayleigh assumption, it becomes intercept-parameter dependent when more than one species coexist and each contributes to *Z*_{DR} in different ways [see Eqs. (12)–(16) in JZX08]. We show later (in Fig. 8) that the *Z*_{DR} data selected are mostly from areas where dry snow and dry hail coexist. In the single-parameter estimation experiment for *n*_{0H}, *Z*_{DR} is determined by three free parameters (*n*_{0H}, *q _{h}*, and *q _{s}*), and the covariance between *n*_{0H} and *Z*_{DR} can be captured by the filter. When only dry hail and dry snow coexist, the *Z*_{DR} value varies between two values bounded by those of dry hail and dry snow at the end depending on their relative amounts, which would make *Z*_{DR} a better performer than reflectivity in estimating hail-related quantities. On the other hand, increases in *q _{s}* and *q _{h}* change the reflectivity in the same direction, and the estimation using reflectivity would be harder. Therefore, *Z*_{DR} can be more effective in estimating *n*_{0H}.

From Figs. 7 and 8, we can see that the *Z _{H}* observations used in N0h46_Zh are clustered around a few locations (Figs. 7a,b and 8a,b) while the *Z*_{DR} observations used in N0h46_Zdr are scattered over a wider area (Figs. 7c,d and 8c,d). Observations from the same spatial regions of a storm are likely to carry similar information on the storm. Repeated application of the observations with similar information content tends to accelerate the reduction of the parameter spread. The covariance inflation procedure used to prevent the collapse of the spread can lead to oscillations and overadjustments (TX08b). In N0r46_Zh, the parameter spread falls to the predefined minimum SD after two cycles while it takes seven cycles in N0r46_Zdr (Fig. 9). We also notice that many of the *Z _{H}* observations are taken from the region where at least three phases (rain, hail, and melting hail) contribute to *Z _{H}*. At 45 min, many of the *Z _{H}* data chosen are below 4 km, which is about the 0°C level (Fig. 7b). At 90 min they are mostly near the extended hail core region, possibly near a strong updraft (Fig. 8b). On the contrary, many of the *Z*_{DR} observations are taken from the region where dry hail dominates over snow (Figs. 7d and 8d). From these results, the spatial distribution of the observations used for parameter estimation appears to also affect the estimation, and this may depend on the data selection method used. The most effective data selection method deserves further study.
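The role of a predefined minimum SD can be pictured with a short sketch. It rescales the parameter perturbations about the ensemble mean whenever the ensemble standard deviation drops below a prescribed floor. The function name and the exact rescaling rule are assumptions for illustration; the inflation procedure actually used follows TX08b and may differ in detail.

```python
import numpy as np


def enforce_min_spread(param_ens, min_sd):
    """Rescale perturbations so the ensemble SD is at least min_sd.

    The ensemble mean is left unchanged; only the deviations from the
    mean are stretched. This is a hypothetical helper, not the authors' code.
    """
    p = np.asarray(param_ens, dtype=float)
    mean = p.mean()
    sd = p.std(ddof=1)
    # Nothing to do if the spread is already large enough; a fully
    # collapsed ensemble (sd == 0) cannot be rescaled and is returned as is.
    if sd >= min_sd or sd == 0.0:
        return p.copy()
    return mean + (p - mean) * (min_sd / sd)
```

Because the floor restores spread regardless of how much information the observations carried, a parameter whose spread collapses within two cycles is re-inflated many times, which is one way the oscillations and overadjustments noted above can arise.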

The mean estimated parameters in logarithmic form from the single-parameter estimation experiments are presented in Table 3, together with the true values given in parentheses. The mean values are computed from the 15 experiments with three different initial guesses for each parameter (see Table 1) and are averaged over the last five cycles. All five parameter estimates are more accurate when both *Z _{H}* and *K*_{DP} are used in the parameter estimation than when only *Z _{H}* is used. In the case of *n*_{0H}, the best estimate is obtained using *Z*_{DR} data alone. The mean parameter values in logarithmic form, averaged over five runs, are 51.2, 46.7, and 48.0 for N0h43_Zdr, N0h45_Zdr, and N0h46_Zdr, respectively; they are 57.5, 55.8, and 53.8 for N0h43_Zh, N0h45_Zh, and N0h46_Zh, respectively; while the truth is 46. The *n*_{0H} averaged over runs with different initial guesses is 56.0 for N0h_Zh and 49.1 for N0h_Zdr (Table 3). After being converted to the linear domain, these values differ by a factor of about 5; that is, 56.0 is about 5 times larger than 49.1 in terms of their linear values. We point out that N0h_Zdr produces a more stable estimate of *n*_{0H} than N0h_ZhKdp because in the former the estimated parameter shows smaller spread among the experiments with different realizations (not shown) and has almost no oscillation during the assimilation cycles (see Fig. 2) even though the averaged values in Table 3 appear to be similar. As a result, the state estimation of N0h_Zdr exhibits significant improvement over that of N0h_ZhKdp.
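The conversion between the logarithmic and linear forms can be checked with a few lines of Python. The sketch assumes that the logarithmic form is 10 log10 of the linear value, an assumption that is consistent with a truth of 46 corresponding to an intercept of roughly 4 × 10^{4} m^{−4}; the function names are hypothetical.

```python
import math


def to_log_form(value_linear):
    """Linear value -> logarithmic form, assuming 10*log10."""
    return 10.0 * math.log10(value_linear)


def to_linear(value_log):
    """Logarithmic form -> linear value, inverse of to_log_form."""
    return 10.0 ** (value_log / 10.0)


def linear_ratio(log_a, log_b):
    """Ratio of two linear values given their logarithmic forms."""
    return 10.0 ** ((log_a - log_b) / 10.0)
```

Under this assumption, `linear_ratio(56.0, 49.1)` evaluates 10^{0.69}, so a difference of 6.9 in logarithmic form corresponds to a ratio of a little under 5 in linear space.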

Here, the quantity with subscript *c* is the RMSE of the corresponding reference experiment without polarimetric data and *N* is the number of experiments averaged over. This improvement is further averaged over the last five assimilation cycles.
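The defining equation is not reproduced in this section. Purely as an illustration, and assuming that the improvement is the relative RMSE reduction with respect to the reference experiment, expressed in percent and averaged over the experiments, the computation could be sketched as follows. Both the formula and the names are assumptions and should not be read as the authors' definition.

```python
import numpy as np


def percent_improvement(rmse_ref, rmse_exp):
    """Mean relative RMSE reduction in percent (assumed form).

    rmse_ref : RMSEs of the reference experiments without polarimetric data.
    rmse_exp : RMSEs of the matching experiments with polarimetric data.
    Both arrays must have the same shape, for example (n_experiments,) or
    (n_cycles, n_experiments); the mean is taken over all entries.
    """
    ref = np.asarray(rmse_ref, dtype=float)
    exp = np.asarray(rmse_exp, dtype=float)
    if ref.shape != exp.shape:
        raise ValueError("reference and experiment RMSEs must match in shape")
    return float(np.mean((ref - exp) / ref) * 100.0)
```

With this form a positive value means the polarimetric experiment has the smaller error, and a negative value means the additional data degraded the analysis.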

The percent improvements of single-parameter estimation experiments N0r_ZhKdp, N0s_ZhKdp, N0h_Zdr, Rhos_ZhKdp, and Rhoh_ZhKdp over their respective reference experiments N0r_Zh, N0s_Zh, N0h_Zh, Rhos_Zh, and Rhoh_Zh are summarized in Table 4. The improvements are rather small in the estimations of *n*_{0R}, *n*_{0S}, *ρ _{S}*, and *ρ _{H}* because the estimations performed using *Z _{H}* alone are already very good. As a result, a 21% improvement found in *q _{s}* of N0r_ZhKdp may be insignificant as shown in Fig. 3e, where the RMSEs of all of the experiments are relatively low. The improvements found in *q _{i}* (not shown) and *q _{s}* of N0s_ZhKdp and all state variables of N0h_Zdr seem to be more significant. It is interesting that larger improvements are found in physically related variables. For example, relatively large improvements are found in *q _{i}* and *q _{s}* of N0h_ZhKdp while *q _{i}* is connected to *q _{s}* by autoconversion, accretion, and growth of Bergeron processes. A similar explanation can be applied to the improvement found in the state variables of N0h_Zdr. All microphysical variables (*q _{c}*, *q _{r}*, *q _{i}*, *q _{s}*, and *q _{h}*) attain significant improvement by improving the *n*_{0H} estimate. These variables are closely linked through complex microphysical processes where large hail values are found. Therefore, improving one variable can lead to the better estimation of physically closely linked variables.

The best results for certain parameters or state variables are obtained with somewhat different combinations of polarimetric measurements. Based on our results, the combined use of *Z _{H}* and *K*_{DP} appears to be a good choice when estimating one of *n*_{0R}, *n*_{0S}, *ρ _{S}*, or *ρ _{H}*, while the use of *Z*_{DR} alone is recommended for the estimation of *n*_{0H}.

In our “control experiments,” the same incorrect parameter values are used in all ensemble members but are not corrected. Our recent studies have shown that introducing microphysical parameter perturbations helps when microphysics errors exist. Physics diversity has also been found to be beneficial for mesoscale EnKF analyses (Meng and Zhang 2007, 2008; Zhang and Snyder 2007). To see if introducing parameter perturbations improves the state estimation, even without parameter estimation, we repeated the control experiments with the three wrong values of *n*_{0H} (=4 × 10^{3}, 4 × 10^{5}, and 4 × 10^{6} m^{−4}; see Table 1) with the perturbations to *n*_{0H} added to these values among the ensemble members.

Contrary to our expectations, the state estimation actually deteriorates in these experiments with the parameter perturbations. We suspect that this is because the parameter errors are rather large, and adding perturbations further increases the errors in some members, making the overall model error too large for the filter to give a good state estimation. Nonlinearity associated with such large errors may also contribute to the suboptimality of the filter.

To test the above hypothesis, we looked at the case of smaller parameter error, where *n*_{0H} = 2 × 10^{4} m^{−4}. In this case, perturbing *n*_{0H} actually improves the state estimation significantly (thick dashed black in Figs. 10a–d) compared with those of the no-perturbation experiment (thick solid gray). The ensemble spread of *q _{s}* and *q _{h}* keeps growing quickly during the forecast (thin dashed black) in the perturbed-parameter experiment while the growth rate of the spread in the no-perturbation experiment decreases with time (thin dashed gray). The larger spread in the former better reflects the presence of the parameter error in the system, leading to about a 50% average improvement in the state estimation. For the small error case, parameter estimation using *Z*_{DR} alone (thick solid black) is found to give an additional 7.4% of improvement averaged over all variables over the perturbed-parameter experiment, with the largest improvement being 27.8% in *q _{h}*. This additional improvement may seem rather small. However, the biggest benefit is expected during forecasts when the estimated *n*_{0H} is used.

### b. Results of five-parameter estimation

In this section, we examine the filter performance when five parameters are estimated simultaneously. Again, errors and estimated PSD parameters are averaged over 160 experiments, as described in section 2e (Table 1).

Figure 11 shows the NAEs of the ensemble mean estimated parameters from the five-parameter estimation experiments. Five-parameter estimation experiments reveal difficulties in estimating all five parameters simultaneously in the presence of observation operator error. The initial error level is overlaid for easier comparison (dashed gray). When *Z _{H}* is used alone (thick solid gray), the NAEs of *n*_{0R}, *n*_{0S}, and *ρ _{S}* experience rapid error growth in the first one to two cycles (Figs. 11a, 11b, and 11d, respectively). These NAEs decrease significantly in the next several cycles but increase again in later cycles. The errors of *n*_{0S} and *ρ _{S}* remain above the initial error level during all assimilation cycles except for a temporary drop at 85 min for *ρ _{S}*. This result is quite different from that of TX08b, which used perfect observation operators. In their study, *Z _{H}* alone was able to reduce the errors in all five parameters below their initial errors most of the time.

A positive impact of the polarimetric data is observed in the estimation of *n*_{0R}, *n*_{0H}, and *ρ _{S}* during the later assimilation cycles, no matter which additional polarimetric parameter is used (Figs. 11a, 11c, and 11d). When either *Z*_{DR} (dotted black) or *K*_{DP} (dashed black) is used or when both *Z*_{DR} and *K*_{DP} are used (solid black) in addition to *Z _{H}* in the estimation of *n*_{0H} and *ρ _{S}*, the error grows much more slowly after 80 min; the error, however, grows rapidly when *Z _{H}* is used alone. The most significant positive impact of the polarimetric data is found with the estimation of *n*_{0H}, whose error level is significantly lower in all cases that use polarimetric data (Fig. 11c).

As in the single-parameter estimation experiments, *K*_{DP} is slightly more beneficial than *Z*_{DR} in general but *Z*_{DR} produces a better estimation of *n*_{0H} than does *K*_{DP}. Smaller errors in the estimated parameters during the assimilation cycles help improve the state estimation while smaller errors at the end should improve the subsequent forecast.

For the estimated state, the best results are obtained when both *Z*_{DR} and *K*_{DP} are used for parameter estimation (solid lines in Fig. 12). The RMSEs of experiments para5_ZhZdr and para5_ZhKdp (not shown) are slightly larger than those of para5_ZhZdrKdp but smaller than those of para5_Zh (dashed lines), with the exception of *q _{s}* because of the poor performance of para5_ZhZdrKdp in the estimation of *n*_{0S}. A tendency of the error to increase is found in most state variables in para5_Zh during the later assimilation cycles in response to the error increases in *n*_{0R}, *n*_{0S}, *n*_{0H}, and *ρ _{H}*; this error increase is much weaker and the errors stay lower in para5_ZhZdrKdp in all state variables except for *q _{s}*.

Even though the observation operator error adds an extra layer of complication to the parameter estimation, the positive impacts of parameter estimation on state estimation are clear, even with the failures in estimating *n*_{0S} and *ρ _{S}*. This is seen by comparing the state variable errors with those of the no-estimation experiments (thick solid gray in Fig. 12) where the initial “incorrect” parameter values are kept throughout the assimilation cycles. In the latter experiments, the state variable errors increase significantly after 65 min of model time, presumably because the parameter errors now dominate.

The improvement amounts of para5_ZhZdr, para5_ZhKdp, and para5_ZhZdrKdp over para5_Zh computed from Eq. (3) are summarized in Table 5. We can see in Table 5 that the improvements are larger in *w*, *q _{r}*, *q _{v}*, and *q _{h}* and smaller (actually negative) in *q _{s}*. This is in general consistent with the finding of JXZS08. The improvement due to polarimetric data is greatest (between 28% and 35%) in *q _{h}* here, while it was greatest in *q _{r}* in JXZS08. No negative impacts were found in any of the state variables in JXZS08. The poor performance in estimating *q _{s}* is understandable, since polarimetric signatures related to the low-density dry snow are generally very weak.

The spatial distribution of the observations used in one of the five-parameter estimation experiments is shown in Fig. 13 as an example. As in the single-parameter estimation experiments, the *Z _{H}* observations used to estimate *n*_{0H} in para5_Zh are concentrated into two general areas in the precipitation region (black dots in Figs. 13a, 13c, and 13e) while the *Z _{H}* (black dots), *Z*_{DR} (triangles), and *K*_{DP} (squares) data in para5_ZhZdrKdp (Figs. 13b, 13d, and 13f) are selected from a broader region. Interestingly, the *Z _{H}* data are mostly selected from the lower levels, *Z*_{DR} mostly from the upper levels, and *K*_{DP} mostly from the middle levels. For example, the correlation coefficients between *n*_{0H} and *Z _{H}* averaged over 30 observations used in para5_Zh range between 0.6 and 0.7, while those for *Z _{H}*, *Z*_{DR}, and *K*_{DP} averaged over 15 observations are between 0.61 and 0.68, 0.75 and 0.81, and 0.63 and 0.72, respectively, for para5_ZhZdrKdp at 90 min. The ensemble spreads of the observations used in the parameter estimation are generally smaller than the assumed observation error. Still, parameter estimation seems to work, partly because only observations showing high correlations are used.

### c. Results of three-parameter estimations

Figure 11 shows that the errors in the estimated *n*_{0S} and *ρ _{S}* are almost always larger than their initial errors. This suggests that it may be better not to estimate *n*_{0S} and *ρ _{S}*, but to keep their initial values. To test this hypothesis, we perform 10 additional experiments starting from incorrect values in all five parameters but estimating only three of them: *n*_{0R}, *n*_{0H}, and *ρ _{H}*. Two sets of initial guesses are used; they are (*n*_{0R}, *n*_{0S}, *n*_{0H}, *ρ _{S}*, and *ρ _{H}*) = (3 × 10^{6} m^{−4}, 7 × 10^{5} m^{−4}, 4 × 10^{5} m^{−4}, 50 kg m^{−3}, and 400 kg m^{−3}) and (3 × 10^{6} m^{−4}, 3 × 10^{7} m^{−4}, 4 × 10^{5} m^{−4}, 300 kg m^{−3}, and 400 kg m^{−3}). The estimated mean parameter values and spreads computed from 10 such experiments are shown in Fig. 14. In experiments para3_ZhZdr (dotted black), para3_ZhKdp (solid black), and para3_ZhZdrKdp (dashed black), with the help of polarimetric variables, the mean *n*_{0H} and *ρ _{H}* converge nicely to their truth values and exhibit a clear tendency toward rapidly decreasing in spread during the middle to later cycles. Meanwhile, the parameters in para3_Zh (thick solid gray) show large oscillations and stay away from the truth, and the spreads remain high. The *n*_{0R} estimation is most successful with additional *K*_{DP} data. The mean estimated parameter values averaged over the 10 experiments and over the last five cycles are more accurate than those of para5_Zh, when polarimetric variables are used, except for *n*_{0R} in para3_ZhZdr and para3_ZhZdrKdp (Table 6). Compared to experiment para5_Zh, the largest improvement by not estimating *n*_{0S} and *ρ _{S}* is achieved in *n*_{0H}. The positive impacts of the polarimetric data are also greatest in the *n*_{0H} estimation. For example, the estimated *n*_{0H} in para5_Zh contains about 2200% error in linear space while the estimate in para3_ZhKdp contains only about 17% error; for reference, the average initial error is about 5000% of the assumed truth in linear space.

The state estimation is also improved when the parameter estimation is improved by not estimating the snow-related parameters (Fig. 15). The RMSEs of para3_Zh (black dashed) are generally smaller than those of para5_Zh (thick solid gray), except for *q _{i}*, and the RMSE differences increase with time. The percentage improvement over para5_Zh in para3_Zh averaged over 11 model state variables is 23.4%, with a largest improvement of 42% found in *q _{h}*; *w*, *q _{r}*, and *q _{s}* each experience about a 30% improvement.

The RMSEs are further reduced significantly by polarimetric data in the parameter estimation (Fig. 15). The *q _{s}* estimation is no longer hampered by the additional *K*_{DP} data (solid black) but rather experiences a large RMSE reduction compared to Fig. 12e. When *Z _{H}* is used alone (black dashed), after a large reduction during the first 20 min of the assimilation cycles (not shown in the plots), the RMSEs start increasing between 40 and 70 min mostly because of the poor estimation of *n*_{0R} during the early cycles and the poor estimation of *n*_{0H} between 45 and 60 min (Figs. 14a and 14c). Because the accuracy of the estimated state as well as the estimated parameters depends on the history of the estimation, large errors in the early assimilation cycles, regardless of their source, impact the state and parameter estimation process. On the contrary, continuous error reductions throughout the assimilation cycles are seen in all state variables in para3_ZhKdp, except for *q _{i}* (Fig. 15).

In the early cycles between 40 and 45 min, experiment para3_Zh produces a comparable estimate of *n*_{0R} but a better estimate of *n*_{0H} than does para3_ZhKdp (Figs. 14a and 14c). However, the state estimation of para3_Zh is generally poorer than that of para3_ZhKdp. This seemingly contradictory result can be explained by the compensating model responses described in TX08b. The increase in *n*_{0R} compensates for the decrease in *n*_{0H} in terms of reflectivity. When the problem is insufficiently constrained by the data, multiple solutions can exist. The use of microphysical information contained in additional polarimetric data on hydrometeor types and PSDs appears to help alleviate the solution nonuniqueness problem.

The gross improvement produced by the polarimetric data in the three-parameter estimation experiment with five incorrect parameter values can be assessed more easily by reviewing Table 7. Statistically, the overall errors in the analysis are approximately cut in half. All state variables exhibit fairly large improvements ranging from 29.9% to 66.4%. The best analysis is obtained by using *K*_{DP} data in addition to *Z _{H}*, which is consistent with the parameter estimation results shown in Fig. 14. This appears reasonable because the *K*_{DP} data seem to provide different information content than *Z _{H}* since they are selected mostly from discrete regions of the storm while much of the *Z*_{DR} data seem to overlap *Z _{H}* in location (Fig. 13). The combinations of hydrometeor types and dominant species vary with location within the storm. Observations selected from a specific part of a storm can be more effective in correcting the errors associated with the dominant hydrometeor types at that location. Observations taken at the same location tend to be less effective in reducing the ambiguity in the parameters although different types of observations with small correlations (hence different information content) can still be very helpful. Another interesting point is that when not estimating snow-related parameters, *q _{s}* experiences the second largest improvement in para3_ZhZdr and para3_ZhZdrKdp and the third largest improvement in para3_ZhKdp. The exact reason for this behavior is difficult to ascertain. It could be due to still-present nonuniqueness problems and/or strong nonlinearity of the system. Because the estimation of the snow-related parameter further increases its error, not estimating it at all yields better overall results.

Finally, in the three- and five-parameter estimation experiments, when the polarimetric data are used alone, individually or together, without *Z _{H}*, the estimated states are generally not as good as those using *Z _{H}* alone. These results are not presented here.

## 5. Summary and conclusions

We have investigated the impacts of additional polarimetric data on correcting errors in PSD-related fundamental parameters in a model microphysics scheme through observing system simulation experiments. Such errors also affect the observation operators of all radar observations except radial velocity (in our case at least where reflectivity weighting for radial velocity is ignored). These parameters, namely, the intercept parameters of rain (*n*_{0R}), snow (*n*_{0S}), and hail (*n*_{0H}), and the bulk densities of snow (*ρ _{S}*) and hail (*ρ _{H}*), are estimated, individually or all together, simultaneously with the model state using a sequential ensemble square root Kalman filter. The polarimetric data considered include the differential reflectivity *Z*_{DR} and specific differential phase *K*_{DP}. To obtain more robust results, single-, five-, and three-parameter estimations are repeated with different initial guesses and different initial ensemble perturbations for each parameter, and the mean and standard deviation statistics are computed and compared. Compared to the earlier parameter estimation work of TX08b, this study includes the effects of observation operator error and examines the impacts of additional polarimetric data. In JXZS08 the impacts of simulated polarimetric data are examined in the absence of any parameter error. To the authors’ knowledge, no previous parameter estimation study has addressed the issue of parameter error within the observation operators.

Generally, the reflectivity, *Z _{H}*, observations alone can effectively reduce the error in *n*_{0R}, *n*_{0S}, *ρ _{S}*, and *ρ _{H}* when only one parameter contains error, even in the presence of observation operator error; they, however, perform poorly when estimating *n*_{0H}. The *K*_{DP}, in addition to *Z _{H}*, is found to help further reduce the errors in the intercept parameters and improve the state estimation through improved parameter estimation. Adding *K*_{DP} has almost no impact on the estimation of snow and hail densities and their related state variables, because the estimation with reflectivity alone is already very successful. The best estimation of *n*_{0H} is obtained when *Z*_{DR} is used alone (for parameter estimation) while its estimation using *K*_{DP} and *Z _{H}* is also better than that using *Z _{H}* alone.

Our results reveal some difficulties in simultaneously estimating all five parameters that contain error. Unlike TX08b, which assumed perfect observation operators, our five-parameter estimation experiments show that the errors in *n*_{0S} and *ρ _{S}* are increased during the assimilation cycles by the parameter estimation to above their initial levels with or without using polarimetric data (for parameter estimation). However, the positive impacts of polarimetric data on the state estimation are clear when *Z*_{DR} or *K*_{DP}, or both *Z*_{DR} and *K*_{DP}, are used along with *Z _{H}* in the parameter estimation. When all five parameters contain initial errors, both the parameter and state estimations are improved when *n*_{0R}, *n*_{0H}, and *ρ _{H}* are estimated without *n*_{0S} and *ρ _{S}*. Moreover, the positive impacts of polarimetric data are further increased compared to the case when all five parameters are estimated. This behavior can be understood from the fact that the polarimetric signature of snow is very weak and the sensitivity of the polarimetric measurement to the corresponding parameters is also small.

Since it is suggested by previous studies (Aksoy et al. 2006b, TX08b) that a larger ensemble size leads to better parameter estimation, we performed additional five-parameter estimation experiments with a doubled ensemble size of 160. When compared to their 80-member counterparts, the estimated states are improved in general except for experiment para5_ZhKdp, which shows comparable results in a statistical sense. Some of the parameter estimations, however, experience deterioration in some experiments, while larger improvements in other parameters seem to more than compensate for the negative effects of these parameters on the state estimation.

We point out that the accuracies of the state and PSD parameters estimated through the EnKF system may differ when different polarimetric measurements are used. Certain combinations of polarimetric measurements may yield a better-estimated state but with less accurate parameter values than other combinations. This variability also exists among the state variables and estimated parameters. A better understanding of the combined impacts can help optimize the assimilation (or estimation) system although in practice nonlinear interactions in the model, which are abundant in the complex microphysical processes, can make it difficult to delineate the effects of one source of input data or parameter value on another. While the sensitivity studies performed here and in TX08a,b are helpful, more effective approaches may be needed to further improve our understanding.

In this study, simulated polarimetric variables are used in parameter estimation but not in state estimation. It is shown in JXZS08 that the impacts of polarimetric data are rather small when the state estimation is already very good with conventional radar data. In such a case, updating the state using polarimetric data only increases the computational cost. The use of additional polarimetric data for parameter estimation seems to be most beneficial and it also provides an indirect positive impact on the state estimation.

It is suggested that limitations of the current data selection method may partially be responsible for the poorer performance of the reflectivity-alone experiments. Reflectivity observations tend to be selected from a few clustered locations while polarimetric variables are scattered in a wider area so that the microphysical information can be provided by different combinations and/or compositions of hydrometeors. Imposing a minimum distance between observations or a data-thinning process may help alleviate the data-clustering problem.
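One possible form of such a minimum-distance constraint is sketched below. Candidate observations are ranked by the magnitude of their correlation with the parameter, and a candidate is accepted only if it lies at least a given distance from every observation already chosen. The greedy strategy, the function name, and the Euclidean distance measure are assumptions made for illustration and have not been tested in this study.

```python
import numpy as np


def select_thinned(positions, correlations, n_select, min_dist):
    """Greedy selection of well-separated, highly correlated observations.

    positions    : shape (n_obs, n_dims), observation coordinates (e.g., km).
    correlations : shape (n_obs,), correlation of each observation prior
                   with the parameter being estimated.
    n_select     : maximum number of observations to keep.
    min_dist     : minimum allowed separation between kept observations.
    Returns the indices of the kept observations, best first.
    """
    pos = np.asarray(positions, dtype=float)
    score = np.abs(np.asarray(correlations, dtype=float))
    chosen = []
    # Visit candidates from the highest to the lowest |correlation|.
    for idx in np.argsort(-score):
        if len(chosen) == n_select:
            break
        far_enough = all(
            np.linalg.norm(pos[idx] - pos[j]) >= min_dist for j in chosen
        )
        if far_enough:
            chosen.append(int(idx))
    return chosen
```

A larger `min_dist` spreads the selected data over more of the storm at the cost of admitting observations with weaker correlations, so the two settings would need to be tuned together.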

In this work, covariance inflation is not applied to the state, following TX08a, who reported that the difference in the analysis RMS errors induced by covariance inflation is smaller than that caused by different realizations of the initial ensemble. We performed extra experiments where the state covariance inflation was included and saw only a small impact. Our use of larger, 80-member ensembles also helps reduce the need for covariance inflation.

Although more realistic OSSE scenarios that include both forecast model and observation operator errors are tested in this study, the performance of the estimation system in real-data scenarios remains a question requiring further research. While the polarimetric data are believed to contain much useful information about the microphysics, the use of a single-moment microphysics scheme based on an assumed exponential PSD may limit the ability of polarimetric data to help estimate the intercept parameters. If a two-moment microphysics scheme is used, in which both the mixing ratios and the total number concentrations are predicted, the intercept parameter no longer has to be specified. In this case, the goal would become the estimation of both the mixing ratios and the total number concentrations, which are now state variables. The increased number of state variables needing estimation may demand more observational information, and the polarimetric observations may become a more valuable addition to the radial velocity and reflectivity observations of nonpolarimetric Doppler radars. The impacts of polarimetric data on full microphysical state estimation when a two-moment microphysics scheme is used will be examined in the future.
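The statement that the intercept parameter need not be specified in a two-moment scheme can be made explicit with a short derivation. It is a sketch under common assumptions that are not taken from this paper's model formulation: an exponential PSD $n_x(D) = n_{0x}\exp(-\Lambda_x D)$ for species $x$, spherical particles of constant bulk density $\rho_x$, air density $\rho_a$, mixing ratio $q_x$, and total number concentration $N_{tx}$. The zeroth and third moments of the PSD give

$$
N_{tx} = \int_0^\infty n_{0x} e^{-\Lambda_x D}\,dD = \frac{n_{0x}}{\Lambda_x},
\qquad
\rho_a q_x = \int_0^\infty \frac{\pi}{6}\rho_x D^3\, n_{0x} e^{-\Lambda_x D}\,dD = \frac{\pi \rho_x n_{0x}}{\Lambda_x^{4}} .
$$

Eliminating $n_{0x}$ between the two relations yields

$$
\Lambda_x = \left(\frac{\pi \rho_x N_{tx}}{\rho_a q_x}\right)^{1/3},
\qquad
n_{0x} = N_{tx}\,\Lambda_x ,
$$

so that, under these assumptions, once $q_x$ and $N_{tx}$ are both predicted the intercept parameter is diagnosed rather than prescribed.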

The authors thank Dr. Mingjing Tong for much help on the initial use of the ARPS ensemble Kalman filter code and Daniel Dawson for proofreading the original manuscript. This work was primarily supported by NSF Grants EEC-0313747 and ATM-0608168. The second author was also supported by NSF Grants ATM-0802888, ATM-0530814, and ATM-0331594. The computations were performed at the OU Supercomputing Center for Education and Research.

## REFERENCES

Aksoy, A., F. Zhang, and J. W. Nielsen-Gammon, 2006a: Ensemble-based simultaneous state and parameter estimation with MM5. *Geophys. Res. Lett.*, **33**, L12801, doi:10.1029/2006GL026186.

Aksoy, A., F. Zhang, and J. W. Nielsen-Gammon, 2006b: Ensemble-based simultaneous state and parameter estimation in a two-dimensional sea breeze model. *Mon. Wea. Rev.*, **134**, 2951–2970.

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903.

Annan, J. D., and J. C. Hargreaves, 2004: Efficient parameter estimation for a highly chaotic system. *Tellus*, **56A**, 520–526.

Annan, J. D., J. C. Hargreaves, N. R. Edwards, and R. Marsh, 2005a: Parameter estimation in an intermediate complexity earth system model using an ensemble Kalman filter. *Ocean Modell.*, **8**, 135–154.

Annan, J. D., D. J. Lunt, J. C. Hargreaves, and P. J. Valdes, 2005b: Parameter estimation in an atmospheric GCM using the ensemble Kalman filter. *Nonlinear Processes Geophys.*, **12**, 363–371.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. *Mon. Wea. Rev.*, **129**, 420–436.

Brandes, E. A., K. Ikeda, G. Zhang, M. Schonhuber, and R. M. Rasmussen, 2007: A statistical and physical description of hydrometeor distributions in a Colorado snowstorm using a video disdrometer. *J. Appl. Meteor. Climatol.*, **46**, 634–650.

Burgers, G., P. J. Van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 1719–1724.

Courtier, P., and O. Talagrand, 1987: Variational assimilation of meteorological observations with the adjoint equation. Part II: Numerical results. *Quart. J. Roy. Meteor. Soc.*, **113**, 1329–1347.

Dee, D. P., 1995: On-line estimation of error covariance parameters for atmospheric data assimilation. *Mon. Wea. Rev.*, **123**, 1128–1145.

Dee, D. P., and A. M. da Silva, 1999: Maximum-likelihood estimation of forecast and observation error covariance parameters. Part I: Methodology. *Mon. Wea. Rev.*, **127**, 1822–1834.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99** (C5), 10143–10162.

Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. *Ocean Dyn.*, **53**, 343–367.

Evensen, G., and P. J. van Leeuwen, 1996: Assimilation of Geosat altimeter data for the Agulhas Current using the ensemble Kalman filter with a quasigeostrophic model. *Mon. Wea. Rev.*, **124**, 85–96.

Gao, J., and M. Xue, 2008: An efficient dual-resolution approach for ensemble data assimilation and tests with simulated Doppler radar data. *Mon. Wea. Rev.*, **136**, 945–963.

Gilmore, M. S., J. M. Straka, and E. N. Rasmussen, 2004: Precipitation uncertainty due to variations in precipitation particle parameters within a simple microphysics scheme. *Mon. Wea. Rev.*, **132**, 2610–2627.

Gunn, K. L. S., and J. S. Marshall, 1958: The distribution with size of aggregate snowflakes. *J. Meteor.*, **15**, 452–461.

Hacker, J. P., and C. Snyder, 2005: Ensemble Kalman filter assimilation of fixed screen-height observations in a parameterized PBL. *Mon. Wea. Rev.*, **133**, 3260–3275.

Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. *Mon. Wea. Rev.*, **133**, 3132–3147.

Hao, Z., and M. Ghil, 1995: Sequential parameter estimation for a coupled ocean–atmosphere model. *Proc. Second Int. Symp. on Assimilation of Observations in Meteorology and Oceanography*, Tokyo, Japan, WMO, 181–186.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. *Mon. Wea. Rev.*, **133**, 604–620.

Houze, R. A., Jr., P. V. Hobbs, P. H. Herzegh, and D. B. Parsons, 1979: Size distributions of precipitation particles in frontal clouds. *J. Atmos. Sci.*, **36**, 156–162.

Joss, J., and A. Waldvogel, 1969: Raindrop size distribution and sampling size errors. *J. Atmos. Sci.*, **26**, 566–569.

Jung, Y., G. Zhang, and M. Xue, 2008a: Assimilation of simulated polarimetric radar data for a convective storm using the ensemble Kalman filter. Part I: Observation operators for reflectivity and polarimetric variables. *Mon. Wea. Rev.*, **136**, 2228–2245.

Jung, Y., M. Xue, G. Zhang, and J. Straka, 2008b: Assimilation of simulated polarimetric radar data for a convective storm using the ensemble Kalman filter. Part II: Impact of polarimetric data on storm analysis. *Mon. Wea. Rev.*, **136**, 2246–2260.

Lawson, W. G., and J. A. Hansen, 2005: Alignment error models and ensemble-based data assimilation. *Mon. Wea. Rev.*, **133**, 1687–1709.

Le Dimet, F. X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. *Tellus*, **38A**, 97–110.

Lin, Y-L., R. D. Farley, and H. D. Orville, 1983: Bulk parameterization of the snow field in a cloud model. *J. Climate Appl. Meteor.*, **22**, 1065–1092.

Liu, H., M. Xue, R. J. Purser, and D. F. Parrish, 2007: Retrieval of moisture from simulated GPS slant-path water vapor observations using 3DVAR with anisotropic recursive filters. *Mon. Wea. Rev.*, **135**, 1506–1521.

Meng, Z., and F. Zhang, 2007: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part I: Imperfect model experiments. *Mon. Wea. Rev.*, **135**, 1403–1423.

Meng, Z., and F. Zhang, 2008: Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part II: Comparison with 3DVAR in a real-data case study. *Mon. Wea. Rev.*, **136**, 522–540.

Mitchell, D. L., 1988: Evolution of snow-size spectra in cyclonic storms. Part I: Snow growth by vapor deposition and aggregation. *J. Atmos. Sci.*, **45**, 3431–3451.

Pruppacher, H. R., and J. D. Klett, 1978: *Microphysics of Clouds and Precipitation*. D. Reidel, 714 pp.

Ray, P. S., B. Johnson, K. W. Johnson, J. S. Bradberry, J. J. Stephens, K. K. Wagner, R. B. Wilhelmson, and J. B. Klemp, 1981: The morphology of severe tornadic storms on 20 May 1977. *J. Atmos. Sci.*, **38**, 1643–1663.

Snyder, C., and F. Zhang, 2003: Assimilation of simulated Doppler radar observations with an ensemble Kalman filter. *Mon. Wea. Rev.*, **131**, 1663–1677.

Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. *Mon. Wea. Rev.*, **131**, 1485–1490.

Tong, M., and M. Xue, 2005: Ensemble Kalman filter assimilation of Doppler radar data with a compressible nonhydrostatic model: OSS experiments. *Mon. Wea. Rev.*, **133**, 1789–1807.

Tong, M., and M. Xue, 2008a: Simultaneous estimation of microphysical parameters and atmospheric state with simulated radar data and ensemble square root Kalman filter. Part I: Sensitivity analysis and parameter identifiability. *Mon. Wea. Rev.*, **136**, 1630–1648.

Tong, M., and M. Xue, 2008b: Simultaneous estimation of microphysical parameters and atmospheric state with simulated radar data and ensemble square root Kalman filter. Part II: Parameter estimation experiments. *Mon. Wea. Rev.*, **136**, 1649–1668.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924.

Whitaker, J. S., G. P. Compo, X. Wei, and T. M. Hamill, 2004: Reanalysis without radiosondes using ensemble data assimilation. *Mon. Wea. Rev.*, **132**, 1190–1200.

Xue, M., K. K. Droegemeier, and V. Wong, 2000: The Advanced Regional Prediction System (ARPS) – A multiscale nonhydrostatic atmospheric simulation and prediction tool. Part I: Model dynamics and verification. *Meteor. Atmos. Phys.*, **75**, 161–193.

Xue, M., and Coauthors, 2001: The Advanced Regional Prediction System (ARPS) – A multiscale nonhydrostatic atmospheric simulation and prediction tool. Part II: Model physics and applications. *Meteor. Atmos. Phys.*, **76**, 143–165.

Xue, M., D-H. Wang, J-D. Gao, K. Brewster, and K. K. Droegemeier, 2003: The Advanced Regional Prediction System (ARPS), storm-scale numerical weather prediction, and data assimilation. *Meteor. Atmos. Phys.*, **82**, 139–170.

Xue, M., M. Tong, and K. K. Droegemeier, 2006: An OSSE framework based on the ensemble square root Kalman filter for evaluating impact of data from radar networks on thunderstorm analysis and forecast. *J. Atmos. Oceanic Technol.*, **23**, 46–66.

Xue, M., Y. Jung, and G. Zhang, 2007: Error modeling of simulated reflectivity observations for ensemble Kalman filter data assimilation of convective storms. *Geophys. Res. Lett.*, **34**, L10802, doi:10.1029/2007GL029945.

Yakowitz, S., and L. Duckstein, 1980: Instability in aquifer identification: Theory and case studies. *Water Resour. Res.*, **16**, 1045–1064.

Zhang, F., and C. Snyder, 2007: Ensemble-based data assimilation. *Bull. Amer. Meteor. Soc.*, **88**, 565–568.

Zhu, Y., and I. M. Navon, 1999: Impact of parameter estimation on the performance of the FSU global spectral model using its full-physics adjoint. *Mon. Wea. Rev.*, **127**, 1497–1517.

Zou, X., I. M. Navon, and F. X. Le Dimet, 1992: An optimal nudging data assimilation scheme using parameter estimation. *Quart. J. Roy. Meteor. Soc.*, **118**, 1163–1186.

Microphysical parameters and their uncertainty ranges used in the sensitivity experiments, and their initial guesses as used in the parameter estimation experiments.

The simulated polarimetric radar measurements using the control values and the lower and upper bounds of the PSD parameters, for samples taken at three levels near the convective core.

The mean estimated parameter values in logarithmic form for single-parameter estimation experiments, averaged over 15 experiments (three different initial guesses presented in Table 1, with five different perturbation realizations for each initial guess) and over the last five cycles (80–100 min of model time). The truth values in logarithmic form are given in parentheses.

The percentage improvement of state estimation for single-parameter estimation experiments (N0r_ZhKdp, N0s_ZhKdp, N0h_Zdr, Rhos_ZhKdp, and Rhoh_ZhKdp) over their respective reference experiments (N0r_Zh, N0s_Zh, N0h_Zh, Rhos_Zh, and Rhoh_Zh) without polarimetric data. The percentage improvements are ensemble means of those computed from 15 experiments as in Table 3 and over the last five cycles (80–100 min of model time).

The percentage improvement in state estimation for experiments para5_ZhZdr, para5_ZhKdp, and para5_ZhZdrKdp over experiment para5_Zh, averaged over 160 experiments with 32 different initial guesses with five parallel runs for each initial guess, and over the last five cycles (80–100 min of model time). The prefix “para5_” is omitted from the experiment names.

As in Table 3 but for five-parameter experiment para5_Zh and three-parameter estimation experiments para3_Zh, para3_ZhZdr, para3_ZhKdp, and para3_ZhZdrKdp, in which $n_{0S}$ and $\rho_S$ were kept at their incorrect initial values throughout the assimilation cycles while the other three parameters were estimated. The experiments start from two sets of parameter values, namely, ($n_{0R}$, $n_{0S}$, $n_{0H}$, $\rho_S$, and $\rho_H$) = ($3 \times 10^{6}$ m$^{-4}$, $7 \times 10^{5}$ m$^{-4}$, $4 \times 10^{5}$ m$^{-4}$, 50 kg m$^{-3}$, and 400 kg m$^{-3}$) and ($3 \times 10^{6}$ m$^{-4}$, $3 \times 10^{7}$ m$^{-4}$, $4 \times 10^{5}$ m$^{-4}$, 300 kg m$^{-3}$, and 400 kg m$^{-3}$). Their truth values in logarithmical form are given inside the parentheses.

The percentage improvement of the state estimation for three-parameter estimation experiments para3_ZhZdr, para3_ZhKdp, and para3_ZhZdrKdp over experiment para3_Zh. The prefix “para3_” is omitted from the experiment names.