## 1. Introduction

Weather radars have been important for rainfall-rate estimation because of their ability to cover large areas with reasonable spatial and temporal resolutions. A cornerstone of radar-based rainfall estimation is the use of efficient conventional power-law relations (PLR) between the radar reflectivity factor *Z* and rainfall rate *R* [hereafter *R*(*Z*_{h})], where *Z* is the radar reflectivity factor in horizontal polarization, written *Z*_{h} in linear units (mm^{6} m^{−3}) and *Z*_{H} in logarithmic units (dB*Z*). A common source of error for these methods is that PLR coefficients are often customized to a particular longer-term climatology or seasonal/regional precipitation regimes, and therefore are not universally applicable (e.g., Bringi et al. 2004; Cifelli et al. 2011; Fulton et al. 1998; Ryzhkov et al. 2005a; Wang and Chandrasekar 2010). Although space–time variability in the drop size distribution (DSD) contributes to such apparent diversity in power-law coefficients (Lee and Zawadzki 2005), assessing the performance of PLR estimators may be complicated further by factors that include radar attenuation in rain, beam geometrical considerations, and contamination from hail or melting layer media (e.g., Anagnostou et al. 2006; Giangrande and Ryzhkov 2008; Lee 2006).

Weather radars with dual-polarization capability provide additional insight into the precipitation medium and can help resolve some uncertainties from DSD variability and additional sources (Seliga and Bringi 1976). Dual-polarization radar moments, including the differential reflectivity (*Z*_{dr}) and specific differential phase (*K*_{dp}), allow unique insights into rain parameters, including the size, shape, and orientation of raindrops (Gorgucci et al. 2002). Algorithms that utilize polarimetric radar measurements show significant improvement over traditional *R*(*Z*_{h}) relations and lessened sensitivity to DSD variability and partial attenuation in rain (e.g., Bringi et al. 2004; Hogan 2007; Ryzhkov et al. 2005a; Vulpiani et al. 2009). In addition to PLR methods that estimate rainfall rate *R* directly from polarimetric radar measurements, recent studies have approached the issue of DSD variability by computing the corresponding *R* from DSD parameters retrieved from polarimetric radar measurements (e.g., Vulpiani et al. 2006; Cao et al. 2010) using neural network or Bayesian approaches.

Most of the polarimetric rainfall-rate estimators are still deterministic PLRs, where *Z*_{h}, *Z*_{dr}, and *K*_{dp} are used in different combinations or the most appropriate estimator is selected for a given set of polarimetric radar measurements (Ryzhkov et al. 2005a). Deterministic estimators usually fail to account for the fact that microphysics varies in space and time, even within the same precipitation event, leading to estimates that are less than optimal. To address this problem, Bringi et al. (2004) derived a new *R*(*Z*_{h}) that varies continuously in space and time. Hogan (2007) presented a spatially variational method where coefficient *a* in *R*(*Z*_{h}) is iteratively refined. Vulpiani et al. (2009) developed a nonlinear estimator based on a neural network, and Cao et al. (2010) proposed a Bayesian approach where only *Z*_{h} and *Z*_{dr} are used due to the assumption of a single Gaussian distribution of the joint distribution of *Z*_{h} and *Z*_{dr} given DSD parameters.

Recently, Li and Zhang (2011) introduced the Gaussian mixture parameter estimator (GMPE), a linear Bayesian estimator. Because the Gaussian mixture model (GMM) ensures convergence to the prior distribution of dual-polarization variables, the GMPE (a minimum variance unbiased estimator) was shown to outperform PLRs in both rainfall-rate estimation and attenuation correction using simulated polarimetric radar measurements. However, the performance of the GMPE has not been tested for real-world rainfall applications.

Rainfall-rate estimators can be developed either through measurements or simulations. Rainfall-rate estimators developed from measurements are usually optimized for specific radars and regions but are less suitable for others, because precipitation differs between regions, and radars may have different calibration errors and noise levels. On the other hand, rainfall-rate estimators developed from simulations are more general and less sensitive to measurement error, but they depend on different assumptions. Because of the natural variability of raindrop size, shape, and terminal velocity, any specific model or assumption might lead to errors in simulated polarimetric radar measurements. The single-cell Monte Carlo simulation proposed in Li et al. (2011) and adopted in this study addresses such variability by allowing variables, including raindrop shape, canting angle, and DSD, to have uncertainties, making it more suitable for GMPE development and enabling it to better embody the simulated dual-polarization variable distribution.

In this study, the GMPE approach is applied to polarimetric radar-based rainfall-rate estimation. To distinguish it from general GMPE, it is renamed the Gaussian mixture rainfall-rate estimator (GMRE). The flowchart of the GMRE approach is shown in Fig. 1. The GMRE approach was validated by using data collected during the Joint Polarization Experiment (JPOLE) from the well-gauged central Oklahoma region and S-band radar data from the KOUN radar (Doviak et al. 2002), the polarimetric prototype of the Weather Surveillance Radar-1988 Doppler (WSR-88D), over a multiyear period (Ryzhkov et al. 2005b). Performance of GMRE will be compared to other rainfall-rate estimators that were developed and tested on the JPOLE dataset.

This paper is organized as follows: Sections 2 and 3 introduce the general background and details of the Monte Carlo simulation as well as the GMRE algorithm. Section 4 gives a description of the JPOLE dataset and processing methods, followed by the results and comparison of GMRE and other rainfall-rate estimators in section 5. Section 6 presents the discussion and conclusions.

## 2. Monte Carlo simulation

### a. Introduction

Polarimetric variables and the associated rain parameters for a given radar resolution volume are influenced by several factors, including DSD, drop shape behavior, drop canting angles, and terminal velocity of a raindrop. Unlike simulations that employ fixed relations, Monte Carlo simulation allows variables to have uncertainties (or randomness) to avoid assumptions or loss of generality. In this way, Monte Carlo simulation outputs can capture the desirable variable distributions beyond an ensemble average value and include important statistical information.

The DSD is assumed to follow an exponential distribution,

$$N(D) = N_0 \exp(-\Lambda D), \tag{1}$$

with intercept parameter *N*_{0} (m^{−3} mm^{−1}) and slope parameter Λ (mm^{−1}). Slope is uniquely determined if *N*_{0} and water content *W* are known when given by

$$\Lambda = \left(\frac{\pi \rho_w N_0}{W}\right)^{1/4}, \tag{2}$$

where *ρ*_{w} is the density of water (10^{−3} g mm^{−3} when *W* is expressed in g m^{−3}). Because *N*_{0} and *W* have physical meaning, the dynamic range of both is well studied, and only a weak correlation is found between those parameters (Zhang et al. 2008). Even though the exponential distribution may not represent very small or large raindrops as well as the gamma distribution (three free parameters), selecting this distribution helps reduce the number of unrealistic parameter cases. Once a DSD is obtained, rainfall rate *R* (mm h^{−1}) can be computed from

$$R = 6\pi \times 10^{-4} \int_{D_{\min}}^{D_{\max}} D^{3}\, \upsilon(D)\, N(D)\, dD, \tag{3}$$

where *υ*(*D*) (m s^{−1}) is the terminal velocity relationship from Brandes et al. (2002) and *D* is in millimeters. *D*_{min} and *D*_{max} are set at 0.5 and 8 mm, respectively. Raindrops are modeled as oblate spheroids with the polynomial relation between axis ratio *r*_{a} (minor to major axis) and equivolume diameter *D*, as given in Brandes et al. (2002). To further generalize for different kinds of raindrop behavior, randomness between [−0.2(1 − *r*_{a}), 0.2(1 − *r*_{a})] is added to *r*_{a} (Li et al. 2011). Scattering amplitudes of a raindrop are calculated using the T-matrix method (Mishchenko 2000). From the copolar backscattering and forward-scattering amplitudes of the individual raindrops, reflectivity in horizontal (*Z*_{h}) and vertical polarization (*Z*_{υ}), differential reflectivity (*Z*_{dr}), and specific differential phase (*K*_{dp}) of the volume are obtained by summing (∑) the contributions of all raindrops in the volume and normalizing by the size of the volume *V*. To ensure that enough raindrops are in the volume and balance computational load, *V* is set at 1000 m^{3}.
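The DSD-to-rain-rate step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes an untruncated exponential DSD when converting *W* to Λ, a water density of 10^{−3} g mm^{−3}, and the fourth-order terminal velocity polynomial as commonly quoted from Brandes et al. (2002). The function names and the midpoint integration are hypothetical choices made for this example.

```python
import math

def brandes_terminal_velocity(d_mm):
    """Terminal velocity (m/s) for equivolume diameter d_mm (mm).
    Polynomial coefficients as commonly quoted from Brandes et al. (2002);
    treated here as an assumption."""
    return (-0.1021 + 4.932 * d_mm - 0.9551 * d_mm ** 2
            + 0.07934 * d_mm ** 3 - 0.002362 * d_mm ** 4)

def slope_from_water_content(n0, w):
    """Slope (mm^-1) of an untruncated exponential DSD.
    n0 in m^-3 mm^-1, w in g m^-3, water density 1e-3 g mm^-3 (assumed)."""
    rho_w = 1.0e-3
    return (math.pi * rho_w * n0 / w) ** 0.25

def rain_rate(n0, lam, d_min=0.5, d_max=8.0, steps=2000):
    """Rain rate (mm/h) from midpoint integration of D^3 v(D) N(D)."""
    dd = (d_max - d_min) / steps
    total = 0.0
    for k in range(steps):
        d = d_min + (k + 0.5) * dd
        n_d = n0 * math.exp(-lam * d)  # exponential DSD
        total += d ** 3 * brandes_terminal_velocity(d) * n_d * dd
    # 6*pi*1e-4 converts mm^3 m^-2 s^-1 of drop volume flux to mm/h
    return 6.0 * math.pi * 1.0e-4 * total

lam = slope_from_water_content(8000.0, 1.0)
print(lam, rain_rate(8000.0, lam))
```

Doubling *W* at fixed *N*_{0} lowers Λ and therefore raises *R*, which is the behavior the simulation relies on when it draws *N*_{0} and *W* independently.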

### b. Simulation procedures

As many observational studies have shown, the intercept parameter and water content vary for different rain regimes (e.g., Waldvogel 1974; Zhang et al. 2008). Conversely, different types of rain may be emulated by randomly generating *N*_{0} and *W*. The empirical range of *N*_{0} is from 10^{1.5} to 10^{6} m^{−3} mm^{−1}, while *W* can reach 10 g m^{−3} (Zhang et al. 2008). While the ranges of *N*_{0} and *W* are well studied, their distributions remain less certain. In some studies (e.g., Li et al. 2011; Vulpiani et al. 2006), a uniform distribution of DSD parameters is assumed, which leads to equal probability for different rain types. This assumption may not hold in general because lighter rainfall (*R* < 30−40 mm h^{−1}) is more frequent than heavier rainfall.

In this paper, prior distributions of *N*_{0} and *W* are designed to favor rainfall lower than 40 mm h^{−1} and marginalize the probability of extreme rain cases by setting *W* from a one-sided Gaussian distribution and *N*_{0} from a uniform distribution with a smaller upper bound. Table 1 gives details of the simulation. Outputs of the simulation include rain microphysics parameters *N*_{0}, Λ, *R*, and the corresponding dual-polarization variables *Z*_{h}, *Z*_{dr}, and *K*_{dp}. It is worth mentioning that 8000 cases have been generated to help provide statistical significance. As illustrated in Fig. 2, in the majority of the cases *R* is lower than 40 mm h^{−1} and the number of occurrences decreases significantly as *R* increases. Even though the prior distribution input into the Monte Carlo simulation emphasizes smaller rainfall, a broad range of rainfall is still covered because *R* reaches as high as 180 mm h^{−1}. Figures 3a,b show the scatterplots of *Z*_{H} and *Z*_{DR} as well as *Z*_{H} and *K*_{dp} from the MC simulation. According to the Next Generation Weather Radar (NEXRAD) *R*(*Z*_{h}) relationship [Eq. (18)], *R* = 20 mm h^{−1} corresponds to approximately 43 dB*Z* for observed reflectivity. Combined with the *R* distribution in Fig. 2, where 70% of the occurrences are *R* > 20 mm h^{−1}, the majority of cases concentrated between 43 and 60 dB*Z* can be explained. Because of a large amount of big and oblate raindrops caused by a combination of large *R* and *N*_{0} as well as randomness added to the axis ratio relation, there are some extreme cases where *Z*_{H} > 55 dB*Z*, *Z*_{DR} > 5 dB, and *K*_{dp} > 2.5° km^{−1} in the simulation dataset. An advantage of the Monte Carlo simulation is that it can provide the relative possibility of occurrence for extreme cases. The incorporation of extreme cases is necessary for training the GMM and it will not influence the performance of GMRE, because the GMM always converges to the true distribution as the number of mixtures increases [GMM and number of mixtures are defined in Eq. (7)]. Figures 4a,b present the approximate distribution of *Z*_{H} and *Z*_{DR} from one trained GMM with 5 mixtures and another one with 20 mixtures. The approximate distribution from the GMM with 20 mixtures clearly shows more detail and is much closer to the original distribution in the simulation dataset (Fig. 3a). In contrast, the approximate distribution from the GMM with five mixtures ignores some details while preserving the key portions of the original distribution.
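The 43 dB*Z* figure quoted for *R* = 20 mm h^{−1} can be checked with a short calculation. The sketch below assumes the widely used continental NEXRAD form *Z* = 300*R*^{1.4} (with *Z* in mm^{6} m^{−3}); the function names are illustrative only.

```python
import math

def nexrad_rain_rate_to_dbz(rain_rate_mm_h, a=300.0, b=1.4):
    """Reflectivity (dBZ) implied by Z = a * R**b (assumed NEXRAD form)."""
    z_linear = a * rain_rate_mm_h ** b  # mm^6 m^-3
    return 10.0 * math.log10(z_linear)

def dbz_to_nexrad_rain_rate(dbz, a=300.0, b=1.4):
    """Inverse relation: rain rate (mm/h) from reflectivity in dBZ."""
    z_linear = 10.0 ** (dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)

print(round(nexrad_rain_rate_to_dbz(20.0), 1))  # close to 43 dBZ
```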

Table 1. Key parameters of the single-cell Monte Carlo simulation.
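As a rough illustration of how such priors might be sampled, the sketch below draws *W* from a one-sided (half) Gaussian and *N*_{0} from a uniform distribution. Every numerical value here is a placeholder, not a value from Table 1: the spread `w_sigma`, the bounds on *N*_{0}, and the choice to sample *N*_{0} uniformly in log10 space are all assumptions made for the example.

```python
import random

def sample_dsd_parameters(rng, w_sigma=1.5, w_max=10.0,
                          log10_n0_min=1.5, log10_n0_max=4.5):
    """Draw one (N0, W) pair.
    W (g m^-3): half-Gaussian truncated to (0, w_max], favoring light rain.
    N0 (m^-3 mm^-1): uniform in log10 between hypothetical bounds."""
    while True:
        w = abs(rng.gauss(0.0, w_sigma))  # fold the Gaussian onto W > 0
        if 0.0 < w <= w_max:
            break
    n0 = 10.0 ** rng.uniform(log10_n0_min, log10_n0_max)
    return n0, w

rng = random.Random(0)
cases = [sample_dsd_parameters(rng) for _ in range(8000)]
light = sum(1 for _, w in cases if w < 1.5) / len(cases)
print(f"fraction of cases with W < 1.5 g m^-3: {light:.2f}")
```

Folding the Gaussian onto positive *W* concentrates the samples at low water content while still leaving a thin tail of heavy-rain cases, which is the qualitative behavior the prior design aims for.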

## 3. Gaussian mixture rainfall-rate estimator

If polarimetric radar measurements are denoted as vector **z**, rainfall-rate retrieval is based on the connections between **z** and *R*. There are three main kinds of rainfall-rate retrieval approaches. Many conventional and polarimetric approaches, including NEXRAD, assume PLRs between **z** and *R* and use linear regression models (e.g., Bringi et al. 2004; Cifelli et al. 2011; Ryzhkov et al. 2005a). Neural network approaches consider a black box that has **z** as input and *R* as output (e.g., Vulpiani et al. 2006, 2009). Bayesian probability approaches try to estimate *R* by maximizing the posterior probability *p*(*R* | **z**) (e.g., Cao et al. 2010; Chiu and Petty 2006; Di Michele et al. 2005; Evans et al. 1995). This section presents the theoretical fundamentals of GMRE as well as the training and testing of GMRE using the simulation dataset, following Li and Zhang (2011).

### a. Theoretical fundamentals of the GMRE approach

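Rain parameters such as *R*, *W*, *N*_{0}, and Λ and the corresponding radar variables such as *Z*_{h}, *Z*_{dr}, and *K*_{dp} of a radar resolution volume (single cell) can be combined and considered as an unknown and random vector (called state vector) **x**, such as **x** = (*R*, *Z*_{h}, *Z*_{dr})^{T}. The prior distribution of **x**, *p*(**x**), can be learned and represented by the GMM and is expressed as

$$p(\mathbf{x}) = \sum_{i=1}^{M} \alpha_i\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i), \tag{7}$$

where $\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ is the Gaussian distribution with mean ***μ*** and covariance matrix **Σ**; *M* is the number of Gaussian mixtures used; and *α*_{i}, ***μ***_{i}, and **Σ**_{i} are the weighting, mean, and covariance matrix for the *i*th Gaussian mixture, respectively. With a given number of mixtures, the GMM can be trained using the expectation maximization (EM) algorithm (Russell and Norvig 2009) from training datasets.

Given the measurement vector **z** and measurement noise vector **v**, the estimation problem can be formulated based on a linear relationship,

$$\mathbf{z} = \mathbf{H}\mathbf{x} + \mathbf{v}, \tag{8}$$

where **H** is the observation matrix that selects the measured components of **x**. If the state vector is set as **x** = (*R*, *Z*_{h}, *Z*_{dr}, *K*_{dp})^{T}, then GMRE can be used with different **H** whether *K*_{dp} is available [**z** = (*Z*_{h}, *Z*_{dr}, *K*_{dp})^{T}] or not [**z** = (*Z*_{h}, *Z*_{dr})^{T}], and remains the best estimator for both scenarios.

Applying Bayes's theorem to *p*(**x** | **z**), also known as the posterior distribution, yields

$$p(\mathbf{x} \mid \mathbf{z}) = \frac{p(\mathbf{z} \mid \mathbf{x})\, p(\mathbf{x})}{p(\mathbf{z})}. \tag{9}$$

If **v** is modeled as white Gaussian noise from $\mathcal{N}(\mathbf{0}, \mathbf{R})$, where **R** is the noise covariance matrix, then *p*(**z** | **x**) is

$$p(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}(\mathbf{z}; \mathbf{H}\mathbf{x}, \mathbf{R}), \tag{10}$$

and *p*(**z**), according to linear transformation and the addition property of Gaussian distribution, is also a Gaussian mixture with the same number of mixtures as *p*(**x**) and can be written as

$$p(\mathbf{z}) = \sum_{i=1}^{M} \alpha_i\, \mathcal{N}(\mathbf{z}; \mathbf{H}\boldsymbol{\mu}_i, \mathbf{H}\boldsymbol{\Sigma}_i\mathbf{H}^{T} + \mathbf{R}), \tag{11}$$

where $\mathbf{H}\boldsymbol{\Sigma}_i\mathbf{H}^{T} + \mathbf{R}$ is the covariance matrix of the *i*th Gaussian mixture in *p*(**z**). Plugging *p*(**z** | **x**), *p*(**x**), and *p*(**z**) into Eq. (9) yields

$$p(\mathbf{x} \mid \mathbf{z}) = \sum_{i=1}^{M} \beta_i\, \mathcal{N}\!\left(\mathbf{x}; \boldsymbol{\mu}_i + \mathbf{K}_i(\mathbf{z} - \mathbf{H}\boldsymbol{\mu}_i), (\mathbf{I} - \mathbf{K}_i\mathbf{H})\boldsymbol{\Sigma}_i\right), \tag{12}$$

with $\mathbf{K}_i = \boldsymbol{\Sigma}_i\mathbf{H}^{T}(\mathbf{H}\boldsymbol{\Sigma}_i\mathbf{H}^{T} + \mathbf{R})^{-1}$, provided that *β*_{i} is set to be

$$\beta_i = \frac{\alpha_i\, \mathcal{N}(\mathbf{z}; \mathbf{H}\boldsymbol{\mu}_i, \mathbf{H}\boldsymbol{\Sigma}_i\mathbf{H}^{T} + \mathbf{R})}{\sum_{j=1}^{M} \alpha_j\, \mathcal{N}(\mathbf{z}; \mathbf{H}\boldsymbol{\mu}_j, \mathbf{H}\boldsymbol{\Sigma}_j\mathbf{H}^{T} + \mathbf{R})}. \tag{13}$$

Therefore, *p*(**x** | **z**) is also a Gaussian mixture with the same number of mixtures as *p*(**x**) and *β*_{i} is the weighting of the *i*th Gaussian mixture in *p*(**x** | **z**). The Bayes least squares estimate of **x** is given as the conditional mean (Lewis et al. 2006)

$$\hat{\mathbf{x}} = E[\mathbf{x} \mid \mathbf{z}] = \sum_{i=1}^{M} \beta_i \left[\boldsymbol{\mu}_i + \mathbf{K}_i(\mathbf{z} - \mathbf{H}\boldsymbol{\mu}_i)\right], \tag{14}$$

where *E*[·] is the expectation operator. Here, $\hat{\mathbf{x}}$ is the best estimate of **x** in terms of minimum variance and unbiased performance. Estimates of *R*, *N*_{0}, and Λ are obtained at the same time using the same GMRE model.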
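To make the conditional-mean estimate concrete, the sketch below works through the simplest case: a two-element state (*R*, *Z*_{H}) with only *Z*_{H} observed, so that all matrix operations reduce to scalars. It relies only on standard properties of Gaussian mixtures under a linear Gaussian observation model. The tuple layout, function names, and all mixture parameters are hypothetical and chosen purely for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmre_estimate(z_obs, components, noise_var=0.0):
    """Conditional-mean rain rate given one observed reflectivity.
    components: list of (alpha, mu_r, mu_z, var_r, cov_rz, var_z),
    i.e. weight, means, and covariance entries of each 2-D Gaussian."""
    weights, cond_means = [], []
    for alpha, mu_r, mu_z, var_r, cov_rz, var_z in components:
        s = var_z + noise_var                # variance of z under this mixture
        weights.append(alpha * gaussian_pdf(z_obs, mu_z, s))
        gain = cov_rz / s                    # scalar analogue of a gain matrix
        cond_means.append(mu_r + gain * (z_obs - mu_z))
    total = sum(weights)                     # equals p(z) under the mixture
    betas = [w / total for w in weights]     # posterior mixture weights
    r_hat = sum(b * m for b, m in zip(betas, cond_means))
    return r_hat, betas

# Hypothetical two-mixture prior: light rain and heavy rain
prior = [(0.7, 2.0, 25.0, 4.0, 3.0, 25.0),
         (0.3, 30.0, 48.0, 100.0, 30.0, 16.0)]
print(gmre_estimate(48.0, prior))
```

With a single mixture the estimate collapses to ordinary linear regression of *R* on *Z*_{H}; the mixture form lets the regression slope change with the observed reflectivity, which a single power law cannot do.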

### b. Training of GMREs

Because rainfall rate *R* can be estimated directly from radar observations or recovered from DSD parameters *N*_{0} and Λ, the state vector is set as **x** = (*R*, *N*_{0}, Λ, *Z*_{H}, *Z*_{DR}, *K*_{dp})^{T} to compare the performance of both approaches. Even though other dual-polarization variables, such as the linear depolarization ratio (LDR) and correlation coefficient (*ρ*_{hv}), are not included in the state vector in this study, GMRE can discover and use hidden relationships among different variables, and additional variables would generally lead to a better performance.

Because a GMM would converge to any particular distribution if sufficient mixtures are used, it is safe to assume that prior distribution *p*(**x**) has a Gaussian mixture form. Training GMRE is a learning process during which knowledge of *p*(**x**) is acquired from the training dataset, and it is stored in the weightings, means, and covariance matrices of a GMM. Therefore, the training problem becomes how to estimate the unknown parameters (*α*_{i}, ***μ***_{i}, and **Σ**_{i}) from a given dataset. The EM algorithm (Russell and Norvig 2009), an iterative optimization method, has been widely used in mixture models. It proceeds as follows: in the *E* step, a posterior probability is assigned to each individual sample in the dataset. In the *M* step, a new estimate of the unknown parameters is obtained by increasing the global likelihood, that is, the combination (product) of all posterior probabilities. The algorithm continues until the global likelihood can no longer be increased, at which point it has converged to the nearest local maximum, which depends on the initial clustering values of the dataset. Although GMRE is the optimal estimator in theory, the limited number of mixtures and the convergence of the EM algorithm to a local maximum mean that training GMRE [or, in other words, characterizing *p*(**x**)] may lead to suboptimal performance. However, near-optimal performance can be achieved by choosing the proper number of mixtures and better initial clustering values. In this study, the *k*-means clustering algorithm (Russell and Norvig 2009) is used in the initial clustering of the training datasets for better initial clustering values and faster convergence in GMM training.
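The training loop can be illustrated with a deliberately small, one-dimensional version of EM initialized by *k*-means. This is a toy sketch under stated assumptions, not the training code used in the study: real GMRE training operates on multivariate state vectors with full covariance matrices, and the function names, iteration counts, and variance floor below are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def kmeans_1d(data, k, rng, iters=20):
    """Plain k-means on scalars; returns k cluster centers."""
    centers = rng.sample(data, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda i: (x - centers[i]) ** 2)
            groups[j].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def fit_gmm_1d(data, k, rng, iters=50, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture with k-means initial means."""
    n = len(data)
    means = kmeans_1d(data, k, rng)
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    log_likelihoods = []
    for _ in range(iters):
        # E step: posterior probability of each mixture for each sample
        resp, ll = [], 0.0
        for x in data:
            p = [weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(k)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pi / total for pi in p])
        log_likelihoods.append(ll)
        # M step: re-estimate weights, means, and variances
        for i in range(k):
            nk = sum(r[i] for r in resp)
            weights[i] = nk / n
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / nk
            var_i = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / nk
            variances[i] = max(var_floor, var_i)
    return weights, means, variances, log_likelihoods

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(8.0, 1.0) for _ in range(200)]
w, m, v, ll = fit_gmm_1d(data, 2, rng)
print(sorted(m), ll[-1] - ll[0])
```

The recorded log-likelihood should not decrease from one iteration to the next, which is the stopping criterion described above; different initial centers can lead to different local maxima.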

### c. Results of GMREs on simulation datasets

If *X* is the parameter (such as *R*, *N*_{0}, and Λ) being estimated, then the performance of its estimate is evaluated with the bias, the standard deviation of the estimation *σ*_{X}, and the root-mean-square error (RMSE). When only *Z*_{H} is available, such as with the legacy WSR-88D, GMRE can be used with **z** = *Z*_{H}, and the rainfall rate estimated with **z** = *Z*_{H} will be denoted as *R*_{G}(*Z*_{H}). For dual-polarized radar without (or with low quality) differential phase measurements, input to GMRE becomes **z** = (*Z*_{H}, *Z*_{DR})^{T} and the rainfall rate estimated with **z** = (*Z*_{H}, *Z*_{DR})^{T} will be denoted as *R*_{G}(*Z*_{H}, *Z*_{DR}). For radars with full dual-polarization capabilities [**z** = (*Z*_{H}, *Z*_{DR}, *K*_{dp})^{T}], the same GMRE also can be applied. With **z** = (*Z*_{H}, *Z*_{DR}, *K*_{dp})^{T}, *R* can be directly estimated from GMRE (denoted as *R*_{G}) or calculated from retrieved DSD parameters (*N*_{0} and Λ) using Eq. (3) (denoted as *R*_{DSD}).
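The error statistics used to score the estimators can be computed as in the sketch below. The conventions are assumptions made for this example: errors are taken as estimate minus truth (so an underestimating retrieval has a negative bias), and the standard deviation uses the population (1/*n*) form. The function name is illustrative.

```python
import math

def error_statistics(estimates, truths):
    """Return (bias, standard deviation, RMSE) of estimate - truth."""
    n = len(estimates)
    errors = [e - t for e, t in zip(estimates, truths)]
    bias = sum(errors) / n
    std = math.sqrt(sum((err - bias) ** 2 for err in errors) / n)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    return bias, std, rmse

bias, std, rmse = error_statistics([1.0, 2.0, 3.0], [0.0, 2.0, 5.0])
print(bias, std, rmse)
```

Under these conventions the three quantities satisfy RMSE² = bias² + *σ*², so a retrieval can lower its RMSE either by removing systematic offset or by reducing scatter.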

Figure 5 illustrates the RMSEs of GMRE with different inputs and number of mixtures. In general, more observation variables input into GMRE would lead to better performance. As the number of mixtures increases, the RMSEs of *R*_{G}(*Z*_{H}), *R*_{G}(*Z*_{H}, *Z*_{DR}), and *R*_{G} improve slowly while the RMSE of *R*_{DSD} significantly lowers from more than 4 mm h^{−1} to less than 2 mm h^{−1}. Tables 2 and 3 compare the performance of GMRE with 5 and 20 mixtures. The GMRE with 20 mixtures is better than the GMRE with 5 mixtures in basically every category. As mentioned in the last section, GMRE is a minimum variance, unbiased estimator as long as the GMM has converged to the prior distribution *p*(**x**). More mixtures in the GMM lead to a closer approximate distribution to *p*(**x**) (as can be seen in Figs. 4a,b) and better estimation performance, which would eventually reach minimum variance and unbiased estimations. Therefore, the question becomes how many mixtures are appropriate for GMRE, and the answer varies for different applications. For *R*_{G}(*Z*_{H}) and *R*_{G}(*Z*_{H}, *Z*_{DR}), GMRE with 5 mixtures would be sufficient to perform near its optimal point (with minimum variance and unbiased estimation), while *R*_{G} needs 15 mixtures and *R*_{DSD} may need more than 20 to reach their optimal performance on the simulation dataset. Figure 6 illustrates the plots of the rainfall-rate estimation from *R*_{G} and *R*_{DSD} with 5 and 20 mixtures versus the simulated truth data. Given the same weather radar observations **z** = (*Z*_{H}, *Z*_{DR}, *K*_{dp})^{T}, *R*_{G} performs significantly better than *R*_{DSD} when GMRE has five mixtures because *R*_{DSD} is calculated from the retrieved *N*_{0} and Λ, where the estimation error of *N*_{0} and Λ accumulates and magnifies, therefore leading to larger RMSE for *R*_{DSD}. For GMRE with 20 mixtures, the better performance of *R*_{DSD} is obtained due to more accurate estimates of *N*_{0} and Λ. As the number of mixtures increases, the performance of *R*_{DSD} improves, but it will not surpass *R*_{G}. Therefore, *R*_{DSD} will not be considered in later discussion.

Table 2. Rain parameters retrieved by GMREs with 5 mixtures for the simulation dataset: *N*_{0} (mm^{−1} m^{−3}), Λ (mm^{−1}), and rainfall rate *R* (mm h^{−1}).

Table 3. Rain parameters retrieved by GMREs with 20 mixtures for the simulation dataset: *N*_{0} (mm^{−1} m^{−3}), Λ (mm^{−1}), and rainfall rate *R* (mm h^{−1}).

The above simulation results indicate that if GMRE is trained from a dataset whose distribution matches that of the testing dataset, then the GMRE with more mixtures has better performance because GMM converges closer to the true distribution as the number of mixtures increases. Also, these simulation results assume a noise-free environment, which means the noise covariance matrix **R** is set to zero.

## 4. JPOLE dataset description

The JPOLE dataset is a polarimetric radar dataset collected between 2002 and 2005 in central Oklahoma using the KOUN WSR-88D quality radar. A total of 43 events of various precipitation types, including warm-season convective storms containing hail, mesoscale convective systems (MCS) with intense squall lines and trailing stratiform precipitation, widespread cold-season stratiform rain, and select tropical storm remnants, are observed and selected for analysis (Giangrande and Ryzhkov 2008). Concurrent gauge observations from the densely spaced Agricultural Research Service (ARS) and Oklahoma Mesonet (MES) network stations located 50–150 km (e.g., Fiebrich et al. 2006; McPherson et al. 2007; Shafer et al. 2000) from the KOUN radar are also included with this dataset.

Dual-polarized measurements (*Z*_{H} and *Z*_{DR}) from KOUN have been compared and calibrated using cross comparison with a disdrometer, the nearby KTLX radar (Oklahoma City, Oklahoma, WSR-88D), and polarimetric signatures of dry aggregated snow above the melting level. Attenuation correction in rain has been performed on *Z*_{H} and *Z*_{DR} using differential phase Φ_{dp}. Nonmeteorological echoes are filtered by a *ρ*_{hv} > 0.85 threshold. To mitigate hail contamination, the *Z*_{H} < 53 dB*Z* and 0 < *Z*_{DR} < 5 dB thresholds were applied. Gauges farther than 150 km from the radar have been removed to avoid or reduce partial beam filling and melting layer effects. Figures 7a,b show scatterplots of the *Z*_{H} and *Z*_{DR} measured at ARS and MES gauges. Compared to the scatterplots from the simulation, clear differences in distributions can be observed. There are extensive observations between 10 and 40 dB*Z* in the JPOLE dataset and the majority of the KOUN pairings have *Z*_{DR} < 3 dB.
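The screening thresholds listed above amount to a simple per-sample filter, sketched below. The function name and argument layout are hypothetical; only the threshold values come from the text.

```python
def passes_quality_control(z_h_dbz, z_dr_db, rho_hv, gauge_range_km):
    """Apply the screening thresholds quoted in the text to one radar-gauge pair."""
    is_meteorological = rho_hv > 0.85                     # reject nonmeteorological echoes
    hail_free = z_h_dbz < 53.0 and 0.0 < z_dr_db < 5.0    # mitigate hail contamination
    within_range = gauge_range_km <= 150.0                # limit beam filling / melting layer effects
    return is_meteorological and hail_free and within_range

print(passes_quality_control(35.0, 1.2, 0.98, 80.0))
```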

If hourly radar accumulations are defined as an hourly rainfall estimate centered on a gauge, then validation of GMRE can be performed by comparing hourly gauge and radar rainfall accumulations over gauge locations. Because usually only eight or nine radar scans are available over the same gauge location within 1 h, the nearest neighbor interpolation method is used to calculate hourly radar accumulations.
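One way to turn a handful of scans into an hourly total is sketched below. It interprets "nearest neighbor interpolation" as nearest neighbor in time, which is an assumption: each minute of the hour is assigned the rain rate of the closest scan and the rates are then integrated. The function name, the 1-min step, and the sample scan times are all hypothetical.

```python
def hourly_accumulation(scan_minutes, rain_rates_mm_h, step_min=1.0):
    """Hourly rainfall (mm) from a few scans, using the temporally
    nearest scan for every time step within the hour."""
    total = 0.0
    n_steps = int(round(60.0 / step_min))
    for k in range(n_steps):
        t = (k + 0.5) * step_min  # midpoint of the current step (min)
        nearest = min(range(len(scan_minutes)),
                      key=lambda i: abs(scan_minutes[i] - t))
        total += rain_rates_mm_h[nearest] * step_min / 60.0
    return total

# Nine scans roughly 7 min apart, all reporting 6 mm/h
scans = [0.0, 7.0, 14.0, 21.0, 28.0, 35.0, 42.0, 49.0, 56.0]
print(hourly_accumulation(scans, [6.0] * 9))
```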

## 5. Results and comparisons

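Three PLR algorithms are used for comparison. The first, *R*(*Z*_{h}), with *Z*_{h} as the only input, is the inversion of the standard NEXRAD rainfall formula for continental (nontropical) application (Fulton et al. 1998),

$$R(Z_h) = 1.70 \times 10^{-2}\, Z_h^{0.714}. \tag{18}$$

The second, *R*(*Z*_{h}, *Z*_{dr}), with *Z*_{h} and *Z*_{dr} as inputs, had optimized performance for rain in central Oklahoma during the JPOLE field campaign (Ryzhkov et al. 2005a). The third, the synthetic relation *R*_{SYN}, uses different combinations of *Z*_{h}, *Z*_{dr}, and *K*_{dp} based on rainfall rate estimated from Eq. (18).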

Two GMREs, one with 5 mixtures (G5) and the other with 20 mixtures (G20), are tested using this JPOLE dataset. Since noise properties of different dual-polarization variables in the JPOLE dataset are unknown, **R** is set to be zero in the current implementation. Because FSE statistics are heavily weighted toward small hourly precipitation accumulations, they are not examined during this test. Tables 4 and 5 summarize the results and comparisons of all retrieval algorithms over the ARS and MES gauges.

Table 4. Performance comparison of rainfall retrieval algorithms for the ARS dataset. Unit: mm h^{−1}.

Table 5. Performance comparison of rainfall retrieval algorithms for the MES dataset. Unit: mm h^{−1}.

With reflectivity *Z*_{H} as input, *R*_{G5}(*Z*_{H}) outperforms conventional NEXRAD *R*(*Z*_{h}) in terms of RMSE for both datasets. With *Z*_{H} and *Z*_{DR} as inputs, *R*_{G5}(*Z*_{H}, *Z*_{DR}) performs slightly worse than the JPOLE *R*(*Z*_{h}, *Z*_{dr}) relation for the ARS dataset, but better for the MES dataset in terms of RMSE. With full polarimetric inputs *Z*_{H}, *Z*_{DR}, and *K*_{dp}, *R*_{G5} has the best performance in every category for both datasets; *R*_{G20} is comparable to *R*_{SYN} for the closer ARS dataset, but slightly worse than *R*_{SYN} for the MES dataset. All estimates but one from the GMREs show a negative bias, probably because they are trained from a dataset that favors smaller rainfall. From the previous section, the GMRE with 20 mixtures converges closer to the distribution of the simulation dataset [denoted as *p*_{s}(**x**)], while the GMRE with 5 mixtures is only able to represent a general outline of *p*_{s}(**x**) without many details. However, because *p*_{s}(**x**) does not precisely match the distribution of the KOUN-based measurement dataset [denoted as *p*_{m}(**x**)], the GMRE with 20 mixtures is apparently overfitted to *p*_{s}(**x**), which prevents optimal performance under *p*_{m}(**x**). In contrast, the GMRE with five mixtures can outperform the JPOLE-tuned synthetic *R*_{SYN} relation in terms of bias and RMSE, even though it represents a less detailed *p*_{s}(**x**), as highlighted in Figs. 8 and 9.

It is interesting to compare the performance of the simulation dataset–trained GMRE in this study with the neural network approach introduced in Vulpiani et al. (2009) for the same ARS dataset. For these particular events [Table 2 in Vulpiani et al. (2009) and Table 4 herein], both the neural network and GMRE approach outperformed the synthetic relation, with the five-mixture GMRE showing a slightly better performance overall in terms of bias, STD, and RMSE.

These results confirmed that if GMRE is trained from a dataset whose distribution does not precisely match (but approximates) the distribution of the testing dataset, GMRE is still able to perform very well. When *p*_{s}(**x**) ≠ *p*_{m}(**x**), the GMRE with fewer mixtures may perform even better than the GMRE with more mixtures, which could be overfitted to *p*_{s}(**x**). However, depending on how *p*_{s}(**x**) approximates *p*_{m}(**x**) and how much *p*_{s}(**x**) and *p*_{m}(**x**) resemble one another, the optimal number of mixtures may vary. For example, comparing the dark blue area and light blue area where *Z*_{H} is between 15 and 50 dB*Z* in both Figs. 4a,b with the same areas in Figs. 7a,b, the distribution of G5 (Fig. 4a) at this area is clearly much closer to the same area of Figs. 7a,b than G20 (Fig. 4b). This explains why *R*_{G5} outperforms *R*_{G20} in both ARS and MES datasets. It also explains why extreme cases in the training dataset will not affect the performance of GMRE because only the relative probabilities of cases at areas of interest matter. As a consequence, GMRE should be trained from a large *p*_{s}(**x**) that covers a broader range of occurrences than *p*_{m}(**x**) (such as the extreme cases covered in the simulation of this study), to ensure that it is capable of handling not only a particular dataset, but also the radar observations from different seasons or regions. In addition to covering a broad range of scenarios, the training dataset also needs to have appropriate proportions of different kinds of precipitation. The more realistic the training dataset, the better the performance GMRE can achieve in real environments. The performance of GMRE may be improved if it is trained from a dataset generated from measured DSDs over a wide range of time and area.

## 6. Discussion and conclusions

This study develops a Gaussian mixture rainfall-rate estimator for polarimetric radar–based rainfall-rate estimation. Theoretically, GMRE is the optimal estimator in terms of minimum variance and unbiased performance. It is also a general and flexible approach that can be adapted easily to different observation variables and rain types without compromising its performance. Training the GMRE is essential. During the training process, the prior distribution of microphysics parameters and observation variables is characterized and approximated by GMM, which converges to any specific distribution as the number of mixtures increases. In this study, the training dataset is constructed from a single-cell Monte Carlo simulation where the parameters of exponential DSD, *N*_{0} and *W*, are randomly generated first from designed distributions that favor light and moderate rain. Then, uniformly distributed raindrops of different sizes are put into the single cell according to the DSD. Summing up waves scattered by each raindrop within the single cell, the Monte Carlo simulation produces realistic dual-polarization radar signatures.

GMREs with different numbers of mixtures are trained and tested using the general simulation dataset. With more mixtures, the GMM converges more readily to the simulation distribution, leading to better estimation. For the same radar observations, the rainfall rate directly estimated from the radar moments (*R*_{G}) is more precise than the rainfall rate retrieved from taking an indirect path through the estimated DSD parameters (*R*_{DSD}) wherein estimation error accumulates and magnifies.

Two GMREs, one with 5 mixtures and the other with 20 mixtures, in company with three PLR algorithms, are tested using the JPOLE dataset. As expected, better results are achieved when more radar observation variables are available for both the GMRE and PLR algorithms. While *R*_{G5}(*Z*_{H}, *Z*_{DR}) has a performance comparable to *R*(*Z*_{h}, *Z*_{dr}), *R*_{G5}(*Z*_{H}) performs better than the single-parameter *R*(*Z*_{h}) and *R*_{G5} outperforms the synthetic *R*_{SYN} JPOLE relation, which is the standard benchmark for this JPOLE dataset. However, *R*_{G20} does not perform as well as *R*_{G5}, which can be attributed to overfitting the GMRE to the specific simulation distribution that is dissimilar to the KOUN radar measurement distribution. Estimates from GMREs generally have a negative bias, which may reflect that these methods were trained from datasets that favor smaller rainfall over heavier rainfall, and also the fact that KOUN polarimetric radar inputs such as specific differential phase are smoothed somewhat in space–time.

In conclusion, GMRE shows great promise over conventional PLR techniques and provides a statistically optimized solution for rainfall-rate estimation. The convergence capability of GMM provides a general framework to accommodate extra information not only from dual-polarization diversities, but also from other diversities, such as multiple frequencies. A subject of ongoing research is to combine ground-based radar measurement with Ku- and Ka-band satellite radar measurements into the GMRE for better quantitative precipitation estimation (QPE). Because GMRE is the best estimator in terms of variance and bias performance as long as the prior distribution is accurate, the focus of rainfall-rate retrievals may shift from developing new algorithms or coefficients to constructing a better training dataset for GMRE. For example, better performance of GMRE may be achieved by tuning the distribution of *N*_{0} and *W* in Monte Carlo simulations. If GMRE is trained from a dataset, either from simulation or measurement, without any climatologically driven optimization, then a global GMRE is possible for all rain types and regions. It is worth mentioning that applications of GMRE are not limited to the S band; a GMRE can also be built for C- or X-band radars. Like other rainfall-rate estimation techniques, inputs to GMRE have to be corrected for attenuation before they can be used, especially at C and X band. As demonstrated in Li and Zhang (2011), attenuation correction also can be incorporated into the GMRE framework, which will be studied in detail in future research.

## Acknowledgments

The authors greatly appreciate the support from NOAA/NSSL to make this work possible, and the comments/suggestions of Dr. Alexander Ryzhkov from CIMMS/ARRC.

## APPENDIX

### Proof of Eq. (13)

## REFERENCES

Anagnostou, M. N., E. N. Anagnostou, and J. Vivekanandan, 2006: Correction for rain path specific and differential attenuation of X-band dual-polarization observations. *IEEE Trans. Geosci. Remote Sens.*, **44**, 2470–2480.

Brandes, E. A., G. Zhang, and J. Vivekanandan, 2002: Experiments in rainfall estimation with a polarimetric radar in a subtropical environment. *J. Appl. Meteor.*, **41**, 674–685.

Bringi, V. N., T. Tang, and V. Chandrasekar, 2004: Evaluation of a new polarimetrically based Z–R relation. *J. Atmos. Oceanic Technol.*, **21**, 612–623.

Cao, Q., G. Zhang, E. A. Brandes, and T. J. Schuur, 2010: Polarimetric radar rain estimation through retrieval of drop size distribution using a Bayesian approach. *J. Appl. Meteor. Climatol.*, **49**, 973–990.

Chiu, J. C., and G. W. Petty, 2006: Bayesian retrieval of complete posterior PDFs of oceanic rain rate from microwave observations. *J. Appl. Meteor. Climatol.*, **45**, 1073–1095.

Cifelli, R., V. Chandrasekar, S. Lim, P. C. Kennedy, Y. Wang, and S. A. Rutledge, 2011: A new dual-polarization radar rainfall algorithm: Application in Colorado precipitation events. *J. Atmos. Oceanic Technol.*, **28**, 352–364.

Di Michele, S., A. Tassa, A. Mugnai, F. S. Marzano, P. Bauer, and J. P. V. P. Baptista, 2005: Bayesian algorithm for microwave-based precipitation retrieval: Description and application to TMI measurements over ocean. *IEEE Trans. Geosci. Remote Sens.*, **43**, 778–791.

Doviak, R. J., J. K. Carter, V. M. Melnikov, and D. S. Zrnic, 2002: Modifications to the research WSR-88D to obtain polarimetric data. National Severe Storms Laboratory Rep., 49 pp.

Evans, K. F., J. Turk, T. Wong, and G. L. Stephens, 1995: A Bayesian approach to microwave precipitation profile retrieval. *J. Appl. Meteor.*, **34**, 260–279.

Fiebrich, C. A., D. L. Grimsley, R. A. McPherson, K. A. Kesler, and G. R. Essenberg, 2006: The value of routine site visits in managing and maintaining quality data from the Oklahoma Mesonet. *J. Atmos. Oceanic Technol.*, **23**, 406–416.

Fulton, R. A., J. P. Breidenbach, D.-J. Seo, D. A. Miller, and T. O. Bannon, 1998: The WSR-88D rainfall algorithm. *Wea. Forecasting*, **13**, 377–395.

Giangrande, S. E., and A. V. Ryzhkov, 2008: Estimation of rainfall based on the results of polarimetric echo classification. *J. Appl. Meteor. Climatol.*, **47**, 2445–2462.

Gorgucci, E., V. Chandrasekar, V. N. Bringi, and G. Scarchilli, 2002: Estimation of raindrop size distribution parameters from polarimetric radar measurements. *J. Atmos. Sci.*, **59**, 2373–2384.

Hogan, R. J., 2007: A variational scheme for retrieving rainfall rate and hail reflectivity fraction from polarization radar. *J. Appl. Meteor. Climatol.*, **46**, 1544–1564.

Lee, G. W., 2006: Sources of errors in rainfall measurements by polarimetric radar: Variability of drop size distributions, observational noise, and variation of relationships between R and polarimetric parameters. *J. Atmos. Oceanic Technol.*, **23**, 1005–1028.

Lee, G. W., and I. Zawadzki, 2005: Variability of drop size distributions: Time-scale dependence of the variability and its effects on rain estimation. *J. Appl. Meteor.*, **44**, 241–255.

Lewis, J. M., S. Lakshmivarahan, and S. Dhall, 2006: *Dynamic Data Assimilation: A Least Squares Approach.* Vol. 104, *Encyclopedia of Mathematics and Its Applications,* Cambridge University Press, 680 pp.

Li, Z., and Y. Zhang, 2011: Application of Gaussian mixture model (GMM) and estimator to radar-based weather parameter estimations. *IEEE Geosci. Remote Sens. Lett.*, **8**, 1041–1045.

Li, Z., Y. Zhang, G. Zhang, and K. A. Brewster, 2011: A microphysics-based simulator for advanced airborne weather radar development. *IEEE Trans. Geosci. Remote Sens.*, **49**, 1356–1373.

McPherson, R. A., and Coauthors, 2007: Statewide monitoring of the mesoscale environment: A technical update on the Oklahoma Mesonet. *J. Atmos. Oceanic Technol.*, **24**, 301–321.

Mishchenko, M. I., 2000: Calculation of the amplitude matrix for a nonspherical particle in a fixed orientation. *Appl. Opt.*, **39**, 1026–1031.

Russell, S., and P. Norvig, 2009: *Artificial Intelligence: A Modern Approach.* 3rd ed. Prentice Hall, 1152 pp.

Ryzhkov, A. V., S. E. Giangrande, and T. J. Schuur, 2005a: Rainfall estimation with a polarimetric prototype of WSR-88D. *J. Appl. Meteor.*, **44**, 502–515.

Ryzhkov, A. V., T. J. Schuur, D. W. Burgess, P. L. Heinselman, S. E. Giangrande, and D. S. Zrnic, 2005b: The joint polarization experiment: Polarimetric rainfall measurements and hydrometeor classification. *Bull. Amer. Meteor. Soc.*, **86**, 809–824.

Seliga, T. A., and V. N. Bringi, 1976: Potential use of radar differential reflectivity measurements at orthogonal polarizations for measuring precipitation. *J. Appl. Meteor.*, **15**, 69–76.

Shafer, M. A., C. A. Fiebrich, D. S. Arndt, S. E. Fredrickson, and T. W. Hughes, 2000: Quality assurance procedures in the Oklahoma mesonetwork. *J. Atmos. Oceanic Technol.*, **17**, 474–494.

Vulpiani, G., F. S. Marzano, V. Chandrasekar, A. Berne, and R. Uijlenhoet, 2006: Polarimetric weather radar retrieval of raindrop size distribution by means of a regularized artificial neural network. *IEEE Trans. Geosci. Remote Sens.*, **44**, 3262–3275.

Vulpiani, G., S. Giangrande, and F. S. Marzano, 2009: Rainfall estimation from polarimetric S-band radar measurements: Validation of a neural network approach. *J. Appl. Meteor. Climatol.*, **48**, 2022–2036.

Waldvogel, A., 1974: The N0 jump of raindrop spectra. *J. Atmos. Sci.*, **31**, 1067–1078.

Wang, Y., and V. Chandrasekar, 2010: Quantitative precipitation estimation in the CASA X-band dual-polarization radar network. *J. Atmos. Oceanic Technol.*, **27**, 1665–1676.

Zhang, G., M. Xue, Q. Cao, and D. Dawson, 2008: Diagnosing the intercept parameter for exponential raindrop size distribution based on video disdrometer observations: Model development. *J. Appl. Meteor. Climatol.*, **47**, 2983–2992.