## Abstract

This paper presents a new statistical algorithm to estimate rainfall over the Amazon Basin region using the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI). The algorithm relies on empirical relationships derived for different raining-type systems between coincident measurements of surface rainfall rate and 85-GHz polarization-corrected brightness temperature as observed by the precipitation radar (PR) and TMI on board the TRMM satellite. The scheme includes rain/no-rain area delineation (screening) and system-type classification routines for rain retrieval. The algorithm is validated against independent measurements of the TRMM–PR and S-band dual-polarization Doppler radar (S-Pol) surface rainfall data for two different periods. Moreover, the performance of this rainfall estimation technique is evaluated against well-known methods, namely, the TRMM-2A12 [the Goddard profiling algorithm (GPROF)], the Goddard scattering algorithm (GSCAT), and the National Environmental Satellite, Data, and Information Service (NESDIS) algorithms. The proposed algorithm shows a normalized bias of approximately 23% for both PR and S-Pol ground truth datasets and a mean error of 0.244 mm h^{−1} (PR) and −0.157 mm h^{−1} (S-Pol). For rain volume estimates using PR as reference, a correlation coefficient of 0.939 and a normalized bias of 0.039 were found. With respect to rainfall distributions and rain area comparisons, the results showed that the formulation proposed is efficient and compatible with the physics and dynamics of the observed systems over the area of interest. The performance of the other algorithms showed that GSCAT presented low normalized bias for rain areas and rain volume [0.346 (PR) and 0.361 (S-Pol)], and GPROF showed rainfall distribution similar to that of the PR and S-Pol but with a bimodal distribution. 
Last, the five algorithms were evaluated during the TRMM–Large-Scale Biosphere–Atmosphere Experiment in Amazonia (LBA) 1999 field campaign to verify the precipitation characteristics observed during the easterly and westerly Amazon wind flow regimes. The proposed algorithm presented a cumulative rainfall distribution similar to the observations during the easterly regime, but it underestimated for the westerly period for rainfall rates above 5 mm h^{−1}. NESDIS_{1} overestimated for both wind regimes but presented the best westerly representation. NESDIS_{2}, GSCAT, and GPROF underestimated in both regimes, but GPROF was closer to the observations during the easterly flow.

## 1. Introduction

Knowledge of spatial and temporal variability of precipitating systems (cumulonimbus, multicellular convective systems, squall lines, etc.) is fundamental for many research areas, from hydrology to global climatic change studies, and information about storms and rainfall can improve the quality of numerical weather forecasts and help decision-makers as they discuss actions to be taken in areas affected by the rain.

The energy released on phase transitions during the precipitation cycle is responsible for three-fourths of the heat energy of the atmosphere (Kummerow et al. 1998). Additionally, two-thirds of all precipitation falls in the tropics, an area covered mainly by oceans and undeveloped countries. Over such areas, satellite remote sensing estimates can provide the only available information about precipitation.

There are two main spectral regions used to estimate precipitation from satellites—infrared and microwave—and both have their advantages and drawbacks. Infrared (IR)-based methods have high temporal resolution (a geostationary satellite can produce a full-disk image every 30 min or less), but there is little physical basis to the retrievals: only cloud-top temperatures are available, and these have to be related to precipitation. Microwave (MW)-based methods, on the other hand, have poor temporal resolution because microwave radiometers are flying only on low-orbit satellites that pass about twice a day over the same location. Initiatives such as the Global Precipitation Measurement (GPM) mission (Adams et al. 2002) will improve the temporal resolution of microwave-based observations by launching a constellation of combined equatorial and polar orbit satellites to produce an observation over the same spot every 3 h (information about the GPM mission is available online at http://gpm.gsfc.nasa.gov).

Unlike IR radiation, MW radiation penetrates clouds and can interact with hydrometeors, depending on the wavelength and particle sizes. Low-frequency microwaves (below 50 GHz) interact with hydrometeors through absorption and emission processes, and high-frequency microwaves have wavelengths near particle size, allowing interactions through scattering. The amount of energy received by the radiometer is then directly related to the hydrometeor profile within the cloud, which means that MW remote sensing of precipitation has a stronger physical basis than IR methods; however, mathematical reconstruction methods and assumptions are needed because the signal detected by the radiometer is an integrated effect through the whole hydrometeor column.

Oceans present low emissivities in the microwave spectrum (around 0.4), and dry land presents high emissivities (0.8–1, depending on soil moisture). Hydrometeors have emissivities near unity. Over oceans, clouds appear as “hot” spots in contrast to a “cold” background, so low-frequency channels (which are sensitive to emission) can be used to infer cloud properties. Over land, because of the hot background, only scattering-sensitive channels can be used to detect clouds, which will appear as cold spots on a microwave image. In addition, polarization plays an important role over oceans because water surfaces have different emission coefficients depending on the angle of view.

Microwave algorithms are generally classified as statistical or physical. Statistical algorithms (e.g., Grody 1991; Adler et al. 1994; Ferraro et al. 1998) use observed data to derive an empirical relationship between brightness temperatures and precipitation. Physical algorithms (e.g., Mugnai and Smith 1988; Evans et al. 1995; Kummerow et al. 2001; Viltard et al. 2006) use a database of radiative transfer calculations based on atmospheric profiles (observed or modeled), which are compared with an observed set of brightness temperatures. The higher the number of channels used, the greater the chance of finding an accurate hydrometeor profile in the database. This need for multichannel observations makes the physical approach best suited for oceans, because over land the high emissivities mean that only higher frequencies can be used [e.g., the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) has only two high-frequency channels, vertically and horizontally polarized 85 GHz]. Statistical algorithms are faster and simpler than physical algorithms, and over land the advantages of statistical algorithms tend to outweigh the disadvantages (Kidd et al. 1998).

Although statistical algorithms present reasonable results on monthly estimates, simple relationships are not able to capture the different rain system characteristics in instantaneous observations, because these systems present different cloud and rain development processes that are associated with different microphysical characteristics (i.e., rain droplets and ice particles). Thus, it is expected that they will present different hydrometeor distributions and, consequently, distinct rainfall distributions. For example, Fig. 1 shows the cumulative distribution function (CDF) and probability density function (PDF) for the rainfall rate (RR) and the 85-GHz polarization-corrected brightness temperature (PCT; Spencer et al. 1989) for two different rain systems according to their size classification (mean radius <31 km and mean radius >69 km). The PCT, which reduces the influence of water surfaces on the rainfall retrieval, is computed for each pixel (Spencer et al. 1989) as shown:

$$\mathrm{PCT} = 1.818\,T_{V85} - 0.818\,T_{H85}, \tag{1}$$

where V and H stand for vertical and horizontal polarization, respectively.

This example (Fig. 1) uses 545 TRMM orbits from 1 January to 30 April 1999 over the continental Amazon region. It is evident that significant differences in rainfall occur between these two size classifications. For example, the larger systems are colder and more intense (50%–80% are between 260 and 270 K and 1.2 and 4.1 mm h^{−1}, respectively), and the smaller systems are warmer and less intense (50%–80% are between 275 and 282 K and 0.5 and 2.2 mm h^{−1}, respectively). Moreover, this example also shows that smaller systems have higher rainfall intensities for warmer thresholds, while larger systems require deep convection. Thus, algorithms that only use a single relationship between brightness temperature and rainfall rate are not able to capture the distinct aspects of those systems. By adjusting different PCT versus RR relationships as a function of cloud morphology, we could capture these observed physical precipitation differences and then adequately represent the precipitation density function as observed in these systems.

Based on the concept that different precipitating systems will present different rainfall signatures, this study describes a rainfall estimation algorithm (the University of São Paulo probability algorithm, hereinafter called USProb) that relies on a probabilistic statistical method to correlate the 85-GHz PCT and rainfall rate for different precipitating systems over the Amazon basin. The developed algorithm was tested against independent rainfall measurements of the precipitation radar (PR) on board the TRMM satellite over the Amazon basin and against rain yields estimated from the S-band dual-polarization Doppler weather radar (S-Pol) during the TRMM–Large-Scale Biosphere–Atmosphere Experiment in Amazonia (LBA) field campaign of 1999 (Rutledge 1999). Moreover, the algorithm performance was also compared with several different rainfall estimation schemes: the Goddard scattering algorithm (GSCAT; Adler et al. 1994), two versions of the National Environmental Satellite, Data, and Information Service (NESDIS) algorithm (Ferraro and Marks 1995), and the Goddard profiling algorithm (GPROF), version 6 (Kummerow et al. 2001). Finally, USProb and the other four algorithms were tested during the TRMM–LBA field campaign to verify whether they can depict the rainfall distributions observed in the different wind flow regimes studied during the experiment. In the subsequent sections we describe the dataset and the algorithms used in the intercomparison, as well as the development of our algorithm. In section 4, the algorithm is validated against independent data and different rainfall schemes. Section 5 shows the main physical characteristics observed in the rain systems according to our classification, and section 6 presents the performance of the algorithms based on westerly and easterly wind regimes observed during the TRMM–LBA field campaign. Our conclusions are offered in section 7.

## 2. Data description

To develop and calibrate USProb we used 545 TRMM orbits during the period of 1 January–30 April 1999 over the region defined by latitude 5°N–16°S and longitude 76°–48°W. We used the surface rain and rain type from PR (TRMM product 2A25, version 6) and the 19_{V}-, 22_{V}-, and 85_{V,H}-GHz TMI brightness temperatures (TRMM product 1B11, version 6; the subscripts V and H denote vertical and horizontal polarization, respectively). The data were also interpolated to a grid size of 0.1° × 0.1° (approximately 11 km × 11 km) to account for the different sensor resolutions, which can vary from 7 km × 5 km (85-GHz TMI channels) to 30 km × 18 km (19-GHz TMI channels) at preboost conditions. Although only the 85-GHz channels are used to actually derive rainfall rates, the 19- and 22-GHz channels are used in the screening routine. A complete and detailed reference on the TRMM instruments is given by Kummerow et al. (1998). This dataset is also used to study the system characteristics presented in section 5.

USProb relies on a probabilistic statistical method that correlates PCT and RR for different precipitating systems. The PCT is helpful in removing cold background signatures from wet surfaces such as rivers and lakes that can be confused with clouds. Although microwave emissivities of surface water bodies are a strong function of polarization, radiation that is emitted or scattered by clouds is only slightly polarized. Therefore, for the same frequency, clouds present nearly the same brightness temperature on both horizontal and vertical channels, but wet surfaces, on the other hand, will present significant differences between vertically and horizontally polarized brightness temperatures.

USProb was validated using two different ground-truth datasets. The first comparison used the PR-2A25, version 6, surface rainfall rates as a reference, using 109 TRMM orbits during the whole month of October 2005. For the second comparison, the National Center for Atmospheric Research (NCAR) S-Pol radar (Keeler et al. 2000) rainfall estimates (Carey et al. 2000) available from the TRMM–LBA 1999 field campaign were used. The experiment was conducted in the state of Rondônia, in the western Amazon, Brazil, during January and February of 1999. The radar covered a 200-km-radius area centered at 62°W and 11.2°S. For the comparison, we used 45 S-Pol coincident measurements available from the TRMM orbits during 13 January–21 February 1999. This TRMM–LBA dataset was also used in section 6 to verify the performance of the algorithm based on the Amazon wind regime characteristics (westerly versus easterly). Finally, USProb was compared with GSCAT, GPROF, and the original and modified versions of the NESDIS rain-rate algorithm.

### a. GSCAT and NESDIS datasets creation

For validating the GSCAT (Adler et al. 1994) and NESDIS (Ferraro and Marks 1995) Special Sensor Microwave Imager (SSM/I) rain-rate algorithms, rain rates (RR) are retrieved using the following relations. The GSCAT rain rate (mm h^{−1}) for each pixel, over continental areas, is defined as

and the NESDIS rain rate (mm h^{−1}) for each pixel is defined as

where the scattering index (SI; Grody 1991) is defined as

More details on the SI physical basis will be given in section 3a. For NESDIS_{1}, the *a*_{n} coefficients as well as the *b* and *c* values followed the original formulations [i.e., Grody (1991) for the SI coefficients, and Ferraro and Marks (1995) for *b* and *c* values]. For NESDIS_{2}, the SI coefficients and the *b* and *c* coefficients were adjusted using the calibration dataset with PR surface rainfall as the ground truth over the region of interest. The SI coefficients and the *b* and *c* values, for both formulations, are presented in Table 1. For both formulations the rainfall threshold is SI = 10 K, which is the value indicated in the original work of Ferraro and Marks. No additional screening processes were performed for the GSCAT and NESDIS algorithms. These algorithms were originally developed for SSM/I, but currently are also used with TMI data (and could be used with any other sensor with channels around the same frequencies).
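The SI and NESDIS relations described above can be sketched as follows. This is a minimal illustration, not the operational code: the functional forms are assumptions inferred from the text (four emission terms with coefficients *a*_{0}–*a*_{3} that estimate the scattering-free *T*_{V85}, and a power law in SI with coefficients *b* and *c*), the quadratic term in *T*_{V22} is assumed from the usual form attributed to Grody (1991), and the function names are hypothetical. No coefficient values from Table 1 are reproduced here.

```python
def scattering_index(t19v, t22v, t85v, a):
    """SI (K): estimated scattering-free T_V85 minus the observed T_V85.

    `a` holds (a0, a1, a2, a3). The quadratic-in-T_V22 form is an
    assumption; the actual coefficients belong in Table 1.
    """
    a0, a1, a2, a3 = a
    estimated_t85v = a0 + a1 * t19v + a2 * t22v + a3 * t22v ** 2
    return estimated_t85v - t85v


def nesdis_rain_rate(si, b, c, si_threshold=10.0):
    """Power-law rain rate (mm/h) from SI.

    Pixels with SI below the threshold are treated as nonraining
    (whether the threshold itself counts as rain is an assumption).
    """
    if si < si_threshold:
        return 0.0
    return b * si ** c
```

Passing the coefficients as arguments keeps the same code usable for both NESDIS_{1} (original coefficients) and NESDIS_{2} (regionally adjusted coefficients).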

### b. GPROF rain rates

The Goddard profiling algorithm (Kummerow et al. 2001) retrieves rainfall’s vertical structure by using a Bayesian approach to match the observed brightness temperatures to simulated brightness temperatures from an Eddington-based radiative transfer model (Kummerow 1993), which uses hydrometeor profiles derived from cloud-resolving models (CRMs) as input.

The GPROF hydrometeor profiles (TRMM product 2A12, version 6) are publicly available on the National Aeronautics and Space Administration (NASA) Goddard Earth Sciences Data and Information Services Center (GES DISC) Web site (http://disc.sci.gsfc.nasa.gov/). The surface rain rate was the only 2A12 data used in this work.

## 3. Algorithm development

The development of USProb can be divided into three steps that are described below: (i) rain screening, (ii) clustering and system classification, and (iii) evaluation of the PCT–RR relationship for each system class.

### a. Rain screening

To identify the pixels or areas that are raining (or not raining), a screening procedure based on a combination of four methods was applied. The indexes chosen are easy to implement and have physical bases, although the screening methodology is a statistical procedure. These methods are listed here.

1. The polarization-corrected brightness temperature, defined in Eq. (1) and developed by Spencer et al. (1989), can separate cold backgrounds (such as oceans or rivers) from cold clouds at the frequency of 85 GHz because of differences in the vertical and horizontal emissivities of wet surfaces.
2. The scattering index was developed by Grody (1991) to identify the cloud regions that have ice particles aloft (i.e., where there is ice scattering). The SI formulation is presented in Eq. (4), where the first four terms on the right-hand side represent the emission of water and cloud particles inside the cloud, and it is expected to have the same value as the last term in absence of scattering. If scattering occurs, the decrease observed in the *T*_{V85} values will cause an increase in the SI value. Because the SI was developed for SSM/I instruments, the SI coefficients were adjusted using TMI data, leading to different values from those proposed by Grody in his original work. The coefficients, original and adjusted, are presented in Table 1.
3. The difference between *T*_{V19} and *T*_{V85} indicates if there are ice-scattering signatures in the 85-GHz vertically polarized channel. This method was elaborated and tested during the USProb development.
4. The standard deviation of *T*_{V85} in a 5 × 5-pixel window, *σ*(*T*_{V85}), is the last method. According to Anagnostou and Kummerow (1997), *T*_{V85} is more variable in raining than in nonraining areas, which leads to higher values of *σ*(*T*_{V85}) in case of rain.
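Two of the screening indexes listed above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the PCT coefficients 1.818 and 0.818 are the values commonly quoted for Spencer et al. (1989), the window is clipped at the grid edges (the text does not say how edges are handled), the population standard deviation is used, and the function names are hypothetical.

```python
from statistics import pstdev


def pct85(t85v, t85h):
    """85-GHz polarization-corrected temperature (K).

    Unpolarized scenes (clouds) keep their temperature; strongly
    polarized scenes (wet surfaces) are pushed to warmer values.
    """
    return 1.818 * t85v - 0.818 * t85h


def local_std(field, row, col, half_width=2):
    """Standard deviation of `field` in a window centred on (row, col).

    half_width=2 gives the 5 x 5 window; the window is clipped at the
    edges of the grid (an assumption).
    """
    rows = range(max(0, row - half_width), min(len(field), row + half_width + 1))
    cols = range(max(0, col - half_width), min(len(field[0]), col + half_width + 1))
    return pstdev([field[r][c] for r in rows for c in cols])
```

For a pixel with equal vertical and horizontal temperatures, `pct85` returns that same temperature, whereas a polarized wet surface (vertical warmer than horizontal) is mapped to a higher PCT and is therefore less likely to be mistaken for a cold cloud.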

Figure 2 shows the distributions of the four indexes for raining and nonraining pixels based on the PR classification. It can be noted that raining and nonraining pixels display different characteristics in all index distributions. The raining pixel distributions have a lognormal shape and the nonraining ones exhibit a skewed Gaussian form [except for the *σ*(*T*_{V85}) distribution, which shows a lognormal characteristic for both raining and nonraining pixels].

Even with these mentioned differences, choosing one specific threshold to determine whether a pixel is raining or not is not a simple task; rather, it is necessary to employ a combination. Therefore, we use three statistical parameters (Negri et al. 1995) to identify which conditions can be applied to screen the rain areas. These parameters are the probability of detection (POD), the false-alarm rate (FAR), and the critical success index (CSI). These parameters are used to optimize the identification of raining and nonraining pixels among the four indexes described above, and they are expressed in the following equations:

where CR is the number of pixels classified as raining by both methods (screening and PR reference), ER is the number of pixels classified as raining by the screening routine but as nonraining by the PR, and ND is the number of pixels classified as nonraining by the screening routine but as raining by the PR. It is important to note that these indexes are strongly correlated, and an attempt to achieve better results on a particular index can cause loss of quality on the results of other indexes. For example, one can adjust the screening routine to maximize the POD value, but this gain of quality in detection of rain will lead to higher levels of FAR and, therefore, to a low value of CSI. The ideal values of POD, FAR, and CSI are 1, 0, and 1, respectively.
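The three scores can be computed directly from the pixel counts defined above. The sketch below assumes the standard contingency-table definitions, POD = CR/(CR + ND), FAR = ER/(CR + ER), and CSI = CR/(CR + ER + ND), which are consistent with the ideal values of 1, 0, and 1; the function name is hypothetical, and nonzero denominators are assumed.

```python
def screening_scores(cr, er, nd):
    """Return (POD, FAR, CSI) from contingency counts.

    cr: raining in both screening and PR (hits)
    er: raining in screening only (false alarms)
    nd: raining in PR only (misses)
    """
    pod = cr / (cr + nd)       # fraction of observed rain that is detected
    far = er / (cr + er)       # fraction of detections that are false
    csi = cr / (cr + er + nd)  # hits over all rain-related pixels
    return pod, far, csi


pod, far, csi = screening_scores(cr=60, er=20, nd=40)
print(pod, far, csi)
```

Under these definitions the three scores are not independent: CSI = 1/[1/POD + 1/(1 − FAR) − 1], which is why raising POD at the cost of a higher FAR can lower CSI.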

In this study, the optimal combination of POD, FAR, and CSI values found was 0.641, 0.280, and 0.511, respectively. These values were obtained using the screening procedure described in the flowchart in Fig. 3, which is actually the screening routine used in USProb.

### b. Clustering and system classification

According to the above screening procedure (Fig. 3), a raining system is defined as a group of interconnected pixels (a cluster) with PCT values lower than 277 K (this is the PCT value at the boundary between the raining and nonraining distributions). Once the cluster is delineated, the screening routine is applied to exclude the nonraining pixels. After the screening, the clusters are classified according to their sizes and PCT distributions. The size classification is stratified in three categories: systems smaller than 3000 km^{2}, systems between 3000 and 15 500 km^{2}, and systems larger than 15 500 km^{2}. Cluster sizes are computed by multiplying the number of pixels in each cluster by the pixel area (0.1° × 0.1°, or approximately 121 km^{2}, according to section 2). The other criterion is based on what we define as mean lower temperature (MLT). The MLT is defined by computing the PCT distribution and then calculating the mean of the lowest 10% of values. It is expected that systems with strong convective cores will have lower MLT values than stratiform systems. Using histogram analysis and statistical tests, we chose a threshold value of 220 K because differences between the rainfall-rate distributions of systems above and below this MLT threshold can be found.
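The cluster delineation and the MLT computation described in this paragraph can be sketched as below. This is a minimal sketch, not the USProb implementation: 4-connectivity is an assumption (the text says only "interconnected"), the rounding used to select the lowest 10% of values is an assumption, and the function names are hypothetical.

```python
from collections import deque


def find_clusters(pct, threshold=277.0):
    """Group 4-connected pixels with PCT below `threshold` into clusters."""
    n_rows, n_cols = len(pct), len(pct[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    clusters = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0] or pct[r0][c0] >= threshold:
                continue
            # Breadth-first flood fill starting from an unvisited cold pixel.
            seen[r0][c0] = True
            queue, members = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                members.append((r, c))
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and not seen[rr][cc] and pct[rr][cc] < threshold):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            clusters.append(members)
    return clusters


def mean_lower_temperature(values, fraction=0.10):
    """Mean of the coldest `fraction` of PCT values (at least one value)."""
    ordered = sorted(values)
    n = max(1, int(len(ordered) * fraction))
    return sum(ordered[:n]) / n
```

The area of a cluster then follows as `len(members)` times the pixel area, and its MLT from the PCT values of its member pixels.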

After combining these categorizations (the system size and the MLT), five different classes were created, as presented in Table 2. Classes 1 and 3 represent colder systems (class 1 includes larger systems and class 3 has medium-sized systems), and classes 2 and 4 represent warmer systems (class 2 has larger systems and class 4 has medium-sized systems). Class 5 does not use the PCT classification because it represents small systems, whose few pixels may not yield a representative MLT value. Judging by visual inspections of TMI and Visible and Infrared Scanner (VIRS) images over the Amazon region, classes 1 and 2 bore some resemblance to mesoscale convective systems (MCSs), mesoscale convective complexes (MCCs), and squall lines; classes 3 and 4 somewhat resembled supercells and multicellular systems; and class 5 resembled cumulonimbus clouds. An example of these categorizations is presented in Fig. 4.
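The five classes can be expressed as a small decision function. The sketch below follows the thresholds given in the text (3000 and 15 500 km^{2} for size, 220 K for MLT); how values exactly on a boundary are assigned is an assumption, and the function name is hypothetical.

```python
def classify_system(area_km2, mlt_k, small=3000.0, large=15500.0,
                    mlt_threshold=220.0):
    """Return the system class (1-5) from cluster area and MLT.

    1: large and cold   2: large and warm
    3: medium and cold  4: medium and warm
    5: small (MLT not used)
    """
    if area_km2 < small:
        return 5
    cold = mlt_k < mlt_threshold  # boundary handling is an assumption
    if area_km2 > large:
        return 1 if cold else 2
    return 3 if cold else 4
```

For example, a 20 000 km^{2} cluster with an MLT of 210 K falls in class 1, whereas a 1000 km^{2} cluster falls in class 5 regardless of its MLT.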

### c. PCT–rainfall-rate relationships

Finally, we applied the probability matching method (PMM) developed by Calheiros and Zawadzki (1987) to derive the PCT–RR relationships. The main idea behind the PMM is to relate two independent variables through their probability frequencies. In the case of this study, these two independent variables are the PCT and RR for different system categories. To build the relationships, we computed the PCT and RR CDFs for each one of the five classes, as shown in Fig. 5. By relating each pair of CDFs, we were able to develop five different PCT–RR lookup tables (LUTs), which are graphically represented in Fig. 6.
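The matching step can be sketched as pairing values of PCT and RR that share the same cumulative probability. This is a simplified sketch, not the procedure of Calheiros and Zawadzki (1987) in full: the quantile estimator, the number of levels, and the matching direction (coldest PCT paired with heaviest rain) are assumptions, and the function names are hypothetical.

```python
def empirical_quantile(sorted_values, q):
    """Value at cumulative probability q (0..1) in an ascending list."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


def build_pct_rr_lut(pct_samples, rr_samples, n_levels=100):
    """Build a list of (PCT, RR) pairs by matching cumulative probabilities.

    The RR quantile q is paired with the PCT quantile 1 - q, so that
    lower PCT corresponds to higher rain rate (assumed direction).
    The two samples do not need to be paired pixel by pixel.
    """
    pct_sorted = sorted(pct_samples)
    rr_sorted = sorted(rr_samples)
    lut = []
    for k in range(n_levels):
        q = (k + 0.5) / n_levels
        lut.append((empirical_quantile(pct_sorted, 1.0 - q),
                    empirical_quantile(rr_sorted, q)))
    return lut
```

Building one such table per system class, from the PCT and RR samples of that class only, yields the five class-dependent lookup tables; a retrieval then reads the RR associated with the nearest tabulated PCT.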

These relationships show that colder systems (classes 1 and 3) produce less rain for the same PCT value than warmer systems (classes 2 and 4). This can be explained by the differences in hydrometeor content between the systems. Colder systems, which have strong convective cores, can produce large quantities of large ice particles. At 85 GHz, these particles have a very high scattering efficiency, resulting in low PCT values. Because this PCT drop can be wrongly associated with high rainfall rates, the scattering by nonprecipitating ice particles in convective clouds must be taken into account when comparing PCT and PR surface rainfall from stratiform and convective systems. Therefore, convective systems will present lower PCT values for the same amount of rainfall.

Figure 7 presents the PR reflectivity occurrence with height for MLT-dependent classes, where several differences between colder (classes 1 and 3) and warmer (classes 2 and 4) systems can be noted (such as no occurrence of reflectivity values greater than 20 dB*Z* between 2 and 3 km for warmer systems, although classes 1 and 3 present values over 30 dB*Z*). Warm systems have well-defined occurrence peaks between 2 and 4 km, with values near 15 dB*Z* (light rain). Classes 1 and 3 present values over 30 dB*Z* from 3 km up to 15 km; however, high reflectivity values only occur near 15 km for classes 2 and 4. [It is important to note here that the term “warmer systems” does not mean systems without the ice phase; rather, it means systems with higher MLTs (i.e., less ice) than classes 1 and 3.]

Shin et al. (2000), using 1 yr of TRMM data, showed that the freezing level over the Amazon region varies from 3 to 5.5 km, with higher values occurring during austral summer. The high reflectivity values presented in Fig. 7 at levels higher than the climatological freezing level reinforce the hypothesis that the aforementioned PCT drop is caused by nonprecipitating hydrometeors.

As presented above, PCT–rainfall relationships are clearly system-type dependent, and an algorithm that attempts to use a single relationship between brightness temperatures and rainfall rate may lead to unrealistic results that are amplified on instantaneous retrievals. However, for monthly and weekly averages a single-relationship algorithm can achieve good results, as demonstrated by Adler et al. (1994) and Ferraro and Marks (1995). In the next section, a comparison between USProb and four other algorithms (GPROF, GSCAT, NESDIS_{1}, and NESDIS_{2}) is presented.

## 4. Validation

In this section, the USProb algorithm is evaluated in terms of both screening procedure efficiency and rainfall retrieval. This evaluation is based on a comparison with the rainfall products of 109 TRMM-PR orbits during October 2005 and the 45 coincident TRMM–LBA S-Pol observations during January–February 1999. Furthermore, the new scheme is compared with the well-known rainfall estimation algorithms GSCAT, NESDIS, and GPROF. The rainfall retrieval is divided into three main categories: rain volume estimates, rainfall distribution, and error analysis. The S-Pol reference data were used only in the study of the rainfall distribution and the error analysis.

### a. Screening efficiency

The screening routine efficiency is evaluated by computing the POD, FAR, and CSI indexes, as well as the raining area detected by the algorithms, against the ground truth.

Table 3 shows the POD, FAR, and CSI values for rain versus no-rain classification, using 109 TRMM orbits with the PR as the ground truth, and Fig. 8 shows the performance of the indexes as a function of the PR-observed rainfall rate. The five algorithms show an improvement in the index values as the rain rate increases. Although the NESDIS_{1} algorithm has the best POD scores, its FAR values are too high, which leads to poor CSI results until the observed rainfall reaches 10 mm h^{−1}. GSCAT shows low values of POD and high values of FAR. As a consequence, the GSCAT CSI scores are the lowest observed. The low NESDIS_{2} FAR scores show the improvement obtained by using adjusted coefficients for the area of interest. The GPROF screening routine presents the highest CSI values, and the USProb screening scheme, although not scoring the best CSI values instantaneously, agrees with GPROF after 2 mm h^{−1} because of its low values of FAR (lower than GPROF).

To determine the rain area extent associated with a cloud, the 10.8-*μ*m IR channel of TRMM-VIRS is employed to delineate the precipitating clouds. We defined a rainy cloud as a region with pixels that have a brightness temperature lower than 273 K in the IR channel, and within the IR cloud the number of raining pixels (observed by PR and estimated by each algorithm) is computed. This value of 273 K is necessary to avoid the inclusion of warm clouds in the analysis; over the continent the TMI warm cloud signatures are close to the continental background brightness temperatures and the surface noise (variability of the surface emissivity).

Figure 9 shows the scatterplots of the estimated and observed rain areas and Table 4 presents the correlation coefficient (CC), normalized bias (NBIAS), and fractional standard error (FSE) calculated for the five algorithms. These coefficients are defined as follows:

where *N* is the total number of coincident observed (*O*) and estimated (*E*) variables (in this case, clouds). CC varies from 0 to 1, where 1 represents the highest correlation between the two variables. The NBIAS represents the fraction of overestimation (>0) or underestimation (<0) with respect to the reference value. FSE is the error variance normalized by the variance of the predicted variable: the smaller the values are, the lower is the variability of the error with respect to the natural variability of rainfall (Morales and Anagnostou 2003).
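Two of these statistics can be sketched from the descriptions above. The forms used here are assumptions: CC is taken as the Pearson correlation coefficient and NBIAS as the summed difference between estimates and observations divided by the summed observations, which matches the stated sign convention. FSE is left out because its exact normalization cannot be recovered from the text alone. The function names are hypothetical.

```python
from math import sqrt


def correlation_coefficient(estimated, observed):
    """Pearson correlation between estimated and observed values."""
    n = len(observed)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / sqrt(var_e * var_o)


def normalized_bias(estimated, observed):
    """Positive for overestimation, negative for underestimation."""
    return sum(e - o for e, o in zip(estimated, observed)) / sum(observed)
```

An estimate that is exactly twice the observation everywhere gives CC = 1 and NBIAS = 1 (a 100% overestimation), which shows why a high CC alone does not imply an accurate retrieval.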

The results in Fig. 9 and Table 4 show that USProb presented the lowest NBIAS (0.023) and FSE (0.310) values and NESDIS_{1} presented the highest values (NBIAS = 1.231 and FSE = 1.264). All algorithms showed CC higher than 0.96, meaning that they are all well correlated with the PR reference. GPROF presented high NBIAS (0.773) and FSE (0.747), which indicates an overestimation of the rain area as well as high error variability. NESDIS_{2} was the only algorithm to underestimate the rain area (NBIAS = −0.245) and presented an FSE of 0.487. GSCAT presented the second-best results, with an NBIAS of 0.192 and an FSE of 0.460.

### b. Retrieval efficiency

In this section, the USProb rainfall retrieval efficiency is tested by comparing the estimated rain volumes computed for each cloud and the total estimated rainfall distributions, using the PR and S-Pol surface rain rates as the ground truth (S-Pol data were used only for comparing rainfall distributions). The error distributions were also compared for both references.

#### 1) Rain volume

Following the approach used for rain areas, the rain volume of each cloud is computed by summing, over the pixels within the cloud, the rain rate multiplied by the pixel area. Figure 10 shows the rain volume estimated by each algorithm against the PR-observed rain volumes, and Table 5 shows the CC, NBIAS, and FSE calculated for the five algorithms.
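As a worked illustration of the rain-volume computation, the sketch below sums rain rate times pixel area over the pixels of one cloud. The pixel area of 121 km^{2} is an assumption based on the approximate 11 km × 11 km grid spacing given in section 2, and the output unit (m^{3} h^{−1}) and function name are choices made here for illustration.

```python
def rain_volume_m3_per_h(rain_rates_mm_h, pixel_area_km2=121.0):
    """Sum of rain rate times pixel area over a cloud, in m^3 per hour.

    1 mm/h over 1 km^2 equals 1e-3 m/h * 1e6 m^2 = 1000 m^3/h.
    """
    return sum(rate * pixel_area_km2 * 1000.0 for rate in rain_rates_mm_h)


# Two raining pixels at 1 and 2 mm/h.
print(rain_volume_m3_per_h([1.0, 2.0]))
```

Because every raining pixel contributes one pixel area, an algorithm that misjudges the rain area carries that error directly into the rain volume.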

All algorithms produced a CC higher than 0.93, except NESDIS_{2}, with a CC equal to 0.826. The rain-volume estimate depends on the rain-area estimate, so the error statistics tend to behave in the same way; that is, algorithms that perform well on rain-area estimation tend to produce a good rain-volume estimate if their rain-rate estimates are reasonable. Hence, USProb presented the NBIAS (0.039) closest to zero and the lowest FSE (0.352), and NESDIS_{1} presented the highest NBIAS (1.176) and FSE (0.979). NESDIS_{2} underestimated the rain volume by about 25% and showed an FSE of 0.608. Again, GSCAT presented the second-best results (NBIAS = 0.199 and FSE = 0.365), and GPROF overestimated the rain volume by over 100%, with a high error variability (FSE = 0.765).

#### 2) Rainfall distributions

High-quality precipitation retrievals must also reproduce the observed rainfall distributions with comparable quality, because some applications, such as hydrological studies and flash flood prediction, require an accurate estimate of the rainfall probability density function. Accordingly, histograms were created using a bin size of 1 mm h^{−1} for PR data (Fig. 11) and 2 mm h^{−1} for S-Pol data (Fig. 12), and the RR frequencies of occurrence were weighted by the total rain volume of each dataset. S-Pol histograms use a larger bin size to compensate for the small amount of observed data.

Using the PR as a reference, the USProb and GPROF algorithms achieved the best results, but with two main differences: GPROF slightly overestimates rainfall up to 5 mm h^{−1} and shows a second peak from 28 to 39 mm h^{−1}, which leads to a bimodal distribution. The NESDIS_{1} distribution shows a quasi-linear behavior, underestimating rainfall under 15 mm h^{−1} and overestimating above this value. NESDIS_{2}, because of its adjusted coefficients, agrees better with the PR than NESDIS_{1} does, which reinforces the utility of generating a specific set of coefficients for each location instead of using a global coefficient set. GSCAT overestimates rainfall from 4 to 12 mm h^{−1}, but above 12 mm h^{−1} its results are similar to the PR distribution.

The S-Pol comparisons (Fig. 12) show discontinuous patterns because the dataset is small, comprising only 45 TRMM orbits, some with no rain at all. Even with this small dataset, however, some characteristics can be observed. GSCAT presents a strong peak at 10 mm h^{−1} and an absence of rain after 22 mm h^{−1}; the GPROF distribution is similar to the S-Pol reference until 18 mm h^{−1} but is shifted by 2 mm h^{−1}; the NESDIS_{1} histogram shows a linear decay without strong peaks; and NESDIS_{2} presents a series of peaks throughout the distribution. The USProb histogram shows reasonable agreement with the S-Pol data from 30 to 40 mm h^{−1} and is able to estimate the peaks around 50 and 65 mm h^{−1}. On the other hand, it overestimates between 3 and 14 mm h^{−1} and underestimates from 18 to 30 mm h^{−1}.

#### 3) Error distributions

As noted in the previous subsection, there are some differences among the rainfall rate distributions estimated by each algorithm. Therefore, to quantify these different error distributions, the NBIAS, FSE, and the mean error [MERR, shown in Eq. (11)] are computed and presented in Fig. 13 and Table 6 (with PR as the reference) and again in Fig. 14 and Table 7 (with S-Pol as the reference). MERR (mm h^{−1}) is defined as follows:

$$\mathrm{MERR} = \frac{1}{N}\sum_{i=1}^{N}\left(E_{i} - O_{i}\right), \qquad (11)$$

where *E* is the estimated rain rate by each algorithm, *O* is the PR or S-Pol observed rain rate, and *N* is the number of coincident pairs of estimates and observations.
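As an illustration of how these error statistics could be computed from coincident pairs of *E* and *O*, the sketch below uses the conventional mean-error form (the average of *E* − *O*, so that positive values indicate overestimation). The NBIAS and FSE expressions are commonly used forms assumed here for illustration only; the paper's own definitions are given in an earlier section and may differ. The function name `error_statistics` and the sample values are hypothetical.

```python
import numpy as np

def error_statistics(estimated, observed):
    """Return MERR, NBIAS, and FSE for paired rain rates (mm/h).

    Assumed forms (not taken from the paper's equations):
      MERR  = mean(E - O)
      NBIAS = sum(E - O) / sum(O)
      FSE   = sqrt(mean((E - O)^2)) / mean(O)
    """
    e = np.asarray(estimated, dtype=float)
    o = np.asarray(observed, dtype=float)
    diff = e - o
    return {
        "MERR": diff.mean(),
        "NBIAS": diff.sum() / o.sum(),
        "FSE": np.sqrt(np.mean(diff ** 2)) / o.mean(),
    }

# Hypothetical paired samples.
stats = error_statistics([2.0, 4.0, 6.0], [1.0, 5.0, 4.0])
```

Under these assumed forms, an NBIAS of 0.2 corresponds to a 20% overestimation of the summed rain rate.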

GPROF shows a bimodal error distribution for both ground-truth references and high NBIAS values (0.685 for PR and 1.261 for S-Pol); this may be an artifact of its convective and stratiform classifications. It also has the second-highest MERR (1.430 mm h^{−1} for PR and 0.637 mm h^{−1} for S-Pol). NESDIS_{2} shows negative NBIAS values for both references (−0.023 for PR and −0.193 for S-Pol), which indicates underestimation. Its MERRs are also negative for both references: −0.548 mm h^{−1} for PR and −0.476 mm h^{−1} for S-Pol. In contrast with NESDIS_{2}, NESDIS_{1} has the highest MERRs (5.034 mm h^{−1} for PR and 1.334 mm h^{−1} for S-Pol) and NBIASs (2.194 for PR and 2.271 for S-Pol), showing overestimations greater than 200% in both cases. GSCAT shows the smallest MERR in magnitude for the S-Pol comparison (−0.010 mm h^{−1}) and an overestimation of about 35% for both references. USProb has an overestimation of about 23% for both comparisons and the smallest MERR in magnitude (0.244 mm h^{−1}) for the PR comparison. Its MERR for the S-Pol comparison is also near zero (−0.157 mm h^{−1}). All algorithms presented a value of FSE near 1, indicating that the error variability is very high.

## 5. Raining systems characteristics

Because USProb is based on the classification of precipitating systems (Table 2), it is important to understand the main physical characteristics observed in those systems, which can help future precipitation estimation algorithms and numerical weather prediction parameterizations.

To evaluate these characteristics, the 2A25-PR measurements from 545 TRMM orbits during the period of 1 January to 30 April 1999 are employed. Table 8 presents the number of systems detected, the volume and area fractions relative to the total rain volume and area estimated, and the volume/area ratio. The large systems (classes 1 and 2) represent over 75% of the rainfall volume and over 70% of the rain area, and medium-sized systems (classes 3 and 4) share over 15% of the volume and 13% of the area estimated. The smaller systems (class 5), represented by 3290 systems, are responsible for 8.38% of the volume detected and 12.8% of the area.

Classes 1 and 3 have MLT values lower than 220 K, representing colder systems, which can be associated with strong convection within the system. Therefore, it is expected that these systems will present larger rain volumes. The volume/area ratio for cold classes is 1.76; the other classes show values lower than 0.9. This suggests that colder systems are more efficient in producing rain than the warm ones. In fact, although colder systems represent only about 30% of the rainy area, they contribute almost 52% of the rain volume.

Because the 2A25-PR product discriminates among rain types [stratiform, convective, and other, where “other” usually refers to regions of precipitation aloft with no precipitation near the surface (Schumacher and Houze 2003), with a negligible (<0.2%) contribution to total rain], we can evaluate the percentage of these types in the above systems. Because the data are interpolated into a regular grid, two different types of rain can occur within the same grid point. Therefore, the “mixed” rain type was created, which combines convective and stratiform rain types in the same grid point. Table 9 shows the statistics of these classifications weighted by the PR rain volume according to the USProb raining classification.
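The "mixed" labeling can be sketched as follows. This is a simplified illustration under stated assumptions: pixels are binned into regular grid cells rather than interpolated, the cell size is a free parameter, the function name `grid_rain_types` is hypothetical, and the precedence applied to cells that do not contain both convective and stratiform pixels is a choice made here, not one stated in the text.

```python
import math
from collections import defaultdict

def grid_rain_types(pixels, cell_size_deg):
    """Label each grid cell from the rain types of the pixels it contains.

    pixels: iterable of (lat, lon, rain_type), with rain_type one of
    'convective', 'stratiform', or 'other'.
    """
    types_per_cell = defaultdict(set)
    for lat, lon, rain_type in pixels:
        # Simple binning into regular cells (stand-in for interpolation).
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        types_per_cell[cell].add(rain_type)

    labels = {}
    for cell, types in types_per_cell.items():
        if {"convective", "stratiform"} <= types:
            labels[cell] = "mixed"  # both types fall in the same cell
        elif "convective" in types:
            labels[cell] = "convective"
        elif "stratiform" in types:
            labels[cell] = "stratiform"
        else:
            labels[cell] = "other"
    return labels

# Hypothetical pixels on a 0.5-degree grid.
labels = grid_rain_types(
    [(-10.1, -62.1, "convective"), (-10.2, -62.2, "stratiform"),
     (-11.6, -62.1, "stratiform"), (-12.7, -63.3, "other")],
    0.5,
)
```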

Cold systems (classes 1 and 3) have the largest convective area fractions, over 23% each. The class 1 convective rain fraction is lower than the stratiform rain fraction, probably because of the occurrence of stratiform precipitation associated with convective cells in MCSs. Class 3 presents a convective rain fraction of 54%, whereas classes 2 and 4 have about 2 times more stratiform rain than convective rain. Finally, class 5 has almost 54% convective rain and about 30% stratiform rain, with a convective area fraction of 21%, which can be credited to isolated cumulonimbus.

We can look at these results as a simplified convective–stratiform classification, where the classification procedure used by USProb may determine the predominance of stratiform and convective areas and rainfall. Schumacher and Houze (2003) stated that over the Amazon region the stratiform rain volume fraction is about 35% and the stratiform rain area fraction is about 75%, leading to a volume/area ratio of 0.47 for stratiform precipitation and 2.6 for convective precipitation. Our results agree with their study, even though a further look at seasonal variabilities may be necessary.
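The volume/area ratios quoted from Schumacher and Houze (2003) follow directly from the two fractions. The short check below reproduces the arithmetic; it assumes that all non-stratiform rain is convective, which neglects the small "other" category, and the function name is hypothetical.

```python
def volume_area_ratio(volume_fraction, area_fraction):
    """Ratio of a rain type's share of volume to its share of area."""
    return volume_fraction / area_fraction

STRATIFORM_VOLUME = 0.35  # stratiform rain volume fraction quoted in the text
STRATIFORM_AREA = 0.75    # stratiform rain area fraction quoted in the text

stratiform_ratio = volume_area_ratio(STRATIFORM_VOLUME, STRATIFORM_AREA)
# Assumption: the remainder of volume and area is convective.
convective_ratio = volume_area_ratio(1.0 - STRATIFORM_VOLUME, 1.0 - STRATIFORM_AREA)

print(round(stratiform_ratio, 2), round(convective_ratio, 1))  # 0.47 2.6
```

A ratio above 1 means a rain type contributes more volume than its share of the rainy area would suggest, which is the sense in which the colder, more convective systems are described as more efficient rain producers.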

## 6. Rainfall dependency: Westerly and easterly wind regimes during the TRMM–LBA 1999 field campaign

In addition to all statistical tests performed in the prior sections, it is important to evaluate the performance of the algorithms in raining systems that depend on climate conditions to determine whether they are suitable to represent these dependencies. For example, during the wet season (January and February) in the southwestern Amazon, the large-scale circulation changes the wind flow (westerly and easterly) as a result of the South Atlantic Convergence Zone (SACZ) and squall lines propagate eastward from the east coast of Brazil (Kodama 1992; Liebmann et al. 1999; Herdies et al. 2002). As a consequence, rain systems have different characteristics in each wind regime (Rickenbach et al. 2002; Williams et al. 2002). During westerly wind regimes, the presence of the SACZ can be observed, with large stratiform rain, shallow convection, and low lightning activity. During easterly wind regimes, strong or deep convection with high lightning activity is more typical, with a predominance of convective rain.

To evaluate the performance of the rainfall estimation algorithms under such different climate regimes, we used the wind regime periods computed by Rickenbach et al. (2002) (Table 10) during the TRMM–LBA 1999 field campaign. We computed the rainfall CDF for these time periods over the area of the S-Pol radar site (60°–64°W, 9°–13°S) for each of the rainfall estimation algorithms and also for the estimated S-Pol rainfall rate available during the period 13 January–21 February 1999 (Fig. 15), thus excluding the easterly 1 and westerly 3 periods.
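A rainfall CDF restricted to the S-Pol box could be computed along the following lines. This is a sketch, not the authors' procedure: the function name `rainfall_cdf_in_box`, the use of raining samples only, and the choice of evaluation thresholds are assumptions.

```python
import numpy as np

# S-Pol comparison box quoted in the text: 60-64W, 9-13S.
LON_MIN, LON_MAX = -64.0, -60.0
LAT_MIN, LAT_MAX = -13.0, -9.0

def rainfall_cdf_in_box(lat, lon, rain_rate, thresholds):
    """Empirical CDF of raining samples inside the box, at given thresholds."""
    lat, lon, rr = (np.asarray(a, dtype=float) for a in (lat, lon, rain_rate))
    in_box = (lat >= LAT_MIN) & (lat <= LAT_MAX) & (lon >= LON_MIN) & (lon <= LON_MAX)
    raining = rr[in_box & (rr > 0.0)]  # raining samples only (assumption)
    if raining.size == 0:
        return np.full(len(thresholds), np.nan)
    # Fraction of raining samples at or below each threshold.
    return np.array([(raining <= t).mean() for t in thresholds])

# Hypothetical samples: the last one lies outside the box and is ignored.
cdf = rainfall_cdf_in_box(
    lat=[-10.0, -10.0, -10.0, -20.0],
    lon=[-62.0, -62.0, -62.0, -62.0],
    rain_rate=[2.0, 8.0, 30.0, 50.0],
    thresholds=[5.0, 10.0, 40.0],
)
```

Running this separately on the samples from each wind-regime period would give one CDF per regime and per algorithm.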

As pointed out by Rickenbach et al. (2002) and Anagnostou and Morales (2002), the easterly wind regimes have higher rainfall rates, as observed in the S-Pol rainfall CDF, with 15% of the rainfall rates above 40 mm h^{−1}; in the westerly wind regime, the rainfall rate does not exceed 38 mm h^{−1}.

Comparing the S-Pol distribution with those of the algorithms shows that the USProb results agree with the observed S-Pol rainfall distribution for the easterly periods: the cumulative frequencies for 5, 10, 20, 30, and 40 mm h^{−1} present a maximum absolute error of 0.05 when compared with the S-Pol distribution. During the westerly regimes, USProb underestimates for rainfall rates above 5 mm h^{−1}, with its maximum value at 28 mm h^{−1}. NESDIS_{1} overestimates for both wind regimes, but it presents the best estimate for the westerly flow when compared with the other algorithms. For NESDIS_{2}, GSCAT, and GPROF, the westerly CDFs are underestimated and do not present rain occurrence above 15 mm h^{−1}; in the easterly regime, NESDIS_{2} and GSCAT reach their maxima at 34 and 25 mm h^{−1}, respectively, and therefore also underestimate. GSCAT presents higher rainfall occurrence in the easterly regime than in the westerly flow for rainfall rates lower than 8 mm h^{−1}, which does not agree with the observed results. For GPROF, the estimates during the easterly regime are very similar to those of USProb, with only a slight underestimation.
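The maximum absolute error between two cumulative distributions at the quoted thresholds can be evaluated as sketched below. The function names and sample values are hypothetical, and the empirical CDF is taken over whatever samples are passed in, so any screening of non-raining samples is assumed to happen beforehand.

```python
import numpy as np

# Rain-rate thresholds (mm/h) quoted in the text for the comparison.
THRESHOLDS = (5.0, 10.0, 20.0, 30.0, 40.0)

def empirical_cdf(rain_rates, thresholds=THRESHOLDS):
    """Fraction of samples at or below each threshold."""
    rr = np.sort(np.asarray(rain_rates, dtype=float))
    return np.searchsorted(rr, thresholds, side="right") / rr.size

def max_abs_cdf_error(estimated, observed, thresholds=THRESHOLDS):
    """Largest absolute CDF difference over the thresholds."""
    diff = empirical_cdf(estimated, thresholds) - empirical_cdf(observed, thresholds)
    return float(np.max(np.abs(diff)))

# Hypothetical rain-rate samples for an algorithm and for the reference.
err = max_abs_cdf_error([2.0, 4.0, 12.0, 25.0, 35.0], [1.0, 6.0, 12.0, 25.0, 45.0])
```

With real data, a value of 0.05 or less from this kind of comparison would correspond to the agreement reported for USProb in the easterly periods.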

## 7. Conclusions

In this paper we presented USProb, a new algorithm for rainfall retrieval over the Amazon basin region. Tests and comparisons with GSCAT, NESDIS, and GPROF showed that USProb achieved good results for instantaneous rainfall retrieval when comparing all algorithms with two different ground-truth datasets, PR and S-Pol.

When comparing the rain areas and rainfall volumes, USProb showed a high correlation (CC > 0.9) and low bias (NBIAS < 0.04) and presented the lowest FSE for both comparisons (0.310 for PR and 0.352 for S-Pol). GSCAT has the second-best NBIAS and FSE values. NESDIS_{1} presented NBIAS and FSE near 1 for both comparisons, indicating an overestimation of more than 100% on rain areas and volumes. NESDIS_{2}, on the other hand, presented better results for both comparisons because of its adjusted coefficient sets. Kidd et al. (1998) concluded that any rainfall estimation algorithm that is applied in different conditions requires some adjustment, which supports the better results obtained by NESDIS_{2} when compared with NESDIS_{1}.

USProb and GPROF rainfall histograms showed similar agreement with the PR reference, but GPROF showed a bimodal-type distribution. The error analysis showed that NESDIS_{1} had the highest MERR (5.034 mm h^{−1} for PR and 1.334 mm h^{−1} for S-Pol), NBIAS, and FSE (>2 for both comparisons); and NESDIS_{2} presented the lowest NBIAS (−0.023 for PR and −0.193 for S-Pol) and had an MERR near −0.5 mm h^{−1}. This result reinforces the need for a recalibration of the algorithms depending on the region where they are applied.

GSCAT overestimated rain by about 35% for both references and produced an MERR of 0.358 mm h^{−1} for the PR comparison and −0.010 mm h^{−1} for the S-Pol comparison; the latter is the best MERR value found in this study. GPROF was highly biased, overestimating rain by 126% for S-Pol and 68% for PR. All algorithms presented FSE values near 1 for both references, except NESDIS_{1}, with an FSE of about 2. These high error variabilities demonstrate how difficult it is to estimate rain on an instantaneous basis. USProb overestimated rain by about 23% for both references and had an MERR magnitude of <0.25 mm h^{−1}, with its error distributions presenting a Gaussian shape centered around zero for both PR and S-Pol references.

According to the USProb system classification analysis, it was found that colder systems have a higher rain volume/area ratio (1.76) than the warmer systems (<0.7). These low volume/area ratio values presented by classes 2 and 4 show that warmer systems are less efficient rain producers, a characteristic mainly attributed to the larger area of stratiform rain of the warmer systems when compared with the colder ones. Smaller systems (class 5) that do not use the temperature classification presented more than 50% of the rain classified as convective, probably due to isolated cumulonimbus. Additionally, the class 5 volume/area ratio (0.85) is higher than that of the warm systems.

The main westerly and easterly wind regime rainfall characteristics (e.g., lower rain rates for westerly periods) were correctly captured by all algorithms except for GSCAT. NESDIS_{1} presented the best westerly estimate, and USProb showed the best easterly estimate. NESDIS_{2} and GSCAT did not present good results, especially in the westerly regime. GPROF presented lower rain rates than S-Pol for both regimes, but not as low as NESDIS_{2} and GSCAT.

Summarizing the results, we conclude that USProb is more compatible and efficient in retrieving surface rainfall over the region of interest on an instantaneous basis than the other algorithms tested, which is supported by the rain classification analysis and the correct representation of Amazon rainfall east–west variability. Further studies are needed to compare USProb with the NESDIS, GSCAT, and GPROF algorithms at monthly scales, given that they achieved good monthly precipitation estimate results, as demonstrated by Adler et al. (2001).

## Acknowledgments

This research was supported by FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo), Grant 03/10310-5, and was developed at the Departamento de Ciências Atmosféricas, Instituto de Astronomia, Geofísica e Ciências Atmosféricas, Universidade de São Paulo.

## Footnotes

*Corresponding author address:* Thiago S. Biscaro, DSA/CPTEC/INPE, Rod. Presidente Dutra km 40, Cachoeira Paulista, São Paulo 12630-000, Brazil. Email: biscaro@cptec.inpe.br