Abstract

This study compares two nonparametric rainfall data merging methods—the mean bias correction and double-kernel smoothing—with two geostatistical methods—kriging with external drift and Bayesian combination—for optimizing the hydrometeorological performance of a satellite-based precipitation product over a mesoscale tropical Andean watershed in Peru. The analysis is conducted using 11 years of daily time series from the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) research product (also TRMM 3B42) and 173 rain gauges from the national weather station network. The results are assessed using 1) a cross-validation procedure and 2) a catchment water balance analysis and hydrological modeling. It is found that the double-kernel smoothing method delivered the most consistent improvement over the original satellite product in both the cross-validation and hydrological evaluation. The mean bias correction also improved hydrological performance scores, particularly at the subbasin scale where the rain gauge density is higher. Given the spatial heterogeneity of the climate, the size of the modeled catchment, and the sparsity of data, it is concluded that nonparametric merging methods can perform as well as or better than more complex geostatistical methods, whose assumptions may not hold under the studied conditions. Based on these results, a systematic approach to the selection of a satellite–rain gauge data merging technique is proposed that is based on data characteristics. Finally, the underperformance of an ordinary kriging interpolation of the rain gauge data, compared to TMPA and other merged products, supports the use of satellite-based products over gridded rain gauge products that utilize sparse data for hydrological modeling at large scales.

1. Introduction

Hydrological studies rely on the quality of rainfall estimates to produce meaningful modeling output. Rain gauges can deliver accurate point measurements, but their poor ability to describe the spatial structure of rainfall can be a major limitation when precipitation fields are required, for example, in distributed hydrological modeling applications. This problem is more severe in tropical regions because of high rainfall variability and scarce data conditions.

Satellite-based estimates such as the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA; also TRMM 3B42) product are becoming increasingly attractive as an alternative source of forcing data in data-sparse regions, although their application can be limited by quantitative inaccuracies (Zulkafli et al. 2014a; Anagnostou et al. 2010; Tian et al. 2007). Information from a large number of rain gauges is already assimilated as part of global/regional satellite algorithms; nevertheless, the rain gauge databases sourced by these procedures can exclude more extensive national networks where data accessibility is restrictive, as often is the case in developing countries. In these regions, the global precipitation product may be found to be unsatisfactory and requiring local adjustment (e.g., Lavado-Casimiro et al. 2009).

Satellite algorithms are known to internally perform gauge correction, for example, using a mean-field bias correction (MBC) and/or inverse-error-weighted averaging methods (Huffman et al. 1997; Grimes et al. 1999). Postanalysis merging methods have also been used on the products of these algorithms, for example, the spatial adjustment of TMPA using interpolations by inverse distance weighting (Lavado-Casimiro et al. 2009), double-kernel smoothing (DS; Li and Shao 2010), and the nearest neighbor method (Vila et al. 2009); correction through regression analysis between the satellite- and rain gauge–based precipitation at various temporal scales, for example, climatologies in Almazroui (2011) and monthly in Yin et al. (2008); and correction using probability distributions (Anagnostou et al. 1999). Geostatistical methods have also been used such as the kriging with external drift (KED) to combine gauge and 10-day (dekad) IR-based precipitation data from Meteosat (Grimes et al. 1999) and the cokriging approach to interpolate daily rain gauge data with the GPCP multisatellite precipitation estimates as covariates (Kottek and Rubel 2008). More recently, Heidinger et al. (2012) performed a wavelet analysis on the signals from daily rain gauge and TMPA time series and reconstructed a combined product by overlaying short-term noise from the gauge signal on the long-term trends in the TMPA signal.

The methods applied to satellite data have origins in the radar research community, for example, MBC [references in Gjertsen et al. (2004)], spatial adjustment (Moore et al. 1989; Wood et al. 2000), KED (Haberlandt 2007; Erdin 2009; Schiemann et al. 2011; Goudenhoofdt and Delobbe 2009), and cokriging (Krajewski 1987). A particularly promising technique that has not been largely applied to satellite-based rainfall estimation is the Bayesian combination (BC) method proposed by Todini (2001). This method quantifies the estimation uncertainties of precipitation data measured by different sensors (e.g., satellite, radar, and ground rain gauges), then combines these data such that the overall estimation uncertainty formulated in terms of estimation error covariances is minimized. Promising results from this method have been obtained from both numerical examples (Mazzetti and Todini 2004) and case studies in a small-scale densely gauged urban catchment (Wang et al. 2013).

Despite the wealth of merging algorithms at our disposal, there is not a clear guideline as to which method would be optimal given the data characteristics (density, spatial bias, temporal resolution, etc.). In this study, we compare two nonparametric rainfall data merging methods—MBC and DS—with two geostatistical methods—KED and BC (untested for satellite applications)—over a mesoscale tropical Andean catchment in Peru. The analysis is conducted using 11 years of available data from the TMPA, version 7, research product and 173 rain gauges in the national weather station network of Peru. The results are assessed through a cross-validation procedure and supplemented by a hydrological evaluation using the catchment water balance and a hydrological model. The aim is to highlight the strengths and weaknesses of each of the methods and provide some guidelines for a preliminary selection of methods in the context of data-sparse hydrological applications.

2. Data and methods

a. Study area

This study focuses on a mesoscale river basin that spans the Peruvian Andes and the Amazon floodplains (Fig. 1). The Peruvian Amazon (locally Marañón) River basin extends from latitude 1° to 11°S and from longitude 80° to 74°W, covering about 360 000 km2. The altitude of the basin ranges from well above 4000 m MSL in the western boundary in the Andes to the Amazonian floodplains to the east.

Fig. 1.

The Marañón basin and three subbasins. The map includes the regional topography, the main river network, the discharge stations, and the rain gauges.

Fig. 1.

The Marañón basin and three subbasins. The map includes the regional topography, the main river network, the discharge stations, and the rain gauges.

The downstream limit of the hydrological basin is defined by the gauging station at San Regis (4.4°S, 74°W). Upstream, the river is gauged at three other stations: Borja, Santiago, and Chazuta. Farther downstream, the Marañón River merges with the Ucayali River to form the Amazon River just before reaching the city of Iquitos. The basin covers the Pacaya–Samiria National Reserve, which is the largest floodable forest reserve in Peru and is an area of high ecological significance.

A complex climatic pattern is present in the region because of the interaction between various synoptic meteorological processes and the orographic effect of the Andes. In general, rainfall is particularly high in regions close to the equator (more than 2000 mm yr−1) because of the passage of the intertropical convergence zone (ITCZ) and decreases toward the tropics (Espinoza Villar et al. 2009). Additionally, humid Atlantic trade winds flow east–west across the vast rain forest and form an intense orographic belt along the eastern Andes as they encounter the first slopes (Espinoza Villar et al. 2009; Garreaud et al. 2009; Bookhagen and Strecker 2008). They also deviate southward, transporting moist air toward the subtropics. In the highlands, the Andean landscape causes shielding and moisture intensification resulting in various microclimates (Buytaert et al. 2006a; Rollenbeck and Bendix 2011).

b. Precipitation data

The satellite-based product is obtained from the TMPA product, in its latest available version 7, released in November 2012. The TMPA product combines observations from multiple satellites, including TRMM and other geostationary satellites (Huffman and Bolvin 2013). The performance of this dataset in this study area has been previously assessed by Zulkafli et al. (2014a). Precipitation estimates from 1998 to 2008 were obtained with a 3-h temporal resolution and 0.25° × 0.25° (approximately 27.8 km × 27.8 km) spatial resolution. Values were subsequently aggregated to the daily scale in order to match the temporal resolution of the rain gauge records.

The rain gauge dataset is the historical rainfall time series obtained from the Peruvian National Meteorological and Hydrological Service [Servicio Nacional de Meteorología e Hidrología del Perú (SENAMHI)]. This includes daily rainfall amounts in millimeters for 11 years between January 1998 and December 2008 from 173 recording stations located within the satellite domain. Their locations are shown in Fig. 1. The area covered by the rain gauges is about 1 100 000 km2, which translates to an average network density of less than one rain gauge per 5000 km2. The gauges are particularly clustered around towns and along rivers, providing higher densities in regions of easier access. The highlands are also better represented compared to the lowlands, as the average rain gauge altitude of 1560 m MSL indicates (Table 1).

Table 1.

Key characteristics of the rain gauge dataset. Values are indicative of the available period (1998–2008).

Key characteristics of the rain gauge dataset. Values are indicative of the available period (1998–2008).
Key characteristics of the rain gauge dataset. Values are indicative of the available period (1998–2008).

c. Merging methods

The methods compared are 1) MBC, 2) DS, 3) ordinary kriging (OK; rain gauge interpolation only), 4) KED, and 5) BC.

1) Mean bias correction

This method corrects the satellite-based product by the total multiplicative bias between the rain gauge estimates and the collocated satellite values, thus assuming a uniform bias over the spatial domain. For each daily event (i.e., each time step), a correction factor B is thus calculated using the following equation:

 
formula

where N is the number of available gauges inside the satellite domain, and and are the gauge and satellite daily rainfall values corresponding to gauged location j.

2) Double-kernel smoothing

The nonparametric double-kernel smoothing is a technique developed by Li and Shao (2010) specifically for data-sparse applications. The main idea of this method is the interpolation of the point residuals using the kernel density function and the adjustment of the satellite field by the estimated residual field. The point residual at the given gauged location j = 1, … , N is defined to be

 
formula

A first-level interpolation is performed to generate a full set of residuals (pseudoresiduals) on the same grid as the satellite data. At the given gridpoint location i = 1, … , M, the pseudoresidual is defined to be:

 
formula

where is the Euclidean norm and is the Kernel function defined as a Gaussian kernel following Li and Shao (2010):

 
formula

The variable H is the position of the points, and the bandwidth b is determined using Silverman’s rule of thumb (Silverman 1998):

 
formula

where n is the number of samples and is the standard deviation of samples. A second-level interpolation is applied on both the residuals and pseudoresiduals to estimate the final error field :

 
formula

and the merged product at point k is calculated by subtracting to the satellite estimate the corresponding error :

 
formula

The kernel smoothing (interpolation) of the residuals does not rely on the stationary assumption, as is the case for geostatistical methods. The formulation is such that the product of the merging will converge toward the rain gauge estimates with decreased distance toward the ground observations.

3) Ordinary kriging

Among kriging estimation methods, the ordinary kriging provides the best unbiased linear estimates of point values or block averages (where “best” means minimum variance). The basic assumption of the OK is that data values can be characterized using a stationary random variable with an unknown mean (Müller 2007). Here, the OK is used to produce the rainfall estimates at each satellite grid location through the interpolation of discrete (point or grid averaged) rain gauge measurements. These spatially interpolated rain gauge estimates, along with the original satellite product, then serve as the baseline for comparison with the merged products.

Ordinary kriging uses the (semi) variogram to characterize the spatial association of data values at different locations. The definition of the (isotropic) semivariogram is given as follows:

 
formula

where is the semivariance of the random variable Z between two points with distance h. Note that does not depend on the position x, but it is only a function of distance h, thus assuming stationarity of the variance of the differences separated by the same distance. In this case, we also assume the isotropy of the model, since is not function of the direction. The variogram can be empirically estimated from available observations by computing the semivariance for several classes of distances between gauges. This is known as experimental or empirical variogram.

Once the experimental variogram is estimated, it is possible to fit a theoretical variogram model to it. In the literature there are a variety of different models: see de Marsily (1986) for a complete list. In this study, the exponential model was found to produce the best fit:

 
formula

where a denotes the range, the distance after which the variogram reaches its asymptotic value, the sill , so that the correlation between farther apart gauges is approximately equal to zero. A nugget effect can be added to Eq. (9) to represent any residual variance not explained by distance at an infinitesimally small separation, due to measurement and epistemic errors.

The unknown precipitation value at location is estimated as a linear combination of the precipitation values at known points , weighted by the semivariogram function :

 
formula

The only additional constraint is that the weights have to add up to unity:

 
formula

This condition is necessary to guarantee an unbiased estimator (de Marsily 1986). Equation (10), subjected to Eq. (11), is solved by minimizing the variance of the estimation errors , which are assumed to have a Gaussian distribution. The resulting OK equation is thus the following:

 
formula

where is the Lagrange multiplier used to fulfill the unbiased condition [Eq. (11)]. Equation (12) can be rewritten in the matrix form , so that the unknown x represents the set of weights in Eq. (10), is the semivariogram matrix with a term for each pairs of gauges, and b is the semivariogram vector between all gauges and the prediction point :

 
formula

where denotes , the semivariogram value between two known points i and j.

The way the equation is set up is such that the solution may include negative weights, which can distort the final estimated values. Hence, an a posteriori correction to the weights is performed according to the algorithm proposed by Deutsch (1996).

4) Kriging with external drift

KED is an extension of the OK whereby nonstationarity is assumed and represented by the drift term, which in our case is the satellite-based estimates.

The method implemented is as described in Hewitt (2012). KED requires the satellite value at the prediction location to be equal to the weighted average of the satellite values at gauged locations :

 
formula

This additional condition is included in the error minimization equation [Eq. (12)] to produce the following:

 
formula

In matrix format, this equation is

 
formula

where denotes , the satellite-based estimation at location i. For the interpolation, and are used solely for their covariance for weighting values of ; the actual values of do not feed into the final estimates.

5) Bayesian combination

The theory underpinning the Bayesian combination method is described in Todini (2001) and PROGEA Srl (2009) and summarized here. This method first uses the block kriging method to generate interpolated rain gauge rainfall estimates at each satellite grid location and to estimate the associated estimation error covariance (i.e., estimation uncertainty).

After kriging interpolation, the Kalman filter (Kalman 1960) is then employed to merge these estimates with the coincidental satellite rainfall product by comparing the uncertainty of these two data sources. Here the merged product becomes the a posteriori estimate , which is defined as

 
formula

where is the mean of the satellite–rain gauge bias [i.e., E(ZGZS)] and is the Kalman gain. The Kalman gain represents the relative measure of uncertainty and is defined as follows:

 
formula

The uncertainty associated with the satellite estimates is represented by the covariance matrix of the satellite error field (i.e., ), while the uncertainty associated with the rain gauge rainfall estimates is the error variance produced in the kriging interpolation. The variable increases (decreases) as is larger (smaller) compared to , and consequently, the filter will weigh more in favor of the rain gauge (satellite) estimates in the combination.

The Bayesian combination method was initially implemented using the commercial software RAINMUSIC (PROGEA Srl 2009). As the software is not open source, the default settings of the method cannot be changed, for example, the theoretical variogram model for the kriging interpolation has to be Gaussian. A MATLAB script was thus implemented to execute the Bayesian combination with several changes:

  1. Instead of the Gaussian variogram model, the exponential variogram model was found to provide a better fit to the experimental variogram for the kriging interpolation.

  2. Instead of using the block kriging method, the OK method was used to interpolate grid-averaged rain gauge data. The main difference between the ordinary and block kriging methods lies in the way variograms are calculated from data. The former calculates the variance between two points, while the latter calculates the variance between point and block (which is the average of the point-to-point variances).

To distinguish the two different implementations, the RAINMUSIC Bayesian combination will be subscripted as BCR, whereas the replicate (MATLAB) model is annotated as BC.

d. Meteorological evaluation

Since we do not know the true rainfall value at a given grid point, we employed collocated rain gauge observations as a first approximation in order to assess the accuracy of the various merged precipitation products. The reader should nonetheless be aware of the scale mismatch that exists between rain gauge point measurements and 0.25° satellite cells (approximately 773 km2 in TMPA). However, in our context of a comparative evaluation, the pixel-to-point evaluation is implemented in the same fashion across all four merging methods and therefore is not expected to affect the relative performance.

The collocation of a rain gauge to a satellite grid cell was performed using nearest neighbor approximation, such that rain gauges are associated to the centers of the nearest grid cells. If multiple gauges are present over the same cell, their estimates are averaged to produce a presumably more representative value.

The satellite–gauge data merging was tested in a 10-fold cross-validation scheme. The cross validation was performed at the daily time step for 141 days of rain, during which more than 75% of the rain gauges recorded a rain event.

The goodness-of-fit between the merged products and the corresponding pixel-average rain gauge values was evaluated in terms of the mean absolute error MAE, mean error ME, root-mean-square error RMSE, Nash–Sutcliffe efficiency NSE, and percent bias (relative bias in percent).

An analysis of error components such as false alarms and missed events is not included. This would require a detailed discussion of each of the merging techniques on those performance indices, which is beyond the scope of this study. From a hydrological modeling perspective, this should not represent an issue given the large scale of the studied basins. Both false alarms and missed events are typically dominated by low-intensity or very localized events (covering part of the pixel but missed by the rain gauge or vice versa) which have only a limited contribution to streamflow. If they indeed have a hydrological impact (e.g., on the water balance and flow magnitude), then this should be captured in the performance metrics included in the analysis. However, for smaller-scale applications, such error components might have an impact. The reader is referred to Maggioni et al. (2013) for a related analysis.

e. Hydrological evaluation

A first-level hydrological assessment of the data quality was conducted using the catchment water balance with the runoff ratio RR defined as the ratio of the precipitation that contributes to runoff:

 
formula

The RR values calculated using the different outputs from the merging were compared to known values from the literature (Campling et al. 2002; Buytaert et al. 2006b; Rollenbeck and Anhuf 2007).

The merged precipitation products were subsequently used to drive a hydrological model. The Joint UK Land Environment Simulator (JULES; Best et al. 2011), which is a physics-based community model derived from the Met Office land surface scheme, was parameterized over 2040 grid boxes of 0.125° × 0.125° (approximately 14 km × 14 km) in the catchment using best-available land cover and soil data to generate runoff. The runoff was then routed offline along the river network using a delay function to simulate streamflows at various points in the river basin. The river network was developed using hydrographic data (90-m resolution) from the Hydrological Data and Maps Based on Shuttle Elevation Derivatives at Multiple Scales (HydroSHEDS; Lehner et al. 2008). Model optimization was only used to obtain the routing parameters (flood wave celerities); any inconsistent model compensation of the precipitation errors across the merging products is not expected. For further description of the model, the reader is referred to Zulkafli et al. (2013).

The impact of merged precipitation forcing data on streamflow simulations is assessed by comparison with observed daily discharges. The evaluation is done at four streamflow gauging stations, identified in Fig. 1, over an 11-yr period (1998–2008). Streamflow data were obtained through Geodynamical, Hydrological, and Biogeochemical Control of Erosion/Alteration and Material Transport in the Amazon Basin (HYBAM) from SENAMHI and the Instituto Nacional de Meteorología e Hidrología, Ecuador (INAMHI), monitoring networks. The goodness-of-fit was evaluated in terms of the percent bias (relative bias in percent), correlation, RMSE, and NSE at the daily scale.

f. Software

The merging algorithms in this study were originally written in MATLAB. Subsequently, these have been rewritten in R and made publicly available on GitHub (Zulkafli et al. 2014b).

3. Results

a. Spatial variability of precipitation

A spatial analysis of the mean annual precipitation across the merged products is presented in Fig. 2. The figure shows MBC and DS retaining a high degree of information from TMPA, whereas KED and BCR resemble more closely the OK field. These patterns from MBC, DS, and KED are reasonable, given that the MBC and DS methods consider the entire satellite rainfall field information in the adjustment process, while KED only uses the satellite information at gauging locations as an external drift term. However, the pattern produced by BCR is inconsistent with expectations and thus calls for further analysis.

Fig. 2.

Annual mean precipitation of the TMPA, OK, and merged products (1998–2008). The black line delimits the San Regis basin and the silver lines are country borders. The black dots indicate the locations of gauges used in the data merging. The Andean range follows the western boundary of the basin.

Fig. 2.

Annual mean precipitation of the TMPA, OK, and merged products (1998–2008). The black line delimits the San Regis basin and the silver lines are country borders. The black dots indicate the locations of gauges used in the data merging. The Andean range follows the western boundary of the basin.

The BC method, in theory, is expected to provide an estimate in between the TMPA and gauge estimates depending on the degree of reliability in either data. However, the resulting BCR field suggests that a significantly high weight was given to the block kriged estimates (i.e., measurements), which could be attributed to the (estimation) error covariance of the block kriged rain gauge estimates being much smaller than that of the TMPA rainfall estimates. Through the comparison of the experimental variograms and the associated theoretical variogram models fitted by the RAINMUSIC software (not shown here), it was found that the theoretical variograms consistently underestimated the sill and/or overestimated the range, and as such, underestimated the error covariance of block kriging estimates. This relation between theoretical variogram fitting and (estimation) error covariance can be demonstrated through a sensitivity analysis (see Fig. 3). It can be observed that the faulty estimation of variogram parameters could result in extensive areas of low error variance (Fig. 3b), and consequently, the preferential use of gauge estimates versus satellite estimates in ungauged areas of the field (Fig. 3c). The reason for this poor fit by RAINMUSIC could be due to the use of point-to-block variogram estimation, which may oversmooth the spatial variability of rain gauge measurements over the satellite’s coarse grids. Caution is therefore required for satellite applications of this method, as the spatial resolution of existing satellite data is still limited. The replicate BC algorithm, which used an OK instead of block kriging for the rain gauge interpolation, yields substantially better results, as are evidenced by a better retention of the a priori spatial fields by the merged product (Fig. 2g). Based on this, the BCR estimates are excluded from further analysis.

Fig. 3.

The sensitivity analysis of BC’s variogram parameters performed on 10 Jan 1998. The figure shows (a) three different variogram parameterizations: (left) good model fit, (middle) overestimated range, and (right) underestimated sill; (b) the corresponding error variance field; and (c) merged products in comparison to the original (d) TMPA and BK fields.

Fig. 3.

The sensitivity analysis of BC’s variogram parameters performed on 10 Jan 1998. The figure shows (a) three different variogram parameterizations: (left) good model fit, (middle) overestimated range, and (right) underestimated sill; (b) the corresponding error variance field; and (c) merged products in comparison to the original (d) TMPA and BK fields.

Meanwhile, MBC and DS produced slightly varying results, and this can be better explained using a map of the residuals between TMPA and the merged products (Fig. 4). The OK residual field (Fig. 4a) highlights that TMPA estimates are closer to the OK values in the mountains than they are in the lowlands. MBC (Fig. 4b) averages the bias over the entire spatial domain, effectively reducing the intensity of the gauge correction to the north and the east, as is shown by the distinctly smooth residual field. On the other hand, with DS, the kernel effect can be observed (Fig. 4c). Strong positive residuals are observed in the northwest and likewise negative residuals in the northeast that are an extension of the residuals calculated from the nearest group of gauges.

Fig. 4.

Annual mean residual between merged products (including OK) and TMPA (1998–2008). The black line delimits the San Regis basin and the silver lines are country borders. The black dots indicate the locations of gauges used in the data merging. The Andean range follows the western boundary of the basin.

Fig. 4.

Annual mean residual between merged products (including OK) and TMPA (1998–2008). The black line delimits the San Regis basin and the silver lines are country borders. The black dots indicate the locations of gauges used in the data merging. The Andean range follows the western boundary of the basin.

b. Cross validation against reference rain gauge estimates

The summary of performance scores (Table 2) indicated improvements in the merged product when compared to TMPA and OK data. Values of MAE, RMSE, ME, and NSE were found to improve in all of the methods, and the KED provided the best scores, followed by DS and BC.

Table 2.

Performance scores from the cross validation of the merging analysis calculated over 141 wet days between 1998 and 2008. The percent indicates a score relative to the TMPA score. The average precipitation intensity at the rain gauges is 10.3 mm day−1. The best scores are in boldface.

Performance scores from the cross validation of the merging analysis calculated over 141 wet days between 1998 and 2008. The percent indicates a score relative to the TMPA score. The average precipitation intensity at the rain gauges is 10.3 mm day−1. The best scores are in boldface.
Performance scores from the cross validation of the merging analysis calculated over 141 wet days between 1998 and 2008. The percent indicates a score relative to the TMPA score. The average precipitation intensity at the rain gauges is 10.3 mm day−1. The best scores are in boldface.

The spatial distribution of these scores across the study area reveals some insights into the relative performances between the merging methods. Figure 5 shows the percent bias, which is the ME normalized by the station average daily rainfall. MBC retains much of TMPA’s negative bias whereas DS improves along the Andes. This suggests that the latter has a better capability for local correction, which is highly important for such a complex environment as the Andes. To a lesser extent, both methods also achieve correction in the lowlands, where the original TMPA is expected to already perform well. Given OK’s tendency to converge to rain gauge measurements and that the meteorological evaluation takes rain gauge values as reference, OK performs well in general but in particular in the lowlands, and KED follows the same spatial pattern. This is also consistent with the expectation that spatial correlation is higher in the lowlands than it is in the mountains. Moreover, OK and KED perform better in highly sampled regions, which is discernible in Fig. 6 in terms of higher modeling efficiency values at the centers of rain gauge clusters. Finally, this figure further suggests that the best merging products overall are achieved using OK, KED, and DS. This is in spite of OK and KED’s poor representation of the spatial patterns of the annual mean precipitation over the entire merging field. This highlights another limitation of the meteorological cross-validation framework when applied to large areas with an uneven rain gauge density distribution.

Fig. 5.

The percent bias (%) between merging product and rain gauge time series at all cross-validation points.

Fig. 5.

The percent bias (%) between merging product and rain gauge time series at all cross-validation points.

Fig. 6.

The NSE between merging product and rain gauge time series at all cross-validation points.

Fig. 6.

The NSE between merging product and rain gauge time series at all cross-validation points.

c. Hydrological evaluation: Impact on the water balance and hydrological modeling at basin and subbasin scales

Table 3 presents a hydrological evaluation of the merging algorithms and shows several merging algorithms having a positive impact on the water balance. The most prominent improvements are seen with the DS, MBC, and (to a lesser degree) the KED methods in the upper Andean basins Borja and Santiago, where the runoff ratios are closer to the reference runoff ratio value of 0.7. In San Regis, the DS and MBC are similarly successful in reducing the water balance errors whereas KED increases them. In Chazuta, improvements are gained with the BC and DS methods, although the highest improvement in this subbasin is achieved with MBCS, which is an additional analysis that was run by excluding the rain gauges located outside the subbasin. In contrast, OK exacerbates the errors in all basins compared to TMPA.

Table 3.

Spatial trends in the average runoff ratios calculated using observed streamflows and precipitation from gauge-adjusted TMPA. The expected value for humid tropical regions is between 0.6 and 0.7.

Spatial trends in the average runoff ratios calculated using observed streamflows and precipitation from gauge-adjusted TMPA. The expected value for humid tropical regions is between 0.6 and 0.7.
Spatial trends in the average runoff ratios calculated using observed streamflows and precipitation from gauge-adjusted TMPA. The expected value for humid tropical regions is between 0.6 and 0.7.

The hydrological modeling results are summarized in terms of the performance scores across the basins. Figure 7 shows RMSE and relative bias reductions as well as NSE increases (compared to TMPA) by MBC and DS, and additionally by BC in Chazuta subbasin. In contrast, OK, KED, and BC (except in Chazuta) show generally poorer scores for all four indices—percent bias, RMSE, correlation, and NSE—compared to TMPA. In Borja and Santiago subbasins, the DS method also resulted in above zero model efficiency. Over the entire modeled time series, the NSE improvements range from 0.1 to 0.5, with the higher increases achieved at Santiago and Borja.

Fig. 7.

Performance scores of daily streamflow using TMPA vs TMPA-gauge-adjusted products.

Fig. 7.

Performance scores of daily streamflow using TMPA vs TMPA-gauge-adjusted products.

4. Discussion and conclusions

The results of this case study suggest that the relative performances of the rainfall merging methods are strongly influenced by the rain gauge density over the domain of analysis. Overall, the DS method followed by the MBC method delivered the most consistent improvement over the satellite product in both cross-validation and hydrological verification performance scores. Although the geostatistical methods, that is, KED and BC, did not result in a good hydrological performance, they nonetheless performed well in the cross validation and showed the potential to produce valuable merged rainfall estimates if employed in sufficiently gauged areas. Finally, the underperformance of methods such as OK and KED in the hydrological evaluation, which either fully or heavily rely on rain gauge precipitation, supports the use of satellite-based products over gridded rain gauge products that utilize sparse data for hydrological modeling at large scales.

Guided by these observations, we propose in Fig. 8 a guideline for the selection of a merging method given data characteristics and constraint and discuss this in the following.

Fig. 8.

Decision flowchart for merging methods selection based on data characteristics.

Fig. 8.

Decision flowchart for merging methods selection based on data characteristics.

The rain gauge network density in the studied basin should be considered when choosing the optimal merging method. In relatively well gauged basins (less than 250 km2 per gauge), the interpolation of gauge data only is believed to be sufficiently reliable for water resources applications (World Meteorological Organization 1994; Chappell et al. 2013), and simpler kriging methods such OK are regarded as accurate interpolation techniques for average daily rainfall (Collischonn et al. 2008; Buytaert et al. 2006a). However, large-scale applications might violate the assumption of stationarity (a constant mean) in OK because of the nature of the spatial variability of precipitation. KED provides a solution to this by introducing the external drift as observed by the satellite measurements.

In moderately gauged basins (between 250 and 1500 km2 per gauge), merging methods based on geostatistics such as BC or KED provide valuable rainfall estimates, as demonstrated in the Chazuta subbasin. However, geostatistical methods may still be affected by a number of practical limitations and fundamental assumptions. For example, the Gaussian distribution assumption made during the variance minimization in the weights estimation does not particularly hold for applications at fine temporal scale, as daily precipitation data are typically positively skewed, unlike monthly accumulations, which are more normally distributed. Ways around this problem are widely discussed in the literature, for example, through data transformation methods (e.g., Müller 2007).

In poorly gauged basins (more than 1500 km2 per gauge), geostatistical merging methods have delivered unsatisfactory results because of a strong reliance on rain gauge observations over valuable satellite information in ungauged areas. A possible cause could be the assumption of a known semivariogram common to all kriging estimators. Since the true semivariogram is unknown in reality, the experimental semivariogram is used instead in order to estimate the semivariogram parameters from the data. As far as rainfall fields are concerned, nonzero precipitation events tend to have a spatial footprint that negatively correlates to the intensity of the event (e.g., small-scale convective vs large-scale stratiform precipitation). This should be captured in the experimental semivariogram, but in poorly gauged basins, distances between rain gauges are often too large and spatial correlation can be overestimated.

Instead, nonparametric merging methods DS and MBC have proved to be more robust to data scarcity. The success of the DS method reflects the spatial heterogeneity of the climate and the size of the model field and supports the idea that distant gauges should have very little bearing on and should be given minimal weight in the estimation of unknown points simply based on their distance alone, particularly in very complex landscapes such as the Andean Amazon. A local correction is best achieved by DS that substantially weights surrounding observed residuals by distance. MBC is attractive for its simplicity, but the assumption of a uniform multiplicative bias is a very narrow assumption given the complexity of the climate and the size of the domain. This is reflected by the improvements observed by applying the method to a smaller scale (with MBCS in Chazuta subbasin), which in essence is a move toward a more local bias correction. Extended methods such as range-dependent mean bias correction have also been implemented in radar applications (e.g., Amitai et al. 2002) that can be adopted. The range threshold may further be made adaptable to the dominant precipitation processes (large-scale versus convective) in the area of the rain gauges.

This decision flowchart is intended to be a preliminary, nondefinitive guide to the selection of an optimal method given the data characteristics. As it was developed based on experiences from a region characterized with highly heterogeneous topography and precipitation processes, we expect this flowchart to hold for many other regions, at least for humid tropical and (sub)tropical mountain regions. Alternatively, an ensemble modeling approach can be considered that utilizes all of the merging inputs, whereby the model parameters can be weighted using Bayesian model averaging based on each model’s reproduction of observed streamflow (Jiang et al. 2012; Strauch et al. 2012).

Acknowledgments

Z.Z. and W.B. were funded by UK NERC Grant NE/I0040117/1 and NE-K010239-1. The authors thank Dr. Cinzia Mazzetti and Prof. Ezio Todini (University of Bologna) for making freely available to us the RAINMUSIC software package for meteorological data processing.

REFERENCES

REFERENCES
Almazroui
,
M.
,
2011
:
Calibration of TRMM rainfall climatology over Saudi Arabia during 1998–2009
.
Atmos. Res.
,
99
,
400
414
, doi:.
Amitai
,
E.
,
D. B.
Wolff
,
D. A.
Marks
, and
D. S.
Silberstein
,
2002
: Radar rainfall estimation: Lessons learned from the NASA/TRMM validation program. Proc. Second European Conf. on Radar Meteorology (ERAD), Delft, Netherlands, Delft University of Technology,
255
260
.
Anagnostou
,
E. N.
,
A. J.
Negri
, and
R. F.
Adler
,
1999
:
Statistical adjustment of satellite microwave monthly rainfall estimates over Amazonia
.
J. Appl. Meteor.
,
38
,
1590
1598
, doi:.
Anagnostou
,
E. N.
,
V.
Maggioni
,
E. I.
Nikolopoulos
,
T.
Meskele
,
F.
Hossain
, and
A.
Papadopoulos
,
2010
:
Benchmarking high-resolution global satellite rainfall products to radar and rain-gauge rainfall estimates
. IEEE Trans. Geosci. Remote Sens.,
48
,
1667
1683
, doi:.
Best
,
M. J.
, and Coauthors
,
2011
:
The Joint UK Land Environment Simulator (JULES), model description—Part 1: Energy and water fluxes
.
Geosci. Model Dev.
,
4
,
677
699
, doi:.
Bookhagen
,
B.
, and
M. R.
Strecker
,
2008
: Orographic barriers, high-resolution TRMM rainfall, and relief variations along the eastern Andes. Geophys. Res. Lett., 35, L06403, doi:.
Buytaert
,
W.
,
R.
Célleri
,
P.
Willems
,
B.
De Bièvre
, and
G.
Wyseure
,
2006a
: Spatial and temporal rainfall variability in mountainous areas: A case study from the south Ecuadorian Andes. J. Hydro., 329, 413–421, doi:.
Buytaert
,
W.
,
V.
Iniguezz
,
R.
Célleri
,
B.
De Bièvre
,
G.
Wyseure
, and
J.
Deckers
,
2006b
: Analysis of the water balance of small Páramo catchments in south Ecuador. Environmental Role of Wetlands in Headwaters, NATO Science Series: IV, Vol. 63, Kluwer, 271–281, doi:.
Campling
,
P.
,
A.
Gobin
,
K.
Beven
, and
J.
Feyen
,
2002
: Rainfall–runoff modelling of a humid tropical catchment: The TOPMODEL approach. Hydrol. Processes, 16, 231–253, doi:.
Chappell
,
A.
,
L. J.
Renzullo
,
T. H.
Raupach
, and
M.
Haylock
,
2013
:
Evaluating geostatistical methods of blending satellite and gauge data to estimate near real-time daily rainfall for Australia
.
J. Hydrol.
,
493
,
105
114
, doi:.
Collischonn
,
B.
,
W.
Collischonn
, and
C. E. M.
Tucci
,
2008
:
Daily hydrological modeling in the Amazon basin using TRMM rainfall estimates
.
J. Hydrol.
,
360
,
207
216
, doi:.
de Marsily
,
G.
,
1986
: Quantitative Hydrogeology: Groundwater Hydrology for Engineers. Academic Press, 440 pp.
Deutsch
,
C. V.
,
1996
:
Correcting for negative weights in ordinary kriging
.
Comput. Geosci.
,
22
,
765
773
, doi:.
Erdin
,
R.
,
2009
: Combining rain gauge and radar measurements of a heavy precipitation event over Switzerland: Comparison of geostatistical methods and investigation of important influencing factors. Ph.D. thesis, ETH Zurich, 108 pp.
Espinoza Villar
,
J. C.
, and Coauthors
,
2009
:
Spatio-temporal rainfall variability in the Amazon basin countries (Brazil, Peru, Bolivia, Colombia, and Ecuador)
.
Int. J. Climatol.
,
29
,
1574
1594
, doi:.
Garreaud
,
R. D.
,
M.
Vuille
,
R.
Compagnucci
, and
J.
Marengo
,
2009
:
Present-day South American climate
.
Palaeogeogr. Palaeoclimatol. Palaeoecol.
,
281
,
180
195
, doi:.
Gjertsen
,
U.
,
M.
Šálek
, and
D.
Michelson
,
2004
: Gauge adjustment of radar-based precipitation estimates in Europe. Proc. Third European Conf. on Radar Meteorology and Hydrology, Visby, Sweden, SMHI, ERAD, 7–11. [Available online at www.copernicus.org/erad/2004/online/ERAD04_P_7.pdf.]
Goudenhoofdt
,
E.
, and
L.
Delobbe
,
2009
:
Evaluation of radar-gauge merging methods for quantitative precipitation estimates
.
Hydrol. Earth Syst. Sci.
,
13
,
195
203
, doi:.
Grimes
,
D.
,
E.
Pardo-Iguzquiza
, and
R.
Bonifacio
,
1999
:
Optimal areal rainfall estimation using raingauges and satellite data
.
J. Hydrol.
,
222
,
93
108
, doi:.
Haberlandt
,
U.
,
2007
:
Geostatistical interpolation of hourly precipitation from rain gauges and radar for a large-scale extreme rainfall event
.
J. Hydrol.
,
332
,
144
157
, doi:.
Heidinger
,
H.
,
C.
Yarlequé
,
A.
Posadas
, and
R.
Quiroz
,
2012
:
TRMM rainfall correction over the Andean Plateau using wavelet multi-resolution analysis
.
Int. J. Remote Sens.
,
33
,
4583
4602
, doi:.
Hewitt
,
A. J.
,
2012
: Comparative assessment of gauge–radar merging techniques. Tech. Rep., Met Office, 9–19.
Huffman
,
G. J.
, and
D. T.
Bolvin
,
2013
: TRMM and other data precipitation data set documentation. Accessed 27 November 2013. [Available online at ftp://precip.gsfc.nasa.gov/pub/trmmdocs/3B42_3B43_doc.pdf.]
Huffman
,
G. J.
, and Coauthors
,
1997
:
The Global Precipitation Climatology Project (GPCP) combined precipitation dataset
.
Bull. Amer. Meteor. Soc.
,
78
,
5
20
, doi:.
Jiang
,
S.
,
L.
Ren
,
Y.
Hong
,
B.
Yong
,
X.
Yang
,
F.
Yuan
, and
M.
Ma
,
2012
:
Comprehensive evaluation of multi-satellite precipitation products with a dense rain gauge network and optimally merging their simulated hydrological flows using the Bayesian model averaging method
.
J. Hydrol.
,
452–453
,
213
225
, doi:.
Kalman
,
R. E.
,
1960
:
A new approach to linear filtering and prediction problems
.
J. Basic Eng.
,
82
(
1
),
35
45
.
Kottek
,
M.
, and
F.
Rubel
,
2008
: Co-kriging global daily rain gauge and satellite data. Geophysical Research Abstracts, Vol. 10, Abstract EGU2008-A-03133. [Available online at http://meetings.copernicus.org/www.cosis.net/abstracts/EGU2008/03133/EGU2008-A-03133.pdf.]
Krajewski
,
W. F.
,
1987
: Cokriging radar-rainfall and rain gage data. J. Geophys. Res., 92, 9571–9580, doi:.
Lavado-Casimiro
,
W.
,
D.
Labat
,
J. L.
Guyot
,
J.
Ronchail
, and
J. J.
Ordonez
,
2009
: TRMM rainfall data estimation over the Peruvian Amazon–Andes basin and its assimilation into a monthly water balance model. IAHS Publ., 333, 245–252.
Lehner
,
B.
,
K.
Verdin
, and
A.
Jarvis
,
2008
:
New global hydrography derived from spaceborne elevation data
.
Eos, Trans. Amer. Geophys. Union
,
89
,
93
94
, doi:.
Li
,
M.
, and
Q.
Shao
,
2010
:
An improved statistical approach to merge satellite rainfall estimates and raingauge data
.
J. Hydrol.
,
385
,
51
64
, doi:.
Maggioni
,
V.
,
H. J.
Vergara
,
E. N.
Anagnostou
,
J. J.
Gourley
,
Y.
Hong
, and
D.
Stampoulis
,
2013
:
Investigating the applicability of error correction ensembles of satellite rainfall products in river flow simulations
.
J. Hydrometeor.
,
14
,
1194
1211
, doi:.
Mazzetti
,
C.
, and
E.
Todini
,
2004
: Combining raingages and radar precipitation measurements using a Bayesian approach. geoENV IVGeostatistics for Environmental Applications, X. Sanchez-Vila, J. Carrera, and J. J. Gómez-Hernández, Eds., Quantitative Geology and Geostatistics, Vol. 13, Springer, 401–412, doi:.
Moore
,
R.
,
B.
Watson
,
D.
Jones
, and
K.
Black
,
1989
: London weather radar local calibration study: Final report. Rep. for the National Rivers Authority, Thames Region, NERC/Institute of Hydrology, Wallingford, United Kingdom, 85 pp.
Müller
,
H.
,
2007
: Bayesian transgaussian kriging. Proc. 15th European Young Statisticians Meeting, Castro Urdiales, Spain, Bernoulli Society, 5 pp.
PROGEA Srl
,
2009
: RAINMUSIC: Multi-sensor Bayesian combinations. PROGEA Srl, Bologna, Italy, 119 pp. [Available online at ftp://ftp2.progea.net/Web/RAINMUSIC_TechDoc_ENG_2012.pdf.]
Rollenbeck
,
R.
, and
D.
Anhuf
,
2007
:
Characteristics of the water and energy balance in an Amazonian lowland rainforest in Venezuela and the impact of the ENSO-cycle
.
J. Hydrol.
,
337
,
377
390
, doi:.
Rollenbeck
,
R.
, and
J.
Bendix
,
2011
:
Rainfall distribution in the Andes of southern Ecuador derived from blending weather radar data and meteorological field observations
.
Atmos. Res.
,
99
,
277
289
, doi:.
Schiemann
,
R.
,
R.
Erdin
,
M.
Willi
,
C.
Frei
,
M.
Berenguer
, and
D.
Sempere-Torres
,
2011
:
Geostatistical radar–raingauge combination with nonparametric correlograms: Methodological considerations and application in Switzerland
.
Hydrol. Earth Syst. Sci.
,
15
,
1515
1536
, doi:.
Silverman
,
B.
,
1998
: Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability, Vol 26, Chapman & Hall, 175 pp.
Strauch
,
M.
,
C.
Bernhofer
,
S.
Koide
,
M.
Volk
,
C.
Lorz
, and
F.
Makeschin
,
2012
:
Using precipitation data ensemble for uncertainty analysis in SWAT streamflow simulation
.
J. Hydrol.
,
414–415
,
413
424
, doi:.
Tian
,
Y.
,
C. D.
Peters-Lidard
,
B. J.
Choudhury
, and
M.
Garcia
,
2007
:
Multitemporal analysis of TRMM-based satellite precipitation products for land data assimilation applications
.
J. Hydrometeor.
,
8
,
1165
1183
, doi:.
Todini
,
E.
,
2001
:
A Bayesian technique for conditioning radar precipitation estimates to rain-gauge measurements
.
Hydrol. Earth Syst. Sci.
,
5
,
187
199
, doi:.
Vila
,
D. A.
,
L. G. G.
de Goncalves
,
D. L.
Toll
, and
J. R.
Rozante
,
2009
:
Statistical evaluation of combined daily gauge observations and rainfall satellite estimates over continental South America
.
J. Hydrometeor.
,
10
,
533
543
, doi:.
Wang
,
L.-P.
,
S.
Ochoa-Rodríguez
,
N. E.
Simões
,
C.
Onof
, and
C.
Maksimović
,
2013
:
Radar–raingauge data combination techniques: A revision and analysis of their suitability for urban hydrology
.
Water Sci. Technol.
,
68
,
737
747
, doi:.
Wood
,
S.
,
D.
Jones
, and
R.
Moore
,
2000
:
Static and dynamic calibration of radar data for hydrological use
.
Hydrol. Earth Syst. Sci.
,
4
,
545
554
, doi:.
World Meteorological Organization
,
1994
: Guide to hydrological practices: Data acquisition and processing, analysis, forecasting and other applications. WMO 168, 5th ed., 735 pp. [Available online at www-cluster.bom.gov.au/hydro/wr/wmo/guide_to_hydrological_practices/WMOENG.pdf.]
Yin
,
Z.-Y.
,
X.
Zhang
,
X.
Liu
,
M.
Colella
, and
X.
Chen
,
2008
:
An assessment of the biases of satellite rainfall estimates over the Tibetan Plateau and correction methods based on topographic analysis
.
J. Hydrometeor.
,
9
,
301
326
, doi:.
Zulkafli
,
Z.
,
W.
Buytaert
,
C.
Onof
,
W.
Lavado
, and
J. L.
Guyot
,
2013
:
A critical assessment of the JULES land surface model hydrology for humid tropical environments
.
Hydrol. Earth Syst. Sci.
,
17
,
1113
1132
, doi:.
Zulkafli
,
Z.
,
W.
Buytaert
,
C.
Onof
,
B.
Manz
,
E.
Tarnavsky
,
W.
Lavado
, and
J.-L.
Guyot
,
2014a
:
A comparative performance analysis of TRMM 3B42 (TMPA) versions 6 and 7 for hydrological applications over Andean–Amazon River basins
.
J. Hydrometeor.
,
15
,
581
592
, doi:.
Zulkafli
,
Z.
,
D.
Nerini
, and
B.
Manz
,
2014b
: Rainmerging: Pixel-point merging functions for precipitation time series. R package. [Available online at https://github.com/zedzulkafli/Rainmerging.]

Footnotes

##

Current affiliation: Cat Perils, Swiss Reinsurance Company, Ltd., Zurich, Switzerland.

This article is licensed under a Creative Commons Attribution 4.0 license.