1. Introduction
Precipitation is an essential component of the water cycle and reliable and accurate information about its spatiotemporal distribution is decisive for a multitude of scientific studies and operational applications. Rain gauge observations are the most used and—on a local scale—direct and accurate precipitation data sources. In addition, precipitation data can be derived from other sources such as rainfall radar stations, satellites, reanalysis products, or based on merging procedures (Sun et al. 2018). In many developing countries like Peru and Ecuador, rain gauges are unevenly and sparsely distributed (Scheel et al. 2011; Ochoa et al. 2014; Manz et al. 2016; Hunziker et al. 2017; Aybar et al. 2020). These features limit the precise estimation of spatial and temporal variability of precipitation using only gauge-based measurements in the tropical Andes.
Satellite-based precipitation products, e.g., CMORPH (Joyce et al. 2004), TMPA (Huffman et al. 2007), CHIRP (Funk et al. 2015a), and IMERG (Huffman et al. 2019), with high spatiotemporal resolution, near-global coverage, and near-real-time availability have been produced in recent decades (see appendix A for abbreviations). These products are promising alternative data sources for regions with sparse observations. However, previous studies for the Andes domain (Scheel et al. 2011; Kneis et al. 2014; Mantas et al. 2014; Ochoa et al. 2014; Zulkafli et al. 2014; Satgé et al. 2016; Chavez and Takahashi 2017; Manz et al. 2017; Baez-Villanueva et al. 2018; Erazo et al. 2018) have reported that precipitation estimates from satellites can be erroneous or biased, and that ground-based data are often needed to reduce their bias. Furthermore, the short length of satellite records in this region is an important restriction on the use of most of these products for long-term applications.
Reanalysis precipitation data, such as CFSR (Saha et al. 2010), JRA‐55 (Kobayashi et al. 2015), and MERRA (Reichle et al. 2017), rely on uncertain parameterizations, and their spatial resolution is too coarse to represent orographic precipitation (Beck et al. 2020b). Recently, the state-of-the-art climate reanalysis ERA5 (Hersbach et al. 2020) was released, which has been shown to outperform previous reanalyses for precipitation estimation (Beck et al. 2019a; Tall et al. 2019; Xu et al. 2019a; Fallah et al. 2020; Gleixner et al. 2020), and has shown acceptable performance for hydrological modeling over North America (Tarek et al. 2020), the Amazon River basin (Towner et al. 2019), and at the global scale (Alfieri et al. 2020).
In recent years, global merged precipitation products that incorporate satellite and reanalysis information with gauge-based datasets such as CHIRPS (Funk et al. 2015a) and MSWEP (Beck et al. 2017, 2019b) have been published and are available. Many studies worldwide have shown that these products have higher accuracy than precipitation estimates based on one source only (e.g., either satellite- or reanalysis-based precipitation products) and have significant potential for hydrometeorological studies (Bai and Liu 2018; Beck et al. 2019a; Wu et al. 2019; Xu et al. 2019b). CHIRPS has been used successfully to understand the precipitation variability over the Andes (Segura et al. 2019) and Amazonia (Paccini et al. 2018; Espinoza et al. 2019; da Motta Paca et al. 2020). In South America, the accuracy of merged precipitation products has been tested only in a few studies using ground-based precipitation (Zambrano-Bigiarini et al. 2017; Baez-Villanueva et al. 2018; Satgé et al. 2019) and hydrological modeling (Wongchuig Correa et al. 2017; Satgé et al. 2019). CHIRPS and MSWEP showed good performance for streamflow simulation in the Amazon River basin (Wongchuig Correa et al. 2017) and in catchments draining into Titicaca Lake (Satgé et al. 2019, 2020). To the best of our knowledge, there are no case studies in the literature on hydrological evaluation of CHIRPS and MSWEP in Peruvian and Ecuadorian watersheds, which is addressed in this study.
At the regional scale, a high-resolution (0.1°) daily gridded precipitation dataset for Peru was recently developed as part of the PISCO datasets (PISCO hereafter) (Aybar et al. 2020). PISCO is based on the merging of satellite estimates (CHIRP) and ground-based observations. It is used by SENAMHI for operational drought and flood monitoring at the national scale in Peru, and was applied for hydrological modeling of the Andean Vilcanota River catchment (Fernandez-Palomino et al. 2020), catchments draining into the Pacific Ocean (Asurza-Véliz and Lavado-Casimiro 2020), and Peruvian catchments (Llauca et al. 2021). As the method used to generate PISCO mainly corrects the biases of CHIRP using in situ precipitation data, the higher accuracy of its precipitation estimates is confined to gauged regions such as the Pacific coast and the eastern and western slopes of the Andes of Peru (Aybar et al. 2020; Llauca et al. 2021). Hence, the applicability of PISCO to the Peruvian Amazon and transboundary river catchments is limited. This motivated us to generate a new rainfall dataset for hydrometeorological applications at the national scale of Peru and Ecuador, exploiting the lessons learned from precipitation estimates derived not only from gauges and satellites but also from the state-of-the-art reanalysis ERA5. Indeed, ERA5 and CHIRP, which has long-term daily precipitation data available (1981–present) and is hence appropriate for long-term hydrological applications, were used for the precipitation merging procedure in this study. Moreover, terrain elevation, which has been reported to be a key physical variable with a strong influence on precipitation patterns in mountainous regions (Chavez and Takahashi 2017; Bhuiyan et al. 2019; Beck et al. 2020b), was considered as an additional predictor variable.
Besides sparseness and uncertainty of rainfall observations in complex tropical mountain ranges, in some of those regions depositing fog and clouds may contribute significantly to precipitation, but cannot be recorded with conventional measurements. In páramos (grassland ecosystems extending from northern Peru to Venezuela and occurring between the tree line and glaciers) and tropical montane cloud forest (TMCF) such precipitation plays a key role in the water cycle as the cloud/fog interception by the páramos/forest constitutes an important water source to the system (Gomez-Peralta et al. 2008; Bruijnzeel et al. 2011; Clark et al. 2014; Cárdenas et al. 2017; Strauch et al. 2017). Modeled contributions of cloud water varying from less than 5% of total precipitation in wet areas to more than 75% in low‐rainfall areas in TMCF were reported by Bruijnzeel et al. (2011). Fog water contribution of up to 30% of bulk precipitation (rainfall plus fog water) was estimated in tropical montane forests in the eastern Andes of Central Peru using fog gauges (Gomez-Peralta et al. 2008). Cloud water contribution of up to 15% of streamflow was reported for the montane Kosñypata catchment in the eastern Peruvian Andes using an isotopic mixing model (Clark et al. 2014). Fog water contribution of up to 28% of the total precipitation to páramos in the Colombian Andes was measured using fog gauges (Cárdenas et al. 2017). To correct the underestimation of precipitation by gridded precipitation products, adjustment of precipitation data for regions covered by cloud forests has been proposed (Strauch et al. 2017) with reported increases of up to 50% of the precipitation values in the WFDEI dataset (Weedon et al. 2014) required to improve streamflow simulation in the tropical montane watersheds.
However, the cloud/fog water component is not represented in the aforementioned precipitation data sources. This gap, together with the scarcity of precipitation gauges, could explain some of the poor hydrologic model performances and problems with water budget closure reported in previous studies in páramo and/or montane catchments draining into the Amazon River (Zulkafli et al. 2014; Zubieta et al. 2015; Manz et al. 2016; Zubieta et al. 2017; Strauch et al. 2017; Aybar et al. 2020). Thus, for reliable and accurate estimation of precipitation in regions such as the TMCF and páramos, it is important to consider the contribution of cloud/fog water to the terrestrial hydrological system.
Correcting potential errors in gridded precipitation datasets for these areas requires application of other types of observations and estimates. Corrected estimates of precipitation using satellite soil moisture products have been derived in recent years (Brocca et al. 2013; Román-Cascón et al. 2017; Brocca et al. 2019). However, the utility of these products could be limited due to their low accuracy in regions with dense forests (Brocca et al. 2020), such as TMCF and rain forest areas. Streamflow observations, which are spatially integrative and could be another source of data supplementing information from sparse rain gauges, offer an additional method to infer precipitation patterns and evaluate precipitation datasets (Le Moine et al. 2015; Henn et al. 2018). In this study, we applied regional streamflow observations inversely to infer or correct the precipitation input for the corresponding regional hydrological simulations. This approach has been termed “hydrology backwards” or “reverse hydrology” by Kirchner (2009) and has so far been applied in mountainous catchments like Rietholzbach in Switzerland (Teuling et al. 2010), Alzette in Luxembourg (Krier et al. 2012), Schliefau and Krems in Austria (Herrnegger et al. 2015), and the Sierra Nevada mountain range of California (Henn et al. 2015, 2018). These studies used a simple lumped hydrological model to do reverse hydrology. In our case, we applied a process-based hydrological model to correct precipitation biases using streamflow data. We hypothesize that correction of precipitation using streamflow data can improve closing the observed water budget gap over complex tropical mountainous catchments such as páramo and montane watersheds.
This study is the first attempt to generate a precipitation dataset for Peru and Ecuador by merging different sources of precipitation and correcting precipitation estimates through reverse hydrology. Furthermore, we evaluate the applicability of the precipitation dataset generated in this study, uncorrected precipitation datasets used for merging procedure (CHIRP and ERA5), and current state-of-the-art local (PISCO) and global (CHIRPS and MSWEP) merged precipitation products for hydrological modeling of Peruvian and Ecuadorian river catchments. This will demonstrate the effectiveness of the new methods combined here, and will help illustrate the appropriateness of multiple precipitation datasets for the countrywide hydrometeorological applications both in Peru and Ecuador. The objectives of this study are 1) to generate a high-spatial-resolution and hydrologically adjusted precipitation dataset for Peru and Ecuador, and 2) to assess and compare the applicability of this precipitation data and the current state-of-the-art uncorrected and merged precipitation products for hydrological modeling.
2. Study area and data
a. Study area
The study area covers Peru and Ecuador, with elevation ranging from 0 to 6518 m MSL (Fig. 1). The new precipitation dataset [Rain for Peru and Ecuador (RAIN4PE)] is generated for the land surface between 19°S–2°N and 82°–67°W. The study area has complex hydroclimatic conditions related to its variable climate zones and the Andes Cordillera, which acts as a topographic barrier between the cold and dry eastern Pacific and the warm and moist Amazon region. The Andes divides the study area into three natural drainage basins (Fig. 1): (i) the Pacific basin (watersheds located on the western side of the Andes that convey water to the Pacific Ocean); (ii) the Amazon basin (watersheds located on the eastern side of the Andes that drain to the Amazon River); and (iii) the Titicaca Lake basin (catchments draining into Titicaca Lake).

Fig. 1. (left) Study area and spatial distribution of precipitation gauges with record length greater than 10 years for the 1981–2015 period used for the merging procedure. (right) Drainage systems, river networks, and streamflow stations used for hydrological model calibration based on the cascading calibration approach. Red polygons show the gauged catchments with water budget imbalance where gridded precipitation datasets are corrected using streamflow data through reverse hydrology. Nueva Loja station gauges the catchment “A”, San Sebastian (B), Francisco De Orellana (C), Santiago (D), Borja (E), Shanao (F), Chazuta (G), Puerto Inca (H), and Lagarto (I). Boundaries of the páramo and tropical montane cloud forest (TMCF) ecosystems were obtained from Helmer et al. (2019).
Citation: Journal of Hydrometeorology 23, 3; 10.1175/JHM-D-20-0285.1

In the region, the great spatial variability of precipitation patterns is modulated by the interplay among large-scale circulation patterns (e.g., latitudinal migration of the Atlantic intertropical convergence zone, South American monsoon system, Hadley and Walker cells, marine currents, Bolivian high), local circulation patterns (e.g., upslope and downslope moisture transport), and the complex Andean orography (Laraque et al. 2007; Tobar and Wyseure 2018; Segura et al. 2019; Espinoza et al. 2020). Furthermore, El Niño–Southern Oscillation (ENSO) is a major modulator of hydroclimatology at interannual time scales along the Andes (Poveda et al. 2020). The study area hosts a diversity of ecosystems such as deserts, punas (high mountain grasslands), páramos, glaciers, mountain forests, TMCFs, and rain forests. Of these, páramo and TMCF (Fig. 1) are ecosystems where an important cloud/fog water input to the system has been reported (Gomez-Peralta et al. 2008; Bruijnzeel et al. 2011; Clark et al. 2014; Cárdenas et al. 2017). This is an important precipitation source to consider in hydrological modeling of páramo and montane watersheds, as was done in this study.
b. Data
1) Ground-based precipitation data
The precipitation data of a total of 804 precipitation gauges with record length greater than ten years for the 1981–2015 period were used for this study (Fig. 1), out of which 587 (217) gauges have daily (only monthly) precipitation data. The data were collected from different sources such as national hydrometeorological institutions and previous studies in the region. The data for Peru were obtained from the Peruvian ANA (Autoridad Nacional del Agua) and Aybar et al. (2020); for Ecuador from Morán-Tejeda et al. (2016), Tamayo (2017), Tobar and Wyseure (2018); for Brazil from Xavier et al. (2016, 2017); and for Colombia from IDEAM (Instituto de Hidrología, Meteorología y Estudios Ambientales). We used 587 (804) precipitation gauges with daily (monthly) data for the merging of precipitation datasets at the daily (monthly) time step.
2) Discharge data
Discharge data of 72 streamflow stations (Fig. 1) with record lengths ranging from one to 33 years for 1983–2015 were obtained from different sources, such as the Peruvian ANA and SENAMHI for catchments draining into the Pacific Ocean and located in the Andes. For the Amazon lowland, data were obtained from the Critical Zone Observatory HYBAM (Hydrogéochimie du Bassin Amazonien, www.so-hybam.org). This hydrological network has been operated by an international team from IRD (Institut de Recherche pour le Développement; France), SENAMHI (Peru), INAMHI (Instituto Nacional de Meteorología e Hidrología; Ecuador), and the Brazilian ANA (Agência Nacional de Águas; Brazil) since 2003 (Armijos et al. 2013; Santini et al. 2019).
3) Gridded precipitation data
Table 1 presents the five precipitation datasets used in this study. We used the non-gauge-corrected datasets (CHIRP and ERA5) for the merging procedure to generate the RAIN4PE dataset. The satellite-based CHIRP precipitation dataset (Funk et al. 2015a) is derived from infrared-based precipitation estimates and the corresponding monthly precipitation climatology of Funk et al. (2015b). We selected CHIRP since it has high spatial resolution and long-term (from 1981 onward) daily precipitation data, which is appropriate for long-term hydrometeorological applications. ERA5 (Hersbach et al. 2020) is the latest climate reanalysis dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). Unlike its predecessor ERA-Interim (Dee et al. 2011), which became operational in 2006, ERA5 is based on ECMWF’s Integrated Forecasting System Cycle 41r2, which was operational in 2016. ERA5 thus benefits from a decade’s worth of numerical weather prediction developments in model physics, core dynamics, and data assimilation relative to ERA-Interim. Moreover, ERA5 has a much higher temporal and spatial resolution than previous global reanalyses. The hourly ERA5 precipitation data were downloaded and aggregated to daily totals matching the observation window of the local gauges (from 0700 to 0700 local time).
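The aggregation of hourly reanalysis precipitation to gauge-compatible daily totals can be illustrated with a minimal Python sketch. It is not the processing code of the study: the function name, the fixed UTC−5 offset, the convention that each hourly value is the accumulation over the hour ending at its time stamp, and the labeling of each total by the local date on which the 0700–0700 window ends are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_to_gauge_days(hourly, utc_offset_hours=-5, day_start_hour=7):
    """Sum hourly precipitation into 24-h totals running 07:00-07:00 local time.

    hourly: iterable of (utc_datetime, precip_mm); each value is ASSUMED to be
            the accumulation over the hour ending at that time stamp.
    Returns {date: total_mm}, where each total is labeled (ASSUMED convention)
    by the local date on which its 07:00-07:00 window ends.
    """
    totals = defaultdict(float)
    for t_utc, p_mm in hourly:
        t_local = t_utc + timedelta(hours=utc_offset_hours)
        # Shift the window (07:00, 07:00] onto a calendar day; the extra
        # second sends the hour ending exactly at 07:00 to the earlier window.
        shifted = t_local - timedelta(hours=day_start_hour, seconds=1)
        label = (shifted + timedelta(days=1)).date()
        totals[label] += p_mm
    return dict(totals)
```

For example, 24 hourly values ending between 0800 local time on one day and 0700 local time on the next are all assigned to the second day. Whether the gauge networks label the observation day by the start or the end of the window would need to be checked against the station metadata.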
Table 1. List of gridded precipitation datasets used in this study. In uncorrected datasets, their temporal dynamics depend entirely on satellite (S) or reanalysis (R) data, while in gauge-corrected datasets, their temporal dynamics depend at least partly on gauge (G) data. In the spatial coverage column, “Global” means fully global coverage including oceans, while “Land” indicates that the coverage is limited to the terrestrial land surface.


To compare RAIN4PE against other gauge-corrected precipitation datasets besides the uncorrected ones (CHIRP and ERA5), we selected three merged products (CHIRPS, MSWEP, and PISCO) widely used in data evaluation and hydrometeorological applications in the region (Wongchuig Correa et al. 2017; Paccini et al. 2018; Bhuiyan et al. 2019; Espinoza et al. 2019; Satgé et al. 2019, 2020; Asurza-Véliz and Lavado-Casimiro 2020; Baez-Villanueva et al. 2020; da Motta Paca et al. 2020; Fernandez-Palomino et al. 2020; Llauca et al. 2021). CHIRPS (Funk et al. 2015a) and PISCO (Aybar et al. 2020) are obtained by merging CHIRP and gauge estimates through deterministic and geostatistical interpolation methods. Finally, MSWEP is derived by optimally merging a range of gauge, satellite, and reanalysis precipitation estimates, where satellite and reanalysis datasets are merged using weights for each one based on the coefficient of determination between 3-day mean gauge- and grid-based precipitation time series (Beck et al. 2017, 2019b). The daily MSWEP precipitation data were provided for this study.
4) Additional data
In addition to various precipitation products, Table 2 presents other datasets that were used for the hydrological modeling process. The surface elevation data were used both for the merging procedure and setting up the hydrological model.
Table 2. Data used for hydrological modeling.


3. Methods
The framework of this study involves three main steps (Fig. 2): (i) merging procedure through a machine learning technique at the daily and monthly scales; (ii) calibration of model parameters and hydrological adjustment through the reverse hydrology concept; and (iii) evaluation of all precipitation products through hydrological modeling.

Fig. 2. Flowchart for (i) the generation of gridded precipitation dataset, (ii) hydrological model calibration and adjustment of precipitation datasets, and (iii) hydrological evaluation. Here d (m) indicates the daily (monthly) time step, BD(1),…,(n) are buffer distances (distance from any point to all precipitation gauges), BCF is the bias correction factor, OFs are the objective functions for hydrological model calibration, and GOFs are the goodness of fit measures. BCF is optimized only over catchments with water budget imbalance. Note that for hydrological evaluation (step iii), the model was rerun using the respective corrected precipitation data and optimum model parameters values with BCF set to 1.
a. Merging procedure
In this section, the merging procedure to obtain RAIN4PE at 0.1° spatial resolution for the 1981–2015 period is described; see Fig. 2 for a scheme.
1) Covariates
For the merging procedure at the daily (monthly) scale, we used daily (monthly) precipitation estimates of CHIRP and reanalysis ERA5, surface elevation (Yamazaki et al. 2017), and buffer distances from observation points as covariates. The latter account for geographical proximity effects in the prediction process using the random forest (RF) method, as suggested by Hengl et al. (2018). The elevation is taken into account because it is a key physical variable with a strong influence on precipitation patterns (Chavez and Takahashi 2017; Beck et al. 2020b). The choice of satellite precipitation, reanalysis precipitation, and elevation as covariates follows recent studies (Bhuiyan et al. 2019; Baez-Villanueva et al. 2020; Beck et al. 2020b; Hong et al. 2021). To match the 0.1° spatial resolution of the final precipitation product, covariates with a grid cell size finer (coarser) than 0.1° were regridded to 0.1° using bilinear interpolation (the nearest neighbor method).
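The covariate vector for one grid cell can be sketched as follows. This is only an illustration of the idea of buffer distances (one distance per gauge, appended to the gridded predictors); the function names are hypothetical, and the use of great-circle (haversine) distances in kilometers is an assumption, since the distance metric is not specified in the text.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def buffer_distances(cell_lon, cell_lat, gauges):
    """Distances from one grid cell center to every gauge (one covariate each)."""
    return [haversine_km(cell_lon, cell_lat, g_lon, g_lat) for g_lon, g_lat in gauges]


def covariate_row(sat_p, rea_p, elev_m, cell_lon, cell_lat, gauges):
    """Predictor vector: satellite P, reanalysis P, elevation, buffer distances."""
    return [sat_p, rea_p, elev_m] + buffer_distances(cell_lon, cell_lat, gauges)
```

With n gauges, each grid cell therefore carries 3 + n predictors, which is how spatial proximity to observations enters a model that is otherwise unaware of location.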
2) Random forest modeling to combine different data sources
In this study, the RF method (Breiman 2001) was applied to produce a gridded precipitation dataset by merging multiple precipitation sources (gauge, satellite, and reanalysis). RF has recently been shown to have similar or superior performance in the interpolation of environmental variables such as precipitation, temperature, and evapotranspiration compared to traditional spatial interpolation techniques, e.g., regression kriging and inverse distance weighting (Hengl et al. 2018; da Silva Júnior et al. 2019; Sekulić et al. 2020). Moreover, RF-based methodologies (Bhuiyan et al. 2019; Baez-Villanueva et al. 2020) to merge precipitation products with ground-based measurements have been developed and applied successfully in data-scarce and complex terrain regions such as the Peruvian and Colombian Andes (Bhuiyan et al. 2019) and Chilean territory (Baez-Villanueva et al. 2020).
In RF, each tree is constructed from a bootstrap sample of the observations and a random selection of covariates, which ensures that the trees are decorrelated from each other (Breiman 2001). The unsampled data, called out-of-bag, can be used to test the prediction accuracy and the importance of input variables, and thus no extra independent validation dataset is needed (Breiman 2001).
We implemented RF using the R package randomForest (Liaw and Wiener 2002) with the following RF parameters: 1) the number of trees (set at 1000); 2) the number of predictor variables randomly selected at each node (set at one-third of the number of variables, default value); 3) the minimum number of observations in a tree’s terminal node (set at 5, default value); and 4) the out-of-bag portion to test the accuracy of the predictions (set at one-third of the total number of observations). These parameter values were successfully used in other studies (Baez-Villanueva et al. 2020; Fox et al. 2020; Sekulić et al. 2020).
In the merging procedure (see Fig. 2), an RF model was trained for each day and each month in the 1981–2015 period, using ground-based observations as the dependent variable and the selected covariates as predictor variables. The trained RF models were then applied to the covariates, yielding preliminary daily precipitation data (Pd0) and monthly precipitation data (Pm1). Finally, Pd0 was corrected to match Pm1. For that, the ratio of Pm1 to the monthly precipitation derived from Pd0 was computed on each grid cell for each month, and the Pd0 values of that grid cell and month were multiplied by this ratio to generate the RF-based RAIN4PE dataset. This correction was applied because the interpolation of precipitation patterns at a monthly scale is more reliable and accurate than the daily interpolation (Aybar et al. 2020; He et al. 2020).
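The monthly rescaling step described above amounts to simple arithmetic per grid cell and month, as in the following minimal Python sketch. The function name is hypothetical, and the treatment of months in which the preliminary daily values sum to zero (left unchanged here) is an assumption, as the text does not describe that case.

```python
def rescale_daily_to_monthly(daily_mm, monthly_target_mm):
    """Scale one grid cell's daily values for one month to a monthly target.

    daily_mm:          preliminary daily precipitation (Pd0) for the month
    monthly_target_mm: monthly precipitation from the monthly model (Pm1)
    """
    daily_sum = sum(daily_mm)
    if daily_sum == 0.0:
        # Ratio undefined for a completely dry preliminary month;
        # leaving the values unchanged is an ASSUMPTION of this sketch.
        return list(daily_mm)
    ratio = monthly_target_mm / daily_sum
    return [d * ratio for d in daily_mm]
```

For daily values of 2, 0, and 6 mm and a monthly target of 12 mm, the ratio is 12/8 = 1.5 and the corrected values are 3, 0, and 9 mm. The day-to-day pattern of Pd0 is preserved while the monthly total agrees with Pm1.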
b. Hydrological modeling and adjustment of precipitation datasets
This section describes the approaches applied for hydrological model calibration and validation and hydrological adjustment of precipitation datasets using streamflow data through the reverse hydrology concept. The hydrological correction is applied only for nine catchments (Fig. 1) having a water budget imbalance due to underestimated streamflow in the simulations with uncorrected precipitation inputs, as reported in previous studies (Zulkafli et al. 2014; Zubieta et al. 2015, 2017; Strauch et al. 2017). We applied the reverse hydrology using the Soil and Water Assessment Tool (SWAT; Arnold et al. 1998) model in which both the bias correction factor (BCF) for precipitation fields and model parameters were calibrated jointly (Fig. 2).
1) SWAT model
In SWAT, flow routing in river channels can be computed using the Muskingum or the variable storage method, considering the flow velocity to be the same across the channel and floodplain section (Neitsch et al. 2011). This approach has been shown to be inadequate for flow routing in Amazon rivers (Santini 2020), where flows are largely affected by floodplains that act as reservoirs, causing significant flood peak delay and attenuation (Paiva et al. 2011; Yamazaki et al. 2011; Santini et al. 2015; Santini 2020). To overcome this limitation, Santini (2020) implemented a new flow routing method for SWAT that considers the river–floodplain dynamics, where the associated floodplain of each river reach is treated as a simple storage model, as in other hydrological models (Paiva et al. 2011; Yamazaki et al. 2011). This approach was used in our study.
2) SWAT model setup, calibration, and validation
The SWAT model was set up for Peruvian and Ecuadorian catchments (total of 1 638 793 km²) based on the input data listed in Table 2. The model includes 2675 subcatchments and 6843 hydrologic response units (HRUs). Channel cross-section parameters such as the bankfull width (B) and channel depth (CHD) were estimated using geomorphologic equations based on upstream drainage areas derived for Amazon rivers (Paiva et al. 2011). Floodplain width was estimated by multiplying the bankfull width by a factor (set at 5, default value). We assigned Manning’s n values of 0.03 (0.10) for channels (floodplains). The modified Soil Conservation Service curve number, the Priestley–Taylor equation, and the variable storage methods were used to simulate surface runoff and infiltration, potential evapotranspiration, and river flow routing, respectively.
The simulation period was from 1981 to 2015. The first two years were used for model spinup. For the model calibration, all flow data were used for stations with records shorter than 10 years; for stations with longer records, two-thirds of the data were used. In the latter case, the remaining flow data were used for model validation (53 out of 72 streamflow stations). The model calibration for each precipitation product was performed applying the multisite cascading calibration approach (Xue et al. 2016) in nine sequences (Fig. 1), where the calibrated discharge from the upstream catchments was used as input for the downstream catchments. The model parameters and BCFs for each (sub)catchment were calibrated using the respective set of parameters defined in Table 3 for Andean, montane, and lower Amazon catchments. Moreover, plant parameters were adopted from our previous study (Fernandez-Palomino et al. 2020).
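The rule for dividing each station record between calibration and validation can be written as a short Python sketch. The function name is hypothetical, and the choice of the first two-thirds of the record (in chronological order) for calibration is an assumption, because the text does not state which part of the record was used.

```python
def split_record(flows, years_of_record, min_years_for_validation=10):
    """Split a chronologically ordered flow record into calibration/validation.

    Records shorter than 10 years are used entirely for calibration.
    Longer records: two-thirds for calibration, the rest for validation
    (taking the FIRST two-thirds is an ASSUMPTION of this sketch).
    """
    if years_of_record < min_years_for_validation:
        return list(flows), []
    cut = (2 * len(flows)) // 3
    return list(flows[:cut]), list(flows[cut:])
```

A station with 12 years of data would thus contribute roughly 8 years to calibration and 4 years to validation, whereas a station with 5 years of data would be used for calibration only.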
Table 3. Parameters and their ranges for model calibration for evapotranspiration (ET), streamflow (Q), and precipitation (P). In the “Change type” column, R (V) refers to a relative (absolute) change of parameter values during the calibration. Parameter set 1 was applied for Andean catchments draining into the Pacific Ocean and Titicaca Lake and for Andean catchments upstream of the montane watersheds. Montane watersheds having a water budget imbalance were calibrated using parameter set 2. Catchments downstream of the montane watersheds were calibrated using parameter set 3. Note that BCF is applied only for catchments with water budget closure problems to infer precipitation from streamflow data. See Neitsch et al. (2011) for detailed parameter definitions.


The optimum values of model parameters and BCFs were obtained through multiobjective calibration. The model was calibrated against observed discharge using the Nash–Sutcliffe efficiency of log-transformed flows (lNSE) and the aggregated flow duration curve signature (FDCsign) as objective functions (see Table 4). We selected lNSE and FDCsign since these have been shown to be sufficient for constraining the model to simulate all aspects of the hydrograph during calibration (Fernandez-Palomino et al. 2020). Moreover, the application of FDC-based signatures provides more information about the hydrological behavior of the modeled basin (Yilmaz et al. 2008; Hrachowitz et al. 2014) and leads to better parameter identifiability, more accurate discharge simulation, and reduction of predictive uncertainty (Yilmaz et al. 2008; Pokhrel and Yilmaz 2012; Hrachowitz et al. 2014; Pfannerstill et al. 2014, 2017; Chilkoti et al. 2018; Fernandez-Palomino et al. 2020; Sahraei et al. 2020). Following Chilkoti et al. (2018) and Fernandez-Palomino et al. (2020), we estimated the percent bias for four segments of the FDC [peak flow (0%–2%), high flow (2%–20%), midsegment (20%–70%), and low flow (70%–100%)] and then averaged the absolute values of these bias percentages to obtain FDCsign, so that the hydrological signatures are taken into consideration in the model calibration. These four segments represent, respectively, rarely occurring peak flow events, quick runoff (due to snowmelt and/or rainfall), the flashiness of a basin’s response, and the baseflow component of streamflow. The Borg multiobjective evolutionary algorithm (Borg MOEA; Hadka and Reed 2013) was used to optimize the objective functions (maximization of lNSE and minimization of FDCsign) with a maximum of 1000 iterations. The Borg MOEA parameterization was the same as in Fernandez-Palomino et al. (2020). The parameters for ungauged catchments (at HRU level) were obtained by applying the spatial proximity approach (Guo et al. 2021) with inverse distance weighting (Shepard 1968). For the regionalization of parameters, only gauged donor catchments within a radius of 150 km were used, to avoid the influence of Amazonian catchments on the estimation of parameters for Andean basins draining into the Pacific Ocean and Titicaca Lake.
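The two objective functions described above can be illustrated with a minimal Python sketch. It assumes that lNSE is the Nash–Sutcliffe efficiency computed on log-transformed flows and that each FDC segment bias is a simple percent bias in segment volume; the exact formulations are those of Table 4 and may differ (for example, in how the low-flow segment is treated). The function names, the small offset eps, and the index rounding are illustrative choices, not taken from the study.

```python
import math

# FDC segments as exceedance-probability ranges: peak, high, mid, low flow.
SEGMENTS = ((0.00, 0.02), (0.02, 0.20), (0.20, 0.70), (0.70, 1.00))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated versus observed flows."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def log_nse(obs, sim, eps=1e-6):
    """NSE on log-transformed flows; eps (an assumption) guards against log(0)."""
    return nse([math.log(o + eps) for o in obs],
               [math.log(s + eps) for s in sim])


def fdc_sign(obs, sim, segments=SEGMENTS):
    """Mean absolute percent bias over the FDC segments.

    Flows are sorted in descending order so that position in the list
    corresponds to exceedance probability. The series must be long enough
    for every segment to contain at least one value.
    """
    o_sorted = sorted(obs, reverse=True)
    s_sorted = sorted(sim, reverse=True)
    n = len(o_sorted)
    biases = []
    for lo, hi in segments:
        i0, i1 = round(lo * n), round(hi * n)
        o_vol = sum(o_sorted[i0:i1])
        s_vol = sum(s_sorted[i0:i1])
        biases.append(abs(100.0 * (s_vol - o_vol) / o_vol))
    return sum(biases) / len(biases)
```

In a calibration loop, an optimizer would seek to maximize log_nse and minimize fdc_sign for each candidate parameter set.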
Mathematical formulation of the goodness of fit metrics and hydrological signatures. Here, O and S are observed and simulated flow (m3 s−1), respectively; EP is exceedance probability; P, H, and L are the indices of the minimum flow of the peak flow, high flow, and low flow segments, respectively. In the optimization process for hydrological model calibration, lNSE was maximized, whereas FDCsign was minimized.


3) Hydrological adjustment of precipitation datasets
The optimum BCFs obtained in the calibration procedure for each catchment with a water budget imbalance (Fig. 1) were applied to the respective daily gridded precipitation data to obtain the hydrologically corrected daily precipitation dataset (Fig. 2). For that, a continuous BCF map at 0.1° spatial resolution was produced in which grid cells within a catchment retained that catchment’s BCF, and area-weighted BCFs were estimated for cells on catchment boundaries. Applying the resulting BCF map to gridded precipitation data can produce spatial discontinuities in precipitation patterns at catchment borders. To reduce such discontinuities, we further applied a 5 × 5 mean filter to the BCF map. Finally, the corrected precipitation data were used as input to SWAT, which was run with the respective optimum parameters for the simulation period to compute the model performance measures for the hydrological evaluation of precipitation datasets.
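The construction and smoothing of the BCF map can be sketched as follows. This is a minimal illustration, not the authors' implementation: grids are plain nested lists, boundary cells take an area-weighted mean of the BCFs of the catchments they overlap, and the 5 × 5 mean filter averages only over the part of the window that lies inside the grid. The edge handling and all function names are assumptions.

```python
def area_weighted_bcf(parts):
    """BCF of a boundary cell from (area_fraction, bcf) pairs.

    The fractions are assumed to sum to 1; any area outside a corrected
    catchment would be entered with a BCF of 1.0 (an assumption).
    """
    return sum(frac * bcf for frac, bcf in parts)


def mean_filter(grid, size=5):
    """Moving-average filter over a size x size window.

    Cells near the grid edge are averaged over the in-bounds part of
    the window only (an assumed edge treatment).
    """
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - half), min(rows, r + half + 1))
                      for j in range(max(0, c - half), min(cols, c + half + 1))]
            out[r][c] = sum(window) / len(window)
    return out


def apply_bcf(precip, bcf):
    """Multiply a daily precipitation grid by the BCF grid, cell by cell."""
    return [[p * f for p, f in zip(p_row, f_row)]
            for p_row, f_row in zip(precip, bcf)]
```

For a map with an abrupt step between two catchments, the filter replaces the step with a gradual transition a few cells wide, which is the intended reduction of discontinuities at catchment borders.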
c. Evaluation methods
1) Evaluation using out-of-bag sample
The prediction accuracy of preliminary daily precipitation data (Pd0) and monthly precipitation data (Pm1) produced by the RF method (see Fig. 2) was assessed using the mean absolute error (MAE) and determination coefficient (R2) based on the out-of-bag sample.
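As a minimal sketch of the two accuracy measures, the functions below compute MAE and R2 from gauge observations and their out-of-bag predictions (for each observation, the prediction from trees that did not see it during training). R2 is assumed here to be 1 minus the ratio of the residual to the total sum of squares; the study may use a different definition, and the function names are illustrative.

```python
def mean_absolute_error(obs, pred):
    """Average absolute difference between observations and predictions."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```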
2) Hydrological evaluation
We evaluated the accuracy of precipitation estimates through hydrological modeling for the three drainage systems in the study area. This is an adequate approach for evaluating gauge-corrected precipitation datasets since streamflow observations are independent of the ground precipitation observations used in these datasets (Beck et al. 2020a; Brocca et al. 2020; Satgé et al. 2020).
For the hydrological evaluation, we carried out a multicriteria evaluation of SWAT-simulated streamflow using all precipitation products. Both hydrograph goodness of fit metrics and hydrological signatures (Table 4) were considered for the calibration and validation periods. The modified Kling–Gupta efficiency (KGE) and percent bias (PBIAS) were used to assess model skill in representing general discharge dynamics and over- or underestimation tendencies, respectively; lNSE and percent bias in FDC low segment volume (Slow) for low flows; Nash–Sutcliffe efficiency (NSE) and percent bias in FDC high segment volume (Shigh) for high flows; and percent bias in FDC peak segment volume (Speak) for extreme peak flow conditions. This multicriteria evaluation aims to assess model skill in representing all aspects of the observed FDC and hydrographs, which is important for assessing the reliability of precipitation products for hydrometeorological applications such as the analysis of the water budget and hydroclimatic extremes (floods and droughts). The hydrological model performance was rated following the performance criteria of Moriasi et al. (2007). For simplicity, absolute values of PBIAS, Slow, Shigh, and Speak below 10% were considered very good, 10%–15% good, 15%–25% satisfactory, and above 25% unsatisfactory; KGE, NSE, and lNSE values above 0.75 were considered very good, 0.65–0.75 good, 0.50–0.65 satisfactory, and below 0.50 unsatisfactory.
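The rating scheme above can be written as two small lookup functions. This is a sketch under one assumption: the text does not say to which class a value lying exactly on a threshold belongs, so the boundary handling below is an illustrative choice. The function names are hypothetical.

```python
def rate_efficiency(value):
    """Rate KGE, NSE, or lNSE using the thresholds given in the text."""
    if value > 0.75:
        return "very good"
    if value >= 0.65:
        return "good"
    if value >= 0.50:
        return "satisfactory"
    return "unsatisfactory"


def rate_bias(percent):
    """Rate PBIAS, Slow, Shigh, or Speak (in percent) by absolute value."""
    magnitude = abs(percent)
    if magnitude < 10.0:
        return "very good"
    if magnitude <= 15.0:
        return "good"
    if magnitude <= 25.0:
        return "satisfactory"
    return "unsatisfactory"
```

For example, a median KGE of 0.86 rates as very good, whereas a median KGE of 0.49 rates as unsatisfactory.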
Furthermore, we analyzed the distribution of model parameters and compared the evapotranspiration (ET) simulated by SWAT with remotely sensed ET from the Global Land Evaporation Amsterdam Model (GLEAM) and the Moderate Resolution Imaging Spectroradiometer Global Evaporation product (MOD16). The ET estimates from MOD16 and GLEAM are based on the Penman–Monteith and Priestley–Taylor equations, respectively. This comparison serves to verify the plausibility of the ET estimates, since ET is one of the largest components of the water budget besides precipitation and is difficult to estimate over complex terrain. Results of the analysis of parameter distribution and ET estimates are described in appendices B and C.
4. Results
a. Performance of the merging algorithm
The skill of the RF method for predicting daily and monthly precipitation patterns was evaluated using performance measures (R2 and MAE) based on the out-of-bag sample. Figure 3 shows that based on the temporal distribution of R2, the RF performance does not have a seasonal pattern for the daily precipitation prediction, whereas it exhibits better performance in the period from April to December for monthly prediction. Furthermore, R2 shows that prediction is better for the monthly (mean R2 = 0.72) than the daily (mean R2 = 0.25) precipitation. This result supports the correction of daily-predicted precipitation values to match the monthly predictions performed in our study as described in the methods. MAE is much lower in the period June–September for both daily and monthly precipitation prediction, indicating that precipitation is more easily predictable when most of the study area experiences lower precipitation during the dry season. It is important to mention that satellite precipitation (CHIRP) was often the most important covariate in the merging procedure both at daily and monthly scale, followed by reanalysis precipitation (ERA5) and terrain elevation, while buffer distances were negligible (Fig. S1 in the online supplemental material).

Performance of the random forest algorithm for spatial interpolation of (left) daily and (right) monthly precipitations. Here, R2 is the coefficient of determination, and MAE is the mean absolute error. The middle and bottom graphs show the performance measures averaged for each day or month in the 1981–2015 period.
Citation: Journal of Hydrometeorology 23, 3; 10.1175/JHM-D-20-0285.1

b. Hydrological correction of the gridded precipitation datasets
The spatial variation of the bias correction factors (BCFs) obtained for the six precipitation datasets is shown in Fig. 4. This catchment-specific correction differs from the method of Strauch et al. (2017), who applied a single correction factor to the WFDEI dataset (Weedon et al. 2014) for all montane regions. The lower BCF values for ERA5 are related to its significant overestimation of precipitation along the Andes (Figs. 5 and 6). Among the other datasets (Fig. 4), BCFs were higher for MSWEP (mean BCF = 1.66) and lower for RAIN4PE (mean BCF = 1.38). A BCF of 1.38 implies that, on average, about 28% of the corrected total precipitation in páramo and montane watersheds of the study area was missing from the uncorrected data, which falls within the range (0%–30%) of cloud/fog water contribution to total precipitation reported in previous studies of the region (Gomez-Peralta et al. 2008; Cárdenas et al. 2017). Figure 4 also shows that the precipitation correction applied to RAIN4PE yields a good representation of streamflow seasonality for all nine catchments. The correction of CHIRPS also works relatively well in most of the catchments in terms of seasonal streamflow prediction, although it fails over the southern Ecuadorian Amazon (at Santiago station). The hydrological correction of the other datasets (CHIRP, ERA5, MSWEP, and PISCO) performs well for the southern catchments (from Borja to Lagarto station) but not for the Ecuadorian catchments (from Nueva Loja to Santiago station), where the seasonal change in streamflow is underestimated, indicating a serious drawback of these datasets.
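The link between a BCF and the share of precipitation that was underpredicted is simple arithmetic: if corrected precipitation equals the BCF times the uncorrected precipitation, the missing share of the corrected total is (BCF − 1)/BCF. A minimal sketch, with an illustrative function name:

```python
def underpredicted_fraction(bcf):
    """Share of the corrected precipitation total that the uncorrected data missed.

    Assumes corrected precipitation = bcf * uncorrected precipitation.
    """
    return (bcf - 1.0) / bcf


# Mean BCFs reported in the text for RAIN4PE and MSWEP.
for name, bcf in (("RAIN4PE", 1.38), ("MSWEP", 1.66)):
    print(f"{name}: {100.0 * underpredicted_fraction(bcf):.1f}%")
```

For the RAIN4PE mean BCF of 1.38, this gives roughly 28%, the value quoted above.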

(top) Bias correction factors (BCFs) for six precipitation datasets and (bottom) long-term mean seasonal streamflow (Q) dynamics in the period 1983–2015 after SWAT model calibration over nine catchments with underestimation of precipitation amounts in comparison with the observed mean seasonal discharge. The mean BCF was computed using the catchment areas as weights. Note that both observed and seasonal streamflow were computed only for the months with available streamflow data.

The spatial patterns of average annual precipitation for the period 1981–2015 based on (top) raw and (middle) hydrologically adjusted precipitation data of ERA5, CHIRP, CHIRPS, MSWEP, PISCO, and RAIN4PE. (bottom) The underestimated precipitation fields for each precipitation dataset. The numbers in brackets represent the precipitation ranges. In the case of ERA5, precipitation values exceeding 8000 mm are in purple (distributed over the Ecuadorian Andes mainly).

Performance of the unadjusted precipitation datasets in comparison with gauge observations: MAP is the mean annual precipitation, ME is the mean error, r is the Pearson’s correlation coefficient, and R2 is the coefficient of determination. The comparison measures (ME, r, and R2) were computed using monthly precipitation time series for 1981–2015.
c. Spatial patterns of precipitation
In general, the spatial variability of the long-term average annual precipitation (1981–2015) portrayed by all precipitation datasets looks quite similar (Fig. 5), although PISCO shows distinct precipitation patterns and magnitudes in the rain forest regions. Figure 5 also shows the spatial patterns of the estimated precipitation underestimates for each precipitation dataset. As can be seen, these patterns look quite similar over the Peruvian Amazon for five datasets (CHIRP, CHIRPS, MSWEP, PISCO, and RAIN4PE) but vary over the northern Amazon basin in Ecuador. The substantial precipitation underestimation (ranging from 0 to 3369 mm, Fig. 5) found here suggests that precipitation correction was necessary to achieve the closure of the water budget and appropriate hydrological modeling of the páramo and montane watersheds.
In addition, we compared the unadjusted precipitation data with gauge observations (Fig. 6) to assess the reliability and critical shortcomings of the precipitation datasets. The comparison shows that ERA5 significantly overestimates precipitation over the Andes. CHIRP, CHIRPS, MSWEP, and PISCO underestimate precipitation over the northern Pacific coastal areas, whereas CHIRP and CHIRPS overestimate it over the arid southern Pacific coastal areas. Furthermore, ERA5, CHIRP, CHIRPS, MSWEP, and PISCO have an inconsistent temporal distribution of precipitation over the northern Amazon, as confirmed by the low correlation and determination coefficients obtained when comparing these products with gauge observations at a monthly scale (Fig. 6) and by the SWAT-simulated seasonal streamflow using these datasets (Fig. 4). Therefore, these datasets are less suitable than RAIN4PE for characterizing the spatiotemporal variability of precipitation over the Ecuadorian Amazon. However, it should be kept in mind that the comparison measures in Fig. 6 could be biased toward the datasets (CHIRPS, MSWEP, PISCO, and RAIN4PE) that assimilated data from these precipitation gauges in their production (see Table 1).
d. Hydrological evaluation
In this section, we evaluate the performance of the SWAT model driven by the hydrologically adjusted CHIRP (CHIRP-SWAT), ERA5 (ERA5-SWAT), CHIRPS (CHIRPS-SWAT), MSWEP (MSWEP-SWAT), PISCO (PISCO-SWAT), and RAIN4PE (RAIN4PE-SWAT) for the calibration and validation periods. We used multiple performance measures to assess model skill in representing discharge dynamics under all flow conditions (low, high, and peak flows). Temporal mismatches in the daily precipitation accumulation may influence the model performance at the daily scale, since CHIRP, CHIRPS, and MSWEP are aggregated over a daily time window that differs from the local one (from 0700 to 0700 local time). Furthermore, our analyses are based on the results of a single hydrological model, SWAT; other hydrological models could be applied in the future to verify and refine the results obtained here.
1) Performance evaluation for daily streamflow and extremes
We investigated the spatial variability of hydrological model performance for streamflow simulation forced by the six precipitation products in the calibration (Fig. 7, Table S1) and validation (Fig. S2, Table S2) periods. These figures present the spatial distribution of the Kling–Gupta efficiency and show, as boxplots, the results in terms of seven criteria for all streamflow stations and for catchments draining into Titicaca Lake, the Pacific Ocean, and the Amazon River. Table 5 shows each criterion’s median values for each drainage system and precipitation product for the simulation period (1981–2015). The results described in this section are based on the outputs for the calibration period (Fig. 7), but they are also valid for the validation period (Fig. S2), as the results for both periods are similar.

Hydrological model performance metrics for daily streamflow simulations by SWAT driven by six precipitation datasets in the calibration period: (top) spatial patterns of KGE and (bottom) boxplots showing seven criteria for all streamflow (Q) stations and stations located in catchments draining into the Amazon River, Pacific Ocean, and Titicaca Lake. The datasets are sorted in ascending order of the median KGE for all Q stations. Values exceeding 0.5 (between ±25%) for KGE, lNSE, and NSE (PBIAS, Slow, Shigh, and Speak) are considered skillful (marked by light gray background in boxplots). Black points in the upper part represent negative values of KGE. Note that the x axis starts at 0 for KGE, NSE, and lNSE to improve visualization, whereas PBIAS, Slow, Shigh, and Speak were constrained between ±50%.
Median values of each performance measure for daily streamflow simulation for the period 1983–2015 (without the spinup period). Values in bold denote the best performing product in each drainage system and the study area according to the specific score on the left.


Results for catchments draining into Titicaca Lake show that the performance of SWAT driven by gauge-corrected precipitation datasets ranges from satisfactory to very good for daily streamflow simulation (median KGE ≥ 0.79), including all flow conditions. The good performance of MSWEP and CHIRPS for hydrological modeling in the Titicaca Lake basin shown here is consistent with the performance demonstrated in Satgé et al. (2019, 2020). However, RAIN4PE (median KGE = 0.86) was the best choice for this drainage system in our simulations. Regarding the two non-gauge-corrected datasets, CHIRP-SWAT performs unsatisfactorily for high-flow dynamics, and ERA5-SWAT significantly overestimates streamflow (Fig. S4).
In the Pacific basin, CHIRPS-SWAT, MSWEP-SWAT, and even PISCO-SWAT have low KGE (≤0.5), high biases, and poor performance for high and peak flows for some stations. The outcome for MSWEP and PISCO aligns with the findings of previous studies (Bhuiyan et al. 2019; Derin et al. 2019; Asurza-Véliz and Lavado-Casimiro 2020). CHIRP-SWAT has more skill than ERA5-SWAT, which shows a significant overestimation of streamflow; however, they both are outperformed by the gauge-corrected precipitation datasets. The overall good performance of RAIN4PE-SWAT (median KGE = 0.78) allowed us to conclude that RAIN4PE is the most suitable precipitation product for daily streamflow simulation (including all flow conditions and water budget closure) in the catchments draining into the Pacific Ocean.
In the Amazon basin, among the six precipitation products driving SWAT, RAIN4PE (median KGE = 0.80) provided the best performance measures for daily streamflow simulation (including all flow conditions). PISCO (median KGE = 0.49) provided the worst measures, particularly over the lower Amazon catchments, which is consistent with previous studies (Aybar et al. 2020; Llauca et al. 2021). Although the median KGE (>0.5) is satisfactory for CHIRP, CHIRPS, ERA5, and MSWEP, other measures such as lNSE and NSE show that these products tend to perform unsatisfactorily for the simulation of low- and high-flow dynamics. Moreover, the KGE patterns (Fig. 7) show unsatisfactory scores over the Ecuadorian Amazon catchments, revealing the limitations of all products (including RAIN4PE) in portraying the actual daily precipitation variability there.
In general, SWAT performance for all streamflow stations (Fig. 7, Fig. S2, Table 5, and Tables S1 and S2) suggests that RAIN4PE (e.g., median KGE = 0.80) is the most appropriate product for daily streamflow simulation, including all flow conditions, in the study area.
2) Performance evaluation for monthly streamflow
Fig. 8 and Fig. S3 display the spatial distribution of KGE, NSE, lNSE, and PBIAS to assess the SWAT model skill for the monthly streamflow simulation in the calibration and validation periods. These figures show that results in both periods are quite similar, although the overall performance of PISCO-SWAT and MSWEP-SWAT is slightly lower in the validation period. Based on the model performance in the validation period (Fig. 8), among the six precipitation products driving SWAT, RAIN4PE (median KGE = 0.86, NSE = 0.82, lNSE = 0.82, and |PBIAS| = 5.4%) provided the best overall performance measures for monthly streamflow simulation in all evaluated catchments. Although the median KGE, NSE, and lNSE were satisfactory (>0.5, Fig. 8) for CHIRP, CHIRPS, MSWEP, and PISCO, the spatial patterns of these measures reveal the limitations (e.g., NSE < 0.5) of these products for hydrological modeling over the Ecuadorian Amazon, the lower Amazon, and some catchments draining into the Pacific Ocean, in agreement with the results for the daily outputs. In contrast, ERA5-SWAT performed unsatisfactorily for Andean basins, although its performance improved for larger catchments in the Amazon basin. The overall very good performance of RAIN4PE-SWAT according to the criteria of Moriasi et al. (2007) highlights the increased utility of RAIN4PE for countrywide hydrometeorological applications in Peru and Ecuador.

Hydrological model performance metrics KGE, NSE, lNSE, and PBIAS for monthly streamflow simulations by SWAT driven by six precipitation datasets in the validation period. Black points represent negative values of KGE, NSE, and lNSE.
5. Discussion
a. Advantages of the merging methodology
This study demonstrates a successful method for merging multiple precipitation sources (based on gauge, satellite, and reanalysis data) with surface elevation using the RF method to generate the spatially gridded precipitation dataset RAIN4PE. This is supported by the significant improvement of RAIN4PE for hydrological simulations compared to the non-gauge-corrected datasets (CHIRP and ERA5) used in the merging procedure. Furthermore, the superiority of RAIN4PE over the gauge-corrected datasets (CHIRPS, MSWEP, and PISCO) for hydrological simulations suggests that the methodology applied herein to generate RAIN4PE is more robust than that of the other merged precipitation products. This indicates that the RF method is more effective in merging multiple precipitation data sources than deterministic and geostatistical interpolation methods (Funk et al. 2015a; Aybar et al. 2020) and merging approaches that use weights for each source (Beck et al. 2017, 2019b). Compared to these merging approaches, RF has the flexibility to include multiple precipitation sources and environmental variables (e.g., surface elevation) that explain precipitation patterns. In addition, RF can capture nonlinear dependencies and interactions among variables, such as the nonlinear interactions between precipitation and terrain elevation due to the complex Andes morphology (Figs. 1 and 5; Chavez and Takahashi 2017), which could be challenging to model using geostatistical techniques. However, it is important to keep in mind that RF cannot predict values beyond the range of the training data (Hengl et al. 2018). Overall, the results of our study provide a reference for merging multisource precipitation data and environmental variables using RF in complex data-scarce regions.
b. Hydrological correction of the gridded precipitation datasets
The high BCF values (Fig. 4) obtained to correct gridded precipitation biases show that most of the datasets evaluated (CHIRP, CHIRPS, MSWEP, PISCO, and RAIN4PE) often underestimate precipitation over the páramo and montane watersheds in the Amazon (Fig. 5). This underestimation, especially by gauge-corrected datasets, could be caused by the low number of precipitation gauges available (Fig. 1), and it is further amplified by the fact that the gauges do not account for the important cloud/fog water input into the system (Gomez-Peralta et al. 2008; Clark et al. 2014; Cárdenas et al. 2017).
A substantial precipitation underestimation over the páramo and montane watersheds is critical since it might even lead to physically unrealistic runoff ratios above 1 in water budget estimates as reported in previous studies (Zulkafli et al. 2014; Zubieta et al. 2015; Manz et al. 2016; Strauch et al. 2017; Builes-Jaramillo and Poveda 2018; Aybar et al. 2020). Furthermore, precipitation errors in the upstream catchments can negatively affect simulation results for downstream river catchments. For instance, the assignment of unrealistic model parameter values to counterbalance precipitation uncertainty can lead to misrepresentation of the basinwide water budget (see more details in appendix B). To overcome these deficiencies, we used streamflow data to adjust precipitation biases.
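The water budget argument can be made concrete with a short sketch. Over a long period, with negligible storage change, precipitation should roughly equal streamflow plus evapotranspiration, so a runoff ratio above 1 means the precipitation input cannot even account for the observed streamflow. The function below derives the smallest correction factor that would close such a budget. This is only an illustration of the reasoning: in the study, BCFs were obtained through calibration together with the model parameters, not from this closed form, and the annual values in the example are hypothetical.

```python
def runoff_ratio(q_mm, p_mm):
    """Ratio of streamflow depth to precipitation depth (both in mm per year)."""
    return q_mm / p_mm


def bcf_for_closure(p_mm, q_mm, et_mm):
    """Factor on precipitation that closes P = Q + ET.

    Assumes long-term storage change is negligible.
    """
    return (q_mm + et_mm) / p_mm


# Hypothetical montane catchment: P = 2000, Q = 2400, ET = 800 mm per year.
ratio = runoff_ratio(2400.0, 2000.0)              # above 1: physically unrealistic
factor = bcf_for_closure(2000.0, 2400.0, 800.0)   # factor needed for closure
```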
Our results show that the hydrological correction of precipitation datasets was more efficient over the regions with the strongest rainfall seasonality such as the Peruvian catchments (Espinoza Villar et al. 2009; Segura et al. 2019). This suggests that actual spatiotemporal precipitation fields over these regions are well depicted by the assessed datasets, whereas the correction efficiency over the Ecuadorian Amazon catchments, which experience precipitation throughout the year with high spatial regime variability (Laraque et al. 2007; Tobar and Wyseure 2018), is more variable and depends more strongly on the precipitation product. For instance, the correction was not feasible for CHIRP, ERA5, MSWEP, PISCO, and even CHIRPS (at southern Ecuadorian Amazon) which led to the underrepresentation of the seasonal streamflow patterns and hence the true seasonal precipitation patterns as well (Figs. 4 and 6). This is a critical drawback of these products, and our findings here could be helpful for their revision and improvement. Even though the hydrological correction of CHIRPS resulted in the improvement of the seasonal streamflow simulations for the northern Ecuadorian Amazon, CHIRPS-SWAT still performed unsatisfactorily for the daily and monthly discharge dynamics (Figs. 7, 8 and Figs. S2 and S3), indicating that CHIRPS does not represent well the actual precipitation patterns over the Ecuadorian Amazon catchments. In these catchments, other datasets such as gauge-based ORE HYBAM (Guimberteau et al. 2012), gauge-corrected WFDEI, reanalysis ERA-Interim, and satellite-based PERSIANN (Hsu et al. 1997), TMPA, CMORPH, and IMERG have also been reported to perform unsatisfactorily for streamflow simulation (Zulkafli et al. 2014; Zubieta et al. 2015; Strauch et al. 2017; Zubieta et al. 2017; Towner et al. 2019). 
Overall, compared with CHIRP, ERA5, MSWEP, PISCO, CHIRPS, and the other datasets mentioned above, RAIN4PE shows satisfactory performance for monthly (Fig. 8) and seasonal (Fig. 4) streamflow simulations with SWAT over the Ecuadorian Amazon. However, its performance for daily simulation is still unsatisfactory (Fig. 7), which highlights that estimating precipitation at a daily resolution over data-scarce regions such as the equatorial Amazon is very challenging. The shortcomings of the precipitation datasets exposed here point to an urgent need to implement and densify precipitation and cloud/fog gauge networks over the Ecuadorian Amazon and Peruvian montane watersheds. Such networks could help to improve the depiction of rainfall amounts and their spatiotemporal distribution and hence could be useful for improving streamflow simulations. It is important to keep in mind that the correction of the proposed precipitation product through the reverse hydrology concept was performed using the SWAT hydrological model, and therefore the performance of the RAIN4PE dataset may change if another hydrological model is used. Nevertheless, because SWAT is a widely used and comprehensively verified model, we expect only minor deviations.
c. Implications for hydrological modeling
The results of the hydrological evaluation clearly show the advantages and shortcomings of each evaluated precipitation dataset for streamflow simulation, including low, high, and peak flows. Moreover, Figs. S4–S6 compare the SWAT-simulated seasonal streamflow using all evaluated datasets against the observed seasonal streamflow for the three drainage systems (Titicaca Lake basin, Pacific basin, and Amazon basin). These figures can assist practitioners in selecting the appropriate precipitation product for hydrological applications. In general, the hydrological evaluation highlighted RAIN4PE as the best precipitation dataset for hydrological modeling of the Peruvian and Ecuadorian watersheds. RAIN4PE is the only gridded precipitation product for Peru and Ecuador that benefits from the maximum available in situ observations, multiple precipitation sources, and an environmental variable (elevation data) and that is supplemented by streamflow data to correct the precipitation underestimation over páramos and montane catchments. By exploiting all these variables with state-of-the-practice methods, RAIN4PE-SWAT was able to close the hitherto observed water budget imbalance over Peruvian and Ecuadorian catchments, which makes RAIN4PE a good candidate for hydrological applications in the region. Nevertheless, RAIN4PE is still subject to uncertainties, especially in regions where precipitation was inferred from the observed streamflow data. For these regions, precipitation estimates should be viewed with some care due to uncertainties in streamflow data, inferred evapotranspiration, gridded precipitation data, and hydrological model structure.
In this study, besides evaluating precipitation datasets for streamflow simulation, we show that uncertainties associated with precipitation estimates have implications for estimating hydrological model parameters (see appendix B) and water budget components (e.g., evapotranspiration; see appendix C). This is critical for the regionalization of parameters and for reliable estimation of the water budget for water resources management. Furthermore, a subsequent verification of RAIN4PE-SWAT-simulated evapotranspiration against GLEAM and MOD16 estimates (appendix C) shows that GLEAM and MOD16 return higher evapotranspiration values, which would not allow water budget closure and would introduce inconsistencies in the temporal distribution of evapotranspiration over the northern Amazon in Ecuador. This suggests that evapotranspiration estimation is still a challenge for remote sensing-based evapotranspiration products in the region.
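The water budget closure argument above can be illustrated with a short arithmetic sketch. It assumes long-term mean annual values and negligible storage change, so that precipitation P should balance evapotranspiration ET plus streamflow Q. The function names and all numbers are hypothetical and are not taken from this study; in the reverse hydrology procedure itself, ET is simulated by SWAT and responds to the corrected precipitation, whereas it is held fixed here for simplicity.

```python
def water_budget_residual(p_mm, et_mm, q_mm):
    """Long-term residual P - ET - Q (mm/yr), assuming negligible storage change."""
    return p_mm - et_mm - q_mm


def runoff_ratio(p_mm, q_mm):
    """Fraction of precipitation leaving the catchment as streamflow (Q / P)."""
    return q_mm / p_mm


def implied_precip_factor(p_mm, et_mm, q_mm):
    """Multiplicative factor on P that would close the budget with ET held fixed."""
    return (et_mm + q_mm) / p_mm


# Hypothetical montane catchment (illustrative values only, mm/yr).
p, et, q = 1200.0, 600.0, 1500.0

residual = water_budget_residual(p, et, q)  # negative: more water leaves than enters
ratio = runoff_ratio(p, q)                  # above 1: physically implausible without extra input
factor = implied_precip_factor(p, et, q)    # scaling of P needed to close the budget

print(f"residual = {residual:.0f} mm/yr, Q/P = {ratio:.2f}, factor = {factor:.2f}")
```

A runoff ratio above 1 or a strongly negative residual is the kind of signal that points to precipitation underestimation in a gridded product, which is the situation in which streamflow data can be used to infer a correction.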
This study is the first to apply a version of SWAT updated for improved representation of tropical vegetation dynamics (Alemayehu et al. 2017) and river–floodplain dynamics. These improvements are crucial for modeling the hydrological processes of Andean and Amazonian river catchments appropriately. The benefits of an appropriate representation of tropical vegetation dynamics were demonstrated in previous studies (Strauch and Volk 2013; Alemayehu et al. 2017; Fernandez-Palomino et al. 2020), while the benefit of flow routing that considers river–floodplain dynamics can be seen in the good representation of discharge dynamics of the Amazonian rivers in this study. For instance, in the Ucayali River (a tributary of the Amazon River), the significant observed flood peak delay (on a scale of months) from Lagarto to Requena station is well reproduced by SWAT (see Fig. S6), which is consistent with the findings of Santini (2020).
This study is also the first to apply SWAT at the country level for Peru and to perform a multiobjective calibration and validation using hydrograph goodness of fit and FDC signatures for large-domain modeling (1.6 million km²) in a region with complex hydroclimatic conditions. Our results show the robustness of signature-based calibration, which guides the model to reproduce not only one common objective function (e.g., high flows given by NSE) but all aspects of the hydrograph and FDC, as supported by the good performance of RAIN4PE-SWAT in reproducing all flow conditions. This is crucial for robust hydrometeorological applications, including extremes such as droughts and floods, as well as for assessing the reliability of precipitation datasets. Furthermore, our results reinforce the findings of previous studies (Shafii and Tolson 2015; Chilkoti et al. 2018; Fernandez-Palomino et al. 2020), which demonstrated the robustness of a signature-based calibration approach in the hydrological modeling of small watersheds. We consider that our approaches can be helpful for future studies related to precipitation estimates as well as to hydrological model calibration, evaluation, and application.
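To illustrate why a single goodness-of-fit score can hide errors in particular flow conditions, the following minimal sketch computes the Nash–Sutcliffe efficiency (NSE) together with a simple flow duration curve (FDC) signature, here the percent bias in flow volume within a range of exceedance probabilities. The segment boundaries, function names, and the toy series are assumptions made for this example; they are not the objective functions or data used in this study.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def fdc_segment_bias(obs, sim, p_lo, p_hi):
    """Percent bias of flow volume in the FDC segment between exceedance
    probabilities p_lo and p_hi (0 = highest flows, 1 = lowest flows)."""
    obs_sorted = sorted(obs, reverse=True)  # FDC: flows ranked from high to low
    sim_sorted = sorted(sim, reverse=True)
    n = len(obs_sorted)
    i0, i1 = round(p_lo * n), round(p_hi * n)
    obs_vol = sum(obs_sorted[i0:i1])
    sim_vol = sum(sim_sorted[i0:i1])
    return 100.0 * (sim_vol - obs_vol) / obs_vol


# Toy series (hypothetical): the simulation matches low and mid flows exactly
# but underestimates the two largest flows.
obs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sim = [1, 2, 3, 4, 5, 6, 7, 8, 8, 8]

nse_val = nse(obs, sim)                            # about 0.94
high_bias = fdc_segment_bias(obs, sim, 0.0, 0.2)   # about -15.8 % (high-flow segment)
low_bias = fdc_segment_bias(obs, sim, 0.7, 1.0)    # 0.0 % (low-flow segment)

print(f"NSE = {nse_val:.2f}, high-flow bias = {high_bias:.1f} %, low-flow bias = {low_bias:.1f} %")
```

In this toy case the NSE remains high even though the high-flow volume is clearly underestimated, which is the kind of deficiency that including FDC signatures among the calibration objectives is meant to expose.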
d. Future development and application
Based on the experiences we gained, our future investigations will focus on applying RAIN4PE-SWAT to analyze the water budget at the national scale of Peru, as well as climate change impacts on water resources using RAIN4PE as the basis for bias adjustment, and trends in frequency and intensity of meteorological and hydrological droughts. The current RAIN4PE data availability (1981–2015) is planned to be extended in the future. Moreover, the methodology presented in the paper will also be extended to the entire Amazon basin.
6. Summary and conclusions
We developed a new hydrologically adjusted daily precipitation dataset (1981–2015, 0.1° resolution) called RAIN4PE by merging three existing datasets for a domain covering Peru and Ecuador. This dataset takes advantage of ground-, satellite-, and reanalysis-based precipitation datasets, including CHIRP and ERA5, which are merged with terrain elevation using the random forest (RF) method to provide precipitation estimates. Furthermore, streamflow data were used to correct precipitation estimates over catchments with water budget closure problems (e.g., the páramo and montane watersheds) through the reverse hydrology method, for which the SWAT model was applied for the first time herein. Moreover, we performed a comprehensive hydrological evaluation of RAIN4PE, CHIRP, ERA5, and the existing state-of-the-art gauge-corrected precipitation datasets (CHIRPS, MSWEP, and PISCO) in the Peruvian and Ecuadorian river catchments using a range of performance metrics. For that, SWAT was calibrated and validated with each precipitation dataset in a number of catchments. We summarize our findings as follows.
The good RAIN4PE-SWAT performance for streamflow simulation suggests the effectiveness of the RF method to merge multisource precipitation estimates with terrain elevation to develop a reliable spatially gridded precipitation dataset. As all datasets (CHIRP, ERA5, and terrain elevation) used to develop RAIN4PE are freely available, this approach can be used in other data-scarce regions.
The utility of streamflow data to improve both precipitation and streamflow simulations over the páramo and montane watersheds with precipitation underestimation was demonstrated herein. This highlights that the reverse hydrology approach offers a new effective way of understanding the hydrological processes of the Andean–Amazon catchments, which have a key role in the hydrological variability of the entire Amazon basin.
The hydrological evaluation of the uncorrected precipitation datasets used to force SWAT for streamflow simulation revealed that CHIRP outperformed ERA5, which significantly overestimates precipitation along the Andes. However, both products were outperformed by the gauge-based precipitation datasets.
Among the gauge-corrected precipitation datasets forcing SWAT for streamflow simulation, all products performed well in the catchments draining into the Titicaca Lake. For catchments draining into the Pacific Ocean and Amazon River, CHIRPS, MSWEP, and PISCO performed unsatisfactorily in several catchments, indicating the limitations of these products for hydrological modeling over these drainage systems. In contrast, RAIN4PE was the only product that provided consistently good performance for the daily and monthly streamflow simulations, including all discharge conditions (low, high, and peak flows) and water budget closure in almost all Peruvian and Ecuadorian river catchments.
We found that CHIRP, CHIRPS, ERA5, MSWEP, and PISCO cannot represent the seasonal distribution of precipitation, and hence the seasonal streamflow, over the Ecuadorian Amazon. This is a critical drawback that can have implications for hydrometeorological applications in the Amazon basin.
We found that uncertainties in existing precipitation datasets affect the estimation of model parameters and water budget components, which underlines the importance of developing high-quality meteorological forcing datasets in mountainous regions. Our contribution is in line with this need and marks progress in developing precipitation datasets in the region.
The overall good performance of RAIN4PE highlights its utility as an important new gridded precipitation dataset, which opens new possibilities for numerous hydrometeorological applications throughout Peru and Ecuador. Examples are streamflow simulations, estimation of the water budget and its evolution, water resources management, understanding spatiotemporal variations of droughts and floods, and exploring spatial variations and regimes of precipitation. We consider that RAIN4PE and our RAIN4PE-SWAT model can be adopted as a benchmark to evaluate precipitation datasets in Peru and Ecuador.
Acknowledgments.
The authors thank the EPICC project, which is part of the International Climate Initiative (IKI). The Federal Ministry for the Environment, Nature Conservation and Nuclear Safety (BMU) supports this initiative on the basis of a decision adopted by the German Bundestag. We also thank SENAMHI, Peruvian ANA, observatory HYBAM, Enrique Morán-Tejeda, Guido G. Tamayo, Juan J. Nieto, Vladimiro Tobar, and Kevin J. Perez for providing the hydrometeorological dataset. The authors are thankful to the CHIRP, CHIRPS, ERA5, MSWEP, and PISCO data generation teams for providing the precipitation data free of cost. We are thankful to Dr. David Hadka and Dr. Patrick M. Reed for making their software “BORG: Many-Objective Evolutionary Computing Framework” available for this study. We are grateful to the Editor, Liz Stephens, Oscar M. Baez-Villanueva, and one anonymous reviewer for their constructive comments.
Data availability statement.
The RAIN4PE data record is freely available at https://doi.org/10.5880/pik.2020.010 (Fernandez-Palomino et al. 2021).
APPENDIX A
Glossary
CHIRP | Climate Hazards Group Infrared Precipitation
CHIRPS | CHIRP with Station data
CMORPH | Climate Prediction Center morphing technique
IMERG | Global Precipitation Measurement (GPM) Integrated Multisatellite Retrievals
MSWEP | Multi-Source Weighted-Ensemble Precipitation
PISCO | Peruvian Interpolated data of SENAMHI’s Climatological and Hydrological Observations
SENAMHI | Servicio Nacional de Meteorología e Hidrología del Perú
TMPA | Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis
WFDEI | WATCH Forcing Data methodology applied to ERA-Interim data
APPENDIX B
Evaluating the Distribution of Model Parameters
In this section, we analyze the distribution of the calibrated model parameters to examine their regional behavior and to reveal potential input errors, because the parameters were identified so as to achieve water budget closure with each precipitation dataset. Unrealistic parameter values can therefore be linked to input error. We refer readers to Table 3 for a description of the parameters and to Neitsch et al. (2011) for detailed parameter definitions. Among the calibrated SWAT parameters, only two (SOL_AWC, GW_REVAP) can alter the water budget, since they influence evapotranspiration and, subsequently, runoff estimation. The remaining parameters influence surface runoff (SURLAG), groundwater (GW_DELAY, RCHRG_DP, GWQMN, ALPHA_BF), and flow routing (CH_K2, CHD, FP_W_F) and do not affect water loss from the system. Figures B1–B3 illustrate the spatial patterns of the calibrated parameters for the six precipitation datasets.

Calibrated parameter values for the soil available water capacity (SOL_AWC) for topsoil (1) and subsoil (2), the surface runoff delay coefficient (SURLAG), and the groundwater delay time (GW_DELAY). The HWSD map shows SOL_AWC values derived from the Harmonized World Soil Database, which were used for setting up the SWAT model.
Citation: Journal of Hydrometeorology 23, 3; 10.1175/JHM-D-20-0285.1
