Exploring the Application of Flood Scaling Property in Hydrological Model Calibration

Yanchen Zheng (a, b), Jianzhu Li (a), Ting Zhang (a), Youtong Rong (c), and Ping Feng (a)

a State Key Laboratory of Hydraulic Engineering Simulation and Safety, Tianjin University, Tianjin, China
b Department of Civil Engineering, University of Bristol, Bristol, United Kingdom
c School of Geographical Sciences, University of Bristol, Bristol, United Kingdom
Open access

Abstract

Model calibration has long been a major challenge in the hydrological community. Flood scaling properties (FS) are often used to estimate flood quantiles for data-scarce catchments based on the statistical relationship between flood peaks and contributing areas. This paper investigates the potential of applying FS and multivariate flood scaling properties [multiple linear regression (MLR)] as constraints in model calibration. Based on the assumption that the flood scaling property exists in four study catchments in northern China, eight calibration scenarios are designed that adopt different combinations of traditional indicators and FS or MLR as objective functions. The performance of the proposed method is verified with a distributed hydrological model, namely, the Soil and Water Assessment Tool (SWAT) model. The results indicate that FS achieves reasonable performance while requiring less observed streamflow data, and simulates flood peaks better than the Nash–Sutcliffe efficiency coefficient calibration scenario. The MLR calibration scenario requires observed streamflow data or regional flood information to identify the dominant catchment descriptors, and MLR achieves better performance at catchment interior points, especially for events with unevenly distributed rainfall. Given the improved performance on hydrographs and the flood frequency curve at the watershed outlet, adopting statistical indicators and the flood scaling property simultaneously as model constraints is suggested. The proposed methodology strengthens the physical connection of flood peaks among subbasins and considers actual watershed conditions and climatic characteristics for each flood event, providing a new calibration approach for both gauged and data-scarce catchments.

Significance Statement

This paper proposes a new hydrological model calibration strategy that explores the potential of applying flood scaling properties as constraints. The proposed method effectively captures flood peaks with fewer requirements of observed streamflow time series data, providing a new alternative method in hydrological model calibration for ungauged watersheds. For gauged watersheds, adopting flood scaling properties as model constraints could make the hydrological model calibration more physically based and improve the performance at catchment interior points. We encourage this novel method to be adopted in model calibration for both gauged and data-scarce watersheds.

© 2021 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Jianzhu Li, lijianzhu@tju.edu.cn

1. Introduction

As an essential tool to represent the dynamic and complex hydrological processes in watersheds, hydrological models are widely used to support water resource management, planning, and decision-making (Niehoff et al. 2002; Bormann et al. 2007; Huisman et al. 2009; Daggupati et al. 2015). Generally, to provide reliable simulation results for floods with hydrological models, the model parameters should be calibrated carefully. However, the hydrological processes are a result of complex interactions of catchment physical processes, which are highly changeable in both space and time (Gupta 2004; Ayalew et al. 2014). Finding an optimal set of parameters to provide a reasonable streamflow prediction of catchment outlet or streamflow of any catchment interior locations is quite difficult (Wanders et al. 2014), especially for data-scarce catchments. In addition, even for the gauged watershed, the model parameters that need to be calibrated for different model modules are also increasing with the wide application of distributed hydrological models, which further increases the difficulty of hydrological model calibration.

Beven (2006) believed that only a few parameter sets could meet the multiple calibration criteria simultaneously in model calibration, indicating that the model performance could be improved by adding more constraints. Hence, many studies tried to develop different calibration strategies to constrain the model performance based on all available information of the catchment. Calibrating the hydrological model with multivariable data is one of the common ways to improve the simulation performance among recent research (Chiang et al. 2014; White and Chaubey 2005). Numerous researchers calibrated streamflow and other hydrological variables simultaneously (Koren et al. 2008; Li et al. 2018; Pfannerstill et al. 2017; Rajib et al. 2016; Omani et al. 2017; Tuo et al. 2018; Lu et al. 2014; Wanders et al. 2014; Rajib et al. 2018). Multisite calibration method is also a widely used calibration strategy (Cao et al. 2006; Zhang et al. 2008, 2010; Niraula et al. 2012; Wi et al. 2015; Fenicia et al. 2016; Hughes et al. 2016).

a. Calibration strategies based on limited flow data for data-scarce catchments

However, sufficient observed measurements are required in multisite and multivariable calibration strategies. In this section, we introduce some other calibration strategies for data-scarce catchments.

Numerous studies have explored the minimum streamflow time series length needed for robust model calibration. The reported minimum length varies from 3 months to 8 years among different studies (Yapo et al. 1996; Brath et al. 2004; Perrin et al. 2007). Seibert and Beven (2009) indicated that a few runoff measurements can contain as much information as a continuous runoff time series. For instance, a single event or 10 observations during high flows could provide the same information as 3 months of continuous data (Seibert and McDonnell 2013). They also showed that maximum flow series contain more information than minimum or mean flows. Similarly, Melsen et al. (2014) reported that the season (5 months) with the highest precipitation is sufficient to give a robust simulation of high flows over the full observation period. Singh and Bárdossy (2012) demonstrated that calibration on identified events representing 6%–7% of a 10-yr time series could obtain reasonable results compared to calibration on the whole time series.

Some researchers further investigated the optimal sampling strategy to extract the most informative runoff measurements for constraining the model. Pool et al. (2017) compared 12 sampling strategies for runoff measurements, ranging from simple monthly maximum runoff measurements to more complex selections from the observed time series. They demonstrated that flow-duration curves were best estimated with strategies including low and mean flows, while strategies with high runoff magnitudes produced the best simulation of hydrographs.

Another type of calibration strategy is employing regional information or a priori information as model constraints to estimate the parameters (Hundecha and Bárdossy 2004; Götzinger and Bárdossy 2007). The goal of a regionalization approach is to transfer the information from gauged catchments to ungauged catchments based on the statistical relations between catchment characteristics and model parameters (Viviroli and Seibert 2015; Samaniego et al. 2017). This is the most common way to estimate the model parameters for ungauged catchments (Perrin et al. 2008; Rojas-Serna et al. 2016; De Lavenne et al. 2019). Additionally, combining the regionalization method and point flow measurement appears to be a promising method to estimate the model parameters for ungauged catchments. Some studies proved that employing a few runoff observations could improve the model calibration performance (Viviroli and Seibert 2015; Seibert 2010; Rojas-Serna et al. 2016; Pool et al. 2018).

b. Scaling property of floods

Mendoza et al. (2015) mentioned that integrating prior knowledge of hydrologic processes could improve the relatively poor performance of hydrological models. We attempted to consider how to add constraints for the model hydrologic processes based on limited data. In general, observations during wet periods, especially the event peak, contain valuable information for model parameter estimation (Yapo et al. 1996; Seibert and Beven 2009; Melsen et al. 2014; Pool et al. 2017). The peak discharge reflects various aspects of the rainfall rate, spatial–temporal variability, and watershed characteristics such as soil moisture, land use, and land cover (Ogden and David 2003). The statistical relationship of flood peaks with the size of contributing upstream area, which captures the interactions of the streamflow-generating process, forms the basis of many empirical approaches for the estimation of floods in data-scarce catchments (Ayalew et al. 2014; Farmer et al. 2015). A quantile-based flood scaling property has been applied in the study of design flood estimation along with flood risk assessment (Ishak et al. 2011; Basu and Srinivas 2015) and has been used to develop regional flood frequency (RFF) equations for many decades (Dawdy et al. 2012; Smith et al. 2015; Furey et al. 2016). However, Furey et al. (2016) pointed out that the quantile-based analysis allowed for event mixing, indicating that different flood events data were employed to determine a scaling relationship. Under these circumstances, the flood results could be underestimated. Another type of flood scaling property, namely, event-based flood scaling property, was taken as a more informative and robust method for establishing the relation between the flood peak and the drainage area in a watershed. 
A wide range of empirical and theoretical studies revealed that the scaling relationship at the flood event scale can be expressed in a power-law form $Q(A) = \alpha A^{\theta}$, where $Q(A)$ is peak discharge, $A$ is the drainage area, $\alpha$ is the intercept, and $\theta$ is the exponent (Ogden and David 2003; Furey and Gupta 2005, 2007; Mandapaka et al. 2009; Ayalew et al. 2014, 2015).
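As a minimal sketch of how such a power law can be fitted for a single event, the example below regresses log peak discharge on log drainage area with ordinary least squares. The function name `fit_power_law` and the subbasin areas and peaks are hypothetical and chosen only for illustration; the fitting procedure is a common choice, not necessarily the one used in the cited studies.

```python
import math

def fit_power_law(areas, peaks):
    """Fit Q = alpha * A**theta by least squares in log-log space.

    Returns (alpha, theta, r2); r2 is computed on the log-transformed data.
    """
    x = [math.log(a) for a in areas]
    y = [math.log(q) for q in peaks]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    theta = sxy / sxx                  # slope of the log-log line
    alpha = math.exp(my - theta * mx)  # intercept transformed back from log space
    r2 = (sxy * sxy) / (sxx * syy)
    return alpha, theta, r2

# Hypothetical nested subbasin areas (km^2); peaks generated from Q = 2 * A**0.6
areas = [50.0, 200.0, 800.0, 1600.0]
peaks = [2.0 * a ** 0.6 for a in areas]
alpha, theta, r2 = fit_power_law(areas, peaks)
```

Because the synthetic peaks follow the power law exactly, the fit recovers the generating coefficient and exponent; with real event peaks the scatter around the line is what the coefficient of determination summarizes.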

Nevertheless, considering drainage area as the only explanatory variable of event flood peak is not sufficient in some cases. Jothityangkoon and Sivapalan (2001) illustrated that the spatial scaling of flood peaks is closely related to the space–time variability of rainfall–runoff processes through the underlying catchment water balance, including the effects of rainfall, evaporation, antecedent soil moisture storage, and the stream network geomorphology. Al-Rawas and Valeo (2010) explored the effect of various watershed characteristics on peak flood flow based on a large number of observed flood events. Ishak et al. (2011) employed a total of 10 explanatory variables to improve flood predictions. Hence, the so-called multivariate flood scaling property based on a multiple regression framework considering various watershed explanatory variables could help better grasp the rule of flood scaling and avoid estimation bias (Farmer et al. 2015).

c. Scope of this paper

This study aims to develop a novel calibration method that could be used in both gauged and data-scarce catchments for hydrological model calibration. The proposed method employs event-based scaling properties of flood as model constraints, which utilizes the statistical relationship between event flood peak and drainage area. Since estimation of the design flood in ungauged catchments is normally based on the scaling properties of flood, adopting the flood scaling property in model calibration exhibits the potential of applying the proposed method in both gauged catchments and data-scarce catchments. An innovation of this paper is to explore the potential of applying flood scaling properties as model constraints to calibrate a distributed hydrological model for the first time. Moreover, apart from the drainage area, the other catchment descriptors affecting flood peak are also considered as explanatory variables for describing the scaling properties of flood. Thus, this paper aims to explore the following questions: 1) Does the flood scaling property have the potential to behave as a model constraint to calibrate a hydrological model? 2) How much can this new calibration strategy improve the model performance? 3) If more catchment descriptors are utilized as the explanatory variables, will the model calibration results outperform one that only considers drainage area?

2. Study area and data

a. Study area

Four watersheds, namely, the Zijingguan, Fuping, Shifu, and Kuancheng watersheds, which are all located in the same typical temperate continental and semiarid climate region in north China, were selected as the study catchments. The geographic locations of these four study catchments are shown in Fig. 1, and the main catchment characteristics are listed in Table 1. The drainage areas of the four study catchments range from 1031 to 2175 km² with an average of 1659 km². Forest and grass are the dominant land cover types of these study areas. In these four catchments, short heavy rainfall is the main driving factor of floods. Precipitation from July to August accounts for 65%–85% of the annual rainfall. Previous studies have shown that the event-based flood scaling property exists in this region (Li et al. 2013, 2019; Kang et al. 2019; Li et al. 2020).

Fig. 1. Geographic location and detailed subbasin division map of the four study catchments: (a) Zijingguan watershed, (b) Shifu watershed, (c) Fuping watershed, and (d) Kuancheng watershed.

Citation: Journal of Hydrometeorology 22, 12; 10.1175/JHM-D-21-0123.1

Table 1. Main catchment characteristics of the study areas.

b. Data

Observed daily streamflow data at the four watershed outlets were available for the flood seasons (June–September). Observed daily rainfall data for the corresponding periods were also collected. The number of rainfall gauges within each study catchment and the record periods of the rainfall–runoff data are listed in Table 2. Additionally, daily precipitation data from the Climate Forecast System Reanalysis (CFSR) datasets (Fuka et al. 2013; Dile and Srinivasan 2014) were adopted to fill the missing values of the measured rainfall series for the dry season. Since rainfall is mostly concentrated in the flood season in the four study catchments, flood events seldom occur in the nonflood season. Furthermore, several researchers have shown that calibration with short, wet periods of observations is sufficient to provide a robust simulation compared to calibration on the whole time series (Singh and Bárdossy 2012; Melsen et al. 2014; Seibert and McDonnell 2013; Seibert and Beven 2009). More importantly, our research goal is to compare the calibration results of different calibration strategies, rather than to achieve the best performance. Hence, the available streamflow and rainfall records are sufficient to obtain reasonable calibration results.

Table 2. Rainfall–runoff data records information for the four study catchments.

1) Selection of flood events

Several flood events from the observed flood series were selected as key research objects to verify the flood scaling property in the study areas. Many previous studies indicated that a 2-yr return period flood, namely, the natural bank-full discharge, could be considered as a threshold flow symbolizing the occurrence of flood events, and it is often employed as a global minimum flood protection standard (Weeink 2010; Scussolini et al. 2016; Zheng et al. 2020). Thus, in this study, all flood events in the calibration periods with a flood peak larger than the 2-yr return period flood were selected as significant flood events. The specific number of identified flood events for each of the four study catchments can be found in Table 2.
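The selection rule can be sketched as follows. This is only an illustration under a stated assumption: the 2-yr flood is approximated here by the empirical median of the annual maximum series (an annual exceedance probability of 0.5), whereas the study derives it from a flood frequency analysis whose distribution is not specified in this section. All discharge values and event labels are hypothetical.

```python
from statistics import median

def two_year_flood(annual_maxima):
    """Empirical 2-yr flood: the discharge exceeded in half of the years (the median)."""
    return median(annual_maxima)

def select_events(event_peaks, threshold):
    """Keep only the events whose peak discharge exceeds the threshold."""
    return {name: q for name, q in event_peaks.items() if q > threshold}

# Hypothetical annual maximum discharges and event peaks (m^3/s)
annual_maxima = [120.0, 340.0, 95.0, 510.0, 220.0, 180.0, 760.0]
event_peaks = {"1996-08": 760.0, "1998-07": 220.0, "2000-07": 340.0, "2004-08": 150.0}

q2 = two_year_flood(annual_maxima)
significant = select_events(event_peaks, q2)
```

With these numbers the threshold is 220 m^3/s, so only the two events with larger peaks are retained; an event exactly at the threshold is excluded because the rule requires a strictly larger peak.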

2) SWAT model configuration

The land use and land cover maps for the study areas were collected from the Institute of Geographic Sciences and Resources of the Chinese Academy of Sciences (http://www.resdc.cn/). The Harmonized World Soil Database (HWSD) v1.1 soil dataset (FAO/IIASA/ISRIC/ISS-CAS/JRC 2009) was utilized to extract the soil groups. The weather data spanning 1979–2014, including air temperature, wind speed, relative humidity, and solar data, provided by CFSR datasets (Fuka et al. 2013; Dile and Srinivasan 2014), were downloaded from the website https://globalweather.tamu.edu/.

Based on the homogeneity of land characteristics, the subbasins were delineated with the ArcSWAT tool. Reasonable threshold values were adopted for soil, slope, and land cover to delineate the hydrological response units (HRUs) (Srinivasan et al. 2010; Sexton et al. 2010; Han et al. 2012; Her et al. 2015) for the four study catchments. A total of 500 parameter sets were sampled from the initial ranges by Latin hypercube sampling (LHS). The simulation results of these 500 parameter sets were used to conduct a global sensitivity analysis to screen out the sensitive parameters. The initial parameter ranges were determined based on previous research in which the Soil and Water Assessment Tool (SWAT) model was also adopted to simulate streamflow in the Zijingguan basin and its adjacent basin (Bu et al. 2018; Wang et al. 2021). The ranges of parameters not used in those studies were set to ranges widely used in SWAT-related research (Kouchi et al. 2017; Abbaspour et al. 2015; Chilkoti et al. 2018). Table 3 lists the potential parameters and their sensitivity ranking for the four study catchments. Only the significantly sensitive parameters were calibrated in the comparison of different calibration scenarios, while the other parameters were fixed at their optimal values from the first 500 model runs. The first 2 or 3 years were used as the warm-up period. Of the remaining data, 70% were allocated for calibration and 30% for validation, as shown in Table 2. Because a large proportion of data is missing in the Zijingguan watershed after 2000, a relatively longer validation period was assigned to this catchment to ensure that there are enough observed data for comparison.
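A minimal, standard-library sketch of Latin hypercube sampling is given below to illustrate how the 500 parameter sets could be drawn. The function name `latin_hypercube`, the seed, and the two parameter ranges are assumptions made for illustration; the actual parameters and ranges are those listed in Table 3, and the study does not state which LHS implementation it used.

```python
import random

def latin_hypercube(ranges, n_samples, seed=0):
    """Draw n_samples parameter sets.

    Each parameter range is split into n_samples equal strata, and every
    stratum is sampled exactly once, in a random order per parameter.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        strata = list(range(n_samples))
        rng.shuffle(strata)  # decouple the stratum order between parameters
        columns[name] = [
            low + (s + rng.random()) / n_samples * (high - low) for s in strata
        ]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]

# Illustrative ranges only (assumed, not taken from Table 3)
ranges = {"CN2": (35.0, 98.0), "ALPHA_BF": (0.0, 1.0)}
samples = latin_hypercube(ranges, n_samples=500)
```

Each element of `samples` is one parameter set that would be passed to a model run; stratifying every parameter guarantees that the full range is covered even with a moderate number of runs.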

Table 3. Potential parameters and their sensitivity ranking for the four study catchments. An asterisk indicates that the parameter is significantly sensitive in the catchment and was selected for calibration in further analysis.

3) Collection of catchment descriptors

To better describe the flood peaks using the multivariate flood scaling property, a total of 28 catchment descriptors were collected in this paper, which can be divided into two categories. The first category comprises the internal watershed characteristics, which depend on long-term controls such as the properties of the river network and other topographical factors. This type of catchment descriptor can normally be delineated and computed from DEM and land use data. Since the watershed topography and stream network geomorphology exhibit limited changes over long periods, the internal catchment descriptors remain essentially unchanged across the selected flood events.

The other type of catchment descriptor varies in time, depending on event characteristics such as external rainfall characteristics, antecedent watershed soil moisture conditions, and evaporation. Consequently, rainfall characteristics (rainfall intensity, duration, etc.) were calculated based on the observed precipitation data for the selected flood events. Other catchment descriptors reflecting the antecedent watershed conditions, such as soil moisture content and evaporation data, were extracted from GLDAS-Noah datasets (Rodell et al. 2004; Beaudoing and Rodell 2015) due to the lack of observed data in the study areas. GLDAS-Noah v2.0 datasets, which are available at https://giovanni.gsfc.nasa.gov/, currently cover the period from January 1948 to December 2014 and contain 36 available variables, with a 3-h temporal interval and 0.25° spatial resolution. Five soil moisture content indexes and five evaporation indexes were selected from the GLDAS-Noah datasets. In this paper, the instantaneous soil moisture content at the closest time step before the start of each rainfall–runoff event was extracted to reflect the antecedent wetness condition of the watershed. The mean values of soil moisture content and evaporation during each rainfall–runoff event were also collected. The area-weighted method was employed to calculate the average soil moisture content and evaporation value. As a result, a total of 28 catchment descriptors were adopted in this study, including 9 internal and 19 external catchment descriptors, as listed in Table 4.
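The area-weighted averaging step can be illustrated with a short sketch. It assumes that the weights are the areas of overlap between each 0.25° grid cell and the subbasin, which the text does not state explicitly; the values below are hypothetical.

```python
def area_weighted_mean(values, areas):
    """Average gridded values using the overlap area of each cell as its weight."""
    total_area = sum(areas)
    return sum(v * a for v, a in zip(values, areas)) / total_area

# Hypothetical soil moisture (kg m^-2) in three grid cells overlapping one subbasin
soil_moisture = [20.0, 30.0, 40.0]
overlap_km2 = [100.0, 300.0, 100.0]
mean_sm = area_weighted_mean(soil_moisture, overlap_km2)
```

Here the middle cell covers most of the subbasin, so it dominates the average.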

Table 4. List of the potential catchment descriptors influencing peak discharges.

3. Method

a. Identification of catchment descriptors

With drainage area being treated as the principal catchment descriptor in most studies (Meigh et al. 1997; Menabde and Sivapalan 2001; Jothityangkoon and Sivapalan 2001; Lima and Lall 2010; Lee and Huang 2016; Furey and Gupta 2007; Al-Rawas and Valeo 2010), the scaling property of flood generally takes the form expressed in Eq. (1). However, the flood peak is the result of a complex rainfall–runoff process under the influence of rainfall, watershed characteristics, and other factors, so flood scaling property based on multiple regression analysis is also employed in this study as an objective function to avoid the bias caused by only considering the watershed area. The multiple regression equation is expressed in Eq. (2). Both forms of flood scaling property are considered as objective functions in model calibration to conduct further analysis. Employing the power-law equation to describe the relationship between the flood peak and the explanatory variables is the common and typical method of applying the flood scaling property (Ogden and David 2003; Furey and Gupta 2005, 2007). More importantly, previous studies have proven that the event-based flood scaling property in the form of power-law equation exists in the study area (Li et al. 2013, 2019; Kang et al. 2019; Li et al. 2020):
$$Q(A) = \alpha A^{\theta}, \tag{1}$$
$$Q = b_0 \times X_1^{b_1} \times X_2^{b_2} \times X_3^{b_3} \times \cdots \times X_q^{b_q}, \tag{2}$$
where $Q$ is the flood peak discharge (m$^3$ s$^{-1}$), $A$ is the drainage area, $\alpha$ is the coefficient, and $\theta$ is the exponent; $b_0$ is a constant, $X_1, \ldots, X_q$ are the explanatory variables, namely, catchment descriptors, such as precipitation intensity, soil moisture content, and stream density, and $b_1, \ldots, b_q$ are the regression coefficients.

The best subset regression method was applied to the catchment descriptors and event flood peaks to find the optimal input combination of catchment descriptors. The goal of best subset selection is to choose the subset of variables such that the resultant regression model has the best prediction accuracy (Zhu et al. 2020). Unlike common stepwise variable selection methods, best subset regression tests all possible combinations of the potential variables in a regression equation and identifies the best solution (Neter et al. 1989). This method is well suited to feature selection problems with a small number of candidate variables. In recent years, the best subset selection method has been frequently applied in the hydrology community (Howard et al. 2010; Lacombe et al. 2014; Okcu et al. 2016).

To measure the regression model performance, the adjusted $R^2$, the Bayesian information criterion (BIC) (Schwarz 1978), and Mallows' coefficient ($C_p$) (Mallows 1973) were adopted as the evaluation metrics. As a modified version of $R^2$, the adjusted $R^2$ is more reasonable when evaluating model fit with many explanatory variables. The higher the adjusted $R^2$, the better the model. To avoid information redundancy or multicollinearity problems among the various catchment descriptors (Al-Rawas and Valeo 2010; Ishak et al. 2011), the BIC and $C_p$ criteria were employed to select the best performing model with the least number of input variables. A small BIC value implies a good selection of subset. The model with the lowest $C_p$ value is preferred.
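The exhaustive search can be sketched in a few lines. The example below is an illustration under several assumptions: it fits ordinary least squares through the normal equations, ranks subsets by the Gaussian-likelihood form of BIC, $n \ln(\mathrm{RSS}/n) + (k+1)\ln n$, and reports the adjusted $R^2$ alongside; Mallows' $C_p$ is omitted for brevity. The descriptor names and data are hypothetical, and the study may have used a different software implementation and BIC variant.

```python
import math
from itertools import combinations

def ols_rss(rows, y):
    """Residual sum of squares of a least squares fit (rows include the intercept column)."""
    n, p = len(rows), len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(rows[r][i] * rows[r][j] for r in range(n)) for j in range(p)]
         + [sum(rows[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [u - factor * v for u, v in zip(a[r], a[c])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, rows[r]))) ** 2 for r in range(n))

def best_subset(descriptors, y):
    """Try every non-empty subset of descriptors; return (bic, subset, adjusted_r2) with the lowest BIC."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    results = []
    for k in range(1, len(descriptors) + 1):
        for combo in combinations(sorted(descriptors), k):
            rows = [[1.0] + [descriptors[name][r] for name in combo] for r in range(n)]
            rss = ols_rss(rows, y)
            adj_r2 = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))
            bic = n * math.log(rss / n) + (k + 1) * math.log(n)
            results.append((bic, combo, adj_r2))
    return min(results)

# Hypothetical log-transformed descriptors and log peaks (peaks depend on log_area only)
descriptors = {
    "log_area": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "log_rain": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, -0.02]
log_q = [2.0 + 0.5 * a + e for a, e in zip(descriptors["log_area"], noise)]
bic, subset, adj_r2 = best_subset(descriptors, log_q)
```

The number of candidate models grows as $2^q - 1$, which is why an exhaustive search is practical only for a modest number of descriptors.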

b. Objective functions and calibration criteria

When the flood scaling property is taken as the objective function, the flood peaks at each reach outlet for each selected flood event, along with the corresponding contributing upstream areas, are used to fit the power-law equation. The objective functions of the flood scaling property are expressed in Eqs. (3) and (4). Figure 2 demonstrates the process of adopting the flood scaling property in SWAT model calibration:
$$R_{fs_j}^{2} = \frac{1}{n}\sum_{i=1}^{n} R_{fs_i}^{2}, \tag{3}$$
$$\max\left\{R_{fs_{j1}}^{2}, R_{fs_{j2}}^{2}, R_{fs_{j3}}^{2}, \ldots, R_{fs_{jm}}^{2}\right\} \quad \text{subject to} \quad R_{fs_{jk}}^{2} > 0.6 \quad (k = 1, 2, 3, \ldots, m), \tag{4}$$
where $R_{fs_i}^{2}$ denotes the $R^2$ of each power-law equation for flood scaling property, and $R_{fs_j}^{2}$ represents the mean value of $R_{fs_i}^{2}$; $n$ is the number of selected flood events, namely, the number of power-law equations; $m$ is the number of SWAT model simulations in each iteration.

Fig. 2. Schematic representation of employing flood scaling property in SWAT model calibration.

For instance, 12 flood events were selected for the Zijingguan watershed. Consequently, a total of 12 power-law equations along with the $R_{fs_j}^{2}$ of 12 flood events can be acquired for each model simulation. Li et al. (2020) have performed event-based flood scaling analysis in the Zijingguan watershed with the simulated subbasin peak discharge using the Hydrologic Engineering Center Hydrologic Modeling System (HEC-HMS). They reported that $R^2$ varies for 14 different flood events in the range from 0.329 to 0.980 with an average of 0.735. In this study, the threshold of the mean value of 12 power-law equations $R_{fs_j}^{2}$ was set as 0.6, which is considered as the basic requirement to verify whether this model simulation meets the flood scaling property. The larger the $R_{fs_j}^{2}$ of 12 power-law equations, the better the model performance.
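The objective in Eqs. (3) and (4) can be sketched as follows: each simulated event yields one log-log fit of peak discharge against contributing area, the $R^2$ values are averaged over events, and among the candidate model runs the one with the largest mean above 0.6 is preferred. The function names, run labels, and all numbers are hypothetical; in the actual workflow the peaks would be extracted from SWAT reach outputs.

```python
import math

def loglog_r2(areas, peaks):
    """R^2 of a straight-line fit of log(peak) against log(area)."""
    x = [math.log(a) for a in areas]
    y = [math.log(q) for q in peaks]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def fs_objective(areas, peaks_by_event):
    """Mean R^2 over all selected flood events for one model run (Eq. 3)."""
    r2_values = [loglog_r2(areas, peaks) for peaks in peaks_by_event.values()]
    return sum(r2_values) / len(r2_values)

def best_run(areas, runs, threshold=0.6):
    """Name of the run with the largest mean R^2 above the threshold (Eq. 4), or None."""
    scores = {name: fs_objective(areas, events) for name, events in runs.items()}
    feasible = {name: s for name, s in scores.items() if s > threshold}
    return max(feasible, key=feasible.get) if feasible else None

# Hypothetical reach contributing areas (km^2) and simulated event peaks (m^3/s)
areas = [50.0, 200.0, 800.0, 1600.0]
runs = {
    "run_a": {"event_1": [2.0 * a ** 0.6 for a in areas],
              "event_2": [0.5 * a ** 0.8 for a in areas]},
    "run_b": {"event_1": [10.0, 5.0, 12.0, 6.0],
              "event_2": [10.0, 5.0, 12.0, 6.0]},
}
winner = best_run(areas, runs)
```

Note that no observed hydrograph enters this objective; only simulated peaks and contributing areas are needed, which is what makes the constraint usable where streamflow records are short.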

For multivariate flood scaling property, the flood peaks from each reach outlet of all selected flood events along with their corresponding catchment descriptors were extracted to fit the logarithmic form of the multiple regression equation. The multiple linear regression (MLR) equation and the objective functions are expressed in Eqs. (5) and (6):
$$\log Q = \log(b_0) + b_1 \log(X_1) + b_2 \log(X_2) + b_3 \log(X_3) + \cdots + b_q \log(X_q), \tag{5}$$
$$\max\left\{R_{mlr_{j1}}^{2}, R_{mlr_{j2}}^{2}, R_{mlr_{j3}}^{2}, \ldots, R_{mlr_{jm}}^{2}\right\} \quad \text{subject to} \quad R_{mlr_{jk}}^{2} > 0.6, \quad p_{mlr_{jk}} < 0.05 \quad (k = 1, 2, 3, \ldots, m), \tag{6}$$
where $Q$ is the peak discharge (m$^3$ s$^{-1}$), $X_1, \ldots, X_q$ are the potential catchment descriptors of reach outlets, $q$ is the number of catchment descriptors, and $b_0, \ldots, b_q$ are the regression coefficients of the MLR equation. The term $R_{mlr_j}^{2}$ denotes the $R^2$ of the $j$th MLR equation, and $p_{mlr_j}$ is the $p$ value of the $j$th MLR equation.

Since the flood peaks from each reach outlet for all selected floods and their corresponding catchment descriptors were extracted to fit the MLR equation, only one MLR equation is obtained for each SWAT model simulation. Each MLR equation should satisfy two basic requirements. First, the MLR equation should pass the F test, which examines whether the fitted equation is significant; thus, the $p$ value of the MLR equation should be less than 0.05. A larger F statistic represents a better regression fit, so if two MLR equations reach the same $R_{mlr_j}^{2}$, the F statistic can be used to decide which of the two is superior. Second, as for the univariate case, the goodness of fit of the multivariate flood scaling property is determined on the basis of $R_{mlr_j}^{2}$. The larger the $R_{mlr_j}^{2}$, the better the model performance; $R_{mlr_j}^{2} > 0.6$ is also used as the threshold criterion.
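For a regression with an intercept, the overall F statistic follows directly from $R^2$, the number of observations, and the number of predictors, which makes the tie-breaking rule easy to illustrate. The sketch below uses the standard relation $F = (R^2/q) / [(1 - R^2)/(n - q - 1)]$; the sample sizes and $R^2$ values are hypothetical. Computing the $p$ value additionally requires the F distribution, for example `scipy.stats.f.sf(F, q, n - q - 1)`, which is left out here to keep the example free of third-party dependencies.

```python
def f_statistic(r2, n_obs, n_predictors):
    """Overall F statistic of a linear regression with an intercept."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r2 / df_model) / ((1.0 - r2) / df_resid)

def preferred(fit_a, fit_b):
    """Choose between two fits given as (r2, f_stat): larger R^2 wins, ties go to larger F."""
    return max(fit_a, fit_b)  # tuples compare by R^2 first, then by F

# Hypothetical case: same R^2 from 40 peak values, with 3 versus 5 descriptors
f_three = f_statistic(0.8, n_obs=40, n_predictors=3)
f_five = f_statistic(0.8, n_obs=40, n_predictors=5)
choice = preferred((0.8, f_three), (0.8, f_five))
```

At equal $R^2$ and sample size, the equation with fewer descriptors has the larger F statistic, so the tie-break favors the more parsimonious regression.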

To compare the calibration performance under different objective functions, traditional statistical objective functions were also adopted. As the most widely adopted statistical index in model calibration (Tuo et al. 2018; Cao et al. 2018; Rajib et al. 2018; Odusanya et al. 2019; Lee et al. 2020), the Nash–Sutcliffe efficiency coefficient (NSE) was employed in this study. In addition, the Kling–Gupta efficiency (KGE) was also applied due to its three components structure allowing a better understanding of the model performance (Gupta et al. 2009; Knoben et al. 2019). The equations for NSE and KGE are presented in Eqs. (7) and (8):
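$$\mathrm{NSE} = \left[1 - \frac{\sum_{i=1}^{n}(E_i - O_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O}_i)^2}\right], \tag{7}$$
$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}, \tag{8}$$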
where $E_i$ is the $i$th SWAT-simulated streamflow value; $O_i$ is the corresponding observed streamflow value; $\bar{O}_i$ is the average of the observed streamflow data; $\bar{E}_i$ is the average of the simulated streamflow data; $n$ is the total number of observed values; $r$ is the Pearson correlation coefficient between observed and simulated streamflow data; $\alpha$ is the ratio of the standard deviation of the simulated streamflow data to that of the observed streamflow data; and $\beta$ is the ratio of the mean simulated streamflow to the mean observed streamflow.
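As an illustration of Eqs. (7) and (8), the following minimal sketch computes both indices from paired simulated and observed series. The function names and the sample values are made up for illustration, and the population standard deviation is an assumption (the choice cancels in the ratio $\alpha$ and in $r$).

```python
import math
from statistics import mean, pstdev

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    obs_mean = mean(obs)
    num = sum((e - o) ** 2 for e, o in zip(sim, obs))
    den = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - num / den

def kge(sim, obs):
    """Kling-Gupta efficiency from correlation r, variability ratio alpha, bias ratio beta."""
    sim_mean, obs_mean = mean(sim), mean(obs)
    sim_sd, obs_sd = pstdev(sim), pstdev(obs)
    cov = mean([(e - sim_mean) * (o - obs_mean) for e, o in zip(sim, obs)])
    r = cov / (sim_sd * obs_sd)
    alpha = sim_sd / obs_sd
    beta = sim_mean / obs_mean
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Made-up streamflow values (m3 s-1), for illustration only.
observed = [2.0, 5.0, 30.0, 12.0, 4.0]
simulated = [3.0, 6.0, 24.0, 14.0, 5.0]
print(nse(simulated, observed), kge(simulated, observed))
```

Both indices equal 1 for a perfect simulation, and NSE equals 0 when the simulation is no better than the observed mean.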

c. Calibration scenarios design and evaluation metric

To explore the model performance of adopting flood scaling property (FS) and MLR as objective functions, three groups of calibration scenarios were designed as follows.

  1. For comparison, the traditional NSE and KGE were first employed separately as objective functions. These two objective functions are widely used in hydrological research. Observed streamflow data at the watershed outlet are needed in these two calibration scenarios.

  2. Then, the new FS or MLR objective functions were adopted to explore the potential of applying the scaling property of floods in model calibration. Importantly, in the FS calibration scenarios only the simulated event flood peaks from each model run need to be extracted; they are fitted to power-law equations against the corresponding contributing areas, and the goodness of fit serves as the objective function. Observed streamflow data were used only to perform the flood frequency analysis that served as the criterion for selecting the significant flood events. In the MLR calibration scenarios, the observed streamflow time series are likewise not required during calibration. However, the observed event flood peaks at the watershed outlets are still needed, because many catchment descriptors are available as candidate explanatory variables and the descriptors with the most influence on event flood peaks must be identified first.

  3. The combinations of FS or MLR with the traditional NSE or KGE were designed as another four calibration scenarios, namely, FS&NSE, FS&KGE, MLR&NSE, and MLR&KGE. In these four scenarios, FS or MLR acts as a constraint, and the best simulation is selected according to the NSE or KGE value. In this way, both the flood scaling property and the simulation performance at the watershed outlet are taken into account. Consequently, the observed streamflow data at the watershed outlet are also required in these calibration scenarios.

Hence, a total of eight calibration scenarios, including NSE, KGE, FS, MLR, FS&NSE, FS&KGE, MLR&NSE, and MLR&KGE, were designed.
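The selection rule of the combined scenarios (treat FS or MLR as a constraint, then pick the best run by NSE or KGE) can be sketched as follows. This is a simplified reading, not the authors' code: the function name `select_best`, the dictionary keys, and the run values are hypothetical, and the 0.6 threshold is the $R^2$ criterion stated for the scaling fits.

```python
def select_best(runs, constraint_key="fs_r2", score_key="nse", threshold=0.6):
    """Keep runs whose scaling-fit R2 exceeds the threshold, then return the
    one with the highest statistical score (None if no run is feasible)."""
    feasible = [run for run in runs if run[constraint_key] > threshold]
    if not feasible:
        return None
    return max(feasible, key=lambda run: run[score_key])

# Hypothetical results of four model runs in an FS&NSE scenario.
runs = [
    {"run": 1, "fs_r2": 0.45, "nse": 0.71},  # best NSE, but violates the FS constraint
    {"run": 2, "fs_r2": 0.82, "nse": 0.58},
    {"run": 3, "fs_r2": 0.66, "nse": 0.63},
    {"run": 4, "fs_r2": 0.91, "nse": 0.40},
]
print(select_best(runs)["run"])  # prints 3
```

Passing `constraint_key="mlr_r2"` or `score_key="kge"` would give the other three combined scenarios under the same assumptions, provided the run records carry those keys.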

For model evaluation, apart from the performance of the watershed outlet hydrograph represented by the NSE or KGE value, the flood frequency curve of the annual maximum flood peak was also considered in this study. As one of the widely used hydrological signatures, the flood frequency curve describes the relationship between the frequency and magnitude of the flood peak (Vogel and Fennessey 1994; McMillan 2021), which is also a major concern in design flood calculation. The flood frequency curve has been adopted as an objective function in model calibration in a growing number of studies in recent years (Westerberg et al. 2011; Pokhrel et al. 2012; Pfannerstill et al. 2014; Pool et al. 2018; Garcia et al. 2017). Meanwhile, it is also a critical descriptor for evaluating model performance (Ley et al. 2016; Chilkoti et al. 2018), especially when comparing several calibration strategies (Pool et al. 2017). Because employing flood scaling properties as objective functions is a relatively new calibration method, its influence on both the hydrograph and the flood frequency curve is of interest. We therefore focus on comparing the calibration scenarios that use traditional objective functions with the proposed ones. Since the simulated flood frequency curve indicates model performance, we employ it as an evaluation metric to help identify the optimal calibration scenario.

Since each flood frequency curve segment describes different hydrological processes (Yilmaz et al. 2008; Pokhrel et al. 2012; Pfannerstill et al. 2014), we partition the flood frequency curve into four segments and evaluate the performance on each segment by calculating the mean relative error (MRE). The four segments are 1) the very high-flow segment within a range of flow exceedance probability less than 5% (Q5), 2) the high-flow segment within a range between Q5 and Q20, 3) the midflow segment within a range between Q20 and Q70, and 4) the low-flow segment within a range of flow exceedance probability larger than 70%. The sign of the MRE indicates whether the simulated value overestimates (positive) or underestimates (negative) the observed value. The closer the absolute value of MRE is to 0, the better the model result. Equation (9) is used to calculate the MRE for each flood frequency curve segment:
$$\mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N}\frac{E_i - O_i}{O_i}, \tag{9}$$
where $E_i$ is the simulated data; $O_i$ is the corresponding observed data; and $N$ is the total number of observed values within each segment.
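A minimal sketch of the segment-wise MRE follows. Several choices here are assumptions rather than details given in the text: exceedance probabilities come from the Weibull plotting position rank/(n + 1), simulated and observed peaks are paired by rank, and each segment includes its lower boundary. The function name `segment_mre` and the sample data are illustrative.

```python
def segment_mre(sim_peaks, obs_peaks):
    """Mean relative error of simulated versus observed peaks in four
    exceedance-probability segments of the flood frequency curve."""
    n = len(obs_peaks)
    obs_sorted = sorted(obs_peaks, reverse=True)
    sim_sorted = sorted(sim_peaks, reverse=True)
    bounds = {
        "very_high": (0.00, 0.05),  # exceedance probability below 5%
        "high": (0.05, 0.20),       # Q5 to Q20
        "mid": (0.20, 0.70),        # Q20 to Q70
        "low": (0.70, 1.00),        # above 70%
    }
    errors = {name: [] for name in bounds}
    for rank, (e, o) in enumerate(zip(sim_sorted, obs_sorted), start=1):
        p = rank / (n + 1)  # Weibull plotting position (assumed)
        for name, (lower, upper) in bounds.items():
            if lower <= p < upper:
                errors[name].append((e - o) / o)
                break
    # A segment with no data points is reported as None.
    return {name: (sum(v) / len(v) if v else None) for name, v in errors.items()}

# Made-up annual maximum peaks (m3 s-1); the simulation is 10% too high everywhere.
observed = [float(v) for v in range(1, 40)]
simulated = [1.1 * v for v in observed]
print(segment_mre(simulated, observed))
```

With short records the very high-flow segment may contain no points at all, which is why the sketch returns `None` instead of raising an error.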

d. SWAT model calibration implementation process

The sequential uncertainty fitting version 2 (SUFI-2) optimization algorithm is one of the most widely used calibration approaches for the SWAT model. This algorithm is iterative and could achieve good calibration performance with the smallest number of simulations (Yang et al. 2008; Abbaspour et al. 2015, 2017). The schematic coupling of SWAT and SUFI-2 is presented in Fig. 3. More details about the SUFI-2 algorithm can be found in Yang et al. (2008) and Abbaspour (2015).

Fig. 3.

Schematic diagram showing the coupling process of SUFI-2 and SWAT and the implementation of flood scaling property during model calibration.

Citation: Journal of Hydrometeorology 22, 12; 10.1175/JHM-D-21-0123.1

Given that the objective of this paper is to apply the flood scaling property in SWAT model calibration, the calibration criterion is not limited to the traditional statistical indices. Therefore, on the basis of the SUFI-2 algorithm procedure, the following two steps should be noted when employing the flood scaling property.

  1. The first difference from the traditional calibration method is related to the extraction of the simulated value. Instead of only extracting the simulated value of the watershed outlet or a certain subbasin, the simulated flood peak of each reach outlet for the selected flood events must be extracted from the simulated subbasin streamflow. For instance, as presented in Fig. 1a, a total of 11 reach outlets were identified in the Zijingguan watershed. The simulated peak discharge at each reach outlet during the selected eight flood events must be extracted from the simulated subbasin streamflow. Moreover, the simulated value at the watershed outlet is also extracted to guarantee the calculation of traditional statistical indices.

  2. The application of flood scaling property establishes the power-law fitting function between flood peak at each reach outlet and the corresponding contributing upstream area. For multivariate flood scaling property, a multiple linear regression equation is constructed with the flood peak and the identified catchment descriptors. New objective functions have been developed to employ the proposed calibration method, as only traditional statistical objective functions are available in SUFI2_goal_fn.exe in the SUFI-2 algorithm.
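The two steps above can be sketched for the FS case as follows. This is an illustrative sketch, not the objective-function code developed for SUFI-2: the function names, the base-10 logarithm, and all numbers are assumptions. For each flood event, the simulated peaks at the reach outlets are regressed on contributing area in log-log space, and the mean $R^2$ over events is returned as the goodness of fit.

```python
import math

def loglog_fit(areas, peaks):
    """Fit the power law Q = a * A**theta by linear regression in log space.
    Returns (theta, a, R2)."""
    x = [math.log10(a) for a in areas]
    y = [math.log10(q) for q in peaks]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    theta = sxy / sxx
    intercept = y_mean - theta * x_mean
    r2 = sxy ** 2 / (sxx * syy)
    return theta, 10 ** intercept, r2

def fs_objective(areas, event_peaks):
    """Mean R2 of the per-event power-law fits.

    areas: contributing area of each reach outlet.
    event_peaks: {event label: simulated peak at each reach outlet}.
    """
    r2_values = [loglog_fit(areas, peaks)[2] for peaks in event_peaks.values()]
    return sum(r2_values) / len(r2_values)

# Made-up contributing areas (km2) and simulated peaks (m3 s-1) for two events.
areas = [35.0, 120.0, 410.0, 980.0, 1760.0]
event_peaks = {
    "event_a": [40.0, 95.0, 260.0, 480.0, 820.0],
    "event_b": [15.0, 30.0, 110.0, 150.0, 330.0],
}
print(fs_objective(areas, event_peaks))
```

A model run would then be judged by comparing this mean $R^2$ with the threshold criterion, alone or together with NSE or KGE at the watershed outlet.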

4. Results

a. Identification of catchment descriptors

Because observed streamflow data are lacking for the subbasins, the observed peak discharge of each flood event at the watershed outlets of the four study catchments and the corresponding potential catchment descriptors are used to perform the best subset regression analysis. Table 5 lists the best-performing regression models for different numbers of catchment descriptors. The adjusted $R^2$ increases with the number of catchment descriptors, reaching 0.69 in model 5. However, the BIC and Cp values reach their minimum in model 2, which uses four catchment descriptors, indicating that these four descriptors give reasonable predictions with an acceptable adjusted $R^2$. Adding more explanatory variables to the regression model may lead to information redundancy or overfitting. Interestingly, instead of the commonly used drainage area, the shape factor (SF) is identified as one of the significant catchment descriptors for flood peak. Since SF is computed by dividing the squared basin length by the basin area, the results indicate that catchment shape has more effect on flood peak prediction than drainage area in the study areas. One possible reason is that the drainage areas of the four study catchments are similar, with a small standard deviation. To sum up, four catchment descriptors, namely, SF, MSM10_40, TSM100_200, and EV, are selected as the explanatory variables in the multivariate flood scaling calibration scenarios.

Table 5.

Selection of regression models for different combinations of input catchment descriptors. Bold indicates the best performance in each criterion.

b. Different calibration scenarios result with the SUFI-2 algorithm

The eight calibration scenarios in the three groups are run with the SUFI-2 algorithm. For each calibration scenario, five iterations are executed, with the SWAT model run a total of 2500 times. The first 500 parameter sets for each calibration scenario are obtained by LHS within the initial ranges of the SWAT parameters, so the 500 parameter sets used in the first iteration are identical across the calibration scenarios. However, because each calibration scenario imposes different constraints, the new parameter ranges recommended after each iteration differ among the calibration scenarios, and the eight scenarios ultimately produce different streamflow simulations. The simulated streamflow under the eight calibration scenarios during the calibration period is shown in Fig. 4. Figure 5 presents the performance of the two statistical indicators at the watershed outlet as the number of model iterations increases.
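For readers unfamiliar with LHS (Latin hypercube sampling), the sketch below draws 500 parameter sets so that every parameter range is split into 500 equal strata and each stratum is sampled exactly once. It is a generic illustration and not the sampler used by SUFI-2; the function name, the fixed seed, and the two parameter ranges are assumptions for illustration.

```python
import random

def latin_hypercube(ranges, n_samples, seed=0):
    """Draw n_samples parameter sets with one draw per stratum and parameter.

    ranges: {parameter name: (lower bound, upper bound)}.
    A fixed seed reproduces the same sets, so every scenario can start
    from identical first-iteration parameters.
    """
    rng = random.Random(seed)
    samples = [{} for _ in range(n_samples)]
    for name, (lower, upper) in ranges.items():
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across parameters
        for i, stratum in enumerate(strata):
            u = (stratum + rng.random()) / n_samples  # point inside the stratum
            samples[i][name] = lower + u * (upper - lower)
    return samples

# Illustrative initial ranges only; not the ranges used in the study.
initial_ranges = {"CN2": (-0.2, 0.2), "ALPHA_BF": (0.0, 1.0)}
parameter_sets = latin_hypercube(initial_ranges, n_samples=500)
print(len(parameter_sets), parameter_sets[0])
```

Five iterations of 500 sets each give the 2500 model runs per calibration scenario mentioned above.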

Fig. 4.

SWAT model simulation by adopting the SUFI-2 algorithm under the eight calibration scenarios in (a) Zijingguan (ZJG), (b) Fuping (FP), (c) Kuancheng (KC), and (d) Shifu (SF) catchments during calibration periods.

Fig. 5.

Performance of two statistical indicators under eight different calibration scenarios with increasing model iterations.

1) Calibration for NSE or KGE as objective functions

After five iterations with the traditional NSE as the objective function, the mean NSE at the outlets of the four study watersheds reaches 0.59. The KGE reaches 0.65 when KGE is used as the single objective function. These results are not overly encouraging, which could be attributed to the complicated hydrological behaviors in subhumid and semiarid regions and to the limited observed data (Kannan et al. 2007; Jeong et al. 2010; Hapuarachchi et al. 2011; Ragettli et al. 2017), both of which make model calibration in these four study catchments more difficult. Given that our goal is to explore the potential of applying the flood scaling property in model calibration, a highly accurate simulation is not our main purpose. Thus, the results of these two calibration scenarios serve as comparative experiments representing the commonly adopted calibration method. Additionally, as shown in Fig. 4, the hydrograph obtained with NSE as the objective function tends to be smoother than that obtained with KGE, while the simulated streamflow under the KGE calibration scenario presents better flood peak simulations.

2) Calibration for FS or MLR as objective functions

When FS is the single objective function and the watershed outlet streamflow is not constrained, neither the NSE nor the KGE of the simulated streamflow at the four watershed outlets is high. The same occurs when only MLR is considered as the objective function. However, under these two calibration scenarios, the flood peaks are captured better than under the calibration scenarios with traditional objective functions. The reason may be that by employing FS or MLR as the objective function, the spatial scaling relationship between subbasins is enhanced during the calibration process. Consequently, the simulated flood peak of each subbasin satisfies the statistical relationship with the basin area, meteorological factors, etc., which improves the simulation accuracy of the flood peak for each subbasin. Figure 6 displays the variation in the goodness of fit, namely, the mean $R^2$ of the power-law equation (FS calibration scenario) and the $R^2$ of the multiple linear regression equation (MLR calibration scenario), for the four study catchments from the first model iteration to the fifth iteration.

Fig. 6.

The variation in the goodness of fit of the fitting equation from the first iteration to the fifth iteration for the four study catchments under (a) FS and (b) MLR calibration scenarios.

Figure 6 illustrates that the $R^2$ values for each study catchment under these two calibration scenarios vary greatly in the first iteration. More specifically, as presented in Fig. 6a, the mean $R^2$ in the first iteration of the FS calibration scenario is low in the KC watershed (0.35), whereas it reaches around 0.78 in the other study watersheds. The difference in $R^2$ among the four study watersheds is more obvious in the MLR calibration scenario, where the average $R^2$ of the four watersheds fluctuates between 0.4 and 0.75 in the first model iteration. The results indicate that the simulated streamflow corresponding to many parameter sets in the first iteration does not satisfy the flood scaling property. However, in the fifth iteration, the $R^2$ values for both the FS and MLR calibration scenarios indicate a better fit, exhibiting the potential of applying the flood scaling property as a model constraint.

3) Calibration for the combination of FS or MLR and NSE or KGE as objective functions

As demonstrated in Fig. 5, the calibration scenarios that include KGE or NSE obtain higher KGE or NSE values at the watershed outlet than those employing only FS or MLR as the single objective function. Because the FS and MLR calibration scenarios constrain the event flood peaks, the simulated event flood peaks are more consistent with the flood generation process. These scenarios focus on the event flood peaks rather than on the hydrograph at the watershed outlet. As a result, the calibration scenarios with FS or MLR alone yield poor NSE or KGE values for the outlet hydrograph but capture the event flood peaks well. This finding illustrates that adding constraints at the watershed outlet based on observed streamflow data improves the model performance compared to the calibration scenarios that adopt only FS or MLR.

Additionally, Fig. 4 shows that the hydrographs obtained with only FS or MLR as model constraints tend to have more noise than those of the calibration scenarios with two objective functions. Since both flood peak and hydrograph performance are considered in the calibration scenarios with two objective functions, the simulation results at the watershed outlets are improved under these scenarios. The results further indicate that adding the statistical NSE or KGE objective function at the watershed outlet produces better simulated streamflow for the watershed.

4) Model performance under validation periods

The model performance of the eight calibration scenarios during the validation periods is presented in Fig. 7. The eight calibration scenarios also achieve reasonable results in the validation periods, and each captures the flood peak well. As in the calibration periods, the hydrographs in the FS or MLR calibration scenarios contain more noise than those of the other calibration scenarios.

Fig. 7.

SWAT model simulation performance under the eight calibration scenarios in (a) ZJG, (b) FP, (c) KC, and (d) SF catchments during validation periods.

As shown in Fig. 8, the flood frequency curve is also assessed with the observed and simulated flood peaks from both the calibration and validation periods. The MRE values of each flood frequency curve segment under all designed calibration scenarios are presented in Table 6. The MRE values vary within each flood frequency curve segment, which indicates that performance differs among the calibration scenarios. The highest MRE values are usually found in the mid- or low-flow segments because MRE is quite sensitive to errors that occur at the lower magnitudes of the data (Dawson et al. 2007). Figure 5 shows that the NSE value is highest in the NSE calibration scenario, which would conventionally indicate a good calibration. Yet, according to Fig. 8, the simulated streamflow under the NSE calibration scenario clearly underestimates the flood peak in the high-flow segments: negative MRE values are found in the Q5 and Q20 segments for all four study catchments. The KGE calibration scenario simulates the flood peak better than the NSE calibration scenario, with smaller absolute MRE values, yet it still tends to underestimate the flood peak at high return periods in most cases.

Fig. 8.

Flood frequency analysis by employing the simulated and observed annual maximum peak discharge of both calibration and validation periods. (TOF denotes the theoretical frequency of the flood frequency curve, for instance, OBS TOF represents the theoretical frequency of the observed data.)

Table 6.

The MRE values of each flood frequency curve segment under different calibration scenarios.

In contrast, as demonstrated in Fig. 8, the flood frequency curves of the FS or MLR calibration scenarios are closer to the observed flood frequency curve in the very high and high-flow segments, exhibiting better results than the NSE or KGE calibration scenarios. In some cases, the FS or MLR calibration scenario even overestimates the flood peak, which is consistent with the simulated hydrographs. It is possible that applying the flood scaling property as a model constraint enhances the flood response process. Consequently, the ability to capture the flood peak improves, yet the simulated flood peak tends to exceed the observed data. As a result, the simulated flood frequency curve under the FS or MLR calibration scenarios overestimates the observed data in the majority of flow segments, especially in the low-flow segment.

Furthermore, when NSE or KGE is adopted as one of the objective functions alongside FS or MLR, the overestimation of the flood frequency curve is reduced and the accuracy improves. Compared to the calibration scenarios with a single FS or MLR objective function, improved MRE values are found in these calibration scenarios in all flow segments. The evaluation results of the flood frequency curve further illustrate the necessity of adopting the combination of statistical indicators and flood scaling property as objective functions in hydrological model calibration.

5. Discussion

a. Necessity of adopting flood scaling property as constraints

In some cases, only considering the performance at the catchment outlet is not sufficient, as the simulated flood processes at the interior point of the catchment may not coincide with the actual watershed properties. The application of flood scaling property as constraints could improve this situation. As demonstrated in Figs. 9g and 9h, two significant flood events, namely, the 1988 and 1996 flood events, in the Zijingguan watershed with different rainfall spatial distributions are selected to demonstrate the necessity of adopting flood scaling property as constraints. The simulated streamflow is extracted for a total of 10 catchment interior reach outlets and also the catchment outlet for these two flood events under NSE, FS, and MLR calibration scenarios, as shown in Figs. 9a–f. Due to the high rainfall amount at the Chajianlin rain gauge in the 1988 flood event, the flood generated at the south of the Zijingguan watershed tends to be larger. Consequently, the flood peak from the hydrograph of the R1 reach outlet tends to be larger, even though the contributing area of the R1 reach outlet is quite small. As presented in Figs. 9a–c, the increase of flood peak for the R1 reach outlet could be captured under the three calibration scenarios (as delineated by the three boxes numbered 2, 3, and 4). However, under the NSE calibration scenario, the flood peak of the R10 reach outlet is even larger than that at the watershed outlet (as delineated by the box numbered 1), which is not consistent with the actual watershed situation, as the rainfall amount is largest at the watershed outlet. Obviously, this situation is well avoided in the FS and MLR calibration scenario, indicating the significance and validity of employing FS or MLR as an objective function. Thus, adopting the scaling property of flood has the potential to better constrain the performance of catchment interior points.

Fig. 9.

Simulated streamflow process of reach outlets for typical flood events and the spatial distribution of rainfall. The reach outlet names in the legend are arranged according to the contributing area size. (a)–(c) The simulated streamflow of each reach outlet for the 1988 flood event under the NSE, FS, and MLR calibration scenarios, respectively. (d)–(f) The results for the 1996 flood event under the NSE, FS, and MLR calibration scenarios. (g),(h) The spatial distribution of rainfall for the 1988 and 1996 flood events, respectively.

In addition, as shown in Figs. 9b and 9c, the hydrographs at the watershed interior points are more reasonable under MLR than under FS. Specifically, in the 1988 flood event in the Zijingguan watershed, the magnitude of the flood peak at the interior reach outlets tends to be determined by the size of the contributing area under FS, while MLR produces more reasonable results at the R1 reach outlet, with a larger flood peak. For the 1996 flood event, which had a relatively uniform spatial distribution of rainfall, the flood peaks at the reach outlets exhibit a similar tendency under both the FS and MLR calibration scenarios. The results indicate that an uneven spatial distribution of rainfall affects the formation of floods at catchment interior points. Hence, adopting MLR could produce better performance than FS, especially in the case of an uneven distribution of rainfall or other potential external factors.

b. Advantages of adopting flood scaling property as constraints

Compared to the commonly used NSE calibration scenario, the advantage of adopting the flood scaling property as a constraint is that a more accurate simulated flood peak can be obtained even with a small amount of observed streamflow data. Importantly, the FS calibration scenarios require less observed streamflow data: only the simulated event flood peaks are extracted to fit the power-law equations, and the goodness of fit serves as the objective function. In the MLR calibration scenarios, the identification of catchment descriptors requires observed streamflow data from the catchment or its region.

Additionally, as shown in Fig. 4, although the hydrographs simulated under the FS or MLR calibration scenarios contain noise, the simulation accuracy of the flood peak is clearly improved compared to the NSE calibration scenario. The flood frequency curve derived from the simulated annual peak discharge in the NSE calibration scenario tends to underestimate floods with return periods of more than 3 yr. In contrast, the simulated flood frequency curve of the FS or MLR calibration scenarios produces better results and even overestimates the flood peak in some cases. The performance of the flood frequency curve can be improved further by adopting the combination of statistical indicators and flood scaling property as model constraints. To sum up, when only the flood scaling property is considered as a model constraint, reasonable results can be obtained without directly employing the observed streamflow time series, which exhibits the potential of applying this method in ungauged watersheds. Yet, if observed streamflow data are available, employing the combination of statistical indicators and flood scaling property as objective functions could better constrain the model performance and improve the simulation of the hydrograph at the watershed outlet and the accuracy of the flood frequency curve.

c. Applications and future works

To employ the flood scaling property as a constraint in hydrological model calibration, it is necessary to judge whether the event-based flood scaling property exists in the river basin of interest. In these study watersheds, previous studies have reported the existence of event-based flood scaling property (Li et al. 2013, 2019; Kang et al. 2019; Li et al. 2020). Moreover, since the flood scaling property is an intrinsic law in river basins (Ogden and David 2003; Furey and Gupta 2005, 2007; Gupta et al. 2007; Ayalew et al. 2014, 2015; Furey et al. 2016; Lee and Huang 2016), the method proposed in this paper has good applicability.

In addition, as long as the hydrological model can output the streamflow process of subbasins or reaches, the flood scaling property can be employed as the objective function. In the future, more attempts should be undertaken to consider flood scaling property as constraints in various hydrological models, not just in the SWAT model. Moreover, since the physical conditions of different river basins are distinct, it is possible that the streamflow of the watershed outlet is well simulated by employing only flood scaling property in other river basins. Therefore, the proposed calibration method should be verified in more regions.

6. Conclusions

To propose a new calibration method for hydrological models based on limited observed data, this paper investigates the potential of applying flood scaling property as one of the objective functions in hydrological model calibration. The main conclusions that were drawn are as follows.

Little observed streamflow time series data is needed when FS is adopted as the single objective function in model calibration: observed streamflow data are used only to perform the flood frequency analysis that serves as the criterion for selecting the significant flood events. Nevertheless, the FS calibration scenario produces a better simulation of the flood peak than the NSE calibration scenario, especially for high return period floods, exhibiting the potential of applying this method in ungauged watersheds.

In contrast, MLR calibration scenario requires some streamflow data to identify the dominant catchment descriptors. FS and MLR calibration scenarios provide a similar hydrograph at the watershed outlet, while MLR achieves better performance at catchment interior points. If the rainfall spatial distribution of the flood event is not uniform, adopting MLR as one of the model constraints in calibration is suggested.

Furthermore, it is important to note that better performance on the hydrograph and the flood frequency curve can be obtained by adopting the combination of statistical indicators and flood scaling property as objective functions. Under these calibration scenarios, not only is an accurate simulation at the catchment outlet supported by high NSE or KGE values, but the performance at the catchment interior points is also better constrained.

In brief, for ungauged watersheds, where only limited observed streamflow data are available, the simulated flood peaks of the subbasins and their contributing drainage areas can be extracted to fit the power-law equation, whose goodness of fit serves as an objective function, providing an alternative method in hydrological model calibration. For gauged watersheds, adopting the flood scaling property as a model constraint could make the hydrological model calibration more physically based and improve the performance at catchment interior points. Therefore, we encourage the use of the flood scaling property as an additional constraint in hydrological model calibration for both gauged and ungauged watersheds.

Acknowledgments

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. This research is supported by the National Key Research and Development Program of China (2018YFC0407902). The authors thank the editors and reviewers for their constructive suggestions.

Data availability statement.

Datasets for this research are available in the reference list. To be specific, land use and land cover maps and soil data are available from the Institute of Geographic Sciences and Resources of the Chinese Academy of Sciences and FAO/IIASA/ISRIC/ISS-CAS/JRC (2009) (at http://www.resdc.cn/, http://www.fao.org/fileadmin/templates/nr/documents/HWSD/). Weather data, including precipitation, air temperature, wind speed, relative humidity, and solar data, are available from Fuka et al. (2013) and Dile and Srinivasan (2014) at https://globalweather.tamu.edu/. GLDAS-Noah datasets are available from Rodell et al. (2004) and Beaudoing and Rodell (2015) at https://giovanni.gsfc.nasa.gov/.

REFERENCES

  • Abbaspour, K. C., 2015: SWAT-CUP: SWAT calibration and uncertainty programs: A user manual. Eawag, Swiss Federal Institute of Aquatic Science and Technology, 100 pp., http://swat.tamu.edu/media/114860/usermanual_swatcup.pdf.

  • Abbaspour, K. C., E. Rouholahnejad, S. Vaghefi, R. Srinivasan, H. Yang, and B. Kløve, 2015: A continental-scale hydrology and water quality model for Europe: Calibration and uncertainty of a high-resolution large-scale SWAT model. J. Hydrol., 524, 733–752, https://doi.org/10.1016/j.jhydrol.2015.03.027.

  • Abbaspour, K. C., S. A. Vaghefi, and R. Srinivasan, 2017: A guideline for successful calibration and uncertainty analysis for soil and water assessment: A review of papers from the 2016 international SWAT conference. Water, 10, 6, https://doi.org/10.3390/w10010006.

  • Al-Rawas, G. A., and C. Valeo, 2010: Relationship between wadi drainage characteristics and peak-flood flows in arid northern Oman. Hydrol. Sci. J., 55, 377–393, https://doi.org/10.1080/02626661003718318.

  • Ayalew, T. B., W. F. Krajewski, R. Mantilla, and S. J. Small, 2014: Exploring the effects of hillslope-channel link dynamics and excess rainfall properties on the scaling structure of peak-discharge. Adv. Water Resour., 64, 9–20, https://doi.org/10.1016/j.advwatres.2013.11.010.

  • Ayalew, T. B., W. F. Krajewski, and R. Mantilla, 2015: Analyzing the effects of excess rainfall properties on the scaling structure of peak discharges: Insights from a mesoscale river basin. Water Resour. Res., 51, 3900–3921, https://doi.org/10.1002/2014WR016258.

  • Basu, B., and V. V. Srinivas, 2015: A recursive multi-scaling approach to regional flood frequency analysis. J. Hydrol., 529, 373–383, https://doi.org/10.1016/j.jhydrol.2015.07.037.

  • Beaudoing, H. K., and M. Rodell, 2015: GLDAS Noah land surface model L4 3 hourly 0.25 × 0.25 degree V2.0. Goddard Earth Sciences Data and Information Services Center, accessed 24 April 2019, https://doi.org/10.5067/342OHQM9AK6Q.

  • Beven, K., 2006: A manifesto for the equifinality thesis. J. Hydrol., 320, 18–36, https://doi.org/10.1016/j.jhydrol.2005.07.007.

  • Bormann, H., L. Breuer, T. Graff, and J. A. Huisman, 2007: Analysing the effects of soil properties changes associated with land use changes on the simulated water balance: A comparison of three hydrological catchment models for scenario analysis. Ecol. Modell., 209, 29–40, https://doi.org/10.1016/j.ecolmodel.2007.07.004.

  • Brath, A., A. Montanari, and E. Toth, 2004: Analysis of the effects of different scenarios of historical data availability on the calibration of a spatially-distributed hydrological model. J. Hydrol., 291, 232–253, https://doi.org/10.1016/j.jhydrol.2003.12.044.

  • Bu, J., C. Lu, J. Niu, and Y. Gao, 2018: Attribution of runoff reduction in the Juma River basin to climate variation, direct human intervention, and land use change. Water, 10, 1775, https://doi.org/10.3390/w10121775.

  • Cao, W., W. B. Bowden, T. Davie, and A. Fenemor, 2006: Multi-variable and multi-site calibration and validation of SWAT in a large mountainous catchment with high spatial variability. Hydrol. Processes, 20, 1057–1073, https://doi.org/10.1002/hyp.5933.

  • Cao, Y., J. Zhang, M. Yang, X. Lei, B. Guo, L. Yang, Z. Zeng, and J. Qu, 2018: Application of SWAT model with CMADS data to estimate hydrological elements and parameter uncertainty based on SUFI-2 algorithm in the Lijiang River basin, China. Water, 10, 742, https://doi.org/10.3390/w10060742.

  • Chiang, L. C., Y. Yuan, M. Mehaffey, M. Jackson, and I. Chaubey, 2014: Assessing SWAT’s performance in the Kaskaskia River watershed as influenced by the number of calibration stations used. Hydrol. Processes, 28, 676–687, https://doi.org/10.1002/hyp.9589.

  • Chilkoti, V., T. Bolisetti, and R. Balachandar, 2018: Multi-objective autocalibration of SWAT model for improved low flow performance for a small snowfed catchment. Hydrol. Sci. J., 63, 1482–1501, https://doi.org/10.1080/02626667.2018.1505047.

  • Daggupati, P., and Coauthors, 2015: A recommended calibration and validation strategy for hydrologic and water quality models. Trans. ASABE, 58, 1705–1719, https://doi.org/10.13031/trans.58.10712.

  • Dawdy, D. R., V. W. Griffis, and V. K. Gupta, 2012: Regional flood-frequency analysis: How we got here and where we are going. J. Hydrol. Eng., 17, 953–959, https://doi.org/10.1061/(ASCE)HE.1943-5584.0000584.

  • Dawson, C. W., R. J. Abrahart, and L. M. See, 2007: HydroTest: A web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environ. Modell. Software, 22, 1034–1052, https://doi.org/10.1016/j.envsoft.2006.06.008.

  • De Lavenne, A., V. Andréassian, G. Thirel, M. H. Ramos, and C. Perrin, 2019: A regularization approach to improve the sequential calibration of a semidistributed hydrological model. Water Resour. Res., 55, 8821–8839, https://doi.org/10.1029/2018WR024266.

  • Dile, Y. T., and R. Srinivasan, 2014: Evaluation of CFSR climate data for hydrologic prediction in data-scarce watersheds: An application in the Blue Nile River basin. J. Amer. Water Resour. Assoc., 50, 1226–1241, https://doi.org/10.1111/jawr.12182.

  • FAO/IIASA/ISRIC/ISS-CAS/JRC, 2009: Harmonized World Soil Database (version 1.1). FAO, 43 pp.

  • Farmer, W. H., T. M. Over, and R. M. Vogel, 2015: Multiple regression and inverse moments improve the characterization of the spatial scaling behavior of daily streamflows in the Southeast United States. Water Resour. Res., 51, 1775–1796, https://doi.org/10.1002/2014WR015924.

  • Fenicia, F., D. Kavetski, H. H. G. Savenije, and L. Pfister, 2016: From spatially variable streamflow to distributed hydrological models: Analysis of key modeling decisions. Water Resour. Res., 52, 954–989, https://doi.org/10.1002/2015WR017398.

  • Fuka, D. R., C. A. MacAllister, A. T. Degaetano, and Z. M. Easton, 2014: Using the climate forecast system reanalysis dataset to improve weather input data for watershed models. Hydrol. Processes, 28, 5613–5623, https://doi.org/10.1002/hyp.10073.

  • Furey, P. R., and V. K. Gupta, 2005: Effects of excess rainfall on the temporal variability of observed peak-discharge power laws. Adv. Water Resour., 28, 1240–1253, https://doi.org/10.1016/j.advwatres.2005.03.014.

  • Furey, P. R., and V. K. Gupta, 2007: Diagnosing peak-discharge power laws observed in rainfall–runoff events in Goodwin Creek experimental watershed. Adv. Water Resour., 30, 2387–2399, https://doi.org/10.1016/j.advwatres.2007.05.014.

  • Furey, P. R., B. M. Troutman, V. K. Gupta, and W. F. Krajewski, 2016: Connecting event-based scaling of flood peaks to regional flood frequency relationships. J. Hydrol. Eng., 21, 04016037, https://doi.org/10.1061/(ASCE)HE.1943-5584.0001411.

  • Garcia, F., N. Folton, and L. Oudin, 2017: Which objective function to calibrate rainfall–runoff models for low-flow index simulations? Hydrol. Sci. J., 62, 1149–1166, https://doi.org/10.1080/02626667.2017.1308511.

  • Götzinger, J., and A. Bárdossy, 2007: Comparison of four regionalisation methods for a distributed hydrological model. J. Hydrol., 333, 374–384, https://doi.org/10.1016/j.jhydrol.2006.09.008.

  • Gupta, H. V., H. Kling, K. K. Yilmaz, and G. F. Martinez, 2009: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol., 377, 80–91, https://doi.org/10.1016/j.jhydrol.2009.08.003.

  • Gupta, V. K., 2004: Emergence of statistical scaling in floods on channel networks from complex runoff dynamics. Chaos Solitons Fractals, 19, 357–365, https://doi.org/10.1016/S0960-0779(03)00048-1.

  • Gupta, V. K., B. M. Troutman, and D. R. Dawdy, 2007: Towards a nonlinear geophysical theory of floods in river networks: An overview of 20 years of progress. Nonlinear Dynamics in Geosciences, Springer, 121–151.

  • Han, E., V. Merwade, and G. C. Heathman, 2012: Implementation of surface soil moisture data assimilation with watershed-scale distributed hydrological model. J. Hydrol., 416–417, 98–117, https://doi.org/10.1016/j.jhydrol.2011.11.039.

  • Hapuarachchi, H. A., Q. J. Wang, and T. C. Pagano, 2011: A review of advances in flash flood forecasting. Hydrol. Processes, 25, 2771–2784, https://doi.org/10.1002/hyp.8040.

  • Her, Y., J. Frankenberger, I. Chaubey, and R. Srinivasan, 2015: Threshold effects in HRU definition of the soil and water assessment tool. Trans. ASABE, 58, 367–378, https://doi.org/10.13031/trans.58.10805.

  • Howard, A. J., M. Bonell, D. Gilmour, and D. Cassells, 2010: Is rainfall intensity significant in the rainfall–runoff process within tropical rainforests of northeast Queensland? The Hewlett regression analyses revisited. Hydrol. Processes, 24, 2520–2537, https://doi.org/10.1002/hyp.7694.

  • Hughes, J. D., S. S. H. Kim, D. Dutta, and J. Vaze, 2016: Optimization of a multiple gauge, regulated river-system model. A system approach. Hydrol. Processes, 30, 1955–1967, https://doi.org/10.1002/hyp.10752.

  • Huisman, J. A., and Coauthors, 2009: Assessing the impact of land use change on hydrology by ensemble modelling (LUCHEM) III: Scenario analysis. Adv. Water Resour., 32, 159–170, https://doi.org/10.1016/j.advwatres.2008.06.009.

  • Hundecha, Y., and A. Bárdossy, 2004: Modeling of the effect of land use changes on the runoff generation of a river basin through parameter regionalization of a watershed model. J. Hydrol., 292, 281–295, https://doi.org/10.1016/j.jhydrol.2004.01.002.

  • Ishak, E., K. Haddad, M. Zaman, and A. Rahman, 2011: Scaling property of regional floods in New South Wales Australia. Nat. Hazards, 58, 1155–1167, https://doi.org/10.1007/s11069-011-9719-6.

  • Jeong, J., N. Kannan, J. Arnold, R. Glick, L. Gosselink, and R. Srinivasan, 2010: Development and integration of sub-hourly rainfall-runoff modeling capability within a watershed model. Water Resour. Manage., 24, 4505–4527, https://doi.org/10.1007/s11269-010-9670-4.

  • Jothityangkoon, C., and M. Sivapalan, 2001: Temporal scales of rainfall–runoff processes and spatial scaling of flood peaks: Space–time connection through catchment water balance. Adv. Water Resour., 24, 1015–1036, https://doi.org/10.1016/S0309-1708(01)00044-6.

  • Kang, Y. F., J. Z. Li, and Q. S. Ma, 2019: Using HEC-HMS model to simulate flooding in Zijingguan watershed. J. Irrig. Drain., 38, 108–115, https://doi.org/10.13522/j.cnki.ggps.20180520.

  • Kannan, N., S. M. White, F. Worrall, and M. J. Whelan, 2007: Sensitivity analysis and identification of the best evapotranspiration and runoff options for hydrological modelling in SWAT-2000. J. Hydrol., 332, 456–466, https://doi.org/10.1016/j.jhydrol.2006.08.001.

  • Knoben, W. J. M., J. E. Freer, and R. A. Woods, 2019: Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrol. Earth Syst. Sci., 23, 4323–4331, https://doi.org/10.5194/hess-23-4323-2019.

  • Koren, V., F. Moreda, and M. Smith, 2008: Use of soil moisture observations to improve parameter consistency in watershed calibration. Phys. Chem. Earth, 33, 1068–1080, https://doi.org/10.1016/j.pce.2008.01.003.

  • Kouchi, D. H., K. Esmaili, A. Faridhosseini, S. H. Sanaeinejad, D. Khalili, and K. C. Abbaspour, 2017: Sensitivity of calibrated parameters and water resource estimates on different objective functions and optimization algorithms. Water, 9, 384, https://doi.org/10.3390/w9060384.

  • Lacombe, G., S. Douangsavanh, R. M. Vogel, M. McCartney, Y. Chemin, L. M. Rebelo, and T. Sotoukee, 2014: Multivariate power-law models for streamflow prediction in the Mekong Basin. J. Hydrol. Reg. Stud., 2, 35–48, https://doi.org/10.1016/j.ejrh.2014.08.002.

  • Lee, K. T., and J. K. Huang, 2016: Influence of storm magnitude and watershed size on runoff nonlinearity. J. Earth Syst. Sci., 125, 777–794, https://doi.org/10.1007/s12040-016-0700-3.

  • Lee, S., G. W. McCarty, G. E. Moglen, X. Li, and C. W. Wallace, 2020: Assessing the effectiveness of riparian buffers for reducing organic nitrogen loads in the Coastal Plain of the Chesapeake Bay watershed using a watershed model. J. Hydrol., 585, 124779, https://doi.org/10.1016/j.jhydrol.2020.124779.

  • Ley, R., H. Hellebrand, C. M. Casper, and F. Fenicia, 2016: Comparing classical performance measures with signature indices derived from flow duration curves to assess model structures as tools for catchment classification. Hydrol. Res., 47, 1–14, https://doi.org/10.2166/nh.2015.221.

  • Li, J. Z., P. Feng, and Z. Wei, 2013: Incorporating the data of different watersheds to estimate the effects of land use change on flood peak and volume using multi-linear regression. Mitig. Adapt. Strategies Global Change, 18, 1183–1196, https://doi.org/10.1007/s11027-012-9416-0.