## 1. Introduction

Rainfall–runoff hydrological models (HMs) treat different parts of the hydrological cycle and primarily aim at maximizing the accuracy of streamflow simulation. Atmospheric land surface models (LSMs) originally represented simple land surface parameterization schemes designed for coupling with atmospheric models. Their primary purpose was to provide atmospheric models with boundary conditions in terms of vertical fluxes of energy and water between the land surface and the atmosphere rather than to predict runoff, which does not interact with the atmosphere (Mengelkamp et al. 2001). The first such scheme was developed by Manabe (1969) for the Geophysical Fluid Dynamics Laboratory (GFDL) atmospheric model. More recently, simple parameterization schemes have evolved into complex physically based land surface models (Polcher 2001) that describe heat and water exchange processes occurring in a soil–vegetation/snowpack–atmosphere system (SVAS) with different degrees of detail and complexity. These schemes may be used in a stand-alone mode for simulating the heat and water balance components at the land–atmosphere interface as well as different characteristics of the hydrothermal regime of territories, hydrological objects (catchments, river basins), and ecosystems (e.g., Dickinson et al. 1986; Sellers et al. 1986; Verseghy 1991; Noilhan and Mahfouf 1996; Wood et al. 1998; Gusev and Nasonova 2003, 2004a). The number of output variables in LSMs may reach several dozen, including runoff.

From time to time, we are faced with the opinion that complex physically based LSMs cannot be as successful as hydrological models with respect to streamflow simulation because, first, LSMs, being too complicated, suffer from accumulated errors in the larger number of forcing data and model parameters they require relative to the lesser data demands of HMs; and second, despite the more complex structure of LSMs, their treatment of runoff generation processes may be oversimplified compared to complex HMs. Better performance of HMs compared to LSMs was obtained within the framework of the Model Parameter Estimation Experiment (MOPEX; Duan et al. 2006), in which all the participating models could be calibrated to improve streamflow simulation. In this context, the following questions arose:

What causes HMs to simulate streamflow better than LSMs?

What should be undertaken to increase the accuracy of streamflow simulation by LSMs?

Can LSMs be as successful as HMs in simulating runoff?

The present work is an attempt to investigate these issues using the Soil–Water–Atmosphere–Plant (SWAP) land surface model (Gusev and Nasonova 1998, 2003), as well as the data and results from the MOPEX project [taken from Duan et al. (2006) and kindly provided by Prof. J. Schaake].

## 2. The study basins and data

Twelve river basins (with a drainage area of 1020–4421 km^{2}) provided within the framework of the second MOPEX workshop (Duan et al. 2006) were used in this study (Fig. 1). The basins are located within the southeastern part of the United States and are characterized by a great variety of natural conditions, ranging from arid to humid and from cropland to forests. Basic characteristics of the basins are given in Table 1. Snow and frozen ground effects are considered to have a minor influence on the hydrological processes because all the basins are located south of 42°N. A full description of the basins can be found in Duan et al. (2006).

The data for each of the 12 river basins, provided by the second MOPEX workshop organizers and detailed in Duan et al. (2006), include the following datasets: meteorological forcing data covering a 39-yr period (1960–98), daily river runoff at basin outlets for model calibration and validation, and basin characteristics data for a priori estimation of model parameters. The first 20-yr period (1960–79) was chosen for model calibration, whereas the last 19-yr period (1980–98) was for model validation.

The meteorological forcing datasets consist of basin-averaged hourly near-surface meteorology, including downward shortwave (SW) and longwave (LW) radiation, air temperature and humidity, atmospheric precipitation, air pressure, and wind speed. The hourly precipitation datasets are based on hourly and daily rain gauge data from the National Climatic Data Center (NCDC). The rest of the forcing data were processed from the ⅛° meteorological dataset derived by the University of Washington (UW) from NCDC daily precipitation, daily minimum and maximum temperature, and wind speed data obtained from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) global reanalysis data (Kistler et al. 2001). For HMs, daily climatic potential evaporation derived from the National Oceanic and Atmospheric Administration (NOAA) free water surface evaporation maps (Farnsworth et al. 1982) was also provided.

The basin characteristics datasets include basin boundary, elevation, spatial distribution of different U.S. Department of Agriculture (USDA) soil texture classes and the University of Maryland vegetation types (Table 2), soil texture, and monthly greenness fraction. This information is too general to be used directly in model simulations, which made it necessary to estimate soil and vegetation parameters a priori on the basis of the available general information. The technique of a priori estimation of model parameters for the SWAP model was detailed in Gusev and Nasonova (2006).

## 3. Possible reasons for better simulation of streamflow by HMs compared to LSMs

Here, *x*_{sim} and *x*_{obs} are simulated and observed values of a variable *x*, and Ω is a discrete sample set of variable *x*. The standard deviation of Eff was used as an indicator for performance consistency.
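The two goodness-of-fit statistics can be sketched in a few lines of Python. This is a minimal illustration, assuming that Eff is the standard Nash–Sutcliffe efficiency and that Bias is the relative systematic error expressed in percent; the function names `nash_sutcliffe` and `percent_bias` are hypothetical and are not taken from the MOPEX materials.

```python
from statistics import mean


def nash_sutcliffe(sim, obs):
    """Standard Nash-Sutcliffe efficiency over a paired sample (assumed form of Eff)."""
    if len(sim) != len(obs) or not obs:
        raise ValueError("sim and obs must be non-empty and of equal length")
    obs_mean = mean(obs)
    # Sum of squared simulation errors over the sample set.
    squared_error = sum((s - o) ** 2 for s, o in zip(sim, obs))
    # Variance-like term: squared deviations of observations from their mean.
    obs_spread = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - squared_error / obs_spread


def percent_bias(sim, obs):
    """Relative systematic error in percent (assumed form of Bias)."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)
```

A perfect simulation gives an efficiency of 1, and a simulation that always returns the observed mean gives 0, which is why values close to 1 indicate good performance.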

Some results of the MOPEX intercomparison experiments are presented in Fig. 2 (see Duan et al. 2006 for details). This figure shows the daily efficiency of streamflow simulations, averaged across the 12 basins, for different models using calibrated parameters. As can be seen, the hydrological models were more successful than the LSMs. This may have happened because of 1) shortcomings and uncertainties in meteorological forcing data (because forcing datasets for LSMs and HMs are different); 2) a larger number of model parameters in LSMs compared to HMs; 3) the low quality of the LSMs, including, in particular, model structure, mathematical formalization of physical processes, and model code; and 4) a less effective calibration of the LSMs compared to the HMs. What is the main reason in this particular case?

The first possibility does not seem to be the main one because problems with forcing data concern both LSMs and HMs. Although LSMs need more forcing data than HMs, the main attention, when preparing forcing data, should be paid to precipitation and incoming radiation because runoff formation is more sensitive to these two forcing factors, whereas their estimates suffer from uncertainties to a large extent (Nasonova et al. 2008). Precipitation is used both by LSMs and HMs. As to incoming radiation, which was derived from meteorological data, it is unknown whether its estimation is less accurate than estimates of potential evapotranspiration required for HMs.

As to the model parameters, LSMs, as a rule, have more model parameters than HMs. However, the number of LSM model parameters that strongly influence runoff generation is usually much smaller than the total number of model parameters. Therefore, it is necessary to find the most important parameters and to concentrate on their estimation. Most LSM parameters represent soil and vegetation characteristics that can be measured directly, derived from the other measured characteristics, estimated using lookup tables, or found in the literature on the basis of information about soil and vegetation classes. However, the problem of parameter estimation can be complicated by significant variability of some model parameters. In this case, even if a parameter can be measured directly, there will be a problem of scales due to inconsistency between the point scale of measurement and the scale of model application, resulting in large uncertainties in parameter values. When an LSM is used to simulate streamflow, it is possible to calibrate the parameters that most strongly influence the simulated streamflow. Rainfall–runoff HMs have on the order of 10 or more parameters. Some of them can be derived from the basin characteristics, whereas others cannot be measured directly or derived from measurements and therefore have to be calibrated. This means that the problem of parameter estimation is common to both LSMs and HMs.

As to the quality of the LSMs under consideration, it should be noted that, when participating in other international projects, ISBA, SWAP, and Noah have demonstrated their ability to reproduce heat and water exchange processes (including runoff formation) occurring in SVAS fairly well. This can be confirmed, in particular, by the results from the Rhône-Aggregation (Rhône-AGG) land surface scheme intercomparison project (Boone et al. 2004). The Nash–Sutcliffe efficiency of daily streamflow simulation for the Rhône River was 84.7%, 81.4%, and 80.2% for ISBA, SWAP, and Noah, respectively. These results were the best among the 15 participating LSMs. For the small subbasins and catchments within the Rhône River basin, the efficiency, on average, was lower than for the whole basin; however, ISBA, SWAP, and Noah were again among the best models. SWAP and Noah did not calibrate their parameters; ISBA calibrated only one parameter using daily values of measured streamflow from two small subbasins of the Rhône River basin. The high efficiency of streamflow simulations may be explained by the good quality of both the models and the available data. This means that, in these LSMs, the model structure components responsible for runoff generation have a reasonable level of complexity, which allows the models to reproduce runoff with high accuracy, provided that input data (both forcing data and model parameters) of high quality and fine spatial–temporal resolution are available. The absence of high-quality input data inevitably results in a decrease in the accuracy of streamflow simulations. In this case, only calibration will help to improve the situation.

Let us consider the last reason, that is, the calibration of model parameters. First of all, it is necessary to understand how the models that participated in the MOPEX project calibrated their parameters. Usually HMs calibrate all or most of their model parameters. Thus, all five parameters of the SWB model were calibrated, and all four parameters of GR4J were also calibrated. In the SAC-SMA model, 13 of 16 parameters were calibrated (Andreassian et al. 2006); in PRMS, 11–13 parameters were calibrated (depending on the basin hydroclimatic conditions; Prof. G. Leavesley 2008, personal communication). Calibration of HMs is usually carried out automatically by means of effective optimization algorithms. SWB and SAC-SMA used the global shuffled complex evolution optimization algorithm developed at the University of Arizona (SCE-UA), described in Duan et al. (1992). PRMS was calibrated in two steps using the Rosenbrock optimization scheme (Prof. G. Leavesley 2008, personal communication). GR4J applied a local gradient search procedure, which for a small number of calibrated parameters was found to be as effective as the SCE-UA (Andreassian et al. 2006). Among the LSMs that participated in the MOPEX project, ISBA did not calibrate model parameters, SWAP manually tuned only one soil parameter to get a good agreement between simulated and observed annual streamflow (Duan et al. 2006), and Noah calibrated several soil parameters influencing runoff formation by means of the SCE-UA optimization procedure (Andreassian et al. 2006). Model calibration has influenced the efficiency of streamflow simulation. Thus, as can be seen in Fig. 2, ISBA has the lowest efficiency, whereas the efficiency of the SWAP and Noah simulations is somewhat higher. At the same time, the more complex hydrological models (SAC-SMA, GR4J, and PRMS in Fig. 2) performed much better in terms of daily efficiency.

From the above, it follows that one of the main reasons for better performance of the HMs compared to the LSMs in the MOPEX project seems to be a better calibration of the HMs. Therefore, an increase in the accuracy of streamflow simulation by LSMs with a good structure can be achieved by improved model parameter calibration. To test this statement, the land surface model SWAP (described in section 4) will be used. Section 5 will describe different strategies to calibrate SWAP using daily streamflow. Section 6 will show the results of streamflow simulations by SWAP with different sets of optimal parameters. The results will be evaluated and then compared with the corresponding results from the models that participated in the MOPEX project, with the main focus on the complex hydrological models.

## 4. Representation of hydrological processes in HMs and LSM SWAP

### a. Hydrological rainfall–runoff models

Hydrological rainfall–runoff models are usually classified as conceptual and physically based or distributed-parameter and lumped-parameter models. Conceptual models use heuristic or empirical equations to represent water balance dynamics, whereas physically based models use equations based on scientifically accepted principles (Kuczera 1997). The study basin is partitioned into hydrologic response units under a distributed-parameter approach or considered as a single calculational unit in a lumped model. Here, we can only briefly consider the HMs that have shown the best results in the MOPEX project. Each of these models consists of several interconnected conceptual water reservoirs (storages). The storage capacities are specified by model parameters. Transfer functions can determine water partitioning between flows, water exchange between reservoirs, and water movement in a river network.

#### 1) SAC-SMA

The SAC-SMA is a deterministic, conceptually based rainfall–runoff model with spatially lumped parameters (Burnash et al. 1973; Gan and Burges 2006; Andreassian et al. 2006). It simulates runoff from a catchment using precipitation and potential evapotranspiration as input data. SAC-SMA consists of six water storages. Two of them are in the upper soil zone, three are in the lower zone, and the remaining water storage represents the water accumulated over the impervious area. The key function is the soil water percolation equation, which determines water movement from the upper zone to the lower zone. The model treats the following hydrological processes: generation of direct surface runoff over the impervious area, surface runoff when the upper zone is fully saturated, interflow from the upper zone free water storage, primary and supplemental baseflow from the lower zone free water storages, subsurface outflow, and evapotranspiration from all water storages. The simulated runoff is converted into streamflow through a unit hydrograph. Water loss from the riverbed is taken into account. There are 16 parameters in the model, 13 of which are calibrated. Free (calibrated) parameters include all water storage capacities, depletion coefficients for different reservoirs, percentage of impervious areas, and three parameters related to percolation. The SCE-UA algorithm was used for model calibration. The optimization criteria included a blend of daily and monthly root-mean-square error (RMSE) to get the best nearly unbiased water balance that also represented daily variability reasonably well (Prof. J. Schaake 2008, personal communication).

#### 2) PRMS

PRMS was developed by the U.S. Geological Survey (USGS; Leavesley et al. 1983). PRMS is a physically based watershed model that was designed to work with both lumped and distributed parameters. Forcing data for PRMS are daily precipitation, daily maximum and minimum temperature, and solar radiation (for snowmelt processes). PRMS is conceptualized as a series of four reservoirs: impervious zone, soil zone, subsurface, and groundwater. The soil reservoir is divided into two zones: the upper recharge zone and the lower zone. In the upper zone, water losses occur as a result of evaporation and transpiration, whereas the lower zone loses water only as a result of transpiration. The impervious and soil reservoirs produce surface runoff. The soil zone water excess goes to the subsurface and groundwater reservoirs, which generate subsurface and groundwater flows, respectively. These flows, along with surface runoff, produce streamflow. The groundwater reservoir also feeds a groundwater sink, which does not contribute to streamflow. The model was calibrated in two steps using the Rosenbrock optimization scheme and an objective function representing the sum of the absolute values of the differences between predicted and observed daily streamflow. In the first step, the set of parameters related to fitting the annual water balance was calibrated (three parameters, one of which, related to evapotranspiration, was calibrated for each season). Then, in the second step, a set of parameters (eight parameters plus two more for basins 11 and 12) related to hydrograph timing was fitted. Both steps used the same objective function. The details about model calibration were kindly provided by Prof. G. Leavesley (2008, personal communication).

#### 3) GR4J

The GR4J model is a daily lumped conceptual model with four free parameters. Its structure is simpler than that of the earlier mentioned models. Its description can be found in Perrin et al. (2003) and Andreassian et al. (2006). Input forcing data for the model include daily potential evapotranspiration and daily precipitation. The model has two interconnected storages: production storage and routing storage. The former includes an interception storage with zero capacity (potential evapotranspiration directly acts on input rainfall) and a soil moisture accounting storage, which determines effective rainfall and evapotranspiration. The model also treats water exchange with subterranean areas outside of the catchment. Transfer functions include percolation from the soil storage, constant split of effective rainfall into direct and indirect flow components, two unit hydrographs for each flow component, and a nonlinear routing storage that routes the indirect flow component. Four free model parameters represent maximum capacities of the production and routing storages, groundwater exchange coefficient, and time base of unit hydrograph. The “step by step” optimization algorithm based on local gradient search procedure is used for parameter calibration. As it was noted in Andreassian et al. (2006), this optimization algorithm was found to be as successful as global search algorithms (in particular, SCE-UA) if 4–10 model parameters are optimized.

### b. The land surface model SWAP

Land surface models “differ from classical hydrologic watershed models in that they are concerned with both water and energy balance, they are driven by multiple input variables (e.g., precipitation, shortwave and longwave radiation, wind speed, air temperature, and humidity), and they predict the evolution of several observable state variables (e.g., soil skin temperature, surface soil moisture) and output fluxes (e.g., latent heat, sensible heat, runoff)” (Bastidas et al. 1999).

The land surface model SWAP represents a physically based model describing the processes of heat and water exchange within a soil–vegetation/snow cover–atmosphere system. Different versions of SWAP were detailed in a number of publications (e.g., Gusev and Nasonova 1998, 2002, 2003, 2004a; Gusev et al. 2006). The last version of SWAP treats the following processes: interception of liquid and solid precipitation by vegetation; evaporation, melting, and freezing of intercepted precipitation, including refreezing of meltwater; formation of snow cover at the forest floor and at the open site during the cold season; partitioning of nonintercepted precipitation or water yield of snow cover between surface runoff and infiltration into a soil; formation of the water balance of aeration zone, including transpiration, soil evaporation, and water exchange with underneath layers and dynamics of soil water storage; water table dynamics; formation of the heat balance and thermal regime of SVAS; and soil freezing and thawing. Here, we only briefly consider how the model treats runoff generation processes occurring during the warm season because the cold season processes (including snow formation and soil freezing) have a relatively small hydrological effect on the study basins.

The soil column in SWAP stretches down to the first impervious layer and is divided into a soil aeration zone and a groundwater zone (Fig. 3). The soil aeration zone is divided into two layers: the first is a soil root layer and the second is a layer between the lower boundary of the root zone and the time-variable water table. The root layer depth *h*_{r} is constant, whereas the depth of the second zone *h*_{g} is time dependent. Treatment of water transfer in the first and second layers is based on the Buckingham–Darcy law but with accounting for ice content in each layer.

In the equations governing soil water in the two layers [Eqs. (2) and (3)], the soil water content of the first zone *W*_{1} is determined by the rates of infiltration *I*, transpiration *E*_{T}, bare soil evaporation *E*_{S}, and water fluxes at the lower boundary of the root zone (*Q*_{D,1} and *Q*_{g,1}); *ρ*_{w} is the water density and *τ* is time. Hereafter, the quantities related to the first and second zones are marked by the indices 1 and 2, respectively. The dynamics of soil water content in the second zone *W*_{2} depends upon water fluxes at the upper (*Q*_{D,1} and *Q*_{g,1}) and lower (*Q*_{D,2} and *Q*_{g,2}) boundaries of this zone. The fluxes *Q*_{D,1} and *Q*_{D,2} are caused by the diffusive mechanism of water transfer. The downward water fluxes at the lower boundaries *Q*_{g,1} and *Q*_{g,2} are caused mainly by the gravitational redistribution of soil water when its amount is greater than the field capacity *W*_{fc}. Partitioning of water flow at the lower boundary of each layer into two components, *Q*_{D,i} and *Q*_{g,i} (*i* = 1, 2), is convenient for constructing the calculational algorithm.

The parameterization of the fluxes in Eqs. (2) and (3) was detailed in Gusev and Nasonova (2002). The infiltration rate is simulated by making use of a modification of the Green–Ampt equation. In so doing, spatial variability of the coefficient of hydraulic conductivity at saturation is taken into account (Gusev and Nasonova 1998). The mathematical formulation of transpiration rate is based on the semiempirical theory by Budagovskiy (1989). The rate of bare soil evaporation is simulated by accounting for the formation of the drying layer in the uppermost part of the soil column (Gusev 1998). For the drying layer, a concept of water transfer by water vapor diffusion mechanism is used (see Gusev and Nasonova 1998 for details). The downward flux *Q*_{g,i} at the lower boundary of the *i*th layer occurs when soil water content *W*_{i} exceeds field capacity *W*_{fc}. However, the excess water is not removed from a layer immediately; its amount declines exponentially and tends toward the field capacity *W*_{fc} (Gusev and Nasonova 1998). The upward fluxes *Q*_{D,1} and *Q*_{D,2} are modeled using diffusivities of soil moisture derived from parameterizations by Clapp and Hornberger (1978) for hydraulic conductivity and matric potential of soil water but accounting for the effect of volumetric ice content in each layer (Gusev and Nasonova 2002).
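Two ingredients of this parameterization lend themselves to a short illustration: the exponential decline of excess water toward field capacity, and the Clapp and Hornberger (1978) power laws. The sketch below is not SWAP code. The relaxation time `t_relax` and all function names are hypothetical, the ice-content correction is omitted, and the power laws are written in their commonly used form, K = K0 (W/Wsat)^(2B+3) and ϕ = ϕ0 (W/Wsat)^(−B).

```python
import math


def relax_to_field_capacity(w, w_fc, dt, t_relax):
    """Water content after time dt when the excess over field capacity
    decays exponentially (t_relax is a hypothetical time constant)."""
    if w <= w_fc:
        return w  # no excess water, hence no gravitational drainage
    return w_fc + (w - w_fc) * math.exp(-dt / t_relax)


def gravitational_drainage(w, w_fc, dt, t_relax, layer_depth):
    """Depth of water (same unit as layer_depth) drained during dt."""
    return (w - relax_to_field_capacity(w, w_fc, dt, t_relax)) * layer_depth


def clapp_hornberger(w, w_sat, k0, phi0, b):
    """Hydraulic conductivity and matric potential as functions of
    soil moisture, without the ice-content correction used in SWAP."""
    rel = w / w_sat
    k = k0 * rel ** (2.0 * b + 3.0)
    phi = phi0 * rel ** (-b)
    return k, phi
```

With this form, the excess above *W*_{fc} shrinks by a factor of e over one relaxation time, so a wet layer keeps draining for some time after the rain stops instead of releasing all excess water in a single step.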

The calculation of surface runoff *Q*_{s} is based on the concept of infiltration-excess overland flow (Hortonian overland flow); that is, *Q*_{s} occurs when the rate of nonintercepted precipitation *P*_{s} exceeds the infiltration rate (Gusev and Nasonova 1998). The flux *Q*_{d} is calculated by accounting for the dynamics of water table depth *h*_{g}(*τ*) [Eq. (4)], in which *W*_{sat} is the soil porosity. The last term in Eq. (4) is caused by mobility of the lower boundary of the second soil layer as a result of variation in water table depth *h*_{g}, which is calculated according to an algorithm described in Gusev and Nasonova (2002). In that algorithm, *h*_{w} is the effective depth of flow at the catchment surface averaged over a time interval *t*_{g} needed to reach the equilibrium distribution between water in the streamlet network and groundwater, and *μ*_{g} is the catchment average water yield of groundwater. For the first approximation, the value of *t*_{g} can be estimated under the assumption that *t*_{g} ≤ *t*_{L}, where *t*_{L} is the time needed for runoff to reach the catchment outlet, which can be determined as in Gusev and Nasonova (2002), and *μ*_{g} = *W*_{sat} − *W*_{fc}.
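The infiltration-excess rule and the expression for the groundwater water yield can be written out as a small sketch. It assumes the infiltration rate is supplied from outside (SWAP obtains it from a modified Green–Ampt equation, which is not reproduced here), and the function names are hypothetical.

```python
def hortonian_runoff(p_s, infiltration_capacity):
    """Split nonintercepted precipitation into surface runoff and infiltration.

    Surface runoff appears only when the precipitation rate exceeds the
    infiltration capacity; both rates must share the same unit.
    Returns (surface_runoff, actual_infiltration).
    """
    q_s = max(p_s - infiltration_capacity, 0.0)
    return q_s, p_s - q_s


def groundwater_yield(w_sat, w_fc):
    """Catchment average water yield of groundwater: porosity minus field capacity."""
    return w_sat - w_fc
```

For example, a precipitation rate of 10 mm/h over soil that can absorb 4 mm/h gives 6 mm/h of surface runoff, while any rate below 4 mm/h infiltrates completely.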

Unlike the aforementioned HMs, SWAP does not have a surface water reservoir for the water accumulated over the impervious area and does not treat riverbed losses or an additional groundwater sink that does not contribute to streamflow. In the HMs, the parameters associated with these components provide additional opportunities to adjust simulated streamflow to the observed streamflow.

The SWAP model can be applied both for point (or grid cell) simulations of vertical fluxes and state variables of SVAS in atmospheric science applications and for simulating streamflow on different scales: from small catchments to continental-scale river basins. Transition from a local to a macroscale version is based on explicit accounting for spatial heterogeneity of an area by means of its dividing into computational grid boxes connected by a river network. Model simulations are carried out separately for each grid box. Then model outputs (with the exception of runoff) are averaged over all grid boxes to obtain areal averages. To obtain streamflow hydrograph at a grid box or river basin/subbasin outlet, the simulated surface and subsurface runoff is routed through a box or river network. In the case of a grid box or a small catchment (up to the order of 10^{3}–10^{4} km^{2}), a kinematic wave equation is used to simulate streamflow at the box/catchment outlet. In the case of a larger river basin, a river routing model is used to simulate streamflow at the basin outlet.

During the last 10 yr, different versions of SWAP were validated against observations, including characteristics related to energy balance or thermal regime of SVAS (sensible and latent heat fluxes, ground heat flux, net radiation, upward longwave and shortwave radiation, surface temperature, and soil freezing and thawing depths) and to hydrological cycle or water regime of SVAS (surface and total runoff from a catchment, river discharge, soil water storage in different layers, evapotranspiration, snow evaporation, intercepted precipitation, water table depth, snow density, snow depth and snow water equivalent, and water yield of snow cover). The model validations were performed for “point” experimental sites and for catchments and river basins of different areas (from 10^{−1} to 10^{5} km^{2}) on a long-term basis and under different natural conditions (e.g., Gusev and Nasonova 1998, 2000, 2002, 2003, 2004a,b; Gusev et al. 2006; Boone et al. 2004). The results have demonstrated that SWAP is able to reproduce (without calibration) heat and water exchange processes occurring in SVAS adequately, provided that high-quality input data are available.

## 5. Optimization of SWAP model parameters

After automatic calibration, the resulting model parameters were fine-tuned manually to obtain final calibrated model parameter values. Manual deterministic calibration was conducted on a regular network in the vicinity of the obtained minimum of Ext (within the range of ±20% of the optimized parameter values). This allowed us, first, to select a final calibrated parameter set by minimizing both Ext and Bias by means of compromise between these two objectives and, second, to estimate the sensitivity of streamflow simulations to model parameters.
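The manual fine-tuning step can be pictured as an exhaustive evaluation on a regular grid around the automatically found optimum. The sketch below assumes a user-supplied `evaluate(params)` function returning the pair (Ext, Bias). The number of grid steps and the selection rule (lowest Ext among candidates with |Bias| within a tolerance) are illustrative stand-ins for the manual compromise described above, and all names are hypothetical.

```python
from itertools import product


def local_grid(optimum, half_width=0.20, steps=5):
    """Regular grid of parameter sets within +/-half_width (relative)
    of each optimized value; steps must be at least 2."""
    axes = {}
    for name, value in optimum.items():
        lo, hi = value * (1 - half_width), value * (1 + half_width)
        axes[name] = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    names = list(axes)
    return [dict(zip(names, combo))
            for combo in product(*(axes[n] for n in names))]


def pick_compromise(candidates, evaluate, bias_tol=5.0):
    """Return the candidate with the lowest Ext among those whose |Bias|
    is within bias_tol; fall back to all candidates if none qualifies."""
    scored = [(evaluate(p), p) for p in candidates]
    admissible = [sp for sp in scored if abs(sp[0][1]) <= bias_tol] or scored
    return min(admissible, key=lambda sp: sp[0][0])[1]
```

Because every grid node is evaluated, the same set of runs also shows how strongly Ext and Bias respond to each parameter, which is the sensitivity information mentioned in the text. The grid grows as `steps` raised to the number of parameters, so this is practical only for a handful of parameters or a coarse grid.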

### a. Random search technique

RST has several stages (Gusev et al. 2008). At the first stage, a sufficiently wide feasible parameter space is specified by fixing the lower and upper parameter bounds, defined from the maximum plausible ranges for the parameters based on physical reasoning. A prescribed number of model runs (realizations) are performed using different values of calibrated parameters, which are determined within their fixed bounds using a generator of uniformly distributed random numbers. For each realization, streamflow simulation and estimation of Ext and Bias are carried out. Then the “best” realizations (that is, those with the lowest values of Ext and near-zero values of Bias) are selected, and the corresponding values of calibrated parameters are used to reduce (“manually”) the feasible parameter space. At the next stage, a new search for the optimum of the objective functions is performed in the reduced parameter space, which allows one to reduce the number of realizations. This is especially important for a large set of optimized parameters because, if the feasible parameter space is fixed during optimization, the number of realizations needed to find the optimum with the specified accuracy grows exponentially with an increase in the number of parameters (Solomatine et al. 1999). If necessary, the parameter space may be reduced further and the search for the optimum may be continued until there is no further progress in the minimization of Ext. When the optimization procedure is stopped, the point with the lowest value of Ext and a near-zero value of Bias is selected. The values of optimized parameters corresponding to this point are considered to be optimal.

In this study, at the first stage, the search was terminated after 20 000 realizations. The number of realizations for the second and third stages was determined on the basis of analysis of the “best” realizations (the convergence of objective functions and the scatter in parameter values were taken into account) and usually did not exceed 5000.
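The staged random search can be sketched as follows. This is a simplified illustration, not the RST implementation of Gusev et al. (2008): a single scalar objective stands in for the pair (Ext, Bias), and the “manual” reduction of the feasible space is replaced by an automatic rule that takes the envelope of the `n_best` realizations. All names and default values are hypothetical.

```python
import random


def random_search_stage(objective, bounds, n_runs, rng):
    """Evaluate n_runs uniformly sampled parameter sets and return
    (value, params) pairs sorted from best (lowest) to worst."""
    runs = []
    for _ in range(n_runs):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        runs.append((objective(params), params))
    runs.sort(key=lambda r: r[0])
    return runs


def shrink_bounds(runs, bounds, n_best):
    """New bounds: the envelope of the n_best realizations (an automated
    stand-in for the manual reduction of the feasible space)."""
    best = [p for _, p in runs[:n_best]]
    return {k: (min(p[k] for p in best), max(p[k] for p in best))
            for k in bounds}


def staged_random_search(objective, bounds, runs_per_stage, n_best=50, seed=0):
    """Run one random-search stage per entry of runs_per_stage,
    shrinking the feasible space after each stage."""
    rng = random.Random(seed)
    best_value, best_params = float("inf"), None
    for n_runs in runs_per_stage:
        runs = random_search_stage(objective, bounds, n_runs, rng)
        if runs[0][0] < best_value:
            best_value, best_params = runs[0]
        bounds = shrink_bounds(runs, bounds, n_best)
    return best_value, best_params
```

A call such as `staged_random_search(objective, bounds, [20000, 5000, 5000])` would mirror the stage sizes quoted above. Shrinking the bounds is what keeps the later stages cheap, because the same sampling density is reached with far fewer realizations in a smaller space.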

### b. SCE-UA optimization procedure

The SCE-UA is a single-objective optimization algorithm. To apply the SCE-UA to our two objective functions, Ext and Bias, we minimized Ext under the condition that the absolute value of Bias does not exceed 5% (|Bias| ≤ 5%).
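One common way to hand such a constrained problem to a single-objective optimizer is a penalty function. The source does not state how the constraint was enforced, so the sketch below is an assumption; the function name and the penalty weight are hypothetical.

```python
def constrained_objective(ext, bias, bias_limit=5.0, penalty=1.0e6):
    """Scalar value to minimize: Ext itself while |Bias| <= bias_limit,
    otherwise Ext plus a large penalty that grows with the violation."""
    violation = max(abs(bias) - bias_limit, 0.0)
    return ext + penalty * violation
```

Making the penalty proportional to the violation, rather than a fixed constant, gives the optimizer a slope that points back toward the admissible region when all points of a simplex violate the constraint.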

The SCE-UA algorithm has been described in detail in Duan et al. (1992). In the first step, the SCE-UA selects an initial population of optimized parameters by random sampling throughout the feasible parameter space for *n* parameters, based on given parameter ranges. For each point, the objective function values are calculated. Then the population is partitioned into several communities (complexes), each consisting of 2*n* + 1 points, based on the corresponding objective function values. Each community is made to evolve independently for a prescribed number of times based on the downhill simplex method (Nelder and Mead 1965). The communities are periodically consolidated into a single group, and the population is shuffled to share information and partitioned into new communities. As the search progresses, the entire population tends to converge toward the neighborhood of global optimum, provided the initial population size is sufficiently large. The evolution and shuffling steps are repeated until a prescribed convergence criterion is satisfied. In this study, six complexes (reducing to three complexes) were used, with a convergence criterion of 0.1% (change in Ext) in eight loops.
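The partitioning and shuffling steps can be illustrated with a short sketch. It follows the dealing scheme of Duan et al. (1992), in which the population is sorted by objective value and distributed among the complexes in turn. The evolution of each complex by the simplex-based procedure is deliberately omitted, and the function names are hypothetical rather than taken from the SCE-UA code.

```python
def partition_into_complexes(points, values, n_complexes):
    """Sort points by objective value (ascending) and deal them into
    complexes like cards, so every complex gets a mix of good and poor points."""
    order = sorted(range(len(points)), key=lambda i: values[i])
    return [[points[i] for i in order[k::n_complexes]]
            for k in range(n_complexes)]


def shuffle_complexes(complexes, objective):
    """Merge the (evolved) complexes into one population, re-evaluate it,
    and partition it again into the same number of complexes."""
    population = [p for c in complexes for p in c]
    values = [objective(p) for p in population]
    return partition_into_complexes(population, values, len(complexes))
```

With *n* = 2 parameters and six complexes of 2*n* + 1 = 5 points each, the initial population would hold 30 points. Each complex starts with one of the six best points, which is how information about promising regions is shared across the independent searches.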

The SCE-UA code distribution was obtained online (at http://www.sahra.arizona.edu/software/).

### c. Selection of parameters to be optimized

Because LSMs usually contain a lot of model parameters, the procedure for the selection of parameters to be optimized is very important. The total number of optimized parameters should not be too small to ensure sufficient degrees of freedom for obtaining a good agreement between the simulated and observed daily streamflow. At the same time, the number should not be too large to obtain the steady values of the calibrated parameters under a reasonable number of realizations.

Our experience has shown that for the study basins, the following SWAP model parameters can be calibrated: 1) soil hydrophysical parameters: hydraulic conductivity at saturation *K*_{0}, soil porosity *W*_{sat}, plant wilting point *W*_{wp}, field capacity *W*_{fc}, *B*-exponent parameter and saturated matric potential *ϕ*_{0}, soil column thickness *h*_{g0} (here, the depth from the soil surface to the upper impervious layer); 2) vegetation parameters: the root layer depth *h*_{r}, albedo *α*, and leaf area index (LAI); and 3) the parameter controlling the transformation of runoff within a basin (the Manning roughness coefficient *n*).

The hydraulic conductivity at saturation *K*_{0} is one of the most important parameters of SWAP because it controls the partitioning of water reaching the soil surface between infiltration and surface runoff. In SWAP, subgrid effects are taken into account through *K*_{0}. Thus, when modeling infiltration and surface runoff, subgrid spatial variability of *K*_{0} is considered by using not only the mean value of *K*_{0} for each grid box but also its root-mean-square deviation (Gusev and Nasonova 1998). Soil hydrophysical parameters *W*_{sat}, *W*_{wp}, and *W*_{fc} are of great importance for the description of evapotranspiration and the evolution of soil water storage. The parameters *B* and *ϕ*_{0} are used to formulate the dependence of soil hydraulic conductivity *K* and soil water matric potential *ϕ* on soil moisture *W* [in the parameterizations of the functions *K*(*W*) and *ϕ*(*W*) by Clapp and Hornberger (1978)]. SWAP is also sensitive to the soil column thickness *h*_{g0}, which affects the total soil water storage and thereby controls to a great extent (along with some other factors) the partitioning of water entering the soil between an increment of soil water storage and drainage from the soil column. The root layer thickness *h*_{r} affects the maximum water storage available for transpiration, which occurs from this layer. Albedo *α* determines the amount of nonreflected incoming solar radiation, which influences heat and water exchange at the land–atmosphere interface. LAI controls evapotranspiration. Vegetation albedo and LAI are time-varying parameters, for which monthly values should be specified in SWAP. A priori estimated monthly values were taken from the International Satellite Land-Surface Climatology Project, Initiative II (ISLSCP-II) datasets. Here, we assume that the shape of the prescribed seasonal course of *α* and LAI is correct, whereas the values of these parameters may contain biases. To correct the possible biases, we implement two adjustment factors (constant throughout a year): *k*_{α} and *k*_{LAI}. The shape of the streamflow hydrograph is also influenced by the Manning roughness coefficient *n*.

Besides the land surface parameters, meteorological forcing data also suffer from uncertainties and errors. If the forcing data are based on reanalysis products, they contain systematic errors (which reflect the biases and errors in the underlying general circulation models; Zhao and Dirmeyer 2003), resulting in errors in simulated heat and water balance components (Nasonova et al. 2008). The accuracy of mean areal forcing data derived from in situ measurements depends on the accuracy of measurements, as well as on the density and representativity of meteorological stations. To reduce the effect of errors and uncertainties in forcing data on model performance, some authors began to calibrate the most influential meteorological characteristics along with the parameters of hydrological and land surface models (e.g., Gan et al. 2006; Xia 2007). Because precipitation and incoming radiation influence runoff generation to the greatest extent among the forcings, we decided to use the following adjustment factors: *k*_{lp}, *k*_{sp}, *k*_{sw}, and *k*_{lw} for liquid and solid precipitation and shortwave and longwave radiation, respectively.
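A minimal sketch of how such factors might act on the forcing data is given below. It assumes that each factor is a constant multiplier applied to its variable at every time step; the record layout, the field names, and the numerical values are hypothetical and chosen only for illustration.

```python
def adjust_forcing(record, k_lp=1.0, k_sp=1.0, k_sw=1.0, k_lw=1.0):
    """Scale one time step of forcing data by constant adjustment factors.

    record -- dict with hypothetical keys:
              "rain" and "snow" (liquid and solid precipitation),
              "sw_down" and "lw_down" (incoming shortwave and longwave radiation)
    """
    return {
        "rain": k_lp * record["rain"],
        "snow": k_sp * record["snow"],
        "sw_down": k_sw * record["sw_down"],
        "lw_down": k_lw * record["lw_down"],
    }


# Illustrative values only: more rainfall, slightly less incoming radiation.
step = {"rain": 2.0, "snow": 0.0, "sw_down": 200.0, "lw_down": 300.0}
adjusted = adjust_forcing(step, k_lp=1.12, k_sw=0.95, k_lw=0.96)
```

Because the factors are constant in time, they change the magnitude of a forcing series but not its temporal pattern, which makes them suited to correcting systematic rather than random errors.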

### d. Calibration experiments

To verify whether the SWAP model is able to reproduce streamflow with the accuracy of the hydrological models, different strategies of SWAP calibration using streamflow from the 12 MOPEX basins were investigated. All calibrated parameters were kept within a reasonable range so as not to violate any physical constraints.

Because the SCE-UA calibration algorithm is widely used by the hydrological modeling community, we also decided to apply this algorithm to calibrate SWAP. To reach the best performance of SWAP, we carried out three calibration experiments (hereinafter SWAP_SCE1, SWAP_SCE2, and SWAP_SCE3) for each basin with a different number of model parameters to be calibrated: 10, 12, and 15, respectively (Table 4). In the first case (SWAP_SCE1), adjustment factors for forcing data and LAI were not used. Then we included adjustment factors for the main forcing data *k*_{lp}, *k*_{sw}, and *k*_{lw} in the set of calibrated parameters (see SWAP_SCE2 experiment in Table 4). Involving *k*_{sw} in the optimization procedure allowed us to exclude albedo *α* from the set of calibrated parameters because in the model, *α* and *k*_{sw} appear together in the product of albedo and shortwave radiation *R*_{sw}: *α* × *k*_{sw} × *R*_{sw}. Therefore, a decrease in *k*_{sw} together with an increase in *α* (and vice versa) may give the same results. In SWAP_SCE3, all 15 parameters were calibrated, including all adjustment factors (*k*_{sp} for the northern basins) to vegetation parameters and forcing data. In this case, automatic calibration was not followed by manual adjustment of the resulting parameters to reduce the obtained Bias, which was restricted to 5% in the automatic procedure.
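The trade-off between *α* and *k*_{sw} can be written out explicitly. Under the assumption stated above, namely that the two quantities enter through the product *α* × *k*_{sw} × *R*_{sw}, rescaling them in opposite directions by any constant *c* > 0 (a symbol introduced here only for illustration) leaves that product unchanged:

```latex
\left(c\,\alpha\right)\left(\frac{k_{sw}}{c}\right) R_{sw}
  \;=\; \alpha\, k_{sw}\, R_{sw}, \qquad c > 0 .
```

Streamflow observations alone therefore cannot separate the two factors within this term, which is why calibrating both at once would add a degree of freedom without adding information.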

It should be noted that in the MOPEX intercomparison experiments, the participating models did not use adjustment factors for forcing data. Despite this, we decided to perform the calibration experiment (SWAP_SCE3) with those adjustments to reveal their influence on streamflow simulation.

Because we used only the RST optimization algorithm in our previous investigations (Gusev et al. 2007, 2008; Nasonova and Gusev 2007), we decided to compare RST with SCE-UA. For this purpose, we performed a calibration of 12 model parameters (as in SWAP_SCE2) with the RST technique. This calibration experiment will be referred to as SWAP_RST.

All the described experiments are summarized in Table 4, which also includes SWAP’s calibration (referred to as SWAP_K0) performed during our participation in the MOPEX project. In this case, SWAP was calibrated manually by tuning only one parameter (hydraulic conductivity at saturation *K*_{0}) to minimize mean bias between simulated and measured annual streamflow.

## 6. Results

The results of daily streamflow simulations with different sets of calibrated parameters were compared with observations, with each other, and with the results from the hydrological models that participated in the MOPEX experiments. Following the MOPEX strategy, the period of 1960–79 was used for model calibration and the periods of 1980–98 and 1960–98 were used for model verification. The agreement between simulated and observed streamflow for each river basin was estimated at daily and monthly time scales, for the overall hydrograph and for different flow intervals using several goodness-of-fit statistics: the Nash–Sutcliffe coefficient of efficiency Eff, RMSE, systematic error, and the coefficient of correlation *r*. Hydrographs were also compared visually to reveal how the model reproduces the shape of hydrograph, including timing of peaks, recession slopes, and low flows.
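The goodness-of-fit statistics listed above can be sketched as follows. The Nash–Sutcliffe efficiency, RMSE, and correlation coefficient follow their standard definitions; the systematic error is assumed here to be the relative difference of total simulated and observed flow in percent, which may differ in detail from the definition of Bias used in this paper. Function and variable names are hypothetical.

```python
import math


def fit_statistics(obs, sim):
    """Return Eff, RMSE, Bias (%), and r for paired observed and simulated flows."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    sq_err = sum((s - o) ** 2 for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    return {
        "Eff": 1.0 - sq_err / var_obs,                      # Nash-Sutcliffe efficiency
        "RMSE": math.sqrt(sq_err / n),                      # same units as the flow
        "Bias": 100.0 * (sum(sim) - sum(obs)) / sum(obs),   # assumed definition, in %
        "r": cov / math.sqrt(var_obs * var_sim),            # Pearson correlation
    }


# Made-up daily flows (mm/day), for illustration only.
observed = [1.0, 2.0, 4.0, 3.0, 1.5]
simulated = [1.2, 1.8, 3.5, 3.2, 1.6]
stats = fit_statistics(observed, simulated)
perfect = fit_statistics(observed, observed)
```

Eff equals 1 for a perfect simulation and becomes negative when the simulation is worse than using the observed mean. The function returns Eff as a fraction, whereas part of the text quotes it in percent.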

### a. SWAP streamflow simulations with different sets of optimal parameters

#### 1) Streamflow simulations by SWAP

Figure 4 shows the progress in SWAP streamflow simulations in different calibration experiments compared to a priori results in terms of daily (Figs. 4a and 4b) and monthly (Figs. 4c and 4d) efficiency and absolute bias (Figs. 4e and 4f). The left panels contain median values for the 12 basins, whereas the right panels depict their standard deviations, which can serve as an indicator for model performance consistency in the 12 basins. Figure 5 displays Eff and Bias for each basin.

Comparison of the results of SWAP using a priori parameters (SWAP_apriori) and calibrated hydraulic conductivity at saturation (SWAP_K0) shows that the calibration of *K*_{0} has resulted in substantial improvement of the annual water balance for all the basins: for the calibration period, bias in the simulated streamflow is absent; for the validation period and the entire period, median absolute bias is 4% and 2%, respectively (Fig. 4). This is not surprising because annual bias was minimized during the calibration. Model consistency across the basins was greatly improved (the standard deviation in the SWAP_K0 case is much lower than in SWAP_apriori, as can be seen in Fig. 4, right panels). At the same time, such a limited calibration has not resulted in significant progress in the daily efficiency of streamflow simulations; the median values are nearly the same as in the a priori run (Fig. 4). For several basins (3, 4, 5, 8, and 9), daily efficiency became even slightly lower, whereas biases significantly improved (Fig. 5). For the other basins, daily efficiency increased somewhat, especially for basins 1, 11, and 12, which were the worst in the a priori run (Eff was negative and Bias was large; Fig. 5).

Figure 6 illustrates hydrographs obtained in SWAP_apriori and SWAP_K0 experiments for the basins 1 and 7 for a couple of water years from the calibration period. In the case of basin 1 (the top panel of Fig. 6), in a priori run, bias was 78% (0.67 mm day^{−1}); that is, streamflow volume was greatly overestimated and daily Eff was negative—that is, runoff variations were not captured at all. This can be explained by poor estimation of a priori values of soil parameters. This can be expected because a large portion (36%) of the basin area is classified as “bedrock” (see Table 2); this category of the land surface is not treated in the model. An attempt to take bedrock into account in a priori estimation of soil parameters resulted in too low a value of *K*_{0} that caused incorrect partitioning of nonintercepted precipitation between surface runoff and infiltration. This, in turn, caused too much surface runoff and too low infiltration that substantially reduced the amount of water available for evapotranspiration from a soil root zone. As a result, we can see a lot of false peaks during the summer months in response to precipitation events in the upper hydrograph in Fig. 6. Calibration of *K*_{0} allowed us to improve the water balance: annual evapotranspiration significantly increased (by nearly 1.6 times), decrease in total runoff was twice as large as a result of a decrease in the fast component of streamflow. This improvement is especially evident during the summer months, when evapotranspiration is high, because some false peaks disappeared. Monthly efficiency changed from negative in a priori run to 0.54 after calibration. However, daily Eff is still low (0.22); that is, further improvement of model parameters is needed.

Let us consider another situation concerning the calibration of *K*_{0}, shown in the bottom panel in Fig. 6, where simulated and observed hydrographs for basin 7 are depicted. In this case, a priori estimation of model parameters was more successful: daily Eff was 0.61 for 1960–79, monthly Eff = 0.56, and Bias = −14.4%; that is, streamflow was underestimated. The calibration of *K*_{0} provided an improvement of annual water balance (Bias ≈ 0), monthly Eff grew to 0.67, whereas daily Eff practically did not change. Now, the timing of flood peaks is rather good, whereas their volume is overestimated and low flow is underestimated.

The improved calibration of SWAP provided much better results compared to SWAP_K0. After the calibration of 15 parameters in the SWAP_SCE3 experiment, overall daily Eff increased by 23.9%, 12.8%, and 15.7% for the calibration and validation periods, and the entire period, respectively; the corresponding increment of monthly efficiency was 28%, 15.2%, and 19.3%, respectively (Table 5). Mean absolute bias for all the cases and periods did not exceed 5%. Besides overall statistics, the simulation of streamflow hydrographs was substantially improved as compared to SWAP_K0 results. Figure 7 illustrates this statement for the same two basins and water years shown in Fig. 6. Comparing Figs. 6 and 7, we can see progress in the hydrograph simulation: flood peaks, low flows, and recession slopes now match much better than in SWAP_apriori and SWAP_K0.

Comparison of the different calibration experiments revealed the following findings. The application of the RST and SCE-UA optimization procedures to the same set of calibrated parameters gives closely consistent values of daily and monthly Eff and Bias (cf. SWAP_RST and SWAP_SCE2 in Figs. 4 and 5). On average, the RST set of optimal parameters results in daily Eff equal to 64.7%, 63.7%, and 64.7% for the calibration and validation periods, and the entire 1960–98 period, respectively, whereas absolute Bias for the same periods is 1.2%, 6.4%, and 3.3%, respectively. The application of SCE-UA provides Eff equal to 65.0%, 64.2%, and 64.9%, whereas absolute Bias is 1.5%, 5.3%, and 2.0% for the calibration and validation periods, and the entire period, respectively. The same regularity is also observed at the monthly scale. Visual comparison of hydrographs reveals negligible differences. These results mean that the RST calibration technique is as effective as SCE-UA; however, the latter is somewhat more convenient because its application is less time and labor consuming.

Comparison of SWAP_SCE1 and SWAP_SCE2 results shows that involving the adjustment factors for forcing data in the calibration procedure provides, on average, a 3.8%, 2.3%, and 3% increase in daily efficiency for the calibration and validation periods, and the entire period, respectively (cf. SWAP_SCE1 and SWAP_SCE2 in Fig. 4). Such an increase seems to be small; however, for some basins, it reaches 8.8% for the calibration period and 7.3% for the validation period (SWAP_SCE1 and SWAP_SCE2 in Fig. 5). This may be explained by the different accuracy of areal estimates of forcing data for different basins. For monthly efficiency, the differences are larger: up to 19.7% for the calibration period and 16.7% for the validation period.

#### 2) Optimal values of the calibrated parameters

Optimal values of most of the calibrated parameters, obtained in four calibration experiments, are shown in Fig. 8 in comparison with a priori estimates. As can be seen, different calibrations result in different values of optimal parameters. This can be explained by the different number of calibrated parameters. In such a situation, uncertainties in noncalibrated parameters, to which runoff is also sensitive, influence the results of calibration. Thus, in SWAP_SCE1, when we did not calibrate the adjustment factors for forcing data controlling runoff generation along with the other factors, errors and uncertainties in the forcings inevitably contributed to optimal values of model parameters, making them distinct from the corresponding optimal values in the other calibration experiments. At the same time, variations in the optimal parameter values are not very large and can be considered acceptable. All the optimal values are physically reasonable because their calibration was performed within physically based bounds.

It is interesting to consider the results of calibration of the forcing data to reduce their possible biases (Fig. 8, the bottom panels). In the SWAP_SCE3 experiment, calibration of the adjustment factors for the forcing data leads, on average, to a decrease in incoming SW and LW radiation by 5% and 4%, respectively. This is consistent with uncertainties in estimates of incoming radiation fluxes deriving from standard meteorological observations. Thus, the analysis of global estimates of incoming SW and LW, taken from four different datasets [Surface Radiation Budget (SRB), International Satellite Cloud Climatology Project (ISCCP), NCEP/Department of Energy (DOE) reanalysis, and 40-yr European Centre for Medium-Range Weather Forecasts Re-Analysis (ERA-40)], has shown that the differences in the estimates reach 8% and 4% for SW and LW, respectively, whereas they may be somewhat larger on a regional scale (Nasonova et al. 2008). As for precipitation, the adjustment factors, averaged over the basins, are 1.12 and 1.04 for liquid and solid precipitation, respectively; that is, rainfall should be increased by 12% and snowfall by 4%. This is within the accuracy of estimation of areally averaged precipitation, which depends on the density of precipitation gauges and on biases associated with the gauge measurement process.

The adjustment factors *k*_{α} and *k*_{LAI} for the vegetation parameters were calibrated only in the SWAP_SCE3 experiment. The optimal values of *k*_{α} are rather high (from 2 to 3). This can be caused by very low a priori estimates of vegetation albedo. For example, a priori estimated albedo varies from 0.08 to 0.1 during the summer months in most basins. Such low values are typical of spruce forests, whereas albedo of the other types of vegetation covering the basins (see Table 2 for the vegetation classes) should be higher. Multiplying such low values by *k*_{α} ranging from 2 to 3, we come to reasonable values of albedo for most types of vegetation. The adjustment factor for LAI varies from 0.5 to 1.7 among the basins (averaging to 1.06). These values also seem to be feasible because they provide LAI within physically reasonable bounds.

#### 3) Sensitivity analysis

To check whether the selection of parameters for calibration was correct, we investigated the sensitivity of bias and efficiency of streamflow simulation to changes in model parameter values. The sensitivity analysis was performed using the optimal values of model parameters obtained in the SWAP_SCE3 experiment. We tested the effect of varying each parameter value around its optimum by ±20% independently of the other parameters (under their fixed optimal values) on Eff and Bias. Because *K*_{0} is characterized by a great spatial variability (it may vary by several orders of magnitude), we changed its value by 10^{±0.2}. The relative deviations of Eff (*η*) and Bias (*μ*) due to parameter variation are shown for each basin in Fig. 9. Figure 10 illustrates their values, averaged over the 12 basins and sorted in decreasing order. The larger the values of *η* or *μ*, the more sensitive Eff or Bias is to a parameter ±20% variation.
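The one-at-a-time procedure described above can be sketched as follows. The exact formulas for *η* and *μ* are not restated here, so the sketch assumes the relative deviation is the larger of the two absolute changes in the score, divided by the score at the optimum and expressed in percent. The toy score function and parameter names are hypothetical stand-ins for a SWAP run.

```python
def one_at_a_time_sensitivity(model_score, optimum, fraction=0.2):
    """Perturb each parameter by +/- fraction around its optimum, others fixed.

    model_score -- function mapping a parameter dict to a score (e.g., Eff)
    optimum     -- dict of optimal parameter values
    Returns the assumed relative deviation of the score, in percent, per parameter.
    """
    base = model_score(optimum)
    result = {}
    for name, value in optimum.items():
        deviations = []
        for factor in (1.0 - fraction, 1.0 + fraction):
            trial = dict(optimum)          # copy, so other parameters stay at optimum
            trial[name] = value * factor
            deviations.append(abs(model_score(trial) - base) / abs(base))
        result[name] = 100.0 * max(deviations)
    return result


def toy_efficiency(p):
    """Toy score with a maximum of 0.7 at a = 1, b = 2; not a hydrological model."""
    return 0.7 - 0.5 * (p["a"] - 1.0) ** 2 - 0.05 * (p["b"] - 2.0) ** 2


sens = one_at_a_time_sensitivity(toy_efficiency, {"a": 1.0, "b": 2.0})
ranked = sorted(sens, key=sens.get, reverse=True)   # most sensitive first
```

For *K*_{0}, the two multipliers would be 10^{−0.2} and 10^{0.2} (about 0.63 and 1.58) instead of 0.8 and 1.2, as described above.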

The analysis of the sensitivity results presented in Figs. 9 and 10 allows us to come to the following conclusions. First, simulation of rain-driven streamflow is highly sensitive to biases in incoming longwave radiation and rainfall. This confirms the importance of calibrating the adjustment factors for the forcing data together with the model parameters to reduce the effect of uncertainties and biases in the forcings on parameter calibration and on model simulations. Second, the sensitivity levels of systematic error and Eff to the same parameters are different. Third, *η* and *μ* vary greatly among the basins (Fig. 9); that is, the sensitivity level of streamflow simulations is site dependent. This can be explained by the influence of climatic conditions on model sensitivity. The largest sensitivity level to all the selected parameters is observed in the driest basins, 11 and 12 (especially in basin 11, characterized by the lowest precipitation and runoff ratio). Lastly, we can confirm that the selection of parameters to be calibrated, based on our experience, was mainly correct. However, some model parameters may be excluded from the list of calibrated parameters because of the low sensitivity level. The comparison of the SWAP_SCE2 and SWAP_SCE3 experiments shows that excluding *k*_{LAI}, *k*_{α}, and *k*_{sp} does not produce a significant decrease in the accuracy of streamflow simulations (cf. the results for the validation period in Table 5).

### b. Comparison of SWAP streamflow simulations with HMs

Here, we are going to compare SWAP streamflow simulations using optimal values of model parameters from the SWAP_SCE3 calibration experiment with corresponding simulations performed by the models that participated in the MOPEX project. The main focus will be on the more successful complex HMs.

Figure 11 presents daily and monthly efficiencies and absolute biases in terms of mean values (for 12 basins) and standard deviations for each model for the three periods. Figure 12 provides the same statistics for each basin. As can be seen from Fig. 11, after improved calibration, SWAP does much better compared to SWAP_K0. SWAP_SCE3 produces the same mean daily efficiency as GR4J and performs better than GR4J with respect to mean monthly Eff, Bias, and model performance consistency in 12 basins. The new SWAP results are very close to those from PRMS and SAC-SMA. Thus, for the entire period of 1960–98, PRMS provides values of daily Eff, monthly Eff, and Bias equal to 70.6%, 83.0%, and 4.2%, respectively, whereas these values are 69.2%, 84.1%, and 3.4% for SAC-SMA as compared to 66.3%, 82.0%, and 3.7% for SWAP_SCE3. The same statistics for SWAP_K0 are 50.6%, 62.7%, and 2.1%.

The top left panel in Fig. 12 clearly shows that all the models may be divided into two groups in accordance with daily Eff of streamflow simulation for the calibration period. SWAP_SCE3 is among the best hydrological models, whereas SWAP_K0 is in the group of models with poorer calibration. The same regularity is observed for the entire period of 1960–98. As for the validation period, the performance of SWAP_SCE3 is slightly worse compared to the calibration period. The middle panels clearly show that SWAP_SCE3 performs very well at monthly scale. The relative values of absolute bias are also acceptable. Thus, we can conclude that advanced model calibration allowed us to substantially improve overall efficiency of streamflow simulations, both on daily and monthly scales.

Hydrographs simulated by PRMS, SAC-SMA, and SWAP_SCE3 are compared in Fig. 13 for wet and dry basins (basins 7 and 12, respectively). Basin 7 is modeled by SWAP in good agreement with observations, with the exception of flood peaks near days 35 (November 1977), 120 (the end of January 1978), and 520 (the beginning of March 1979). It is interesting that the flood peaks near days 35 and 520 are captured both by SWAP and by the HMs, but the modeled values, which are in very good agreement with each other, are lower than measured. This can be explained by an imperfection in all the models or in input data, or by the low accuracy of flood measurements. At the same time, none of the models captures the flood peak near day 120. In the case of our model, this is connected with low air temperature (<0°C), at which precipitation is treated as snowfall. Snowfall results in the formation of snowpack rather than runoff generation. When the air temperature becomes positive, snowmelt occurs; however, the meltwater does not form runoff but infiltrates into the soil and increases soil water storage. If the partitioning of precipitation between snowfall and rainfall were given, the situation could be improved. After day 130, SWAP performs adequately: observed peaks, low flows, and recession slopes are reproduced rather well compared to the HMs. The HMs have even more problems than SWAP during the winter of the first water year.

As to the dry basin 12 (see the bottom panel in Fig. 13), the agreement between SWAP simulation and observation is poorer. SWAP adequately captures low flows, whereas its response to precipitation events is sometimes larger than observed (e.g., near days 30, 225, 385, 570, and 680) and is sometimes smaller (around days 115, 140, 250, and 430). The HMs also have problems (e.g., inaccurate modeling of peaks near days 110–150 and 250 and of recession slopes near days 390–450), but they generally perform somewhat better. The correlation between observed and modeled flow is 0.79 and 0.86 for PRMS and SAC-SMA, respectively, as compared to 0.76 for SWAP for these two water years.

Generally, all the models perform worse in dry basins because the flow volumes are too small, and it is difficult to model them accurately because of the imperfection of input data (both forcings and model parameters) and model formulations. At the same time, the errors in modeled flow are also small (they may be much smaller than in the wet basins; compare with basin 7 in the top panel), only their relative values are large.

Finally, we would like to compare how the models reproduce streamflow within different flow intervals for the same wet and dry basins (Fig. 14). The flow intervals correspond to the 10% probability intervals at the exceedance probability curve; only for the higher flow with a probability less than 10% was the interval 0%–10% partitioned into four smaller intervals—<1%, 1%–2.5%, 2.5%–5%, and 5%–10%—to provide more details. The first interval in Fig. 14 includes the highest flood peaks, which occur with the probability <1%. The analysis was performed for the entire period of 1960–98; each 10% interval contains 1379 points with daily flow values. For each flow interval, different statistics were calculated for the two versions of SWAP calibration (SWAP_K0 and SWAP_SCE3) to show the progress in calibration, and for the PRMS and SAC-SMA models. Figure 14 includes mean observed and simulated values, RMSE, and the correlation coefficient.
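The flow-interval analysis can be sketched as follows. The sketch assumes that days are assigned to intervals by ranking the observed daily flow from highest to lowest (that is, by its empirical exceedance probability) and that the statistics of each interval are computed from the observed and simulated values on those days. Only the mean values and RMSE are shown; the correlation coefficient is omitted for brevity. All names and the synthetic data are hypothetical.

```python
import math

# Interval edges in percent of exceedance probability, as described in the text.
EDGES = [0.0, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 40.0,
         50.0, 60.0, 70.0, 80.0, 90.0, 100.0]


def interval_statistics(obs, sim, edges=EDGES):
    """Mean observed flow, mean simulated flow, and RMSE per exceedance interval."""
    # Rank days by observed flow, highest first (lowest exceedance probability).
    order = sorted(range(len(obs)), key=lambda i: obs[i], reverse=True)
    n = len(order)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = order[round(n * lo / 100.0):round(n * hi / 100.0)]
        if not idx:
            continue                       # interval too narrow for this record
        o = [obs[i] for i in idx]
        s = [sim[i] for i in idx]
        rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(o, s)) / len(idx))
        rows.append({"interval": (lo, hi), "n": len(idx),
                     "mean_obs": sum(o) / len(o),
                     "mean_sim": sum(s) / len(s),
                     "rmse": rmse})
    return rows


# Synthetic record of 1000 days with a simulation 10% too high everywhere.
obs_flow = [float(i) for i in range(1, 1001)]
sim_flow = [1.1 * q for q in obs_flow]
rows = interval_statistics(obs_flow, sim_flow)
```

In this synthetic case the absolute error is largest in the highest-flow interval even though the relative error is the same everywhere, which mirrors the distinction between absolute and relative errors made above for wet and dry basins.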

As can be seen from Figs. 14a and 14b, improved calibration of SWAP has led to a decrease in RMSE (due to diminishing random error) and to higher correlation for all the intervals, with the exception of the highest flow, where SWAP_K0 is even better than the HMs. In wet basin 7, SWAP_SCE3 produces results that are comparable to, or in several intervals, slightly better than those obtained by either SAC-SMA or PRMS. In dry basin 12, SWAP_SCE3 less accurately reproduces the highest flow. The discrepancies in statistics for the other intervals are very small.

The results presented in section 6 show that after a proper calibration, SWAP is able to simulate daily streamflow with the accuracy comparable to that of hydrological rainfall–runoff models with a good structure.

## 7. Summary and conclusions

The research presented in this paper was motivated by the fact that in the MOPEX project, the rainfall–runoff hydrological models simulated streamflow, in general, better than the land surface models. The analysis of the MOPEX results allowed us to come to the conclusion that one of the main reasons for this seems to be a better calibration of the HMs compared to the LSMs. To test this statement, different strategies for calibration of the LSM SWAP using daily streamflow measured at the 12 MOPEX basins during 1960–79 were investigated. The results of the streamflow simulations for the 12 basins and for the calibration period (1960–79), the validation period (1980–98), and the entire (1960–98) period performed by SWAP using different sets of calibrated parameters were compared with observations and with each other to reveal the best set of optimal parameters. The best SWAP streamflow simulations were compared with the corresponding results obtained by the rainfall–runoff hydrological models that performed better than SWAP in the MOPEX project, to reveal whether the LSM SWAP can be as successful as the HMs in streamflow modeling. After substantial improvement of SWAP calibration, streamflow simulation was significantly improved (with respect to overall statistics on daily and monthly scales, flow interval statistics, and hydrograph shape, including volume and timing of peaks, recession slopes, and low flows) and the obtained results became close to those obtained by the HMs. The main conclusions from this investigation may be summarized as follows.

The accuracy of streamflow simulations depends more on the skill of calibration (the technique of calibration, the objective functions used, the choice of calibrated parameters) than on a model type (LSM or HM), provided that model quality (including, in particular, model structure, mathematical formalization of physical processes, model code, etc.) is good because calibration helps to compensate to some extent uncertainties and shortcomings in input data and model parameters rather than shortcomings of a model. The lower the quality of input information, the clearer the dependence.

Uncertainties and errors in forcing data can be partly compensated by the application of adjustment factors for those meteorological characteristics, which influence runoff generation to a greater extent. Calibration of such factors together with model parameters allows one to reduce the influence of systematic errors in forcing data on the optimization of model parameters and on model performance.

The land surface model SWAP can simulate river runoff after appropriate calibration with accuracy comparable to that of rainfall–runoff hydrological models (at least for the river basins with an area of ∼10^{3} km^{2}). It could be expected that if a model adequately treats physical mechanisms of heat and water exchange processes occurring in a soil–vegetation–atmosphere system, it has a potential to simulate runoff with a high accuracy. Poor runoff simulation may result from a low quality of input data and model parameters. Therefore, thorough calibration of the most important parameters using streamflow observations can improve the results.

## Acknowledgments

This work was supported by the Russian Foundation for Basic Research (Grant 08-05-00027). We acknowledge the MOPEX experiment organizers (Profs. J. Schaake and Q. Duan) for providing us with the data to run the model and with the results of streamflow simulations from different participating models. We appreciate Profs. Q. Duan, H. V. Gupta, and S. Sorooshian for the free distribution of SCE-UA code. Critical reviews by the anonymous reviewers and chief editor (Prof. A. Barros) greatly improved the manuscript. We also appreciate chief editorial assistant T. Scott for her editorial recommendations.

## REFERENCES

Andreassian, V., and Coauthors, 2006: Catalogue of the models used in MOPEX 2004/2005.

,*IAHS Publ.***307****,**41–93.Bastidas, L. A., Gupta H. V. , Sorooshian S. , Shuttleworth W. J. , and Yang Z. L. , 1999: Sensitivity analysis of a land surface scheme using multicriteria methods.

,*J. Geophys. Res.***104****,**(D16). 19481–19490.Boone, A., and Coauthors, 2004: The Rhône-Aggregation Land Surface Scheme intercomparison project: An overview.

,*J. Climate***17****,**187–208.Budagovskiy, A. I., 1989: Principles of the method of calculating the duty of water and irrigation regimes.

,*Water Resour.***16****,**27–35.Burnash, R. J. C., Ferral R. L. , and McGuire R. A. , 1973: A generalized streamflow simulation system—Conceptual modeling for digital computers. National Weather Service, NOAA, and the State of California Tech. Rep. Joint Federal and State River Forecast Center, 204 pp.

Clapp, R. B., and Hornberger G. M. , 1978: Empirical equations for some soil hydraulic properties.

,*Water Resour. Res.***14****,**601–604.Dickinson, R. E., Henderson-Sellers A. , Kennedy P. J. , and Wilson M. F. , 1986: Biosphere-Atmosphere Transfer Scheme (BATS) for the NCAR Community Climate Model. NCAR Tech. Note NCAR/TN-275+STR, 82 pp.

Duan, Q., S. Sorooshian, and V. K. Gupta, 1992: Effective and efficient global optimization for conceptual rainfall runoff models. *Water Resour. Res.*, **28**, 1015–1031.

Duan, Q., H. V. Gupta, S. Sorooshian, A. N. Rousseau, and R. Turcotte, Eds., 2003: *Calibration of Watershed Models*. Water Science and Application Series, Vol. 6, Amer. Geophys. Union, 345 pp.

Duan, Q., and Coauthors, 2006: Model Parameter Estimation Experiment (MOPEX): An overview of science strategy and major results from the second and third workshops. *J. Hydrol.*, **320**, 3–17.

Farnsworth, R. K., E. S. Thompson, and E. L. Peck, 1982: Evaporation atlas for the contiguous 48 United States. NOAA Tech. Rep. NWS 33, 26 pp.

Gan, T. Y., and S. J. Burges, 2006: Assessment of soil-based and calibrated parameters of the Sacramento model and parameter transferability. *J. Hydrol.*, **320**, 117–131.

Gan, T. Y., Ye M. Gusev, S. J. Burges, and O. N. Nasonova, 2006: Performance comparison of a complex, physics-based land surface model and a conceptual, lumped-parameter, hydrologic model at the basin-scale. *IAHS Publ.*, **307**, 196–207.

Gusev, E. M., 1998: Evaporation from soil under drying. *Eurasian Soil Sci.*, **31**, 836–840.

Gusev, E. M., and O. N. Nasonova, 2004a: Simulation of heat and water exchange at the land–atmosphere interface on a local scale for permafrost territories. *Eurasian Soil Sci.*, **37**, 1077–1092.

Gusev, E. M., and O. N. Nasonova, 2004b: Challenges in studying and modeling heat and moisture exchange in soil–vegetation/snow cover–surface air layer systems. *Water Res.*, **31**, 132–147.

Gusev, E. M., O. N. Nasonova, and L. Ya Dzhogan, 2006: The simulation of runoff from small catchments in the permafrost zone by the SWAP model. *Water Res.*, **33**, 115–126.

Gusev, E. M., O. N. Nasonova, L. Ya Dzhogan, and E. E. Kovalev, 2008: The application of the land surface model for calculating river runoff in high latitudes. *Water Res.*, **35**, 171–184.

Gusev, Ye M., and O. N. Nasonova, 1998: The land surface parameterization scheme SWAP: Description and partial validation. *Global Planet. Change*, **19**, 63–86.

Gusev, Ye M., and O. N. Nasonova, 2000: An experience of modelling heat and water exchange at the land surface on a large river basin scale. *J. Hydrol.*, **233**, 1–18.

Gusev, Ye M., and O. N. Nasonova, 2002: The simulation of heat and water exchange at the land-atmosphere interface for the boreal grassland by the land-surface model SWAP. *Hydrol. Processes*, **16**, 1893–1919.

Gusev, Ye M., and O. N. Nasonova, 2003: The simulation of heat and water exchange in the boreal spruce forest by the land-surface model SWAP. *J. Hydrol.*, **280**, 162–191.

Gusev, Ye M., and O. N. Nasonova, 2006: Simulating runoff from MOPEX experimental river basins using the land surface model SWAP and different parameter estimation techniques. *IAHS Publ.*, **307**, 188–195.

Gusev, Ye M., O. N. Nasonova, L. Ya Dzhogan, and Ye E. Kovalev, 2007: Hydrological predictability investigation of global data sets for high-latitude river basins. *IAHS Publ.*, **313**, 127–133.

Kistler, R., and Coauthors, 2001: The NCEP–NCAR 50-Year Reanalysis: Monthly means CD-ROM and documentation. *Bull. Amer. Meteor. Soc.*, **82**, 247–267.

Kuczera, G., 1997: Efficient subspace probabilistic parameter optimization for catchment models. *Water Resour. Res.*, **33**, 177–185.

Leavesley, G. H., R. W. Lichty, B. M. Troutman, and L. G. Saindon, 1983: Precipitation-runoff modeling system: User’s manual. USGS Rep. 83-4238, Water-Resources Investigation Report Series, 207 pp.

Manabe, S., 1969: Climate and the ocean circulation. 1: The atmospheric circulation and the hydrology of the earth’s surface. *Mon. Wea. Rev.*, **97**, 739–805.

Mengelkamp, H-T., K. Warrach, C. Ruhe, and E. Raschke, 2001: Simulation of runoff and streamflow on local and regional scales. *Meteor. Atmos. Phys.*, **76**, 107–117.

Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through conceptual models part I—A discussion of principles. *J. Hydrol.*, **10**, 282–290.

Nasonova, O. N., and Ye M. Gusev, 2007: Can a land surface model simulate runoff with the same accuracy as a hydrological model? *IAHS Publ.*, **313**, 258–265.

Nasonova, O. N., Ye M. Gusev, and Ye E. Kovalev, 2008: Global estimates of the land heat and water balance components using the land surface model and different data sets (in Russian). *Izv. Russ. Acad. Sci. Ser. Geogr.*, **1**, 8–19.

Nelder, J. A., and R. Mead, 1965: A simplex method for function minimization. *Comput. J.*, **7**, 308–313.

Noilham, J., and J-F. Mahfouf, 1996: The ISBA land surface parameterization scheme. *Global Planet. Change*, **13**, 145–159.

Perrin, C., C. Michel, and V. Andreassian, 2003: Improvement of a parsimonious model for streamflow simulation. *J. Hydrol.*, **279**, 275–289.

Polcher, J., 2001: The Global Land-Atmosphere System Study (GLASS). *BAHC–GEWEX News* joint issue, *BAHC News*, No. 9, and *GEWEX News*, Vol. 11, No. 2, International GEWEX Project Office, Silver Spring, MD, 5–6.

Sellers, P. J., Y. Mintz, Y. C. Sud, and A. Dalcher, 1986: A Simple Biosphere Model (SiB) for use within general circulation models. *J. Atmos. Sci.*, **43**, 505–531.

Solomatine, D. P., Y. B. Dibike, and N. Kukuric, 1999: Automatic calibration of groundwater models using global optimization techniques. *Hydrol. Sci. J.*, **44**, 879–894.

Verseghy, D. L., 1991: CLASS – A Canadian land surface scheme for GCMs. 1. Soil model. *Int. J. Climatol.*, **11**, 111–113.

Wood, E. F., and Coauthors, 1998: The Project for Intercomparison of land-surface Parameterization Schemes (PILPS) phase 2(c) Red-Arkansas River basin experiment: 1. Experiment description and summary intercomparisons. *Global Planet. Change*, **19**, 115–135.

Xia, Y., 2007: Calibration of LaD model in the northeast United States using observed annual streamflow. *J. Hydrometeor.*, **8**, 1098–1110.

Zhao, M., and P. A. Dirmeyer, 2003: Production and analysis of GSWP-2 near-surface meteorology data sets. COLA Tech. Rep. 159, 38 pp.

Average daily Eff of streamflow simulations for the 12 MOPEX basins by different models for 39-yr period (1960–98).

Citation: Journal of Hydrometeorology 10, 5; 10.1175/2009JHM1083.1

Schematic illustration of soil column discretization and water fluxes generating surface runoff and drainage in the LSM SWAP. Here, $h_r$ is the root layer depth, $h_g$ is the water table, and $h_{g0}$ is the depth to impermeable layer.

Median values of (a) daily and (c) monthly Eff, (e) absolute value of Bias, and interbasin STD of (b) daily and (d) monthly efficiency, and (f) Bias for the 12 MOPEX river basins from a priori simulations and calibrated results of SWAP for the calibration period (gray), the validation period (white), and the entire period (black).

Daily and monthly Eff and Bias for each of the 12 MOPEX river basins from a priori simulations and calibrated results of SWAP for the different periods under study.

Streamflow hydrographs simulated by SWAP in SWAP_apriori and SWAP_K0 experiments for two basins and two water years (the day numbers are given from 1 Oct of the previous year), as compared to observed hydrographs. Precipitation is shown in the top part of each panel.

Streamflow hydrographs simulated by SWAP in two calibration experiments (SWAP_K0 and SWAP_SCE3), presented for two basins and two water years in comparison with observed hydrographs.

The optimal values of calibrated SWAP model parameters and adjustment factors from different calibration experiments (SCE1, SCE2, SCE3, and RST), given in comparison with a priori values.

Sensitivity of daily Eff and systematic error to the calibrated parameters for each of the 12 basins.

Sensitivity of daily Eff and systematic error, averaged over the 12 basins and sorted in decreasing order, to the calibrated parameters.

Daily and monthly Eff and the absolute value of Bias of streamflow simulations performed by different models for different periods averaged over 12 basins. The vertical bars represent STDs.

Daily and monthly Eff and the absolute values of Bias, sorted in increasing order, for each of the 12 basins and the different periods, from the calibrated results of the models that participated in the MOPEX experiments and from the SWAP_SCE3 experiment.

Streamflow hydrographs simulated by SWAP, SAC-SMA, and PRMS models for two basins and two water years in comparison with observed hydrographs and precipitation.

Flow interval statistics for the (a) wet basin 7 and (b) dry basin 12 for the entire 39-yr period (1960–98), calculated for SAC-SMA and PRMS streamflow simulations and for two different calibrations of SWAP (SWAP_K0 and SWAP_SCE3).

Basic characteristics of 12 MOPEX river basins. Here, *P* is climatic precipitation from PRISM data, *Q* is climatic streamflow discharge (1961–90), and PE is NOAA climatic potential evaporation.

Spatial coverage of each of the USDA soil texture classes and the University of Maryland vegetation classes in the basins. The table heads are defined as follows: lS, loamy sand; sL, sandy loam; siL, silt loam; Si, silt; L, loam; scL, sandy clay loam; sicL, silty clay loam; cL, clay loam; sC, sandy clay; siC, silty clay; C, clay; BR, bedrock; and O, other. ENF, evergreen needleleaf forest; DBF, deciduous broadleaf forest; MC, mixed cover; Wd, woodland; WG, wooded grassland; Gr, grassland; and Cr, cropland.

List of models in the MOPEX project. NWS refers to National Weather Service.

The list of parameters that were optimized in different calibration experiments. The + indicates parameters that were calibrated for all basins and ++ indicates parameters calibrated for some basins.

Statistical comparison of different model runs (averages across 12 basins).