## 1. Introduction

Land surface models (LSMs) have been developed by the atmospheric science community to provide atmospheric models with bottom boundary conditions (water and energy balance) and to serve as the land base for hydrologic modeling. Over the past two decades, overland and subsurface runoff calculations done by LSMs have extensively been used to provide water inflow to river routing models that calculate river discharge (De Roo et al. 2003; Habets et al. 1999a–c, 2008; Lohmann et al. 1998a,b, 2004; Maurer et al. 2001; Oki et al. 2001; Olivera et al. 2000). However, river routing within LSMs has traditionally been done using gridded river networks that best fit the computational domain used in LSMs. Today, geographic information system (GIS) hydrographic datasets are increasingly becoming available at continental [e.g., the National Hydrography Dataset Plus (NHDPlus; Horizon Systems Corporation 2007)] and global scales [e.g., Hydrological Data and Maps Based on Shuttle Elevation Derivatives at Multiple Scales (HydroSHEDS; Lehner et al. 2006)]. These datasets provide a vector-based representation of the river network using the “blue line” mapped rivers and streams. Furthermore, observations of the river systems are now widely available in databases such as the U.S. Geological Survey (USGS) National Water Information System (NWIS) for the United States, in which thousands of gauges are available along with their exact location on the NHDPlus river network. Most studies mentioned above—with the exception of Habets et al. (2008)—use a limited number of gauges throughout large river basins, often focusing on gauges located at river mouths. As the spatial and temporal resolutions of weather and climate models and their underlying land surface models increase, using gauges located across basins would help in diagnosing the quality of LSM computations. The latest work on general circulation models by the international scientific community, especially by the Intergovernmental Panel on Climate Change (Solomon et al. 2007), opens potential studies of the evolution of water resources with global change. Using mapped streams and water bodies in LSMs could benefit the resulting assessment of the impact of global change in water resources by providing estimation of changes at the blue-line level. Furthermore, the use of parallel computing is quite common in regional- to global-scale atmospheric and ocean modeling but comparatively infrequent in modeling of large river networks. Generally, parallel computing can be utilized to either solve problems of increasing size [as done with the Parallel Flow simulator (ParFlow); Jones and Woodward 2001; Kollet and Maxwell 2006; Kollet et al. 2010] or to decrease computation time (see, e.g., Apostolopoulos and Georgakakos 1997; Larson et al. 2007; Leopold et al. 2006; von Bloh et al. 2010). These two types of approaches to parallel computing are respectively referred to as scalability and speedup of calculations; the work presented herein focuses on the latter. Apostolopoulos and Georgakakos (1997) investigated the speedup of streamflow computations using hydrologic models in river networks as a function of network decomposition and of the computing time ratio between vertical and horizontal water balance calculations. Simple river routing within LSMs being traditionally performed by carrying computations from upstream to downstream, one way to speed up river flow modeling is to use a sequential river routing code to compute independent basins on different processing cores, as done in Leopold et al. (2006) and in Larson et al. (2007). Such methods allow avoiding interprocessor communication but result in imbalanced computing loads when some basins are much larger than others. Leopold et al. (2006) partly addressed load imbalance by using parallel computing for surface water balance, but the river routing part remains sequential. Von Bloh et al. (2010) implemented a routing method in which computations do not have to be carried in order from upstream to downstream, therefore obtaining almost perfect speedup. The work developed herein investigates a way to obtain speedup while retaining traditional upstream-to-downstream computations that are used in most river routing schemes.

The present study links a land surface model with a new river network model called Routing Application for Parallel Computation of Discharge (RAPID) using NHDPlus for the representation of the river network and USGS NWIS gauges for the optimization of model parameters and the assessment of river flow computations. All models and datasets used herein are available at least for the contiguous United States. The work presented here focuses first on the Guadalupe and San Antonio River basins in Texas (see Fig. 1), together covering a surface area of about 26 000 km^{2}. These basins have about 5000 river reaches and their corresponding catchments in the NHDPlus dataset (see Fig. 2) out of 3 million for the United States. These two basins are also chosen for study because of significant contributions to surface water flow from groundwater sources, because of a large reservoir at Canyon Lake where the impacts of constructed infrastructure on flow dynamics have to be considered, and because these rivers flow out into an estuarine system at San Antonio Bay. A synthetic study of the performance of RAPID in a parallel computing environment is also presented using the upper Mississippi River basin (see Fig. 3), which has about 180 000 river reaches in NHDPlus and covers an area of about 490 000 km^{2}.

The research presented in this paper aims at answering the following questions: how can a river model be developed for calculation of flow and volume of water in a river network of thousands of blue-line river reaches? How can the connectivity information in NHDPlus be used to run a river network model in part of the United States? How can flow at ungauged locations be reconstructed? How can model computations be assessed and optimized based on all available measurements? How can parallel computing be used to speed up upstream-to-downstream computations of river flow within a large river network?

First, the development of the RAPID model is presented. Then, the modeling framework for calculation of river flow in the Guadalupe and San Antonio River basins using runoff data from a land surface model is developed, followed by results. Finally, the speedup of RAPID in a parallel computing environment is assessed.

## 2. Model development

The model presented here is named RAPID (http://www.geo.utexas.edu/scientist/david/rapid.htm). RAPID is based on the traditional Muskingum method that was first introduced by McCarthy (1938) and has been extensively studied in the literature in the past 70 years. The Muskingum method has two parameters, *k* and *x*, respectively a time and a dimensionless parameter. Among the most noteworthy papers related to the Muskingum method, Cunge (1969) showed the Muskingum method is a first-order approximation of the kinematic and diffusive wave equation and proposed a method known as the Muskingum–Cunge method—a second-order approximation of the kinematic and diffusive wave equation—in which the Muskingum parameters are computed based on mean physical characteristics of the river channel and of the flow wave. Koussis (1978) proposed a variable-parameter Muskingum method based on the Muskingum–Cunge method where *k* varies with the flow but *x* remains constant on the grounds that the Muskingum method is relatively insensitive to this parameter. Other variable-parameter Muskingum methods allow both *k* and *x* to vary (see, e.g., Miller and Cunge 1975; Ponce and Yevjevich 1978), although these variable-parameter methods fail to conserve mass (Ponce and Yevjevich 1978). Notable large-scale uses of the variable-parameter Muskingum–Cunge method include Orlandini and Rosso (1998) and Orlandini et al. (2003). More recently, Todini (2007) developed a mass-conservative variable-parameter Muskingum method known as the Muskingum–Cunge–Todini method.

As a first step, the traditional Muskingum method with temporally constant parameters calculated partly based on the work of Cunge (1969) is used in this study because there are significant challenges to overcome in adapting the Muskingum method for river networks, in efficiently running it within a parallel computing environment, and in developing an automated parameter estimation procedure before more sophisticated flow equations are used. However, the physics of flow could be improved with many variations based on the Muskingum method or adapted to the Saint Venant equations.

### a. Calculation of flow and volume of water in a river network

*t*is time and Δ

*t*is the river routing time step. The bold roman notation is used for vectors and bold sans serif font for matrices:

**Q**is a vector of outflows from each reach; and

**Q**

^{e}is a vector of lateral inflows for each reach. Such a vector-matrix formulation of the Muskingum method has to our knowledge never been previously published.

Equation (1) is used for river network routing and can be solved using a linear system solver. The vector-matrix notation provides one flow equation for the entire river network, therefore avoiding spatial iterations. For a river network with *m* river reaches, all vectors are of size *m* and all matrices are square of size *m*. Each element of a vector corresponds to one river reach in the network. For performance purposes, all matrices are stored as sparse matrices (only the nonzero values are recorded). A five-reach, two-node, and two-gauge river network is used here to clarify the mathematical formulation of the river network model and is shown in Fig. 4a. The river network is made up of a combination of river reaches similar to that of Fig. 4b. The model formulation is presented here for a small river network but can be generalized to any size of river network.

**Q**is a vector of the outflows

*Q*of all reaches of the river network, where

_{j}*j*is the index of a river reach within the network:and

**Q**

^{e}is a vector of flows

**Q**

^{e}. Two assumptions are made in the development of RAPID, one regarding the temporal variability of

**Q**

^{e}and one regarding the location at which

**Q**

^{e}enters the river network. In this study, the river routing time step is 15 min and inflow from land surface runoff is available every 3 h. In the derivation of Eq. (1),

**Q**

^{e}is assumed constant [i.e.,

**Q**

^{e}(

*t*+ Δ

*t*) =

**Q**

^{e}(

*t*)] over all 15-min river routing time steps included within a given land surface model 3-h time step. This partial temporal uniformity simplifies the river network model formulation, limits the quantity of input data, and facilitates the coupling with land surface models. This assumption is valid at all times except at the last routing time steps before a new

**Q**

^{e}is made available by the land surface model. Also, the external inflow

**Q**

^{e}is assumed to enter the network as an addition to the upstream flow. With these two assumptions, the Muskingum method applied to reach 5 in Fig. 4b gives the following:where

*C*

_{1},

*C*

_{2}, and

*C*

_{3}are the Muskingum parameters that are stated in Eq. (6). The reader should note that these two assumptions are equivalent to using a unit-width lateral inflow along with a term

*C*

_{4}as found in available literature (see, e.g., Fread 1993; NERC 1975; Orlandini and Rosso 1998; Ponce 1986). Equation (1) is a generalization of Eq. (4) using a vector-matrix notation.

*i*and column

*j*if reach

*j*flows into reach

*i*, and zero is used everywhere else:The upstream inflow to the network can therefore be computed by multiplying the network connectivity matrix

**Q**. In case of a divergence in the river network (when going downstream) or in case of a loop, a unique reach (the major divergence) is used to carry all the upstream flow, and the other reaches (minor divergences) carry only the flow that results from their lateral inflow. This formulation could be modified to take into account given fractions of flows that separate into different parts of a divergence if that information is available.

*C*

_{1j},

*C*

_{2j}, and

*C*

_{3j}, such thatwhere

*k*is a storage constant (with dimension of a time) and

_{j}*x*a dimensionless weighting factor characterizing the relative influence of the inflow and the outflow on the volume of the reach

_{j}*j*. The Muskingum method is stable for any

*x*∈ [0, 0.5], regardless of the value of

*k*and Δ

*t*(Cunge 1969). For any

*j*,

*C*

_{1j}+

*C*

_{2j}+

*C*

_{3j}= 1.

*k*and

*x*of the Muskingum method are allowed to differ from one river reach to another, and corresponding vectors are defined in Eq. (7):The constants defined in Eq. (6) are used as the diagonal elements of the matrices

**V**is a vector of the volume of water

*V*in each river reach

_{j}*j*:Details on the massively parallel implementation of the matrix-based Muskingum method presented in this section, and of the automated parameter estimation presented in the section below, are given in appendix A.

### b. Parameter estimation

**k**and

**x**to be used in RAPID, an inverse method is developed. The principle of an inverse method is to optimize the parameters of a model so that the outputs of the model approach observations. A cost function reflecting the difference between model calculations and observations is needed to assess the quality of a set of model parameters. The best set of parameters is chosen as the set that minimizes the cost function and is determined through optimization. A square-error cost function

*φ*is chosen:where the summation is made daily. The T in the exponent is for vector transpose, and

*t*and

_{o}*t*are respectively the first and last day used for the calculation of

_{f}*φ*. The model parameter vectors

**k**and

**x**are kept constant within the temporal interval [

*t*,

_{o}*t*], and the cost function is calculated several times with different sets of parameters during the optimization procedure. The scalar

_{f}*f*allows

*φ*to be on the order of magnitude of 10

^{1}, which is helpful for automated optimization procedures;

**Q**

^{g}(

*t*) is a vector with the total number of river reaches for dimension, with the daily value observed

*j*where gauge measurements are available and zero where no gauge is available;

*j*if a gauge is available on reach

*j*and zero everywhere else. Using the example network given in Fig. 4a,

**Q**

^{g}(

*t*) take the following form:According to Fread (1993),

*x*∈ [0.1, 0.3] in most streams. By analogy with the kinematic wave equation, Cunge (1969) showed that the parameter

*k*of the Muskingum method is the travel time of a flow wave through a river reach. For a given river reach

*j*of length

*L*where a flow wave of celerity

_{j}*c*travels,

_{j}*k*is obtained by dividing the length by the celerity of the wave, as shown in Eq. (13):Although the routing model defined by Eq. (1) allows for variability of the parameters (

_{j}*k*,

_{j}*x*) on a reach-to-reach basis, attempting to automatically estimate model parameters independently for all the reaches of a basin would be a costly undertaking. Therefore, the search for optimal parameters is limited to determining two multiplying factors

_{j}*λ*and

_{k}*λ*such thatTo minimize the influence of the initial guess on the optimization procedure, three different initial guesses for (

_{x}*λ*,

_{k}*λ*) are used. Out of the three corresponding optimal (

_{x}*λ*,

_{k}*λ*) obtained, only the one couple leading to the minimum value of the cost function

_{x}*φ*is kept. Therefore, the optimization procedure leads to only one optimal couple (

*λ*,

_{k}*λ*) for a given basin in the network. Note that—as a first step—

_{x}**x**is here constant over a given basin on the grounds that the Muskingum method is relatively insensitive to this parameter (Koussis 1978). Some data available in NHDPlus (such as mean flow, mean velocity, slope, etc.) associated with available formulations for

**x**(e.g., Cunge 1969; Orlandini and Rosso 1998) could be used to improve the proposed method.

## 3. Application

RAPID is designed to handle large routing problems. Given a river network and connectivity information as well as lateral inflow to the river network, RAPID can run on any river network. In this study, a framework for computation of river flow in the Guadalupe and San Antonio River basins is developed that uses a one-way modeling framework with an atmospheric dataset, a land surface model, and RAPID as the river model. This section presents how the Guadalupe and San Antonio River basins are described in the NHDPlus dataset, how a land surface model is used to provide lateral inflow to the river network, and how the meteorological forcing is prepared.

### a. RAPID used on NHDPlus

There are a total of 5175 river reaches with known direction and connectivity within the NHDPlus description of the Guadalupe and San Antonio river basins (as shown in Fig. 2). These 5175 reaches have an average length of 3.00 km and the average catchment defined around them is 5.11 km^{2} in area; all are used for this study. Details on the fields used in the NHDPlus dataset, including the unique “common identifier” (COMID) used for all river reaches and their corresponding catchments, and on how NHDPlus is used with RAPID, are given in appendix B. In this study, the vector of outflows in all river reaches **Q** was arbitrarily initialized to the uniform value of 0 m^{3} s^{−1} prior to running RAPID.

### b. Land surface model and coupling with RAPID

Within this study, the core physical model governing the one-dimensional vertical fluxes of energy and moisture is the community Noah land surface model with multiparameterization options (Noah-MP; Niu et al. 2011). Noah-MP offers multiple options for choosing the modeling of certain physical phenomena. In this study, the soil moisture factor for stomatal resistance is of “Noah type” (Niu et al. 2011) and the runoff scheme is TOPMODEL based, using a simple groundwater model (SIMGM; Niu et al. 2007). The soil column is 2 m deep, below which is an unconfined aquifer. To represent the characteristics of the structural soil over the model domain, the saturated hydraulic conductivity, which is determined by the soil texture data, is enlarged by factor of 10 (through calibration). The soil hydrology of Noah (soil moisture) is run at an hourly time step and runoff data are produced every three hours. In this study, the state variables of Noah were initialized through a spinup method.

Noah-MP calculates the amount of water that runs off on and below the land surface. This quantity is used to provide RAPID with the water inflow from outside of the river network. David et al. (2009) presented a coupling technique using a hydrologically enhanced version of the Noah LSM called “Noah distributed” (Gochis and Chen 2003) that allows physically based modeling of the horizontal movement of surface and subsurface water from the land surface to a river reach. In interest of a simpler coupling scheme, the work of David et al. (2009) has been modified. In this study, a flux coupler between Noah and RAPID is developed using the catchments available in the NHDPlus dataset.

The NHDPlus catchments contributing runoff to each river reach were determined as part of the NHDPlus development using a digital elevation model and its associated flow accumulation and flow direction grids. These grids have a native resolution of 30 m. The map of catchments is available in NHDPlus in both gridded (at 30-m resolution) and vector formats in a shape file. Running a land surface model at a 30-m resolution is very resource demanding. Therefore, a coarser resolution of 900-m cell size is chosen. The shape file of NHDPlus catchment boundaries is converted to a grid of size 900 m. Within this conversion process, the accuracy of the boundaries of the catchments is lowered but the catchment boundaries are reasonably respected and the computational cost of the land surface model calculations is reasonable. For each 3-h output of the Noah model, surface and subsurface runoff data is superimposed onto the catchment grid, and all runoff that corresponds to the catchment of each river reach is summed and used as the water inflow to the river reach. Figure 5 shows the principle of the flux coupler in which the 900-m runoff data generated by the Noah model is superimposed on the 900-m map of NHDPlus catchment COMIDs to determine the lateral inflow for NHDPlus reaches used by RAPID.

Therefore, no horizontal routing is used between the land surface and the river network in the proposed scheme. This differs from some other models that use runoff from a one-dimensional model to force a river routing model. For instance, the two-dimensional wave equation is used in Gochis and Chen (2003) or the linear reservoir equation is used in Ledoux et al. (1989).

The coupling method used here can be adapted to any land surface model that computes surface and subsurface runoff on a grid. This coupling technique is automated in a FORTRAN program.

### c. Meteorological forcing

Land surface models need meteorological forcing in order to compute the water and the energy balance at the surface. The Noah LSM requires seven meteorological parameters: precipitation, specific humidity, air temperature, air pressure, wind speed, downward shortwave, and downward longwave radiation. Hourly precipitation is obtained from the Next Generation Weather Radar (NEXRAD) and downscaled from its original resolution (4.763 km) to 900 m using the method developed in Guan et al. (2009). All other meteorological parameters are downloaded from the 3-hourly North American Regional Reanalysis (NARR) and converted from its original resolution (32.463 km) to 900 m using a simple triangle-based linear interpolation. All meteorological data are prepared for four years (1 January 2004–31 December 2007).

## 4. Calibration and results for the Guadalupe and San Antonio River basins

The framework for computation of river flow that is developed in the previous section is used to calculate river flow in all 5175 river reaches of the Guadalupe and San Antonio River basins for four years (1 January 2004–31 December 2007). In this section, flow wave celerities in several subbasins are estimated from measurements, the model parameters used in RAPID are presented, and flows computed are compared to observed flows. Issues related to the time step used in RAPID and to the simulated wave celerities are also presented.

### a. Estimation of wave celerities

*ρ*is a measure of similarity between two wave forms as a function of a lag time

*τ*

_{lag}applied to one of them, as shown in Eq. (15):where

*Q*and

^{a}*Q*are the flows measured at the upstream and downstream station, respectively, and the summation is here made every 15 min for six months. Figure 6 shows the correlation as a function of increasing lag time between three different sets of consecutive gauging stations. The lag time giving the maximum correlation is taken as the travel time

^{b}*τ*

_{travel}for the flow wave between the two stations. The travel times are estimated for 11 sets of two stations and are shown on Table 1. Travel times of 0 s are reported at two stations, where the flow wave is probably too fast to be captured by 15-min measurements. The wave celerity

*c*is then computed using Eq. (16):where

*d*is the distance between two stations. The NHDPlus Flow Table Navigator Tool (http://www.horizon-systems.com/nhdplus/tools.php) is used to estimate the curvilinear distance between two stations along the NHDPlus river network that are shown on Table 1. The wave celerity has been estimated for 11 subbasins within the Guadalupe and San Antonio River basins. Table 2 shows the values that are obtained for the two time periods considered as well as their average. Figure 7 shows the corresponding subbasins as well as the locations of all gauging stations.

Travel time (s) for the flow waves estimated using the lagged cross correlation in the Guadalupe and San Antonio River basins, both from IDA measurements and from RAPID model runs, and distance (km) between gauging stations.

Wave celerities (m s^{−1}) estimated using the lagged cross correlation in the Guadalupe and San Antonio River basins, both from IDA measurements and from RAPID model runs.

### b. Parameters used in RAPID

RAPID needs two vectors of parameters **k** and **x** that can either be determined using physically based equations, through optimization, or a combination of both. In this study, daily streamflow data are obtained from the USGS National Water Information System (http://waterdata.usgs.gov/nwis) in order to use the built-in parameter estimation. Within the Guadalupe and San Antonio River basins, NWIS has 74 gauges that measure flow, 36 of them having full records of daily measurements for the four years studied (1 January 2004–31 December 2007). These 36 stations are used for parameter estimation.

Four sets of model parameters—denoted by the superscripts *α*, *β*, *γ*, and *δ*—are used in this study. These sets of parameters are all based on Eq. (14), which is used with a uniform wave celerity of *c*^{0} = 1 km h^{−1} = 0.28 m s^{−1} throughout the basin or with the celerities *c _{j}* determined based on the IDA lagged cross-correlation study.

**k**

*,*

^{α}**x**

*) is obtained from parameter estimation shown in Eq. (11) using the uniform wave celerity*

^{α}*c*

^{0}= 0.28 m s

^{−1}, and the resulting values of the two multiplying factors

*λ*and

_{k}*λ*of Eq. (14) areThe parameters (

_{x}**k**

*,*

^{β}**x**

*) are determined without optimization using the celerities*

^{β}*c*determined based on the IDA lagged cross-correlation study and set toThe third set of parameters (

_{j}**k**

*,*

^{γ}**x**

*) is obtained through optimization using the celerities*

^{γ}*c*determined based on the IDA lagged cross-correlation study, and the resulting values areThe optimization converges to a value of

_{j}**k**that is 38% smaller than that estimated with the IDA lagged cross-correlation, suggesting that a faster flow wave in the river network produces better flow calculations. In the present study, routing on the land surface from the catchment to its corresponding reach is not modeled. Therefore, one would expect that the optimized flow celerity in the river network would be slower than that estimated from river flow observations, which is not the case here. This suggests that runoff is either produced too slowly or too far upstream of each gauge, perhaps because runoff in land surface models is often calibrated based on a lumped value at the downstream gauge of a basin, as was done here with Noah-MP. Further details on the quality of runoff simulations are given in section 4d.

The fourth set of parameters (**k*** ^{δ}*,

**x**

*) is determined for a better match of celerity calculations, as explained later in this paper.*

^{δ}### c. Time step of RAPID simulation

*x*∈ [0, 0.5] and that the wave celerity computed by the Muskingum method approaches the theoretical wave celerity of the kinematic wave equation if the time step of the river routing equals the travel time of the wave (for

*x*= 0.5), as shown in Eq. (20):However, both the celerity of flow and the length of river reaches vary along the network, and the model formulation of RAPID allows for only one unique value of the time step Δ

*t*be chosen. In the Guadalupe and San Antonio River basins, the mean length is 3 km and the median length is 2.4 km. The probability density function and the cumulative distribution functions for the lengths of river reaches are shown in Fig. 8. The celerities estimated earlier are on the order of

*c*= 2.5 m s

^{−1}. Using the median value of the reach length along with

*c*= 2.5 m s

^{−1}, Eq. (20) gives Δ

*t*= 960 s. To have an integer conversion between the river routing time step and the land surface model time step (3 h), a value of Δ

*t*= 900 s = 15 min is chosen.

### d. Analysis of the quality of river flow computation

For various model simulations, the average and the root-mean-square error (RMSE) of computed flow rate are calculated using daily data and are given in Table 3. The Nash efficiency (Nash and Sutcliffe 1970) is bounded by the interval ]−∞, 1] and gives an estimate of the quality of modeled river flow computations when compared to observations, and is also given in Table 3. An efficiency of 1 corresponds to a perfect model and 0 corresponds to a model producing the mean of observations. The results shown for a lumped model correspond to when runoff from Noah is accumulated at the gauge directly without any routing. The average values of flow in RAPID simulations are tied to the amount of runoff water calculated by the Noah LSM and the bias generated by the land surface model cannot be fixed by RAPID. However, the internal connectivity of the NHDPlus river network is well translated in RAPID and mass is conserved within RAPID since the flow rates in the lumped simulation and in all four simulations of RAPID are the same. Figure 9 shows the ratio between observed and lumped streamflow at 17 gauges located across the Guadalupe and San Antonio River basins. This ratio is around unity downstream of the Guadalupe and San Antonio Rivers but is greater than seven upstream, suggesting that runoff is most likely overestimated at the center of the basin. Additionally, runoff is largely underestimated at two stations just downstream of the outcrop area of the Edwards Aquifer: the Comal River at New Braunfels and the San Marcos River at San Marcos. These stations measure large average streamflow (respectively 10.59 and 5.9 m^{3} s^{−1}) although draining a relatively small area (respectively 336 and 129 km^{2}), and are actually two of the largest springs in Texas. These flows are much larger than the lumped runoff (respectively 0.67 and 0.26 m^{3} s^{−1}), which is expected because the modeling framework presented herein does not does not explicitly simulate aquifers.

Comparison of observed and simulated flows at 15 locations within the Guadalupe and San Antonio River basins.

However, the RAPID simulations (**k*** ^{α}*,

**x**

*), (*

^{α}**k**

*,*

^{β}**x**

*), and (*

^{β}**k**

*,*

^{γ}**x**

*) lead to a smaller RMSE and a higher Nash efficiency than the lumped runoff. This shows that an explicit river routing scheme with carefully chosen parameters allows obtaining better streamflow calculations than a simple lumped runoff scheme, as expected.*

^{γ}Within the different RAPID simulations, the set of parameters (**k*** ^{δ}*,

**x**

*) gives the best results for RMSE and Nash efficiency, followed by (*

^{δ}**k**

*,*

^{β}**x**

*), (*

^{β}**k**

*,*

^{α}**x**

*), and (*

^{α}**k**

*,*

^{γ}**x**

*). Therefore, a greater spatial variability in the values of*

^{γ}*k*contributes to the quality of model results, and the built-in optimization in RAPID further enhances these model results. An example hydrograph for the Guadalupe River near Victoria, Texas is shown in Fig. 10, and is computed using (

**k**

*,*

^{γ}**x**

*).*

^{γ}### e. Comparison between estimated and computed wave celerities

**k**

*,*

^{β}**x**

*), the model-simulated celerities are different than the observed celerities that are used to determine the vector*

^{β}**k**

*itself. This was predicted by Cunge (1969), who showed that the difference between the celerity of the kinematic wave equation and that computed using the Muskingum method is a function of both*

^{β}*x*and the quotient Δ

*t/L*. Only the specific values

_{j}*x*= 0.5 and Δ

*t*verifying Δ

*t/L*=

_{j}*c*allow obtaining the same celerity. Furthermore, the work herein is done in a river network, and the celerity estimated between two points does not correspond only to the main river stem but rather to a combination of all river reaches present in the network between the two points. The ratio of the average celerities from RAPID using (

_{j}**k**

*,*

^{β}**x**

*) over the average observed celerities is 1.54. As a final experiment, a new set of parameters (*

^{β}**k**

*,*

^{δ}**x**

*) is created to account for the faster waves in RAPID:Table 2 shows that the parameters (*

^{δ}**k**

*,*

^{δ}**x**

*) allow for wave celerities that are closer to the observed ones than the celerities obtained with the other sets of parameters. The average flow wave celerity over the 11 calculations in RAPID is within 3% of that estimated with IDA flows. Unfortunately, these closer wave celerities also lead to a decrease in the quality of RMSE and Nash efficiency. Therefore, model celerities closer to celerities estimated from observations can be obtained but generally deteriorate other statistics of calculations. Again, this might be due to runoff being produced too slowly or too far upstream of each gauge.*

^{δ}### f. Potential improvement of spatial variability in RAPID parameters

In the work presented here, the parameter *x* is spatially and temporally constant over the modeling domain and the parameter *k* is temporally constant but varies at the river reach level based on the length of each reach and on the celerity of the flow wave going through it. Flow wave celerities are estimated for 11 subbasins based on flow observations, and the spatial variability of *k* presented in this study is therefore partly limited by the size of the subbasins used for flow wave estimation. However, such an approach for computation of RAPID parameters allows taking into account wave celerities that are estimated based on observations made at high temporal resolution as well as verifying the modeling framework through reproduction of estimated wave celerities. In a separate study applying RAPID to all rivers of metropolitan France, David et al. (2011) present a physically based formulation of *k* and a subbasin optimization for both *k* and *x*, therefore allowing further spatial variability of parameters. David et al. (2011) show that using a combination of reach length, river bed slope, and basin residence time for the parameter *k* and applying the optimization procedure to subbasins both improve the efficiency and the RMSE of RAPID flow computations. Such work could be adapted to the study herein based on information provided in the NHDPlus dataset (e.g., reach length, mean annual flow velocity, and river bed slope), which would be advantageous when applying RAPID to domains larger than the Guadalupe and San Antonio River basins where estimation of wave celerities everywhere may require excessive amounts of computations.

### g. Statistical significance

Changes in the routing procedure (i.e., no routing or routing using various RAPID parameters) lead to various changes in the values of efficiency and RMSE, as shown in section 4b. The statistical significance of the changes can be assessed in order to determine whether or not various routing experiments are effective. For two different routing procedures used, the efficiency (RMSE) at one gauge can be compared to the efficiency (RMSE) at the same gauge, although variability of efficiency (RMSE) between independent gauges can be large. Therefore, there is a logical pairing of efficiency and RMSE calculated at a given gauge between two experiments and, hence, matched pair tests are appropriate to assess the statistical significance. Several common options are available for matched pair tests (with increasing level of complexity): the sign test, the Wilcoxon signed-rank test (Wilcoxon 1945), and the paired *t* test. The sign test has no assumption on the shape of probability distributions of samples used but is quite simple since only the sign of differences between two paired samples is accounted for. The Wilcoxon signed-rank test incorporates the magnitude of differences between paired samples under the assumption that differences between pairs are symmetrically distributed. The paired *t* test may be used when the differences between pairs are known to be normally distributed. The assumption of the Wilcoxon signed-rank test (symmetry) is not as restrictive as that of the paired *t* test (normality). In cases where small sample sizes are used—as done in this study—testing for symmetry or normality may not be meaningful. Additionally, violations of the symmetry assumption in the Wilcoxon signed-rank test have minimal influence on the corresponding *p* values (Helsel and Hirsch 2002). These two reasons motivate the use of the Wilcoxon signed-rank test in the study herein. The null hypothesis *H*_{0} for this test is that the median of differences between two populations is zero. The purpose of changes in the routing procedure being to improve results by increasing the efficiency and decreasing the RMSE, alternate hypotheses can assume that one population tends to be generally either larger (*H*_{1}) or smaller (*H*_{2}) than the other. Therefore, *p* values corresponding to one-sided tests are used in this study. Low significance levels mean that *H*_{0} is unlikely, hence that a significant change is observed. The Wilcoxon signed-rank test sorts pairs with nonzero difference based on the absolute value of the differences and sums all positive (negative) ranks in a variable named *W*^{+} (*W*^{−}). The corresponding *p* values vary with the number of nonzero differences and with the value of *W*^{+} and *W*^{−}. FORTRAN programs were created to compute the exact value of the test statistic (not using a large-sample approximation) as well as the corresponding *p* values. Table 4 shows the results of the Wilcoxon signed-rank test for both efficiency and RMSE and for several paired experiments using two different routing procedures. The same 15 stations named on Fig. 7 and used in Table 3 serve here for statistical significance assessment, and the corresponding 15 values of efficiency and of RMSE are utilized as sample values.

Results of the Wilcoxon signed-rank test applied to 15 stations for efficiency and RMSE and to various routing procedures.

Several conclusions can be drawn from Table 4. First, the Wilcoxon signed-rank tests comparing results obtained by RAPID with parameters *α*, *β*, and *γ* to a lumped runoff approach show that the null hypothesis can be rejected for a one-sided test at a 10% level of significance in all cases, except for the efficiency between RAPID with *β* parameters and a lumped approach at a 13% level of significance. All these tests validate that the improvements mentioned in section 4b (increased efficiency and decreased RMSE) are statistically significant and confirm that an explicit river routing scheme allows for obtaining better streamflow calculations than a simple lumped runoff scheme, as expected. Second, comparisons between RAPID using *α* and *γ* parameters show that subbasin variability in wave celerities is advantageous to a spatially uniform wave celerity approach at a 19% level of significance for efficiency and at a 7% level for RMSE. Third, comparisons between RAPID using *γ* and *δ* parameters confirms that wave celerities close to those determined from observations deteriorate results at a 3% level of significance for both efficiency and RMSE. Finally, one cannot conclude on the statistical significance of the comparison between RAPID using *β* and *γ* parameters concerning the improvement of optimization procedure. However, since RAPID using *γ* parameters produce better average values than RAPID using *β* parameters, and since the statistical significance of RAPID using *γ* parameters compared to a lumped approach is better than that of RAPID using *β* parameters compared to lumped approach, the optimization can still be considered advantageous.

## 5. Synthetic study of the upper Mississippi River basin, speedup of parallel computations

Through the use of mathematical and optimization libraries that run in a parallel computing environment, RAPID can be applied on several processing cores. The work presented above focuses on the Guadalupe and San Antonio River basins together forming a river network with 5175 river and water body reaches, the size of which does not justify the use of parallel computing. However, all the tools and datasets used are available for the contiguous United States, where the NHDPlus dataset has about 3 million reaches. Adapting the proposed framework to simultaneously compute flow and volume of water in all mapped water bodies of the contiguous United States would require solving matrix equations of the size of 3 million. For such a large scientific problem, parallel computing can be helpful if speedup can be achieved—that is, if increasing the number of processing cores decreases the total computing time.

### a. Synthetic study used for assessment of parallel performance

As a proof of concept, the evaluation of the parallel computing capabilities of RAPID is presented here using the upper Mississippi River basin (shown on Fig. 3), which has 182 240 river and water body reaches available as region 7 in the NHDPlus dataset. The number of computational elements for the upper Mississippi River basin is about 35 times larger than the combination of the Guadalupe and San Antonio River basins, and about 16 times smaller than the entire contiguous United States. The river network of the upper Mississippi River basin is fully interconnected, all water eventually flowing to a unique outlet.

**Q**

^{e}in Eq. (1) are synthetically generated and set to 1 m

^{3}every 3 h for all reaches and all time steps, and the vectors of parameters

**k**and

**x**are temporally and spatially uniform as shown in Eq. (22):

### b. Basics of solving a linear system on computers

Numerically solving a linear system is typically an iterative process mainly involving two steps at each iteration: preconditioning followed by applying a linear solver. Preconditioning is a procedure that transforms a given linear system through matrix multiplication into one that is more easily solved by linear solvers, hence decreasing the total number of iterations to find the solution and saving time. If the linear system is triangular, preconditioning is sufficient to solve the problem, and a linear solver is not needed. In a parallel computing environment, a matrix is separated into diagonal and off-diagonal blocks, each processing core being assigned one diagonal block and its adjacent off-diagonal block. Solving a linear system in parallel is made using blocks, and parallel preconditioning is determined based on elements in the diagonal blocks. Preconditioning is sufficient to solve a given parallel linear system if the system is diagonal by blocks (i.e., all off-diagonal blocks are empty) and if each diagonal block is triangular; in most other cases, iterations of preconditioning and applying a linear solver are needed.

### c. Parallel speedup of the synthetic study

For comparison purposes, the traditional Muskingum method was also implemented in RAPID in order to assess the performance of the matrix-based Muskingum method developed herein. Figure 11 shows a comparison of computing time between the traditional Muskingum method shown in Eq. (4) applied consecutively from upstream to downstream and the matrix-based Muskingum method used in RAPID. Only one processor is used for all results in Fig. 11 but the computation method differs. The matrix *i* and *j* are connected but belong to different subbasins]. This limitation would not apply if one was to compute the Mississippi River basin on one (or on one set of) processing core(s) and the Colorado River basin on another (or on another set of) processing core(s), for example. Therefore, when solving Eq. (1) on several processing cores for the upper Mississippi River basin, preconditioning is not sufficient and iterative methods need be used. An iterative method implies several computations including preconditioning, matrix-vector multiplication, and calculation of residual norm at each iteration.

On one processing core, solving the matrix-based Muskingum method with preconditioning only is about twice as long as solving the traditional Muskingum method, as shown in Fig. 11. This extra time can be explained because the computation of the right-hand side of Eq. (1) is approximately as expensive as solving the traditional Muskingum method and approximately as expensive as preconditioning. However, the computation of the right-hand side is done only once per time step, regardless of the number of iterations if using an iterative linear solver, and scales very well because all operations require no communication except for the product

Computing loads are balanced for all simulations in this study; that is, the number of river reaches assigned to each processing core is almost identical across cores. Figure 13 shows how subbasins of the upper Mississippi River basin are divided among processing cores as well as the longest river path of the basin. The longest path goes through 8 subbasins on 8 cores and 13 subbasins on 16 cores. If one were to apply the traditional Muskingum method on several processing cores with the division in subbasins shown in Fig. 13, computations would have to be made sequentially from upstream to downstream, each core having to wait for its upstream core to be done prior to starting its work. Hence, assuming that the total computing time can be evenly divided by the total number of nodes and neglecting communication overhead, one could only hope to decrease computing time by a factor of

River flow is a causal phenomenon that mainly goes downstream. Therefore, when using an upstream-to-downstream computation scheme and unless dealing with completely separated river basins, one cannot expect to obtain perfect speedup (i.e., decreasing of computing time by a factor equal to the number of cores). However, today’s supercomputers having tens of thousands of computing cores, one could leverage such power to save human time. Additionally, the matrix method developed here can be directly applied to a combination of independent river basins, in which case speedup would be ideally perfect. Furthermore, matrix methods such as the one developed here could be adapted to more complex river flow equations—like variable-parameter Muskingum methods or schemes allowing for backwater effects—in order to save total computing time. Finally, the splitting up into subbasins used here is very simple, and optimizing this partition by limiting connections between subbasins or taking into account flow wave celerities relatively to basin sizes could help limit the number of communications and the number of iterations, respectively, in the linear system solver.

## 6. Conclusions

NHDPlus is a GIS dataset that describes the networks of mapped rivers and water bodies of the United States. One of the main advantages of NHDPlus is that connectivity information for the river networks is available. Therefore, this dataset offers possibilities for the development of river routing models that simultaneously calculate flow and volume of water in all water bodies of the nation. Furthermore, the USGS National Water Information System has thousands of gauges located on the NHDPlus network that can be used to assess the quality of such river models across river basins (not only at basin outlets). The research presented in this paper investigates how to develop a river network model using NHDPlus networks and how to assess model computations and optimize model parameters with USGS streamflow measurements. All tools and datasets used herein are available for the contiguous United States, but this research addresses two smaller domains. The combination of the Guadalupe and San Antonio River basins in Texas is used in a 4-year case study, and the upper Mississippi River basin is used in a speedup study with synthetic data. Graph theory is applied to a river network to create a network matrix that is used to develop a vector-matrix version of the Muskingum method and applied in a new river network model called RAPID. It has been shown that a GIS-based hydrographic dataset can be used as the river network for a river model to compute flow in large networks of thousands of reaches, including ungauged locations. A simple flux coupler for connecting a land surface model with an NHDPlus river network is presented. No horizontal routing of flow from the land surface to the river network is used in this study; such an addition would help improve model calculations. An inverse method is developed to estimate model parameters in RAPID using available gauge measurements located across the river basins. Wave celerities are estimated in several locations of the basin studied. RMSE and Nash efficiency of computed flow rates in four RAPID simulations are compared with a basic lumped model where runoff is directly accumulated at the gauge, with gauge measurements, and among themselves. RAPID produces better RMSE and Nash efficiency than the lumped model, and the improvements are statistically significant. Although the quality of RAPID calculations is tied to the quantity of runoff generated by the land surface model that provides runoff, mass is conserved within RAPID since the average flow rate is conserved. Spatial variability of parameters enhances the RMSE and Nash efficiency of RAPID calculations. Wave celerities are reproduced within a few percents of the model proposed, although wave celerities closer to those estimated from gauge data generally deteriorate the other statistics of calculations. This deterioration might be due to runoff being produced too slowly or too far upstream of each gauge. The parameters used in this study are simple, but could be improved based on information available in NHDPlus such as slope, mean flow, and velocity of all reaches, or by using modified versions of the Muskingum method with time-variable parameters, although the latter would necessitate modification of the optimization procedure developed herein. The matrix formulation in RAPID can be transferred in a parallel computing environment. A synthetic study of the upper Mississippi River basin shows that although a large initial overhead is added by the matrix method, this overhead decreases with increasing number of processing cores. More importantly, an iterative matrix solver allows decoupling of subbasins—even if the main river basin is fully interconnected—hence permitting computation of subbasins separately if they are distant enough from each other. As consequence, while producing the exact same results as the traditional Muskingum method, the matrix-based Muskingum method decreases the total computing time when run on several processing cores. Such a gain in computing time would be highly beneficial if addressing larger scales, like the entire contiguous United States, which would represent a square matrix of the size of 3 million.

## Acknowledgments

This work was partially supported by the U.S. National Aeronautics and Space Administration under the Interdisciplinary Science Project NNX07AL79G; by the U.S. National Science Foundation under project EAR-0413265: CUAHSI Hydrologic Information Systems; by Ecole des Mines de Paris, France; and by the American Geophysical Union under a Horton (Hydrology) Research Grant. The authors wish to thank the PETSc and TAO developers, especially Dr. Barry Smith, Dr. Matthew Knepley, Dr. Satish Balay and Dr. Jason Sarich for their continuous assistance throughout the development of RAPID. Thank you to Dr. Karl Shultz from TACC for his help regarding the handling of inputs and outputs on supercomputers. The computing resources were provided by TACC, which is gratefully acknowledged. Thank you to Dr. Stefano Orlandini for suggestions on an early version of this work. The authors are thankful to the two anonymous reviewers and to the editor for their valuable comments and suggestions that helped improved the original version of this manuscript.

## APPENDIX A

### Implementation of RAPID

The river network routing model is coded in FORTRAN 90 using the Portable, Extensible Toolkit for Scientific Computation (PETSc) mathematical library (Balay et al. 1997, 2008, 2009) and the Toolkit for Advanced Optimization (TAO) optimization library (McInnes et al. 2009). PETSc can be used to create matrices and vectors and to apply a variety of linear operations such as matrix-vector multiplications or linear system solving. TAO offers multiple methods for unconstrained and constrained optimization. Both PETSc and TAO are built upon the Message Passing Interface (see special issue of *International Journal of High Performance Computing Applications,* 1994, Vol. 8, No. 3–4)—a standard for communications between processing cores—and can seamlessly be run in a sequential or a parallel computing environment. In this study, sparse matrices are stored using the sequential AIJ format when using one processing core and the MPIAIJ format when using several cores. Linear systems are solved within PETSc either by preconditioning only or with preconditioning associated with a Richardson method. The preconditioning methods used herein are incomplete lower upper (ILU) on one processing core, and block Jacobi on several cores. The optimization method used in TAO is a line search algorithm called the Nelder–Mead method. The Network Common Data Form (NetCDF) file format (Rew and Davis 1990) is utilized for both inputs and outputs. RAPID is run on single- and multiple-processor workstations as well as on Lonestar, a supercomputer running at the Texas Advanced Computing Center (TACC). This Dell Linux Cluster has 1460 nodes, each node with 8 GB of memory and with two dual-core sockets. Lonestar has a total of 5840 computing cores.

## APPENDIX B

### NHDPlus Used in RAPID

NHDPlus (Horizon Systems Corporation 2007) is a geographic information system (GIS) dataset for the hydrography of the United States. This dataset provides the mapped streams and rivers as well as the catchments that surround them. NHDPlus is based on the medium-resolution 1:100 000-scale national hydrographic dataset (NHD). One of the main improvements in NHDPlus is the network connectivity available in the value-added attributes (VAA) table for the river network. Each NHDPlus reach in the national network is assigned a unique integer identifier called COMID. NHDPlus catchments also have a COMID, the same COMID being used for the reach and its local contributing catchment. Nodes are located at the two ends of each NHDPlus river reach. A unique integer identifier is given to all nodes in the national river reach network. The VAA table includes FromNode and ToNode fields that identify which node is upstream and which is downstream of a given reach. Two reaches that are connected in a river network share a node, and the reach *j* flows into the reach *i* if ToNode(*j*) = FromNode(*i*). The NHDPlus connectivity between reaches, catchments, and nodes is illustrated for three catchments of the Guadalupe and San Antonio River basins in Fig. B1.

*N*

_{i}_{,j}is the element of

*i*and column

*j*. Therefore, upstream-to-downstream connection is conserved if the downstream reach is the major branch of a divergence or if it is not part of a divergence at all, but the connection is not made for a minor branch of a divergence.

The VAA table also has information on the relative location—upstream or downstream—of NHDPlus reaches. This information is available in a field called Hydroseq, consisting of a unique integer attributed to all NHDPlus reaches. Sorting the Hydroseq field in decreasing order prior to computations guarantees that all upstream elements are computed prior to solving the flow equations for any given river reach. This organization of computations allows the matrix

## REFERENCES

Apostolopoulos, T. K., , and Georgakakos K. P. , 1997: Parallel computation for streamflow prediction with distributed hydrologic models.

,*J. Hydrol.***197**, 1–24.Balay, S., , Gropp W. D. , , McInnes L. C. , , and Smith B. F. , 1997: Efficient management of parallelism in object oriented numerical software libraries.

*Modern Software Tools in Scientific Computing,*E. Arge, A. M. Bruaset, and H. P. Langtangen, Eds., Birkhäuser, 163–202.Balay, S., and Coauthors, 2008: PETSc users manual: Revision 3.0.0. Argonne National Laboratory Rep. ANL-95/11, 191 pp.

Balay, S., , Buschelman K. , , Gropp W. D. , , Kaushik D. , , Knepley M. G. , , McInnes L. C. , , Smith B. F. , , and Zhang H. , cited 2009: PETSc: Portable, extensible toolkit for scientific computation. [Available online at http://www.mcs.anl.gov/petsc.]

Berge, C., 1958: Matrice associée d’un graphe.

*Théorie des Graphes et ses Applications,*H. Hierche, Ed., Collection Universitaire de Mathématiques, Vol. 2, Dunod, 126–128.Cunge, J. A., 1969: On the subject of a flood propagation computation method (Muskingum method).

,*J. Hydraul. Res.***7**, 205–230.David, C. H., , Gochis D. J. , , Maidment D. R. , , Yu W. , , Yates D. N. , , and Yang Z.-L. , 2009: Using NHDPlus as the land base for the Noah-distributed model.

,*Trans. GIS***13**, 363–377.David, C. H., , Habets F. , , Maidment D. R. , , and Yang Z.-L. , 2011: RAPID applied to the SIM-France model.

*Hydrol. Processes,*doi:10.1002/hyp.8070, in press.De Roo, A., , Gouweleeuw B. , , and Thielen J. , 2003: Development of a European flood forecasting system.

,*Int. J. River Basin Manage.***1**, 49–59.Fread, D. L., 1993: Flow routing.

*Handbook of Hydrology,*D. R. Maidment, Ed., McGraw-Hill, 10.17–10.18.Gochis, D. J., , and Chen F. , 2003: Hydrological enhancements to the community Noah land surface model. NCAR Tech. Note NCAR/TN-454+STR, 77 pp.

Guan, H., , Wilson J. L. , , and Xie H. , 2009: A cluster-optimizing regression-based approach for precipitation spatial downscaling in mountainous terrain.

,*J. Hydrol.***375**, 578–588.Habets, F., , Etchevers P. , , Golaz C. , , Ledoux E. , , Martin E. , , Noilhan J. , , and Ottle C. , 1999a: Simulation of the water budget and the river flows of the Rhone basin.

,*J. Geophys. Res.***104D**, 31 145–31 172.Habets, F., , Noilhan J. , , Golaz C. , , Goutorbe J. P. , , Lacarrere P. , , Martin E. , , Ottle C. , , and Vidal-Madjar D. , 1999b: The ISBA surface scheme in a macroscale hydrological model applied to the Hapex-Mobilhy area: Part I: Model and database.

,*J. Hydrol.***217**, 75–96.Habets, F., and Coauthors, 1999c: The ISBA surface scheme in a macroscale hydrological model applied to the Hapex-Mobilhy area: Part II: Simulation of streamflows and annual water budget.

,*J. Hydrol.***217**, 97–118.Habets, F., and Coauthors, 2008: The SAFRAN-ISBA-MODCOU hydrometeorological model applied over France.

,*J. Geophys. Res.***113D**, D06113, doi:10.1029/2007JD008548.Helsel, D. R., , and Hirsch R. M. , 2002: Statistical methods in water resources.

*Hydrologic Analysis and Interpretation,*Vol. 4,*Techniques of Water Resources Investigations of the United States Geological Survey,*United States Geological Survey, 137–156.Horizon Systems Corporation, cited 2007: National Hydrography Dataset Plus: Documentation. [Available online at http://www.horizon-systems.com/nhdplus/documentation.php.]

Jones, J. E., , and Woodward C. S. , 2001: Newton–Krylov-multigrid solvers for large-scale, highly heterogeneous, variably saturated flow problems.

,*Adv. Water Resour.***24**, 763–774.Kollet, S. J., , and Maxwell R. M. , 2006: Integrated surface–groundwater flow modeling: A free-surface overland flow boundary condition in a parallel groundwater flow model.

,*Adv. Water Resour.***29**, 945–958.Kollet, S. J., , Maxwell R. M. , , Woodward C. S. , , Smith S. , , Vanderborght J. , , Vereecken H. , , and Simmer C. , 2010: Proof of concept of regional scale hydrologic simulations at hydrologic resolution utilizing massively parallel computer resources.

,*Water Resour. Res.***46**, W04201, doi:10.1029/2009WR008730.Koussis, A. D., 1978: Theoretical estimations of flood routing parameters.

,*J. Hydraul. Div., Amer. Soc. Civ. Eng.***104**, 109–115.Larson, J. W., , Craig A. P. , , Drake J. B. , , Erickson D. J. I. , , Branstetter M. , , and Ham M. W. , 2007: A massively parallel dynamical core for continental- to global-scale river transport.

*Proc. Int. Congress on Modelling and Simulation (ModSim 2007),*Christchurch, New Zealand, Modelling and Simulation Society of Australia and New Zealand, 532–538.Ledoux, E., , Girard G. , , de Marsily G. , , Villeneuve J. P. , , and Deschenes J. , 1989: Spatially distributed modeling: Conceptual approach, coupling surface water and groundwater.

*Unsaturated Flow in Hydrologic Modeling Theory and Practice,*H. J. Morel-Seytoux, Ed., Kluwer Academic Publishers, 435–454.Lehner, B., , Verdin K. , , and Jarvis A. , cited 2006: HydroSHEDS. [Available online at http://hydrosheds.cr.usgs.gov.]

Leopold, C., , Süß M. , , and Breitbart J. , 2006: Programming for malleability with hybrid MPI-2 and OpenMP: Experiences with a simulation program for global water prognosis.

*Proc. 20th European Conf. on Modelling and Simulation,*Bonn, Germany, European Council for Modelling and Simulation, 665–670.Lohmann, D., , Raschke E. , , Nijssen B. , , and Lettenmaier D. P. , 1998a: Regional scale hydrology: I. Formulation of the VIC-2L model coupled to a routing model.

,*Hydrol. Sci. J.***43**, 131–141.Lohmann, D., , Raschke E. , , Nijssen B. , , and Lettenmaier D. P. , 1998b: Regional scale hydrology: II. Application of the VIC-2L model to the Weser River, Germany.

,*Hydrol. Sci. J.***43**, 143–158.Lohmann, D., and Coauthors, 2004: Streamflow and water balance intercomparisons of four land surface models in the North American Land Data Assimilation System project.

,*J. Geophys. Res.***109**, D07S91, doi:10.1029/2003JD003517.Maurer, E. P., , O’Donnell G. M. , , Lettenmaier D. P. , , and Roads J. O. , 2001: Evaluation of the land surface water budget in NCEP/NCAR and NCEP/DOE reanalyses using an off-line hydrologic model.

,*J. Geophys. Res.***106D**, 17 841–17 862.McCarthy, G. T., 1938: The unit hydrograph and flood routing.

*Proc. Conf. of the North Atlantic Division,*New London, CT, U.S. Engineer Department, 1–19.McInnes, L. C., , Moré J. , , Munson T. , , and Sarich J. , 2009: TAO user manual, revision 1.10. Mathematics and Computer Science Division, Argonne National Laboratory, 64 pp.

Miller, W. A., , and Cunge J. A. , 1975: Simplified equations of unsteady flow.

*Unsteady Flow in Open Channels,*K. Mahmood and V. Yevjevich, Eds., Water Resources Publications, 216–232.Nash, J. E., , and Sutcliffe J. V. , 1970: River flow forecasting through conceptual models part I—A discussion of principles.

,*J. Hydrol.***10**, 282–290.NERC, 1975: Flood routing studies. National Environment Research Council Flood Studies Rep., 33–34.

Niu, G. Y., , Yang Z. L. , , Dickinson R. E. , , Gulden L. E. , , and Su H. , 2007: Development of a simple groundwater model for use in climate models and evaluation with Gravity Recovery and Climate Experiment data.

,*J. Geophys. Res.***112**, D07103, doi:10.1029/2006JD007522.Niu, G. Y., and Coauthors, 2011: The community Noah land surface model with multiparameterization options (Noah-MP): 1. Model description and evaluation with local-scale measurements.

,*J. Geophys. Res.***116**, D12109, doi:10.1029/2010JD015139.Oki, T., , Agata Y. , , Kanae S. , , Saruhashi T. , , Yang D. W. , , and Musiake K. , 2001: Global assessment of current water resources using total runoff integrating pathways.

,*Hydrol. Sci. J.***46**, 983–995.Olivera, F., , Famiglietti J. , , and Asante K. , 2000: Global-scale flow routing using a source-to-sink algorithm.

,*Water Resour. Res.***36**, 2197–2207.Orlandini, S., , and Rosso R. , 1998: Parameterization of stream channel geometry in the distributed modeling of catchment dynamics.

,*Water Resour. Res.***34**, 1971–1985.Orlandini, S., , Moretti G. , , Franchini M. , , Aldighieri B. , , and Testa B. , 2003: Path-based methods for the determination of nondispersive drainage directions in grid-based digital elevation models.

,*Water Resour. Res.***39**, 1144, doi:10.1029/2002WR001639.Ponce, V. M., 1986: Diffusion wave modeling of catchment dynamics.

,*J. Hydraul. Eng.***112**, 716–727.Ponce, V. M., , and Yevjevich V. , 1978: Muskingum-Cunge method with variable parameters.

,*J. Hydraul. Div.***104**, 1663–1667.Rew, R., , and Davis G. , 1990: NetCDF—An interface for scientific data access.

,*IEEE Comput. Graphics Appl.***10**, 76–82.Solomon, S., , Qin D. , , Manning M. , , Marquis M. , , Averyt K. , , Tignor M. M. B. , , and Miller H. L. Jr., , and Chen Z. , Eds., 2007:

*Climate Change 2007: The Physical Science Basis*. Cambridge University Press, 996 pp.Todini, E., 2007: A mass conservative and water storage consistent variable parameter Muskingum-Cunge approach.

,*Hydrol. Earth Syst. Sci.***11**, 1645–1659.Von Bloh, W., , Rost S. , , Gerten D. , , and Lucht W. , 2010: Efficient parallelization of a dynamic global vegetation model with river routing.

,*Environ. Model. Software***25**, 685–690.Wilcoxon, F., 1945: Individual comparisons by ranking methods.

,*Biom. Bull.***1**, 80–83.