The mapped rivers and streams of the contiguous United States are available in a geographic information system (GIS) dataset called National Hydrography Dataset Plus (NHDPlus). This hydrographic dataset has about 3 million river and water body reaches along with information on how they are connected into networks. The U.S. Geological Survey (USGS) National Water Information System (NWIS) provides streamflow observations at about 20 thousand gauges located on the NHDPlus river network. A river network model called Routing Application for Parallel Computation of Discharge (RAPID) is developed for the NHDPlus river network whose lateral inflow to the river network is calculated by a land surface model. A matrix-based version of the Muskingum method is developed herein, which RAPID uses to calculate flow and volume of water in all reaches of a river network with many thousands of reaches, including at ungauged locations. Gauges situated across river basins (not only at basin outlets) are used to automatically optimize the Muskingum parameters and to assess river flow computations, hence allowing the diagnosis of runoff computations provided by land surface models. RAPID is applied to the Guadalupe and San Antonio River basins in Texas, where flow wave celerities are estimated at multiple locations using 15-min data and can be reproduced reasonably with RAPID. This river model can be adapted for parallel computing and although the matrix method initially adds a large overhead, river flow results can be obtained faster than with the traditional Muskingum method when using a few processing cores, as demonstrated in a synthetic study using the upper Mississippi River basin.
Land surface models (LSMs) have been developed by the atmospheric science community to provide atmospheric models with bottom boundary conditions (water and energy balance) and to serve as the land base for hydrologic modeling. Over the past two decades, overland and subsurface runoff calculations done by LSMs have extensively been used to provide water inflow to river routing models that calculate river discharge (De Roo et al. 2003; Habets et al. 1999a–c, 2008; Lohmann et al. 1998a,b, 2004; Maurer et al. 2001; Oki et al. 2001; Olivera et al. 2000). However, river routing within LSMs has traditionally been done using gridded river networks that best fit the computational domain used in LSMs. Today, geographic information system (GIS) hydrographic datasets are increasingly becoming available at continental [e.g., the National Hydrography Dataset Plus (NHDPlus; Horizon Systems Corporation 2007)] and global scales [e.g., Hydrological Data and Maps Based on Shuttle Elevation Derivatives at Multiple Scales (HydroSHEDS; Lehner et al. 2006)]. These datasets provide a vector-based representation of the river network using the “blue line” mapped rivers and streams. Furthermore, observations of the river systems are now widely available in databases such as the U.S. Geological Survey (USGS) National Water Information System (NWIS) for the United States, in which thousands of gauges are available along with their exact location on the NHDPlus river network. Most studies mentioned above—with the exception of Habets et al. (2008)—use a limited number of gauges throughout large river basins, often focusing on gauges located at river mouths. As the spatial and temporal resolutions of weather and climate models and their underlying land surface models increase, using gauges located across basins would help in diagnosing the quality of LSM computations. The latest work on general circulation models by the international scientific community, especially by the Intergovernmental Panel on Climate Change (Solomon et al. 2007), opens potential studies of the evolution of water resources with global change. Using mapped streams and water bodies in LSMs could benefit the resulting assessment of the impact of global change in water resources by providing estimation of changes at the blue-line level. Furthermore, the use of parallel computing is quite common in regional- to global-scale atmospheric and ocean modeling but comparatively infrequent in modeling of large river networks. Generally, parallel computing can be utilized to either solve problems of increasing size [as done with the Parallel Flow simulator (ParFlow); Jones and Woodward 2001; Kollet and Maxwell 2006; Kollet et al. 2010] or to decrease computation time (see, e.g., Apostolopoulos and Georgakakos 1997; Larson et al. 2007; Leopold et al. 2006; von Bloh et al. 2010). These two types of approaches to parallel computing are respectively referred to as scalability and speedup of calculations; the work presented herein focuses on the latter. Apostolopoulos and Georgakakos (1997) investigated the speedup of streamflow computations using hydrologic models in river networks as a function of network decomposition and of the computing time ratio between vertical and horizontal water balance calculations. Simple river routing within LSMs being traditionally performed by carrying computations from upstream to downstream, one way to speed up river flow modeling is to use a sequential river routing code to compute independent basins on different processing cores, as done in Leopold et al. (2006) and in Larson et al. (2007). Such methods allow avoiding interprocessor communication but result in imbalanced computing loads when some basins are much larger than others. Leopold et al. (2006) partly addressed load imbalance by using parallel computing for surface water balance, but the river routing part remains sequential. Von Bloh et al. (2010) implemented a routing method in which computations do not have to be carried in order from upstream to downstream, therefore obtaining almost perfect speedup. The work developed herein investigates a way to obtain speedup while retaining traditional upstream-to-downstream computations that are used in most river routing schemes.
The present study links a land surface model with a new river network model called Routing Application for Parallel Computation of Discharge (RAPID) using NHDPlus for the representation of the river network and USGS NWIS gauges for the optimization of model parameters and the assessment of river flow computations. All models and datasets used herein are available at least for the contiguous United States. The work presented here focuses first on the Guadalupe and San Antonio River basins in Texas (see Fig. 1), together covering a surface area of about 26 000 km2. These basins have about 5000 river reaches and their corresponding catchments in the NHDPlus dataset (see Fig. 2) out of 3 million for the United States. These two basins are also chosen for study because of significant contributions to surface water flow from groundwater sources, because of a large reservoir at Canyon Lake where the impacts of constructed infrastructure on flow dynamics have to be considered, and because these rivers flow out into an estuarine system at San Antonio Bay. A synthetic study of the performance of RAPID in a parallel computing environment is also presented using the upper Mississippi River basin (see Fig. 3), which has about 180 000 river reaches in NHDPlus and covers an area of about 490 000 km2.
The research presented in this paper aims at answering the following questions: how can a river model be developed for calculation of flow and volume of water in a river network of thousands of blue-line river reaches? How can the connectivity information in NHDPlus be used to run a river network model in part of the United States? How can flow at ungauged locations be reconstructed? How can model computations be assessed and optimized based on all available measurements? How can parallel computing be used to speed up upstream-to-downstream computations of river flow within a large river network?
First, the development of the RAPID model is presented. Then, the modeling framework for calculation of river flow in the Guadalupe and San Antonio River basins using runoff data from a land surface model is developed, followed by results. Finally, the speedup of RAPID in a parallel computing environment is assessed.
2. Model development
The model presented here is named RAPID (http://www.geo.utexas.edu/scientist/david/rapid.htm). RAPID is based on the traditional Muskingum method that was first introduced by McCarthy (1938) and has been extensively studied in the literature in the past 70 years. The Muskingum method has two parameters, k and x, respectively a time and a dimensionless parameter. Among the most noteworthy papers related to the Muskingum method, Cunge (1969) showed the Muskingum method is a first-order approximation of the kinematic and diffusive wave equation and proposed a method known as the Muskingum–Cunge method—a second-order approximation of the kinematic and diffusive wave equation—in which the Muskingum parameters are computed based on mean physical characteristics of the river channel and of the flow wave. Koussis (1978) proposed a variable-parameter Muskingum method based on the Muskingum–Cunge method where k varies with the flow but x remains constant on the grounds that the Muskingum method is relatively insensitive to this parameter. Other variable-parameter Muskingum methods allow both k and x to vary (see, e.g., Miller and Cunge 1975; Ponce and Yevjevich 1978), although these variable-parameter methods fail to conserve mass (Ponce and Yevjevich 1978). Notable large-scale uses of the variable-parameter Muskingum–Cunge method include Orlandini and Rosso (1998) and Orlandini et al. (2003). More recently, Todini (2007) developed a mass-conservative variable-parameter Muskingum method known as the Muskingum–Cunge–Todini method.
As a first step, the traditional Muskingum method with temporally constant parameters calculated partly based on the work of Cunge (1969) is used in this study because there are significant challenges to overcome in adapting the Muskingum method for river networks, in efficiently running it within a parallel computing environment, and in developing an automated parameter estimation procedure before more sophisticated flow equations are used. However, the physics of flow could be improved with many variations based on the Muskingum method or adapted to the Saint Venant equations.
a. Calculation of flow and volume of water in a river network
In a network of thousands of reaches, matrices are needed for network connectivity and flow computation. The backbone of RAPID is a vector-matrix version of the Muskingum method shown in Eq. (1) and derived subsequently in this section:
where t is time and Δt is the river routing time step. The bold roman notation is used for vectors and bold sans serif font for matrices: is the identity matrix; is the river network matrix; and are parameter matrices; Q is a vector of outflows from each reach; and Qe is a vector of lateral inflows for each reach. Such a vector-matrix formulation of the Muskingum method has to our knowledge never been previously published.
Equation (1) is used for river network routing and can be solved using a linear system solver. The vector-matrix notation provides one flow equation for the entire river network, therefore avoiding spatial iterations. For a river network with m river reaches, all vectors are of size m and all matrices are square of size m. Each element of a vector corresponds to one river reach in the network. For performance purposes, all matrices are stored as sparse matrices (only the nonzero values are recorded). A five-reach, two-node, and two-gauge river network is used here to clarify the mathematical formulation of the river network model and is shown in Fig. 4a. The river network is made up of a combination of river reaches similar to that of Fig. 4b. The model formulation is presented here for a small river network but can be generalized to any size of river network.
The vector Q is a vector of the outflows Qj of all reaches of the river network, where j is the index of a river reach within the network:
and Qe is a vector of flows that are lateral inflows to the river network. Lateral inflows include runoff, groundwater, or any type of forced inflow (outflow at a dam, pumping, etc.):
A land surface model whose time step is coarser than the river routing time step provides Qe. Two assumptions are made in the development of RAPID, one regarding the temporal variability of Qe and one regarding the location at which Qe enters the river network. In this study, the river routing time step is 15 min and inflow from land surface runoff is available every 3 h. In the derivation of Eq. (1), Qe is assumed constant [i.e., Qe(t + Δt) = Qe(t)] over all 15-min river routing time steps included within a given land surface model 3-h time step. This partial temporal uniformity simplifies the river network model formulation, limits the quantity of input data, and facilitates the coupling with land surface models. This assumption is valid at all times except at the last routing time steps before a new Qe is made available by the land surface model. Also, the external inflow Qe is assumed to enter the network as an addition to the upstream flow. With these two assumptions, the Muskingum method applied to reach 5 in Fig. 4b gives the following:
where C1, C2, and C3 are the Muskingum parameters that are stated in Eq. (6). The reader should note that these two assumptions are equivalent to using a unit-width lateral inflow along with a term C4 as found in available literature (see, e.g., Fread 1993; NERC 1975; Orlandini and Rosso 1998; Ponce 1986). Equation (1) is a generalization of Eq. (4) using a vector-matrix notation.
Berge (1958) proposed the concept of matrices associated with graphs. This concept can be applied to the river network in Fig. 4a in order to create the network connectivity matrix given in Eq. (5) in both full and sparse formats. The network connectivity matrix is a square matrix whose dimension is the total number of reaches in the network. A value of one is used at row i and column j if reach j flows into reach i, and zero is used everywhere else:
The upstream inflow to the network can therefore be computed by multiplying the network connectivity matrix by the vector of outflows Q. In case of a divergence in the river network (when going downstream) or in case of a loop, a unique reach (the major divergence) is used to carry all the upstream flow, and the other reaches (minor divergences) carry only the flow that results from their lateral inflow. This formulation could be modified to take into account given fractions of flows that separate into different parts of a divergence if that information is available.
The matrices are diagonal matrices with their diagonal elements being the coefficients used in the Muskingum method (McCarthy 1938), respectively C1j, C2j, and C3j, such that
where kj is a storage constant (with dimension of a time) and xj a dimensionless weighting factor characterizing the relative influence of the inflow and the outflow on the volume of the reach j. The Muskingum method is stable for any x ∈ [0, 0.5], regardless of the value of k and Δt (Cunge 1969). For any j, C1j + C2j + C3j = 1.
In RAPID, the parameters k and x of the Muskingum method are allowed to differ from one river reach to another, and corresponding vectors are defined in Eq. (7):
The sum equals the identity matrix.
The calculation of the volume of water in a given reach can be needed for coupling with groundwater models. Here, the first-order, explicit, forward Euler method is applied to the continuity equation to calculate the volume of water in each river reach of the network, as shown in Eq. (9) where the first, second, and third terms of the right-hand side are the volume of water that respectively were in the river reach, flowed into the reach, and discharged from the reach:
where V is a vector of the volume of water Vj in each river reach j:
Details on the massively parallel implementation of the matrix-based Muskingum method presented in this section, and of the automated parameter estimation presented in the section below, are given in appendix A.
b. Parameter estimation
To estimate the parameters k and x to be used in RAPID, an inverse method is developed. The principle of an inverse method is to optimize the parameters of a model so that the outputs of the model approach observations. A cost function reflecting the difference between model calculations and observations is needed to assess the quality of a set of model parameters. The best set of parameters is chosen as the set that minimizes the cost function and is determined through optimization. A square-error cost function φ is chosen:
where the summation is made daily. The T in the exponent is for vector transpose, and to and tf are respectively the first and last day used for the calculation of φ. The model parameter vectors k and x are kept constant within the temporal interval [to, tf], and the cost function is calculated several times with different sets of parameters during the optimization procedure. The scalar f allows φ to be on the order of magnitude of 101, which is helpful for automated optimization procedures; is the daily average outflow vector, calculated based on the mean of all routing time steps in a given day; Qg(t) is a vector with the total number of river reaches for dimension, with the daily value observed corresponding to reach j where gauge measurements are available and zero where no gauge is available; is a sparse diagonal matrix that allows the dot product to survive only where gauges are available, so that has a value of one on the diagonal element of index j if a gauge is available on reach j and zero everywhere else. Using the example network given in Fig. 4a, and Qg(t) take the following form:
According to Fread (1993), x ∈ [0.1, 0.3] in most streams. By analogy with the kinematic wave equation, Cunge (1969) showed that the parameter k of the Muskingum method is the travel time of a flow wave through a river reach. For a given river reach j of length Lj where a flow wave of celerity cj travels, kj is obtained by dividing the length by the celerity of the wave, as shown in Eq. (13):
Although the routing model defined by Eq. (1) allows for variability of the parameters (kj, xj) on a reach-to-reach basis, attempting to automatically estimate model parameters independently for all the reaches of a basin would be a costly undertaking. Therefore, the search for optimal parameters is limited to determining two multiplying factors λk and λx such that
To minimize the influence of the initial guess on the optimization procedure, three different initial guesses for (λk, λx) are used. Out of the three corresponding optimal (λk, λx) obtained, only the one couple leading to the minimum value of the cost function φ is kept. Therefore, the optimization procedure leads to only one optimal couple (λk, λx) for a given basin in the network. Note that—as a first step—x is here constant over a given basin on the grounds that the Muskingum method is relatively insensitive to this parameter (Koussis 1978). Some data available in NHDPlus (such as mean flow, mean velocity, slope, etc.) associated with available formulations for x (e.g., Cunge 1969; Orlandini and Rosso 1998) could be used to improve the proposed method.
RAPID is designed to handle large routing problems. Given a river network and connectivity information as well as lateral inflow to the river network, RAPID can run on any river network. In this study, a framework for computation of river flow in the Guadalupe and San Antonio River basins is developed that uses a one-way modeling framework with an atmospheric dataset, a land surface model, and RAPID as the river model. This section presents how the Guadalupe and San Antonio River basins are described in the NHDPlus dataset, how a land surface model is used to provide lateral inflow to the river network, and how the meteorological forcing is prepared.
a. RAPID used on NHDPlus
There are a total of 5175 river reaches with known direction and connectivity within the NHDPlus description of the Guadalupe and San Antonio river basins (as shown in Fig. 2). These 5175 reaches have an average length of 3.00 km and the average catchment defined around them is 5.11 km2 in area; all are used for this study. Details on the fields used in the NHDPlus dataset, including the unique “common identifier” (COMID) used for all river reaches and their corresponding catchments, and on how NHDPlus is used with RAPID, are given in appendix B. In this study, the vector of outflows in all river reaches Q was arbitrarily initialized to the uniform value of 0 m3 s−1 prior to running RAPID.
b. Land surface model and coupling with RAPID
Within this study, the core physical model governing the one-dimensional vertical fluxes of energy and moisture is the community Noah land surface model with multiparameterization options (Noah-MP; Niu et al. 2011). Noah-MP offers multiple options for choosing the modeling of certain physical phenomena. In this study, the soil moisture factor for stomatal resistance is of “Noah type” (Niu et al. 2011) and the runoff scheme is TOPMODEL based, using a simple groundwater model (SIMGM; Niu et al. 2007). The soil column is 2 m deep, below which is an unconfined aquifer. To represent the characteristics of the structural soil over the model domain, the saturated hydraulic conductivity, which is determined by the soil texture data, is enlarged by factor of 10 (through calibration). The soil hydrology of Noah (soil moisture) is run at an hourly time step and runoff data are produced every three hours. In this study, the state variables of Noah were initialized through a spinup method.
Noah-MP calculates the amount of water that runs off on and below the land surface. This quantity is used to provide RAPID with the water inflow from outside of the river network. David et al. (2009) presented a coupling technique using a hydrologically enhanced version of the Noah LSM called “Noah distributed” (Gochis and Chen 2003) that allows physically based modeling of the horizontal movement of surface and subsurface water from the land surface to a river reach. In interest of a simpler coupling scheme, the work of David et al. (2009) has been modified. In this study, a flux coupler between Noah and RAPID is developed using the catchments available in the NHDPlus dataset.
The NHDPlus catchments contributing runoff to each river reach were determined as part of the NHDPlus development using a digital elevation model and its associated flow accumulation and flow direction grids. These grids have a native resolution of 30 m. The map of catchments is available in NHDPlus in both gridded (at 30-m resolution) and vector formats in a shape file. Running a land surface model at a 30-m resolution is very resource demanding. Therefore, a coarser resolution of 900-m cell size is chosen. The shape file of NHDPlus catchment boundaries is converted to a grid of size 900 m. Within this conversion process, the accuracy of the boundaries of the catchments is lowered but the catchment boundaries are reasonably respected and the computational cost of the land surface model calculations is reasonable. For each 3-h output of the Noah model, surface and subsurface runoff data is superimposed onto the catchment grid, and all runoff that corresponds to the catchment of each river reach is summed and used as the water inflow to the river reach. Figure 5 shows the principle of the flux coupler in which the 900-m runoff data generated by the Noah model is superimposed on the 900-m map of NHDPlus catchment COMIDs to determine the lateral inflow for NHDPlus reaches used by RAPID.
Therefore, no horizontal routing is used between the land surface and the river network in the proposed scheme. This differs from some other models that use runoff from a one-dimensional model to force a river routing model. For instance, the two-dimensional wave equation is used in Gochis and Chen (2003) or the linear reservoir equation is used in Ledoux et al. (1989).
The coupling method used here can be adapted to any land surface model that computes surface and subsurface runoff on a grid. This coupling technique is automated in a FORTRAN program.
c. Meteorological forcing
Land surface models need meteorological forcing in order to compute the water and the energy balance at the surface. The Noah LSM requires seven meteorological parameters: precipitation, specific humidity, air temperature, air pressure, wind speed, downward shortwave, and downward longwave radiation. Hourly precipitation is obtained from the Next Generation Weather Radar (NEXRAD) and downscaled from its original resolution (4.763 km) to 900 m using the method developed in Guan et al. (2009). All other meteorological parameters are downloaded from the 3-hourly North American Regional Reanalysis (NARR) and converted from its original resolution (32.463 km) to 900 m using a simple triangle-based linear interpolation. All meteorological data are prepared for four years (1 January 2004–31 December 2007).
4. Calibration and results for the Guadalupe and San Antonio River basins
The framework for computation of river flow that is developed in the previous section is used to calculate river flow in all 5175 river reaches of the Guadalupe and San Antonio River basins for four years (1 January 2004–31 December 2007). In this section, flow wave celerities in several subbasins are estimated from measurements, the model parameters used in RAPID are presented, and flows computed are compared to observed flows. Issues related to the time step used in RAPID and to the simulated wave celerities are also presented.
a. Estimation of wave celerities
The USGS Instantaneous Data Archive (IDA; http://ida.water.usgs.gov/ida/) provides 15-min flow data that can be used to determine the flow wave celerity. Data at fifteen gauging stations within the two basins studied are obtained from IDA over two time periods (1 January 2004–30 June 2004 and for 1 January 2007–30 June 2007). The maximum lagged cross correlation between hydrographs at two consecutive gauging stations is used to determine the flow wave celerity. The lagged cross-correlation ρ is a measure of similarity between two wave forms as a function of a lag time τlag applied to one of them, as shown in Eq. (15):
where Qa and Qb are the flows measured at the upstream and downstream station, respectively, and the summation is here made every 15 min for six months. Figure 6 shows the correlation as a function of increasing lag time between three different sets of consecutive gauging stations. The lag time giving the maximum correlation is taken as the travel time τtravel for the flow wave between the two stations. The travel times are estimated for 11 sets of two stations and are shown on Table 1. Travel times of 0 s are reported at two stations, where the flow wave is probably too fast to be captured by 15-min measurements. The wave celerity c is then computed using Eq. (16):
where d is the distance between two stations. The NHDPlus Flow Table Navigator Tool (http://www.horizon-systems.com/nhdplus/tools.php) is used to estimate the curvilinear distance between two stations along the NHDPlus river network that are shown on Table 1. The wave celerity has been estimated for 11 subbasins within the Guadalupe and San Antonio River basins. Table 2 shows the values that are obtained for the two time periods considered as well as their average. Figure 7 shows the corresponding subbasins as well as the locations of all gauging stations.
b. Parameters used in RAPID
RAPID needs two vectors of parameters k and x that can either be determined using physically based equations, through optimization, or a combination of both. In this study, daily streamflow data are obtained from the USGS National Water Information System (http://waterdata.usgs.gov/nwis) in order to use the built-in parameter estimation. Within the Guadalupe and San Antonio River basins, NWIS has 74 gauges that measure flow, 36 of them having full records of daily measurements for the four years studied (1 January 2004–31 December 2007). These 36 stations are used for parameter estimation.
Four sets of model parameters—denoted by the superscripts α, β, γ, and δ—are used in this study. These sets of parameters are all based on Eq. (14), which is used with a uniform wave celerity of c0 = 1 km h−1 = 0.28 m s−1 throughout the basin or with the celerities cj determined based on the IDA lagged cross-correlation study.
The first set (kα, xα) is obtained from parameter estimation shown in Eq. (11) using the uniform wave celerity c0 = 0.28 m s−1, and the resulting values of the two multiplying factors λk and λx of Eq. (14) are
The parameters (kβ, xβ) are determined without optimization using the celerities cj determined based on the IDA lagged cross-correlation study and set to
The third set of parameters (kγ, xγ) is obtained through optimization using the celerities cj determined based on the IDA lagged cross-correlation study, and the resulting values are
The optimization converges to a value of k that is 38% smaller than that estimated with the IDA lagged cross-correlation, suggesting that a faster flow wave in the river network produces better flow calculations. In the present study, routing on the land surface from the catchment to its corresponding reach is not modeled. Therefore, one would expect that the optimized flow celerity in the river network would be slower than that estimated from river flow observations, which is not the case here. This suggests that runoff is either produced too slowly or too far upstream of each gauge, perhaps because runoff in land surface models is often calibrated based on a lumped value at the downstream gauge of a basin, as was done here with Noah-MP. Further details on the quality of runoff simulations are given in section 4d.
The fourth set of parameters (kδ, xδ) is determined for a better match of celerity calculations, as explained later in this paper.
c. Time step of RAPID simulation
Cunge (1969) showed that the Muskingum method is stable for any x ∈ [0, 0.5] and that the wave celerity computed by the Muskingum method approaches the theoretical wave celerity of the kinematic wave equation if the time step of the river routing equals the travel time of the wave (for x = 0.5), as shown in Eq. (20):
However, both the celerity of flow and the length of river reaches vary along the network, and the model formulation of RAPID allows for only one unique value of the time step Δt be chosen. In the Guadalupe and San Antonio River basins, the mean length is 3 km and the median length is 2.4 km. The probability density function and the cumulative distribution functions for the lengths of river reaches are shown in Fig. 8. The celerities estimated earlier are on the order of c = 2.5 m s−1. Using the median value of the reach length along with c = 2.5 m s−1, Eq. (20) gives Δt = 960 s. To have an integer conversion between the river routing time step and the land surface model time step (3 h), a value of Δt = 900 s = 15 min is chosen.
d. Analysis of the quality of river flow computation
For various model simulations, the average and the root-mean-square error (RMSE) of computed flow rate are calculated using daily data and are given in Table 3. The Nash efficiency (Nash and Sutcliffe 1970) is bounded by the interval ]−∞, 1] and gives an estimate of the quality of modeled river flow computations when compared to observations, and is also given in Table 3. An efficiency of 1 corresponds to a perfect model and 0 corresponds to a model producing the mean of observations. The results shown for a lumped model correspond to when runoff from Noah is accumulated at the gauge directly without any routing. The average values of flow in RAPID simulations are tied to the amount of runoff water calculated by the Noah LSM and the bias generated by the land surface model cannot be fixed by RAPID. However, the internal connectivity of the NHDPlus river network is well translated in RAPID and mass is conserved within RAPID since the flow rates in the lumped simulation and in all four simulations of RAPID are the same. Figure 9 shows the ratio between observed and lumped streamflow at 17 gauges located across the Guadalupe and San Antonio River basins. This ratio is around unity downstream of the Guadalupe and San Antonio Rivers but is greater than seven upstream, suggesting that runoff is most likely overestimated at the center of the basin. Additionally, runoff is largely underestimated at two stations just downstream of the outcrop area of the Edwards Aquifer: the Comal River at New Braunfels and the San Marcos River at San Marcos. These stations measure large average streamflow (respectively 10.59 and 5.9 m3 s−1) although draining a relatively small area (respectively 336 and 129 km2), and are actually two of the largest springs in Texas. These flows are much larger than the lumped runoff (respectively 0.67 and 0.26 m3 s−1), which is expected because the modeling framework presented herein does not does not explicitly simulate aquifers.
However, the RAPID simulations (kα, xα), (kβ, xβ), and (kγ, xγ) lead to a smaller RMSE and a higher Nash efficiency than the lumped runoff. This shows that an explicit river routing scheme with carefully chosen parameters allows obtaining better streamflow calculations than a simple lumped runoff scheme, as expected.
Within the different RAPID simulations, the set of parameters (kδ, xδ) gives the best results for RMSE and Nash efficiency, followed by (kβ, xβ), (kα, xα), and (kγ, xγ). Therefore, a greater spatial variability in the values of k contributes to the quality of model results, and the built-in optimization in RAPID further enhances these model results. An example hydrograph for the Guadalupe River near Victoria, Texas is shown in Fig. 10, and is computed using (kγ, xγ).
e. Comparison between estimated and computed wave celerities
To assess the capacity of the modeling framework to reproduce surface flow dynamics, the celerity of the flow wave in outputs from RAPID are computed. Fifteen-minute river flow is computed with RAPID, and the lagged cross correlation presented earlier is used to calculate the wave celerity within the RAPID simulation. Table 2 shows the celerities that are computed from RAPID outputs. In the first three sets of model parameters used, the wave celerities simulated in RAPID are greater than those observed. One can also notice than even for (kβ, xβ), the model-simulated celerities are different than the observed celerities that are used to determine the vector kβ itself. This was predicted by Cunge (1969), who showed that the difference between the celerity of the kinematic wave equation and that computed using the Muskingum method is a function of both x and the quotient Δt/Lj. Only the specific values x = 0.5 and Δt verifying Δt/Lj = cj allow obtaining the same celerity. Furthermore, the work herein is done in a river network, and the celerity estimated between two points does not correspond only to the main river stem but rather to a combination of all river reaches present in the network between the two points. The ratio of the average celerities from RAPID using (kβ, xβ) over the average observed celerities is 1.54. As a final experiment, a new set of parameters (kδ, xδ) is created to account for the faster waves in RAPID:
Table 2 shows that the parameters (kδ, xδ) allow for wave celerities that are closer to the observed ones than the celerities obtained with the other sets of parameters. The average flow wave celerity over the 11 calculations in RAPID is within 3% of that estimated with IDA flows. Unfortunately, these closer wave celerities also lead to a decrease in the quality of RMSE and Nash efficiency. Therefore, model celerities closer to celerities estimated from observations can be obtained but generally deteriorate other statistics of calculations. Again, this might be due to runoff being produced too slowly or too far upstream of each gauge.
f. Potential improvement of spatial variability in RAPID parameters
In the work presented here, the parameter x is spatially and temporally constant over the modeling domain and the parameter k is temporally constant but varies at the river reach level based on the length of each reach and on the celerity of the flow wave going through it. Flow wave celerities are estimated for 11 subbasins based on flow observations, and the spatial variability of k presented in this study is therefore partly limited by the size of the subbasins used for flow wave estimation. However, such an approach for computation of RAPID parameters allows taking into account wave celerities that are estimated based on observations made at high temporal resolution as well as verifying the modeling framework through reproduction of estimated wave celerities. In a separate study applying RAPID to all rivers of metropolitan France, David et al. (2011) present a physically based formulation of k and a subbasin optimization for both k and x, therefore allowing further spatial variability of parameters. David et al. (2011) show that using a combination of reach length, river bed slope, and basin residence time for the parameter k and applying the optimization procedure to subbasins both improve the efficiency and the RMSE of RAPID flow computations. Such work could be adapted to the study herein based on information provided in the NHDPlus dataset (e.g., reach length, mean annual flow velocity, and river bed slope), which would be advantageous when applying RAPID to domains larger than the Guadalupe and San Antonio River basins where estimation of wave celerities everywhere may require excessive amounts of computations.
g. Statistical significance
Changes in the routing procedure (i.e., no routing or routing using various RAPID parameters) lead to various changes in the values of efficiency and RMSE, as shown in section 4b. The statistical significance of the changes can be assessed in order to determine whether or not various routing experiments are effective. For two different routing procedures used, the efficiency (RMSE) at one gauge can be compared to the efficiency (RMSE) at the same gauge, although variability of efficiency (RMSE) between independent gauges can be large. Therefore, there is a logical pairing of efficiency and RMSE calculated at a given gauge between two experiments and, hence, matched pair tests are appropriate to assess the statistical significance. Several common options are available for matched pair tests (with increasing level of complexity): the sign test, the Wilcoxon signed-rank test (Wilcoxon 1945), and the paired t test. The sign test has no assumption on the shape of probability distributions of samples used but is quite simple since only the sign of differences between two paired samples is accounted for. The Wilcoxon signed-rank test incorporates the magnitude of differences between paired samples under the assumption that differences between pairs are symmetrically distributed. The paired t test may be used when the differences between pairs are known to be normally distributed. The assumption of the Wilcoxon signed-rank test (symmetry) is not as restrictive as that of the paired t test (normality). In cases where small sample sizes are used—as done in this study—testing for symmetry or normality may not be meaningful. Additionally, violations of the symmetry assumption in the Wilcoxon signed-rank test have minimal influence on the corresponding p values (Helsel and Hirsch 2002). These two reasons motivate the use of the Wilcoxon signed-rank test in the study herein. The null hypothesis H0 for this test is that the median of differences between two populations is zero. The purpose of changes in the routing procedure being to improve results by increasing the efficiency and decreasing the RMSE, alternate hypotheses can assume that one population tends to be generally either larger (H1) or smaller (H2) than the other. Therefore, p values corresponding to one-sided tests are used in this study. Low significance levels mean that H0 is unlikely, hence that a significant change is observed. The Wilcoxon signed-rank test sorts pairs with nonzero difference based on the absolute value of the differences and sums all positive (negative) ranks in a variable named W+ (W−). The corresponding p values vary with the number of nonzero differences and with the value of W+ and W−. FORTRAN programs were created to compute the exact value of the test statistic (not using a large-sample approximation) as well as the corresponding p values. Table 4 shows the results of the Wilcoxon signed-rank test for both efficiency and RMSE and for several paired experiments using two different routing procedures. The same 15 stations named on Fig. 7 and used in Table 3 serve here for statistical significance assessment, and the corresponding 15 values of efficiency and of RMSE are utilized as sample values.
Several conclusions can be drawn from Table 4. First, the Wilcoxon signed-rank tests comparing results obtained by RAPID with parameters α, β, and γ to a lumped runoff approach show that the null hypothesis can be rejected for a one-sided test at a 10% level of significance in all cases, except for the efficiency between RAPID with β parameters and a lumped approach at a 13% level of significance. All these tests validate that the improvements mentioned in section 4b (increased efficiency and decreased RMSE) are statistically significant and confirm that an explicit river routing scheme allows for obtaining better streamflow calculations than a simple lumped runoff scheme, as expected. Second, comparisons between RAPID using α and γ parameters show that subbasin variability in wave celerities is advantageous to a spatially uniform wave celerity approach at a 19% level of significance for efficiency and at a 7% level for RMSE. Third, comparisons between RAPID using γ and δ parameters confirms that wave celerities close to those determined from observations deteriorate results at a 3% level of significance for both efficiency and RMSE. Finally, one cannot conclude on the statistical significance of the comparison between RAPID using β and γ parameters concerning the improvement of optimization procedure. However, since RAPID using γ parameters produce better average values than RAPID using β parameters, and since the statistical significance of RAPID using γ parameters compared to a lumped approach is better than that of RAPID using β parameters compared to lumped approach, the optimization can still be considered advantageous.
5. Synthetic study of the upper Mississippi River basin, speedup of parallel computations
Through the use of mathematical and optimization libraries that run in a parallel computing environment, RAPID can be applied on several processing cores. The work presented above focuses on the Guadalupe and San Antonio River basins together forming a river network with 5175 river and water body reaches, the size of which does not justify the use of parallel computing. However, all the tools and datasets used are available for the contiguous United States, where the NHDPlus dataset has about 3 million reaches. Adapting the proposed framework to simultaneously compute flow and volume of water in all mapped water bodies of the contiguous United States would require solving matrix equations of the size of 3 million. For such a large scientific problem, parallel computing can be helpful if speedup can be achieved—that is, if increasing the number of processing cores decreases the total computing time.
a. Synthetic study used for assessment of parallel performance
As a proof of concept, the evaluation of the parallel computing capabilities of RAPID is presented here using the upper Mississippi River basin (shown on Fig. 3), which has 182 240 river and water body reaches available as region 7 in the NHDPlus dataset. The number of computational elements for the upper Mississippi River basin is about 35 times larger than the combination of the Guadalupe and San Antonio River basins, and about 16 times smaller than the entire contiguous United States. The river network of the upper Mississippi River basin is fully interconnected, all water eventually flowing to a unique outlet.
To assess the performance of RAPID, the same problem consisting in the computation of river flow in all reaches of the upper Mississippi River basin, over 100 days, at a 900-s time step is solved for all results reported in section 5c. For this performance study, the runoff data symbolized by vector Qe in Eq. (1) are synthetically generated and set to 1 m3 every 3 h for all reaches and all time steps, and the vectors of parameters k and x are temporally and spatially uniform as shown in Eq. (22):
b. Basics of solving a linear system on computers
Numerically solving a linear system is typically an iterative process mainly involving two steps at each iteration: preconditioning followed by applying a linear solver. Preconditioning is a procedure that transforms a given linear system through matrix multiplication into one that is more easily solved by linear solvers, hence decreasing the total number of iterations to find the solution and saving time. If the linear system is triangular, preconditioning is sufficient to solve the problem, and a linear solver is not needed. In a parallel computing environment, a matrix is separated into diagonal and off-diagonal blocks, each processing core being assigned one diagonal block and its adjacent off-diagonal block. Solving a linear system in parallel is made using blocks, and parallel preconditioning is determined based on elements in the diagonal blocks. Preconditioning is sufficient to solve a given parallel linear system if the system is diagonal by blocks (i.e., all off-diagonal blocks are empty) and if each diagonal block is triangular; in most other cases, iterations of preconditioning and applying a linear solver are needed.
c. Parallel speedup of the synthetic study
For comparison purposes, the traditional Muskingum method was also implemented in RAPID in order to assess the performance of the matrix-based Muskingum method developed herein. Figure 11 shows a comparison of computing time between the traditional Muskingum method shown in Eq. (4) applied consecutively from upstream to downstream and the matrix-based Muskingum method used in RAPID. Only one processor is used for all results in Fig. 11 but the computation method differs. The matrix being triangular (see appendix B), solving the linear system of Eq. (1) can be limited to matrix preconditioning if using only one processing core. In a parallel computing environment, is separated in blocks, each diagonal block corresponding to a subbasin. With several processing cores, matrix preconditioning would be sufficient to solve Eq. (1) if could be made diagonal by blocks, each diagonal block being a triangular matrix. In a river network that is fully interconnected, such as that of the upper Mississippi River basin, cannot be made diagonal by blocks because the connectivity between adjacent subbasins would always appear as an element in an off-diagonal block matrix [cf. Equation (23) when i and j are connected but belong to different subbasins]. This limitation would not apply if one was to compute the Mississippi River basin on one (or on one set of) processing core(s) and the Colorado River basin on another (or on another set of) processing core(s), for example. Therefore, when solving Eq. (1) on several processing cores for the upper Mississippi River basin, preconditioning is not sufficient and iterative methods need be used. An iterative method implies several computations including preconditioning, matrix-vector multiplication, and calculation of residual norm at each iteration.
On one processing core, solving the matrix-based Muskingum method with preconditioning only is about twice as long as solving the traditional Muskingum method, as shown in Fig. 11. This extra time can be explained because the computation of the right-hand side of Eq. (1) is approximately as expensive as solving the traditional Muskingum method and approximately as expensive as preconditioning. However, the computation of the right-hand side is done only once per time step, regardless of the number of iterations if using an iterative linear solver, and scales very well because all operations require no communication except for the product , which involves little communication. Figure 11 also shows the computing time when using an iterative solver. The sole purpose of the first iteration in an iterative solver is to determine an initial residual error that is to be used as a criterion for convergence in following iterations. This first iteration mainly involves preconditioning and calculation of a residual norm. On one processing core only, the second iteration converges because preconditioning is sufficient. The two iterations and calculations of norms explain the doubling of computing time between preconditioning only and an iterative solver on one unique processing core that is shown in Fig. 11. Overall, the overhead created by an iterative solver over the traditional Muskingum method is about a factor of 4. Again, both preconditioning and calculation of residual norms scale well, although the latter can be limited by communications. Therefore, the main issue with using a matrix method is the number of iterations needed before the iterative solver converges because all other overhead dissipates with an increasing number of processing cores used. Surprisingly, the number of iterations needed for the iterative solver to converge increases much less quickly than the number of processing cores used, hence allowing us to gain total computation time with an increased number of processing cores and to produce results faster than the traditional Muskingum method, as shown on Fig. 12. This suggests that even in a basin where all river reaches are interdependent, some upstream and downstream subbasins can be computed separately in an iterative scheme given that they are distant enough from each other. The physical explanation is that flow waves are not fast enough to travel across the entire basin within one 15-min time step. This decoupling of computations could not be achieved by using the traditional version of the Muskingum method, since computations are not iterative and have to be performed going from upstream to downstream. Figure 12 shows that the total computing time with an iterative matrix solver on 16 processing cores is almost a third of the time needed by the traditional Muskingum method and keeps decreasing further with more processing cores. However, as the number of cores increase, the relative importance of the computation of residual norms within the iterative solver increases up to taking almost half of the solving time, as shown in Fig. 12. This limitation will most likely disappear as computer technology advances and communication time decreases. One should note that the output files match on a byte-to-byte basis and, hence, model computations are strictly the same regardless of the method used (i.e., traditional Muskingum method or matrix-based Muskingum method, iterative or not). This strict similarity between output files and the slow increase in iterations are also verified for the study of the Guadalupe and San Antonio River basins presented above; hence, the use of synthetic data and simplified model parameters does not influence the trends in speedup.
Computing loads are balanced for all simulations in this study; that is, the number of river reaches assigned to each processing core is almost identical across cores. Figure 13 shows how subbasins of the upper Mississippi River basin are divided among processing cores as well as the longest river path of the basin. The longest path goes through 8 subbasins on 8 cores and 13 subbasins on 16 cores. If one were to apply the traditional Muskingum method on several processing cores with the division in subbasins shown in Fig. 13, computations would have to be made sequentially from upstream to downstream, each core having to wait for its upstream core to be done prior to starting its work. Hence, assuming that the total computing time can be evenly divided by the total number of nodes and neglecting communication overhead, one could only hope to decrease computing time by a factor of (no gain) for 8 cores and by a factor of 16/13 = 1.23 for 16 cores. The iterative matrix solver provides much better results (a decrease by a factor of 2.90 for 16 cores).
River flow is a causal phenomenon that mainly goes downstream. Therefore, when using an upstream-to-downstream computation scheme and unless dealing with completely separated river basins, one cannot expect to obtain perfect speedup (i.e., decreasing of computing time by a factor equal to the number of cores). However, today’s supercomputers having tens of thousands of computing cores, one could leverage such power to save human time. Additionally, the matrix method developed here can be directly applied to a combination of independent river basins, in which case speedup would be ideally perfect. Furthermore, matrix methods such as the one developed here could be adapted to more complex river flow equations—like variable-parameter Muskingum methods or schemes allowing for backwater effects—in order to save total computing time. Finally, the splitting up into subbasins used here is very simple, and optimizing this partition by limiting connections between subbasins or taking into account flow wave celerities relatively to basin sizes could help limit the number of communications and the number of iterations, respectively, in the linear system solver.
NHDPlus is a GIS dataset that describes the networks of mapped rivers and water bodies of the United States. One of the main advantages of NHDPlus is that connectivity information for the river networks is available. Therefore, this dataset offers possibilities for the development of river routing models that simultaneously calculate flow and volume of water in all water bodies of the nation. Furthermore, the USGS National Water Information System has thousands of gauges located on the NHDPlus network that can be used to assess the quality of such river models across river basins (not only at basin outlets). The research presented in this paper investigates how to develop a river network model using NHDPlus networks and how to assess model computations and optimize model parameters with USGS streamflow measurements. All tools and datasets used herein are available for the contiguous United States, but this research addresses two smaller domains. The combination of the Guadalupe and San Antonio River basins in Texas is used in a 4-year case study, and the upper Mississippi River basin is used in a speedup study with synthetic data. Graph theory is applied to a river network to create a network matrix that is used to develop a vector-matrix version of the Muskingum method and applied in a new river network model called RAPID. It has been shown that a GIS-based hydrographic dataset can be used as the river network for a river model to compute flow in large networks of thousands of reaches, including ungauged locations. A simple flux coupler for connecting a land surface model with an NHDPlus river network is presented. No horizontal routing of flow from the land surface to the river network is used in this study; such an addition would help improve model calculations. An inverse method is developed to estimate model parameters in RAPID using available gauge measurements located across the river basins. Wave celerities are estimated in several locations of the basin studied. RMSE and Nash efficiency of computed flow rates in four RAPID simulations are compared with a basic lumped model where runoff is directly accumulated at the gauge, with gauge measurements, and among themselves. RAPID produces better RMSE and Nash efficiency than the lumped model, and the improvements are statistically significant. Although the quality of RAPID calculations is tied to the quantity of runoff generated by the land surface model that provides runoff, mass is conserved within RAPID since the average flow rate is conserved. Spatial variability of parameters enhances the RMSE and Nash efficiency of RAPID calculations. Wave celerities are reproduced within a few percents of the model proposed, although wave celerities closer to those estimated from gauge data generally deteriorate the other statistics of calculations. This deterioration might be due to runoff being produced too slowly or too far upstream of each gauge. The parameters used in this study are simple, but could be improved based on information available in NHDPlus such as slope, mean flow, and velocity of all reaches, or by using modified versions of the Muskingum method with time-variable parameters, although the latter would necessitate modification of the optimization procedure developed herein. The matrix formulation in RAPID can be transferred in a parallel computing environment. A synthetic study of the upper Mississippi River basin shows that although a large initial overhead is added by the matrix method, this overhead decreases with increasing number of processing cores. More importantly, an iterative matrix solver allows decoupling of subbasins—even if the main river basin is fully interconnected—hence permitting computation of subbasins separately if they are distant enough from each other. As consequence, while producing the exact same results as the traditional Muskingum method, the matrix-based Muskingum method decreases the total computing time when run on several processing cores. Such a gain in computing time would be highly beneficial if addressing larger scales, like the entire contiguous United States, which would represent a square matrix of the size of 3 million.
This work was partially supported by the U.S. National Aeronautics and Space Administration under the Interdisciplinary Science Project NNX07AL79G; by the U.S. National Science Foundation under project EAR-0413265: CUAHSI Hydrologic Information Systems; by Ecole des Mines de Paris, France; and by the American Geophysical Union under a Horton (Hydrology) Research Grant. The authors wish to thank the PETSc and TAO developers, especially Dr. Barry Smith, Dr. Matthew Knepley, Dr. Satish Balay and Dr. Jason Sarich for their continuous assistance throughout the development of RAPID. Thank you to Dr. Karl Shultz from TACC for his help regarding the handling of inputs and outputs on supercomputers. The computing resources were provided by TACC, which is gratefully acknowledged. Thank you to Dr. Stefano Orlandini for suggestions on an early version of this work. The authors are thankful to the two anonymous reviewers and to the editor for their valuable comments and suggestions that helped improved the original version of this manuscript.
Implementation of RAPID
The river network routing model is coded in FORTRAN 90 using the Portable, Extensible Toolkit for Scientific Computation (PETSc) mathematical library (Balay et al. 1997, 2008, 2009) and the Toolkit for Advanced Optimization (TAO) optimization library (McInnes et al. 2009). PETSc can be used to create matrices and vectors and to apply a variety of linear operations such as matrix-vector multiplications or linear system solving. TAO offers multiple methods for unconstrained and constrained optimization. Both PETSc and TAO are built upon the Message Passing Interface (see special issue of International Journal of High Performance Computing Applications, 1994, Vol. 8, No. 3–4)—a standard for communications between processing cores—and can seamlessly be run in a sequential or a parallel computing environment. In this study, sparse matrices are stored using the sequential AIJ format when using one processing core and the MPIAIJ format when using several cores. Linear systems are solved within PETSc either by preconditioning only or with preconditioning associated with a Richardson method. The preconditioning methods used herein are incomplete lower upper (ILU) on one processing core, and block Jacobi on several cores. The optimization method used in TAO is a line search algorithm called the Nelder–Mead method. The Network Common Data Form (NetCDF) file format (Rew and Davis 1990) is utilized for both inputs and outputs. RAPID is run on single- and multiple-processor workstations as well as on Lonestar, a supercomputer running at the Texas Advanced Computing Center (TACC). This Dell Linux Cluster has 1460 nodes, each node with 8 GB of memory and with two dual-core sockets. Lonestar has a total of 5840 computing cores.
NHDPlus Used in RAPID
NHDPlus (Horizon Systems Corporation 2007) is a geographic information system (GIS) dataset for the hydrography of the United States. This dataset provides the mapped streams and rivers as well as the catchments that surround them. NHDPlus is based on the medium-resolution 1:100 000-scale national hydrographic dataset (NHD). One of the main improvements in NHDPlus is the network connectivity available in the value-added attributes (VAA) table for the river network. Each NHDPlus reach in the national network is assigned a unique integer identifier called COMID. NHDPlus catchments also have a COMID, the same COMID being used for the reach and its local contributing catchment. Nodes are located at the two ends of each NHDPlus river reach. A unique integer identifier is given to all nodes in the national river reach network. The VAA table includes FromNode and ToNode fields that identify which node is upstream and which is downstream of a given reach. Two reaches that are connected in a river network share a node, and the reach j flows into the reach i if ToNode(j) = FromNode(i). The NHDPlus connectivity between reaches, catchments, and nodes is illustrated for three catchments of the Guadalupe and San Antonio River basins in Fig. B1.
In its current formulation, RAPID can handle several upstream reaches but only one unique downstream reach. However, divergences exist in mapped river networks, as they do in NHDPlus. The VAA table offers a Divergence field to each of the river reaches (with values of 0—not part of a divergence, 1—main path of a divergence, and 2—minor path of a divergence). In the current formulation of RAPID, the main part of a divergence carries all the upstream flow. The FromNode, ToNode, and Divergence fields are used to populate the network matrix given in Eq. (5), by means of the following logical statement:
where Ni,j is the element of located at row i and column j. Therefore, upstream-to-downstream connection is conserved if the downstream reach is the major branch of a divergence or if it is not part of a divergence at all, but the connection is not made for a minor branch of a divergence.
The VAA table also has information on the relative location—upstream or downstream—of NHDPlus reaches. This information is available in a field called Hydroseq, consisting of a unique integer attributed to all NHDPlus reaches. Sorting the Hydroseq field in decreasing order prior to computations guarantees that all upstream elements are computed prior to solving the flow equations for any given river reach. This organization of computations allows the matrix of Eq. (1) to be made lower triangular, which increases the ease and speed of solving this linear system.