Underlying Fundamentals of Kalman Filtering for River Network Modeling

Charlotte M. Emery Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California

Cédric H. David Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California

Konstantinos M. Andreadis Department of Civil and Environmental Engineering, University of Massachusetts Amherst, Amherst, Massachusetts

Michael J. Turmon Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California

John T. Reager Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California

Jonathan M. Hobbs Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California

Ming Pan Department of Civil and Environmental Engineering, Princeton University, Princeton, New Jersey

James S. Famiglietti Global Institute for Water Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

Edward Beighley Department of Civil and Environmental Engineering, Northeastern University, Boston, Massachusetts

Matthew Rodell NASA Goddard Space Flight Center, Greenbelt, Maryland


Abstract

The grand challenge of producing hydrometeorological estimates at all times and in all places has motivated the fusion of sparse observations with dense numerical models, with a particular interest in discharge for river modeling. Ensemble methods are largely preferred because they enable the estimation of error properties, but at the expense of computational load and generally with underestimated errors. These imperfect stochastic estimates motivate the use of correction methods, that is, error localization and inflation, although the physical justifications for their optimality are limited. The purpose of this study is to use one of the simplest forms of data assimilation applicable to river modeling and to reveal the underlying mechanisms impacting its performance. Our framework, based on assimilating daily averaged in situ discharge measurements to correct daily averaged runoff, was tested over a 4-yr case study of two rivers in Texas. Results show that under optimal conditions of inflation and localization, discharge simulations are consistently improved such that the mean values of Nash–Sutcliffe efficiency are enhanced from −11.32 to 0.55 at observed gauges and from −12.24 to −1.10 at validation gauges. Yet, parameters controlling the inflation and the localization have a large impact on the performance. Further investigations of these sensitivities showed that optimal inflation occurs when compensating exactly for discrepancies in the magnitude of errors, while optimal localization matches the distance traveled during one assimilation window. These results may be applicable to more advanced data assimilation methods as well as to larger applications motivated by upcoming river-observing satellite missions, such as NASA’s Surface Water and Ocean Topography mission.

Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JHM-D-19-0084.s1.

© 2020 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Charlotte M. Emery, charlotte.emery@jpl.nasa.gov

1. Introduction

Hydrological models are essential tools to simulate the fluxes of water and associated storage changes within continental surfaces and hence to understand the terrestrial water cycle (Döll et al. 2016). The first hydrological models emerged during the second half of the nineteenth century as empirical rainfall–runoff models and were initially conceived to predict peak flows (Todini 2007). Nowadays, a noticeable portion of hydrologic models, known as river routing models (RRMs; Perumal and Price 2013), still focuses on river discharge. These RRMs are primarily concerned with spatiotemporal mechanisms for the horizontal propagation of water within river systems and leave the vertical exchange of water between the land and the atmosphere to land surface models (LSMs).

A diversity of RRMs exists in the published literature and the complexity of their driving equations varies from the complete Saint-Venant equation (Barré de Saint-Venant 1871) to simplifications such as the kinematic wave equation (Weinmann 1979) or the Muskingum method (McCarthy 1938; Cunge 1969). Models such as Total Runoff Integrating Pathways (TRIP; Oki and Sud 1998), PCRaster Global Water Balance model (PCR-GLOBWB; van Beek and Bierkens 2008), or Hydrological Modeling and Analysis Platform (HyMAP; Getirana et al. 2012) use the kinematic wave approximation while others such as the Global Water Availability Assessment (GWAVA; Meigh et al. 1999), Hillslope River Routing (HRR; Beighley et al. 2009), or the Routing Applications for Parallel Computation of Discharge model (RAPID; David et al. 2011b) are based on the Muskingum/Muskingum–Cunge methods. Alternatives like LISFLOOD-FP (Bates and De Roo 2000), CaMa-Flood (Yamazaki et al. 2011), or MGB-IPH (Paiva et al. 2013) currently employ more advanced yet simplified versions of the Saint-Venant equations (Bates et al. 2010).

Despite significant advancements in river modeling, RRMs are still plagued by unavoidable inherent uncertainties that originate from an incomplete knowledge of the physics and by the simplifying assumptions that are necessary to the solutions of model equations. Other sources of uncertainties include the approximations resulting from numerical discretization and numerical resolution and limited knowledge of model parameters and model inputs. The combination of these sources of uncertainty results in nonnegligible uncertainties in the outputs of river models.

Available observations along the Earth surface water networks are key allies for river models. While in situ observations have been declining (Vörösmarty et al. 2001) globally, a considerable increase in the availability of spaceborne observations (Alsdorf et al. 2007; McCabe et al. 2017) is helping to fill this gap, and these observations together provide valuable information on surface water extent and surface water elevation from which discharge can be estimated (e.g., Durand et al. 2016). While the relative accuracy of observations and models can be debated, the spatiotemporal coverage of observations remains sparse, and therefore motivates a growing interest in techniques that can coherently merge observations with models. Such techniques are generally known as data assimilation (DA) and consist in combining information from model simulations with observations while accounting for their respective uncertainties in order to improve model estimates (Liu and Gupta 2007).

DA methods for Earth science were first developed and used for atmospheric science (Daley 1991; Kalnay 2003) and oceanography (Ghil and Malanotte-Rizzoli 1991; Bertino et al. 2003). The use of DA for hydrology is relatively more recent but has been rising during the past two decades in part due to an increase in available remotely sensed hydrologic data (Liu et al. 2012). Various components of the terrestrial water cycle have benefited from DA studies including snow cover (Rodell and Houser 2004; Andreadis and Lettenmaier 2006; Zaitchik and Rodell 2009; DeChant and Moradkhani 2011; De Lannoy et al. 2012; Oaida et al. 2019), soil moisture (Pauwels et al. 2001; Brocca et al. 2010; Montzka et al. 2011), land surface temperature (Reichle et al. 2010; Campo et al. 2013), evapotranspiration and vegetation characteristics (Schuurmans et al. 2003; Fang et al. 2011), and terrestrial water storage (Zaitchik et al. 2008; Forman et al. 2012; Eicker et al. 2014; van Dijk et al. 2014; Kumar et al. 2016; Girotto et al. 2017). River modeling has also seen the application of DA methods leveraging measurements of surface water levels (Romanovicz et al. 2006; Matgen et al. 2010; Biancamaria et al. 2011; Pereira-Cardenal et al. 2011; Michailovsky et al. 2013; Pedinotti et al. 2014), river discharge (Vrugt et al. 2006; Clark et al. 2008; Moradkhani et al. 2012; Rakovec et al. 2012; Coustau et al. 2015; McMillan et al. 2013; Abaza et al. 2014; Rafieeinasab et al. 2014; Bauer-Gottwein et al. 2015; Li et al. 2015; Ercolani and Castelli 2017; Emery et al. 2018), or both discharge and water level (Paiva et al. 2013), and even in combination with soil moisture data (Aubert et al. 2003; López López et al. 2016). The objective of these DA methods for river modeling—regardless of the type of observation used—has largely remained to directly or indirectly correct estimates of river discharge.

A variety of DA approaches exist but the most common techniques used in the aforementioned river DA studies appear to be ensemble-based methods such as the ensemble Kalman filter (EnKF; Evensen 1994) or the particle filter (PF; Del Moral 1996). These ensemble-based methods are advantageous because they can efficiently deal with nonlinearities in hydrological systems while remaining relatively easy to implement regardless of model characteristics (Liu et al. 2012). Perhaps most favorably for these ensemble-based methods, all essential components of DA such as the error covariance matrices or the observation operator—together relating the modeled, corrected, and observed variables along with their respective uncertainties—are stochastically estimated from an ensemble of model simulations during the assimilation procedure. Such approaches differ from variational DA methods (Le Dimet and Talagrand 1986; Courtier et al. 1994) or even from the traditional Kalman filter (Kalman 1960) that require an explicit development of such components prior to performing data assimilation. As a result, the EnKF is rather dominant in the published literature (Vrugt et al. 2006; Pereira-Cardenal et al. 2011; Rakovec et al. 2012; Paiva et al. 2013; Abaza et al. 2014; Rafieeinasab et al. 2014; López López et al. 2016; Emery et al. 2018), along with some extensions of the EnKF such as the ensemble square-root filter (Clark et al. 2008), the recursive EnKF (McMillan et al. 2013), the ensemble Kalman smoother (Li et al. 2015), and the local ensemble Kalman smoother (Biancamaria et al. 2011).

Despite the broad benefits of ensemble-based DA methods, such approaches can be—by definition—computationally demanding, and can become cost prohibitive as the resolution and size of study domains increase (Liu et al. 2012). This limitation, while perhaps negligible in the earliest studies focusing on local scales (Vrugt et al. 2006; Romanovicz et al. 2006; Clark et al. 2008) becomes more acute with studies of the world’s largest rivers (Biancamaria et al. 2011; Pedinotti et al. 2014; Paiva et al. 2013; Emery et al. 2018). One potential mitigation strategy for limiting computational costs is the use of simplified versions of the forward model equations as part of the DA method, though it is rather rarely employed (e.g., Margvelashvili et al. 2016). In addition, the stochastic estimation of DA components has motivated the development of correction approaches designed to alleviate some of their imperfections. These corrections include modifications to the error covariance matrices such as localization (Greybush et al. 2011; Sakov and Bertino 2011) to spatially focus the impact of data assimilation, and inflation (Anderson and Anderson 1999; Anderson 2007) to avoid filter divergence (due to a collapsed ensemble spread) and underestimation of the error covariance (caused by insufficient model error specification and/or a limited ensemble size). The application of such correction methods in river DA requires data assimilation expertise and, notwithstanding their effectiveness, may benefit from further justifications on the physical processes defining them.

The purpose of this study is therefore to investigate these parameters, namely the inflation factor and the localization radius, specifically when used in the context of river modeling. As they are not known a priori, our objective is also to reveal the underlying processes determining their optimal value. We use the RAPID model (David et al. 2011b) because its linear equations allow for a classical Kalman filter approach that circumvents the need for ensembles during assimilation and the large problem sizes associated with these methods. Published applications of RAPID range in domain size from 30 000 to 3 000 000 km² and in spatial resolution from 2 to 5 km (David et al. 2011a,b, 2013a,b, 2015). The underlying code for RAPID has benefited from dedicated efforts toward decreased computational costs (David et al. 2013a, 2015). Previous studies have demonstrated a general capability of RAPID to reproduce observed discharge (David et al. 2011b,a, 2013b). Yet, because the traditional Muskingum method at the core of RAPID remains simple, the model performance is limited in regions that are subject to floods, backwater effects, or active anthropogenic storage of surface water, as these processes are not currently accounted for. The existing limitations of RAPID therefore further motivate the inclusion of a DA capability.

The paper is organized as follows. The core routing equations of RAPID are first summarized and followed by a description of a Kalman filter implementation that includes an efficient approximation of its routing procedure. We then present an application to the combined San Antonio and Guadalupe River basins in Texas (see Fig. 1) using state-of-the-art hydrographic and meteorological inputs and discuss our evaluation strategy. Our results follow, along with their implications for the characteristic spatiotemporal scales of the physical processes involved. Core DA components including error covariances and their inflation or localization are then discussed, along with their characteristic spatiotemporal scales.

Fig. 1.

(a) The Guadalupe River and San Antonio River basins in the United States. (b) The NHDPlus river network with location of the main stems for the Guadalupe and San Antonio Rivers, the 23 assimilation gauges, and the 13 validation gauges (both from USGS). The downstream-most gauges used in Fig. 4 and their closest upstream gauges are also displayed.

Citation: Journal of Hydrometeorology 21, 3; 10.1175/JHM-D-19-0084.1

2. The RAPID model

a. The RAPID model

The RAPID model (David et al. 2011b) is a river routing model based on the traditional Muskingum method (McCarthy 1938). The Muskingum method relates the outflow of a single channel to its inflow:
$$Q_{\mathrm{out}}(t+\Delta t) = C_1 Q_{\mathrm{in}}(t+\Delta t) + C_2 Q_{\mathrm{in}}(t) + C_3 Q_{\mathrm{out}}(t), \tag{1}$$
where t is the time, Δt is the model time step, Qin is the channel inflow, and Qout is the channel outflow. C1, C2, and C3 are temporally constant parameters depending on flow wave propagation characteristics and are defined as a function of a flow wave propagation time k and a dimensionless weighting parameter x characterizing the relative influence of the inflow and the outflow—although this parameter has been shown to have limited impacts on simulations (Koussis 1978)—such that
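$$C_1 = \frac{\dfrac{\Delta t}{2} - kx}{k(1-x) + \dfrac{\Delta t}{2}}, \qquad C_2 = \frac{\dfrac{\Delta t}{2} + kx}{k(1-x) + \dfrac{\Delta t}{2}}, \qquad C_3 = \frac{k(1-x) - \dfrac{\Delta t}{2}}{k(1-x) + \dfrac{\Delta t}{2}}. \tag{2}$$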
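As a quick numerical check of the coefficient definitions above, the following minimal Python sketch evaluates C1, C2, and C3 for a single reach. The function name and the values of k and x are illustrative assumptions and are not taken from this study; only the 15-min routing time step corresponds to the value used later in the paper.

```python
def muskingum_coefficients(k, x, dt):
    """Return (C1, C2, C3) for one reach.

    k  : flow wave propagation time (same unit as dt)
    x  : dimensionless weighting parameter
    dt : model time step
    """
    denom = k * (1.0 - x) + dt / 2.0
    c1 = (dt / 2.0 - k * x) / denom
    c2 = (dt / 2.0 + k * x) / denom
    c3 = (k * (1.0 - x) - dt / 2.0) / denom
    return c1, c2, c3


# Illustrative values only: k = 1 h, x = 0.1, dt = 15 min (all in seconds).
c1, c2, c3 = muskingum_coefficients(k=3600.0, x=0.1, dt=900.0)
coefficient_sum = c1 + c2 + c3
```

By construction the three coefficients sum to one, which is why a constant inflow passes through a reach unchanged once a steady state is reached.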
In the more general case of a river basin, a river network is used and composed of a set of individual channels, or reaches, that are connected to one another and “oriented” such that water flows downstream. Each individual reach also receives water inflow Qe from the exterior of the river network as a result of the accumulation of surface and subsurface runoff over the reach catchment. Such a river network approach is used for RAPID in which Eq. (1) is adapted into the following linear system (David et al. 2011b):
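$$(\mathbf{I} - \mathbf{C}_1\mathbf{N})\,\mathbf{Q}(t+\Delta t) = (\mathbf{C}_1 + \mathbf{C}_2)\,\mathbf{Q}^e(t) + (\mathbf{C}_3 + \mathbf{C}_2\mathbf{N})\,\mathbf{Q}(t), \tag{3}$$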
where Q and Qe are column vectors of size nr, the number of river reaches, and containing the river discharge and the input runoff at each reach, respectively. All matrices are of size nr × nr: I is the identity matrix, C1, C2, and C3 are diagonal parameter matrices, and N is the sparse network matrix representing the connectivity among the reaches of the river network.

The matrix N is built such that the matrix element at row i and column j is equal to 1 if the reach j flows into reach i, while all other elements are 0. In RAPID, any reach can receive water from several upstream reaches, but the outflow goes into only one unique downstream reach. The diagonal elements of the parameter matrices C1, C2, and C3 are the Muskingum parameters C1,j, C2,j, and C3,j for each reach j, respectively. These parameters are different for each reach but constant in time. The surface and subsurface runoff inputs Qe to RAPID are obtained from the outputs of an LSM. LSM outputs are generally provided over a regular mesh grid that has a coarser spatial resolution than the averaged catchment area of RAPID reaches, hence allowing for individual catchments to receive runoff contribution from a unique LSM grid cell (David et al. 2013b).

Notably, the core RAPID formulation given in Eq. (3) is a linear system. In addition, if the reaches are sorted from upstream to downstream in all vectors and matrices, Eq. (3) turns into a lower triangular system which facilitates its solving. Further, the sparsity of the river network matrix N limits the computer memory requirements of RAPID. The initial implementation of RAPID (David et al. 2011b) already allowed for efficient execution on both serial and parallel computing environments, and subsequent studies (David et al. 2013a, 2015) have enhanced the computational efficiency of the software.
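The lower triangular structure noted above means that one time step of Eq. (3) can be solved by forward substitution, visiting reaches from upstream to downstream. The sketch below illustrates this on a hypothetical three-reach network in plain Python. It is not the RAPID implementation, and the function name, network, and coefficient values are assumptions made for illustration.

```python
def route_one_step(upstream, c1, c2, c3, q_old, q_ext):
    """Advance discharge by one time step on a network sorted upstream to downstream.

    upstream[i] lists the indices j of the reaches flowing into reach i (the
    nonzero entries of row i of N); every j must be smaller than i.
    """
    q_new = []
    for i in range(len(q_old)):
        inflow_old = sum(q_old[j] for j in upstream[i])  # (N Q(t))_i
        inflow_new = sum(q_new[j] for j in upstream[i])  # (N Q(t+dt))_i, already computed
        rhs = (c1[i] + c2[i]) * q_ext[i] + c3[i] * q_old[i] + c2[i] * inflow_old
        # Row i of (I - C1 N) Q(t+dt) = rhs, solved for Q_i(t+dt).
        q_new.append(rhs + c1[i] * inflow_new)
    return q_new


# Hypothetical network: reaches 0 and 1 both flow into reach 2.
upstream = [[], [], [0, 1]]
# Illustrative Muskingum coefficients, identical for all reaches and summing to one.
c1 = [90.0 / 3690.0] * 3
c2 = [810.0 / 3690.0] * 3
c3 = [2790.0 / 3690.0] * 3

q_ext = [1.0, 2.0, 3.0]  # lateral inflow per reach
q_old = [1.0, 2.0, 6.0]  # steady state for this inflow: the outlet carries 1 + 2 + 3
q_next = route_one_step(upstream, c1, c2, c3, q_old, q_ext)
```

Starting from the steady state, the step leaves the discharge unchanged up to rounding, which provides a convenient sanity check on the sign conventions of Eq. (3).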

However, while RAPID can efficiently be run over large domains, the relative simplicity of its governing equations results in coarse approximations of some hydrological processes. Additionally, imperfect model inputs and model parameters together also cause further limitations in the modeling system. The fusion of discharge observations with discharge simulations from RAPID through data assimilation therefore offers one potential way to alleviate existing challenges in the accurate estimation of discharge in surface water networks. The choice of the specific assimilation approach shall also be made cognizant of past efforts toward computational efficiency.

RAPID, as a forward model, was initially developed, tested, and validated (David et al. 2011b) over the Guadalupe and San Antonio River basins (Fig. 1), two basins in Texas with drainage areas of 17 453 and 10 800 km², respectively. The same study domain is used here to validate the proposed data assimilation method.

b. Data

The enhanced National Hydrography Dataset (NHDPlus; McKay et al. 2019) provides a description of the river network for the combined San Antonio and Guadalupe River basins that is composed of 5175 reaches (average length of 3 km) and their contributing catchments (average area of 5.11 km²). This NHDPlus river network was initially used for RAPID in David et al. (2011b) and is also used in this study.

The lateral inflow from the land into the river network is computed based on runoff estimates from version 4.0.5 of the VIC land surface model (Liang et al. 1994; Wood et al. 1997) as provided by phase 2 of the North American Land Data Assimilation System (NLDAS2; Xia et al. 2012a,b). The VIC runoff, available at a 1/8° spatial resolution and at both hourly and monthly temporal resolutions, was retrieved here for a 4-yr study period ranging from 2010 to 2013. A 3-hourly lateral inflow is used as input to RAPID, as in previous studies (David et al. 2011b,a, 2013a,b, 2015), and is derived here from the temporal accumulation of hourly NLDAS2 VIC runoff that is spatially aggregated using the catchment centroid method of David et al. (2013b). Moreover, two additional LSMs are included in NLDAS2, namely Noah (Betts et al. 1997; Chen et al. 1997; Ek et al. 2003) and Mosaic (Koster and Suarez 1994), and provide runoff outputs at the same spatiotemporal resolution as the VIC outputs.

The set of Muskingum (k, x) parameters used herein was developed through calibration in a previous study (David et al. 2011a), where it is denoted using α superscripts, $(k^{\alpha}, x^{\alpha})$. Note that this set of parameters was calibrated using a temporal period and a runoff dataset that are both different from those used here. Our approach therefore guarantees that the evaluation of the proposed data assimilation method is performed with parameters that were not specifically tailored for the runoff that is itself the subject of our assimilation procedure.

Observed daily averaged discharge estimates were obtained from the U.S. Geological Survey (USGS) National Water Information System (NWIS). A total of 36 USGS gauges in the Guadalupe and San Antonio basins that have a complete daily data record during our 4-yr study period were retrieved as part of this study.

3. Development of a data assimilation capability for RAPID

While a broad range of advanced DA methods has been used in the aforementioned review of available literature, the classical Kalman filter (KF; Kalman 1960) appears to be best suited for RAPID. This choice avoids the costly computations required for more complex ensemble DA approaches, and is consistent with the linearity of RAPID that is required by the assumptions of the KF. Perhaps more importantly, the use of a KF here permits an investigation of some of the fundamental aspects of various DA methods applied to river modeling in an attempt to reveal the underlying physical processes impacting their performance. Note that development of a Kalman filtering approach for RAPID is also an improvement over the existing direct insertion method that was developed in David et al. (2011a).

a. General aspects of the Kalman filter

The Kalman filter is a sequential DA algorithm, that is, one in which a new correction is performed each time a new observation is available. The assimilation cycle of the KF therefore corresponds to the time window between two subsequent observations. Two primary assumptions are made in the development of the KF. First, it is presumed that the model being corrected has linear dynamics. Second, all errors (i.e., control variables, model, observations), represented as random variables, are considered to follow a Gaussian distribution with a zero mean (i.e., unbiased errors), hence ensuring that error statistics are fully determined by their associated error covariance matrices.

The KF assimilation cycle k is divided into two steps. The first “background” step temporally propagates the model from the last update to the time when a new observation is available. This mechanism, sometimes referred to as the “forecast” step or “prediction” step, provides an a priori estimate of the modeled state at the new observation time from a direct execution of the model based on the background control variables $\mathbf{x}_k^b$. The second “analysis” step, also known as the “update” step, corrects these current control variables by combining the modeled state with the observed state at the end of the assimilation window while accounting for their respective uncertainties.

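Mathematically, given the numerical operator $\mathbf{H}_k$ that linearly transforms a control variable $\mathbf{x}_k$ into an observed variable $\mathbf{y}_k = \mathbf{H}_k\mathbf{x}_k$, the analysis step produces a corrected control variable $\mathbf{x}_k^a$ based on the discrepancies between the simulated observables $\mathbf{y}_k^b = \mathbf{H}_k\mathbf{x}_k^b$ and the observed state $\mathbf{y}_k^o$, along with the error covariance matrix $\mathbf{P}_k^b$ of the control variables and the error covariance matrix $\mathbf{R}_k$ of the observations. These two KF steps are contained in the following equation:
$$\mathbf{x}_k^a = \mathbf{x}_k^b + \mathbf{P}_k^b\mathbf{H}_k^{T}\left(\mathbf{H}_k\mathbf{P}_k^b\mathbf{H}_k^{T} + \mathbf{R}_k\right)^{-1}\left(\mathbf{y}_k^o - \mathbf{H}_k\mathbf{x}_k^b\right). \tag{4}$$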

Note that the most general definition of the KF also includes two equations allowing for the update of the control variable error covariance matrix at each analysis step. However, given our specific data assimilation setup (section 3), we opted here for a common approach in which the a priori error covariance matrix $\mathbf{P}_k^b$ is kept constant for all assimilation cycles and is hereafter denoted as $\mathbf{P}^b$.
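A minimal NumPy sketch of the analysis equation above, with a background error covariance that is simply passed in unchanged at every cycle, is given below. The function name and the toy problem (two control variables, one observation) are assumptions made for illustration and do not correspond to the study configuration.

```python
import numpy as np


def kalman_analysis(x_b, P_b, H, R, y_o):
    """Return x_a = x_b + P_b H^T (H P_b H^T + R)^-1 (y_o - H x_b)."""
    innovation = y_o - H @ x_b
    innovation_cov = H @ P_b @ H.T + R
    # Solve the linear system rather than forming the explicit inverse.
    weighted_innovation = np.linalg.solve(innovation_cov, innovation)
    return x_b + P_b @ H.T @ weighted_innovation


# Toy problem: the single observation is the sum of the two control variables.
x_b = np.array([1.0, 1.0])     # background control variables
P_b = np.eye(2)                # background error covariance, kept constant
H = np.array([[1.0, 1.0]])     # linear observation operator
R = np.array([[2.0]])          # observation error covariance
y_o = np.array([4.0])          # observation

x_a = kalman_analysis(x_b, P_b, H, R, y_o)
```

In this toy case the background misfit of 2 is split between the two control variables according to their (equal) background variances, and only part of it is corrected because the observation is not assumed perfect.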

Figure 2 summarizes all the main features of our implementation of the classical Kalman filter to RAPID. Each component of the DA scheme is further described below.

Fig. 2.

The data assimilation approach used in this study over the assimilation cycle k includes 1) a background step using a priori runoff data where the associated variables are presented in blue, 2) an analysis step correcting the daily averaged runoff forcing through Kalman filtering using the observed variables displayed here in green, and 3) a rerun using the updated runoff data where the associated corrected variables are shown in red.


b. Observations $\mathbf{y}_k^o$ and assimilation window

The observed variables used herein consist of daily-averaged discharge measurements from in situ gauges. Only a subset $n_y$ of all $n_r$ reaches in the study domain is observed, and all gauges retained have observations available every day. The length of any assimilation cycle is therefore set to one day. All available observations during the daily assimilation cycle k are gathered in the observation vector $\mathbf{y}_k^o$ of size $n_y$.

c. Control variables $\mathbf{x}_k$

The variables involved in any execution of RAPID, see Eq. (3), are the lateral inflow $\mathbf{Q}^e$ into the river network, the Muskingum parameters (k, x), and the simulated discharge $\mathbf{Q}$ within the network. DA methods can be used to correct any one of these variables if selected as the control variable. The selection of the control variable then drives the derivation of the specific KF equation to be used. We chose not to focus on model parameters here, in light of our previous work on automatic calibration of RAPID (David et al. 2011a,b, 2013b). Because the data assimilation platform was developed concurrently with the runoff error propagation model published in David et al. (2019), and because the Kalman filter requires an explicit error model for the control variable, the lateral inflow $\mathbf{Q}^e$ was selected as the control variable for this study. This DA practice was first proposed and tested in Pan and Wood (2013) and later studied in Fisher et al. (2020) and Yang et al. (2019).

Subdaily lateral inflows are used in RAPID and a 3-hourly time step for these inputs has been common to all of our previous studies. However, the available daily-averaged discharge observations do not provide information at such fine temporal resolution, and any attempt to correct subdaily lateral inflows may also result in a cost-prohibitive implementation. It was therefore decided to use the daily-averaged lateral inflow $\overline{\mathbf{Q}^e_k}$ as the control variable, gathered in the vector $\mathbf{x}_k$:
$$\mathbf{x}_k = \overline{\mathbf{Q}^e_k} = \frac{1}{8}\sum_{i=1}^{8}\mathbf{Q}^e_{k\{i\}}, \tag{5}$$
where 8 is the number of 3-hourly time steps in one day and $\mathbf{Q}^e_{k\{i\}}$, $i = 1, \ldots, 8$, is the vector of input runoff at the 3 × (i − 1)th hour of the kth day. The control vector is therefore of size $n_r$, the number of reaches in the river network.

d. Observation operator $\mathbf{H}$ and innovation $\mathbf{d}_k$

The choice of the observation variable $\mathbf{y}_k$ and control variable $\mathbf{x}_k$ over the assimilation window k together dictates the derivation of the observation operator $\mathbf{H}$ such that $\mathbf{y}_k = \mathbf{H}\mathbf{x}_k$. In this study, $\mathbf{H}$ has to transform the daily averaged lateral inflow over the entire river network into the daily averaged simulated discharge at locations where observations are available. Such a process can be divided into three consecutive tasks: first, the computation of the simulated discharge at every time step of the current day k from the daily averaged lateral inflow; second, the computation of the daily averaged discharge; and third, the selection of the $n_y$ discharge estimates at the observed locations among the $n_r$ computed values.

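The recursive application of Eq. (3) with a temporally constant daily-averaged lateral inflow $\overline{\mathbf{Q}^e_k}$ leads to an analytical expression for the instantaneous simulated discharge at any one of the 96 routing time steps (of 15 min) in one day:
$$\forall i = 1, \ldots, 96, \quad \mathbf{Q}_{k\{i\}} = \sum_{p=0}^{i-1}\left[(\mathbf{I}-\mathbf{C}_1\mathbf{N})^{-1}(\mathbf{C}_3+\mathbf{C}_2\mathbf{N})\right]^{p}(\mathbf{I}-\mathbf{C}_1\mathbf{N})^{-1}(\mathbf{C}_1+\mathbf{C}_2)\,\overline{\mathbf{Q}^e_k} + \left[(\mathbf{I}-\mathbf{C}_1\mathbf{N})^{-1}(\mathbf{C}_3+\mathbf{C}_2\mathbf{N})\right]^{i}\mathbf{Q}_{k\{0\}}. \tag{6}$$
Averaging Eq. (6) over the current day results in an expression for the daily averaged simulated discharge $\overline{\mathbf{Q}_k}$:
$$\begin{aligned}\overline{\mathbf{Q}_k} &= \left\{\sum_{p=0}^{95}\frac{96-p}{96}\left[(\mathbf{I}-\mathbf{C}_1\mathbf{N})^{-1}(\mathbf{C}_3+\mathbf{C}_2\mathbf{N})\right]^{p}(\mathbf{I}-\mathbf{C}_1\mathbf{N})^{-1}(\mathbf{C}_1+\mathbf{C}_2)\right\}\overline{\mathbf{Q}^e_k} + \left\{\frac{1}{96}\sum_{p=1}^{96}\left[(\mathbf{I}-\mathbf{C}_1\mathbf{N})^{-1}(\mathbf{C}_3+\mathbf{C}_2\mathbf{N})\right]^{p}\right\}\mathbf{Q}_{k\{0\}} \\ &= \mathbf{A}^e\,\overline{\mathbf{Q}^e_k} + \mathbf{A}^0\,\mathbf{Q}_{k\{0\}}.\end{aligned} \tag{7}$$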
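The two matrices multiplying the daily averaged runoff and the initial discharge in Eq. (7) can be accumulated from successive powers of the one-step propagation matrix. The NumPy sketch below does so with dense matrices on a hypothetical three-reach network and checks the result against explicit time stepping of Eq. (3). Function and variable names, the network, and all numerical values are illustrative assumptions, and a large network would likely call for sparse storage rather than the dense arrays used here.

```python
import numpy as np


def daily_operators(N, c1, c2, c3, n_steps=96):
    """Return (A_e, A_0, M, B) such that mean(Q) = A_e @ runoff + A_0 @ Q0."""
    n = N.shape[0]
    I = np.eye(n)
    C1, C2, C3 = np.diag(c1), np.diag(c2), np.diag(c3)
    lhs = I - C1 @ N
    M = np.linalg.solve(lhs, C3 + C2 @ N)  # one-step discharge propagation
    B = np.linalg.solve(lhs, C1 + C2)      # one-step runoff contribution
    A_e = np.zeros((n, n))
    A_0 = np.zeros((n, n))
    M_p = I.copy()                         # holds M**p, starting at p = 0
    for p in range(n_steps):
        A_e += (n_steps - p) / n_steps * (M_p @ B)
        M_p = M_p @ M                      # now M**(p + 1)
        A_0 += M_p / n_steps
    return A_e, A_0, M, B


# Hypothetical network: reaches 0 and 1 flow into reach 2.
N = np.zeros((3, 3))
N[2, 0] = N[2, 1] = 1.0
c1 = np.full(3, 90.0 / 3690.0)    # illustrative coefficients summing to one
c2 = np.full(3, 810.0 / 3690.0)
c3 = np.full(3, 2790.0 / 3690.0)
A_e, A_0, M, B = daily_operators(N, c1, c2, c3)

runoff = np.array([1.0, 2.0, 3.0])  # constant daily-averaged lateral inflow
q0 = np.array([0.5, 0.5, 2.0])      # discharge at the start of the day

# Reference: step the model 96 times and average the instantaneous discharge.
q, total = q0.copy(), np.zeros(3)
for _ in range(96):
    q = M @ q + B @ runoff
    total += q
q_mean_stepped = total / 96
q_mean_operator = A_e @ runoff + A_0 @ q0
```

Both routes to the daily mean agree to rounding error, so the operators need to be built only once as long as the Muskingum parameters and the network are fixed in time.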

The extraction of the daily-averaged simulated discharges at the observed locations is then performed by applying a selection operator $\mathbf{S}$ to Eq. (7). This operator $\mathbf{S}$ is a sparse matrix of size $n_y \times n_r$ in which the element at row i and column j is set to 1 if the ith observation in $\mathbf{y}_k^o$ is located on the jth reach of the domain, and to 0 otherwise.

Note that Eq. (7) shows that the daily-averaged runoff $\bar{Q}^e_k$ and the discharge state at the start of the day $Q_k^{\{0\}}$ are both necessary to completely define the simulated observables, and hence the observation operator H. Such an approach would imply the augmentation of the control vector to include the initial discharge into a dual runoff–initial condition problem:
$$x_k = \begin{bmatrix} \bar{Q}^e_k \\ Q_k^{\{0\}} \end{bmatrix} \quad\text{and}\quad H = S \times \begin{bmatrix} A^e & A^0 \end{bmatrix}.$$
While building the method, we assumed perfect initial conditions for discharge and uncorrelated runoff errors and initial condition errors. These assumptions together simplify the dual runoff–initial condition problem to Eq. (8) (where the k indices are omitted for clarity). Note that with the first assumption, we therefore consider that, among all potential sources of uncertainty in the simulated discharge, the uncertainty originating from the runoff uncertainty takes the largest part. We revisit the fairness of this assumption in section 4:
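$$\begin{cases} \bar{Q}^{e,a} = \bar{Q}^{e,b} + P^b (A^e)^T S^T \left(S A^e P^b (A^e)^T S^T + R\right)^{-1} \left(y^o - S\bar{Q}\right) \\ Q_0^a = Q_0^b \end{cases} \tag{8}$$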
Therefore, the assumption of exact initial conditions effectively leads to the sole correction of runoff in this study. We hence limit our implementation to the first equation in Eq. (8) and define the observation operator as
$$H = S A^e, \tag{9}$$
while the innovation dk is computed from the discrepancies between $y^o_k$ and $S\bar{Q}_k$:
$$d_k = y^o_k - S\bar{Q}_k. \tag{10}$$
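The runoff correction in the first line of Eq. (8) can be sketched as follows. The numerical values of H, Pb, R, and the discharges are made up for illustration, and `analyze_runoff` is a hypothetical helper name; note that the simulated observable passed in is the full daily-averaged discharge at the gauge (including the contribution of the initial condition), not only H applied to the background runoff.

```python
import numpy as np

def analyze_runoff(Qe_b, Pb, H, R, y_obs, y_sim):
    """Kalman analysis of the runoff control vector (first line of Eq. (8))."""
    d = y_obs - y_sim                      # innovation, Eq. (10)
    innovation_cov = H @ Pb @ H.T + R      # S Ae Pb Ae^T S^T + R
    # Solve a linear system instead of forming the inverse explicitly.
    increment = Pb @ H.T @ np.linalg.solve(innovation_cov, d)
    return Qe_b + increment, d

H = np.array([[0.2, 0.3, 0.5]])            # made-up S @ Ae for 1 gauge, 3 reaches
Pb = 4.0 * np.eye(3)                       # made-up runoff error covariance
R = np.array([[1.0]])                      # made-up observation error variance
Qe_b = np.array([10.0, 10.0, 10.0])        # background runoff
y_sim = np.array([12.0])                   # simulated daily-averaged discharge
y_obs = np.array([9.0])                    # observed daily-averaged discharge

Qe_a, d = analyze_runoff(Qe_b, Pb, H, R, y_obs, y_sim)
```

Because the simulation overestimates the observation in this toy case, the innovation is negative and the analysis reduces runoff on every reach, with the largest reduction on the reach that contributes most to the gauge.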

Any potential uncertainty introduced in the system with the assumption of perfect initial conditions can be associated with representativeness error. In DA, this type of error, directly related to the observation operator, represents the imperfect mapping from the control space to the observation space (Janjić and Cohn 2006; Janjić et al. 2017). However, the study of the representativeness error is beyond the scope of this paper.

e. Observation error covariance matrix Rk

Observation errors combine measurement errors, which occur systematically whenever an instrument is used to make a measurement, and representativeness errors, which originate from the flawed representation of the real observed system when using a model and simplifying assumptions. As previously introduced, the representativeness errors are neglected in the present study, and therefore the observation errors reduce to the errors in the measured discharge.

Errors in observations of river discharge are commonly associated with errors in the shape of the relationship linking river discharge to water depth and are traditionally estimated as a percentage of the observed discharge (Sorooshian and Dracup 1980; Clark et al. 2008; Paris et al. 2016). Such a methodology is applied herein: a fraction of 10%, as assumed for USGS gauges and also applied in Clark et al. (2008), is used to estimate the standard deviation of discharge error as a function of the observed discharge, and errors among gauges are assumed to be uncorrelated. The time-varying diagonal observation error matrix Rk used here is therefore
$$R_k = \mathrm{diag}\left[(0.1 \times y^o_k)\right]^2. \tag{11}$$
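As a short illustration of this definition, the matrix can be built from a vector of observed discharges; the function name and the two discharge values are hypothetical.

```python
import numpy as np

def observation_error_covariance(y_obs, fraction=0.1):
    """Diagonal Rk whose standard deviations are a fraction of the observations."""
    sigma = fraction * np.asarray(y_obs, dtype=float)
    return np.diag(sigma ** 2)

# Two made-up gauge observations (same discharge units as the model).
R_k = observation_error_covariance([50.0, 200.0])
```

The diagonal structure encodes the assumption of uncorrelated gauge errors, and the matrix has to be rebuilt for each assimilation window because it depends on the observed values.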

f. Runoff error covariance matrix Pb

The runoff error covariance matrix Pb gathers the statistics of the runoff error $\delta \in \mathbb{R}^{n_r}$ at each reach and is commonly defined by
$$P^b = E\left[\delta\delta^T\right], \tag{12}$$
where E[.] is the expectation operator. This feature of the Kalman filter is essential as it sets how uncertain the control variables are and how the uncertainties are related from one control variable to another.
Here, the runoff error is simply defined as the difference between the actual variable $Q^e$ and its “true” value $Q^{e,t}$:
$$\delta = Q^e - Q^{e,t}. \tag{13}$$
An estimate of the true runoff $Q^{e,t}$ is obtained from the average of the ensemble of runoff data from all land surface models included in NLDAS2 (section 2b):
$$Q^{e,t} = \frac{Q^e_{\mathrm{VIC}} + Q^e_{\mathrm{MOSAIC}} + Q^e_{\mathrm{NOAH}}}{3}. \tag{14}$$
Note that the use of an ensemble average for a given variable from different models as a proxy for its true value is a relatively common approach (Famiglietti et al. 2011; Reager et al. 2016) that enables the compensation of uncertainties from each model and limits the resulting bias. Such an approach is further justified because surface and subsurface runoff are not directly measurable; hence ensemble-averaged runoff remains one of the only options, if not the only option, to provide a proxy of the true runoff value. This methodology was also validated in a recent study of runoff uncertainty (David et al. 2019).

The error in lateral inflow is estimated based on the comparison of the aforementioned VIC-based lateral inflow with the NLDAS2 ensemble average in Eq. (14). Such an estimate of the runoff error is computed for each time at which runoff data are available. This time series is then used to estimate Pb over the study period prior to the assimilation procedure. Note that a monthly time step is initially considered here for the runoff ensemble, similarly to David et al. (2019), but a daily time step, specific to this study, is also used. While the size of this ensemble is acknowledged to be rather small, and hence the probability of spurious correlations to be quite high, the use of localization should limit the impact of such imperfections. Additionally, it should be noted that both the truth and open-loop runoff use outputs from the VIC model and therefore are not independent. Although this approach could lead to an underestimation of the runoff errors, the application of error inflation can be expected to overcome this limitation. Finally, one must emphasize that the VIC-based lateral runoff will likely be biased, which conflicts with the unbiased-error hypothesis of the Kalman filter, although such a conflict is relatively common in KF applications.
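The estimation described above can be sketched as follows, assuming that the expectation in Eq. (12) is approximated by a time average of the raw (uncentered) second moment of the error series; whether the mean error is removed beforehand is not specified here, so that choice is an assumption. The synthetic runoff arrays and the function name are hypothetical.

```python
import numpy as np

def runoff_error_covariance(q_vic, q_mosaic, q_noah):
    """Estimate Pb from runoff series shaped (n_times, n_reaches).

    delta follows Eq. (13) with the ensemble average of Eq. (14) as the
    proxy for the true runoff; Pb is the time average of delta delta^T.
    """
    q_true = (q_vic + q_mosaic + q_noah) / 3.0   # Eq. (14)
    delta = q_vic - q_true                        # Eq. (13)
    Pb = delta.T @ delta / delta.shape[0]         # sample version of Eq. (12)
    return Pb, delta

# Synthetic daily runoff for 4 reaches over one year (illustration only).
rng = np.random.default_rng(0)
q_vic = rng.gamma(2.0, 1.0, size=(365, 4))
q_mosaic = rng.gamma(2.0, 1.0, size=(365, 4))
q_noah = rng.gamma(2.0, 1.0, size=(365, 4))

Pb, delta = runoff_error_covariance(q_vic, q_mosaic, q_noah)
```

Since the VIC series enters both terms of the difference, the error reduces to (2 Q_VIC − Q_MOSAIC − Q_NOAH)/3, which makes the lack of independence between the "truth" and the open-loop runoff explicit.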

Last, as previously stated in section 3a, we chose to keep Pb constant in time, hence leveraging our recent estimates of runoff uncertainty (David et al. 2019) and limiting the computational requirements of our methodology for faster execution. This aspect also ensues from the KF equations applied to our specific configuration. In its initial formulation, the KF assumes that the true state at time k, $x^t_k$ (note here that x is temporarily associated with the model state), is propagated from previous time k − 1 following:
$$x^t_k = F_{[k-1,k]}\, x^t_{k-1} + G_k u_k + w_k,$$
where $F_{[k-1,k]}$ is the model state-transition operator from time k − 1 to k, $G_k$ is the input operator applied to the input $u_k$, and $w_k$ is the process noise with covariance $W_k$. The KF propagation equations are then
$$x^b_k = F_{[k-1,k]}\, x^a_{k-1} + G_k u_k, \qquad P^b_k = F_{[k-1,k]}\, P^a_{k-1}\, F_{[k-1,k]}^T + W_k.$$
In addition to the propagation equation for both the state variables xkb and the associated error covariance matrix Pkb, there is also an analysis/update equation for both state variables xka and the associated error covariance matrix Pka. The error covariance matrix is therefore commonly reduced during the analysis step but it is then increased again during the following propagation step, hence maintaining the effectiveness of the filter.
In our particular setting where x corresponds to the RAPID lateral forcing provided at each time step by an external LSM, F[k−1,k] is equal to the null operator and Gk is the identity matrix giving
$$x^b_k = u_k, \qquad P^b_k = W_k.$$
The error covariance matrix at a given time is independent of the error covariance matrix at the previous time. Our retention of a constant (initial) estimate of Pb throughout the simulation therefore also allows for persistence in the effectiveness of our data assimilation approach.

g. Adjustable implementation features

The investigation of the physical processes impacting the performance of data assimilation for river modeling motivates the inclusion of two adjustable features respectively related to localization and to inflation of the runoff error covariance matrix. A third adjustable feature is also included in an effort to retain the computational efficiency of RAPID while allowing for controllable approximations on one of the key matrices involved in the computation of Eq. (9).

1) The localization radius R

Localization is traditionally used in ensemble-based data assimilation methods to limit spurious correlations resulting from small-sized ensembles. While a classical (nonensemble) Kalman filter is used in this study, the estimation of Pb herein still indirectly results from an ensemble of LSM runoff data (see section 3f), and although this ensemble is computed prior to data assimilation, it likely shares similar suboptimality with ensemble-based data assimilation algorithms.

A simple static B-localization of Pb (Greybush et al. 2011; Sakov and Bertino 2011) is therefore applied here to retain only the error covariances within a radius R of each river reach. As a result, the nonzero elements of Pb are gathered around the matrix diagonal (see Figs. 3b and 3c, in which the matrix’s rows and columns are associated with reaches sorted from the most upstream to the most downstream reach). The radius used here corresponds to a number of river reaches upstream or downstream and has no units. Note that different radii for upstream and downstream locations with regard to the gauges used in assimilation cannot safely be implemented because of the intrinsic symmetry of covariance matrices. It is also worth acknowledging that runoff error covariances are expected to follow precipitation error covariances and hence be best described in Cartesian coordinates rather than along the river network. However, the localization approach used here allows investigating the implications of potentially high spurious correlations (see section 3f) from the perspective of the river model. It may be expected that the optimal value of the localization radius would be influenced by the size of the river reaches and the speed of flow wave propagation.

Fig. 3.

Impact of the localization radius R: (a) river reaches within two different radii (R = 25 and R = 50) upstream and downstream of two gauges of interest, (b) nonzero pattern of the runoff error covariance matrix Pb with R = 50, and (c) as in (b), but for R = 25.

Citation: Journal of Hydrometeorology 21, 3; 10.1175/JHM-D-19-0084.1

Note that, by forcing some elements of the matrix Pb to 0, a structural constraint is applied to the error covariance matrix. The result of such a structural constraint may not necessarily be positive definite, which is a common limitation of this type of approach (Kang et al. 2011; Roh et al. 2015).
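A minimal sketch of such a truncation is given below, assuming that the network is described by a hypothetical `downstream` array (index of the next reach downstream, or −1 at the outlet) and that only pairs of reaches lying on a common flow path within R reaches of each other are retained. The five-reach chain and the covariance values are made up, and are chosen so that the loss of positive definiteness mentioned above can be seen directly.

```python
import numpy as np

def localization_mask(downstream, radius):
    """Boolean mask keeping pairs of reaches at most `radius` reaches apart
    along a flow path (upstream or downstream); the mask is symmetric."""
    n = len(downstream)
    mask = np.eye(n, dtype=bool)
    for i in range(n):
        j, steps = downstream[i], 1
        while j >= 0 and steps <= radius:
            mask[i, j] = mask[j, i] = True
            j, steps = downstream[j], steps + 1
    return mask

# Hypothetical chain of five reaches sorted from upstream to downstream.
downstream = [1, 2, 3, 4, -1]
mask = localization_mask(downstream, radius=1)

# Made-up, strongly correlated covariance: positive definite before truncation.
Pb = 0.9 * np.ones((5, 5)) + 0.1 * np.eye(5)
Pb_localized = np.where(mask, Pb, 0.0)

min_eig_before = np.linalg.eigvalsh(Pb).min()
min_eig_after = np.linalg.eigvalsh(Pb_localized).min()
```

In this toy case the smallest eigenvalue is positive before truncation and negative afterward, so the truncated matrix is no longer a valid covariance matrix even though it is still symmetric.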

2) The inflation factor I

The artificial increase of the magnitude of Pb—also known as inflation—is an equally common practice (e.g., Anderson 2012) to alleviate potential limitations in its estimation. An inflation by a single factor I is applied here to each component of Pb and its value is further determined in section 3b. Should the magnitude of the daily error covariances stored in Pb be accurately estimated, the value I can be expected to approach unity.

3) The Muskingum operator threshold ε

A detailed inspection of Eq. (9) reveals the inclusion of the so-called Muskingum operator M which was the subject of detailed scrutiny in a past study (David et al. 2013a):
$$M = (I - C_1 N)^{-1}.$$
Notably, a nonzero element at row i and column j (with i > j) of M expresses the contribution of the jth reach to the ith reach located downstream. This contribution decreases with increasing distance downstream along the river network, until it becomes numerically negligible. The operator M is therefore a dense lower-triangular matrix, although many of its nonzero elements can be neglected. A unitless threshold parameter ε is hence used herein to limit the computation of M to only the elements that have a magnitude greater than the threshold. Such an approximation has a direct impact on the numerical storage needs (i.e., computer memory) and numerical operations (i.e., computer time) of the proposed implementation.
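The effect of the threshold can be sketched on a small example. The ten-reach chain, the uniform C1 value, and the value of ε below are illustrative assumptions; a dense inverse is used only for readability and is not meant to reflect how RAPID computes M.

```python
import numpy as np

n, c1 = 10, 0.1                          # hypothetical chain length and C1 value
N = np.diag(np.ones(n - 1), k=-1)        # reach j flows into reach j + 1
M = np.linalg.inv(np.eye(n) - c1 * N)    # Muskingum operator, M[i, j] = c1**(i - j)

def threshold_operator(M, eps):
    """Keep only the elements of M whose magnitude exceeds eps."""
    return np.where(np.abs(M) > eps, M, 0.0)

eps = 5e-4
M_eps = threshold_operator(M, eps)
kept = np.count_nonzero(M_eps)           # number of stored elements
max_dropped = np.abs(M - M_eps).max()    # largest neglected contribution
```

For this chain, the contribution of a reach decays geometrically with the number of reaches traveled downstream, so only the diagonal and the first three subdiagonals survive the threshold (34 of the 55 lower-triangular elements), while setting ε = 0 leaves M unchanged.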

4. Results

a. Evaluation strategy

The evaluation of the proposed data assimilation methodology is here designed sequentially as a three-step strategy. The initial step evaluates an open-loop simulation—that is, a RAPID model execution without data assimilation—against observed discharge records in order to provide a basis for comparison, and also validates our runoff error estimates using the uncertainty propagation approach of David et al. (2019).

The second evaluation step focuses on sensitivity analysis for two fundamental aspects of data assimilation applied to river modeling, that is, error localization R and error inflation I, in an attempt to reveal the underlying physical processes impacting their performance. Note that each aspect is evaluated independently while keeping the other at a constant nominal value. Additionally, no approximation is made to the Muskingum operator (ε = 0) during this second step in order to fully conserve the physical properties of the system. The optimal values for error localization R and error inflation I are determined through the analysis of various discharge metrics and kept for subsequent experiments.

For the experiments using data assimilation, only 23 out of the 36 gauges were selected for assimilation into RAPID, while the 13 remaining gauges were kept for validation (Fig. 1). This selection is performed such that only one gauge is assimilated out of every two consecutive gauges along the same stem of the river network. Note that multiple gauges along the same river system are assimilated simultaneously. However, the update of the lateral runoff only modifies the amount of water entering the river system and the updated runoff is then propagated downstream through RAPID over the same assimilation period starting from the same initial condition as the forecast run (see Fig. 2), hence ensuring no upstream–downstream discontinuities in mass balance along segments between a pair of stations.

The third and final step of our evaluation investigates the impact of Muskingum operator simplification through the use of the threshold ε in order to estimate the potential savings in computational storage requirements, the expected reduction in execution time, and any associated degradation in the performance of the data assimilation methodology.

b. Results before data assimilation

1) Evaluation of the open-loop discharge estimates

We start here with an evaluation of our open-loop discharge simulations, that is, the simulated discharge before using DA. To assess model performance, the model is run freely and serves as a reference for subsequent DA experiments. Given that this experimental setup uses model parameters from a different study (David et al. 2011b) and off-the-shelf lateral inflow without specific calibration for our study domain, limited quality is to be expected from this experiment.

Table 1 (row 1) shows the mean and the median Nash–Sutcliffe efficiency (NSE; Nash and Sutcliffe 1970) values independently for the 23 assimilation and the 13 validation gauges. NSE ranges from −∞ to 1; 1 indicates a perfect match and 0 indicates that the model is as accurate as the mean of the observations. These NSE values are obtained from the comparison of daily averaged observations with daily averaged open-loop simulations. Note again that assimilation and validation gauges are separated here solely for the purpose of subsequent comparisons given that no assimilation is performed in this section. The negative mean and median NSE values obtained here confirm that the open-loop run has very limited overall ability to reproduce the observed discharge throughout the domain. Figure 4 shows an example of observed and open-loop simulated hydrographs for the downstream-most stations in the two subbasins of our study domain: the Guadalupe River at Victoria, Texas (NSE = −9.956), and the San Antonio River at Goliad, Texas (NSE = −14.050), and highlights significant overestimation of discharge although some temporal variability is accurately captured. Mass conservation is enforced in the Muskingum method; the large positive bias that is observed in the open-loop simulation (Fig. 4) is therefore related to large positive runoff bias that provides excessive amounts of water to the river system. This limitation further supports the choice of runoff as the control variable for our DA implementation.
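For reference, the NSE metric described above can be computed as in the following sketch; the function name and the four-value series are made up, and the last case mimics a simulation that strongly overestimates the observations.

```python
def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((s - o) ** 2 for s, o in zip(sim, obs))
    obs_variability = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variability

obs = [1.0, 2.0, 3.0, 4.0]
nse_perfect = nash_sutcliffe(obs, obs)                      # perfect match
nse_mean = nash_sutcliffe([2.5] * 4, obs)                   # as good as the mean
nse_biased = nash_sutcliffe([3.0 * o for o in obs], obs)    # threefold overestimate
```

A simulation with the right timing but three times too much water yields a strongly negative NSE, which is consistent with how a large positive runoff bias can drive the open-loop values far below zero even when some temporal variability is captured.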

Table 1.

Mean and median values of Nash–Sutcliffe efficiency computed from daily averaged discharge statistics over the 4-yr simulation (2010–13) for experimental configurations with varying localization radius R, inflation factor I, and Muskingum operator threshold ε.

Fig. 4.

Daily hydrographs from the open-loop RAPID simulation and from in situ observations over the 4-yr study period for (a) the Guadalupe River at Victoria (NSE = −9.956), and (b) the San Antonio River at Goliad (NSE = −14.050). The geolocation of these gauges is shown in Fig. 1.


2) Evaluation of the runoff error estimates

The control variable error statistics are an essential part of any DA method (e.g., section 3). In this study, the control variable is the combined surface and subsurface runoff from outside of the river network, and Pb is the corresponding runoff error covariance matrix. It provides information on the magnitude of the individual runoff errors (the diagonal elements of the matrix) and how they covary (the off-diagonal elements). However, the absence of runoff observations makes it a challenge to validate estimates of runoff errors. Nevertheless, as discharge observations are available, we here use the known discharge errors to indirectly validate the corresponding runoff errors by using the uncertainty propagation method developed by David et al. (2019).

First, using monthly time series, our estimated runoff error, obtained from applying Eq. (14) in Eq. (13), is mapped into its corresponding estimated discharge error using the error propagation model developed by David et al. (2019). Then, the measured discharge error is obtained from the comparison of discharge observations with open-loop simulations. Three error metrics are computed and compared: the error mean (i.e., the bias), the error standard deviation [i.e., the standard error (STDE)], and the root-mean-square error (RMSE). To validate the runoff error model, the estimated discharge errors must match the measured discharge errors. Figure 5a demonstrates that the monthly estimated discharge errors from propagation of monthly estimated runoff errors conserve the spatial variability (coefficients of determination ρ² > 0.95) of the monthly measured discharge errors. Yet, an underestimation of the magnitude of the errors is evidenced by linear trends with slopes smaller than unity.
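The three error metrics can be sketched as below. The population (1/n) normalization of the standard deviation is an assumption, chosen because it makes the identity RMSE² = bias² + STDE² hold exactly; the function name and the sample values are hypothetical.

```python
import math

def error_metrics(sim, obs):
    """Return (bias, STDE, RMSE) of the error series sim - obs."""
    errors = [s - o for s, o in zip(sim, obs)]
    n = len(errors)
    bias = sum(errors) / n
    stde = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    return bias, stde, rmse

# Made-up simulated and observed discharges.
bias, stde, rmse = error_metrics([11.0, 22.0, 33.0], [10.0, 20.0, 30.0])
```

Under this convention the RMSE combines the systematic part of the error (the bias) and its variable part (the STDE), so any two of the three metrics determine the third.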

Fig. 5.

Validation of the runoff errors: (a) estimated monthly discharge errors from the propagation of estimated runoff errors vs measured monthly discharge errors, (b) measured daily discharge errors vs measured monthly discharge errors, and (c) as in (a), but for daily errors.


Given that the daily assimilation window used in this study requires daily runoff errors, we also extended the aforementioned methodology to runoff statistics estimated from daily time series. Before that, Fig. 5b shows that measured error STDE and RMSE calculated from daily time series are greater than the measured error STDE and RMSE calculated from monthly time series, as was the case in David et al. (2019). This is expected because monthly averaging suppresses daily variability, while the bias clearly remains constant at both time scales. More importantly, Fig. 5b confirms that measured daily errors are highly correlated with measured monthly errors (with coefficients of determination ρ² greater than 0.99).

These high correlations justify the application of the same error propagation methodology to daily errors as shown in Fig. 5c. Here again, the comparison of measured errors and estimated errors leads to high coefficients of determination (ρ² > 0.95). These results allow the indirect validation of the spatial variability of the daily runoff errors. Still, the slopes of the inferred linear relationships indicate that the discharge errors computed from the propagation of runoff errors underestimate the measured errors. However, the information on these slopes can be used to scale the runoff errors so that the corresponding estimated discharge error magnitudes match the measured discharge error magnitudes: here a factor of (1/0.3876)² has to be used for the variances and covariances (as shown in Fig. S1 in the online supplemental material). Such scaling actually corresponds to error inflation in data assimilation.

From now on, the runoff error variances and covariances in Pb will therefore be estimated from the daily runoff time series, and one may expect that an inflation factor I² = (1/0.3876)² = (2.58)² applied to each element of Pb would be most appropriate for data assimilation experiments. Note that the error statistics show a nonzero bias, which is a common conflict with the basic assumptions of the Kalman filter.
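The arithmetic behind this inflation factor can be checked with a few lines. The sketch assumes that the slope of 0.3876 applies to error magnitudes (standard deviations), so that its inverse is the factor I and its inverse squared scales every variance and covariance; the small matrix and the helper name are made up.

```python
slope = 0.3876                      # slope quoted above for the daily errors
inflation = 1.0 / slope             # factor I on error magnitudes
variance_scaling = inflation ** 2   # factor I**2 on variances and covariances

def inflate(Pb, factor):
    """Multiply every element of a covariance matrix (nested lists) by factor**2."""
    return [[factor ** 2 * value for value in row] for row in Pb]

Pb_example = [[1.0, 0.5], [0.5, 2.0]]            # made-up covariance matrix
Pb_inflated = inflate(Pb_example, inflation)
```

Rounded to two decimals, the factor on magnitudes is 2.58 and the factor on variances is 6.66, so an estimated error standard deviation of 1 becomes about 2.58 after inflation.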

c. Results with data assimilation

1) Impact of the localization radius R

The longest path from the most upstream to the most downstream reach in the NHDPlus description of the San Antonio and Guadalupe River network traverses 286 river reaches. This longest path provides an upper bound of R = 286 for the largest possible localization radius in our study. We therefore test several values of R from 286 (for which Pb remains a nearly full matrix) to 0 (Pb is simplified to retain only its diagonal elements). Independent DA experiments are run for each R value using the 23 selected assimilation gauges over the 4-yr study period (2010–13). For illustration, Fig. 3 shows examples of the geographical interpretation of R on the river network, and the associated mathematical implication for the runoff error covariance matrix Pb using the values R = 25 and R = 50. Note that all experiments varying R use an inflation factor I = 2.58 and no approximation of the Muskingum operator (ε = 0).

Table 1 (rows 2–9) shows the mean and median daily discharge NSE values after assimilation for various values of R and indicates that the assimilation procedure diverges for R ≥ 30. In contrast, all assimilation experiments with a radius ranging from R = 0 to R = 20 are consistently improved compared to the open-loop simulation, and the improvement is evidenced at both assimilation and validation gauges. When compared to R = 20, the specific case of R = 25 shows a slight degradation over the assimilation gauges and a larger degradation over the validation gauges. More notably, R = 25 leads to a lower mean NSE over the validation gauges than the open-loop simulation, hence suggesting that assimilation gauges are overfitted at the expense of validation gauges. Figs. 6a and 6b show examples of hydrographs obtained for two values of R and illustrate that increasing the value of the radius nudges data assimilation results closer to the observations. These benefits are expected because larger values of R lead to additional information content for enlarging Kalman filter corrections around available assimilation gauges. However, the results shown in Table 1 indicate that degradations occur beyond a radius on the order of 20 reaches. Yet, the physical meaning of this value remains to be determined.

Fig. 6.

Discharge hydrographs at (a),(c) Victoria and (b),(d) Goliad over the first six months (January 2010–June 2010) comparing observations, open-loop simulations, and data assimilation for (top) varying localization radius R = 0 and R = 20 while inflation is kept constant at I = 2.58 and (bottom) varying inflation factor I = 1.00 and I = 2.58 while localization is kept constant at R = 20.


Two physical distances are particularly important in this investigation: the radius of the distance traveled by flow waves during the 1-day data assimilation window, represented hereafter by the variable Rp, and the radius of the distance separating consecutive data assimilation gauges, represented hereafter by the variable Ra. To better grasp all of these variables, Fig. S2 illustrates these two topological distances Rp and Ra in the context of the localization radius R for this study.

The Muskingum k parameters used in this study were determined from a temporally and spatially constant flow wave celerity of c = 2.12 m s⁻¹ in David et al. (2011b). Given the 3.0 km average size of the NHDPlus river reaches used here, the distance over which the flow wave propagates within a 1-day assimilation window can be expressed in the form of a radius Rp corresponding to a number of reaches:
$$R_p = \frac{86\,400\ \mathrm{s} \times 2.12\ \mathrm{m\,s^{-1}}}{2 \times 3000\ \mathrm{m}} \approx 31.$$
In our experimental setup, the average topological distance separating two consecutive assimilation gauges is 39 reaches, meaning that the radius Ra allowed before corrections at two consecutive assimilation gauges overlap is
$$R_a = \frac{39}{2} \approx 20.$$
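Both radii follow from simple arithmetic, reproduced in the sketch below with the values quoted in the text; the function names are hypothetical.

```python
def propagation_radius(window_s, celerity_m_s, reach_length_m):
    """Half of the distance traveled by a flow wave during the assimilation
    window, expressed as a number of reaches."""
    return window_s * celerity_m_s / (2.0 * reach_length_m)

def assimilation_radius(gauge_spacing_reaches):
    """Half of the topological distance between consecutive assimilation gauges."""
    return gauge_spacing_reaches / 2.0

Rp = propagation_radius(86400.0, 2.12, 3000.0)   # about 30.5 reaches
Ra = assimilation_radius(39)                     # 19.5 reaches
```

The unrounded values are 30.528 and 19.5 reaches, which the text reports as approximately 31 and 20. Because the reach length sits in the denominator, longer reaches give a smaller propagation radius for the same celerity.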

Figure 7 illustrates the spatial distribution of discharge simulation improvements in NSE compared to the open-loop execution and confirms that optimal data assimilation is obtained for a localization radius of 20. Interestingly, Fig. 7 also highlights a subbasin-specific degradation of data assimilation in that simulations within the Guadalupe River basin (located in the north of the domain) consistently break down at R = 30, whereas such a localization radius has a much more limited impact on the San Antonio River basin. Further analysis of river reach lengths in the Guadalupe and San Antonio basins individually reveals that their respective radii of propagation are Rp = 27 and Rp = 32 (because of longer reach lengths in the Guadalupe), hence providing preliminary evidence that degradation occurs when the radius of localization exceeds the radius of propagation.

Fig. 7.

Overall assimilation results when assimilating 23 gauges with an inflation of I = 2.58 and various localization radii: (a) R = 30, (b) R = 20, (c) R = 10, and (d) R = 0. The maps show, for assimilation gauges (circles) and validation gauges (squares), whether the assimilation improved the simulated discharge (green) or degraded the simulated discharge (red). For a given R (equivalently, for a given map), a marker (circle or square) highlighted in yellow indicates that the best assimilation performance for this gauge was obtained for this value of R (among all maps, each gauge is highlighted only once).


To further investigate the physical drivers of the localization radius, a similar sensitivity analysis was performed over an additional set of assimilation gauges. The new set encompasses all 36 gauges and is characterized by a smaller radius Ra = 12. Despite the shorter distance separating the gauges used in assimilation, the same spatial patterns of improvement are observed (Fig. S3). Specifically, the assimilation methodology improves with increasing localization radius until it starts degrading when the localization radius R exceeds the propagation radius Rp, which further confirms the initial observation above even for this smaller radius of assimilation.

Therefore, our analysis shows that the localization radius should be chosen such that the longest distance for which error covariances are accounted for between connected river reaches does not exceed the distance traveled during one assimilation window. This research also suggests that the localization radius could be regionalized as a function of varying reach length and flow wave celerity, although this is beyond the scope of the current study. All subsequent experiments will use a fixed localization radius of R = 20 and focus on the reference case, which assimilates the subset of 23 gauges.

2) Impact of the inflation factor I

The runoff error validation effort [section 4b(2)] influenced the initial value of the inflation factor I = 2.58 used in the aforementioned evaluation of the localization radius R. Here we instead set a constant value of R = 20 and investigate the effects of multiple values of I ranging from I = 1.00 (no inflation) to I = 5 where the inflation is about twice what was suggested by our error validation. The same 23 assimilation gauges (Fig. 1) are used over our 4-yr study period (2010–13) and no approximation is made on the Muskingum operator (ε = 0).

Table 1 (rows 10–13) shows the mean and median daily discharge NSE values for these experiments, and suggests that increasing the inflation consistently improves discharge estimates for assimilation gauges up to I = 5, with no indication of degradation for any potentially higher inflation values. This behavior is expected because the Kalman gain weighs the relative magnitude of runoff errors and discharge observation errors so that increasing trust is placed in observed discharge when runoff errors grow from inflation. Figures 6c and 6d show example hydrographs obtained with various levels of inflation at the same two downstream validation stations used in section 4c(1) and illustrate the benefits of inflation. The inspection of mean and median NSE values for the other gauges instead—that is, the validation gauges—in Table 1 shows that best performance is obtained on or around I = 2.58. This is also expected from our runoff error validation. Overall, our analysis confirms that inflation is beneficial when exactly making up for underestimated runoff errors, but that excessive inflation degrades simulations away from assimilation sites. Inflation must therefore be used along with an appropriate validation of errors when at all possible.
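The role of inflation in the Kalman gain can be illustrated with a scalar analogue of the update, in which one runoff value with error variance p is observed through an operator h with observation error variance r. The numerical values are made up, and the inflated variance is assumed to be I² times p.

```python
def scalar_gain(p, r, inflation, h=1.0):
    """Kalman gain for a single control variable and a single observation,
    with the background error variance inflated by inflation**2."""
    p_inflated = inflation ** 2 * p
    return p_inflated * h / (h * p_inflated * h + r)

# Gains for the smallest, reference, and largest inflation values tested.
gains = {I: scalar_gain(p=1.0, r=4.0, inflation=I) for I in (1.0, 2.58, 5.0)}
```

With these values the gain rises from 0.2 without inflation to about 0.62 for I = 2.58 and about 0.86 for I = 5, approaching but never reaching 1: the larger the assumed runoff error, the more weight the correction places on the observed discharge.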

3) Assimilation results with optimal localization and inflation, and limitations

Analysis of the sensitivity of our data assimilation methodology to the adjustable localization and inflation parameters therefore suggests that a localization radius of R = 20 and an inflation factor of I = 2.58 are optimal for our case study.

Figure 7 shows further illustration of this specific optimal case. With the sole exception of one validation gauge, our proposed data assimilation approach consistently improves daily NSE values for all assimilation and validation gauges (Fig. 7). This confirms that our methodology is able to provide accurate results even for reaches that are not hosts to a gauge, that is, at unobserved locations. The hydrographs in Fig. 8 show the daily behavior of discharge simulations at both validation gauges (Figs. 8a,c) and assimilation gauges (Figs. 8b,d). In all cases, the assimilated discharge is visually closer to the observations than the open-loop discharge as was already evidenced by improved NSE values (Table 1). Note that we evaluated the relative performance of our DA methodology over a high flow period (the year 2010) and a regular flow period (2011)—periods chosen subjectively from the hydrographs in Fig. 4—and found similar performance (see Table S1).

Fig. 8.

Overall assimilation results for optimal localization (R = 20) and inflation (I = 2.58). Hydrographs are shown for the first two years (2010–11) for clarity at four locations: (a) Cuero and (c) Fall City, respectively upstream of the assimilation gauges of (b) Victoria and (d) Goliad. The locations of all four gauges are identified in Fig. 1.


One must note, however, that the assimilated discharge simulations show some oscillations, especially at validation gauges (Fig. S8). A close examination of these oscillations reveals an apparent 2-day period around the gauge observations that appears to be a series of successive over- and undercompensations of the corrections. These oscillations are the source of the degraded NSE noted for one validation gauge and explain the relatively lower NSE values over validation gauges compared to assimilation gauges (Table 1).

The oscillations are likely a source for the instabilities noted with increasing localization radius [section 4c(1)] or inflation [section 4c(2)] and may have triggered the associated divergence of simulations. Multiple other factors could have caused the instabilities including the relatively simple routing equations (section 2), our Kalman filter simplifications of steady error covariances (section 3f), the presence of runoff bias despite not being accounted for by the Kalman filter [section 4b(2)] and the several assumptions made when designing the platform, namely, the perfect initial condition (section 3d), the omission of the representativeness error (section 3d), the assumed absence of observational error covariance (section 3e), the flawed runoff estimates (section 3f) and the static relatively simple truncation of the runoff error covariance matrix [section 3g(1)].

The strength of the assumption concerning perfect initial conditions is expected to vary based on the relative position of a given river reach within the river network. As expected, and as further illustrated in Fig. S9, the lateral runoff contribution to the total flow is relatively small for downstream reaches compared to upstream reaches. However, further investigation of this assumption would involve redefining the error model used in the data assimilation methodology, which is beyond the scope of the present study, although it could be the subject of future developments.

Other options to minimize the oscillations could be to use more regionalized or adaptive localization and/or inflation along with a more refined definition of the control error (by using a larger ensemble to surrogate the true runoff) and the observation error (by including representativeness error) although such approaches are beyond the scope of this paper. One could also adopt a smoothing rather than a filtering approach for the assimilation. Smoothers (e.g., Pan and Wood 2013) are designed to tailor upstream and downstream corrections over longer time windows than filters, though at increased size of the assimilation problem and associated computational costs. Despite these impediments, our proposed data assimilation methodology provides significant improvements in discharge estimates that are evidenced in Table 1 and Figs. 7 and 8.

d. Retaining the computational efficiency of RAPID

The computational implications of the proposed methodology are of importance for RAPID given past efforts ensuring software efficiency (David et al. 2013a, 2015) and ever-increasing domain sizes. Note here that, compared to simulations without data assimilation, our Kalman filtering approach that corrects the inputs inherently imposes a doubling of the computational burden because of the need for both a “background” and an “analysis” step that are each as expensive as the open-loop execution. This unavoidable doubling of simulation time was verified experimentally in this study (not shown). However, another aspect of computational efficiency that can benefit from analysis here is the duration of initialization and associated memory requirements.

One critical component of the model setup in this study is the computation and storage of the Muskingum operator M, or its potential approximations through the use of the threshold ε. The dense lower-triangular structure of M implies numerous nonzero elements, although the vast majority of these elements are so small that they can be neglected (David et al. 2013a).

Figures 9a–c show the decreasing number of nonzero elements in the matrix M when ε increases, and Fig. 9d highlights that even the smallest thresholds can lead to tenfold savings in storage of M. The removal of these negligible nonzero elements has direct implications for the number of computer operations needed in tasks such as the matrix–matrix multiplications used to compute the observational operator H [Eq. (9)], which in turn leads to a fivefold saving in setup time (Fig. 9e). Table 1 (rows 14–19) confirms that the approximation resulting from the use of a threshold has minimal impact on the quality of simulations, particularly for ε < 10⁻³. While the temporal savings are admittedly minimal here (mere seconds) for the small size of this study domain, they are bound to be critical for future larger simulations, particularly when simulations are memory limited.
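The effect of the threshold on matrix fill can be sketched with a toy example. The Python code below does not build the study's Muskingum operator; it assumes a hypothetical single chain of reaches in which the influence of an upstream reach on a downstream reach decays geometrically with the number of reaches separating them. The decay factor, network size, and function names are all illustrative assumptions.

```python
def toy_lower_triangular_operator(n_reaches, decay):
    """Dense lower-triangular matrix with entry decay**(i - j) for i >= j."""
    return [[decay ** (i - j) if i >= j else 0.0 for j in range(n_reaches)]
            for i in range(n_reaches)]


def apply_threshold(matrix, eps):
    """Set to zero every element whose magnitude is below eps."""
    return [[v if abs(v) >= eps else 0.0 for v in row] for row in matrix]


def percent_fill(matrix):
    """Percentage of matrix elements that are nonzero."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    nonzero = sum(1 for row in matrix for v in row if v != 0.0)
    return 100.0 * nonzero / (n_rows * n_cols)


# Hypothetical chain of 50 reaches with a decay factor of 0.3 per reach.
full = toy_lower_triangular_operator(50, 0.3)
for eps in (0.0, 1e-9, 1e-3):
    truncated = apply_threshold(full, eps)
    print(f"eps = {eps:g}: fill = {percent_fill(truncated):.2f}%")
```

In this toy setting the threshold turns a dense lower triangle into a narrow band, which reduces both the storage and the number of operations in any subsequent matrix product.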

Fig. 9.

Influence of threshold ε on the fill pattern of the Muskingum operator M. (a) Full Muskingum operator, (b) Muskingum operator with ε = 10⁻⁹, (c) Muskingum operator with ε = 10⁻³, and (d) percent fill for various threshold values. The impact of the threshold on model setup time is also shown in (e).


5. Summary and conclusions

While the relative uncertainties of discharge observations from gauges and discharge simulations from river routing models can be debated, the spatiotemporal sparseness of observations has naturally driven their fusion with models in an effort to fill in the observational blanks and hence aid various hydrometeorological and water management endeavors. Our review of available literature suggests that such fusion has to date primarily been accomplished through the use of advanced ensemble-based data assimilation methods. These ensemble methods are beneficial because they readily estimate key components of data assimilation, namely the control error covariance and cross-covariance matrices, but they are also computationally demanding. All assimilation approaches have used corrections designed to alleviate some of their imperfections. These common corrections consist of error variance/covariance localization and inflation which, notwithstanding their efficiency, can benefit from further physically based justifications.

This study therefore evaluates the use of these corrections in the simplest relevant setting, the classical Kalman filter, in an attempt to reveal some of the underlying mechanisms controlling their optimal value. We use the RAPID model and hence take advantage of its linear routing equation. Our application assimilates daily averaged in situ discharge measurements to correct daily averaged runoff inputs, and our methodology is evaluated in a 4-yr (2010–13) case study of the San Antonio and Guadalupe River basins in Texas. This study also constitutes the initial development of a Kalman filter capability for the RAPID model, for which the retention of computational efficiency is a self-imposed criterion.

We find that inflation is indeed justified when it compensates for potential discrepancies in the magnitude of the control variable errors, as expected. However, we also show that excessive inflation, while apparently improving simulations at assimilation sites, actually degrades simulations away from observations. Inflation is therefore best applied in conjunction with a detailed validation of errors prior to the assimilation, as done in this study. Additionally, our investigation of localization suggests that instabilities (in the form of 2-day-period oscillations) occur when nonzero control error covariances exist between faraway reaches. The threshold distance corresponds to approximately 120 km [(2 × Ra × L) = 2 × 20 × 3 km], that is, the distance traveled by the flow wave during the 1-day assimilation window. Moreover, the experiments indicate that a regionalized localization radius may be beneficial to account for subcatchment-scale variability.
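The role of inflation can be illustrated with a scalar Kalman update. The sketch below is a simplification under stated assumptions: a single directly observed variable, and an inflation factor that multiplies the background error variance. Neither the function names nor the numerical values come from this study.

```python
def scalar_kalman_gain(background_var, observation_var, inflation=1.0):
    """Gain for a directly observed scalar, with inflated background variance."""
    inflated_var = inflation * background_var
    return inflated_var / (inflated_var + observation_var)


def analysis(background, observation, gain):
    """Kalman analysis: move the background toward the observation."""
    return background + gain * (observation - background)


# Hypothetical values, for illustration only.
background, observation = 10.0, 14.0
for inflation in (1.0, 4.0, 100.0):
    gain = scalar_kalman_gain(1.0, 1.0, inflation)
    print(f"inflation = {inflation:g}: gain = {gain:.3f}, "
          f"analysis = {analysis(background, observation, gain):.3f}")
```

As the inflation factor grows, the gain approaches 1 and the analysis reproduces the observation almost exactly at the observed location. This is consistent with an apparent improvement at assimilation sites, while saying nothing about how the same correction behaves once it is spread to unobserved reaches.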

The instabilities evidenced in our results may originate from a variety of simplifying assumptions used in this study, including the routing equations, steady error covariances, perfect initial conditions, absence of observational error covariance, omission of representativeness error, flawed runoff error estimates, and presence of bias. Nonetheless, these simplifying assumptions are commonly used, and our results may hence be broadly applicable to more advanced data assimilation techniques for river modeling, including ensemble methods. Despite these limitations, which primarily appear away from assimilation gauges, our data assimilation algorithm is able to consistently improve the discharge simulations at both observed and unobserved locations. Additionally, the limitations may be a direct result of filtering and could potentially be avoided by smoothing instead of filtering, or by keeping a filtering method but using a dual state-forcing estimation paradigm. Yet, the high computational demands of smoothers and augmented control vectors must also be weighed in such a decision.

Finally, and in light of past efforts focusing on software efficiency with RAPID, this study evaluates the use of a threshold that limits storage and computation requirements for one of the key matrices used in the analysis step of the Kalman filter. We show that minimal thresholds can lead to notable temporal savings during model setup while having very limited impact on the quality of simulations.

Despite the relatively small size of the river basins and short study period used in this paper, our methodology is expected to be applicable to larger geographical domains and longer temporal coverage. However, the current daily assimilation window used in our study is best suited for daily observations, and its application to temporally sparser observations may result in issues of persistence. Still, while this study makes use of in situ data from USGS, it is an initial step toward assimilation of satellite observations (despite their different spatiotemporal coverage). Specifically, the much-anticipated global discharge estimates from upcoming Earth-orbiting missions such as NASA’s Surface Water and Ocean Topography (SWOT) mission will likely motivate a number of new investigations that could reuse the methodology presented in this paper.

Acknowledgments

Thank you to the editor, to Dr. Claire Michailovsky, and to two anonymous reviewers whose comments helped improve our manuscript. C. M. Emery, C. H. David, K. M. Andreadis, M. J. Turmon, J. T. Reager, J. M. Hobbs, and J. S. Famiglietti were supported by the Jet Propulsion Laboratory, California Institute of Technology, under a contract with NASA, including grants from the SWOT Science Team and the Terrestrial Hydrology Program. We follow a community effort (David et al. 2016; Gil et al. 2016) for sharing software, data, and methods. The Reproducible Routing Rituals (RRR, https://github.com/c-h-david/rrr/tree/20181003) and the Routing Application for Parallel computatIon of Discharge (RAPID, https://github.com/c-h-david/rapid/tree/20180921) are freely available under a Berkeley Software Distribution 3-clause license. The data are shared (http://rapid-hub.org) under a Creative Commons Attribution 4.0 License. The steps linking software and data to produce the results are included with the software. All rights reserved.

REFERENCES

  • Abaza, M., F. Anctil, V. Fortin, and R. Turcotte, 2014: Sequential streamflow assimilation for short-term hydrological ensemble forecasting. J. Hydrol., 519, 2692–2706, https://doi.org/10.1016/j.jhydrol.2014.08.038.

  • Alsdorf, D. E., E. Rodriguez, and D. P. Lettenmaier, 2007: Measuring surface water from space. Rev. Geophys., 45, RG2002, https://doi.org/10.1029/2006RG000197.

  • Anderson, J. L., 2007: An adaptive covariance inflation error correction algorithm for ensemble filters. Tellus, 59A, 210–224, https://doi.org/10.1111/j.1600-0870.2006.00216.x.

  • Anderson, J. L., 2012: Localization and sampling error correction in ensemble Kalman filter data assimilation. Mon. Wea. Rev., 140, 2359–2371, https://doi.org/10.1175/MWR-D-11-00013.1.

  • Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741–2758, https://doi.org/10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.

  • Andreadis, K. M., and D. P. Lettenmaier, 2006: Assimilating remotely-sensed snow observations into a macroscale hydrology model. Adv. Water Resour., 29, 872–886, https://doi.org/10.1016/j.advwatres.2005.08.004.

  • Aubert, D., C. Loumagne, and L. Oudin, 2003: Sequential assimilation of soil moisture and streamflow data in a conceptual rainfall-runoff model. J. Hydrol., 280, 145–161, https://doi.org/10.1016/S0022-1694(03)00229-4.

  • Barré de Saint-Venant, A., 1871: Théorie du mouvement non permanent des eaux, avec application aux crues de rivières et à l’introduction des marées dans leur lit (in French). C. R. Acad. Sci., 73, 237–240.

  • Bates, P. D., and A. P. J. De Roo, 2000: A simple raster-based model for flood inundation simulation. J. Hydrol., 236, 54–77, https://doi.org/10.1016/S0022-1694(00)00278-X.

  • Bates, P. D., M. S. Horritt, and T. J. Fewtrell, 2010: A simple inertial formulation of the shallow water equations for efficient two-dimensional flood inundation modelling. J. Hydrol., 387, 33–45, https://doi.org/10.1016/j.jhydrol.2010.03.027.

  • Bauer-Gottwein, P., I. H. Jensen, R. Guzinski, G. K. T. Bredtoft, S. Hansen, and C. I. Michailovsky, 2015: Operational river discharge forecasting in poorly gauged basins: The Kavango River basin case study. Hydrol. Earth Syst. Sci., 19, 1469–1485, https://doi.org/10.5194/hess-19-1469-2015.

  • Beighley, R. E., K. G. Eggert, T. Dunne, Y. He, V. Gummadi, and K. L. Verdin, 2009: Simulating hydrologic and hydraulic processes throughout the Amazon River basin. Hydrol. Processes, 23, 1221–1235, https://doi.org/10.1002/hyp.7252.

  • Bertino, L., G. Evensen, and H. Wackernagel, 2003: Sequential data assimilation techniques in oceanography. Int. Stat. Rev., 71, 223–241, https://doi.org/10.1111/j.1751-5823.2003.tb00194.x.

  • Betts, A. K., F. Chen, K. E. Mitchell, and Z. I. Janjić, 1997: Assessment of the land surface and boundary layer models in two operational versions of the NCEP Eta Model using FIFE data. Mon. Wea. Rev., 125, 2896–2916, https://doi.org/10.1175/1520-0493(1997)125<2896:AOTLSA>2.0.CO;2.

  • Biancamaria, S., and Coauthors, 2011: Assimilation of virtual wide swath altimetry to improve Arctic river modeling. Remote Sens. Environ., 115, 373–381, https://doi.org/10.1016/j.rse.2010.09.008.

  • Brocca, L., F. Melone, T. Moramarco, W. Wagner, V. Naeimi, Z. Bartalis, and S. Hasenauer, 2010: Improving runoff prediction through the assimilation of the ASCAT soil moisture product. Hydrol. Earth Syst. Sci., 14, 1881–1893, https://doi.org/10.5194/hess-14-1881-2010.

  • Campo, L., F. Castelli, D. Entekhabi, and F. Caparrini, 2013: Analysis of a two-year meteorological dataset produced on Italian territory with a coupling procedure between a limited area atmospheric model and a sequential MSG-SEVIRI LST assimilation scheme. Int. J. Remote Sens., 34, 3561–3586, https://doi.org/10.1080/01431161.2012.716535.

  • Chen, F., Z. Janjić, and K. Mitchell, 1997: Impact of atmospheric surface-layer parameterizations in the new land-surface scheme of the NCEP Mesoscale Eta Model. Bound.-Layer Meteor., 85, 391–421, https://doi.org/10.1023/A:1000531001463.

  • Clark, M. P., D. E. Rupp, R. A. Woods, X. Zheng, R. P. Ibbitt, A. G. Slater, J. Schmidt, and M. J. Uddstrom, 2008: Hydrological data assimilation with the ensemble Kalman filter: Use of streamflow observations to update states in a distributed hydrological model. Adv. Water Resour., 31, 1309–1324, https://doi.org/10.1016/j.advwatres.2008.06.005.

  • Courtier, P., J.-N. Thepaut, and A. Hollingsworth, 1994: A strategy for operational implementation of 4D-var, using an incremental approach. Quart. J. Roy. Meteor. Soc., 120, 1367–1387, https://doi.org/10.1002/QJ.49712051912.

  • Coustau, M., F. Rousset-Regimbeau, G. Thirel, F. Habets, B. Janet, E. Martin, C. de Saint-Aubin, and J.-M. Soubeyroux, 2015: Impact of improved meteorological forcing, profile of soil hydraulic conductivity and data assimilation on an operational hydrological ensemble forecast system over France. J. Hydrol., 525, 781–792, https://doi.org/10.1016/j.jhydrol.2015.04.022.

  • Cunge, J. A., 1969: On the subject of a flood propagation computation method (Muskingum method). J. Hydraul. Res., 7, 205–230, https://doi.org/10.1080/00221686909500264.

  • Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 471 pp.

  • David, C. H., F. Habets, D. R. Maidment, and Z.-L. Yang, 2011a: RAPID applied to the SIM-France model. Hydrol. Processes, 25, 3412–3425, https://doi.org/10.1002/hyp.8070.

  • David, C. H., D. R. Maidment, G.-Y. Niu, Z.-L. Yang, F. Habets, and V. Eijkhout, 2011b: River network routing on the NHDPlus dataset. J. Hydrometeor., 12, 913–934, https://doi.org/10.1175/2011JHM1345.1.

  • David, C. H., Z.-L. Yang, and J. S. Famiglietti, 2013a: Quantification of the upstream-to-downstream influence in the Muskingum method and implications for speedup in parallel computations of river flow. Water Resour. Res., 49, 2783–2800, https://doi.org/10.1002/wrcr.20250.

  • David, C. H., Z.-L. Yang, and S. Hong, 2013b: Regional-scale river flow modeling using off-the-shelf runoff products, thousands of mapped rivers and hundreds of stream flow gauges. Environ. Modell. Software, 42, 116–132, https://doi.org/10.1016/j.envsoft.2012.12.011.

  • David, C. H., J. S. Famiglietti, Z.-L. Yang, and V. Eijkhout, 2015: Enhanced fixed-size parallel speedup with the Muskingum method using a trans-boundary approach and a large subbasins approximation. Water Resour. Res., 51, 7547–7571, https://doi.org/10.1002/2014WR016650.

  • David, C. H., Y. Gil, C. J. Duffy, S. D. Peckham, and S. K. Venayagamoorthy, 2016: An introduction to the special issue on geoscience papers of the future. Earth Space Sci., 3, 441–444, https://doi.org/10.1002/2016EA000201.

  • David, C. H., J. M. Hobbs, M. J. Turmon, C. M. Emery, J. T. Reager, and J. S. Famiglietti, 2019: Analytical propagation of runoff uncertainty into discharge uncertainty through a large river network. Geophys. Res. Lett., 46, 8102–8113, https://doi.org/10.1029/2019GL083342.

  • DeChant, C. M., and H. Moradkhani, 2011: Radiance data assimilation for operational snow and streamflow forecasting. Adv. Water Resour., 34, 351–364, https://doi.org/10.1016/j.advwatres.2010.12.009.

  • De Lannoy, G. J. M., R. H. Reichle, K. R. Arsenault, P. R. Houser, S. Kumar, N. E. C. Verhoest, and V. R. N. Pauwels, 2012: Multiscale assimilation of Advanced Microwave Scanning Radiometer-EOS snow water equivalent and Moderate Resolution Imaging Spectroradiometer snow cover fraction observations in northern Colorado. Water Resour. Res., 48, W01522, https://doi.org/10.1029/2011WR010588.

  • Del Moral, P., 1996: Non linear filtering: Interacting particle solution. Markov Processes Related Fields, 2, 555–580.

  • Döll, P., H. Douville, A. Güntner, H. M. Schmied, and Y. Wada, 2016: Modelling freshwater resources at the global scale: Challenges and prospects. Surv. Geophys., 37, 195–221, https://doi.org/10.1007/s10712-015-9343-1.

  • Durand, M., and Coauthors, 2016: An intercomparison of remote sensing river discharge estimation algorithms from measurements of river height, width, and slope. Water Resour. Res., 52, 4527–4549, https://doi.org/10.1002/2015WR018434.

  • Eicker, A., M. Schumacher, J. Kusche, P. Döll, and H. M. Schmied, 2014: Calibration/data assimilation approach for integrating GRACE data into the WaterGAP Global Hydrology Model (WGHM) using an ensemble Kalman filter: First results. Surv. Geophys., 35, 1285–1309, https://doi.org/10.1007/S10712-014-9309-8.

  • Ek, M. B., K. E. Mitchell, Y. Lin, E. Rogers, P. Grunmann, V. Koren, G. Gayno, and J. D. Tarpley, 2003: Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta model. J. Geophys. Res., 108, 8851, https://doi.org/10.1029/2002JD003296.

  • Emery, C. M., A. Paris, S. Biancamaria, A. Boone, S. Calmant, P.-A. Garambois, and J. S. D. Silva, 2018: Large scale hydrological model river storage and discharge correction using satellite altimetry-based discharge product. Hydrol. Earth Syst. Sci., 22, 2135–2162, https://doi.org/10.5194/hess-22-2135-2018.

  • Ercolani, G., and F. Castelli, 2017: Variational assimilation of streamflow data in distributed flood forecasting. Water Resour. Res., 53, 158–183, https://doi.org/10.1002/2016WR019208.

  • Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, 10 143–10 162, https://doi.org/10.1029/94JC00572.

  • Famiglietti, J. S., and Coauthors, 2011: Satellites measure recent rates of groundwater depletion in California’s Central Valley. Geophys. Res. Lett., 38, L03403, https://doi.org/10.1029/2010GL046442.

  • Fang, H., S. Liang, and G. Hoogenboom, 2011: Integration of MODIS LAI and vegetation index products with the CSM-CERES-Maize model for corn yield estimation. Int. J. Remote Sens., 32, 1039–1065, https://doi.org/10.1080/01431160903505310.

  • Fisher, C. K., M. Pan, and E. F. Wood, 2020: Spatiotemporal assimilation–interpolation of discharge records through inverse streamflow routing. Hydrol. Earth Syst. Sci., 24, 293–305, https://doi.org/10.5194/HESS-24-293-2020.

  • Forman, B., R. Reichle, and M. Rodell, 2012: Assimilation of terrestrial water storage from GRACE in a snow-dominated basin. Water Resour. Res., 48, W01507, https://doi.org/10.1029/2011WR011239.

  • Getirana, A. C. V., A. Boone, D. Yamazaki, B. Decharme, F. Papa, and N. Mognard, 2012: The Hydrological Modeling and Analysis Platform (HyMAP): Evaluation over the Amazon basin. J. Hydrometeor., 13, 1641–1665, https://doi.org/10.1175/JHM-D-12-021.1.

  • Gil, Y., and Coauthors, 2016: Toward the geoscience paper of the future: Best practices for documenting and sharing research from data to software to provenance. Earth Space Sci., 3, 388–415, https://doi.org/10.1002/2015EA000136.

  • Girotto, M.,