## 1. Introduction

Understanding and evaluating the distribution and cycling of water in different forms in the land–atmosphere system is a central question in many hydrological studies (Roads et al. 2003). More specifically, estimates of water cycle components include, for the atmosphere budget, the atmospheric moisture divergence, storage, precipitation, and evapotranspiration, and for the terrestrial water budget, the soil moisture storage, surface/subsurface runoff, precipitation, and evapotranspiration. The terrestrial and atmospheric water budgets are linked via the common terms of the precipitation and evapotranspiration fluxes. Therefore, better understanding of the terrestrial water budget would improve our knowledge of the global hydrologic cycle and its dynamics, and thus improve our skills in modeling, forecasting, and analyzing the land–atmosphere system.

A number of studies are devoted to assessing these water budget components at different temporal and spatial scales using a variety of approaches. One example is the Global Energy and Water Cycle Experiment (GEWEX) project and the Water and Energy Budget Synthesis (WEBS) within it (Roads et al. 2003). One of the important goals of these efforts is to provide consistent estimates of the water cycle components using observations such that the mass balance of water is closed. The widely used approaches to evaluate the terrestrial water cycle can be divided into three categories: 1) observations based on in situ measurements, which for some variables, like evaporation, are poorly or sparsely measured (or both); 2) derived estimates based on ground- and space-based remotely sensed observations; and 3) estimates based on land surface models (LSMs), either coupled or uncoupled with the atmosphere. The estimates from the three approaches can be combined through data assimilation, and this is the approach from the reanalysis efforts of major weather centers like the European Centre for Medium-Range Weather Forecasts (ECMWF) or the National Centers for Environmental Prediction (NCEP).

Direct observation is the most traditional approach for water budget estimates and is considered reliable at the scale of measurement. An observation-based water budget analysis at a large basin scale was conducted by Ropelewski and Yarosh (1998) over the central United States, and the results provide very useful insights on the annual and interannual dynamics of the water cycle. An important limiting factor of the observational approach is data availability. At large scales (for example, regional to continental scales), dense networks of instruments are too expensive to deploy. Fortunately, fast-developing remote sensing, especially from space, and the corresponding retrieval techniques are increasing the potential to monitor the global land surface (Jackson et al. 2002).

Terrestrial water and energy budget simulations using either climate models or offline land surface models are another approach to water budget analyses. An advantage of this approach is that physical consistency (i.e., closure of the water and energy budgets) is maintained by construction in the models. Offline land surface models need meteorological forcing fields, which can be taken from data archives (Abdulla et al. 1996; Nijssen et al. 2001) or from atmospheric model analyses (Maurer et al. 2001). This makes the modeling approach feasible over continental to global areas (Ziegler et al. 2003), and at different temporal and spatial scales. After careful calibration, land models can produce reliable results over many areas (Abdulla et al. 1996; Nijssen et al. 2001). A major drawback of a modeling-only approach for water budget estimation is that models are not perfectly parameterized or calibrated: errors and biases exist and propagate through time.

Data assimilation techniques combine the virtues of observations and modeling by fusing them together. These techniques have been studied and used for decades in meteorological and oceanic applications, and it is common practice in atmospheric studies to assimilate observations to produce analyses and reanalyses. Examples include the global NCEP–National Center for Atmospheric Research (NCAR) reanalysis (Kalnay 1996) and, more recently, the reanalysis by ECMWF (European Centre for Medium-Range Weather Forecasts 2002) and the North American Regional Reanalysis (NARR) by NCEP. In hydrologic studies, data assimilation has not been fully utilized, especially with remotely sensed data (McLaughlin 2002), though a number of assimilation studies have been performed (Reichle et al. 2001, 2002; Crow and Wood 2003; Crow 2003).

As already mentioned, it is important to obtain estimates of the water budget variables that satisfy closure. Estimates of the water budget from observations alone rarely close because of measurement inaccuracy, interpolation errors, sparse data, and so forth. The nonclosure or imbalance problem is also often seen in data assimilation approaches. For instance, the ensemble Kalman filter (EnKF) (Evensen 1994) is a widely used data assimilation scheme. When the EnKF is applied to a land model, it updates the soil moisture states using the measurements obtained. This modification to soil moisture may destroy the balance established in the model. This does not mean that the EnKF itself fails to conserve water mass; rather, the EnKF provides an ensemble of state forecasts, and subsequently filtered states, at an instantaneous time, while the water balance constraint involves not only instantaneous states but also the integration of fluxes over a time step. The broken water balance in current EnKF practice motivates this study to impose a “constraint” on the assimilation procedure to maintain the water balance.

The focus of this paper is development and application of a constrained data assimilation approach that allows the merging of land surface state and flux observations assimilated into a land surface model while maintaining closure of the water budget. Our constrained ensemble Kalman filter (CEnKF) is developed as a two-step filtering approach, with the first step assimilating in situ measurements of the water budget into a land surface model and the second step being the constraint step where the imbalance term is optimally distributed among the terms of the water budget. The development of the generalized constrained filter is presented in section 2 and in a form that can be applied to all classes of Kalman filters: linear, nonlinear, and ensemble-based. The application of the water balance constraint is presented in section 3. In section 4, the approach is applied to a 75 000 km^{2} domain utilizing the EnKF (Evensen 1994) in the data assimilation step with the Variable Infiltration Capacity (VIC) land surface model (Liang et al. 1994, 1996, 1999).

## 2. Constrained Kalman filters: Theoretical development

The theory of state estimation and the filtering of system and measurement noise extends back four decades to the development of the Kalman filter for linear state estimation with additive Gaussian noise (Kalman 1960) and is well documented in numerous papers and textbooks (e.g., Ho and Lee 1964; Gelb 1974; Anderson and Moore 1979). The application of classical linear Kalman filtering to hydrologic problems extends back to the 1970s (see Wood and Rodriguez-Iturbe 1975; Kitanidis and Bras 1980; McLaughlin and Wood 1988). For hydrological problems where side constraints such as the closure of the water balance are important, the development of a constrained Kalman filter and its application for regional water balance filtering has not occurred to date.

In this section we formulate the constrained Kalman filter as an unconstrained filter plus a second filter that imposes the constraints. A brief overview of the Kalman filter will facilitate the presentation of the constrained version.

### a. Kalman filters

Consider a dynamic system whose state is represented by the vector $\mathbf{x}$, and its time evolution is described by the equation

$$\mathbf{x}_{k+1} = F_k(\mathbf{x}_k) + B_k(\mathbf{u}_{k+1}) + \mathbf{w}_k, \tag{1}$$

where $\mathbf{u}_{k+1}$ are exogenous forcings from time $k$ to $k+1$ weighted by $B_k(\cdot)$, $\mathbf{w}_k$ is an additive noise term that represents the uncertainty in the state dynamics, and $F_k(\cdot)$ is referred to as the state transition function. Basically Eq. (1) represents the model of the system, which for the terrestrial hydrosphere would be a land surface model, forced by precipitation, surface meteorology, and incoming solar and longwave radiation. The model parameterizations are contained in $B_k(\cdot)$ and $F_k(\cdot)$. Because of incorrect specifications of $B_k(\cdot)$ or $F_k(\cdot)$, spatial heterogeneities or scale effects, uncertainties in their parameters and constants, or perhaps uncertainty in the forcings (which are usually assumed perfectly known in classical filtering), the predictions of the states at time $k+1$ are uncertain. This uncertainty is usually represented by an additive noise term in Eq. (1), denoted $\mathbf{w}_k$.

The system outputs are given by

$$\mathbf{y}_k = H_k(\mathbf{x}_k), \tag{2}$$

where $H_k(\cdot)$ describes the relationship between the states $\mathbf{x}$ and the outputs $\mathbf{y}$. For a hydrologic system, the outputs usually include all the terms in the water balance (i.e., soil moisture, runoff, evapotranspiration, drainage, etc.). The outputs are measured with error $\mathbf{v}_k$, so that the quantity $\mathbf{z}_k = \mathbf{y}_k + \mathbf{v}_k$ represents (noisy) measurements of the system outputs.

#### 1) Discrete linear Kalman filter

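For a linear system, the state and measurement equations can be written as

$$\mathbf{x}_{k+1} = \mathbf{F}_k\mathbf{x}_k + \mathbf{B}_k\mathbf{u}_{k+1} + \mathbf{w}_k, \tag{3}$$

$$\mathbf{z}_k = \mathbf{H}_k\mathbf{x}_k + \mathbf{v}_k. \tag{4}$$

With $\mathbf{w}_k$ and $\mathbf{v}_k$ independently distributed $N(\mathbf{0}, \mathbf{Q}_k)$ and $N(\mathbf{0}, \mathbf{R}_k)$, where $\mathbf{Q}_k = E(\mathbf{w}_k\mathbf{w}_k^{T})$ and $\mathbf{R}_k = E(\mathbf{v}_k\mathbf{v}_k^{T})$, the minimum error variance estimate of the system state at $k$ given the measurement $\mathbf{z}_k$ is

$$\hat{\mathbf{x}}_k(+) = \hat{\mathbf{x}}_k(-) + \mathbf{K}_k\left[\mathbf{z}_k - \mathbf{H}_k\hat{\mathbf{x}}_k(-)\right], \tag{5}$$

where $\hat{\mathbf{x}}_k(+)$ is the filtered (or updated) estimate of the state at time $k$ considering the measurement $\mathbf{z}_k$ and $\hat{\mathbf{x}}_k(-)$, the state forecast at time $k$ using measurements up to time $k-1$. The quantity $\mathbf{H}_k\hat{\mathbf{x}}_k(-)$ is the forecast of the system output; the difference between this and the measurement of the system output $\mathbf{z}_k$ is $\boldsymbol{\nu}_k = \mathbf{z}_k - \mathbf{H}_k\hat{\mathbf{x}}_k(-)$ and is referred to as the innovation. In Eq. (5), the updating of the state $\mathbf{x}_k$ is weighted between the forecast $\hat{\mathbf{x}}_k(-)$ and the innovation $\boldsymbol{\nu}_k$, and $\mathbf{K}_k$ is referred to as the Kalman gain, whose value is computed by

$$\mathbf{K}_k = \mathbf{P}_k(-)\mathbf{H}_k^{T}\left[\mathbf{H}_k\mathbf{P}_k(-)\mathbf{H}_k^{T} + \mathbf{R}_k\right]^{-1}, \tag{6}$$

where $\mathbf{P}_k(-)$ is the error covariance of the state forecast $\hat{\mathbf{x}}_k(-)$. It can be shown that the cross covariance between the system state and measurement is $\boldsymbol{\Sigma}_{xz} = \boldsymbol{\Sigma}_{xy} = \mathbf{P}_k(-)\mathbf{H}_k^{T}$ and the covariance of the system measurement is $\boldsymbol{\Sigma}_{zz} = \boldsymbol{\Sigma}_{yy} + \mathbf{R}_k = \mathbf{H}_k\mathbf{P}_k(-)\mathbf{H}_k^{T} + \mathbf{R}_k$, so that the Kalman gain can be expressed as

$$\mathbf{K}_k = \boldsymbol{\Sigma}_{xz}\boldsymbol{\Sigma}_{zz}^{-1} = \boldsymbol{\Sigma}_{xy}\left(\boldsymbol{\Sigma}_{yy} + \mathbf{R}_k\right)^{-1}. \tag{7}$$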
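The measurement update of the linear filter can be illustrated with a short NumPy sketch. It computes the gain from the cross covariance $\mathbf{P}_k(-)\mathbf{H}_k^{T}$ and the measurement covariance $\mathbf{H}_k\mathbf{P}_k(-)\mathbf{H}_k^{T} + \mathbf{R}_k$, as described above. The function name `kalman_update` and all numbers are hypothetical, chosen only for illustration; the covariance update $(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k)\mathbf{P}_k(-)$ is the standard Kalman filter result and is not written out in the text.

```python
import numpy as np

def kalman_update(x_forecast, P_forecast, z, H, R):
    """One measurement update of the discrete linear Kalman filter."""
    innovation = z - H @ x_forecast          # nu_k = z_k - H_k x_k(-)
    S_xz = P_forecast @ H.T                  # cross covariance between state and measurement
    S_zz = H @ P_forecast @ H.T + R          # covariance of the measurement
    K = S_xz @ np.linalg.inv(S_zz)           # Kalman gain
    x_filtered = x_forecast + K @ innovation
    # Standard covariance update (not shown in the text).
    P_filtered = (np.eye(len(x_forecast)) - K @ H) @ P_forecast
    return x_filtered, P_filtered, K

# Toy example with hypothetical numbers: two states, only the first is observed.
x_f = np.array([0.30, 0.25])
P_f = np.array([[0.010, 0.004],
                [0.004, 0.020]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.010]])
z = np.array([0.34])

x_a, P_a, K = kalman_update(x_f, P_f, z, H, R)
print(x_a, K.ravel())
```

In this toy case the unobserved second state is also adjusted, because its forecast error is correlated with that of the observed state.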

#### 2) Ensemble Kalman filters

In the ensemble Kalman filter (EnKF; Evensen 1994), an ensemble of state predictions from time $k$ to $k+1$ [Eq. (3)] and measurements at time $k$ [Eq. (4)] can be generated. Here the error models can be more general than assumed in Eqs. (3) and (4) in that they can be non-Gaussian or nonadditive, or can represent errors in the input vector $\mathbf{u}$ or in the parameters of $F_k(\cdot)$, $B_k(\cdot)$, or $H_k(\cdot)$. Deviating from the error specification discussed above may, however, result in “nonoptimal” filtered state estimates; that is, there is no guarantee that the state error covariance is minimized or that the innovation time series, the time series of the difference between the measurements and the predicted measurements $\boldsymbol{\nu}_k = \mathbf{z}_k - \mathbf{H}_k\hat{\mathbf{x}}_k(-)$, has zero temporal correlation. Given the ensembles, the Kalman gain can be computed using Eq. (7) with the terms $\boldsymbol{\Sigma}_{xz} = \boldsymbol{\Sigma}_{xy}$ and $\boldsymbol{\Sigma}_{yy}$ estimated from the ensembles. The $i$th ensemble member of the state vector is updated using

$$\hat{\mathbf{x}}_k^{(i)}(+) = \hat{\mathbf{x}}_k^{(i)}(-) + \mathbf{K}_k\left\{\mathbf{z}_k^{(i)} - H_k\left[\hat{\mathbf{x}}_k^{(i)}(-)\right]\right\}, \tag{9}$$

where the measurement $\mathbf{z}_k$ is perturbed according to its error covariance $\mathbf{R}_k$ to create an ensemble of $\mathbf{z}_k^{(i)}$ values, which are used in the state update according to Eq. (9) (Burgers et al. 1998). EnKF is computationally feasible for complex land surface models, can incorporate complex nonlinearity within the models, and allows flexible error models. This is the filter approach that will be used in section 4 for assimilating data for the regional water budget.
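A minimal sketch of the ensemble update with perturbed observations is given below. It assumes the gain is formed from sample covariances of the forecast ensemble and its predicted outputs, and that each member is updated against its own perturbed copy of the measurement. The function `enkf_update`, the observation function `observe_top_layer`, and all numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def enkf_update(X_f, z, R, h, rng):
    """EnKF measurement update.

    X_f : (n, N) forecast ensemble, one column per member
    z   : (m,) measurement vector
    R   : (m, m) measurement error covariance
    h   : function mapping one state vector to its predicted output vector
    """
    n, N = X_f.shape
    Y_f = np.column_stack([h(X_f[:, i]) for i in range(N)])   # (m, N) predicted outputs
    X_anom = X_f - X_f.mean(axis=1, keepdims=True)
    Y_anom = Y_f - Y_f.mean(axis=1, keepdims=True)
    S_xy = X_anom @ Y_anom.T / (N - 1)        # sample cross covariance
    S_yy = Y_anom @ Y_anom.T / (N - 1)        # sample output covariance
    K = S_xy @ np.linalg.inv(S_yy + R)        # gain from ensemble statistics
    # Perturb the measurement according to R, one realization per member.
    Z = z[:, None] + rng.multivariate_normal(np.zeros(len(z)), R, size=N).T
    return X_f + K @ (Z - Y_f), K

def observe_top_layer(x):
    """Hypothetical output function: only the first state is measured."""
    return np.array([x[0]])

rng = np.random.default_rng(0)
N = 200
X_f = np.vstack([rng.normal(0.30, 0.04, N), rng.normal(0.25, 0.03, N)])
z = np.array([0.36])
R = np.array([[0.02 ** 2]])

X_a, K = enkf_update(X_f, z, R, observe_top_layer, rng)
print(X_f.mean(axis=1), X_a.mean(axis=1))
```

Because `h` is applied member by member, the same sketch works when the output function is nonlinear, which is the main practical appeal of the ensemble approach.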

### b. Adding equality constraints to state estimation

A significant shortcoming in the current use of data assimilation for water and energy budget studies is that the approaches do not impose any constraints on the states, and thus rarely preserve the water balance. For example, Betts and Ball (2003) found nonclosure of the water budget in the ECMWF 40-yr Re-Analysis (ERA-40) over the Mackenzie Basin to be approximately 90 mm yr^{−1}, about 50% of the observed river discharge, while the recently released NARR, which assimilated observed precipitation, had nonclosure of 70 mm yr^{−1} (S. Déry 2005, personal communication). Besides water and energy budget constraints, in many cases, there are other known constraints—for example, physical relationships between state variables—that can further refine our estimation of the system state. Here we will present two approaches for adding conservation constraints—that is, equality constraints.

The first approach to constraining the state is to include the budget constraint in the output/measurement equation as a perfect observation (Maybeck 1979). Assuming perfect observations during the computation of the Kalman gain [Eq. (4)] and the Kalman update results in a singular observation error covariance matrix, but this does not present any theoretical problems if we use the Moore–Penrose pseudoinverse (Geeter et al. 1997). Some authors argue that this singularity would lead to numerical problems and, moreover, that the added perfect observations increase the dimension of the matrices that are inverted (Simon and Chia 2002).

The second approach is to do postfiltering (i.e., two-step or sequential filtering), where a second filter is constructed to satisfy the constraints (Simon and Chia 2002). For ensemble filtering applications, this approach is more suitable and easier to implement. The first step consists of performing an unconstrained state estimation procedure using any suitable Kalman filter approach and its corresponding error models. The second step is to perform another state estimation process that meets the constraints based on the (unconstrained) estimates already obtained. This sequential filter algorithm can be carried out in one of two ways:

- Performing a second filter update using perfect observations to update the state sequentially after obtaining filtered estimates from an unconstrained filter. Note that the “second filter update” referred to here only filters a second but perfect observation into the system state, and no state transition (one-step-ahead prediction) is performed since it is already done in the first filter. Depending on whether the constraints are linear or not, the second filter may be the standard Kalman filter, an extended Kalman filter, or an ensemble Kalman filter. Processing observations individually, rather than simultaneously incorporating all measurements obtained at a particular time, is usually referred to as sequential processing (Sorenson 1966), and results from sequential processing are equivalent to simultaneous processing (Sorenson 1966) as long as the observations are independent.

- Optimal state estimation under constraints, using the methods discussed in Simon and Chia (2002). This approach takes the initial unconstrained state estimate and corresponding errors as the starting point of a new estimation problem and tries to find the best estimate that both satisfies the constraints and is optimal, based on a criterion such as “maximum likelihood,” “least squares,” or “orthogonal projection.”

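A constraint on the system can be written as

$$G_k(\mathbf{x}_k, \mathbf{u}_k) = \mathbf{g}_k, \tag{10}$$

where $G_k(\cdot)$ is the parameterization of the constraint, which is a function of the state variables $\mathbf{x}$ and the inputs $\mathbf{u}_k$ in the past time step from time $k-1$ to $k$. Without any loss of generality, we can have $m$ multiple constraints, and we can further define $\mathbf{x}^{*}_k = (\mathbf{x}_k\ \mathbf{u}_k)$, a partitioned vector of size $n + l$ consisting of the $n$ states and $l$ forcing inputs. Then (10) becomes $G_k(\mathbf{x}^{*}_k) = \mathbf{g}_k$, and if the constraint is linear, then $\mathbf{G}_k\mathbf{x}^{*}_k = \mathbf{g}_k$. Note that the expanded state $\mathbf{x}^{*}_k$ contains the forcing inputs as well; thus the constraint will be imposed on not only the states but also the forcing inputs.

The constraining filter is constructed with the measurement operator $\mathbf{H}''_k = \mathbf{G}_k$ and the observations $\mathbf{z}''_k = \mathbf{g}_k$ “observed” without error; that is, $\mathbf{R}''_k \equiv \mathbf{0}$. The Kalman gain for this constraining filter is, using the above,

$$\mathbf{K}''_k = \mathbf{P}^{*}_k(+)\mathbf{G}_k^{T}\left[\mathbf{G}_k\mathbf{P}^{*}_k(+)\mathbf{G}_k^{T}\right]^{-1},$$

where $\mathbf{P}^{*}_k(+)$ is an $(n + l) \times (n + l)$ partitioned error covariance matrix with the $n \times n$ submatrix being the filtered state error covariance at time $k$, the $l \times l$ submatrix the error covariance of the inputs, and the $n \times l$ submatrices the error cross covariance between states and inputs. The dimension of $\mathbf{K}''_k$ is $(n + l) \times m$, where $m$ is the dimension of $\mathbf{g}_k$. The update of the state at time $k$ after the observation was given by Eq. (5) for the linear filter. After the application of the constraining filter, the state is updated again using

$$\hat{\mathbf{x}}^{*}_k(++) = \hat{\mathbf{x}}^{*}_k(+) + \mathbf{K}''_k\left[\mathbf{g}_k - \mathbf{G}_k\hat{\mathbf{x}}^{*}_k(+)\right].$$

For ensemble filters, $\mathbf{K}''_k$ can be computed in the same way from the ensemble of $\hat{\mathbf{x}}^{*(i)}_k(+)$ and $G_k[\hat{\mathbf{x}}^{*(i)}_k(+)]$ as $\mathbf{K}_k$ is computed from the ensemble of $\hat{\mathbf{x}}^{(i)}_k(-)$ and $H_k[\hat{\mathbf{x}}^{(i)}_k(-)]$:

$$\mathbf{K}''_k = \boldsymbol{\Sigma}''_{xz}\,\boldsymbol{\Sigma}''^{-1}_{zz},$$

where $\boldsymbol{\Sigma}''_{xz}$ and $\boldsymbol{\Sigma}''_{zz}$ are both computed from the ensemble. It should be noted that $\boldsymbol{\Sigma}''_{xz}$ is the cross covariance between state $\mathbf{x}^{*}_k$ and predicted output $G_k(\mathbf{x}^{*}_k)$, and $\boldsymbol{\Sigma}''_{zz}$ is the covariance of $G_k(\mathbf{x}^{*}_k)$. Each ensemble member can then be constrained respectively using

$$\hat{\mathbf{x}}^{*(i)}_k(++) = \hat{\mathbf{x}}^{*(i)}_k(+) + \mathbf{K}''_k\left\{\mathbf{g}_k - G_k\left[\hat{\mathbf{x}}^{*(i)}_k(+)\right]\right\}. \tag{15}$$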
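For a linear constraint, the constraining filter reduces to a single perfect-observation update, which can be sketched as follows. This is an illustration under stated assumptions: the function name `constrain_linear`, the three-variable expanded state, its covariance, and the constraint "first plus second minus third equals zero" are all hypothetical.

```python
import numpy as np

def constrain_linear(x_plus, P_plus, G, g):
    """Second (constraining) filter: assimilate G x = g as an error-free observation."""
    S_xz = P_plus @ G.T                  # cross covariance between state and constraint output
    S_zz = G @ P_plus @ G.T              # covariance of the constraint output (R'' = 0)
    K2 = S_xz @ np.linalg.inv(S_zz)      # gain of the constraining filter
    x_pp = x_plus + K2 @ (g - G @ x_plus)
    return x_pp, K2

# Hypothetical filtered estimate that violates the constraint by 0.5.
x_plus = np.array([1.0, 2.0, 2.5])
P_plus = np.diag([0.01, 0.04, 0.05])
G = np.array([[1.0, 1.0, -1.0]])
g = np.array([0.0])

x_pp, K2 = constrain_linear(x_plus, P_plus, G, g)
print(x_pp, G @ x_pp)
```

In this example the variable with the largest error variance absorbs the largest share of the violation, and the constrained estimate satisfies the constraint to rounding error.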

## 3. Implementation of CEnKF with the VIC hydrologic model

### a. Hydrological model and system state

The VIC land surface model serves as $F_k(\cdot)$ and $B_k(\cdot)$ in the filtering system. VIC is forced with meteorological and radiative inputs and computes the terrestrial states (snow water, soil moisture, soil ice content, surface and subsurface temperatures) as well as the water and energy fluxes. VIC works at a spatial grid scale ranging from ∼5 to 500 km. The number of vertical layers may vary and is set to 3 in the simulations performed here. In this study, which is over the southern Great Plains region of the United States, cold season processes are not that significant even in the winter, so snow/ice features, while still part of the VIC simulations, have been ignored. Also, for the assimilation experiments being reported here, only observations related to the water budget were used (i.e., soil moisture, evapotranspiration, and runoff), and we have essentially ignored the energy budget equation. Therefore, the state of the hydrologic system $\mathbf{x}_k$ at time step $k$ is defined as

$$\mathbf{x}_k = \left[s_{1,k},\ s_{2,k},\ s_{3,k}\right]^{T}, \tag{18}$$

where $s_1$, $s_2$, and $s_3$ are volumetric soil moisture content in the three soil layers. The vegetation interception storage, which is computed within VIC, should also be included as a state variable. For ease of presentation, and because we do not have interception storage measurements, we have ignored this state variable in Eq. (18). The forcing term $\mathbf{u}_k$ can be defined as the precipitation input (even though for completeness we could expand this to include all surface meteorological variables used to force the model):

$$\mathbf{u}_k = \left[p_k\right].$$

### b. System output

The measured system outputs are evapotranspiration $e$ from flux towers, runoff $q$ from river gauges, and the first two layers of soil moisture $s_1$ and $s_2$. The variables $s_1$ and $s_2$ are system state variables, and $e$ and $q$ are functions of the system state and forcing variables: $e_k = e_k(s_{1,k}, s_{2,k}, s_{3,k}, p_k)$ and $q_k = q_k(s_{1,k}, s_{2,k}, s_{3,k}, p_k)$. These functional relationships $e_k(\cdot)$ and $q_k(\cdot)$ are parameterized in VIC (see Liang et al. 1994, 1999; Peters-Lidard et al. 1998), for example, using the Penman–Monteith–Jarvis parameterization for surface evapotranspiration. So the system output is

$$\mathbf{y}_k = H_k(\mathbf{x}^{*}_k) = \left[s_{1,k},\ s_{2,k},\ e_k,\ q_k\right]^{T},$$

where $H_k(\cdot)$ is a partitioned matrix consisting of a linear portion that describes the measurement equations for the states and a nonlinear portion that reflects those output variables that are derived from the states, namely, $e_k(\mathbf{x}^{*}_k)$ and $q_k(\mathbf{x}^{*}_k)$. With this, $H_k(\cdot)$ assumes the following form:

$$H_k(\mathbf{x}^{*}_k) = \begin{bmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}\mathbf{x}^{*}_k \\ e_k(\mathbf{x}^{*}_k) \\ q_k(\mathbf{x}^{*}_k) \end{bmatrix}.$$

### c. Water balance constraint

The water balance relates $p$, $e$, $q$, and the change in total soil water storage $s_1 + s_2 + s_3$. Define the water balance residual (i.e., imbalance term) $r_k$, which must be zero if mass balance is maintained. Thus,

$$r_k = p_k - e_k - q_k - \left(s_{1,k} + s_{2,k} + s_{3,k}\right) + \left(s_{1,k-1} + s_{2,k-1} + s_{3,k-1}\right) = 0,$$

where $s_{1,k}$, $s_{2,k}$, and $s_{3,k}$ are current states; variable $p_k$ is a forcing term; and variables $e_k$ and $q_k$ are both functions of the moisture states and precipitation forcing. Note that $s_{1,k}$, $s_{2,k}$, and $s_{3,k}$ are instantaneous states at time $k$, while $p_k$, $e_k$, and $q_k$ are accumulations of fluxes from time $k-1$ to $k$. All the above six terms can be included in the expanded state $\mathbf{x}^{*}_k$ defined in Eq. (22). The three remaining terms, $s_{1,k-1}$, $s_{2,k-1}$, and $s_{3,k-1}$, are states at the previous time step and are treated as known without error from their estimation at time $k-1$. This is done to avoid their adjustment at time $k$, which would result in a water budget imbalance at time $k-1$. Alternatively the data could be assimilated via a Kalman smoother, rather than a Kalman filter, which would allow future observations to contribute to the estimation of the states at previous time steps (Dunne and Entekhabi 2005). If we rewrite the constraint Eq. (22) in the form of $G_k(\mathbf{x}^{*}_k) = \mathbf{g}_k$, we have on the left-hand side $H''_k(\cdot) = G_k(\cdot)$ and the “perfect” measurement is $\mathbf{z}''_k = \mathbf{g}_k = 0$; $H''_k(\mathbf{x}^{*}_k)$ computes the residual term from the expanded state vector $\mathbf{x}^{*}_k$, so it includes $e_k(\cdot)$ and $q_k(\cdot)$, and has the form

$$H''_k(\mathbf{x}^{*}_k) = p_k - e_k(\cdot) - q_k(\cdot) - \left(s_{1,k} + s_{2,k} + s_{3,k}\right) + \left(s_{1,k-1} + s_{2,k-1} + s_{3,k-1}\right).$$

Alternatively, $\mathbf{g}_k$ can be set equal to the “known” values of the moisture states at time ($k - 1$), and these variables can be removed from Eqs. (25) and (27). In this case, the residual term from the imbalance would be augmented by the moisture states at time ($k - 1$).

Because $G_k(\cdot)$ gives the residual term, $G_k[\hat{\mathbf{x}}^{*(i)}_k(+)] = r^{(i)}_k(+)$ gives the residual term of the $i$th ensemble member after the data assimilation step, that is, the first filter. Therefore, $\boldsymbol{\Sigma}''_{xz}$ and $\boldsymbol{\Sigma}''_{zz}$, needed for computing the gain in the constraining filter $\mathbf{K}''_k = \boldsymbol{\Sigma}''_{xz}\boldsymbol{\Sigma}''^{-1}_{zz}$, are the cross covariance between the expanded state and the residual, and the covariance of the residual in the ensemble, respectively. After $\mathbf{K}''_k$ is obtained, the constraint is applied following Eq. (15):

$$\hat{\mathbf{x}}^{*(i)}_k(++) = \hat{\mathbf{x}}^{*(i)}_k(+) - \mathbf{K}''_k G_k\left[\hat{\mathbf{x}}^{*(i)}_k(+)\right], \tag{28}$$

where the constrained state $\hat{\mathbf{x}}^{*(i)}_k(++)$ is the unconstrained state $\hat{\mathbf{x}}^{*(i)}_k(+)$ plus an adjustment equal to the term $-\mathbf{K}''_k G_k[\hat{\mathbf{x}}^{*(i)}_k(+)]$. If, in any ensemble member, the residual term $G_k[\hat{\mathbf{x}}^{*(i)}_k(+)]$ is not zero, then Eq. (28) redistributes this residual term across the water balance components in $\hat{\mathbf{x}}^{*(i)}_k(++)$ and obtains mass balance. The gain $\mathbf{K}''_k$ determines how the residual term (mass balance error) will be redistributed to the augmented state variables $p_k$, $e_k$, $q_k$, $s_{1,k}$, $s_{2,k}$, and $s_{3,k}$. Further inspection of $\mathbf{K}''_k = \boldsymbol{\Sigma}''_{xz}\boldsymbol{\Sigma}''^{-1}_{zz}$ shows that $\boldsymbol{\Sigma}''_{xz}$ determines the adjustment made to each variable. The larger the error covariance that a particular variable has in $\hat{\mathbf{x}}^{*(i)}_k(+)$ and the larger the correlation between this variable and the residual, the larger is the corresponding row in $\boldsymbol{\Sigma}''_{xz}$ and the larger the adjustment made to that variable. This result is intuitively correct: variables with large errors and/or high correlation with the residual are adjusted more than variables with small errors and/or low correlations.
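The redistribution of the residual can be sketched for an ensemble as follows. Several assumptions are made for illustration only: the expanded state of each member is ordered as (p, e, q, s1, s2, s3) with all terms expressed as water depths in mm, the total storage at the previous step is a single known value shared by all members, and the residual is defined as precipitation minus evapotranspiration minus runoff minus storage change. The function `constrain_water_balance` and all numbers are hypothetical.

```python
import numpy as np

# Row vector that turns [p, e, q, s1, s2, s3] into p - e - q - (s1 + s2 + s3).
WATER_BALANCE_G = np.array([1.0, -1.0, -1.0, -1.0, -1.0, -1.0])

def constrain_water_balance(X_plus, storage_prev):
    """Constraining filter for the water balance.

    X_plus       : (6, N) ensemble after the first (unconstrained) filter
    storage_prev : total soil water storage at the previous step, treated as known
    """
    N = X_plus.shape[1]
    r = WATER_BALANCE_G @ X_plus + storage_prev      # residual of each member
    X_anom = X_plus - X_plus.mean(axis=1, keepdims=True)
    r_anom = r - r.mean()
    S_xz = X_anom @ r_anom / (N - 1)                 # cross covariance, state vs residual
    S_zz = r_anom @ r_anom / (N - 1)                 # variance of the residual
    K2 = S_xz / S_zz                                 # gain of the constraining filter
    X_pp = X_plus - np.outer(K2, r)                  # redistribute each member's residual
    return X_pp, K2, r

# Hypothetical 20-member ensemble (mm): precipitation is the most uncertain term.
rng = np.random.default_rng(1)
means = np.array([10.0, 3.0, 1.0, 30.0, 120.0, 300.0])
stds = np.array([3.0, 0.5, 0.3, 1.0, 2.0, 1.0])
X_plus = means[:, None] + stds[:, None] * rng.standard_normal((6, 20))
storage_prev = 444.0

X_pp, K2, r_before = constrain_water_balance(X_plus, storage_prev)
r_after = WATER_BALANCE_G @ X_pp + storage_prev
print(np.abs(r_before).max(), np.abs(r_after).max())
```

With a linear residual and a gain estimated from the same ensemble, the product of the constraint row and the gain equals one, so every member closes its budget after the update. The entries of `K2` show how much of the residual each term absorbs.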

## 4. Implementing CEnKF data assimilation for water budget estimation

### a. Study area and data sources

The study area chosen was the southern Great Plains (SGP) region that consists of the Atmospheric Radiation Measurement Program Cloud and Radiation Testbed (ARM-CART) experimental area within Oklahoma, shown in Fig. 1 as the thick box. A number of research experiments and facilities are deployed in this region, and all terrestrial water budget variables are measured. Evapotranspiration is measured by ARM-CART Bowen ratio instruments (crosses in Fig. 1), which report latent heat flux every 15 min. Soil moisture is measured by Oklahoma Mesonet stations (dots in Fig. 1), and the measurements are taken hourly at four depths: 5, 25, 65, and 75 cm. There are 44 Mesonet stations with >90% complete records. Runoff is the hardest component to measure because of human regulation of streamflow. A study by Wallis et al. (1991) identified a set of unregulated or very slightly regulated basins gauged by the U.S. Geological Survey (USGS) across the United States, and within our study area and its nearby vicinity there are 48 that are used here. They are outlined in Fig. 1 and are used to estimate the runoff from the measured daily streamflow. The meteorological forcing fields come from the 50-yr long-term retrospective North American Land Data Assimilation System (NLDAS) project database created by the University of Washington (Maurer et al. 2002), in which the precipitation and air temperature are gridded from National Oceanic and Atmospheric Administration (NOAA) Cooperative Observer (COOP) ground observations.

### b. Water budget model simulations

In this study, the VIC modeling grid pixel size is set to 0.5° (∼50 km) with the modeling domain being a 6 × 5 pixel box (see the gray grids in Fig. 1). For this study the VIC land surface model was run in a “water-balance mode,” so the simulation is performed at a daily time step. All subdaily observations except soil moisture were first aggregated to daily values and interpolated to the 0.5° grid by simple linear interpolation. For soil moisture, the instantaneous values were spatially interpolated from the Oklahoma Mesonet measurements. Latent heat measurements from ARM-CART flux towers were converted to an equivalent evapotranspiration depth (volume per unit area) and interpolated over the modeling domain. The USGS streamflow measurements were also converted to a depth per unit area using the gauged area, and interpolated across the domain. Because of the widely distributed river regulation structures in the area, the number of USGS gauge basins selected is limited and they are sparsely located (Fig. 1). Also, the interpolation procedure treats the streamflow, in units of discharge per unit area for a basin, as if it were a point value at the geometric center of the basin. The lack of data and the simple interpolation scheme increase the uncertainties in the streamflow observations so obtained. To evaluate the interpolation uncertainties, a “leave-one-basin-out” cross-validation procedure was performed. It was found that the average root-mean-square error (rmse) for the missing basin during interpolation is 0.1910 mm day^{−1}, ranging from 0.0361 to 0.4085 mm day^{−1}. Compared to the mean streamflow depth of the region, 0.8059 mm day^{−1}, such interpolation errors are about 24%.
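The two unit conversions mentioned above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function names are hypothetical, the latent heat of vaporization is taken as a constant 2.45 MJ kg^{−1} (an assumed value that varies weakly with temperature), and a water density of 1000 kg m^{−3} is assumed so that 1 kg m^{−2} of water corresponds to a depth of 1 mm.

```python
# Assumed constants (not taken from the study).
LATENT_HEAT_VAPORIZATION = 2.45e6   # J kg^-1
SECONDS_PER_DAY = 86400.0

def latent_heat_to_et_mm_per_day(le_w_m2):
    """Convert a daily-mean latent heat flux (W m^-2) to evapotranspiration (mm day^-1)."""
    # W m^-2 / (J kg^-1) = kg m^-2 s^-1, and 1 kg m^-2 of water is 1 mm of depth.
    return le_w_m2 / LATENT_HEAT_VAPORIZATION * SECONDS_PER_DAY

def streamflow_to_runoff_mm_per_day(q_m3_s, area_km2):
    """Convert daily-mean streamflow (m^3 s^-1) to runoff depth over the gauged area (mm day^-1)."""
    volume_m3_per_day = q_m3_s * SECONDS_PER_DAY
    depth_m_per_day = volume_m3_per_day / (area_km2 * 1.0e6)
    return depth_m_per_day * 1000.0

et = latent_heat_to_et_mm_per_day(100.0)               # about 3.53 mm/day
runoff = streamflow_to_runoff_mm_per_day(10.0, 1000.0)  # 0.864 mm/day
print(round(et, 2), round(runoff, 3))
```

Expressing both fluxes as depths per day puts them in the same units as precipitation and storage change, which is what the budget residual requires.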

A 2-yr simulation period, from 1 October 1997 to 30 September 1999, was used for the study. This period was chosen because the required observational data (soil moisture, latent heat, and streamflow) were available.

For the ensemble Kalman filter, an ensemble of model replicates (ensemble members) that properly represents the uncertainties and errors associated with the model (i.e., the system equation; see section 2) is created. This is usually done by perturbing (i.e., adding noise to) the model initial conditions, forcing fields, and/or parameters etc. Prior to this study and as part of the long-term retrospective NLDAS simulations, the VIC parameters were calibrated for the Red–Arkansas Basin, which contains the study area, so they were not perturbed here. In most water budget simulations with a well-calibrated model, most of the uncertainties come from the uncertainties in the estimated precipitation used to force the model and initial soil moisture. Other significant sources include near-surface meteorological conditions like surface air temperature, humidity, and wind. For this study the precipitation was perturbed using a Monte Carlo method developed by J. Schaake (National Weather Service 2003, personal communication; also see Pan 2004) to create an ensemble of inputs. This ensemble generation method makes the following assumptions:

- The observed and true precipitation amounts can be treated as two jointly distributed random variables, and the marginal probability distribution for both is a lognormal distribution (when it rains) with a concentrated probability mass at zero (probability of no rain).

- Such a marginal distribution can be easily transformed to (and back from) a standard Gaussian distribution via an equal-quantile transform. If both random variables are transformed to standard Gaussian, their joint distribution will be a bivariate Gaussian. The quality of the precipitation observations can then be measured by the correlation coefficient of this bivariate Gaussian distribution and the parameters in the equal-quantile transform.

- Given an observed precipitation value transformed to standard Gaussian, the conditional distribution of true precipitation can be derived, and this conditional distribution is transformed back to a lognormal variate. Thus for a given precipitation observation, an ensemble of possible true values can be simulated according to the conditional lognormal distribution obtained above.
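The three assumptions above can be turned into a small sampling sketch. It covers only the rainy case: the probability mass at zero is ignored, which is a simplification of the method described. For a lognormal variable the equal-quantile transform to a standard Gaussian is the standardized logarithm. The function name, the lognormal parameters `mu_*` and `sigma_*`, and the correlation `rho` are hypothetical and are not values from the study.

```python
import numpy as np

def precip_ensemble(p_obs, mu_obs, sigma_obs, mu_true, sigma_true, rho, n_members, rng):
    """Sample possible true precipitation values given one nonzero observation.

    mu_*, sigma_* : parameters of log-precipitation for the observed and true amounts
    rho           : correlation of the bivariate Gaussian in transformed space
    """
    if p_obs <= 0.0:
        raise ValueError("this sketch handles only nonzero precipitation")
    # Equal-quantile transform of a lognormal value to a standard Gaussian value.
    z_obs = (np.log(p_obs) - mu_obs) / sigma_obs
    # Conditional distribution of the true value in Gaussian space.
    cond_mean = rho * z_obs
    cond_std = np.sqrt(1.0 - rho ** 2)
    z_true = rng.normal(cond_mean, cond_std, size=n_members)
    # Transform back to a lognormal variate.
    return np.exp(mu_true + sigma_true * z_true)

rng = np.random.default_rng(42)
members = precip_ensemble(5.0, 1.0, 0.8, 1.0, 0.8, 0.9, 20, rng)
perfect = precip_ensemble(5.0, 1.0, 0.8, 1.0, 0.8, 1.0, 20, rng)
print(members.round(2))
```

A correlation of one reproduces the observation in every member when the two marginal distributions are identical, while lower correlations widen the ensemble. This is how observation quality controls the spread of the forcing.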

The soil column in VIC is divided into three layers: the top layer is fixed at a thickness of 10 cm; the second is about 40 cm (varying between 39 and 41 cm); and the third, bottom layer is on average 100 cm, variable across the domain. On average, the top two layers of the Mesonet point measurements (5 and 25 cm), when interpolated across the modeling domain, compare well with the top two layers in VIC in that they are highly correlated and display the same dynamic response to precipitation forcing. Nonetheless, the Mesonet measurements estimate point-scale soil moisture based on a heat dissipation sensor, while the VIC land surface model represents the modeled volumetric grid-average soil moisture. Because of disparities in scales, including both spatial scales from the Mesonet point scale to the VIC 0.5° spatial scale, and the 5-cm Mesonet to the 10-cm VIC modeling depth, the VIC soil moisture values are, on average, consistently higher than the Mesonet values (Fig. 2). To remove this systematic scaling effect, we developed a regression model that related the Mesonet soil moisture estimate to the VIC top-layer modeled estimates, resulting in Mesonet “adjusted” estimates that were assimilated into VIC. This approach removed the systematic biases between the VIC modeled predictions and the Mesonet observations.
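The form of the regression model is not specified in the text. As an illustration only, the sketch below assumes a simple linear least-squares fit that maps Mesonet soil moisture onto the VIC top-layer values. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_adjustment(mesonet, vic):
    """Fit vic ~ slope * mesonet + intercept by ordinary least squares."""
    slope, intercept = np.polyfit(mesonet, vic, deg=1)
    return slope, intercept

def adjust(mesonet, slope, intercept):
    """Rescale Mesonet soil moisture to the VIC climatology before assimilation."""
    return slope * np.asarray(mesonet) + intercept

# Synthetic, noise-free example: VIC is consistently wetter than the Mesonet values.
mesonet = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
vic = 1.2 * mesonet + 0.05

slope, intercept = fit_adjustment(mesonet, vic)
adjusted = adjust(0.20, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(float(adjusted), 3))
```

An adjustment of this kind removes the systematic offset but also maps the observed dynamic range onto the modeled one, so the assimilation corrects timing and anomalies rather than the mean level.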

In summary, the assimilation experiment consisted of a 20-member VIC-based ensemble that started on 1 October 1997 and lasted until 30 September 1999 at a daily time step. Every day, at 0000 local time, the Kalman filter updated the state equation with the constrained Kalman filter run sequentially using available observations. To compare the various estimates of the domain-averaged water budgets, the following monthly estimates are compared: (i) the budget based solely on observations, (ii) the budget based solely on VIC model simulations without any assimilation, (iii) the budget from the VIC model simulations with the assimilation of upper two layers of soil moisture, gridded evapotranspiration, and gridded runoff observations within a traditional EnKF, and (iv) the budget from VIC simulation, with the above assimilation and applying the CEnKF.

### c. Results and discussions

Analyses of monthly water budgets (change in soil moisture balanced by precipitation, evapotranspiration, and runoff, plus the corresponding residual/imbalance of nonclosure) from the four experiments (observation, VIC, VIC–EnKF, VIC–CEnKF) are plotted in Figs. 3–6. Ensemble means were used for the assimilation experiments. Each figure contains three panels: the top panel gives precipitation (line), evapotranspiration (hatched bar), and runoff (solid bar); the middle panel gives the monthly change in soil moisture; and the bottom panel gives the residual (imbalance) in the water budget.

In the observationally based water budget (Fig. 3), the residuals are up to 47.2 mm month^{−1}, comparable in magnitude to many other terms in the budget and often exceeding the discharge or the change in soil moisture. Cumulatively over the 2 yr, the imbalance is small, −6.614 mm, showing that the water budget is closed statistically over the period. The imbalance time series appears to be positively correlated with the storage change (not shown), which implies that the observed soil moisture dynamics are biased low.
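The distinction between large monthly residuals and a small cumulative imbalance can be made concrete with a short calculation. The sketch below assumes the sign convention residual = P − ET − R − ΔS (the paper's own convention is set in an earlier section and may differ in sign), and every number in it is hypothetical.

```python
def budget_residual(precip, evap, runoff, storage_change):
    """Monthly nonclosure (mm), assuming residual = P - ET - R - dS."""
    return [p - e - r - ds
            for p, e, r, ds in zip(precip, evap, runoff, storage_change)]


# Hypothetical monthly totals in mm (not the study's data).
precip = [80, 40, 120]
evap = [50, 60, 70]
runoff = [10, 5, 20]
storage_change = [0, -10, 34]

residuals = budget_residual(precip, evap, runoff, storage_change)
cumulative = sum(residuals)                      # imbalance over the period
largest = max(abs(r) for r in residuals)         # worst single month
```

In this made-up series the individual monthly residuals are an order of magnitude larger than their sum, which mirrors the behavior described for the observation-based budget: closure in a statistical sense over the period does not imply closure month by month.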

The VIC model-based water budget (Fig. 4) has closure by construction. The model reproduces well the monthly time series of observed runoff and evapotranspiration. The soil moisture has a larger seasonal range than the Mesonet observations, but it is unclear which is more reasonable or more accurate.

The traditional (unconstrained) EnKF assimilation-based water budget (Fig. 5) is very similar to the VIC simulations except for the runoff. Overall, the assimilation makes some, but not a large, difference. This is because the VIC model is not seriously in error and because the transformation between the VIC and Mesonet soil moisture helps reduce the difference between the modeled and observed dynamic range in soil moisture. For some months (especially October of 1998), the soil moisture dynamics are reduced compared to the VIC-estimated values. The imbalances in the ensemble Kalman filtered water budget are large, reaching a maximum of −76.4 mm month^{−1} in March 1998. Cumulatively, the imbalance is −41.4 mm over the 2 yr, larger than from the estimates based on observations alone.

In the CEnKF experiment (Fig. 6), the soil moisture dynamics are slightly larger than in the unconstrained assimilation. The water balance is maintained by adjusting all the variables, including the precipitation forcing. Overall, the differences between the CEnKF and the EnKF are not large except that the imbalance is removed. Such subtle improvements are viewed as a positive result, and are credited to the good underlying performance of the land surface model. In areas that are more poorly gauged, or with models that are poorly calibrated, the differences among the observation-based, model-based, and CEnKF estimates may be more pronounced.

Additional details can be seen in the component-by-component comparisons of the monthly ensemble means of the filtered values, shown in Fig. 7. The top panel shows adjustments by the CEnKF (thick dashed lines) to the precipitation forcing. The adjustment to precipitation (a feedback effect from soil moisture and evapotranspiration measurements) turns out to be small, but the filtered precipitation is still slightly reduced because of the lower observed soil moisture dynamics. For evaporation (second panel), all the estimates are very similar except over the 3 dry months in the late summer of 1998. During these months, the assimilated soil moisture observations added some water to the soil column and enhanced the evapotranspiration. For runoff (third panel), some large discrepancies among the estimates are seen. For the climate of the southern Great Plains, the runoff ratio is low (especially in summer), so runoff is a relatively small part of the water budget. Thus, adjustments to runoff have a large relative impact across the various estimates (observation only, modeling only, EnKF, or CEnKF). The EnKF estimate produces much lower runoff values for the wet months because of a reduction in deep soil moisture resulting from the assimilation of the Mesonet soil moisture (bottom panel). The CEnKF, in constraining the closure of the water budget, tends to increase the runoff, which alleviates much of the negative impact from the low soil moisture in the EnKF. This suggests that constraining the water budget balances the errors across the terms in the budget equation, thus providing better estimates.

In the panels for soil moisture in Fig. 7 (fourth and fifth), one notable feature is that the EnKF time series almost always lies between the observations and the VIC simulations, and that the CEnKF series almost always lies between the EnKF and VIC. Thus, the Kalman filter blends the model and observations and finds a balance between the two according to their relative errors, so that the filtered estimates lie between the two. Because the VIC model has closure (by construction) while the observations do not, the constrained filter strikes a balance between the model estimates and the EnKF estimates.
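The "lying between" behavior follows directly from the form of the Kalman update. A minimal scalar sketch, assuming a single directly observed state with Gaussian errors (the function name and the numbers are illustrative, not taken from the experiments):

```python
def scalar_update(forecast, forecast_var, obs, obs_var):
    """Scalar Kalman update for a directly observed state.

    The gain lies in [0, 1], so the analysis is a weighted average of the
    forecast and the observation, weighted by their relative error variances.
    """
    gain = forecast_var / (forecast_var + obs_var)
    analysis = forecast + gain * (obs - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


# Hypothetical soil moisture (mm): model forecast 100, observation 90,
# with equal error variances of 4 mm^2.
analysis, analysis_var = scalar_update(100.0, 4.0, 90.0, 4.0)
```

With equal variances the analysis falls halfway between forecast and observation; a smaller forecast variance (a more trusted model) pulls the analysis toward the model, which is consistent with the modest impact of assimilation reported above.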

Data assimilation itself also provides a means of error trade-off and reduction within and between the different variables being simulated and observed, reflecting our uncertainties in the target quantities. Such errors (uncertainties) are prescribed for observed and forcing variables, and are represented by the ensemble spread for states before and after the filtering (prior and posterior errors). Table 1 lists the error values, both those prescribed and those calculated from the ensemble spread as a root-mean-squared measure. Note that these values are averages over the whole experiment period and vary significantly between wet and dry days. From the results, we can see that the model simulations tend to have much higher confidence than the observations (lower prior uncertainties than observation uncertainties) and that filtering does reduce the uncertainties. Because of the assimilation of the “perfect observation” (that is, the balance constraint), the constrained assimilation experiment yields slightly lower posterior uncertainties than the unconstrained case for all variables except the bottom-layer soil moisture. The uncertainty in the bottom-layer (∼1 m thick) soil moisture increases in the constrained experiment because this layer accommodates a large portion of the imbalance errors, owing to its relatively large capacity and error range, as already explained at the end of section 3. Another reason is that no bottom-layer observation is available to reduce its uncertainty.
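One plausible way to compute a root-mean-squared ensemble spread of the kind reported in Table 1 is sketched below. The exact averaging used for the table is not spelled out in the text, so pooling squared deviations over all time steps and members is an assumption, and the function name and data layout are hypothetical.

```python
import math


def rms_spread(ensemble_series):
    """RMS deviation of members from the ensemble mean, pooled over time.

    ensemble_series[t][k] is the value of member k at time step t.
    """
    total = 0.0
    count = 0
    for members in ensemble_series:
        mean = sum(members) / len(members)
        total += sum((m - mean) ** 2 for m in members)
        count += len(members)
    return math.sqrt(total / count)


# Hypothetical two-member ensemble over two days (mm).
prior = [[1.0, 3.0], [2.0, 2.0]]
spread = rms_spread(prior)
```

Applying the same function to the ensemble before and after each update gives prior and posterior spreads that can be compared variable by variable.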

An “optimal” filter should extract all the information contained in the observations used to update the state equation. This results in an innovation time series that is uncorrelated in time, where the innovations are defined as the difference between the model-predicted observations and the observations. In theory, this condition holds for Kalman filters applied to linear, Gaussian systems with state and observation errors that are uncorrelated, and where the error covariances for the state and measurement equations are correctly specified. A filter with autocorrelated innovations suggests that an autoregressive model can be formulated that can make further use of information in the observations. In EnKF formulations that are highly nonlinear and non-Gaussian, it is unclear whether the optimality condition of uncorrelated innovations can be assured. As suggested in Crow and Bindlish (2004), criteria used in standard Kalman filtering may sometimes be misleading in hydrological data assimilation problems. Efforts were made to estimate the state and measurement error covariances so as to obtain an uncorrelated innovation time series, but some residual level of correlation exists at short lag times (1–2 days), as can be seen in Fig. 8. Such residual correlation can be found in the constrained assimilation but not in the unconstrained case. This is due in part to the correlation in imbalance errors before the constraint was applied (Fig. 5), which violates the “uncorrelated system errors” assumption of the Kalman filter. Figure 8 shows the autocorrelation function (ACF) of soil moisture innovations in the EnKF and CEnKF experiments for one grid cell (latitude 34.75°, longitude −97.75°). Further work is underway to better assess and characterize the spatial and temporal error characteristics in sensors, such as the Mesonet soil moisture sensors, and their effects on estimates from filtering.
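The whiteness check on the innovations amounts to computing a sample autocorrelation function. The following is a minimal sketch using the standard biased estimator; the function name and the test series are illustrative only and are not the innovations from the experiments.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / c0
        for lag in range(max_lag + 1)
    ]


# Hypothetical innovation series that alternates in sign every day.
innovations = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
acf = autocorrelation(innovations, max_lag=2)
```

For a well-tuned filter the values at nonzero lags should be close to zero; a common rule of thumb treats values within about ±1.96/√n as indistinguishable from white noise for a series of length n.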

## 5. Summary and final comments

This study introduces an EnKF-based data assimilation procedure that incorporates equality constraints and is referred to here as the constrained ensemble Kalman filter (CEnKF). The assimilation procedure is implemented to assure closure of the water balance over a region when hydrological data are assimilated into model-based estimates. The application of the constraint-based filter results in updated soil moisture states and also, if uncertain, flux estimates and forcings. The procedure was applied to a domain in the southern Great Plains region of the United States of approximately 75 000 km^{2}. The CEnKF-based water budget estimate is compared to three other estimates: an observation-based water budget, a VIC land surface model estimate, and an ensemble Kalman filtering (EnKF) data assimilation estimate in which data (upper two layers of soil moisture, gridded evapotranspiration, and runoff) are assimilated.

The observation-based estimates closed the water budget statistically over the 2-yr study period, but had relatively large monthly imbalances. The VIC-based model estimates provided closure but had inconsistencies when compared to observations. Estimates based on the EnKF blend the model estimates with observations but resulted in relatively large imbalances at the monthly scale and over the 2-yr period. Estimates based on the CEnKF retain the benefits of the EnKF in that observations and model estimates are merged, with the added assurance that the filtered estimates maintain water balance closure.

The CEnKF effectively redistributes the EnKF nonclosure term (the imbalance term) across the terms in the water balance according to the error levels associated with these terms and their correlation to the nonclosure term. Thus, components with larger errors are adjusted more. From the viewpoint of adding information, imposing the water balance constraint adds *new* information to the assimilation system beyond the available observations, namely the balance of water, or zero residuals. Compared to unconstrained data assimilation, the constrained assimilation incorporates more knowledge about the hydrological system. The experiments performed over the Oklahoma region illustrated the power of the CEnKF to remove imbalances and adjust the trade-offs among variables. In addition, the water balance constraint reinforces the coupling between the moisture states and the forcing inputs. As we see from these results, the soil moisture observations can feed back to the precipitation inputs. This feedback implies that there is a potential to use microwave measurements (a signature of soil moisture) to improve precipitation estimation. Crow (2003) used L-band microwave brightness temperatures to adjust for precipitation errors due to temporally sparse rainfall information. Updating precipitation using observations from other water balance components holds great promise for the synergistic use of in situ and remotely sensed data, but probably needs to be carried out within a CEnKF framework, using the water balance closure constraint to contain the errors.
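The redistribution of the nonclosure term can be sketched by treating the balance equation as a perfect (zero-error) observation of the budget vector. The code below is an illustration under stated assumptions, not the implementation used in the study: the budget vector is taken as [P, ET, R, ΔS] with the constraint P − ET − R − ΔS = 0, the update is applied to a single vector rather than to each ensemble member, and the covariance values are hypothetical.

```python
def apply_closure(state, cov, g):
    """Project `state` onto the constraint g . x = 0.

    Equivalent to a Kalman update with observation operator g and zero
    observation error: x_c = x - P g^T (g P g^T)^{-1} (g . x).
    """
    n = len(state)
    residual = sum(gi * xi for gi, xi in zip(g, state))       # nonclosure
    pg = [sum(cov[i][j] * g[j] for j in range(n)) for i in range(n)]
    denom = sum(gi * pgi for gi, pgi in zip(g, pg))           # g P g^T
    return [xi - pgi * residual / denom for xi, pgi in zip(state, pg)]


# Hypothetical monthly budget (mm): P, ET, R, dS, with a 10-mm imbalance.
state = [100.0, 60.0, 20.0, 10.0]
g = [1.0, -1.0, -1.0, -1.0]
# Assumed diagonal error covariance (mm^2); larger values mean less trust.
cov = [[4.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 4.0]]

closed = apply_closure(state, cov, g)
```

In this example the precipitation and storage change, which carry the larger assumed variances, each absorb four times as much of the imbalance as evapotranspiration and runoff, and the adjusted budget closes exactly. With off-diagonal covariance terms the same formula also accounts for correlations between each component and the nonclosure term.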

It is our expectation that the CEnKF, with the constraints run as a sequential filter, holds great promise for other, related problems including the use of constraints on the surface energy balance, or constraints describing physical relationships between variables that may not be appropriate as state equations.

## Acknowledgments

This research was supported through NASA Research Grant NAG5-11610 and NOAA Grant NA03O3AR4310001. This support is gratefully acknowledged.

## REFERENCES

Abdulla, F. A., Lettenmaier D. P., Wood E. F., and Smith J. A., 1996: Application of a macroscale hydrologic model to estimate the water balance of the Arkansas–Red River basin. *J. Geophys. Res.*, **101**, D3, 7449–7459.

Anderson, B. D. O., and Moore J. B., 1979: *Optimal Filtering*. Vol. 1, Prentice-Hall, 357 pp.

Betts, A. K., and Ball J. H., 2003: Evaluation of the ERA-40 surface water budget and surface temperature for the Mackenzie River basin. *J. Hydrometeor.*, **4**, 1194–1211.

Burgers, G., van Leeuwen P. J., and Evensen G., 1998: Analysis scheme in the ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 1719–1724.

Crow, W. T., 2003: Correcting land surface model predictions for the impact of temporally sparse rainfall rate measurements using an ensemble Kalman filter and surface brightness temperature observations. *J. Hydrometeor.*, **4**, 960–973.

Crow, W. T., and Wood E. F., 2003: The assimilation of remotely sensed soil brightness temperature imagery into a land-surface model using ensemble Kalman filtering: A case study based on ESTAR measurements during SGP97. *Adv. Water Resour.*, **26**, 137–149.

Crow, W. T., and Bindlish R., 2004: The impact of incorrect model error assumptions on the assimilation of remotely sensed surface soil moisture. *Proc. Second Int. CAHMDA Workshop on the Terrestrial Water Cycle: Modelling and Data Assimilation across Catchment Scales*, Princeton, NJ, Princeton University, 145–148.

Dunne, S., and Entekhabi D., 2005: Estimation of soil moisture fields using multi-scale observations in an ensemble smoother-based four-dimensional land data assimilation system. Preprints, *19th Conf. on Hydrology*, San Diego, CA, Amer. Meteor. Soc., CD-ROM, 1.10.

European Centre for Medium-Range Weather Forecasts, cited 2002: The ERA-40 Archive. [Available online at http://www.ecmwf.int/research/era/index.html.]

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasigeostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99**, C5, 10143–10162.

Geeter, J. D., Brussel H. V., and Schutter J. D., 1997: A smoothly constrained Kalman filter. *IEEE Trans. Pattern Anal. Mach. Intell.*, **19**, 1171–1177.

Gelb, A., 1974: *Applied Optimal Estimation*. MIT Press, 374 pp.

Ho, Y. C., and Lee R. C. K., 1964: A Bayesian approach to problems in stochastic estimation and control. *IEEE Trans. Autom. Control*, **9**, 333–339.

Jackson, T. J., Hsu A., and O’Neill P., 2002: Surface soil moisture retrieval and mapping using high frequency microwave satellite observation in the southern Great Plains. *J. Hydrometeor.*, **3**, 688–699.

Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. *J. Basic Eng.*, **82**, 35–45.

Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471.

Kitanidis, P. K., and Bras R. L., 1980: Real-time forecasting with a conceptual hydrologic model: 1. Analysis of uncertainty. *Water Resour. Res.*, **16**, 1025–1033.

Liang, X., Lettenmaier D. P., Wood E. F., and Burges S. J., 1994: A simple hydrologically based model of land surface water and energy fluxes for GCMs. *J. Geophys. Res.*, **99**, D7, 14415–14428.

Liang, X., Lettenmaier D. P., and Wood E. F., 1996: One-dimensional statistical dynamic representation of subgrid spatial variability of precipitation in the two-layer variable infiltration capacity model. *J. Geophys. Res.*, **101**, D16, 21403–21422.

Liang, X., Wood E. F., and Lettenmaier D. P., 1999: Modeling ground heat flux in land surface parameterization schemes. *J. Geophys. Res.*, **104**, D8, 9581–9600.

Maurer, E. P., O’Donnell G. M., Lettenmaier D. P., and Roads J. O., 2001: Evaluation of the land surface water budget in NCEP/NCAR and NCEP/DOE reanalyses using an off-line hydrologic model. *J. Geophys. Res.*, **106**, D16, 17841–17862.

Maurer, E. P., Wood A., Lettenmaier D. P., and Nijssen B., 2002: A long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States. *J. Climate*, **15**, 3237–3251.

Maybeck, P. S., 1979: *Stochastic Models, Estimation and Control*. Vol. 1, Academic Press, 289 pp.

McLaughlin, D. B., 2002: An integrated approach to hydrologic data assimilation: Interpolation, smoothing, and filtering. *Adv. Water Resour.*, **25**, 1275–1286.

McLaughlin, D. B., and Wood E. F., 1988: A distributed parameter approach for evaluating the accuracy of groundwater model predictions: 1. Theory. *Water Resour. Res.*, **24**, 1037–1047.

Nijssen, B., O’Donnell G. M., Lettenmaier D. P., Lohmann D., and Wood E. F., 2001: Predicting the discharge of global rivers. *J. Climate*, **14**, 3307–3323.

Pan, M., 2004: Generation of ensembles of spatial hydrologic fields. *Eos, Trans. Amer. Geophys. Union*, **85** (Fall Meeting Suppl.), Abstract H23C-1142.

Peters-Lidard, C. D., Blackburn E., Liang X., and Wood E. F., 1998: The effect of soil thermal conductivity parameterization on surface energy fluxes and temperatures. *J. Atmos. Sci.*, **55**, 1209–1224.

Reichle, R. H., McLaughlin D. B., and Entekhabi D., 2001: Variational data assimilation of microwave radiobrightness observations for land surface hydrology applications. *IEEE Trans. Geosci. Remote Sens.*, **39**, 1708–1718.

Reichle, R. H., McLaughlin D. B., and Entekhabi D., 2002: Hydrologic data assimilation with the ensemble Kalman filter. *Mon. Wea. Rev.*, **130**, 103–114.

Roads, J., and Coauthors, 2003: GCIP water and energy budget synthesis (WEBS). *J. Geophys. Res.*, **108**, 8609, doi:10.1029/2002JD002583.

Rodriguez-Iturbe, I., and Mejia J., 1974: Design of rainfall networks in time and space. *Water Resour. Res.*, **10**, 713–728.

Ropeleski, C. F., and Yarosh E. S., 1998: The observed mean annual cycle of moisture budgets over the central United States (1973–92). *J. Climate*, **11**, 2180–2190.

Simon, D., and Chia T. L., 2002: Kalman filtering with state equality constraints. *IEEE Trans. Aerosp. Electron. Syst.*, **38**, 128–136.

Sorenson, H. W., 1966: Kalman filtering techniques. *Advances in Control Systems: Theory and Applications*, C. T. Leondes, Ed., Vol. 3, Academic Press, 219–292.

Wallis, J., Dennis R., Lettenmaier P., and Wood E. F., 1991: A daily hydroclimatological dataset for the continental United States. *Water Resour. Res.*, **27**, 1657–1664.

Wood, E. F., and Rodriguez-Iturbe I., 1975: A Bayesian approach to analyzing uncertainty among flood frequency models. *Water Resour. Res.*, **11**, 839–843.

Ziegler, A. D., Sheffield J., Maurer E. P., Nijssen B., Wood E. F., and Lettenmaier D. P., 2003: Detection of intensification in global- and continental-scale hydrological cycles: Temporal scale of evaluation. *J. Climate*, **16**, 535–547.

Fig. 2. Monthly averaged soil moisture from the VIC model and from Oklahoma Mesonet measurements.

Citation: Journal of Hydrometeorology 7, 3; 10.1175/JHM495.1

Fig. 3. Water budget analysis based on interpolated in situ observations.

Fig. 4. Water budget analysis from VIC land surface model simulations.

Fig. 5. Water budget analysis from the unconstrained VIC–EnKF assimilation. Assimilated data are grid-averaged Oklahoma Mesonet soil moisture, ARM-CART EBBR evaporation, and USGS runoff.

Fig. 6. Water budget analysis from the CEnKF.

Fig. 7. Comparisons of the water budget components among the assimilation experiments, and with observations.

Fig. 8. Autocorrelation function of innovation time series from the EnKF and the CEnKF for the grid cell at latitude 34.75°, longitude −97.75°.

Table 1. Prescribed uncertainties in observations, and prior and posterior uncertainties in the unconstrained and constrained assimilation experiments, in terms of averaged root-mean-squared spread across ensemble members. Units are mm day^{−1} for fluxes and mm for soil moisture.