## Abstract

Hydrologic data assimilation has become an important tool for improving hydrologic model predictions by using observations from ground, aircraft, and satellite sensors. Among existing data assimilation methods, the ensemble Kalman filter (EnKF) provides a robust framework for optimally updating nonlinear model predictions using observations. In the EnKF, background prediction uncertainty is obtained using a Monte Carlo approach where state variables, parameters, and forcing data for the model are synthetically perturbed to explicitly simulate the error-prone representation of hydrologic processes in the model. However, it is shown here that, owing to the nonlinear nature of these processes, an ensemble of model forecasts perturbed by mean-zero Gaussian noise can produce biased background predictions. This ensemble perturbation bias in soil moisture states can lead to significant mass balance errors and degrade the performance of the EnKF analysis in deeper soil layers. Here, a simple method of bias correction is introduced in which such perturbation bias is corrected using an unperturbed model simulation run in parallel with the EnKF analysis. The proposed bias-correction scheme effectively removes biases in soil moisture and reduces soil water mass balance errors. The performance of the EnKF is improved in deeper layers when the filter is applied with the bias-correction scheme. The interplay of nonlinear hydrologic processes is discussed in the context of perturbation biases, and implications of the bias correction for real-data assimilation cases are presented.

## 1. Introduction

Recent advances in hydrologic data assimilation have demonstrated the value of remotely sensed surface soil moisture for improving the prediction of key hydrologic variables such as root-zone soil moisture (Walker et al. 2001; Reichle et al. 2007; Kumar et al. 2008) and surface runoff (Crow et al. 2005). In hydrologic data assimilation, the Kalman filter provides a statistical framework to optimally update model predictions using observations based on the uncertainties of a model and observations. Among existing Kalman filter variants, the ensemble Kalman filter (EnKF) has proven to be a popular choice for hydrologic applications since it allows for the nonlinear propagation of background error and, thus, exhibits robust performance for nonlinear hydrologic models. There have been numerous EnKF-based studies that assimilated synthetically generated (e.g., Reichle et al. 2002), in situ (e.g., Margulis et al. 2002; De Lannoy et al. 2007), and remotely sensed surface soil moisture retrievals (e.g., Margulis et al. 2002; Crow and Wood 2003; Reichle and Koster 2005; Reichle et al. 2007) into land surface models (LSMs) from local to global scales.

In the EnKF approach, an ensemble of model states is generated by adding noise to state variables, model parameters, and/or forcing variables. The uncertainty of land surface model forecasts is then represented by the spread of these stochastically generated ensemble states. A Gaussian random number with a mean of zero is commonly chosen to perturb the model states and parameters due to the assumption of unbiased state variables in the Kalman filter. The rationale of this ensemble perturbation is to simulate model uncertainty using a Monte Carlo approach without directly affecting the mean performance of the model. This rationale is based on the implicit assumption that mean-zero Gaussian noise added during the EnKF analysis should not cause a systematic bias in model output when the ensemble mean of the perturbed state variables are compared with unperturbed state variables. However, owing to nonlinear processes imbedded in land surface models, it is unavoidable that ensemble perturbation using Gaussian noise will lead to biased model forecasts (De Lannoy et al. 2006). As a consequence, the mere ensembling of the model during implementation of the EnKF can introduce systematic error into its flux and state predictions. Examples of nonlinear model processes potentially responsible for such error include infiltration, the vertical percolation of soil water, and evapotraspiration.

It should be stressed that the perturbation bias explained above will likely constitute only a fraction of the total model versus observation bias encountered in a land surface data assimilation problem. In general, the magnitude of other bias components—originating from measurement instrument features, soil moisture retrieval algorithm, and the bias in the land surface model itself—are difficult to estimate and attribute to a specific source (Reichle 2008). Consequently, they must be indirectly estimated either by rescaling long-term measurements to statistically match model output (Reichle and Koster 2004; Drusch et al. 2005) or by using “bias aware” assimilation methods (Dee 2005; De Lannoy et al. 2007). However, the perturbation-based bias discussed above is unique in that it can be conclusively attributed to known nonlinear properties of the hydrologic model and, therefore, directly calculated during the cycling of a data assimilation system. Consequently, it may be possible to operationally correct for its effects through relatively simple modifications to the EnKF.

To examine this possibility, this analysis will focus on three specific questions:

How significant are biases in ensemble model forecasts arising from nonlinearities in model forecasts?

How do these ensemble biases affect subsequent EnKF updating?

Can these biases be operationally corrected during the implementation of an EnKF?

A simple perturbation experiment, summarized in Fig. 1, is conducted to address the first question and motivate the rest of the analysis. The figure compares soil moisture contents at four model soil layers from two different modeling cases: a single base run without ensemble perturbations (black lines) and an ensemble run of 100 perturbed members with zero-mean Gaussian noise added to soil moisture contents (gray dashed lines). For this perturbation experiment, the Noah land surface model (LSM) is run for two years over a medium-scale basin located in Oklahoma (see Fig. 3). The gray lines in Fig. 1 show averaged soil moisture contents for 100 perturbed Noah ensemble members; each member of the top layer is perturbed by adding zero-mean Gaussian noise with a standard deviation of 0.03 m^{3} m^{−3} day^{−1}. This perturbation noise is scaled by the layer thickness in the subsequent soil layers so that the standard deviation of total water volume applied to each layer is constant. Specifications of this experiment are summarized in columns 2 and 3 of Table 1.

The root-mean-squared error (RMSE) and bias of soil moisture calculated using the base run and the mean of the perturbed ensemble throughout the simulation at each layer are summarized in Table 2. Soil moisture in the top layer does not show substantial RMSE or bias except at the end of some drying periods (see Fig. 1). The noticeable positive biases in the perturbed ensemble run at the drawdown are caused by the bounded nature of soil moisture; that is, when a perturbed soil moisture content falls below the limit of soil water content (the residual water content), it is replaced by the lower limit of the soil moisture. The same truncation process also applies at the wet end of the soil moisture. However, RMSE and bias increase with depth, and the biases in deeper layers persist without recovering after rainfall events. Most notably, the bias in the forth layer persistently increases until it levels off at approximately 0.04 m^{3} m^{−3}. These systematic biases within deeper soil layers are likely the result of nonlinear model physics because, due to the scaling of perturbation noise by layer thickness, very little boundary truncation occurs in lower (and thicker) soil layers. Considering that the basic objective of model perturbation is to explicitly simulate the model prediction error without systematically biasing the model forecasts, the results of this perturbation experiment imply a defect of the Monte Carlo approach for assimilating state observations into a nonlinear model. Moreover, the bias observed in the bottom layer may persist even after data assimilation updating since most remotely sensed observations provide information only along a thin (0–5 cm) surface layer and rely exclusively on information contained within the model ensemble to update deeper soil moisture layers. In addition, because of their thickness and high storage capacity, small volumetric errors in deep model soil layers can have a large impact on overall water balance and storage calculations.

In this work, we analyze the impact of the systematic perturbation biases shown in Fig. 1 on the performance of land data assimilation using the EnKF and introduce a simple method to reduce their effect on EnKF predictions. A series of identical synthetic twin experiments are conducted to quantify the impacts of biases originating from state perturbation and to evaluate our proposed bias-correction scheme. In section 4, a discussion on the nonlinear land surface processes responsible for the biases is provided.

## 2. Ensemble Kalman filter and bias correction

In this section, a brief introduction to the ensemble Kalman filter is presented, and a methodology for correcting the biases caused by ensemble perturbation is described. An in-depth analysis of the processes responsible for these biases will be provided later, in section 4.

### a. The ensemble Kalman filter

The Kalman filter is a sequential Bayesian estimator that optimally updates model predictions with observations based on the relative magnitude of uncertainties existing in the model and observations. These uncertainties are expressed as error covariance in the filter equations. In the ensemble Kalman filter, a Monte Carlo approach is employed to create an ensemble of model predictions from which this error covariance information is calculated explicitly.

Supposing **y*** _{t}* is a vector of state variables at time

*t*, the observation vector

**z**

*can be related to the state variables in the measurement equation as*

_{t}where * ɛ_{t}* represents independently and identically distributed (IID) Gaussian observation error with covariance [𝗖

_{ɛ}]

*and 𝗛 is a measurement operator. In the EnKF, an ensemble of model states*

_{t}**y**

*is created by adding noise to the state variables, model parameters, or forcing data so that the resulting ensemble members can be assumed to be normally distributed random variables. The performance of the filter depends largely on how well and realistically the statistics of the ensemble states represent the actual model prediction error (Crow and van Loon 2006; Reichle et al. 2008). At each measurement time step, the model forecasts comprising*

_{t}*i*th realization within the ensemble,

**y**

_{t}

^{i−}, are updated to

**y**

_{t}

^{i+}by the measurements

**z**

*using an update equation as follows:*

_{t}Here, the Kalman gain matrix 𝗞* _{t}* is given by

where 𝗖* _{M}* is the error covariance matrix of the measurement forecasts 𝗛

**y**

_{t}

^{i−}and 𝗖

*is the cross covariance between the measurement forecasts (𝗛*

_{YM}**y**

_{t}

^{i−}) and the predicted model states (

**y**

_{t}

^{i−}). For each ensemble realization, synthetically generated error

**ɛ**

*is added to the measurement*

_{t}^{i}**z**

*, so that the spread of the updated model states better represent the true error statistics (Burgers et al. 1998). The error statistics of the system represented by 𝗖*

_{t}*and 𝗖*

_{M}*are calculated using the ensemble realizations of the measurement forecasts and predicted model states. Filter-updated model predictions are derived by averaging ensemble diagnostic variables that are calculated from the realizations of updated model states.*

_{YM}### b. Correction of unintended perturbation biases

It is assumed in the Kalman filter system that the error terms in prediction and update steps (i.e., model and observation errors) are serially uncorrelated and mutually independent Gaussian random variables with a mean of zero. Therefore ensemble perturbation in the EnKF is commonly achieved through the application of the additive Gaussian noise to model states, parameters, and forcing data. The subsequent spread of the resulting ensemble represents the uncertainty of the model background forecasts. However, owing to the nonlinear processes involved in the land surface model physics and bounded nature of soil moisture state variables, zero-mean ensemble perturbations often cause biased or skewed error statistics in background forecasts. Although the EnKF is known to be a robust choice for nonlinear hydrologic models (Reichle et al. 2002), the resulting perturbation bias is unintended since the model perturbation to generate an ensemble states theoretically should not degrade the mean performance of the underlying model.

Figure 2a is a schematic of the typical EnKF-based data assimilation system. The green circles denote the ensemble averages of the background prediction and the red circle denotes a measurement. The background ensemble prediction **y**_{t}^{−} is updated to **y**_{t}^{+} using the measurement vector **z*** _{t}* at analysis time step

*t*(shown as a blue block arrow in Fig. 2a). The blue circle in Fig. 2a denotes the ensemble mean of

_{k}**y**

_{t}

^{+}after the update. The ensemble perturbation bias originating from the model nonlinearity accumulates at each background prediction step, and the propagation of this bias-prone ensemble-mean state is shown as blue arrow lines in Fig. 2a. Since the source of this ensemble perturbation bias is perfectly known (i.e., ensemble perturbation noise), the bias can be calculated in a deterministic way by comparing the mean of background ensemble prediction with a single realization without ensemble perturbation noise.

To correct the ensemble perturbation bias, additional set of the state variables **y**_{t}^{−o} is calculated by running a single realization of the model (without ensemble state perturbation) in parallel with the perturbed ensemble simulation. This single unperturbed background prediction **y**_{t}^{−o} is then used as an anchor to correct the state bias caused by the ensemble perturbation. It is assumed that the ensemble spread is created by adding Gaussian noise only to the state variables. The mean bias of the perturbed state variables of *n* ensemble members at the time *t*, *δ _{t}*, is calculated by subtracting the unperturbed background prediction from the ensemble mean of the perturbed states as follows:

and the perturbation bias *δ _{t}* is subtracted from the perturbed ensemble states

**y**

_{t}

^{i−}to obtain the bias-corrected ensemble of the state variables

**ỹ**

_{t}

^{i−}as

Figure 2b illustrates the bias-corrected EnKF scheme. The green circles enclosed by gray lines denote state variables resulting from the unperturbed background prediction processes that are shown as red arrow lines. At every time step, the biases in the perturbed background predictions are corrected using the unperturbed background prediction, as described in the Eqs. (4) and (5) (shown as red block arrows in Fig. 2b). At the analysis—the EnKF update—time step *t _{k}*, when the measurements are available, the bias-corrected ensemble members are updated.

## 3. Land surface model and synthetic experiment

The bias-correction scheme presented in the previous section is applied to the land data assimilation using the Noah land surface model. A set of synthetic assimilation experiments is carried out using a real dataset for a medium-scale basin in Oklahoma for two years, between 2003 and 2004. In this section, a summary of the Noah LSM, a description of input data, and specifications for the assimilation experiments are presented.

### a. The Noah land surface model

The Noah LSM is a one-dimensional (1D) soil–vegetation–atmosphere transfer (SVAT) model that is currently used for the operational Global Forecast System (GFS) and North American mesoscale (NAM) models by the National Centers for Environmental Protection (NCEP). Each 1D column of the Noah domain is composed of four soil layers whose thicknesses, 0.1, 0.3, 0.6, and 1.0 m, increase with depth by default. The default soil layer thicknesses of 0.1, 0.3, 0.6, and 1.0 m from top to bottom are modified to 0.05, 0.15, 0.5, and 1.3 m to match the approximately 5-cm soil measurement depth expected for L-band (∼1.4 GHz) microwave instruments. The Noah LSM employs a Richardson equation to calculate the drainage of soil water and the hydraulic conductivity in unsaturated medium (*K*) is calculated following Campbell (1974),

where *b* is the Campbell water-retention parameter, *K _{s}* the saturated hydraulic conductivity,

*θ*the soil water content, and

*θ*the saturated soil water content. The constants

_{s}*A*and

*B*are determined by the pore-size distribution of the soil. The default values of

*A*= 2 and

*B*= 3 are used. After percolating through the four layers, the soil water leaves the forth layer as subsurface runoff by gravity drainage.

In the Noah LSM, total evapotranspiration is calculated as the sum of the following three processes: direct evaporation of the canopy-intercepted water, bare soil evaporation from the top soil layer, and transpiration by the plant canopy. The seasonally varying green vegetation fraction *σ _{f}* is used to partition the contributions of these three components within a grid cell, and the vegetation density within the vegetated fraction is derived from the leaf area index (LAI). The bare soil evaporation

*E*is nonlinearly related to soil moisture as follows:

where PE is the potential evaporation and *θ _{r}* is a residual water content. The potential evaporation PE is derived from the Penman relationship following Mahrt and Ek (1984);

*n*= 2 is used to simulate rapidly decreasing evaporation with drying (Ek et al. 2003). The transpiration is modeled as a function of a canopy conductance (Jarvis 1976; Chen et al. 1996), which is parameterized using LAI in the Noah LSM. The rooting depth is specified by the number of layers from the top, normally between 2 and 4, depending on the vegetation type.

Infiltration and surface runoff are calculated following the model suggested in Schaake et al. (1996), where the infiltration capacity is a function of the moisture deficit, which is a function of the profile soil moisture of the four soil layers. During each rainfall event, the infiltration capacity exponentially decreases with time. For comprehensive information about the model physics of Noah LSM, refer to Ek et al. (2003).

### b. Experiment site and data description

To evaluate the suggested bias-correction scheme and to test the impact of corrected biases on the EnKF-based data assimilation system, a series of synthetic experiments are performed at a medium-scale (∼5000 km^{2}) basin located east of Oklahoma City, near the center of Oklahoma (see Fig. 3). The basin has the U.S. Geological Survey (USGS) site number 07243500 (daily observations of discharge from its outlet from 1938 to present are available online at http://waterdata.usgs.gov). The dominant land cover is grassland (∼38% by area), deciduous forest (∼38%), and pasture (∼12%), but the eastern end of the basin (∼8%) is developed and covered with a mixture of vegetation and impervious material. The dominating surface soil textures of the basin are sandy loam (∼61% by area), silt loam (∼21%), and loam (∼13%).

The Noah LSM is run on a 0.25° × 0.25° grid with a time step of 30 min from 1 January 2003 to 31 December 2004. The forcing data is derived from the Global Data Assimilation System (GDAS) (Derber et al. 1991). The size of the basin is equivalent to approximately eight 0.25°-resolution grid cells, among which the Noah LSM is run in four grid cells located near the center of the basin. Soil hydraulic parameters for the basin are derived using the empirical functions suggested in Zobler (1986), and the necessary inputs (e.g., percentages of sand, silt, and clay contents) are obtained from the Food and Agriculture Organization (FAO) of the United Nations Digital Soil Map of the World (available online at http://www.fao.org/geonetwork). The vegetation fraction within each grid cell is derived from a 1-km resolution land cover classification data provided by the Global Land Cover Facility of the University of Maryland (available online at http://glcf.umiacs.umd.edu/data/). Temporally changing canopy conductance is calculated using LAI (Chen et al. 1996) collected by the Advanced Very High Resolution Radiometer (AVHRR). It is assumed that plant rooting depth is 2 m in the entire basin, and the root distribution in the four soil layers is uniform. The assumption of uniform rooting depth is made to analyze the interplay of hydrologic processes for generating biases in soil moisture at all depths. Atmospheric forcing and air temperature data are from the GDAS (Derber et al. 1991). To generate initial conditions for our experiments, Noah LSM was spun up for two years from 1 January 2001 to 31 December 2002.

### c. Experimental design

Two sets of synthetic experiments are conducted to investigate the impact of ensemble perturbation and our proposed bias-correction scheme on background soil moisture forecasts and a subsequent EnKF analysis. In the first experiment—referred to as the “perturbation experiment”—an ensemble of 100 members is run with zero-mean Gaussian noise added to the soil moisture content of each member to create randomly distributed ensemble states. Only the soil moisture content is perturbed in the synthetic experiments throughout this paper. The objective of this perturbation experiment is to evaluate systematic biases in state variables caused by the application of perturbation noise. The standard deviation of Gaussian noise added to the top soil layer is 0.03 m^{3} m^{−3} day^{−1}, and the magnitude of the noise is scaled by the layer thickness in the second to forth layers. Since a half-hourly time step is chosen to run the model, the daily perturbation rate of each layer is divided by 48 and added to the soil moisture every half-hourly time step. A Gaussian random number with a standard deviation *σ* is equivalent to the sum of *n* Gaussian random numbers with a standard deviation *σ*/*n*. The ensemble members are run in parallel throughout the simulations in this work. The daily perturbation strength for each layer is summarized in Table 1. The results from the ensemble perturbation are shown in Fig. 1 and Table 2. The bias-correction scheme described in Eqs. (4) and (5) is then applied to the perturbed ensemble soil moisture states to evaluate its effectiveness. However, when the ensemble soil moisture states are close to either wet or dry bound—porosity or residual soil water content—the moisture contents exceeding the bounds are replaced by the maximum or minimum water content. This boundary truncation is applied after the bias-correction process as well as after each perturbation and background prediction step. Owing to the boundary truncation, there may still remain some biases after the bias-correction scheme is applied. The remaining bias can be reduced by applying the bias-correction and boundary truncation processes repeatedly. In this work, the bias-correction and boundary truncations processes are iterated 10 times at every background prediction step.

In the second experiment, the effect of ensemble perturbation bias on the EnKF-based data assimilation is investigated. In addition to the ensemble perturbation error, a zero-mean Gaussian error is added uniformly to ensemble soil moisture contents at each time step as a background prediction error; this is named an open-loop run. In other words, a Gaussian noise of an identical magnitude is added to every ensemble members so as to simulate the background prediction error. In the open-loop simulation, the magnitude of background prediction error is assumed to be 0.05 m^{3} m^{−3} day^{−1} at the top soil layer and the magnitude is scaled by layer thickness in layers 2–4. It is assumed that the background prediction errors for the four layers are perfectly correlated. The perturbed background predictions are then updated once per day using measurements of top-layer soil moisture derived from the unperturbed base run of the Noah LSM. It is assumed that the measurements are collected at 0100 LST, and the measurement error is 0.03 m^{3} m^{−3}. The measurement error is added to daily top-layer soil moisture from the base run to create a synthetic observation dataset. Since serially (or temporally) uncorrelated Gaussian random noises are added to both the background prediction and measurement for artificial degradation, they should be in the same climatology with the base run unless the background prediction is affected by the ensemble perturbation.

The synthetic measurements are assimilated into the open-loop background predictions in the EnKF runs following two scenarios. In the first scenario (EnKF-1), the measurements are assimilated without applying the bias correction scheme. In the second scenario (EnKF-2), the ensemble perturbation bias is corrected at every background prediction step using the proposed method [Eqs. (4) and (5)], and in the EnKF analysis step, the measurements are assimilated into the background predictions after bias in background states has been corrected (see Fig. 2b). In both scenarios, soil moisture contents in layers 2–4 are also updated using the measurements at the top layer and error covariance between the measurements and predicted soil moisture in the corresponding layer. For the EnKF-2 scenario, the additional set of state variables [**y**_{t}^{−o} in the Eq. (4)] introduced to correct the ensemble perturbation bias is also degraded with the background prediction error and is updated by the observations in the EnKF analysis step, like for the other ensemble-perturbed members.

The open-loop, EnKF-1, and EnKF-2 simulations are run with 30-member ensembles. A layer-by-layer ensemble mean of soil moisture averaged over the four grid cells is presented for each case. The error statistics for the perturbation setups are listed in Table 1.

## 4. Results and discussion

The first part of this section describes how individual nonlinear processes (e.g., drainage and evaporation) create systematic biases in the perturbed Noah ensemble. Results from a sensitivity test conducted using the Noah LSM are then analyzed to discuss the integrated nonlinear effect of hydrologic processes participating in the soil water balance. After that, results of the ensemble perturbation and EnKF experiments are presented, and implications of the bias-correction scheme on water flux calculations and reducing soil water mass balance errors are discussed.

### a. Nonlinear hydrologic processes

The strong nonlinear feature of soil water drainage function originates from the hydraulic conductivity, which is a strong nonlinear function of soil moisture under unsaturated conditions. Figure 4a shows an example of hydraulic conductivity from Campbell (1974) for a sandy loam soil as a function of soil moisture. Two sets of volumetric soil moisture inputs are shown in the bottom box of Fig. 4a: the red line is a single input of 0.35 m^{3} m^{−3}, and a blue line is a distributed ensemble of 10 000 soil moisture values with the mean of 0.35 m^{3} m^{−3} and standard deviation of 0.03 m^{3} m^{−3}. The distributed soil moisture input represents an ensemble of soil moisture states perturbed by zero-mean Gaussian noise. Resulting hydraulic conductivities from these inputs are shown in the left box of Fig. 4a. The ensemble of hydraulic conductivities resulting from the distributed inputs (blue line) are positively skewed; thus, the ensemble mean of the distributed hydraulic conductivity (green line) is greater than the hydraulic conductivity resulting from the ensemble mean of the distributed soil moisture states. The positively skewed output distribution resulting from a normally distributed input is a natural consequence of a convex nonlinear function, and the skewness, which is proportional to the bias of the averaged output, is determined by the concavity of the nonlinear function. In the given example of hydraulic conductivity, the bias increases with the mean of the distributed input as the curvature of soil moisture/hydraulic conductivity function increases with soil moisture.

Evaporation in Fig. 4b is another typical example of a nonlinear hydrologic process. The ratio of actual evaporation to potential evaporation is often calculated as a convex function of soil moisture until it reaches a threshold value, beyond which the actual evaporation is assumed to equal potential evaporation (e.g., Davies and Allen 1973; Federer 1979). Within the convex range of the function, the nonlinearity of the soil moisture–evaporation relationship works in the same way as the similarly convex soil moisture/drainage function and produces a positive bias in mean evaporation. However, when some part of the distributed input falls beyond the threshold soil moisture, actual evaporation behaves like a concave function and the resulting distributed actual evaporation develops a strong negative skewness. The magenta lines in Fig. 4b demonstrate this reversed bias of evaporation. Since the ratio of evaporation to potential evaporation is a convex function of soil moisture without a limiting threshold in the Noah LSM (see the blue dashed line in Fig. 4b), the negatively biased evaporation caused by the threshold soil moisture is not observed.

When hydrologic processes work simultaneously during a Noah model simulation, the nonlinear effects of various processes can either counteract or reinforce each other. To obtain insight into their integrated effect within the Noah LSM, a set of sensitivity tests are conducted. The Noah LSM is run for 30 days with selected combinations of hydrologic processes in the same spatial domain as previous synthetic experiments. The sensitivity tests are run from 1 June 2003 and initial soil moisture conditions are set to 0.40 m^{3} m^{−3} for all layers. The perturbed simulation is run with an ensemble of 30 members of state variables, and zero-mean Gaussian noise with a standard deviation of 0.03 m^{3} m^{−3} day^{−1} is added to the top-layer soil moisture. Thus perturbation noise is scaled by soil layer thickness in the layers 2–4. The sensitivity test simulations are run in five modes in which various processes are suppressed within the Noah LSM: 1) only soil water drainage is activated (*D*); 2) soil water drainage and bare soil evaporation are activated (*D* + *E*); 3) soil water drainage and vegetation canopy transpiration are activated (*D* + *T*); 4) drainage, bare soil evaporation, and vegetation canopy transpiration are activated (*D* + *E* + *T*); and 5) drainage, bare soil evaporation, vegetation canopy transpiration, and infiltration are activated, that is, every process is allowed to operate (*D* + *E* + *T* + *I*).

Figure 5 shows layer-by-layer mean biases of soil water contents from the five modes of Noah simulations after 30 days. The mean bias is calculated by subtracting the unperturbed base run output from the results of the perturbed sensitivity test runs. The unperturbed base run here is performed in the same mode with the perturbed sensitivity test runs for each case. When drainage is the only process vertically transporting water (D), negative mean biases are observed in all the layers. As illustrated in Fig. 4a, this negative ensemble mean bias of soil moisture is caused by nonlinear drainage process, which is a convex function of soil moisture. However, when evaporation works in combination with drainage (*D* + *E*), owing to the negatively biased top-layer soil moisture, evaporation decreases for the perturbed ensemble soil moisture; this, in turn, causes enhanced downward model drainage and the excessive accumulation of soil water within deeper layers. Note that the evaporation extracts soil water only from the top soil layer in the Noah LSM. The transpiration function of Noah LSM is a linear function of soil moisture; however, due to the inversely proportional relationship between soil moisture and soil temperature, the actual transpiration is a concave function of soil moisture. That is, perturbing soil moisture with zero-mean Gaussian noise results in negatively biased transpiration. When the transpiration is activated with drainage (*D* + *T*), the effect of this transpiration nonlinearity acts to reduce the size of negative biases in soil moisture present in all soil layers with the size of the offset increasing with depth. When both evaporation and transpiration are activated (*D* + *E* + *T*), the contribution of the concave transpiration function to reduce the negative bias of soil moisture by drainage is diminished since the magnitude of the negative bias is also reduced by evaporation, as shown in the second case (*D* + *E*). Conversely, it can be explained that the effect of strong nonlinearity in the evaporation function, especially its impact on deeper layer soil moisture states, is partly offset by the nonlinearity in the soil moisture/transpiration function. The greater positive bias in the *D* + *E* + *T* case than that in *D* + *E* case is due to the boundary truncation of soil moisture in deeper layers after a long period of drying. When infiltration is activated (*D* + *E* + *T* + *I*), so that the input of rainfall to the top of the soil column is allowed, the impact of the boundary truncation effects on top-layer soil moisture decreases. This leads to a less vertical drainage from this layer and a reduction in the positive biases associated with transpiration from deeper soil layers.

### b. Ensemble perturbation experiment and bias correction

It is shown in Fig. 1 and Table 2 that the ensemble perturbation of soil moisture introduced for the stochastic representation of background prediction error in the EnKF can cause biases in subsurface soil moisture states. RMSE and mean bias (BIAS) are calculated by taking differences between the hourly outputs from perturbed and unperturbed model predictions averaged over the four grid cells used in the simulations. The RMSE and BIAS calculations hereafter will employ the same definitions. The six-monthly (wet winter and dry summer) RMSE and BIAS values listed in Table 2 (columns 3 and 4) indicate that, whereas the moisture content at the top soil layer is negatively biased, the moisture contents in the Noah model layers 2–4 are positively biased in a manner that increases with depth. In the bottom layer, the bias is greater in the dry summer period than in wet winter. The reduced negative bias at the top layer in April–September 2004 is caused by lower-boundary-truncation effects operating during persistent dry periods within the simulated time span.

The last two columns of Table 2 summarize RMSE and BIAS statistics for the mean of the perturbed ensemble soil moisture states after the bias-correction scheme is applied. The bias-correction scheme removes nearly all the RMSE and BIAS existing in the mean of the perturbed soil moisture ensemble. This is a fairly obvious consequence of the proposed bias-correction scheme since the correction scheme, as described in Eqs. (4) and (5), operates in such a way that it removes the difference between the state variables from the base run and the ensemble mean of the perturbed simulations. The ensemble perturbation bias is a perfectly known value—that is, the difference between the mean of the perturbed ensemble simulations and the unperturbed base run—and the bias is removed in a deterministic way in the proposed bias-correction scheme.

### c. Ensemble Kalman filter experiment and bias correction

In the EnKF experiment, the bias-correction scheme is applied in combination with the EnKF to investigate the effect of perturbation bias on the filter’s performance. Figures 6 –8 display time series of layer-by-layer soil moisture contents (gray lines) from open-loop, EnKF-1, and EnKF-2 simulations, respectively. Black lines denote synthetic-true soil moisture contents generated from the base run.

In the case of the open-loop simulation, mean soil moisture contents in layer 1 are slightly lower than synthetic true values during most of the simulation period; however, during some persistent dry periods, for example, approximately 40 days from the 200th day of 2003, the need to truncate values at the dry soil moisture bound creates mean soil moisture contents noticeably higher than the synthetic truth. In layer 2, mean soil moisture contents mostly follow the synthetic true values closely, while during the dry periods they are still higher than true soil moisture contents. The positive biases in layers 1 and 2 during the prolonged dry periods cause further positive biases in deeper layers.

In layers 3 and 4, positive biases in soil moisture persist regardless of whether the time period is generally wet or dry. The boundary truncation scheme in layers 3 and 4 causes a certain volume of mass balance error, which is partly responsible for the positive biases in deeper layers. However, considering the magnitude of the scaled ensemble perturbation and the background prediction error added to deeper layers (see Table 1), there should exist other processes that cause the large biases in layers 3 and 4. The scaled (i.e., reduced) ensemble perturbation and model prediction errors in layers 3 and 4 should create only a small amount of truncation bias during very dry periods. Enhanced drainage from top layers originating from the nonlinear drainage process and noticeably high moisture content during drying periods created by the boundary truncation in the top two layers (e.g., see results around days 200–240 in Figs. 6 –8) are major sources of the positive biases in deeper layers of the open-loop simulation.

It can be deduced from the results in Fig. 6 that, even though the shape of the evaporation function (convex function of soil moisture) implies greater evaporation for the ensemble of perturbed soil moisture contents, when the evaporation and drainage processes occur simultaneously, the resulting evaporation process is weaker for the mean of the perturbed soil moisture. Enhanced downward percolation of soil water within the perturbed ensemble creates a negative bias in the top-layer water content, and this drier top-layer soil moisture results in decreased evaporation from the perturbed ensemble soil moisture. The net effect of the nonlinear processes in layer 1 is an important factor responsible for the persistent positive biases in the deeper layers.

Figures 7 and 8 show results from the EnKF-1 and EnKF-2 data assimilation scenarios. In EnKF-1, measurements of layer 1 soil moisture with 0.03 m^{3} m^{−3} of error are assimilated into background predictions using the EnKF without correcting biases; in EnKF-2, the ensemble perturbation biases are corrected and the measurements are assimilated. In EnKF-1, although overall biases decrease after assimilating the measurements, the positive biases in layers 1 and 2 during the dry period and the significant biases in layers 3 and 4 still persist. The lack of correction in layers 3 and 4 is particularly striking and suggests limitations in the ability of the standard EnKF implementation (EnKF-1) to accurately update deeper layer soil moisture. The deeper-layer biases in EnKF-2, however, are substantially reduced after the application of our bias-correction procedure. The results from the EnKF experiment presented in Table 3 suggest that, even though the combined application of bias correction and the EnKF cannot completely remove biases in soil moisture contents, the bias-corrected EnKF scheme can significantly reduce the impact of unintended perturbation biases on soil moisture calculations, which in turn decreases errors in water and energy flux predictions. The residual biases caused by the error added to degrade the background prediction of ensemble mean soil moisture cannot be removed by the bias-correction scheme introduced in this work. These residual biases of soil moisture in the EnKF-2 should not matter in real data assimilation since they do not require artificial model prediction error to be added to the background prediction steps. Only perturbation errors are added in the real data assimilations. It is worth noting here that, since the bias-correction scheme of this work removes known ensemble perturbation bias, its performance is not affected by the presence of biases from other sources in real data assimilations, such as the observation bias. Such biases can be corrected by using bias-aware filters (Dee 2005) or by matching the climatology of model and observations (Reichle and Koster 2004; Drusch et al. 2005).

Figure 9 shows results from a set of sensitivity tests performed to examine the relationship between the size of model prediction errors and the subsequent impact of the bias-correction scheme. The experiment setup is identical to that of earlier EnKF experiments except that the model prediction error (i.e., standard deviation) now varies incrementally from 0 to 0.05 m^{3} m^{−3} day^{−1}. The benefit of bias correction in layers 1 and 2 is small throughout the entire range of model prediction errors; however, in layers 3 and 4, a significant amount of error is eliminated through the application of our bias-correction scheme (EnKF-2). Largely owing to the degrading effect of perturbation bias, the uncorrected EnKF-1 cases make only a trivial improvement when the measurement error is greater than model prediction error. Moreover, whereas the RMSE of open-loop simulation in layers 1 and 2 levels off as the model prediction error increases, it grows at an increasingly steeper rate in layers 3 and 4 for both the open-loop and EnKF-1 cases. Results in Fig. 9 indicate that, especially for large background prediction errors, the correction of ensemble perturbation biases can significantly improve the ability of the EnKF to accurately constrain deep-layer soil moisture.

It should be noted, however, that the bias-correction scheme may reduce the ensemble spread of soil moisture when the ensemble mean is close to the dry (wet) bound of soil moisture in the course of removing positive (negative) biases caused by the boundary truncation. The reduced ensemble spread in turn leads to the degraded filter performance due to the underestimated model error covariance. Table 4 summarizes the ensemble spread (ensemble standard deviation) of the top-layer soil moisture derived from the ensemble Kalman filter experiment. The ensemble standard deviations stand for the ensemble spread of soil moisture in one of the four grid cells of open-loop, EnKF-1, and EnKF-2 simulations. The ensemble standard deviations are grouped into four bins of the top-layer soil moisture of the unperturbed base run. The ensemble-spread contraction of the EnKF-2 is greatest when the soil moisture content is below 20%vol vol^{−1}. This is responsible for the slightly larger layer 1 RMSE of EnKF-2 than that of EnKF-1 (see Table 3). The last column of the Table 4 indicates that decreasing the number of iterations in the bias-correction scheme can reduced the ensemble-spread contraction. However the single iteration scheme slightly improved the EnKF analysis in the top layer (RMSE = 1.71%vol vol^{−1}) at the expense of greater loss in deeper layer soil moisture and water and energy fluxes predictions (results are not show here).

### d. Ensemble perturbation and nonlinear water fluxes

In this section, the effects of ensemble perturbation on individual and integrated ensemble water flux predictions are discussed. Water fluxes from the ensemble perturbation experiment (refer to sections 3c and 4b) are used in the discussion.

Figure 10 shows water flux errors of a perturbed ensemble simulation before (black lines) and after (gray lines) the correction of perturbation biases. Before the ensemble perturbation bias is corrected, errors in transpiration, surface runoff, and subsurface runoff are positively biased, whereas the bare soil evaporation error is negatively biased. However, after the application of our bias correction scheme, the sign of biases for evaporation and transpiration errors have switched, resulting in a persistently negative transpiration bias and a persistently positive bias in bare-soil evaporation error. The observed bias pattern in water flux error can be explained by nonlinearities imbedded in individual hydrologic processes and their integrated effect on Noah water flux calculations.

When a hydrologic flux process is a convex function of soil moisture, the ensemble mean of the flux calculated using perturbed soil moisture state variables should be larger than a single calculation of the same flux using an unperturbed soil moisture prediction (see Fig. 4a). Conversely, in case of a concave function, the ensemble mean flux should invariably be smaller than that calculated using a single realization of the unperturbed model. In a bias-corrected ensemble simulation, the convex-type structure of nonlinear functions of soil moisture such as evaporation and drainage [see Eqs. (6) and (7)], results in persistently positive biases in evaporation and subsurface runoff (Figs. 10a and 10d). The positive bias in surface runoff error originates from the Noah infiltration parameterization, where the infiltration capacity is a concave function of soil moisture (Schaake et al. 1996). In the Noah LSM, with given potential transpiration calculated following Chen et al. (1996), the actual transpiration is a linear function of soil water content. However, owing to the inversely proportional relationship between soil moisture and soil temperature, the potential transpiration is, in fact, a concave function of soil moisture, which causes the negative biases in ensemble transpiration error.

On the other hand, the preferential signs in water flux biases change for evaporation and transpiration errors in the biased ensemble perturbation simulation (black lines in Figs. 10a and 10b). The reversed sign of evaporation bias is caused by negatively biased top-layer soil moisture due to the convex drainage function (note that, in Noah, evaporation extracts water only from the top soil moisture layer), while the reversed sign of biases in transpiration is caused by a positive bias in deep soil moisture layers. The RMSE and bias of Noah water flux and layer-by-layer soil moisture predictions are presented in Tables 2 and 3.

### e. Ensemble perturbation and mass balance error

In this section, soil water mass balance error caused by the ensemble perturbation is analyzed and the effect of bias correction on this mass balance error is discussed.

When an ensemble of soil moisture states is perturbed by zero-mean Gaussian noise, the ensemble of background prediction processes, in theory, should not violate the mass balance of water when they are averaged. Results from the perturbation experiment presented in Fig. 10 and Table 5 indicate that the strength of water fluxes out of the soil layers (total evapotranspiration and subsurface runoff) increases while the infiltration of rainwater decreases (see the increased surface runoff in Fig. 10c) compared to the base run when the ensemble of soil moisture contents are perturbed. These changes in water fluxes should result in a decrease of soil water content in the ensemble perturbation simulation. However, the large positive biases in the deeper-layer soil moisture after the perturbations indicate an increase in soil water, which implies a violation of water mass balance linked to the ensemble perturbation of soil water content.

The mass balance equation of soil water in the soil column can be expressed as

where *E*, *I*, *R*, and *T* denote evaporation, infiltration, subsurface runoff, and transpiration, respectively, and Δ*S* is the change in soil water content within a certain period of time. The last term, Θ, is introduced here to represent the mass balance error and is assumed to be zero for the base run of the physically based hydrology model. By applying Eq. (8) to the base run and the ensemble perturbation run and rearranging it, the mass balance error Θ* _{p}* can be expressed as

where subscripts *p* and *b* denote the perturbed ensemble simulation and the base run, respectively. Supposing Θ* _{p}* is a mean mass balance error during the simulation period, each parenthesized term in Eq. (9) is equivalent to the mean bias in soil moisture change and water fluxes listed in Table 5. The bias of infiltration (

*I*−

_{p}*I*) is equal to the mean bias of surface runoff with the opposite sign. The cumulative mass balance error over the 2-yr simulation period can be derived by temporally integrating biases in Eq. (9). Since the same initial conditions are used for both the base run and ensemble perturbation run, the integrated Δ

_{b}*S*is calculated by subtracting the soil moisture content of the base run from those of the ensemble perturbed run at the last time step. Using the biases listed in Table 5, cumulative mass balance errors from 1 January 2003 to 31 December 2004 are calculated for biased ensemble perturbation and bias-corrected ensemble perturbation runs; the cumulative mass balance error of soil water is 145.4 mm for the biased ensemble perturbation run and 18.4 mm for bias-corrected ensemble perturbation run. These soil moisture mass-balance errors originate from the boundary truncation processes near both the wet and dry bounds of soil moisture content. Although the mass balance error can be either positive (near the dry bound) or negative (near the wet bound), the cumulative effect of these errors result in generally positive biases because, during the simulation period, soil moisture contents tend to stay near the dry bound of the feasible soil moisture range for longer periods of time.

These results demonstrate the implications of the proposed bias-correction scheme for reducing the mass balance error of soil water caused by the boundary truncation of soil moisture values. In fact, applying Kalman filter analysis to a physically based hydrologic model obviously results in mass balance errors for soil water in the land data assimilation system. This mass balance error originating from Kalman updating can be minimized when model versus observation bias is removed. When ensemble perturbation creates additional mass balance error in soil water, however, the Kalman update may reduce the total mass-balance error by correcting positively biased soil moisture variables. Applying Eq. (9) to the output of the EnKF experiment, 361.4 mm of mass balance error results from the open-loop simulation between 2003 and 2004 (input for this calculation is provided in Table 3). Although the mass balance error decreases to 200.4 mm in EnKF-1 simulation, there still remains a significant amount of mass balance error. However, by conducting the EnKF update in combination with bias correction (EnKF-2), the mass balance error further decreases to 33.6 mm. These results indicate that, although mass balance error of soil water caused by ensemble perturbation of soil moisture can be partially removed by the EnKF, there still remains large amount of the mass balance error, and this error can be reduced using the bias-correction scheme proposed in this work with the EnKF.

## 5. Summary and conclusions

In an EnKF-based data assimilation system, background prediction errors are stochastically simulated by adding mean-zero noise to model states, forcing data, and parameters. The explicit representation of error statistics in the EnKF makes it a robust choice for hydrologic models that contain a number of nonlinear processes. However, it is demonstrated in this work that perturbing the state variables of a nonlinear hydrologic model to create an ensemble of model state forecasts can create systematic biases in LSM soil moisture predictions. For deeper soil moisture layers, this bias persists even after the assimilation of surface observations with the EnKF. The demonstrated effects of ensemble perturbation bias on mass balance and EnKF-based data assimilation results originate from the integrated effect of nonlinear hydrologic processes, such as soil water drainage, evaporation, and transpiration, and from the boundary truncation scheme of soil moisture near the wet (porosity) and dry (residual water content) bounds of soil water content.

To correct these effects, we develop a simple bias-correction scheme capable of correcting perturbation biases during the operational cycling of an EnKF. Specifically, the mean perturbation bias of soil moisture is corrected using a single realization of the unperturbed model run in parallel with the ensemble simulation. The proposed bias-correction scheme reduces mass balance errors within Noah soil moisture states, especially for deeper subsurface soil moisture states, and significantly improves the accuracy of EnKF water flux predictions.

Difficulties in correcting deep Noah soil moisture states via the assimilation of surface soil moisture retrievals have been reported in previous studies (e.g., Kumar et al. 2008). The results from this analysis suggest a possible explanation for these difficulties, and the proposed bias-correction scheme may provide a way to improve the overall performance of the EnKF-based hydrologic data assimilation.

It has been suggested that, due to the differences in climatologies for land surface models and satellite observations of surface soil moisture, space-borne measurements of soil moisture need to be rescaled using land surface model output before their assimilation (Reichle and Koster 2004; Drusch et al. 2005). By rescaling observations using model outputs, systematic differences between the model and soil moisture observations can be eliminated. However, since such rescaling is often performed using soil moisture statistics derived from an unperturbed single realization of the model, ensemble perturbation will still produce biases in ensemble mean soil moisture even after the satellite measurements have been rescaled. This, in turn, will result in a biased distribution of filter innovations. Unbiasedness of the filter innovations is an important criterion, along with its whiteness, for evaluating the performance of the EnKF analysis. Consequently, the bias-correction scheme suggested in this paper also has important implications for improving filter innovation statistics of the EnKF even when the satellite observations whose climatology is made equal to models, are assimilated into land surface models.

Lastly, in the demonstrated synthetic experiments of this work, only soil moisture contents are perturbed to create an ensemble of state variables. However, the other components of the land surface model, such as input parameters and forcing data, can also be perturbed for the same purpose, in which case more complicated feedbacks among the processes in the model can generate systematic biases in a different way. The same rationale for the suggested bias-correction scheme can also be applied to correct biases from more complicated perturbation schemes. However, the impacts of these perturbation-based biases on the EnKF analysis still need to be investigated carefully so that we can better understand and mitigate the unintended effects of the ensemble perturbation.

## Acknowledgments

This work was partially supported by the NASA EOS Aqua Project Science and the NASA Terrestrial Hydrology programs through Grant NNH04AC301.

## REFERENCES

**,**

**,**

**,**(

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**(

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**(

**,**

## Footnotes

*Corresponding author address:* Dongryeol Ryu, Dept. of Civil and Environmental Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia. Email: dryu@unimelb.edu.au

This article included in the Catchment-scale Hydrological Modelling & Data Assimilation (CAHMDA) III special collection.