## 1. Introduction

Most operational weather prediction centers worldwide adopt a variational data assimilation algorithm (Sasaki 1970; Le Dimet and Talagrand 1986; Rabier et al. 2000). The state estimation in the variational assimilation is formulated as an optimal control problem, and aims at determining the trajectory that best fits the observations and accounts for the dynamical constraints given by the law supposed to govern the flow. The accuracy of the variational solution is naturally connected to our knowledge of the error associated with the information sources. Based on the Gaussian hypothesis, such knowledge is expressed via error covariances and correlations. However, while an accurate estimate of the observation error covariance is usually at hand, more difficulties arise for the background and model error covariances.

In recent decades, extensive research has been devoted to improving the estimation of the background error covariance, particularly in the context of sequential Kalman filter (KF)-type algorithms (Kalman 1960; Ghil et al. 1981). Dee (1995) pointed out the difficulty of specifying the model error, given the large size of a typical geophysical problem and the correspondingly enormous information requirement. He proposed a scheme for the online estimation of error covariances that is also suitable for the estimation of model error, especially when the latter is expressed through a reduced number of relevant degrees of freedom.

In the context of ensemble-based schemes [see, e.g., Evensen (1994) for the ensemble Kalman filter (EnKF)], considerable effort has been devoted to the representation of model error through an optimal ensemble design. Among these studies, Hamill and Whitaker (2005) investigated the ability of two methods, covariance inflation and additive random error, to parameterize error due to unresolved scales. Meng and Zhang (2007) analyzed the performance of the EnKF in the context of a mesoscale and regional-scale model affected by significant model error due to physical parameterizations, while in Fujita et al. (2007) the EnKF was used to assimilate surface observations with the ensemble designed also to represent errors in the model physics. Houtekamer et al. (2009) examined several approaches to account for model error in an operational EnKF used in a numerical weather prediction (NWP) context. They found that, among the approaches they considered, a combination of isotropic model-error perturbations and the use of different model versions for different ensemble members gave the best performance. A similar analysis was done by Li et al. (2009) in the context of the local ensemble transform Kalman filter (Hunt et al. 2007). They investigated several methods to account for model error, including model bias and system noise, and concluded that the best performance is obtained when these two approaches are combined.

In variational data assimilation, model error has often been ignored, under the implicit assumption that it has only a minor influence compared to errors in the initial condition and in the observations. More recently, the refinement and growth of the observational network have reversed the problem, suggesting an urgent need for a deeper understanding of the model error dynamics and its treatment in data assimilation. Different solutions have been proposed in recent years to estimate and account for model error in variational assimilation (Derber 1989; Zupanski 1997; Vidard et al. 2004; Trémolet 2006). These studies have shown that treating the model error as part of the estimation problem leads to significant improvements in the accuracy of the state estimate. However, these studies have used crude estimates of the model error covariances (Trémolet 2007) and/or a simple dynamical law for the model error, such as a first-order Markov process (Zupanski 1997). Because of the constraints imposed by the size of the problem, model error has usually been assumed to be uncorrelated in time (see, e.g., Trémolet 2006). In contrast to ensemble-based schemes, where model error covariances are estimated using the ensemble, in variational data assimilation these estimates have to be built on some statistical or dynamical assumptions.

In the last few years, the dynamics of model error has attracted a lot of interest (e.g., Reynolds et al. 1994; Vannitsem and Toth 2002). In particular, a series of works have studied the behavior of deterministic model errors and identified some universal dynamical features (Nicolis 2003, 2004; Nicolis et al. 2009). They were at the origin of a deterministic formulation of the model error term in the extended Kalman filter (EKF; Carrassi et al. 2008). The present investigation takes advantage of the same theoretical framework on the deterministic dynamics of the model error to formulate a new approach for variational assimilation. Specifically, evolution equations for the model error covariances and correlations are derived along with suitable, application-oriented, approximations. These deterministic laws are then incorporated in the formulation of the variational problem.

Here we focus on model error due to incorrect parameterizations, but the approach can also be used in the case of error coming from processes that are not accounted for by the model but are parameterized in terms of the resolved scales (Nicolis 2004). The proposed algorithm is analyzed, in comparison with traditional approaches, in the context of two systems of increasing dynamical complexity: a one-dimensional linear system and the three-component Lorenz model (Lorenz 1963), a nonlinear dynamical system exhibiting chaotic behavior.

The paper is organized as follows. In section 2 the variational assimilation problem is described, while the deterministic formulation for model error dynamics is presented in section 3. Results for both systems are given in section 4 and the final conclusions are drawn in section 5.

## 2. Formulation of the problem

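Let the model be represented by a system of ordinary differential equations:

$$\frac{d\mathbf{x}(t)}{dt} = \mathbf{f}(\mathbf{x}, \boldsymbol{\lambda}), \tag{1}$$

where the *I*-dimensional state vector **x**(*t*) describes a set of relevant physical variables of the system under consideration, and the *I*_{p}-dimensional vector **λ** describes a set of parameters that can be related, for instance, to parameterized physical mechanisms. Alternatively, the solution of the model in (1) can be expressed as **x**(*t*) = ℳ(**x**_{0}, **λ**), with **x**_{0} = **x**(*t*_{0}) as the initial condition. We assume this model is used to describe the dynamics of a real system whose corresponding equations can be written as

$$\frac{d\mathbf{v}(t)}{dt} = \mathbf{f}^{tr}(\mathbf{v}, \boldsymbol{\lambda}^{tr}), \tag{2}$$

where **v**(*t*) is the unknown *I*^{tr}-dimensional truth state and **λ**^{tr} is an *I*_{p}^{tr}-dimensional vector of unknown parameters. Alternatively, the evolution of **v**(*t*) can be written as **v**(*t*) = ℳ^{tr}(**v**_{0}, **λ**^{tr}).

We assume that *M* measurements are collected at the discrete times (*t*_{1}, *t*_{2}, … , *t*_{M}) within the reference time interval *T*. The observations, **y**^{o}_{k}, are related to the model state through the observation operator ℋ and are affected by an error, **ϵ**^{o}_{k}, which is assumed here to be a white noise:

$$\mathbf{y}^{o}_{k} = \mathcal{H}(\mathbf{x}_{k}) + \boldsymbol{\epsilon}^{o}_{k}, \qquad 1 \le k \le M. \tag{3}$$

An a priori estimate, **x**^{b}, of the model initial condition is supposed to be available. This is usually referred to as the background state, and

$$\mathbf{x}_{0} = \mathbf{x}^{b} + \boldsymbol{\epsilon}_{b}, \tag{4}$$

where **ϵ**_{b} represents the background error.

We seek the trajectory that best fits the observations and the background over the reference period *T*. However, besides the observations and the background, the model dynamics itself represents a source of additional information that can be exploited in the state estimate. Nevertheless, since the model is not perfect, it is usually assumed that an additive model error, **ϵ**^{m}(*t*), affects the model prediction in the following form:

$$\mathbf{x}(t) = \mathcal{M}(\mathbf{x}_{0}, \boldsymbol{\lambda}) + \boldsymbol{\epsilon}^{m}(t). \tag{5}$$

Assuming that all errors are Gaussian and mutually uncorrelated, the cost function combining these sources of information reads

$$2J = \int_{0}^{T}\!\!\int_{0}^{T} \left[\boldsymbol{\epsilon}^{m}(t')\right]^{\mathrm{T}} \mathsf{P}^{-1}_{t't''}\, \boldsymbol{\epsilon}^{m}(t'')\, dt'\, dt'' + \sum_{k=1}^{M} \left(\boldsymbol{\epsilon}^{o}_{k}\right)^{\mathrm{T}} \mathsf{R}_{k}^{-1}\, \boldsymbol{\epsilon}^{o}_{k} + \boldsymbol{\epsilon}_{b}^{\mathrm{T}}\, \mathsf{B}^{-1}\, \boldsymbol{\epsilon}_{b}. \tag{6}$$

The weighting matrices 𝗣_{t′t″}, 𝗥_{k}, and 𝗕 have to be regarded as a measure of our confidence in the model, in the observations and in the background field, respectively. In this Gaussian formulation these weights can be chosen to reflect the relevant moments of the corresponding Gaussian error distributions.

The best fit is defined as the solution, **x̂**(*t*), minimizing the cost function *J* over the interval *T*. It is known that, under the aforementioned hypothesis of Gaussian errors, **x̂**(*t*) corresponds to the maximum likelihood solution and *J* can be used to define a multivariate distribution of **x**(*t*) (Jazwinski 1970, his section 5.3). Note that in order to minimize *J* all errors have to be explicitly written as a function of the trajectory **x**(*t*).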

The variational problem defined by (6) is referred to as *weak constraint* given that the model dynamics is affected by errors (Sasaki 1970). An important particular case is the *strong-constraint* variational assimilation in which the model is assumed to be perfect, that is **ϵ**^{m} = 0 (Lewis and Derber 1985; Le Dimet and Talagrand 1986). In this case the model-error-related term disappears and the cost function reads as

$$2J = \sum_{k=1}^{M} \left(\boldsymbol{\epsilon}^{o}_{k}\right)^{\mathrm{T}} \mathsf{R}_{k}^{-1}\, \boldsymbol{\epsilon}^{o}_{k} + \boldsymbol{\epsilon}_{b}^{\mathrm{T}}\, \mathsf{B}^{-1}\, \boldsymbol{\epsilon}_{b}. \tag{7}$$

The calculus of variations can be used to find the extremum of (6) [or (7)] and leads to the corresponding Euler–Lagrange equations (Le Dimet and Talagrand 1986; Bennett 1992, his sections 5.3–5.4). In the strong-constraint case, the requirement that the solution has to follow the dynamics exactly is satisfied by appending to (7) the model equations as a constraint by using a proper Lagrange multiplier field. However, the size and complexity of the typical numerical weather prediction problem is such that the Euler–Lagrange equations cannot be practically solved unless drastic approximations are introduced. When the dynamics is linear and the number of observations is not very large, the Euler–Lagrange equations can be efficiently solved with the method of representers (Bennett 1992, his sections 5.3–5.4). An extension of this approach to nonlinear dynamics has been proposed in Uboldi and Kamachi (2000). Nevertheless, the representers method is far from being applicable to realistic high-dimensional problems such as NWP, and an attractive alternative is represented by the descent methods, which make use of the gradient vector of the cost function in an iterative minimization procedure (Talagrand and Courtier 1987). This latter approach is used in most of the operational NWP centers that employ variational assimilation. Note finally that the Euler–Lagrange equations can also be used as a tool to obtain the cost function gradient. Details on the representers and the descent techniques are provided in section 4 in relation to the applications described therein.

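When the interval *T* is discretized over *N* time steps of size Δ*t* > 0, the weak-constraint cost function becomes

$$2J = \sum_{i=1}^{N}\sum_{j=1}^{N} \left(\boldsymbol{\epsilon}^{m}_{i}\right)^{\mathrm{T}} \mathsf{P}^{-1}_{i,j}\, \boldsymbol{\epsilon}^{m}_{j} + \sum_{k=1}^{M} \left(\boldsymbol{\epsilon}^{o}_{k}\right)^{\mathrm{T}} \mathsf{R}_{k}^{-1}\, \boldsymbol{\epsilon}^{o}_{k} + \boldsymbol{\epsilon}_{b}^{\mathrm{T}}\, \mathsf{B}^{-1}\, \boldsymbol{\epsilon}_{b}, \tag{8}$$

where **ϵ**^{m}_{i} = **ϵ**^{m}(*t*_{i}), the indices *i* and *j* run over the *N* time steps in the interval *T*, and 𝗣_{i,j} represents the model error covariance matrix between times *t*_{i} and *t*_{j}.

Note that in the cost functions in (6) and (8), the model error is allowed to be correlated in time, which leads to the double integral and the double summation, respectively. If it is assumed to be a random uncorrelated noise, only covariances have to be taken into account and the double integral in the first term on the rhs of (6) [the double summation in the first term on the rhs of (8)] reduces to a single integral (a single summation).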

The search for the best-fit trajectory by minimizing the associated cost function requires the specification of the weighting matrices. The estimation of the matrices 𝗣_{t′t″} is particularly difficult in realistic NWP applications because of the large size of the typical models currently in use. Therefore, it turns out to be crucial to define approaches for modeling the matrices 𝗣_{t′t″} in order to reduce the number of parameters needed for their estimation.
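As a minimal sketch of how a discrete weak-constraint cost function of the kind in (8) can be evaluated, the NumPy snippet below assembles the model, observation, and background terms for a toy problem. The function name `weak_constraint_cost`, the array layout, the use of a single observation error covariance for all times, the `2J` normalization, and the choice of storing all the 𝗣_{i,j} as blocks of one large matrix (whose inverse supplies the weights) are illustrative assumptions rather than part of the original formulation.

```python
import numpy as np

def weak_constraint_cost(eps_m, P, eps_o, R, eps_b, B):
    """Evaluate J, with 2J = model term + observation term + background term.

    eps_m : (N, I) array, model error at each of the N time steps
    P     : (N*I, N*I) array, block (i, j) holds the covariance P_{i,j}
    eps_o : (M, I_o) array, observation-minus-state residuals
    R     : (I_o, I_o) observation error covariance (same at all times: assumption)
    eps_b : (I,) array, initial state minus background
    B     : (I, I) background error covariance
    """
    e = eps_m.reshape(-1)                      # stack the N model errors
    model_term = e @ np.linalg.solve(P, e)     # double sum over i and j
    obs_term = sum(d @ np.linalg.solve(R, d) for d in eps_o)
    bkg_term = eps_b @ np.linalg.solve(B, eps_b)
    return 0.5 * (model_term + obs_term + bkg_term)

# Toy check: with a block-diagonal P (model error uncorrelated in time)
# the double sum collapses to a single sum over the time steps.
eps_m = np.array([[1.0, 2.0], [3.0, 4.0]])
J = weak_constraint_cost(eps_m, np.eye(4),
                         np.array([[1.0]]), np.array([[0.25]]),
                         np.array([2.0, 0.0]), 4.0 * np.eye(2))
print(J)
```

The number of independent entries in `P` grows with the square of the number of time steps times the state dimension, which is the practical reason for seeking a low-parameter model of these matrices.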

## 3. Deterministic model error dynamics

The derivation of the statistical weights, 𝗣_{t′t″}, is based on the formalism on model error dynamics introduced in Nicolis (2003) and is an extension of a previous work in which a deterministic model error treatment was incorporated into the extended Kalman filter (Carrassi et al. 2008).

We assume that the model and the true system span the same phase space and differ only in the values of their parameters, so that the true dynamics reads

$$\frac{d\mathbf{v}(t)}{dt} = \mathbf{f}(\mathbf{v}, \boldsymbol{\lambda}^{tr}), \tag{9}$$

where **v**(*t*) is now *I*-dimensional and **λ**^{tr} is an *I*_{p}-dimensional vector. A prediction of the evolution of (9), perceived through the model in (1), will be affected by errors in both the initial condition and the model parameters.

The assumption that the model error comes only from the misspecification of the parameters does not account for other potential model errors, present in a realistic application, such as those due to unresolved scales and/or to unrepresented physical processes. However, the approach described in the sequel can be straightforwardly extended to the case of omission errors expressible in terms of the resolved scales (Nicolis 2004), since they can be brought back to the simpler case of parametric errors (e.g., Nicolis et al. 2009). These types of errors are typical in NWP systems.

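The linearized evolution of the estimation error is obtained by expanding (9) to first order in *δ***x** = **v**(*t*) − **x**(*t*) and *δ***λ** = **λ**^{tr} − **λ**, and reads

$$\frac{d\,\delta\mathbf{x}}{dt} \approx \left.\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right|_{\mathbf{x}} \delta\mathbf{x} + \delta\boldsymbol{\mu}, \tag{10}$$

where *δ***μ** = (∂**f**/∂**λ**)|_{λ} *δ***λ**. The solution of (10), with initial condition *δ***x**(*t*_{0}) = *δ***x**_{0}, reads

$$\delta\mathbf{x}(t) \approx \mathsf{M}_{t,t_0}\, \delta\mathbf{x}_{0} + \delta\mathbf{x}^{m}(t), \tag{11}$$

$$\delta\mathbf{x}^{m}(t) = \int_{t_0}^{t} \mathsf{M}_{t,\tau}\, \delta\boldsymbol{\mu}(\tau)\, d\tau, \tag{12}$$

with 𝗠_{t,t0} being the fundamental matrix (the propagator) relative to the linearized dynamics along the trajectory between *t*_{0} and *t*. Equation (11) states that, in the linear approximation, the error in the state estimate is given by the sum of two terms: one relative to the evolution of initial condition error and another one, *δ***x**^{m}, relative to the model error.

We now make the conjecture that, as long as the errors in the initial condition and in the model parameters are small, (12) can be used to estimate the model error **ϵ**^{m}(*t*) entering the weak-constraint cost functions, and consequently the corresponding correlation matrices 𝗣(*t*′, *t*″). In this case, the model error dependence on the model state implies the dependence of model error correlation on the correlation time scale of the model variables themselves.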

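Taking the expectation of the product of (12) evaluated at two times gives

$$\mathsf{P}(t', t'') = \left\langle \delta\mathbf{x}^{m}(t')\, \delta\mathbf{x}^{m}(t'')^{\mathrm{T}} \right\rangle = \int_{t_0}^{t'}\! d\tau \int_{t_0}^{t''}\! d\tau'\; \mathsf{M}_{t',\tau} \left\langle \delta\boldsymbol{\mu}(\tau)\, \delta\boldsymbol{\mu}(\tau')^{\mathrm{T}} \right\rangle \mathsf{M}^{\mathrm{T}}_{t'',\tau'}, \tag{13}$$

which is the model error correlation between times *t*′ and *t*″. In this form, (13) is of little practical use for any realistic nonlinear system. A suitable expression can be obtained by considering its short-time approximation through a Taylor expansion around (*t*′, *t*″) = (*t*_{0}, *t*_{0}):

$$\mathsf{P}(t', t'') \approx \left\langle \delta\boldsymbol{\mu}_{0}\, \delta\boldsymbol{\mu}_{0}^{\mathrm{T}} \right\rangle (t' - t_0)(t'' - t_0). \tag{14}$$

Equation (14) states that the model error correlation between times *t*′ and *t*″, within the short-time regime, is equal to the model error covariance at the origin, 〈*δ***μ**_{0} *δ***μ**_{0}^{T}〉, multiplied by the product of the two time intervals. Naturally the accuracy of this approximation is connected on the one hand to the length of the reference time period and on the other to the accuracy of the knowledge about the error in the parameters needed to estimate 〈*δ***μ**_{0} *δ***μ**_{0}^{T}〉. Nicolis (2003) has shown that the evolution of the quadratic error is bound to be universally quadratic in the short time for deterministic systems and that the range of validity of this evolution is related to the inverse of the largest Lyapunov exponent of the underlying dynamics.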
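The short-time law described above (covariance at the origin multiplied by the product of the two elapsed times) can be sketched in a few lines of NumPy. The function name `short_time_covariance`, the argument names, and the layout of the result as a single block matrix are illustrative assumptions; `Q0` stands for the matrix 〈*δ***μ**_{0} *δ***μ**_{0}^{T}〉, which is taken here as given.

```python
import numpy as np

def short_time_covariance(Q0, times, t0=0.0):
    """Assemble the short-time model error correlations for a set of times.

    Block (i, j) of the returned matrix is Q0 * (t_i - t0) * (t_j - t0).
    """
    dt = np.asarray(times, dtype=float) - t0   # elapsed times t_i - t0
    # Kronecker product: each entry of outer(dt, dt) scales a copy of Q0.
    return np.kron(np.outer(dt, dt), Q0)

Q0 = np.array([[2.0, 0.0],
               [0.0, 1.0]])                    # toy covariance at the origin
P = short_time_covariance(Q0, times=[0.1, 0.2])
print(P.shape)                                 # (4, 4)
print(np.linalg.matrix_rank(P))                # rank is limited by that of Q0
```

Because the time dependence enters only through the outer product of the elapsed times, the assembled matrix has rank at most that of `Q0`. A sketch that needs its inverse would therefore have to rely on a pseudo-inverse or some form of regularization; that choice lies outside what the text specifies.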

## 4. Variational assimilation using short-time approximation for the model error

In this section we propose to use the short-time law in (14) as an estimate of the model error correlations in the variational assimilation. Besides the fact of being a short-time approximation, this evolution equation is based on the hypothesis of linear error dynamics. To highlight advantages and drawbacks of its application, we explicitly compare a weak-constraint variational assimilation employing this short-time approximation with other formulations.

The analysis is carried out in the context of two systems of increasing complexity. We first deal with a very simple example of scalar dynamics, which can be integrated analytically. The variational problem is solved with the technique of representers. The simplicity of the dynamics allows us to explicitly solve (13) and use it to estimate the model error correlations. This “full weak constraint” formulation of the four-dimensional variational data assimilation (4DVar) is evaluated and compared with the one employing the short-time approximation in (14). In addition, a comparison is made with the widely used strong-constraint 4DVar in which the model is considered as perfect.

In the last part of the section we extend the analysis to an idealized nonlinear chaotic system. In this case the minimization is made by using an iterative descent method, which makes use of the cost function gradient. In this nonlinear context the short-time approximated weak-constraint 4DVar is compared to the strong-constraint and to a weak-constraint 4DVar in which model error is treated as a random uncorrelated noise as it is often assumed in realistic applications.

### a. Linear system: Solution with the representers

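We consider the scalar dynamics

$$\frac{dx}{dt} = \lambda^{tr} x, \tag{15}$$

with *λ*^{tr} > 0, as our reference.

We assume that *M* noisy observations of the state variable are available at the discrete times *t*_{k} ∈ [0, *T*], 1 ≤ *k* ≤ *M*:

$$y^{o}_{k} = x(t_k) + \epsilon^{o}_{k},$$

with *ϵ*_{k}^{o} being an additive random noise with variance *σ*_{o}^{2}(*t*_{k}) = *σ*_{o}^{2}, 1 ≤ *k* ≤ *M*, and that a background estimate, *x*_{b}, of the initial condition, *x*_{0}, is at our disposal:

$$x_{0} = x_{b} + \epsilon_{b},$$

with *ϵ*_{b} being the background error with variance *σ*_{b}^{2}. We assume the model is given by

$$\frac{dx}{dt} = \lambda x,$$

and the weak-constraint cost function reads

$$2J = \int_{0}^{T}\!\!\int_{0}^{T} \left[x(t') - x_{0}e^{\lambda t'}\right] p^{-2}(t', t'') \left[x(t'') - x_{0}e^{\lambda t''}\right] dt'\, dt'' + \sum_{k=1}^{M} \frac{\left[y^{o}_{k} - x(t_k)\right]^{2}}{\sigma_{o}^{2}} + \frac{(x_{0} - x_{b})^{2}}{\sigma_{b}^{2}}, \tag{16}$$

where *p*^{−2}(*t*′, *t*″) denotes the inverse of the model error correlation *p*^{2}(*t*′, *t*″) over the reference period *T*. In (16) we have used the fact that the model error bias, *δx*^{m}(*t*), is given by *x*(*t*) − *x*_{0}*e*^{λt} assuming the model and the control trajectory, *x*(*t*), are started from the same initial condition *x*_{0}. Note that *x*_{0} is itself part of the estimation problem through the background term in the cost function.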

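The solution of the variational problem can be written as

$$x(t) = x^{f}(t) + \sum_{k=1}^{M} \beta_{k}\, r_{k}(t), \tag{17}$$

where *x*^{f}(*t*) is the first-guess trajectory, that is, the model solution started from the background. The *M* functions, *r*_{k}(*t*), are the representers while the coefficients, *β*_{k}, are given by

$$\boldsymbol{\beta} = \left(\mathsf{S} + \sigma_{o}^{2}\,\mathsf{I}\right)^{-1} \mathbf{d}, \tag{18}$$

where **d** is the innovation vector, **d** = (*y*_{1}^{o} − *x*_{1}^{f}, … , *y*_{M}^{o} − *x*_{M}^{f}), 𝗦 is the *M* × *M* matrix (*S*)_{i,j} = *r*_{i}(*t*_{j}), and 𝗜 is the *M* × *M* identity matrix. The coefficients are then inserted into (17) to obtain the final solution (see appendix B).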

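The representers depend on the model error correlations, *p*^{2}(*t*′, *t*″); the particular choice adopted characterizes the formulations we aim to compare. Our first choice consists in evaluating the model error correlations through (13). By inserting *δμ* = (∂*f*/∂*λ*)*δλ*, with *f*(*x*) = *λx*, and the fundamental matrix, 𝗠_{t,t0} = *e*^{λ(t−t0)}, associated with the dynamics (15), we get (with *t*_{0} = 0)

$$p^{2}(t', t'') = \langle x_{0}\,\delta\lambda \rangle^{2}\; t'\, t''\, e^{\lambda (t' + t'')}. \tag{19}$$

The *M* representer functions in this case are

$$r_{k}(t) = e^{\lambda (t + t_{k})} \left[\sigma_{b}^{2} + \langle x_{0}\,\delta\lambda \rangle^{2}\; t\, t_{k}\right]. \tag{20}$$

The coefficients *β*_{k} follow from (18) and determine the solution, *x*(*t*), which is finally obtained through (17). This solution is hereafter referred to as the full weak constraint.

Our second choice is the short-time approximation. By inserting *δμ* = (∂*f*/∂*λ*)*δλ* into (14), we obtain

$$p^{2}(t', t'') \approx \langle x_{0}\,\delta\lambda \rangle^{2}\; t'\, t'', \tag{21}$$

and the corresponding representers,

$$r_{k}(t) = \sigma_{b}^{2}\, e^{\lambda (t + t_{k})} + \langle x_{0}\,\delta\lambda \rangle^{2}\; t\, t_{k}, \tag{22}$$

from which we derive the solution, *x*(*t*), during the reference period *T*. The solution based on (22) is hereafter referred to as the short-time weak constraint.

Finally, the strong-constraint representer is obtained in the limit *δλ* → 0 and reads

$$r_{k}(t) = \sigma_{b}^{2}\, e^{\lambda (t + t_{k})}. \tag{23}$$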

The three solutions based respectively on (20), (22), and (23) are compared in a set of experiments. Simulated noisy observations sampled from a Gaussian distribution around a solution of (15) are distributed every 5 time units over an assimilation interval *T* = 50 time units. Different regimes of motion are considered by varying the true parameter *λ*^{tr}.
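The observation network used in these experiments can be mimicked with a short NumPy sketch. It assumes that the reference trajectory is the exponential solution of a scalar linear growth law with rate *λ*^{tr}, started from *x*_{0} = 2, and that observations carry Gaussian noise with standard deviation *σ*_{o} = 0.5, the values quoted for these experiments. The function name `simulate_observations`, the placement of the first observation at *t* = 5, and the random seed are hypothetical choices made only for illustration.

```python
import numpy as np

def simulate_observations(lam_true, x0=2.0, T=50.0, dt_obs=5.0,
                          sigma_o=0.5, seed=0):
    """Return observation times, true states, and noisy observations."""
    rng = np.random.default_rng(seed)
    # Observation times 5, 10, ..., 50 (first time is an assumption).
    t_obs = np.arange(dt_obs, T + 0.5 * dt_obs, dt_obs)
    truth = x0 * np.exp(lam_true * t_obs)      # reference trajectory
    y_obs = truth + sigma_o * rng.standard_normal(t_obs.size)
    return t_obs, truth, y_obs

t_obs, truth, y_obs = simulate_observations(lam_true=0.01)
print(t_obs.size)                              # 10 observations in the window
```

Repeating the call with different seeds, initial conditions, and model parameters would give the kind of sample over which averaged errors can be computed.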

The results displayed in the sequel are averages over 10^{3} initial conditions and parametric model errors, around *x*_{0} = 2 and *λ*^{tr}, respectively. The initial conditions are sampled from a Gaussian distribution with standard deviation *σ*_{b} = 1, while the model parameter, *λ*, is sampled from a Gaussian distribution with standard deviation |Δ*λ*| = |*λ*^{tr} − *λ*|; the observation error standard deviation is *σ*_{o} = 0.5.

Figure 1 shows the mean quadratic estimation error, as a function of time, during the assimilation period *T*. The different panels refer to experiments with different values of the true parameter, 0.01 ≤ *λ*^{tr} ≤ 0.03, while the parametric error relative to the true value is set to Δ*λ*/*λ*^{tr} = 50%. The three lines refer to the full weak-constraint (dashed line), the short-time approximated weak-constraint (continuous line), and the strong-constraint (dotted line) solutions, respectively. The bottom-right panel summarizes the results and shows the mean error, averaged also in time, as a function of *λ*^{tr} for the weak-constraint solutions only.

As expected, the full weak-constraint solution performs systematically better than any other approach. The variational solution employing the short-time approximation for the model error successfully outperforms the strong-constraint one. The last plot displays the increase of total error of this solution as a function of *λ*^{tr}. To understand this dependence, one must recall that the duration of the short-time regime in a chaotic system is bounded by the inverse of the largest amplitude Lyapunov exponent (Nicolis 2003). For the scalar unstable case considered here, this role is played by the parameter *λ*^{tr}. The increase of the total error of the short-time-approximated weak constraint as a function of *λ*^{tr} reflects the progressive decrease of the accuracy of the short-time approximation for this fixed data assimilation interval, *T*.

The accuracy of the short-time-approximated weak constraint in relation to the level of instability of the dynamics is further summarized in Fig. 2, where the difference between the mean quadratic error of this solution and the full weak-constraint one is plotted as a function of the nondimensional parameter *Tλ*^{tr}, with 10 ≤ *T* ≤ 60 and 0.0100 ≤ *λ*^{tr} ≤ 0.0275. In all the experiments Δ*λ*/*λ*^{tr} = 50%. Remarkably all curves are superimposed, a clear indication that the accuracy of the analysis depends essentially on the product of the instability of the system and the data assimilation interval. This feature is of course strongly related to the fact that the discrepancy of the short-time approximation is larger for large *λ*^{tr} and long times in (14).
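The dependence on the product *Tλ*^{tr} can be illustrated with a small sketch for the scalar linear case. Under the linearized error dynamics, with *f*(*x*) = *λx* and propagator *e*^{λ(t−t0)}, the model error term integrates to *x*_{0}*δλ* *t* *e*^{λt}, whereas its short-time form is *x*_{0}*δλ* *t*; both expressions are worked out here as an illustration, taking *t*_{0} = 0 and treating *x*_{0}*δλ* as a fixed amplitude, which are simplifying assumptions. The function names are hypothetical.

```python
import numpy as np

def model_error_full(t, lam, x0, dlam):
    """Linearized model error of dx/dt = lam * x at time t (t0 = 0)."""
    return x0 * dlam * t * np.exp(lam * t)

def model_error_short_time(t, x0, dlam):
    """Short-time (linear in t) approximation of the same quantity."""
    return x0 * dlam * t

def variance_ratio(T, lam):
    """Short-time over full model error variance at t' = t'' = T."""
    full = model_error_full(T, lam, 1.0, 1.0) ** 2
    short = model_error_short_time(T, 1.0, 1.0) ** 2
    return short / full

# Different (T, lam) pairs with similar products T * lam.
for T, lam in [(60.0, 0.01), (30.0, 0.02), (20.0, 0.0275)]:
    print(T * lam, variance_ratio(T, lam))
```

In this sketch the ratio equals exp(−2*λT*), so it depends on *λ* and *T* only through their product and is always below one, meaning the short-time form underestimates the full expression increasingly as *λT* grows.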

We now turn to the analysis of the effect of the initial condition error on the weak-constraint solutions (20) and (22) in Fig. 3. We focus here on a setting with only one perfect observation in the middle of the assimilation period, at time *t* = 25. The panels refer to experiments with different parametric model error and show the mean quadratic error, averaged over a sample of 10^{3} initial condition errors and over the assimilation period *T*, as a function of the standard deviation of the initial condition error *σ*_{b}. In all the experiments, *λ*^{tr} is fixed to 0.0225. For the smallest parametric model errors (top panels, Δ*λ*/*λ*^{tr} ≤ 15%), the estimation error of both solutions monotonically increases with the initial condition error, until a common plateau is reached. Note that the full weak constraint, with a perfect initial condition (*σ*_{b} = 0), is able to keep the average error to almost 0. This is a very remarkable performance considering that only one observation is available within the assimilation period. The figure indicates further that the difference between the full and the approximated solutions decreases monotonically by increasing the initial condition error, and it is reduced to almost zero for sufficiently large errors. This clearly reflects the relative impact of initial condition and model errors on the quality of the assimilation. When the initial condition error is significantly larger than the model error, the accuracy of the state estimate is not improved employing the more costly (and accurate) full weak-constraint algorithm. Note also that the error plateau is reached for larger values of *σ*_{b} as Δ*λ*/*λ*^{tr} increases.

Finally, in Fig. 4, the possibility of improving the quality of the solutions by inflating the model error covariance matrix is investigated. The aim is to understand whether the model error is under- or overestimated in the weak-constraint approaches and if an improvement can be obtained by inflating or reducing the model error correlation term. The original network of 10 noisy observations, every 5 time units, is now used with *σ*_{o} = 0.5, *σ*_{b} = 1, and *λ*^{tr} = 0.01. The amplitude of the model error term, 〈*x*_{0}*δλ*〉^{2}, is now multiplied by a scalar factor 0.1 ≤ *α* ≤ 20 and then used in both the weak-constraint assimilations. The panels show the mean quadratic error as a function of *α*, for different parametric error Δ*λ*/*λ*^{tr}.

As a first remark we observe that by increasing the model parametric error, the analysis error of all the solutions increases accordingly. In the smallest parametric error case, Δ*λ*/*λ*^{tr} = 10%, for *α* close to 1, the solutions are very similar to each other. The full- and short-time-approximated weak constraints (dashed and continuous lines, respectively) only slightly improve over the strong constraint. By increasing *α* both solutions degrade rapidly at a rate that is faster for the full weak-constraint solution, indicating its high sensitivity to the model error correlation amplitude. Note that the growth of the error for large *α* is found for all the parametric errors considered, and that in the cases of small Δ*λ*/*λ*^{tr} the weak-constraint solutions with large *α* perform even worse than the strong-constraint case. This means that when the model error is not dominant, assuming the model is perfect is better than incorrectly estimating the corresponding correlations.

It is interesting to remark on the existence of a minimum in the error curves, which deepens with the increase of the parametric error. This minimum is systematically located at *α* = 1 for the full weak-constraint case, indicating that the estimate of the model error correlation based on (13) is adequate. On the other hand, for the short-time-approximated case the minimum is shifted to 3 < *α* < 4. This suggests that, as expected, the estimate of the actual model error correlation based on (14) is an underestimation and that a better performance can indeed be obtained by inflating it. Note furthermore that the level of the minima of the short-time-approximated weak constraint is very close to the full weak-constraint ones. This is a very encouraging result from the perspective of a realistic application where (13) cannot be solved.

### b. Nonlinear system: Solution with descent method

We now turn to the three-component Lorenz (1963) model:

$$\frac{dx}{dt} = \sigma (y - x), \qquad \frac{dy}{dt} = \rho x - y - xz, \qquad \frac{dz}{dt} = xy - \beta z, \tag{24}$$

whose canonical parameter values are (*σ*, *ρ*, *β*) = (10, 28, 8/3). The assimilation interval *T* has been discretized over *N* time steps of equal length Δ*t*.

The weak-constraint cost function, (25), takes the discrete form of (8), and the control variable is the model state **x**_{i} at each time step in the interval *T*. The minimizing solution is obtained by using a descent iterative method that makes use of the cost function gradient with respect to **x**_{i}, 0 ≤ *i* ≤ *N*. The expression of this gradient, (26), involves 𝗠^{T}, the adjoint dynamics operator. The gradient expression (26) is derived assuming that observations are available at each time step *t*_{i}, 0 ≤ *i* ≤ *N*. In the usual case of sparse observations the term proportional to the innovation disappears from the gradient with respect to the state vector at a time when observations are not present. Note furthermore that assuming the model error is correlated in time leads to the summation in square brackets of (26), which accounts for the full contribution over the entire assimilation interval. The situation is drastically different if the model error is treated as an uncorrelated noise. In this case the cost function takes into account the model error covariances only, and the model error cost function term reduces to a single summation over the time steps weighted by the inverse of the model error covariances. The cost function gradient is modified accordingly and the summation over all time steps disappears (see, e.g., Trémolet 2006).

The estimation of the model error correlations 𝗣_{i,j} is done by using the short-time approximation (14). In this discrete case it reads

$$\mathsf{P}_{i,j} \approx \left\langle \delta\boldsymbol{\mu}_{0}\, \delta\boldsymbol{\mu}_{0}^{\mathrm{T}} \right\rangle (t_{i} - t_{0})(t_{j} - t_{0}) = \left\langle \delta\boldsymbol{\mu}_{0}\, \delta\boldsymbol{\mu}_{0}^{\mathrm{T}} \right\rangle\, i\, j\, \Delta t^{2}. \tag{27}$$

The matrix 〈*δ***μ**_{0} *δ***μ**_{0}^{T}〉, which is here a 3 × 3 symmetric matrix, is assumed to be known a priori and is estimated by accumulating statistics on the model attractor and perturbing randomly each of the three parameters *σ*, *ρ*, and *β* with respect to the canonical values and with a standard deviation |Δ*λ*|. We compare the short-time weak constraint with the strong-constraint variational assimilation. In this latter case, the model error term disappears from the cost function in (25) and the gradient is computed with respect to the initial condition only (Talagrand and Courtier 1987).
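The estimation of the 3 × 3 matrix 〈*δ***μ**_{0} *δ***μ**_{0}^{T}〉 by accumulating statistics on the attractor can be sketched as follows. This is only one possible reading of the procedure: the fourth-order Runge–Kutta integrator, the step Δ*t* = 0.01, the spin-up and sample sizes, the use of the canonical parameters for the trajectory, and the interpretation of |Δ*λ*| as a fixed fraction of each canonical value are all assumptions, and the function names are hypothetical.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0       # canonical Lorenz (1963) values

def lorenz63(state, sigma=SIGMA, rho=RHO, beta=BETA):
    """Right-hand side of the three-component Lorenz model."""
    x, y, z = state
    return np.array([sigma * (y - x), rho * x - y - x * z, x * y - beta * z])

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step (integrator choice is an assumption)."""
    k1 = lorenz63(state)
    k2 = lorenz63(state + 0.5 * dt * k1)
    k3 = lorenz63(state + 0.5 * dt * k2)
    k4 = lorenz63(state + dt * k3)
    return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def param_jacobian(state):
    """Derivative of the tendency with respect to (sigma, rho, beta)."""
    x, y, z = state
    return np.diag([y - x, x, -z])

def estimate_Q0(rel_err=0.15, n_samples=2000, dt=0.01, spinup=1000, seed=0):
    """Average of dmu dmu^T over attractor states and random parameter errors."""
    rng = np.random.default_rng(seed)
    canon = np.array([SIGMA, RHO, BETA])
    state = np.array([1.0, 1.0, 1.0])
    for _ in range(spinup):                    # discard the transient
        state = rk4_step(state, dt)
    Q0 = np.zeros((3, 3))
    for _ in range(n_samples):
        state = rk4_step(state, dt)
        dlam = rel_err * canon * rng.standard_normal(3)   # parameter errors
        dmu = param_jacobian(state) @ dlam                # dmu = (df/dlam) dlam
        Q0 += np.outer(dmu, dmu)
    return Q0 / n_samples

Q0 = estimate_Q0(n_samples=500, spinup=500)
print(Q0.shape)                                # (3, 3)
```

Because the three parameter errors are drawn independently in this sketch, the off-diagonal entries of the estimate average toward zero as the sample grows; correlated parameter errors would change that.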

Observation system simulation experiments are conducted with assimilation intervals equal to 2, 4, 8, or 16 time steps and with observations every 2, 4, 8, or 16 time steps. The simulated measurements of the three components of the model state are uncorrelated with each other and are affected by a Gaussian noise whose standard deviation is set to 5% of the system’s natural variability. The results are averaged over a sample of 50 initial conditions and parametric model errors sampled from a Gaussian distribution with standard deviation |Δ*λ*|. Each simulation lasts for 240 time steps, which is equivalent to 15 assimilation cycles for the longest *T* = 16 time steps up to 120 cycles for *T* = 2 time steps. The background error covariance matrix is set to diagonal with the entries equal to the initial condition error variance, 0.01% of the system climate variance. A more refined choice for 𝗕 will certainly have a positive impact on the algorithms, but this aspect is not our central concern here and the choice adopted is not expected to have an influence on the relative performance of the assimilation schemes.

Figure 5 shows the mean quadratic estimation error as a function of the observation frequency (i.e., the time steps between them, Δ*t*_{obs}) and for the different assimilation intervals: 2 (crosses), 4 (squares), 8 (triangles), and 16 (circles) time steps. The left panels refer to solutions of the strong-constraint variational assimilation, while the approximated weak-constraint solutions are shown in the right panels. The parametric errors are indicated in the text boxes, and all errors are normalized with respect to the system’s natural variability. As in the linear example of the previous section, when the parametric error is too small (now Δ*λ*/*λ*^{tr} = 5%) the strong-constraint performance is not significantly improved by using the short-time weak constraint. Conversely, as long as larger parametric errors are considered, the improvement over the strong-constraint approach becomes substantial. In addition, by reducing the observation frequency, for fixed *T*, the strong-constraint solution deteriorates at a slower rate than the weak-constraint one.

Note that the accuracy of the weak-constraint algorithm should be affected by the limited validity of the short-time regime on which the estimate of the model error correlations is based. Following Nicolis (2003), we estimate the duration of the short-time regime in this system to be approximately equal to 0.07 nondimensional time units (the inverse of the Lyapunov exponent of the true dynamics that is largest in absolute value, −14.57). The rapid growth of the error for large data assimilation intervals, in light of Fig. 5, reflects this fact.
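As a quick arithmetic check of this estimate, and assuming a time step Δ*t* = 0.01 (inferred here from the correspondence between an interval of 8 time steps and *T* = 0.08 time units, not stated explicitly):

$$\tau_{\mathrm{short}} \approx \frac{1}{|-14.57|} \approx 0.069 \approx 0.07 \ \text{time units}, \qquad \frac{\tau_{\mathrm{short}}}{\Delta t} \approx 7 \ \text{time steps}.$$

Under that assumption, the assimilation intervals of 8 and 16 time steps extend beyond the estimated short-time regime, whereas those of 2 and 4 time steps lie within it.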

In Fig. 6 we show the time series of the three model variables relative to the true solution (continuous line), the strong-constraint solution (dotted line), and the approximated weak constraint (dashed line); the observations are displayed with the cross marks. The solutions refer to a period of 1200 time steps, the assimilation interval is *T* = 8 time steps, Δ*t*_{obs} = 4 time steps, and Δ*λ*/*λ*^{tr} = 15%. We see that the tracking of the unknown true evolution provided by the short-time weak-constraint assimilation is very efficient, unlike the strong-constraint one. This is particularly evident in correspondence with the peaks and troughs of the signal and with changes of the system regimes. The latter are characterized by a change in the sign of the *x* field, when the trajectory changes the wing of the attractor. By reducing the frequency of the observations to Δ*t*_{obs} = 8 time steps a decrease of the overall quality of the short-time weak-constraint assimilation is observed (Fig. 7). Larger deviations from the truth are only observed in correspondence with the changes of regime.

The effect of a further degradation of the observational network is investigated in Fig. 8, where the observations are reduced to a single component of the system’s state. From left to right the panels refer to experiments with measurements of *x*, *y*, and *z*, respectively, while from top to bottom the panels show the time series relative to each component. As in Fig. 6, the assimilation interval is *T* = 8 time steps, Δ*t*_{obs} = 4 time steps, and Δ*λ*/*λ*^{tr} = 15%, while the experiments last for 600 time steps. In comparison with the results in Fig. 6, we can observe here a general degradation of the algorithm’s skill in tracking the true dynamics. By time step 300, the scheme commits small errors in the estimate of both the phase and amplitude of the true signal, but it still systematically outperforms the strong-constraint solution.

The robustness of the proposed approach is finally compared with the uncorrelated noise treatment of the model error. This assumption has often been made in previous applications; it is particularly attractive because it significantly reduces the computational cost associated with the minimization procedure. Although more refined choices have been recently described in the literature (Trémolet 2007), model error covariances have often been set to be proportional to the background error covariance (e.g., Zupanski 1997; Vidard et al. 2004). We make the same choice here and compute the model error covariances as 𝗣 = *α*𝗕. Figure 9 shows the mean quadratic error as a function of the tuning parameter *α*. The panels refer to different parametric model errors. The results are averaged over the same sample of 50 initial conditions and parametric model errors, while Δ*t*_{obs} = 2 and *T* = 0.08. The errors of the short-time weak-constraint (dashed line) and the strong-constraint (dotted line) 4DVar are also displayed for reference. The solid line with open squares refers to an experiment in which the model error is treated as an uncorrelated noise but the spatial covariances at observing times are estimated using the short-time approximation; the aim is to evaluate the relative impact of neglecting the time correlation and of using an incorrect spatial covariance.
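To make the uncorrelated noise treatment concrete, the sketch below evaluates a model-error penalty of the kind that enters a weak-constraint cost function when the model error is assumed uncorrelated in time and 𝗣 = *α*𝗕. It is an illustration under stated assumptions, not the code used for the experiments: the function name, the array layout of the model-error increments `eta`, and the factor 1/2 are all hypothetical choices.

```python
import numpy as np

def uncorrelated_model_error_penalty(eta, B, alpha):
    """Model-error penalty with time-uncorrelated noise and P = alpha * B.

    eta   : array (n_times, n_state), hypothetical model-error increments
    B     : array (n_state, n_state), background error covariance
    alpha : positive scalar tuning parameter
    """
    P = alpha * np.asarray(B, dtype=float)
    eta = np.atleast_2d(np.asarray(eta, dtype=float))
    # Solve P z_k = eta_k for every time k instead of forming P^{-1} explicitly.
    z = np.linalg.solve(P, eta.T).T
    # No cross-time terms appear: each time contributes eta_k^T P^{-1} eta_k.
    # ASSUMPTION: the conventional factor 1/2 is used in front of the sum.
    return 0.5 * float(np.sum(eta * z))

# Usage: the penalty scales as 1/alpha, so a larger alpha trusts the model less.
B = np.eye(3)
eta = np.ones((4, 3))
for alpha in (2.0, 20.0):
    print(alpha, uncorrelated_model_error_penalty(eta, B, alpha))
```

Because the sum contains no products between increments at different times, the penalty is block diagonal in time, which is what makes this formulation cheaper than one carrying time correlations.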

The uncorrelated noise formulation (solid line without any marks) never reaches the accuracy of the proposed short-time weak constraint. Note, furthermore, that for the smallest parametric error considered and for small *α* it is even worse than the strong-constraint 4DVar, where the model is assumed to be perfect. When *α* is increased beyond *α* = 10^{3} (not shown), the error reaches a plateau whose value is controlled by the observation error level. When the spatial covariance is estimated as in the short-time weak constraint, the performance is generally improved. Note, however, that for the smallest parametric error considered, and large *α*, the estimate 𝗣 = *α*𝗕 gives better skill; for all other cases, the improvement in correspondence with the best-possible *α* is only minor. This suggests that the degradation of the uncorrelated noise formulation relative to the short-time weak constraint is mainly the consequence of neglecting the time correlation and only to a small extent of the use of an incorrect spatial covariance.

## 5. Conclusions

Recently a deterministic formulation of the model error dynamics has been introduced (Nicolis 2003, 2004; Nicolis et al. 2009). A number of distinctive features have been identified, such as the existence of a universal short-time quadratic evolution law for the mean-square model error evolution (Nicolis 2003). This approach has been exploited here in the context of variational assimilation as a natural extension of the analysis performed for the EKF in Carrassi et al. (2008). A short-time approximation for the model error correlations has been derived, and a short-time-approximated weak-constraint 4DVar has been formulated. The performance of this algorithm has been analyzed in the context of two different dynamical systems.

First, a linear unstable one-dimensional dynamics has been considered. The performance of the short-time weak-constraint 4DVar has been compared to a weak-constraint formulation based on the analytical equation for the model error correlations, and to the classic strong-constraint 4DVar. A dramatic increase of the quality of the analysis was obtained with the short-time weak constraint as compared with the strong constraint. The difference with the weak-constraint 4DVar employing the full model error correlation equations increases with the level of instability and similar performances are attained for relatively stable configurations of the dynamics. The system instability and the length of the assimilation period are the main factors controlling the accuracy of the short-time weak constraint. The maximum length of the assimilation period over which the short-time weak-constraint 4DVar gives accurate skill is inversely proportional to the level of instability of the dynamics. The amplitude of the initial condition error also modulates the accuracy of the short-time weak constraint with respect to its full formulation. The inflation of the model error correlations helps to compensate for the error underestimation affecting the short-time approximation and improves its performance up to a level of accuracy equivalent to the full weak constraint.

The analysis has then been extended to a nonlinear chaotic dynamical system. Analogously, the short-time weak-constraint 4DVar improves substantially over the strong-constraint one. Furthermore, it is able to closely track the unknown true dynamics, which, for the dynamical system considered, is characterized by changes in the regimes that are not captured by the strong-constraint solution. In this nonlinear context we performed additional experiments with a weak-constraint 4DVar employing the uncorrelated noise assumption for the model error and, as often done in practice, the model error covariance was assumed to be proportional to the background-error covariance matrix. The analysis reveals that the 4DVar employing the uncorrelated noise assumption never attains the level of accuracy of the proposed short-time weak-constraint case.

The results obtained in the present idealized contexts suggest that there is potential in applying this deterministic formulation to more realistic models and/or observational setups. The size of a typical geophysical system of practical relevance is the main obstacle to the application of the present approach. Nevertheless, the increase of computational power on the one hand and the development of advanced techniques for the optimal choice of the control space representation (Bocquet 2009) on the other hand can make its application possible in realistic contexts. Follow-up studies are necessary to reveal more specific aspects of the practical implementation that could not be highlighted in the experimental setting used here; these will be addressed in future work.

## Acknowledgments

We thank C. Nicolis and F. Uboldi for their careful reading of the manuscript and the three anonymous reviewers and the editor, Herschel Mitchell, for their insightful suggestions and comments. This work is supported by the Belgian Federal Science Policy Program under Contract MO/34/017.

## REFERENCES

Bennett, A. F., 1992: *Inverse Methods in Physical Oceanography*. Cambridge University Press, 346 pp.

Bocquet, M., 2009: Towards optimal choices of control space representation for geophysical data assimilation. *Mon. Wea. Rev.*, **137**, 2331–2348.

Carrassi, A., S. Vannitsem, and C. Nicolis, 2008: Model error and sequential data assimilation: A deterministic formulation. *Quart. J. Roy. Meteor. Soc.*, **134**, 1297–1313.

Dee, D. P., 1995: Online estimation of error covariance parameters for atmospheric data assimilation. *Mon. Wea. Rev.*, **123**, 1128–1145.

Derber, J., 1989: A variational continuous assimilation technique. *Mon. Wea. Rev.*, **117**, 2437–2446.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasigeostrophic model using Monte-Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99** (C5), 10143–10162.

Fujita, T., D. J. Stensrud, and D. C. Dowell, 2007: Surface data assimilation using an ensemble Kalman filter approach with initial condition and model physics uncertainties. *Mon. Wea. Rev.*, **135**, 1846–1868.

Ghil, M., S. E. Cohn, J. Tavantzis, K. Bube, and E. Isaacson, 1981: Application of estimation theory to numerical weather prediction. *Dynamic Meteorology: Data Assimilation Methods*, L. Bengtsson, M. Ghil, and E. Källén, Eds., Springer, 139–224.

Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. *Mon. Wea. Rev.*, **133**, 3132–3147.

Houtekamer, P. L., H. L. Mitchell, and X. Deng, 2009: Model error representation in an operational ensemble Kalman filter. *Mon. Wea. Rev.*, **137**, 2126–2143.

Hunt, B. R., E. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. *Physica D*, **230**, 112–126.

Jazwinski, A. H., 1970: *Stochastic Processes and Filtering Theory*. Academic Press, 376 pp.

Kalman, R., 1960: A new approach to linear filtering and prediction problems. *Trans. ASME J. Basic Eng.*, **82**, 35–45.

Le Dimet, F. X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. *Tellus*, **38A**, 97–110.

Lewis, J. M., and J. C. Derber, 1985: The use of adjoint equations to solve a variational adjustment problem with advective constraint. *Tellus*, **37A**, 309–322.

Li, H., E. Kalnay, T. Miyoshi, and C. M. Danforth, 2009: Accounting for model errors in ensemble data assimilation. *Mon. Wea. Rev.*, **137**, 3407–3419.

Lorenz, E. N., 1963: Deterministic non-periodic flows. *J. Atmos. Sci.*, **20**, 130–141.

Meng, Z., and F. Zhang, 2007: Test of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part II: Imperfect model experiments. *Mon. Wea. Rev.*, **135**, 1403–1423.

Nicolis, C., 2003: Dynamics of model error: Some generic features. *J. Atmos. Sci.*, **60**, 2208–2218.

Nicolis, C., 2004: Dynamics of model error: The role of unresolved scales revisited. *J. Atmos. Sci.*, **61**, 1740–1753.

Nicolis, C., R. Perdigao, and S. Vannitsem, 2009: Dynamics of prediction errors under the combined effect of initial condition and model errors. *J. Atmos. Sci.*, **66**, 766–778.

Rabier, F., H. Järvinen, E. Klinker, J-F. Mahfouf, and A. Simmons, 2000: The ECMWF operational implementation of four-dimensional variational assimilation. Part I: Experimental results with simplified physics. *Quart. J. Roy. Meteor. Soc.*, **126**, 1143–1170.

Reynolds, C., P. J. Webster, and E. Kalnay, 1994: Random error growth in NMC’s global forecasts. *Mon. Wea. Rev.*, **122**, 1281–1305.

Sasaki, Y., 1970: Some basic formalism in numerical variational analysis. *Mon. Wea. Rev.*, **98**, 875–883.

Talagrand, O., and P. Courtier, 1987: Variational assimilation of meteorological observations with the adjoint vorticity equation. I: Theory. *Quart. J. Roy. Meteor. Soc.*, **113**, 1311–1328.

Trémolet, Y., 2006: Accounting for an imperfect model in 4D-Var. *Quart. J. Roy. Meteor. Soc.*, **132**, 2483–2504.

Trémolet, Y., 2007: Model error estimation in 4D-Var. *Quart. J. Roy. Meteor. Soc.*, **133**, 1267–1280.

Uboldi, F., and M. Kamachi, 2000: Time-space weak-constraint data assimilation for nonlinear models. *Tellus*, **52A**, 412–421.

Vannitsem, S., and Z. Toth, 2002: Short-term dynamics of model errors. *J. Atmos. Sci.*, **59**, 2594–2604.

Vidard, P. A., A. Piacentini, and F-X. Le Dimet, 2004: Variational data analysis with control of the forecast bias. *Tellus*, **56**, 177–188.

Zupanski, D., 1997: A general weak constraint applicable to operational 4DVAR data assimilation systems. *Mon. Wea. Rev.*, **125**, 2274–2292.

## APPENDIX A

### Short-Time Evolution of the Model Error Covariance

*t*′ and *t*″. We proceed by doing a Taylor expansion around (*t*′, *t*″) = (*t*_{0}, *t*_{0}):

*t*′, *t*″) reads

*t*′, *t*″) = (*t*_{0}, *t*_{0}), we see that the first nontrivial term is the quadratic, so the second-order Taylor expansion of (13) reads

## APPENDIX B

### Scalar System: Solution with Representers

*J*(*x*):

*J*(*x*), **∇**_{x}*J*(*x*) = 0, and (B2), after some reordering, we obtain the Euler–Lagrange equations:

*x*(*t*), and *q*(*T*) = 0, for the variable *q*(*t*) (usually referred to as the adjoint field), while *δ*(*t* − *t*_{k}) is the Dirac delta function. In the derivation of (B3) and (B4) we have used the inverse of the model error correlations defined through

*x*(*t*), acts as a forcing in (B4).

As mentioned in section 2, the method of representers can be used to decouple and solve the Euler–Lagrange equations in the linear case (Bennett 1992, his section 5.3). The method is used here to find the minimizing solution of (16) and this application is briefly detailed in the following, but we refer to Bennett (1992, his section 5.3) for an exhaustive description of the approach in a general case.

*x*(*t*), and *q*(*t*). The 2*M* functions, *r*_{k}(*t*) and *a*_{k}(*t*), are the representers and their adjoints, respectively, while the *β*_{k} are *M* coefficients to be determined.

*k* ≤ *M*. The coupling with *x*(*t*) in (B4) is removed by imposing that the adjoint representers satisfy

*a*_{k}(*T*) = 0, 1 ≤ *k* ≤ *M*. In practice the contribution of each measurement has been converted into a single impulse.

*M* representer functions *r*_{k}(*t*). As can be shown by substituting back the expressions for *r*_{k}(*t*) and *a*_{k}(*t*) in the Euler–Lagrange (B3) and (B4), in order to write (B8) and (B9) the vector of coefficients has to be chosen so that (Bennett 1992, his section 5.3)

**d** is the innovation vector, **d** = (*y*_{1}^{o} − *x*_{1}^{f}, … , *y*_{M}^{o} − *x*_{M}^{f}), 𝗦 is the *M* × *M* matrix (*S*)_{i,j} = *r*_{i}(*t*_{j}), and 𝗜 is the *M* × *M* identity matrix. The coefficients are then inserted in (B6) to obtain the final solution.
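The last step of the representer method can be sketched numerically. The example below assumes that the condition on the coefficients takes the standard representer form (𝗦 + *σ*²𝗜)**β** = **d**, with *σ*² a scalar observation error variance, and that the final solution is the prior trajectory plus a linear combination of the representers; both are assumptions in the spirit of Bennett (1992), and the function name, argument names, and discretization on a time grid are hypothetical.

```python
import numpy as np

def representer_solution(x_f, R, obs_idx, d, obs_var):
    """Combine representers into an analysis trajectory (illustrative sketch).

    x_f     : (n_t,) prior (forecast) trajectory sampled on a time grid
    R       : (M, n_t) representers r_k(t) sampled on the same grid
    obs_idx : (M,) grid indices of the observation times t_j
    d       : (M,) innovation vector, y^o - x^f at the observation times
    obs_var : scalar observation error variance (ASSUMED scalar and uniform)
    """
    R = np.asarray(R, dtype=float)
    d = np.asarray(d, dtype=float)
    M = d.size
    # Representer matrix: S[i, j] = r_i(t_j).
    S = R[:, obs_idx]
    # ASSUMED coefficient equation: (S + obs_var * I) beta = d.
    beta = np.linalg.solve(S + obs_var * np.eye(M), d)
    # ASSUMED form of the solution: x(t) = x_f(t) + sum_k beta_k r_k(t).
    return np.asarray(x_f, dtype=float) + beta @ R, beta

# Usage with toy numbers (two observations on a three-point time grid).
R = np.array([[1.0, 0.0, 0.5],
              [0.0, 3.0, 0.5]])
x_a, beta = representer_solution(np.zeros(3), R, [0, 1], [2.0, 8.0], 1.0)
print(beta, x_a)
```

Solving the *M* × *M* linear system, rather than the full Euler–Lagrange problem, is what makes the approach attractive when the number of observations is small.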