## 1. Introduction

Numerical prediction systems developed for ocean circulation, weather, or climate forecasting contain some level of uncertainty in their output. The reasons include insufficiently detailed parameterizations of subgrid processes, unrepresented model physics, and inaccurate forcing, initial conditions, or boundary conditions (Ménard 2010). As a consequence, the observed state of the considered climate parameters deviates from the corresponding model output, which degrades the model forecast. One can try to improve the forecast by using observational data to provide a “best fit” of the model estimates to observations. A data assimilation (DA) procedure provides such a fit, reducing the forecast error (Belanger and Vincent 2005). The application of DA over the last 30 years has contributed substantially to improving the quality of short-range forecasting. According to Lahoz et al. (2010) and Simmons and Hollingsworth (2002), the skill of 5-day forecasts produced by the leading climate research centers [the European Centre for Medium-Range Weather Forecasts (ECMWF), the Met Office, the National Centers for Environmental Prediction (NCEP), etc.] in the early 2000s was comparable to the skill of 3-day forecasts in 1980.

There is, however, one weakness common to all DA procedures: even though they commonly improve model predictions, they introduce their own errors resulting from the numerous assumptions used in the DA methodology. For instance, the DA procedure typically assumes a Gaussian distribution for the observation errors, and systematic observational errors (biases) can never be completely excluded (Ménard 2010). Representativeness errors associated with differences in the resolution of observational and model grids also have a strong impact on DA performance (Lahoz et al. 2010).

This raises a number of fundamental questions, such as the following: how accurate is the state estimate, and how reliable is the DA procedure? More generally, how can we assess the quality of the model prediction when the forecast inevitably contains errors? In this paper we address these questions for a particular class of data assimilation algorithms, namely, four-dimensional variational data assimilation (4D-Var). The purpose of the paper is to test the validity of an error estimation algorithm that is based on a variational approach. This variational error estimation algorithm (VEEA) allows one to estimate the errors introduced by an inaccurate initial state obtained from 4D-Var.

The error estimation approach was originally suggested by Parmuzin et al. (2006) and mathematically investigated for generic evolution equations. Later the method was extended to a special algorithm for fast estimation of inverse error covariance matrices. It was tested on simplified problems such as data assimilation for convection-diffusion equations (Parmuzin et al. 2006) and on the 1D Burgers’ equation (Gejadze et al. 2011). In short, the algorithm relies on repeated application of VEEA to various sets of errors, which makes it similar to the ensemble methods. However, according to Gejadze et al. (2011), it is at least one order of magnitude less computationally expensive than the VEEA-based ensemble method. In contrast to the work in Gejadze et al. (2011) and Parmuzin et al. (2006), we do not aim to estimate the inverse of the analysis error covariance matrix but only the error in the state estimate itself. We study the variational error estimation algorithm within the geophysically relevant context of the shallow-water equations on a rotating sphere. Our work thus differs from theirs in treating a less complicated, but still challenging and relevant, estimation problem within a more difficult modeling context.

Note that any method that intends to estimate the impact of model and observational error on a state estimate needs to assume information about model and observational error as input data. The idea of the variational error estimation algorithm under investigation here is to treat these errors as data and fit an error model to these data using a 4D-Var algorithm. This leads to a so-called “auxiliary” or “error” data assimilation problem. Of crucial importance is the choice of the dynamical error model. Here we use, in accordance with the theory of Parmuzin et al. (2006) and Gejadze et al. (2011), the equations obtained by linearizing the full nonlinear model around one of its trajectories. Thus, the resultant dynamical error model is just the tangent-linear version of the original model but with specific initial conditions and forcing for the error estimation problem. We do not need the expensive calculation of the Hessian of the cost functional.

Regarding the information about model and observational error, we note that even with perfect knowledge of these errors it is unclear how they affect the estimation of the initial state. The VEEA discussed in this paper fills this gap by providing detailed information on the quality and reliability of the 4D-Var solution. For real-world applications, the knowledge of model and observational errors is very limited, and consequently any algorithm that estimates the assimilation error has to be robust with respect to uncertainty in the model and observational errors. (For a review of available methods see Dee 2005; Gillijns and Moor 2007; Zupanski and Zupanski 2006; Lermusiaux and Robinson 1999; Griffith and Nichols 2000.)

We conduct numerical data assimilation experiments in which we use a spherical shallow-water model to evaluate VEEA. The robustness of VEEA is studied in a series of experiments with different types and levels of model and observational noise, including biased data. In particular, we investigate the quality of the error estimates when only approximate values of the model and observational errors are given. We also test to what extent the tangent linear equation provides a sensible model for the error evolution. This problem is related to the question of how long the assimilation window in the error data assimilation problem can be chosen while still yielding a useful error estimate.

Besides the validation of the error estimation algorithm, we also put some emphasis on its implementation. Similar to 4D-Var, the error estimation algorithm is a constrained optimization problem, and the difficulty of its numerical realization is comparable to that of 4D-Var. As already suggested in Parmuzin et al. (2006), the complexity of the numerical realization of VEEA can be reduced by utilizing tools for algorithmic differentiation (AD, also called automatic differentiation). The AD tools supplement the original implementation of a model with additional code for creating various sorts of differentiated models. With AD we compute numerical sensitivities of outputs with respect to inputs of the original model at machine accuracy. See Griewank and Walther (2008) and Naumann (2012) for detailed information about AD. In this paper we use the AD-enabled NAG Fortran compiler (Riehme et al. 2007; Naumann and Riehme 2005; Cohen et al. 2003), building on the results of Rauser et al. (2010), where a first-order adjoint model of our target model Icosahedral Nonhydrostatic (ICON; Bonaventura and Ringler 2005) was used. It is important to note that we use AD consistently for the underlying 4D-Var problem as well as for the error estimation.

The paper is organized as follows. Section 2 gives a brief overview of the mathematical background of the data assimilation problem, using the shallow-water system as an example. In section 3 we derive the VEEA from the 4D-Var problem discussed in section 2. Section 4 reports the results, and section 5 contains the discussion of the stability tests of the DA system by means of VEEA. Section 6 gives a summary. In the appendix we give an overview of AD and provide the details of the realization of VEEA with the aid of the AD compiler.

## 2. Data assimilation problem

Below we describe the application of VEEA to the shallow-water equations; a more general abstract formulation of the method can be found in Parmuzin et al. (2006) and Gejadze et al. (2008). We first introduce some definitions. Let *O* denote the surface of the sphere. The operator 〈*a*〉_{O} represents the integral of *a* over *O*, the Euclidean inner product is denoted by 〈*a*, *b*〉_{O} with the associated dimensionless *L*_{2} norm, and *C*_{O} = 1/([*a*] × [*O*]) is a dimensional parameter, where the notations [*a*] and [*O*] mean the dimensions of *a* and *O*, respectively. For instance, in the shallow-water model, *C*_{O} is equal to one and has the dimensions of [*a*]^{−1} × [*m*]^{−2}.

The state of the fluid is described by the horizontal velocity **v** defined on *O* and the fluid thickness *h*. We consider motion developing in *O* starting from some moment of time *t* = −*T* in the past up to the present state at *t* = 0. The shallow-water equations in vector invariant form are referred to as (1), where *ζ* is the vorticity, *f* is the Coriolis parameter, **k** is the unit vector perpendicular to the unperturbed fluid thickness, and *g* is gravity. The fields **F**_{v} and *F*_{h} represent additional forcing. As mentioned above, we consider the fluid motion starting from time *t* = −*T*, and therefore the corresponding initial conditions for (1), referred to as (2), are defined in the past. For further convenience the solution of (1)–(2) will be later referred to as the *analysis*.
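For orientation, a commonly used vector-invariant form of the shallow-water system that is consistent with the symbols defined above can be written as follows. This is only a sketch under the assumption of a flat bottom (with bottom topography $H_s$ the gradient term would act on $g(h + H_s) + |\mathbf{v}|^2/2$); the exact arrangement of terms in (1)–(2) may differ.

```latex
\begin{aligned}
\frac{\partial \mathbf{v}}{\partial t}
  + (\zeta + f)\,\mathbf{k}\times\mathbf{v}
  + \nabla\!\left(g h + \tfrac{1}{2}\,|\mathbf{v}|^{2}\right) &= \mathbf{F}_{v},\\
\frac{\partial h}{\partial t} + \nabla\cdot\left(h\,\mathbf{v}\right) &= F_{h},\\
\mathbf{v}\big|_{t=-T} = \mathbf{v}_{0}, \qquad h\big|_{t=-T} &= h_{0}.
\end{aligned}
```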

In *strong constraint* 4D-Var, which we consider here, it is assumed that the model equations describe the considered physical process perfectly. Hence, if the initial conditions (2) are also perfect, which is an ideal case, the solution of (1) should match the true state of the process (later just *truth*). Obviously, this situation never occurs in practice. In a more realistic scenario, the initial state (2) always contains some error, which, in turn, results in inaccuracies in the analysis. In such a case the data assimilation problem can be formulated as follows: assuming that the observational error is small (i.e., the measurements are “close” to the truth), find an initial state (**v**_{0}, *h*_{0}) such that the difference between the analysis and the observational data is minimal. We introduce a functional that measures this difference. Suppose that the system (1) with initial conditions (2) has a unique solution for each pair (**v**_{0}, *h*_{0}); the corresponding functional is then referred to as (3). Here all terms related to the observations are marked with the superscript “obs.” The space *O*^{obs} represents the regions of *O* where the measurements were taken. It is equipped with the corresponding dimensionless *L*_{2} norm. The mapping of *O* on *O*^{obs} is defined by a linear operator *C*.

Equations (1)–(3) form a variational DA problem for the shallow-water equations. This DA problem can be solved with classical gradient-based methods (cf. Talagrand and Courtier 1987; Gunzburger 2003, 14–21). One of the main obstacles is the calculation of the gradient of *J* with respect to the initial condition. We realize this by means of an AD tool. Once the gradient is available, it is used within an iterative optimization routine, such as steepest descent, that searches for the minimizer.
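The interplay of forward model, adjoint-based gradient, and steepest descent can be illustrated with a minimal sketch. It assumes a toy two-variable linear model (a plane rotation) in place of the shallow-water model, a hand-written adjoint in place of AD-generated code, and perfect observations at every time step; all function names and parameter values are hypothetical.

```python
import numpy as np

def step(x, dt=0.1):
    """One step of a toy linear model: rotation of the state by the angle dt."""
    c, s = np.cos(dt), np.sin(dt)
    return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

def step_adjoint(lam, dt=0.1):
    """Adjoint (transpose) of `step`, i.e. rotation by -dt."""
    c, s = np.cos(dt), np.sin(dt)
    return np.array([c * lam[0] + s * lam[1], -s * lam[0] + c * lam[1]])

def cost_and_gradient(x0, obs):
    """J(x0) = 0.5 * sum_k |x_k - y_k|^2 and its gradient with respect to x0."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(len(obs) - 1):
        traj.append(step(traj[-1]))              # forward sweep
    misfit = [x - y for x, y in zip(traj, obs)]
    cost = 0.5 * sum(float(m @ m) for m in misfit)
    lam = misfit[-1]
    for m in reversed(misfit[:-1]):              # backward (adjoint) sweep
        lam = step_adjoint(lam) + m
    return cost, lam

def steepest_descent(x0, obs, rate=0.05, iters=200):
    """Fixed-step steepest descent on the initial condition."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        _, grad = cost_and_gradient(x, obs)
        x = x - rate * grad
    return x

# Twin setup: observations are generated from a known "true" initial state.
truth0 = np.array([1.0, 0.5])
obs = [truth0]
for _ in range(10):
    obs.append(step(obs[-1]))

analysis0 = steepest_descent([0.0, 0.0], obs)   # start from a wrong first guess
```

In this sketch `analysis0` recovers `truth0` because the observations are noise free; in practice a quasi-Newton or conjugate-gradient routine would usually replace the fixed-step descent.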

## 3. Error estimation problem

To describe the error estimation problem we assume that (1)–(2) have already been discretized. Unless otherwise stated, we use the same notation and the same symbols for the discrete equations as for the continuous equations, to keep the presentation simple and avoid excessive notation. The general formulation of the variational error estimation problem and its solution by means of optimization theory was initially derived in Parmuzin et al. (2006) and uses the Hessian of the cost functional. We derive an algorithm in the appendix that, in contrast to the original method, does not require the calculation of the Hessian. This is done within an AD framework.

We associate *η*^{mod} and *ξ*^{mod} with the model errors in the velocity and height equations in (1), respectively. Similarly, let *ξ*^{obs} denote the error in the fluid thickness observations *h*^{obs} and *η*^{obs} the error in the velocity observations **v**^{obs}, with corresponding errors in the respective background states. The error estimation problem (EEP) is then formulated as follows.

EEP: Suppose the errors (*ξ*^{obs}, *η*^{obs}) in the observations and the errors (*ξ*^{mod}, *η*^{mod}) in the model equations are known. Estimate how these errors affect the initial state error (*ξ*_{0}, *η*_{0}) produced by the DA.

Let (**η**, *ξ*) be the error of DA resulting from the joint impact of observational, background, and model errors. First, we express the velocity and fluid thickness resulting from DA as the sum of the true state (**v**, *h*) and the DA error (**η**, *ξ*). In a similar way we express the initial state, observations, and forcing as sums of their true values and corresponding errors. Then we substitute these expressions into the cost functional (3). After cancellation of the true state variables, the functional reduces to the form referred to as (4). Next we substitute the height and velocity fields, written as sums of DA errors and true variables, into the system (1)–(2). This substitution gives the system of SW equations for the state vector containing the DA error. Then, following Parmuzin et al. (2006), we subtract from this system the system (1)–(2) with the true height and velocity fields. The remainder is the nonlinear evolution equation for the DA error, referred to as (5). Using this equation as a constraint for the minimization of (4), we should find the true DA error. Note that the first two equations in (5) are nonlinear in **η** and *ξ*; therefore, the optimization problem (4)–(5) might have several local minima. According to Parmuzin et al. (2006), an alternative way of computing the DA error is to minimize the functional (4) subject to a linearized version of the nonlinear error evolution equations. The solution of such a constrained optimization problem is the first-order approximation to the original error estimation problem in (4)–(5). Linearizing the system in (5), we get the system of equations referred to as (6).

We will refer to these equations as the error tangent linear model (ETLM). Note that the minimization of the cost functional (4) with the system in (6) taken as constraints is a linear variational data assimilation problem, and therefore has a unique minimizer. The methods for solving the problems in (4)–(6) are the same as for conventional 4D-Var problems.

We now describe how we solve (6). The DA error (**η**, *ξ*) as a solution of the ETLM in (6) can be computed numerically by a tangent linear model (TLM) created by AD from the original model: let (**η**_{0}, *ξ*_{0}) = (**η**|_{t=−T}, *ξ*|_{t=−T}) denote some initial DA error. The DA errors (**η**, *ξ*) = (**η**_{i}, *ξ*_{i}), *i* = −*T*, …, 0, are propagated over the simulation time as tangents of the computed state variables (**v**_{i}, *h*_{i}), if the initial DA error (**η**_{0}, *ξ*_{0}) is supplied to the TLM as the initial tangent of the initial condition in (2).
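The propagation of an initial error as a tangent of the state can be sketched as follows. The sketch assumes a toy nonlinear two-variable system integrated with explicit Euler steps and a hand-coded Jacobian in place of an AD-generated TLM; the system, the step size, and all names are hypothetical and only serve to show that the tangent follows the difference of two nonlinear runs for small errors.

```python
import numpy as np

DT = 0.01  # time step of the toy model (arbitrary units)

def rhs(x):
    """Right-hand side of a toy nonlinear two-variable system (u, h)."""
    u, h = x
    return np.array([-h - 0.1 * u * u, u - 0.1 * u * h])

def rhs_jacobian(x):
    """Jacobian of `rhs` evaluated at the state x."""
    u, h = x
    return np.array([[-0.2 * u, -1.0],
                     [1.0 - 0.1 * h, -0.1 * u]])

def run_with_tangent(x0, dx0, n_steps):
    """Advance the state and, alongside it, the tangent dx linearized
    around the evolving nonlinear trajectory."""
    x = np.array(x0, dtype=float)
    dx = np.array(dx0, dtype=float)
    for _ in range(n_steps):
        dx = dx + DT * rhs_jacobian(x) @ dx   # tangent step uses the current state
        x = x + DT * rhs(x)                   # nonlinear step
    return x, dx

x0 = np.array([1.0, 0.5])        # reference initial state
err0 = np.array([0.02, -0.01])   # initial error supplied as initial tangent

x_ref, err_tlm = run_with_tangent(x0, err0, 100)
x_pert, _ = run_with_tangent(x0 + err0, err0, 100)
err_nonlinear = x_pert - x_ref   # error evolved by the full nonlinear model
```

For small initial errors `err_tlm` and `err_nonlinear` agree to first order; the gap between them grows with the error amplitude and the length of the window, which is the limitation of the linearized error model examined in the experiments.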

Minimizing the cost in (4) with a gradient-based method requires us to compute the gradient of the cost functional *J* with respect to the initial DA errors (*η*_{0}, *ξ*_{0}). The adjoint model used to compute this gradient as well as the ETLM are generated from the original model source code by algorithmic differentiation.
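A standard consistency check for a tangent-linear/adjoint pair, whether hand-written or generated by AD, is the dot-product test 〈M δx, y〉 = 〈δx, Mᵀy〉. The following sketch assumes random matrices as stand-ins for the per-step tangent-linear operators; the names are hypothetical and not taken from the ICON code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-step tangent-linear operators (small random 4x4 matrices).
steps = [np.eye(4) + 0.1 * rng.standard_normal((4, 4)) for _ in range(5)]

def tangent(dx):
    """Apply the tangent-linear model: steps in forward order."""
    for m in steps:
        dx = m @ dx
    return dx

def adjoint(y):
    """Apply the adjoint model: transposed steps in reverse order."""
    for m in reversed(steps):
        y = m.T @ y
    return y

dx = rng.standard_normal(4)
y = rng.standard_normal(4)

lhs = float(tangent(dx) @ y)   # <M dx, y>
rhs = float(dx @ adjoint(y))   # <dx, M^T y>
```

The two inner products should agree to roundoff; a larger mismatch indicates an inconsistency between the tangent-linear and adjoint codes.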

The details on the numerical and technical realization of the problems in (4)–(6) are discussed in the appendix.

## 4. Results

In this section we study the behavior of VEEA in a series of twin experiments conducted with different types of flows and noise. We split it into four parts. In section 4a we discuss the actual test cases and the methodology of the twin experiments. In section 4b we start testing the performance of VEEA on simple experiments in which only the fluid thickness or only the velocity field is assimilated. More complex tests are considered in section 4c. Here we investigate the joint impact of errors by assimilating the noisy observations of fluid thickness and velocity field at the same time. In section 4d we repeat all previous experiments after relaxing the assumption of exact knowledge of model and observational errors. Thereby we test the ability of VEEA to produce reliable estimates when only approximate values of the model and observational errors are known. Note that all experiments were conducted for an aquaplanet without continents. For the sake of simplicity, we assume in all experiments that the positions of the observational sites coincide with the grid nodes. This assumption was made to focus on the impact that observational and model errors have on error estimation.

### a. The numerical shallow water model

A series of twin experiments was conducted with ICON-SW, a shallow-water model constructed on a triangular icosahedral grid with C-type staggering (Bonaventura and Ringler 2005). This model uses a hybrid finite-volume–finite-difference method with two-level semi-implicit time stepping. In all the experiments, we used a time step of 15 min, while the spatial resolution was set to approximately 1°.

### b. Test cases

We use the case of a nonlinear flow regime discussed in Williamson et al. (1992) and a linear one presented by Läuter et al. (2005) as test models for twin experiments.

The nonlinear case is represented by a zonal flow that impinges on an isolated conical mountain. In this test model the imbalance in the initial velocity and surface elevation caused by this mountain leads to the generation of waves at the free surface. These waves radiate from the generation area and propagate around the globe. Since the formation of a well-developed nonlinear flow takes several days, we set the state vector resulting after 15 days as the initial state vector for DA and VEEA. The distribution of the initial fluid thickness for this test case is given in the top panel of Fig. 1; the mountain is located at 30°N, 90°W.

In the second test model the shape of the aquaplanet is given by a smooth ellipsoid. The main axis of this ellipsoid coincides with the rotation axis. The initial fluid thickness is also represented by a prolate ellipsoid with its main axis inclined *π*/4 to the equator. The fluid motion is set in such a way that the fluid spins around the planet as a single solid body. The distribution of the initial fluid thickness for this test model is given in the bottom panel of Fig. 1.

For the sake of simplicity, we will refer to these test models as the first and the second cases respectively. The initial conditions for these cases are given below:

#### 1) Zonal flow impinging on an isolated mountain (first case)

Let *λ* and *θ* be the longitudinal and latitudinal directions, respectively. The horizontal velocity field **v** = (*u*, *υ*)^{T} is prescribed following Williamson et al. (1992), with *u*_{0} = 20 m s^{−1}. The shape of the mountain is given by the expression *H*_{s} = *H*_{s0}(1 − *r*_{m}/*R*), where *H*_{s0} = 2000 m and *R* = *π*/9; *a* and Ω are Earth’s radius and its rotation rate, respectively, and *h*_{0} = 5960 m. Note that these initial conditions are not the actual initial state vector that we use for DA and VEEA; the latter corresponds to the state vector obtained on the model’s 15th day.
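For reference, the standard specification of this test (case 5 of Williamson et al. 1992) can be sketched as below. The sketch assumes the published form of that test case: a zonal flow *u* = *u*_{0} cos *θ*, a free-surface height in geostrophic balance with it, and a conical mountain centered at 30°N, 90°W with *r*_{m} capped at *R*. The numerical values of *a*, Ω, and *g* are the ones commonly used with that test suite and are assumptions here, as are all function names.

```python
import numpy as np

A_EARTH = 6.37122e6   # Earth's radius a [m] (assumed value)
OMEGA = 7.292e-5      # rotation rate [s^-1] (assumed value)
GRAV = 9.80616        # gravity g [m s^-2] (assumed value)
U0 = 20.0             # u_0 [m s^-1]
H0 = 5960.0           # h_0 [m]
HS0 = 2000.0          # mountain height H_s0 [m]
R = np.pi / 9         # mountain radius [rad]
LAM_C, THETA_C = -np.pi / 2, np.pi / 6   # mountain center: 90W, 30N

def mountain_height(lam, theta):
    """Conical mountain H_s = H_s0 * (1 - r_m / R), with r_m capped at R."""
    dlam = (lam - LAM_C + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    r_m = np.sqrt(np.minimum(R**2, dlam**2 + (theta - THETA_C)**2))
    return HS0 * (1.0 - r_m / R)

def initial_flow(theta):
    """Zonal wind, meridional wind, and free-surface height at latitude theta."""
    u = U0 * np.cos(theta)
    v = np.zeros_like(u)
    surface = H0 - (A_EARTH * OMEGA * U0 + 0.5 * U0**2) * np.sin(theta)**2 / GRAV
    return u, v, surface

peak = mountain_height(LAM_C, THETA_C)
u_eq, v_eq, surf_eq = initial_flow(np.array(0.0))
```

The mountain vanishes outside the radius *R* around its center, so `mountain_height` returns zero over most of the sphere.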

#### 2) Unsteady solid-body rotation (the second case)

Let Φ_{h} and Φ_{b} be the geopotentials at the surface and the bottom, respectively. The initial conditions for this test bed are given in Läuter et al. (2005), with *α* = *π*/4, *u*_{0} = 2*πa*/12 m day^{−1}, and *k*_{1} = 133 681 m^{2} s^{−2}.

### c. Twin experiments

All experiments were conducted according to the following methodology. The ICON-SW was used to simulate a “true state,” which was then saved for further reference. Unperturbed initial conditions were used in this case. At the next stage different types of noise (e.g., white unbiased or white biased noise) were added to every point of the computed true fluid thickness and velocity field. Such a superposition represents noisy observational data. Then, for a simulation of the model error in the state estimates, a biased noise function was added to the forcing term in the corresponding discretized equation for height and velocity fields in ICON-SW. To simulate the background error, noise was added to the true initial conditions. At the next stage the previously simulated “noisy observations” and the wrongly estimated background state were assimilated in the ICON-SW model, which had the error in the forcing term. The true state set up earlier was subtracted from this data assimilation output, and the difference was saved as the “true DA error” for further reference. Finally, the error estimation algorithm was applied, and the results were compared against saved true DA errors. The experiments were conducted with different values of signal-to-noise ratio.

### d. Simulation of model, background, and observational errors

Here *i* = 1, 2, …, *N* is the index of a triangular cell of the ICON grid, where *N* is the total number of cells, and *k* = −*T*, −*T* + 1, …, 0 is the index of the current time step. The value *k* = −*T* corresponds to the zeroth time step.

The errors were created and stored once and used in each experiment without any modifications.

By default we set the signal-to-noise ratio (SNR) to 400 for fluid thickness observations and to 20 for velocity measurements. These values were chosen for two reasons. First, the magnitudes of the SNRs should be set in such a way that the linearized error model provides a good approximation of the full nonlinear error model in (5). To derive these values, we follow the common practice of estimating the impact of the nonlinear terms used in various oceanographic and atmospheric approximations (geostrophic approximation, SW approximation, etc.). We assume that the nonlinear terms in the nonlinear error model in (5) have negligible impact if the deviations in velocity and fluid thickness are at least 20 times smaller than the corresponding mean values. Second, the most visible difference between the true DA error and the estimated DA error is obtained when the SNR in fluid thickness is 20 times larger than the SNR in velocity. This situation is discussed in greater detail in experiment 5.
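How noise with a prescribed SNR might be generated can be sketched as follows. The sketch assumes that the SNR is the ratio of the root-mean-square of the signal to the standard deviation of the noise, which is one common convention and not necessarily the definition used in the experiments; the function name, the synthetic height field, and the optional constant bias are hypothetical.

```python
import numpy as np

def make_noise(signal, snr, bias=0.0, seed=0):
    """White noise whose standard deviation equals rms(signal) / snr,
    optionally shifted by a constant bias."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(signal.shape)
    white = (white - white.mean()) / white.std()    # zero mean, unit variance
    amplitude = np.sqrt(np.mean(signal**2)) / snr   # noise level from the SNR
    return amplitude * white + bias

# Synthetic "true" fluid thickness along a latitude circle (illustrative only).
h_true = 5960.0 + 50.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 360))

noise = make_noise(h_true, snr=400.0)          # unbiased noise
h_obs = h_true + noise                          # noisy "observations"
biased = make_noise(h_true, snr=400.0, bias=2.0)
```

Storing the generated noise once and reusing it in every experiment keeps the runs comparable.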

### e. Experiments where either fluid thickness or velocity measurements are assimilated

#### 1) Experiment 1: Biased observational, background, and model errors in fluid thickness

In this experiment we test the performance of VEEA under the joint impact of observational model and initial state errors, and investigate the effect of flow regimes on the quality of DA error estimation. Therefore, we compare the estimated DA errors obtained in two test runs under two different initial conditions, corresponding to the first and the second test cases.

These experiments revealed that the quality of the VEEA estimates is similar for both test cases. For this reason we confine ourselves to presenting the results obtained for the first test case. The corresponding estimated and true DA errors in fluid thickness and in specific kinetic energy (kinetic energy in one cell divided by the mass contained in that cell) along the equatorial slice are shown in Fig. 3. Comparing the estimated and true DA errors, we see that they almost coincide: note the zoomed rectangular fragment in the top panel of this figure. In support of this, the corresponding relative differences between these errors, shown in Fig. 4, are much smaller than one at each point along the equatorial line. Checking the quality of the VEEA estimates on a global scale, we found that the estimated DA error almost exactly mirrors the true error and that the correlation between them equals one for both test cases. Increasing the noise level or extending the assimilation window to 12 h does not affect the quality of the VEEA estimate as long as the SNR is much bigger than one. A visible difference between the VEEA estimates and the true DA error appears only if the SNR is smaller than 0.5. Doubling the assimilation window or reducing the number of observational sites by a factor of 4 does not affect the quality of the VEEA estimates. These results were also confirmed for the cases with unbiased observational, background, and model noise.
Thus, for the case when only fluid thickness observations are assimilated and the signal-to-noise ratio is much bigger than one, we conclude the following: first, VEEA produces reasonable estimates for all types of errors, irrespective of whether they occur in the assimilated observations or in the model equations; and second, although VEEA represents a linear approximation of the DA error, it produces reasonable estimates even for weakly nonlinear flow regimes [corresponding to an Ursell number (Ursell 1953) equal to 1].
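The two diagnostics used throughout the experiments, the correlation between true and estimated DA errors and their relative difference, can be computed along the following lines. The sketch assumes the Pearson correlation and a global relative *L*_{2} difference as summary measures, whereas Fig. 4 shows pointwise values; the synthetic error curves and all names are hypothetical.

```python
import numpy as np

def compare_errors(true_err, est_err):
    """Pearson correlation and relative L2 difference between the true
    and the estimated DA error fields."""
    corr = np.corrcoef(true_err.ravel(), est_err.ravel())[0, 1]
    rel = np.linalg.norm(est_err - true_err) / np.linalg.norm(true_err)
    return corr, rel

# Synthetic errors along an equatorial slice (illustrative values only).
lon = np.linspace(-np.pi, np.pi, 360, endpoint=False)
true_err = np.sin(lon)
est_err = 0.98 * true_err + 0.01 * np.cos(3.0 * lon)

corr, rel = compare_errors(true_err, est_err)
```

A correlation close to one together with a relative difference much smaller than one corresponds to the situation in which the two curves almost coincide.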

#### 2) Experiment 2: Biased observational errors in velocity field

In this experiment we consider the impact of observational errors in the velocity field on the performance of VEEA for both test cases. The default SNR in the “observed” velocity was used in these experiments. Note that DA is performed only for the velocity field, and therefore the initial state of the fluid thickness should remain unperturbed. However, the fluid thickness error equation [the second equation in (6)] contains a term with the velocity error. Thus, errors in velocity resulting from DA induce errors in fluid thickness in all subsequent computational steps. The estimated and true DA errors in the velocity field and fluid thickness obtained from the first test case are given in Fig. 5. The top panel in this figure shows the errors in the initial specific kinetic energy (at the zeroth time step). The middle and bottom panels show the corresponding induced errors in fluid thickness at the first computational time step after the initialization (later just the first time step). Comparing the true and estimated DA errors in specific kinetic energy (top panel), we see a visible difference in the longitude intervals [−70, −50] and [−20, 20]. The correlation between both errors still remains high and equals 0.99. The difference between true and estimated errors becomes stronger in the fluid thickness error graphs (see the middle panel and its zoomed part at the bottom of this figure), with a correlation of 0.97. Doubling the assimilation window increases the discrepancy between true and estimated DA errors in the fluid thickness and velocity fields, but the correlation for both graphs remains the same. A fourfold increase of the noise level in the observed velocity field does not much affect the quality of the VEEA estimates. A similar discrepancy appears between true and estimated DA errors if we add model or background error to the velocity field without considering observational error.
Repeating this experiment for the second test case does not reveal any substantial difference from the first test case. Hence, we conclude that although noise in the velocity field might result in visible differences between the VEEA estimates and the true DA error, the estimated DA error still remains close to the true one.

### f. Experiments in which fluid thickness and velocity measurements are simultaneously assimilated

In this set of experiments the flow over an isolated mountain was chosen as a typical test case.

#### 1) Experiment 3: Zero error in fluid thickness and observational error in velocity field

As we just saw in the previous experiment, an error in velocity induces an error in fluid thickness. In that case the VEEA estimates deviated from the truth. Nevertheless, in that experiment the fluid thickness was not assimilated, and therefore its initial value was unperturbed and equal to the truth. In this experiment we let the noise in the velocity observations affect the formation of the DA error in the initial state of the fluid thickness. We add the same noise to the velocity observations as in the previous experiment and set the error in the observations of fluid thickness to zero. Assimilating the truth into the fluid thickness and the noisy observations into the velocity field at the same time, we study the impact of the error in the velocity field on the correctness of the VEEA estimates. The results are given in Fig. 6. The top panel in this figure shows the initial true and estimated DA errors in specific kinetic energy. The middle panel shows the DA errors in the initial fluid thickness (at the zeroth time step), and the bottom panel gives the fluid thickness errors at the first time step. Comparing the true DA error and the VEEA estimates in fluid thickness, we see that the difference between them is larger than in the previous experiment. A slight disparity between the red and black curves appears along the whole equator, not only in some specific regions; the corresponding correlation equals 0.96. At the same time, the difference between the true and estimated DA errors in the initial specific kinetic energy does not increase much in comparison to the corresponding graph in experiment 2 (see Fig. 5, top panel). Both graphs are almost the same. Doubling the size of the assimilation window does not increase the discrepancy between the true DA errors and the VEEA estimates at the first time step relative to that obtained in experiment 2. Thus, the growth of the error in the VEEA estimates occurs as a result of the initial fluid thickness errors induced by the noisy observations.

Reducing the number of observational sites by a factor of 1000 slightly worsens the VEEA estimates. However, the correlation still remains high: 0.97 for the error in specific kinetic energy and 0.86 for the error in fluid thickness.

#### 2) Experiment 4: Zero error in velocity field and observational error in fluid thickness

We have demonstrated above that errors in the velocity field induce an error in the initial fluid thickness and result in VEEA misestimates. At the same time, we saw that if the velocity field is not assimilated, the impact of noisy fluid thickness on VEEA misestimates is negligible (experiment 1). Keeping these facts in mind, let us pose the following question: can any of the possible errors introduced in fluid thickness induce an error in the initial velocity field that would worsen the results of VEEA? Let us repeat the previous experiment, setting the noise in the velocity observations to zero and the SNR in the fluid thickness observations to its default value. In addition to noisy fluid thickness observations, we also introduce noise into the forcing and background state of the fluid thickness. The results of this experiment are presented in Fig. 7. As we can see here, the true and estimated DA errors in specific kinetic energy (top panel) and initial fluid thickness (bottom panel) almost coincide. Reducing the SNR to 0.5 did not produce any visible difference between these errors; the corresponding correlation was in the range of [0.99, 1]. Thus, we conclude that the impact of noise in fluid thickness on VEEA misestimates is at least 40 times weaker than the impact of the errors in velocity observations.

#### 3) Experiment 5: Nonzero observational background and forcing errors in velocity field and fluid thickness

In this experiment we investigate the joint impact of observational, background, and forcing errors in the velocity field and fluid thickness on the VEEA misestimates. The SNRs for both fields were set to the default values. The results of this experiment are given in Fig. 8. The errors in specific kinetic energy are given in the top panel. The errors in fluid thickness at the initial (zeroth) and the first time steps are given in the middle and bottom panels. Comparing this figure with the results of the experiments with observational errors in the velocity field (experiments 3–4), we see that the errors in fluid thickness do affect the structure of the DA error in kinetic energy. Nevertheless, the discrepancy between true and estimated DA errors remains at the same level as in experiment 3, where the measurement errors are present in the velocity field only. However, as we reduce the SNR in fluid thickness to the value of 20, the discrepancy between true and estimated error vanishes. Similar to experiment 1, a further decrease of the SNR in fluid thickness did not result in VEEA misestimates until the SNR fell below 0.5. At the same time, if we set the SNRs in velocity field and fluid thickness both equal to 400 (which is the default value of the SNR in fluid thickness), VEEA computes the DA error without misestimates. Thus, we conclude that if the requirements for the linearized approximation of the error estimation problem are satisfied, a lower SNR in velocity than in fluid thickness is the main reason for VEEA misestimates. The magnitude of these misestimates decreases as the SNR in velocity approaches the value of the SNR in fluid thickness, and vanishes when they are equal.

### g. Experiments with approximate values of model and observational errors

#### Experiment 6

In the previous experiments we assumed that the model and observational errors are precisely known, which is not the case in real scenarios. However, there are several routines based on 4D-Var that estimate approximate values of these errors (Tremolet 2006). Let us pose the following question: how reliable would the VEEA estimates be if only approximate values of the model and observational errors are available? In this subsection we test the quality of the VEEA estimates under such conditions, repeating experiments 1 to 5. We show here only the results obtained for experiment 5 performed for test case 1, because this particular experiment is representative of all the others.

In this test we simulate uncertainty in the model and observational errors by adding biased and unbiased white noise to them. We also investigate how the quality of the VEEA estimates depends on the amount of uncertainty relative to the model and observational errors. For simplicity we denote this ratio by the symbol *λ*. The results of this experiment are presented in Figs. 9 and 10. In Fig. 9 the VEEA error estimates are compared against the true error. Here the assimilation window equals 9 h, *λ* equals 0.1, and the ratio of the number of observational sites to the number of grid points is set to 0.1. A substantial difference between true and estimated DA errors is clearly seen in this figure, but in general VEEA gives satisfactory estimates. The corresponding correlation between true and estimated DA errors equals 0.75.

To complete the study we repeat this experiment with different ratios of observational sites to grid points, assimilation windows, and values of *λ*. These tests of VEEA revealed that halving the assimilation window increases the correlation between true and estimated errors by 0.1, while increasing the ratio of observational sites to grid points has no effect. Changing *λ* changes the correlation between true and estimated DA errors. Figure 10 displays how the correlation between estimated and true DA errors in fluid thickness depends on *λ*. The comparison of these results reveals that VEEA gives at least satisfactory estimates if *λ* is not bigger than 12%.

## 5. Stability tests of DA system by means of VEEA

One of the questions that frequently arises in the development and usage of 4D-Var is how stable the DA system is against model, observational, and background errors, the number and positions of observational sites, etc. A straightforward way to investigate this issue is to consider these errors, positions, etc. as controls of the DA system, and to compute the sensitivity of the DA estimates with respect to them. The most common method of computing it is to run (at least one) twin experiment with perturbed controls. The difference between the two DA estimates obtained in this experiment, divided by the value of the perturbation, gives the approximate value of the desired sensitivity. There is, however, an alternative way of studying such a stability issue. Let us reformulate it in terms of the DA error sensitivity problem: how sensitive is the growth of the DA error to changes of the controls? To solve it, we first compute the DA error by means of VEEA and then differentiate it with respect to the controls. The result of the differentiation is the desired sensitivity. At first glance, both methods give the same result. But note one peculiarity that makes the stability study made with the aid of VEEA more attractive than the same study done by means of the twin experiment. In the case of VEEA, one deals with exact error equations, directly derived by an AD tool after applying it to the original DA problem. In contrast to the approximate sensitivity obtained in the first method, the sensitivity estimated from these equations is exact, while the computational expenses of both methods are approximately the same. Moreover, if the chosen AD tool allows us to generate differentiated models of higher order, then a differentiated model that computes sensitivities of higher order with respect to the controls can be created. The estimation of the higher-order sensitivities with the first method results in a progressive growth in the number of twin experiments, computational expenses, and approximation error. In the case of VEEA, each new differentiation increases the computational cost approximately by the cost of a single forward run of the model, while sensitivities of any order always remain exact.
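The difference between an approximate twin-experiment sensitivity and a derivative obtained by differentiating the computation itself can be illustrated on a toy problem. The following sketch is not the VEEA and not a DA system: `estimate` is a hypothetical scalar function standing in for a DA estimate that depends on a single control `c`, and `estimate_tangent` is a hand-written tangent version of the same arithmetic, of the kind an AD tool would generate.

```python
import math

def estimate(c):
    """Hypothetical stand-in for a DA estimate that depends on one control c."""
    return math.sin(c) * math.exp(0.5 * c)

def estimate_tangent(c, dc):
    """Same computation with a tangent propagated statement by statement."""
    s, ds = math.sin(c), math.cos(c) * dc
    e, de = math.exp(0.5 * c), 0.5 * math.exp(0.5 * c) * dc
    return s * e, ds * e + s * de

def twin_sensitivity(c, eps):
    """Twin experiment: perturb the control, divide the difference by eps."""
    return (estimate(c + eps) - estimate(c)) / eps

c0 = 1.2
_, exact = estimate_tangent(c0, 1.0)
for eps in (1e-1, 1e-3, 1e-5):
    approx = twin_sensitivity(c0, eps)
    print(f"eps={eps:g}  twin={approx:.8f}  tangent={exact:.8f}  "
          f"diff={abs(approx - exact):.2e}")
```

In this toy setting the twin-experiment value depends on the size of the perturbation, whereas the tangent value does not involve a perturbation at all.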

## 6. Summary

The variational error estimation algorithm (VEEA) developed by Parmuzin et al. (2006) for 4D-Var is applied here to a spherical shallow-water system. A shallow-water version of the ICON model was used for applying the VEEA to classical atmospheric test cases. The problem was to estimate the errors appearing in the initial state after application of the 4D-Var algorithm. According to Parmuzin et al. (2006), one of the key elements of the VEEA procedure is to obtain tangent linear and adjoint models. This task was solved using algorithmic differentiation. The AD-enabled NAG FORTRAN compiler builds the tangent linear and adjoint models nearly automatically, which significantly simplifies the application of the VEEA.

The VEEA performance was tested in a series of twin experiments. Several sets of observational, model, and background errors were used to test the sensitivity of the method to biased or unbiased white noise input error with different signal/error ratios and different assimilation windows. The dependence of the method on the type of flow was tested as well. In these experiments, the data assimilation error in the initial state and the forecast estimated by VEEA were compared against the true DA error. The VEEA reveals an excellent performance for all types of considered noise in a wide range of assimilation windows and flow regimes, provided that the model and observational errors are known and that the signal-to-noise ratio is sufficiently larger than one. The correlation between true and estimated errors in such cases is close to one. The visible difference between estimated and true DA errors in the initial state appears if the values of model and observational errors contain some uncertainty. The value of this difference is linearly dependent on the uncertainty level.

The difference between true and estimated errors also appears if the level of noise in observational velocity, initial state velocity, or forcing is much larger than the corresponding noise level in fluid thickness. This difference increases if the number of observational sites is several orders of magnitude smaller than the number of grid points. However, even in this case it still remains small.

Our results of applying the VEEA to a shallow-water model encourage further investigation of the method in the dynamically much more challenging case of a three-dimensional model.

The authors would like to express their gratitude to the Integrated Climate System Analysis and Prediction (CliSAP) cluster, which funded this research. We would also like to thank Jochem Marotzke, Florian Rauser, Philipp Griewank, Dmitriy Sein, and Johannes Lotz for helpful comments on the manuscript.

# APPENDIX

### a. Algorithmic differentiation and the AD-enabled NAG FORTRAN compiler

In this section we discuss very briefly algorithmic differentiation (AD) in general, and the AD-enabled NAG FORTRAN compiler in particular.

Detailed information on AD can be found in Naumann (2012) and Griewank and Walther (2008). For the AD compiler, refer to Riehme et al. (2007), Naumann and Riehme (2005), Cohen et al. (2003), and Rauser et al. (2010) in particular. (A draft version of the user manual can be found online at https://www.stce.rwth-aachen.de/files/nagfor_ad/userguide_v0.2.pdf.)

#### 1) Algorithmic Differentiation at a glance

Let *F* be a mathematical function *F*: ℝ^{l+n} → ℝ^{m}: **y** = *F*(**p**, **x**) computing an output vector **y** ∈ ℝ^{m} from inputs **x** ∈ ℝ^{n} and input parameters^{A1} **p** ∈ ℝ^{l}. Differentiating *F* with respect to **x** yields the Jacobian matrix **∇**_{x}*F* ∈ ℝ^{m×n} of *F*:

$$
\nabla_{\mathbf{x}} F = \left( \frac{\partial y_j}{\partial x_i} \right)_{j = 1, \dots, m;\; i = 1, \dots, n}.
$$

The rows of **∇**_{x}*F* are formed by partial derivatives of the outputs *y*_{j}, *j* = 1, …, *m*, with respect to the inputs *x*_{i}, *i* = 1, …, *n*.

Here AD is applied to an implementation of the function *F* in some higher-level programming language (not to the mathematical function itself). Running the compiled program with given (that is, fixed) values for the inputs **x** and **p** (the so-called evaluation point) will compute values **y** = *F*(**p**, **x**) at the evaluation point. AD allows one to compute first and higher derivatives of **y** at the given evaluation point (**p**, **x**).

The basic principle of AD is to look at the program as a sequence of single assignment statements containing only one elemental operation (addition, multiplication, sine, cosine, …) per statement. Differentiated models are created by applying the chain rule to every element of this *code list*. This augments the original code in such a way that derivatives of outputs **y** with respect to inputs **x** can be computed.^{A2} Looking at first-order differentiation only, two main types of differentiated models exist.
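The code-list view can be made concrete with a small hand-written example. The sketch below is an illustration only, written in Python rather than FORTRAN and not produced by any AD tool; the toy function and all names are assumptions. It writes *y* = sin(*x*_{1}*x*_{2}) + *x*_{1} as single elemental assignments and then pairs each assignment with its chain-rule derivative.

```python
import math

def f_code_list(x1, x2):
    """y = sin(x1 * x2) + x1 as a list of single elemental assignments."""
    v1 = x1 * x2
    v2 = math.sin(v1)
    y = v2 + x1
    return y

def f_code_list_tangent(x1, x2, dx1, dx2):
    """Every elemental assignment is followed by its differentiated counterpart."""
    v1 = x1 * x2
    dv1 = dx1 * x2 + x1 * dx2      # product rule
    v2 = math.sin(v1)
    dv2 = math.cos(v1) * dv1       # derivative of sine
    y = v2 + x1
    dy = dv2 + dx1                 # derivative of a sum
    return y, dy

# Directional derivative in direction (1, 0), i.e. dy/dx1, at (0.5, 2.0)
y, dy_dx1 = f_code_list_tangent(0.5, 2.0, 1.0, 0.0)
```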

A tangent-linear model (TLM) *F*^{(1)}(**p**, **x**, **x**^{(1)}) of *F* computes sensitivities **y**^{(1)} as the product **y**^{(1)} = **∇**_{x}*F* ⋅ **x**^{(1)} of the Jacobian **∇**_{x}*F* = **∇**_{x}*F*(**p**, **x**) of *F* with a vector **x**^{(1)} ∈ ℝ^{n} of tangents for the inputs **x**. Usually the code of the TLM is blended with the function evaluation code. Thus, outputs **y** = *F*(**p**, **x**) will be computed by the evaluation of the TLM, too.

Note that only Jacobian projections can be computed. If the complete Jacobian is required, a sequence of TLM evaluations with **x**^{(1)} ranging over the *n* Cartesian basis vectors has to be performed. If *k* > 1 input tangent vectors need to be propagated simultaneously, the so-called vector mode of AD, in which the tangents of elemental operations are *k*-element vectors, can increase the performance.
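As a minimal sketch of this procedure, the following Python code assembles a full Jacobian from *n* tangent-linear runs. The toy function *F*(*p*, **x**) = (*p* *x*_{0}*x*_{1}, sin *x*_{0} + *x*_{1}^{2}), its hand-written TLM `tlm`, and the helper `jacobian_by_tlm` are assumptions made for illustration; they are not part of ICON-SW or of the AD compiler.

```python
import math

def tlm(p, x, xt):
    """Hand-written TLM of the toy function F(p, x) = (p*x0*x1, sin(x0) + x1**2).

    Returns the outputs y and the Jacobian projection yt = (dF/dx) . xt.
    The parameter p is passive: no tangent is propagated for it.
    """
    y = [p * x[0] * x[1], math.sin(x[0]) + x[1] ** 2]
    yt = [p * (xt[0] * x[1] + x[0] * xt[1]),
          math.cos(x[0]) * xt[0] + 2.0 * x[1] * xt[1]]
    return y, yt

def jacobian_by_tlm(p, x):
    """Assemble the m x n Jacobian from n TLM runs, one per Cartesian basis vector."""
    n = len(x)
    columns = []
    for i in range(n):
        e_i = [1.0 if k == i else 0.0 for k in range(n)]
        _, column = tlm(p, x, e_i)       # column i holds dy_j/dx_i for all j
        columns.append(column)
    m = len(columns[0])
    return [[columns[i][j] for i in range(n)] for j in range(m)]

jac = jacobian_by_tlm(2.0, [0.3, 1.5])
```

Each TLM run delivers one column of the Jacobian, so the cost of the full matrix grows with the number of inputs *n*.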

An adjoint model (AM) *F*_{(1)}(**p**, **x**, **y**_{(1)}) of *F* computes adjoints **x**_{(1)} of inputs **x** as the product **x**_{(1)} = **∇**_{x}*F*(**p**, **x**)^{T} ⋅ **y**_{(1)} of the transposed Jacobian **∇**_{x}*F*(**p**, **x**)^{T} of *F* at the given evaluation point with adjoints **y**_{(1)} of the outputs **y**.

An AM consists of two parts: in the *forward sweep* of an AM the augmented model code stores all relevant information for the *reverse sweep*. Function values **y** = *F*(**p**, **x**) might be available at the end of the forward sweep. In the following *reverse sweep* every single assignment statement has an adjoint counterpart, but the order of adjoint statements is the other way around: Adjoints of outputs will be propagated backward through the computation of the original model to adjoints of inputs. Note that in case of a scalar valued function *F* the complete gradient of *F* can be evaluated in one adjoint model run. Multiple adjoints might be computed in an adjoint vector mode.
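The two sweeps can be sketched by hand for a scalar-valued toy function. The code below is an assumption-laden illustration in Python, not output of the AD compiler: it differentiates *y* = sin(*x*_{0}*x*_{1}) + *x*_{0}^{2}, stores the intermediates in the forward sweep, and then runs the adjoint statements in reverse order, so that the complete gradient is obtained in one adjoint run.

```python
import math

def adjoint_model(x, y_bar=1.0):
    """Hand-written adjoint of the toy function y = sin(x0 * x1) + x0**2."""
    # Forward sweep: evaluate the function and keep the intermediates.
    v1 = x[0] * x[1]
    v2 = math.sin(v1)
    v3 = x[0] ** 2
    y = v2 + v3
    # Reverse sweep: one adjoint statement per assignment, in reverse order.
    v2_bar = y_bar                        # from y  = v2 + v3
    v3_bar = y_bar
    x0_bar = 2.0 * x[0] * v3_bar          # from v3 = x0**2
    v1_bar = math.cos(v1) * v2_bar        # from v2 = sin(v1)
    x0_bar += x[1] * v1_bar               # from v1 = x0 * x1
    x1_bar = x[0] * v1_bar
    return y, [x0_bar, x1_bar]

y, grad = adjoint_model([0.5, 2.0])
```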

Differentiated models of higher order can be obtained by combinations of both basic types (Naumann 2012; Griewank and Walther 2008).

#### 2) The AD-enabled NAG FORTRAN compiler

The AD-enabled NAG FORTRAN compiler (Riehme et al. 2007; Naumann and Riehme 2005; Cohen et al. 2003) (hereafter referred to as “the compiler”) is a joint effort of NAG (Oxford, United Kingdom), and STCE (LuFG 12, RWTH Aachen, Germany). The compiler successfully generated an adjoint model (Rauser et al. 2010) of the shallow-water version (hereafter ICON-SW) of the ICON model (Bonaventura and Ringler 2005), which was chosen as a discrete representation of the system of model equation (1) in this paper, too.

The compiler generates differentiated models by a hybrid approach that combines both source transformation (compile time) and overloading (run time) techniques, resulting in a very robust AD tool. The overloading part comes as a set of run-time support libraries, based on the tool dco/FORTRAN (derivative code by overloading for FORTRAN), for different modes of differentiation. Each provides an active data type, overloaded versions of operators and intrinsic functions for the active type, and the user interface to access derivatives.

An adjoint model based on overloading requires recording all elemental operations performed during a model evaluation (forward sweep). Adjoints are propagated through the recording (the so-called *tape*) in reverse, from outputs to inputs (reverse sweep). In Rauser et al. (2010) we discuss the tape structure used to compute adjoints for ICON-SW in more detail, using a first-order adjoint model of a simple example program.^{A3}
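The idea of recording a tape by overloading and interpreting it backward can be sketched in a few lines. The Python code below is a deliberately minimal illustration; the classes `Tape` and `Active` and the function `reverse_sweep` are hypothetical and do not reproduce the tape structure or the interface of dco/FORTRAN.

```python
import math

class Tape:
    """Recording of elemental operations: (output index, [(input index, partial)])."""
    def __init__(self):
        self.entries = []

class Active:
    """Active type: a value together with its position on the tape."""
    def __init__(self, tape, value, parents=()):
        self.tape, self.value = tape, value
        self.index = len(tape.entries)
        tape.entries.append((self.index, list(parents)))

    def __add__(self, other):
        return Active(self.tape, self.value + other.value,
                      [(self.index, 1.0), (other.index, 1.0)])

    def __mul__(self, other):
        return Active(self.tape, self.value * other.value,
                      [(self.index, other.value), (other.index, self.value)])

def sin(a):
    """Overloaded intrinsic for the active type."""
    return Active(a.tape, math.sin(a.value), [(a.index, math.cos(a.value))])

def reverse_sweep(tape, output, seed=1.0):
    """Interpret the tape in reverse order, from the output to the inputs."""
    adjoints = [0.0] * len(tape.entries)
    adjoints[output.index] = seed
    for index, parents in reversed(tape.entries):
        for parent_index, partial in parents:
            adjoints[parent_index] += partial * adjoints[index]
    return adjoints

tape = Tape()
x0, x1 = Active(tape, 0.5), Active(tape, 2.0)   # registered inputs
y = sin(x0 * x1) + x0 * x0                      # forward sweep records the tape
adjoints = reverse_sweep(tape, y)
grad = [adjoints[x0.index], adjoints[x1.index]]
```

The memory needed by such a tape grows with the number of recorded elemental operations, which is why tape handling matters for a time-stepping model.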

For an implementation of the EEP discussed in this paper, an adjoint model of ICON-SW is required that also allows tangents of ICON-SW to be computed in the forward sweep. The tangents are the propagated DA errors used in the cost functional of the EEP. This combined first-order tangent linear and adjoint model of ICON-SW was created by replacing the first-order adjoint run-time library of the AD-enabled NAG FORTRAN compiler with the first-order tangent linear and adjoint library: all we had to do was to replace the compiler option -AD_DCO_A1S by -AD_DCO_T1S_A1S in the makefiles and recompile.

### b. Realization of the error estimation algorithm with the aid of the AD-enabled NAG FORTRAN compiler

The error optimization problem in (4) and (6) will be solved by a simple gradient descent method. The required gradient is computed by a differentiated model built by the AD-enabled NAG FORTRAN compiler introduced in the last section.

For the sake of simplicity we discuss here the DA error estimation for fluid thickness with errors *ξ* only. Other cases with both DA errors have a more complicated structure, but the technique of their construction is similar to this simplified case.

Since we skip the DA error **η** of the velocity field **v** in the following section, no differentiation with respect to **v** is necessary. Thus, **v** will be associated with the passive inputs **p** in (A2).

#### 1) Discrete model: ICON-SW

As a discrete representation of the system of model equation (1) the shallow-water version (hereafter ICON-SW) of the ICON model (Bonaventura and Ringler 2005) was chosen.

Let (**v**_{j}, *h*_{j}), *j* = 0, …, *K*, denote the discrete state vector at time step *j* corresponding to time *t*_{j} = (*j*/*K* − 1) × *T*, *t* ∈ [−*T*, 0]. The discrete initial condition for time *t* = −*T* (i.e., *j* = 0) reads as (**v**_{0}, *h*_{0}) = (**v**|_{t=−T}, *h*|_{t=−T}) from the initial condition in (2). Let a single iteration of ICON-SW map (**v**_{j−1}, *h*_{j−1}) to (**v**_{j}, *h*_{j}), *j* = 1, …, *K*. The state at time step *j* ∈ [1, …, *K*] is then given by applying this iteration *j* times to the initial state (**v**_{0}, *h*_{0}).

We assume that the space discretization is done by a grid of *N* cells. Thus, for every time step *j*, the *j*-fold iteration of ICON-SW, which maps (**v**_{0}, *h*_{0}) to (**v**_{j}, *h*_{j}), is a mapping ℝ^{N+N} → ℝ^{N+N}. Since we investigate the DA errors *ξ* of the fluid thickness only, we will look at the differentiation of this mapping with respect to *h*_{0} only. Therefore, the Jacobian of *h*_{j} with respect to *h*_{0} is an *N* × *N* matrix.

#### 2) Construction of the ETLM and the cost functional

The DA errors *ξ*_{j} = *ξ*_{j}(**v**_{0}, *h*_{0}, *ξ*_{0}) can be computed as tangents (directional derivatives) of the original problem for all time steps *j* = 1, …, *K*. Equation (A4) shows the assumed equivalence of the DA errors *ξ*_{j} of *h*_{j} with the directional derivative of ICON-SW (i.e., of its iteration applied *j* times) in the direction of the initial DA errors *ξ*_{0}. The directional derivative can be written as the product of the Jacobian of *h*_{j} with respect to *h*_{0} and the initial errors *ξ*_{0} [(A5)]. But this product is exactly what a tangent linear model of ICON-SW will compute after *j* evaluations, if the initial DA errors are provided as input tangents for the height field *h* [in terms of (A6): (**p**, **x**, **x**^{(1)}) = (**v**_{0}, *h*_{0}, *ξ*_{0})]. Thus, the desired DA errors *ξ*_{j} for *j* = 1, …, *K* can be computed simply by repeated evaluations of the tangent linear model of ICON-SW with inputs (**v**_{0}, *h*_{0}, *ξ*_{0}).
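The repeated evaluation of a tangent linear time step can be sketched on a toy model. The Python code below does not implement ICON-SW: the one-dimensional periodic update in `step_tlm`, the constant `C`, and all numerical values are assumptions chosen only to show how initial errors *ξ*_{0}, supplied as input tangents of *h*_{0}, are carried along with the state through *K* time steps while **v** stays passive.

```python
C = 0.1  # coupling constant of the toy model (assumption)

def step_tlm(v, h, xi):
    """One toy time step with tangents: returns (h_next, xi_next).

    The velocity v is passive; xi is the tangent (DA error) of h.
    """
    h_next, xi_next = [], []
    for i in range(len(h)):
        dh = h[i] - h[i - 1]           # periodic grid: index -1 is the last cell
        dxi = xi[i] - xi[i - 1]
        h_next.append(h[i] - C * v[i] * h[i] * dh)
        xi_next.append(xi[i] - C * v[i] * (xi[i] * dh + h[i] * dxi))
    return h_next, xi_next

def propagate_errors(v, h0, xi0, K):
    """Repeated TLM evaluations give the errors xi_j for j = 0, ..., K."""
    h, xi = list(h0), list(xi0)
    errors = [list(xi)]
    for _ in range(K):
        h, xi = step_tlm(v, h, xi)
        errors.append(list(xi))
    return h, errors

v = [1.0, 0.5, -0.5, 1.0]
h0 = [1.0, 1.2, 0.9, 1.1]
xi0 = [0.01, -0.02, 0.0, 0.03]
hK, errors = propagate_errors(v, h0, xi0, K=5)
```

A quick plausibility check for such a tangent code is to compare `errors[K]` with the difference of two model runs started from `h0` and from `h0` plus a small multiple of `xi0`.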

In Fig. A1 we show the first part of the driver program that solves the EEP for ICON-SW with a simple steepest descent method. We use a pseudoprogramming language based on FORTRAN, but enhanced with mathematical symbols for variables that correspond to the symbols used in the formulas in this paper. Discretized variables **v**, *h*, *ξ*, and *ξ*^{obs} are stored as matrices, where the first index is used for the time and the second index for the space discretization. Hence, *ξ*(0,:) denotes the vector *ξ*_{0} of initial errors, and *h*(*j*,:) denotes the height field *h*_{j} of time step *j*. Moreover, we assume in the pseudocode that unspecified numbers *K* (time steps) and *N* (grid cells) can be used for the declarations of variables. In the real code most variables are allocatable arrays; thus both *K* and *N* will be specified by data from input files at run time.

In the presented pseudocode no total separation between the ETLM and the adjoint part is possible: our adjoint approach is based on overloading; thus, we first need to record the computation that has to be adjoined on a so-called tape. This can be done together with the propagation of tangents for the ETLM. Therefore, we have a mixture of ETLM and tape-handling code in Fig. A1. All routines named with DCO_T1S_A1S are interface routines provided by the run-time support library for mode T1S_A1S of the AD compiler [details can be found in the user documentation of dco/FORTRAN (https://www.stce.rwth-aachen.de/files/nagfor_ad/userguide_v0.2.pdf)].

In line 2 of Fig. A1 the module DCO_T1S_A1S is included. It provides the active data type DCO_T1S_A1S_TYPE, which allows tangents to be propagated and the computation to be recorded on the tape for a later adjoint propagation. In lines 4–5 we define the active variables: state variables **v** and *h*, errors *ξ* in *h*, and the cost functional value *J*. Line 6 declares the observations *ξ*^{obs} and line 7 declares the update Δ_{ξ} for *ξ*_{0}.

Data from input files (call of INIT in line 13) do not have to be differentiated; thus we first turn off the tape machinery in line 11 (this is also the first line of the forward sweep of the adjoint code). After the initialization the tape recording is started by creating the required tape structures (line 15). The adjoint forward sweep code section is finished by registering the input variables (here *h* only) for which we want to compute adjoints (line 17).

The code of the ETLM starts in line 18. We set the tangent of *h*_{0} to the initial errors *ξ*_{0} by calling the interface routine DCO_T1S_A1S_SET (here for active variables of one dimension). The first argument specifies the variable we want to set values for, the second specifies the values to be used, and the third specifies the kind of data to be set: “1” stands for first-order tangent, “−1” would be the first-order adjoint, and “0” the value of the variable in turn. In the loop of lines 20–22 the differentiated code of the ICON-SW time step is called, where the first two arguments denote inputs (**v**_{j−1}, *h*_{j−1}), and the last two are outputs (**v**_{j}, *h*_{j}).

The call in line 24 finishes the tape recording for the adjoint computation. Why we do not tape the cost functional becomes clear in the next paragraph about the adjoint model.

The tangents of *h*, that is, the DA errors *ξ*, are retrieved in line 26 by the counterpart of the set routine, now for two-dimensional fields. Thus, we can compute the discretized cost functional in line 28 in Fig. A1.

#### 3) Construction of Adjoint model for ETLM

A gradient-based method to optimize the cost functional *J*(*ξ*, *ξ*^{obs}) from (A7) with respect to the initial DA error *ξ*_{0} requires computing the gradient of *J* = *J*(*ξ*, *ξ*^{obs}) with respect to *ξ*_{0}.

The DA errors *ξ* = *ξ*(*ξ*_{0}) as inputs of the cost functional *J* were computed as tangents of ICON-SW by a tangent linear model. To compute the gradient, the dependence of *ξ* on the initial DA errors *ξ*_{0} has to be taken into account. With a naive application of the AD compiler to ICON-SW and the code of the cost functional *J*, a second-order adjoint model seems necessary. But the second-order differentiation can be avoided: a combination of an adjoint model of ICON-SW generated by the compiler with a handwritten derivative code of the cost functional computes the desired gradient more efficiently. For that we differentiate *J*(*ξ*, *ξ*^{obs}) with respect to the initial DA error *ξ*_{0} by hand [(A9)]. The DA error at time step *j* is, according to (A4), the product of the Jacobian of *j* evaluations of ICON-SW with the vector *ξ*_{0} of initial DA errors. For fixed *j* and *i*, the derivative of *ξ*_{ji} with respect to *ξ*_{0} is therefore the *i*th row^{A4} of this Jacobian, written as a column vector. Thus, we can write the entire gradient of *J* in terms of the columns *i* = 1, …, *N* of the transposed Jacobian [(A11)]. With scalars *α*_{ji} [the factors of the derivatives of *ξ*_{ji} in Eq. (A9)], the inner sum of (A11) can be expanded to the product of the transposed Jacobian with a vector *α*_{j} ∈ ℝ^{N}. But this projection of the transposed Jacobian can be computed efficiently with a first-order adjoint model of ICON-SW, if *α*_{j} is used as initial adjoint *h*_{(1),j} of the output *h*_{j}.

The outer sum over the transposed Jacobian projections for time steps *j* = 0, …, *K* in (A11) will be computed by a single reverse propagation step over all time steps. For this purpose the initial adjoints of the complete height field *h* are set in line 31 of Fig. A2 to the scalars *α*_{ji}, *j* = 0, …, *K*, *i* = 1, …, *N*, which we have from the differentiation by hand in (A9). The third parameter of DCO_T1S_A1S_SET is now −1, indicating that the first-order adjoints of *h* have to be set. In line 33 the adjoints are propagated backward through the entire tape, that is, through time steps *K*, *K* − 1, …, 1 of the recorded ICON-SW evaluation in reverse order. After the interpretation of the recorded tape, the desired gradient Δ_{ξ} for *ξ*_{0} is obtained from the adjoint part of *h*_{0} (line 35). The steepest descent step is done in line 37. By the call in line 39 the allocated tape is prepared to record the next iteration of the VEEA. The test in line 41 decides whether a next iteration is required. If yes, then the execution jumps to line 17. Otherwise, the program terminates after releasing the memory of the tape (line 43).
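The structure of this gradient computation can be sketched on a toy model. The Python code below is an illustration under explicit assumptions, not the driver program of Figs. A1 and A2: it uses a hypothetical one-dimensional periodic time step instead of ICON-SW, hand-written tangent and adjoint routines instead of compiler-generated ones, a plain unweighted least-squares cost functional (the cost functional of the paper may differ), noise-free synthetic observations, and a fixed step size of 0.05.

```python
C = 0.1  # coupling constant of the toy model (assumption)

def step(v, h):
    """One time step of the toy model on a periodic grid."""
    return [h[i] - C * v[i] * h[i] * (h[i] - h[i - 1]) for i in range(len(h))]

def step_tangent(v, h, xi):
    """Jacobian of step with respect to h, applied to xi."""
    return [xi[i] - C * v[i] * (xi[i] * (h[i] - h[i - 1]) + h[i] * (xi[i] - xi[i - 1]))
            for i in range(len(h))]

def step_adjoint(v, h, lam):
    """Transposed Jacobian of step with respect to h, applied to lam."""
    out = [0.0] * len(h)
    for i in range(len(h)):
        out[i] += (1.0 - C * v[i] * (2.0 * h[i] - h[i - 1])) * lam[i]
        out[i - 1] += C * v[i] * h[i] * lam[i]
    return out

def forward(v, h0, xi0, K):
    """Forward sweep: store the states h_j and propagate the errors xi_j."""
    hs, xis = [list(h0)], [list(xi0)]
    for _ in range(K):
        xis.append(step_tangent(v, hs[-1], xis[-1]))
        hs.append(step(v, hs[-1]))
    return hs, xis

def cost_and_gradient(v, h0, xi0, xi_obs):
    """Least-squares cost and its gradient with respect to xi0."""
    K = len(xi_obs) - 1
    hs, xis = forward(v, h0, xi0, K)
    # alpha_j: derivative of the cost with respect to xi_j, done by hand
    alpha = [[a - b for a, b in zip(xis[j], xi_obs[j])] for j in range(K + 1)]
    cost = 0.5 * sum(a * a for row in alpha for a in row)
    lam = alpha[K]
    for j in range(K, 0, -1):      # one reverse sweep over time steps K, ..., 1
        lam = [l + a for l, a in zip(step_adjoint(v, hs[j - 1], lam), alpha[j - 1])]
    return cost, lam

v = [1.0, 0.5, -0.5, 1.0]
h0 = [1.0, 1.2, 0.9, 1.1]
xi_true = [0.01, -0.02, 0.0, 0.03]
_, xi_obs = forward(v, h0, xi_true, 5)    # synthetic, noise-free observations

xi0 = [0.0] * 4
for _ in range(500):                      # steepest descent with fixed step size
    cost, grad = cost_and_gradient(v, h0, xi0, xi_obs)
    xi0 = [x - 0.05 * g for x, g in zip(xi0, grad)]
```

In this sketch a single reverse sweep accumulates the contributions of all time steps, so each descent iteration needs one forward and one reverse pass regardless of the number of grid cells.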

## REFERENCES

Belanger, E., and A. Vincent, 2005: Data assimilation (4D-Var) to forecast flood in shallow-water with sediment erosion. *J. Hydrol.*, **300**, 114–125, doi:10.1016/j.jhydrol.2004.06.009.

Bonaventura, L., and T. Ringler, 2005: Analysis of discrete shallow-water models on geodesic Delaunay grids with c-type staggering. *Mon. Wea. Rev.*, **133**, 2351–2373, doi:10.1175/MWR2986.1.

Cohen, M., U. Naumann, and J. Riehme, 2003: Toward differentiation-enabled Fortran 95 compiler technology. *Proc. 2003 ACM Symp. on Applied Computing*, Melbourne, FL, ACM, 143–147.

Dee, D. P., 2005: Bias and data assimilation. *Quart. J. Roy. Meteor. Soc.*, **131**, 3323–3343, doi:10.1256/qj.05.137.

Gejadze, Y., F.-X. LeDimet, and V. Shutyaev, 2008: On analysis error covariances in variational data assimilation. *SIAM J. Sci. Comput.*, **30**, 1847–1874, doi:10.1137/07068744X.

Gejadze, Y., G. Copeland, F.-X. L. Dimet, and V. Shutyaev, 2011: Computation of the analysis error covariance in variational data assimilation problems with nonlinear dynamics. *J. Comput. Phys.*, **230**, 7923–7943, doi:10.1016/j.jcp.2011.03.039.

Gillijns, S., and B. D. Moor, 2007: Model error estimation in ensemble data assimilation. *Nonlinear Processes Geophys.*, **14**, 59–71, doi:10.5194/npg-14-59-2007.

Griewank, A., and A. Walther, 2008: *Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation.* 2nd ed. Society for Industrial and Applied Mathematics, 460 pp.

Griffith, A., and N. K. Nichols, 2000: Adjoint methods in data assimilation for estimating model error. *Flow Turbul. Combust.*, **65**, 469–488, doi:10.1023/A:1011454109203.

Gunzburger, M., 2003: *Perspectives in Flow Control and Optimization.* Society for Industrial and Applied Mathematics, 261 pp.

Lahoz, W., B. Khattatov, and R. Ménard, 2010: Data assimilation and information. *Data Assimilation*, W. Lahoz et al., Eds., Springer, 3–13.

Läuter, M., D. Handorf, and K. Dethloff, 2005: Unsteady analytical solutions of the spherical shallow water equations. *J. Comput. Phys.*, **210**, 535–553, doi:10.1016/j.jcp.2005.04.022.

Lermusiaux, P., and M. A. Robinson, 1999: Data assimilation via error subspace statistical estimation. Part I: Theory and schemes. *Mon. Wea. Rev.*, **127**, 1385–1398, doi:10.1175/1520-0493(1999)127<1385:DAVESS>2.0.CO;2.

Ménard, R., 2010: Bias estimation. *Data Assimilation*, W. Lahoz et al., Eds., Springer, 113–136.

Naumann, U., 2012: *The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation.* Society for Industrial and Applied Mathematics, 358 pp.

Naumann, U., and J. Riehme, 2005: A differentiation-enabled Fortran 95 compiler. *ACM Trans. Math. Software*, **31**, 458–474, doi:10.1145/1114268.1114270.

Parmuzin, E., F.-X. LeDimet, and V. Shutyaev, 2006: On error analysis in variational data assimilation problem for a nonlinear convection-diffusion model. *Russ. J. Numer. Anal. Math. Modell.*, **21**, 169–183, doi:10.1515/156939806776369483.

Pedlosky, J., 1987: *Geophysical Fluid Dynamics.* 2nd ed. Springer-Verlag, 710 pp.

Rauser, F., J. Riehme, K. Leppkes, P. Korn, and U. Naumann, 2010: On the use of discrete adjoints in goal error estimation for shallow water equations. *Proc. Comput. Sci.*, **1**, 107–115, doi:10.1016/j.procs.2010.04.013.

Riehme, J., U. Naumann, and B. Christianson, 2007: The differentiation-enabled NAGWare Fortran compiler. *Proc. Appl. Math. Mech.*, **7**, 1 140 207–1 140 208, doi:10.1002/pamm.200700928.

Simmons, A. J., and A. Hollingsworth, 2002: Some aspects of the improvement in skill of numerical weather prediction. *Quart. J. Roy. Meteor. Soc.*, **128**, 647–677, doi:10.1256/003590002321042135.

Talagrand, O., and P. Courtier, 1987: Variational assimilation of meteorological observations with adjoint vorticity equation. I: Theory. *Quart. J. Roy. Meteor. Soc.*, **113**, 1311–1328, doi:10.1002/qj.49711347812.

Tremolet, Y., 2006: Accounting for an imperfect model in 4D-Var. *Quart. J. Roy. Meteor. Soc.*, **132**, 2483–2504, doi:10.1256/qj.05.224.

Ursell, F., 1953: The long-wave paradox in the theory of gravity waves. *Math. Proc. Cambridge Philos. Soc.*, **49**, 685–694, doi:10.1017/S0305004100028887.

Williamson, D. L., J. B. Drake, J. Hack, J. Rudiger, and P. Swarztrauber, 1992: A standard test set for numerical approximations to the shallow water equations in spherical geometry. *J. Comput. Phys.*, **102**, 211–224, doi:10.1016/S0021-9991(05)80016-6.

Zupanski, D., and M. Zupanski, 2006: Model error estimation employing an ensemble data assimilation approach. *Mon. Wea. Rev.*, **134**, 1337–1354, doi:10.1175/MWR3125.1.

^{A1} More common would be to define *F* as *F*(**x**, **p**) instead of *F*(**p**, **x**). But the notation used here is more suitable for the numerical model we discuss in the next section.

^{A2} The input **p** is considered passive: no derivatives of **y** with respect to **p** are requested.

^{A3} The paper from 2010 uses the interface of the compiler version of the year 2010. Meanwhile all tools developed at STCE use a unified interface based on the notation developed and described in Naumann (2012). Moreover, the compiler in its overloading part shares the run-time support libraries with our FORTRAN overloading tool dco/FORTRAN, and these are based on our C++ AD tool dco/c++.

^{A4}

For a matrix *k*th row of *k*th column of