## 1. Introduction

In atmospheric or oceanic applications of Kalman filters, the growing number of available observations often leads to a prohibitive cost of the observational update (analysis step), and to the necessity of simplifying the problem. Ad hoc solutions must be found to make the problem numerically tractable. A first option is to synthesize the observational information by aggregating observations into superobservations, or even by dropping the least useful or most redundant measurements (data thinning). Another option is to transform the original algorithm and reduce its computational complexity by taking advantage of prior hypotheses on the error statistics (i.e., on the shape of the state and observation error covariance matrices). Simplifications are thus applied to the error second-order statistical moments (which are anyway only approximately known) rather than to the observations themselves. Of course, these two options are not mutually exclusive; they can interact with and complement each other. As explained in Rabier (2006), the need for data thinning can also result from oversimplified assumptions in the parameterization of the observation error covariance matrix. For instance, with a suboptimal scheme neglecting observation error correlations, decreasing the observation density can help improve the accuracy of the estimation (Liu and Rabier 2002, 2003). In this paper, we propose to reduce the numerical cost of the observational update by using simplified (but rather general) parameterizations of the observation error covariance matrix. The expected consequence is that, by combining improved efficiency with sufficient accuracy and robustness in the representation of the observation error covariance matrix, this method can substantially reduce the need for data thinning.

If the forecast error covariance matrix is available in square root form, as in square root or ensemble Kalman filters, it is possible to use a modified observational update algorithm (proposed by Pham et al. 1998), whose computational complexity is linear in the number of observations (instead of being cubic in the standard formula), providing that the observation error covariance matrix can be inverted at low cost, as for instance if it is diagonal. It is the purpose of this paper to introduce specific parameterizations of the observation error correlations that preserve the numerical efficiency of that modified algorithm. This can be done (i) by expressing the observation error covariance matrix as the sum of a diagonal and a low-rank matrix, or (ii) by applying a linear transformation to the observation vector (and assuming uncorrelated observations in the transformed space). It is interesting to note that, in parameterization (ii), nonsquare transformation matrices are possible, which means that the observation vector can be augmented with new observations that are linear combinations of the original observations. Both parameterizations are presented in section 2 of this paper. In section 3, a specific choice of linear transformation, consisting of augmenting the observation vector with gradients of the original observations, is studied in more detail.

In section 4, the algorithm is applied to ocean altimetric observations, as simulated by a 1/4° model of the tropical Atlantic Ocean, focusing on the North Brazil Current. Altimetric observation errors are indeed known to be spatially correlated, because of, for example, orbit errors or atmospheric correction errors. Moreover, it is important to take these correlations into account, because doing so can directly improve the quality of the observational update (especially for the dynamic height gradient, and thus for velocities) and the accuracy of the associated error estimates. In the North Brazil Current, the ratio between signal amplitude (about 5 cm) and typical observational noise (about 4 cm) remains moderate: the signal is only marginally observed. This example is thus appropriate to show the importance of accounting for error correlations to reconstruct the ocean circulation, and to check the validity of our simplified parameterizations.

## 2. Parameterization of the observation error covariance matrix

The observation error **ϵ** is defined as the difference between the observation vector **y** (size *y*) and the observation counterpart of the true state **x**^{t} (size *x*):

**ϵ** = **y** − 𝗛**x**^{t},  (1)

where 𝗛 (size *y* × *x*) is the observation operator. The specification of the observation error statistics thus always requires defining properly the truth of the problem (Cohn 1997; Kalnay 2003), which generally amounts to identifying the exact scope of the estimation problem. In atmospheric or oceanic applications, this is usually done by restricting the range of resolved scales in space and time, using for instance a filtering or averaging operator acting on the continuous state of the atmospheric or oceanic system. Observation error thus not only includes a measurement error, but also a representation error that results from this restriction in the scope of the problem. In this paper, it is assumed that the total observation error **ϵ** is characterized by a zero mean 〈**ϵ**〉 = 0 (unbiased observations) and a known covariance matrix 𝗥 = 〈**ϵϵ**^{T}〉. Our purpose is to introduce efficient approximate parameterizations of this known observation error covariance matrix for use in square root or ensemble Kalman filters.

### a. Observational update in square root or ensemble Kalman filters

In the standard Kalman filter, the observational update *δ***x** of the model state vector is

*δ***x** = 𝗣^{f}𝗛^{T}(𝗛𝗣^{f}𝗛^{T} + 𝗥)^{−1}*δ***y**,  (2)

where *δ***y** = **y** − 𝗛**x**^{f} is the innovation vector, representing the difference between the observation vector **y** (size *y*) and the model equivalent of the observation in the forecast state vector **x**^{f} (size *x*), and 𝗣^{f} (*x* × *x*) is the forecast error covariance matrix. The computational complexity (leading behavior for large *x* and *y*) of this standard formula is

*C*_{0} = *y*^{3}/6 + *xy*.  (3)

In *C*_{0}, it is assumed that a linear system is solved to compute (𝗛𝗣^{f}𝗛^{T} + 𝗥)^{−1}*δ***y**, with asymptotic complexity *y*^{3}/6 (for a symmetric matrix). The second term in *C*_{0} corresponds to the left multiplication by (𝗛𝗣^{f})^{T}. In addition, the cost of application of the observation operator 𝗛 is assumed negligible throughout this discussion. It is negligible for instance if every observation is related to a small number of state variables. If 𝗛 is more complex, it is straightforward to add the cost of 𝗛 to the computational complexity formulas and transform the conclusions accordingly.

If the forecast error covariance matrix is available in square root form 𝗣^{f} = 𝗦^{f}𝗦^{fT}, formula (2) can be transformed into the mathematically equivalent formula (proposed by Pham et al. 1998)

*δ***x** = 𝗦^{f}[𝗜 + (𝗛𝗦^{f})^{T}𝗥^{−1}(𝗛𝗦^{f})]^{−1}(𝗛𝗦^{f})^{T}𝗥^{−1}*δ***y**,  (5)

whose computational complexity (leading behavior for large *x*, *y*, and *r*) is

*C*_{1} = *yr*^{2} + *r*^{3}/6 + *xr*,  (7)

where *r* is the number of columns in 𝗦^{f} (the maximum rank of 𝗣^{f}). The first term corresponds to the computation of the *r* × *r* matrix between brackets; the second term, to the solution of the linear system; and the last term, to the left multiplication by 𝗦^{f}. The main difference between formula (5) and formula (2) is that the linear system to solve is of size *r* (complexity *r*^{3}/6) instead of *y* (complexity *y*^{3}/6).

If the observation error covariance matrix 𝗥 can be inverted at low cost (for instance, if it is diagonal), the complexity *C*_{1} is *linear* in the number of observations *y* (a property that disappears if a general matrix 𝗥 is inverted). With formula (5), larger observation vectors thus become numerically tractable. Asymptotically, for large values of *x*, *y*, and *r*, with fixed ratios *y*/*x* and *r*/*x* (this means in practice that any of these numbers is small with respect to the product of the other two: *x* ≪ *yr*, *y* ≪ *xr*, *r* ≪ *yx*), the gain factor that is obtained by using formula (5) instead of formula (2) simplifies (only the cubic terms remain) to

*C*_{0}/*C*_{1} ≃ (*y*^{3}/6)/(*yr*^{2} + *r*^{3}/6).  (8)

In full rank problems (*r* ≥ *x*), formula (5) is cheaper than formula (2) (asymptotically) as soon as *y*/*r* > 2.53. But the benefit of formula (5) becomes really clear in the small rank problems (*r* ≪ *x*) that result from the application of reduced rank or ensemble Kalman filters. In these problems, it is often useful to reach very small *r*/*y* ratios, for which formula (5) is by far preferable. Nevertheless, the main drawback of using formula (5) is that it leads to assuming a diagonal observation error covariance matrix. It is the purpose of this paper to show how it is possible to introduce parameterizations of the observation error correlations that preserve the numerical efficiency of formula (5).
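As a concrete illustration, the algebraic equivalence between the standard update and the square-root update, and the origin of the cost reduction (an *r* × *r* system instead of a *y* × *y* one), can be checked on a small synthetic problem. This is a numpy sketch: the sizes *x*, *y*, *r*, the operators, and the diagonal 𝗥 are arbitrary test values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, r = 40, 100, 10                     # state size, observation size, rank of S^f

Sf = rng.standard_normal((x, r))          # square root of P^f (P^f = S^f S^fT)
H = rng.standard_normal((y, x))           # observation operator (dense for simplicity)
Rdiag = 0.5 + rng.random(y)               # diagonal observation error covariance
dy = rng.standard_normal(y)               # innovation vector

# Standard formula (2): solve a y x y linear system (cost ~ y^3/6)
Pf = Sf @ Sf.T
dx_std = Pf @ H.T @ np.linalg.solve(H @ Pf @ H.T + np.diag(Rdiag), dy)

# Modified formula (5): solve an r x r system instead (cost linear in y)
HSf = H @ Sf
G = HSf.T / Rdiag                         # (H S^f)^T R^{-1}, using only the diagonal of R
dx_mod = Sf @ np.linalg.solve(np.eye(r) + G @ HSf, G @ dy)

err = np.max(np.abs(dx_std - dx_mod))
```

The two increments agree to machine precision; only the sizes of the dense linear systems differ, which is where the gain factor comes from.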

It is worth noting here that, in realistic applications, the observational update is often performed locally by dividing the full model state into subdomains, and by performing a separate observational update for every subdomain using a subset of the global observation dataset (see, e.g., Anderson 2003; Houtekamer and Mitchell 1998; Ott et al. 2004; Tippett et al. 2003). With local methods, the size of the observation vector can be significantly reduced with respect to a global observational update, thus modifying the computational complexity (3) and (7) of the algorithms and the gain factor (8) that is obtained by using formula (5) instead of formula (2). The same expressions can however still be applied providing that *x* and *y* are defined as the size of the *local* state and observation vectors. In addition, if *r* is still the number of columns in 𝗦^{f}, it can usually be set smaller with local methods. The use of low-rank 𝗣^{f} parameterizations (or small size ensembles) is indeed one important reason for which local methods are required (Houtekamer and Mitchell 1998).

### b. Observational update of the error covariance in square root or ensemble Kalman filters

The standard formula for the observational update of the error covariance matrix is

𝗣^{a} = 𝗣^{f} − 𝗣^{f}𝗛^{T}(𝗛𝗣^{f}𝗛^{T} + 𝗥)^{−1}𝗛𝗣^{f},  (9)

where 𝗣^{a} is the updated (analysis) error covariance matrix. The computational complexity (leading behavior for large *x* and *y*) of this standard formula is *C*_{0}^{P}, given by (10). This complexity includes complexity *C*_{0} if formulas (2) and (9) are applied together, because most operations of formula (2) are included in formula (9).

In square root form, the updated error covariance can be obtained directly from its square root (𝗣^{a} = 𝗦^{a}𝗦^{aT}) using the formula

𝗦^{a} = 𝗦^{f}𝗟,  with 𝗟𝗟^{T} = [𝗜 + (𝗛𝗦^{f})^{T}𝗥^{−1}(𝗛𝗦^{f})]^{−1}.  (11)

The computational complexity (leading behavior for large *x*, *y*, and *r*) of formula (11) is *C*_{1}^{P}, given by (12). The first term of (12) corresponds to the computation of the *r* × *r* matrix between brackets, the second term includes the computation of the inverse matrix and the Cholesky decomposition of the inverse (the cheapest square root), and the last term corresponds to the left multiplication by 𝗦^{f}. This complexity includes complexity *C*_{1} if formulas (5) and (11) are applied together, because most operations of formula (5) are included in formula (11).

Like *C*_{1}, the complexity *C*_{1}^{P} is linear in the number of observations *y*. However, new cubic terms appear, so that the (asymptotic) gain factor (13) is not as simple as (8). If *x*/*y* remains moderate, the conclusions of section a remain valid: the gain behaves proportionally to (*r*/*y*)^{2} for small *r*/*y*. And if *x*/*y* is large, formula (11) is even more favorable, since the gain behaves like (*r*/*y*)^{2}*y*/*x* for small *r*/*y*. It is also worth noting that with formula (11), the additional cost of computing the update of the error covariance, with respect to formula (5), is usually moderate, behaving at most (for small *r*/*y*) like *C*_{1}^{P}/*C*_{1} ∼ 1 + *x*/*y*.

In ensemble Kalman filters, the same update must be applied to each member of an ensemble of *r* states. The complexities *C*_{0} and *C*_{1} are not simply multiplied by the size *r* of the ensemble, because it is cheaper here to explicitly invert the matrix rather than solving the *r* linear systems, so that (3) transforms to (14). The first term of (14) corresponds to the inversion of 𝗛𝗣^{f}𝗛^{T} + 𝗥, the second term includes the computation of the matrix 𝗛𝗣^{f}𝗛^{T} from the square root representation and the application of the inverse matrix to the ensemble innovations, and the third term corresponds to the left multiplications by (𝗛𝗣^{f})^{T} to obtain the ensemble corrections. On the other hand, (7) transforms to (15). The first term of (15) corresponds to the computation of the *r* × *r* matrix between brackets and the application of the inverse matrix to the ensemble innovations, the second term corresponds to the inversion of the matrix between brackets, and the last term to the left multiplication by 𝗦^{f} to obtain the ensemble corrections. New cubic terms appear in (14) and (15), so that the gain factor (16) is analogous to (13). If *x*/*y* remains moderate [*x*/*y* ≪ (*y*/*r*)^{2}], the conclusions of section a remain qualitatively the same: the computational complexity *C*_{1}^{E} is linear in *y*, and for small *r*/*y*, the gain (16) behaves proportionally to (*r*/*y*)^{2}. [If *x*/*y* is large, the leading behavior of *C*_{1}^{E}/*C*_{0}^{E} for small *r*/*y* is proportional to *r*/*y* instead of (*r*/*y*)^{2}, because the leading terms in (14) and (15) become the last terms, proportional to *x*, whose ratio is equal to *r*/*y*.]

### c. Modal parameterization of the observation error covariance matrix

A first possibility is to parameterize the observation error covariance matrix as the sum of a diagonal and a low-rank matrix:

𝗥 = 𝗗 + 𝗗^{1/2}**ΘΘ**^{T}𝗗^{1/2},  (17)

where 𝗗 is a diagonal matrix and **Θ** is a square root (*y* × *q*) of the positive definite symmetric matrix 𝗗^{−1/2}𝗥𝗗^{−1/2} − 𝗜. Using (6), the inverse of 𝗥 can be written

𝗥^{−1} = 𝗗^{−1/2}[𝗜 − **Θ**(𝗜 + **Θ**^{T}**Θ**)^{−1}**Θ**^{T}]𝗗^{−1/2}.  (18)

In formulas (5) and (11), the application of 𝗥^{−1} then requires the computation of

(𝗛𝗦^{f})^{T}𝗥^{−1}(𝗛𝗦^{f}) = (𝗛𝗦^{f})^{T}𝗗^{−1}(𝗛𝗦^{f}) − 𝗕𝗕^{T},  with 𝗕 = 𝗔𝗟 and 𝗔 = (𝗛𝗦^{f})^{T}𝗗^{−1/2}**Θ**,  (19)

where 𝗟𝗟^{T} is the Cholesky decomposition of (𝗜 + **Θ**^{T}**Θ**)^{−1}. The leading behavior of the additional computational complexity *C*_{1}^{R} comes from the second term of (19): the first term of *C*_{1}^{R} corresponds to the computation of (**Θ**^{T}**Θ**), the second term to the multiplication 𝗔 = (𝗛𝗦^{f})^{T}𝗗^{−1/2}**Θ**, the third term includes the inversion of (𝗜 + **Θ**^{T}**Θ**) and the Cholesky decomposition of the inverse, and the two last terms are the matrix multiplications 𝗕 = 𝗔𝗟 and 𝗕𝗕^{T}.

Formula (5) or (11) with parameterization (17) for 𝗥 can only be advantageous with respect to formula (2) or (9) (i.e., *C*_{1} + *C*_{1}^{R} < *C*_{0}, *C*_{1}^{P} + *C*_{1}^{R} < *C*_{0}^{P} or *C*_{1}^{E} + *C*_{1}^{R} < *C*_{0}^{E}) if the number of columns *q* of Θ is small with respect to the number of observations (*q* ≪ *y*); that is, if the observation error covariance matrix 𝗥 is the sum of a diagonal matrix and a low rank matrix. [This is why expression (17) is chosen: the diagonal term is necessary to make the matrix regular.] If this can be done, the computational complexity remains linear in the number of observations *y* and the numerical efficiency of formulas (5) and (11) is preserved.

A further difficulty is that **Θ** needs to be computed. Obviously, it cannot be computed by decomposition of a full size 𝗥 matrix (followed by rank reduction), because the computational complexity of such an operation is again proportional to *y*^{3}. A possibility (for spatially distributed observations) is to define the correlated part of 𝗥 at the nodes of a regular grid (𝗥^{g}), compute the square root 𝗥^{g} = **Θ**^{g}**Θ**^{gT} (once for all) on that grid (with rank reduction if possible), and interpolate the modes **Θ**^{g} at the observation locations (for every spatial distribution of the observations) to obtain **Θ** = 𝗛^{g}**Θ**^{g}. Such an approximation is valid if the error modes **Θ**^{g} contain only scales that are large with respect to the regular grid resolution; that is, if the 𝗥 matrix can be represented by the superposition of a white noise (the diagonal part 𝗗) and a large-scale red noise (the correlated part 𝗗^{1/2}**ΘΘ**^{T}𝗗^{1/2}). In such a case, two observations that are close together (much closer than the red noise correlation scales) are assumed fully independent with respect to the white noise, and fully dependent with respect to the red noise.

This parameterization is thus particularly efficient if the typical distance between observations is small with respect to the correlation scales, because then the number *q* of error modes can be made small with respect to the number of observations *y* (*q* ≪ *y*), and the additional cost *C*_{1}^{R}, given by (21), remains tractable (asymptotically for large *y*): the linear term in *y* is only increased to *y*(*r* ^{2} + *q*^{2} + *rq*) instead of *yr* ^{2}. In other situations, this parameterization cannot preserve the efficiency of formulas (5) and (11) and other solutions must be found (see next sections).
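A minimal numpy sketch of this diagonal-plus-low-rank parameterization: the inverse (18) is applied through *q* × *q* dense algebra only, so the cost of applying 𝗥^{−1} stays linear in *y*. All matrix sizes and values are arbitrary test choices.

```python
import numpy as np

rng = np.random.default_rng(2)
y, q = 200, 5                             # number of observations, number of error modes

d = 0.5 + rng.random(y)                   # diagonal part D of R
Theta = rng.standard_normal((y, q)) / np.sqrt(y)   # low-rank error modes (test values)
z = rng.standard_normal(y)                # vector to which R^{-1} is applied

# R = D + D^{1/2} Theta Theta^T D^{1/2}: diagonal plus low rank, as in (17)
sq = np.sqrt(d)
R = np.diag(d) + (sq[:, None] * Theta) @ (Theta.T * sq[None, :])

# Applying (18): only a q x q system is solved, so the cost is linear in y
u = z / sq
w = np.linalg.solve(np.eye(q) + Theta.T @ Theta, Theta.T @ u)
Rinv_z = (u - Theta @ w) / sq

err = np.max(np.abs(R @ Rinv_z - z))      # check R (R^{-1} z) = z
```

The dense *y* × *y* matrix `R` is built here only to verify the result; in an actual filter it would never be formed.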

An even more efficient parameterization can be built by using directly a reduced rank parameterization for the inverse observation error covariance matrix 𝗥^{−1} = **Θ****Θ**^{T}, with square root Θ(*y* × *q*), *q* ≪ *y*. With respect to parameterization (17), the linear term in *y* is reduced to *y*(*r* ^{2} + *rq*) instead of *y*(*r* ^{2} + *q*^{2} + *rq*). Such a simplified parameterization is used in the oceanographic applications described in Brankart et al. (2003) and Testut et al. (2003). However, singular parameterizations of 𝗥^{−1} are dangerous because they imply that the null space of 𝗥^{−1} is assumed unobserved (infinite observation error), which may lead to neglecting important observational information. Again, this amounts to building superobservations (presumably the most useful ones) by projecting the original observations on the error modes (the columns of **Θ**), and dropping everything that is orthogonal to that. In this paper, we prefer to follow our original plan to keep all observations and thus only propose regular parameterizations of 𝗥.

### d. Simulating correlations by linear transformation of the observation vector

The observational update (2) is the minimizer of the cost function

*J*(*δ***x**) = (*δ***y** − 𝗛*δ***x**)^{T}𝗥^{−1}(*δ***y** − 𝗛*δ***x**) + *δ***x**^{T}𝗣^{f−1}*δ***x**,  (22)

where 𝗣^{f−1} denotes the inverse of 𝗣^{f}, or the pseudoinverse that must be used if the matrix is rank-deficient. In this cost function, we can transform the observation vector by a regular (rank equal to *y*) linear transformation operator 𝗧: *δ***y**^{+} = 𝗧*δ***y**, 𝗛^{+} = 𝗧𝗛, in such a way that *J* remains unchanged, providing that the observation error covariance matrix is also transformed according to

𝗥^{−1} = 𝗧^{T}𝗥^{+−1}𝗧.  (23)

This makes it possible to simulate error correlations in 𝗥 while working with a simpler matrix 𝗥^{+} in a transformed observation space. An immediate solution is to choose 𝗥^{+} as the matrix of eigenvalues of 𝗥 and 𝗧 as the matrix with the corresponding normalized eigenvectors (so that 𝗥 = 𝗧^{T}𝗥^{+}𝗧, with 𝗧 unitary and 𝗥^{+} diagonal). Obviously, this is not the solution that we are looking for, since the computational complexity of the eigenvalue problem is again proportional to *y*^{3}.

Moreover, for a general linear operator 𝗧, the computational complexity of the application of the operator (e.g., to compute *δ***y**^{+} = 𝗧*δ***y**) is equal to *yy*^{+}, where *y*^{+} is the size of the transformed observation vector (*y*^{+} ≥ *y* for a regular transformation). Hence, this complexity can only be linear in *y* if the structure of 𝗧 is simple. It can even become negligible (asymptotically) if every transformed observation (in the vector **y**^{+}) is related to a small number of original observations. (It is the same argument that leads to neglecting the cost of 𝗛, see section 2a.) On the other hand, because the cost of the observational update is linear in *y* in formulas (5) and (11), we have the freedom to imagine a transformation 𝗧 that increases the number of observations (*y*^{+} > *y*), without prohibitive consequence on the numerical cost. Essentially, as soon as 𝗧 is known and is simple enough, the same computational complexity as (7), (12), and (15) applies, with *y* replaced by *y*^{+}. (Thus the relative cost is multiplied by *y*^{+}/*y*.) An example of such simple transformation, consisting of adding gradient observations to the original observation vector, is examined in section 3.
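A minimal sketch of the transformation idea, using an augmented (nonsquare) operator 𝗧 built from first differences: the update computed in the original space with the induced correlated matrix 𝗥 coincides with the update computed in the transformed space with uncorrelated errors. All sizes and standard deviations are assumed test values, with unit spacing between observations.

```python
import numpy as np

rng = np.random.default_rng(3)
x, y = 20, 30

Pf = 0.5 * np.eye(x)                      # forecast error covariance (test value)
H = rng.standard_normal((y, x))           # observation operator
dy = rng.standard_normal(y)               # innovation vector

# Augmented transformation: original observations plus left differences (y+ = 2y - 1)
T = np.vstack([np.eye(y), np.diff(np.eye(y), axis=0)])
s0, s1 = 1.0, 0.5                         # assumed error std devs (observations, differences)
Rplus_inv = np.diag(np.r_[np.full(y, 1 / s0**2), np.full(y - 1, 1 / s1**2)])

# Correlated observation error covariance induced in the original space, from (23)
Rinv = T.T @ Rplus_inv @ T
R = np.linalg.inv(Rinv)

# Update in the original space with the correlated R ...
dx_orig = Pf @ H.T @ np.linalg.solve(H @ Pf @ H.T + R, dy)

# ... equals the update in the transformed space with uncorrelated errors
Hp, dyp = T @ H, T @ dy
Rp = np.linalg.inv(Rplus_inv)
dx_trans = Pf @ Hp.T @ np.linalg.solve(Hp @ Pf @ Hp.T + Rp, dyp)

err = np.max(np.abs(dx_orig - dx_trans))
```

Note that each row of `T` touches at most two observations, so its application cost is negligible, as discussed in the text.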

In addition, it is interesting to point out that, with uncorrelated observation errors in the transformed observation space, the observational update described by formulas (2) and (9) can be replaced by a repeated application of these formulas, using the observations in **y**^{+} one by one. The updated **x**^{a} and 𝗣^{a} obtained at each stage of the sequence are used as background state and background error covariance (**x**^{f} and 𝗣^{f}) for the next update. This is the serial processing algorithm that is also often used in ensemble filtering to reduce the numerical cost (at the expense of the assumption that observation errors are independent). By constructing an augmented observation vector with a diagonal error covariance matrix, the transformation method proposed in this paper thus also allows the application of this serial algorithm in the presence of observation error correlations. On the other hand, in many applications, there can be several observation datasets with independent errors (e.g., if they originate from different instruments), so that the matrix 𝗥 is block diagonal. Such observation error covariance matrices can also be easily simulated by this method by applying separate transformations to the corresponding segments of the observation vector, for instance by augmenting the observation vector with discrete gradients computed inside each observation dataset.

The same transformation method can be described for a continuous observation system. The continuous true state *x*(**ξ**′) is assumed observed by a continuous observation *y*(**ξ**) through a general linear observation operator ℋ(**ξ**, **ξ**′):

*y*(**ξ**) = ∫ ℋ(**ξ**, **ξ**′) *x*(**ξ**′) *d***ξ**′ + *ε*(**ξ**),  (24)

where *ε*(**ξ**) is the observational noise. Then (22) becomes

*J* = ∬ *δw*(**ξ**) ℛ^{(−1)}(**ξ**, **η**) *δw*(**η**) *d***ξ** *d***η** + ∬ *δx*(**ξ**′) 𝒫^{f(−1)}(**ξ**′, **η**′) *δx*(**η**′) *d***ξ**′ *d***η**′,  (25)

where *δw*(**ξ**) is the observation residual:

*δw*(**ξ**) = *δy*(**ξ**) − ∫ ℋ(**ξ**, **ξ**′) *δx*(**ξ**′) *d***ξ**′,  (26)

ℛ^{(−1)}(**ξ**, **η**) is the inverse observation error covariance:

∫ ℛ^{(−1)}(**ξ**, **ζ**) ℛ(**ζ**, **η**) *d***ζ** = *δ*(**ξ** − **η**),  (27)

and 𝒫^{f(−1)}(**ξ**′, **η**′) is the inverse (or pseudoinverse) forecast error covariance. If the observation *y*(**ξ**) is transformed by a general linear transformation 𝒯(**ξ**^{+}, **ξ**):

*y*^{+}(**ξ**^{+}) = ∫ 𝒯(**ξ**^{+}, **ξ**) *y*(**ξ**) *d***ξ**,  (28)

*J* remains unchanged if

ℛ^{(−1)}(**ξ**, **η**) = ∬ 𝒯(**ξ**^{+}, **ξ**) ℛ^{+(−1)}(**ξ**^{+}, **η**^{+}) 𝒯(**η**^{+}, **η**) *d***ξ**^{+} *d***η**^{+}.  (29)

If the transformed observation errors are uncorrelated, that is, if ℛ^{+(−1)}(**ξ**^{+}, **η**^{+}) = *σ*^{+−2}(**ξ**^{+}) *δ*(**ξ**^{+} − **η**^{+}), this last formula simplifies to

ℛ^{(−1)}(**ξ**, **η**) = ∫ 𝒯(**ξ**^{+}, **ξ**) *σ*^{+−2}(**ξ**^{+}) 𝒯(**ξ**^{+}, **η**) *d***ξ**^{+}.  (30)

Hence, given the transformation 𝒯(**ξ**^{+}, **ξ**) and the observation error variance in the transformed space *σ*^{+2}(**ξ**^{+}), the corresponding observation error covariance in the original space ℛ(**ξ**, **η**) can be computed using (30) together with (27).

## 3. Simulating correlations by adding gradient observations

### a. One-dimensional problem

Consider first observations distributed along a line. If *y*(*ξ*) is the original observation (where *ξ* is a curvilinear abscissa along the line), the transformed observation vector is composed of the original function together with its first derivative: 𝒯 = (𝒯_{1}, 𝒯_{2}), where 𝒯_{1}(*ξ*^{+}, *ξ*) = *δ*(*ξ*^{+} − *ξ*) is the identity operator and 𝒯_{2}(*ξ*^{+}, *ξ*) is the derivative operator. Assuming that *σ*^{+}(*ξ*^{+}) is spatially homogeneous, equal to *σ*_{0} for the original observations and to *σ*_{1} for the gradient observations (where *σ*_{0} is the observation error standard deviation and *σ*_{1} is the gradient error standard deviation), (30) can be rewritten as

ℛ^{(−1)}(*ξ*, *η*) = (1/*σ*_{0}^{2}) *δ*(*ξ* − *η*) − (1/*σ*_{1}^{2}) *δ*″(*ξ* − *η*).  (34)

Applying (27) with this expression, and taking benefit of the homogeneity of the solution ℛ(*ξ*, *η*) = ℛ(*ρ*) with *ρ* = *ξ* − *η*, (27) transforms into a second-order ordinary differential equation for ℛ(*ρ*), whose solution is

ℛ(*ρ*) = (*σ*_{0}^{2}/2) exp(−|*ρ*|/ℓ),  with ℓ = *σ*_{0}/*σ*_{1}.  (35)

Assuming independent errors on the observations and on their gradients is thus *equivalent* to assuming that the observation error correlation decreases exponentially with the distance |*ρ*|, with a decorrelation length ℓ equal to the ratio between observation error and gradient error standard deviations (while the observation error variance is divided by 2).

Assume now that discrete observations *y_{i}* are available along the line, at abscissas *ξ_{i}*, *i* = 1, …, *y*, and that we add to this observation vector observations of the discrete gradient (left difference). The size of the transformed observation vector is *y*^{+} = 2*y* − 1, and the transformation is 𝗧 = [𝗧_{1}; 𝗧_{2}], where 𝗧_{1} is the identity matrix and *T*_{2,ij} = (*δ_{ij}* − *δ*_{i−1,j})/(*ξ_{i}* − *ξ*_{i−1}). From this, it is easy to compute the observation error covariance matrix 𝗥 on **y** corresponding to a diagonal observation error covariance matrix 𝗥^{+} on **y**^{+} using (23). If 𝗥^{+} is homogeneous (error variance *σ*_{0}^{2} on the observations and *σ*_{1}^{2} on the gradients) and the observation distribution is regular (*ξ_{i}* − *ξ*_{i−1} = Δ*ξ* ∀*i*), it follows that 𝗥^{−1} is the tridiagonal matrix

𝗥^{−1} = (1/*σ*_{0}^{2})[𝗜 + (ℓ^{2}/Δ*ξ*^{2})𝗟],  (39)

where 𝗟 is the second-difference matrix (2 on the diagonal, −1 on the two subdiagonals, modified at the two endpoints). This is a consistent discretization of the continuous operator (34) [on the segment 0 ≤ *ξ* ≤ (*y* − 1)Δ*ξ*, with Neumann homogeneous boundary conditions], except for a factor Δ*ξ*/ℓ in the discretization of the Dirac function, so that (35) provides the asymptotic solution of (39) (multiplied by ℓ/Δ*ξ*) as Δ*ξ* → 0 and *y*Δ*ξ* → ∞.

Figure 1 shows the solution of (39) computed numerically (by inversion of a tridiagonal matrix) for *σ*_{0} = 1 and different values of ℓ/Δ*ξ*, as compared with the continuous solution (35). The solution is drawn for ℓ = 1 and decreasing Δ*ξ* (left panel), showing the convergence toward the exponential decorrelation as Δ*ξ* → 0; and for Δ*ξ* = 1 and decreasing ℓ, showing how small correlation length scales (smaller than the observation resolution Δ*ξ*) are parameterized with this approach.
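The computation behind Fig. 1 can be reproduced in a few lines: build the tridiagonal matrix (39), invert it, and compare a central row with the exponential model (35) scaled by the discretization factor Δ*ξ*/ℓ. The grid size and parameter values below are assumed for illustration.

```python
import numpy as np

# Observations on a line augmented with left-difference gradient observations
sigma0, ell, dxi, n = 1.0, 10.0, 1.0, 401   # std dev, correlation length, step, points
sigma1 = sigma0 / ell                        # gradient error std dev, so ell = sigma0/sigma1

T2 = np.diff(np.eye(n), axis=0) / dxi        # discrete gradient operator
Rinv = np.eye(n) / sigma0**2 + T2.T @ T2 / sigma1**2   # tridiagonal matrix (39)
R = np.linalg.inv(Rinv)

# Away from the boundaries, the simulated covariance approaches the exponential
# solution (35), multiplied by the discretization factor dxi/ell
i = n // 2
rho = dxi * np.abs(np.arange(n) - i)
expected = (dxi / ell) * 0.5 * sigma0**2 * np.exp(-rho / ell)
err = np.max(np.abs(R[i] - expected))
```

With ℓ/Δ*ξ* = 10, the discrete covariance already matches the continuous exponential to a fraction of a percent, which is the convergence illustrated in the left panel of Fig. 1.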

It is instructive to examine what happens if we instead *replace* the original observations by gradient observations (instead of *adding* gradient observations, as suggested in this paper), and assume a diagonal error covariance matrix in this transformed space. To make the transformation regular, we keep the first observation of **y** as first element of **y**^{+}: *y*_{1}^{+} = *y*_{1} (with error variance *σ*_{0}^{2}), and then use the observation differences as next elements: *y_{i}*^{+} = *y_{i}* − *y*_{i−1}, *i* = 2, …, *y* (with error variance *σ*_{1}^{2}Δ*ξ*^{2}, assuming a regular distribution). The transformation matrix is thus square (*y*^{+} = *y*) and regular, and can be inverted: each *y_{i}* is the sum of the first *i* elements of **y**^{+}, that is, the first observation plus the sum of the *i* − 1 first differences until *y_{i}* is reached. Since 𝗧 is square and regular, (23) can be inverted explicitly: 𝗥 = 𝗧^{−1}𝗥^{+}𝗧^{−T}. The resulting error variance on observation *y_{i}*, *R_{i}* = *σ*_{0}^{2} + *iσ*_{1}^{2}Δ*ξ*^{2}, increases linearly with distance with respect to the reference observation. (Of course, the increase can be reduced by placing the reference observation in the middle of the line, or by using the mean of the observations, but the effect remains essentially the same.) Hence, assuming independent errors on the observation differences means that their error variances (*σ*_{1}^{2}Δ*ξ*^{2}) add up to form the error variances on the original observations *y_{i}*. Even if the errors on the gradient are assumed small, such a transformation is inappropriate because it leads to large errors on the original variable.

### b. Two-dimensional problem

When the observations are distributed over an *n*-dimensional manifold, error correlations along the manifold can be simulated by adding gradient observations to the observation vector. The continuous problem is formally identical with several dimensions, except that **ξ** is an *n*-dimensional vector of curvilinear coordinates and 𝒯_{2} is the *n*-dimensional gradient, so that (34) becomes

ℛ(*ρ*) − ℓ^{2}∇^{2}ℛ(*ρ*) = *σ*_{0}^{2}ℓ^{2}*δ*(**ρ**)  (43)

(written here with the two-dimensional scaling of the Dirac term), where ∇^{2} is the *n*-dimensional Laplacian operator and *ρ* = ||**ξ** − **η**|| is the Euclidean distance. Equation (43) stands for the homogeneous and isotropic problem, but it is straightforward to introduce inhomogeneity or anisotropy by a nonlinear change of the **ξ** coordinates. In two dimensions, the solution of (43) is

ℛ(*ρ*) = (*σ*_{0}^{2}/2*π*) *K*_{0}(*ρ*/ℓ),  (44)

where *K*_{0} is the second kind modified Bessel function of order 0. It can indeed be easily verified [using the properties of *K*_{0} in Abramowitz and Stegun (1970)] that (44) is the solution of the homogeneous equation in (43) [i.e., with *δ*(*ρ*) replaced by 0] everywhere except at the origin, and that the coefficient *σ*_{0}^{2}/2*π* is scaled so that the logarithmic singularity of (44) at *ρ* = 0 has the right amplitude to be the solution of (43) [viewed as a Green equation; see Morse and Feshbach (1953), chapter 7].

Assume now that discrete observations *y_{ij}* are available on the two-dimensional surface at the nodes of a grid, with coordinates (*ξ_{ij}*, *η_{ij}*), *i* = 1, …, *y*_{1}, *j* = 1, …, *y*_{2} (where *y*_{1} and *y*_{2} are the numbers of rows and columns of the grid), and that we add to the observation vector observations of the two components of the discrete gradient (left differences). The size of the transformed observation vector is *y*^{+} = 3*y*_{1}*y*_{2} − (*y*_{1} + *y*_{2}), that is, almost a factor of 3 with respect to the number of original observations (*y* = *y*_{1}*y*_{2}). The transformation is 𝗧 = [𝗧_{1}; 𝗧_{2,1}; 𝗧_{2,2}], where 𝗧_{1} is the identity matrix and 𝗧_{2,1}, 𝗧_{2,2} are the discrete gradient operators. Note that each line of the 𝗧 operator combines only one or two observations, so that the cost of application of 𝗧 always remains negligible (the computational complexity is equal to 2*y*). The application of (23) with 𝗥^{+} homogeneous leads to an expression of 𝗥^{−1} similar to (39). However, the matrix is no longer tridiagonal, because each element is now coupled with its nearest neighbours in both directions of the **y** vector. It can easily be seen that this provides a consistent discretization of (43) in two dimensions (except for a factor Δ*ξ*^{2}/ℓ^{2} in the discretization of the two-dimensional Dirac function), so that (44) is the asymptotic solution of the discrete problem (with a scale factor ℓ^{2}/Δ*ξ*^{2}) as the grid steps tend to zero. Figure 2 presents the same information as Fig. 1 for the two-dimensional problem (with regular and isotropic grid spacing), illustrating the shape of the simulated covariance for *σ*_{0} = 1 and for various values of ℓ/Δ*ξ*, and showing the convergence toward the analytical solution (44) as Δ*ξ* → 0.

### c. Higher order derivatives

Another way of characterizing the parameterization is to solve (43) in the spectral domain (with wavenumber vector **k** and modulus *κ* = ||**k**||), and the solution is

𝒮(*κ*) = *σ*_{0}^{2}ℓ^{n}/(1 + ℓ^{2}*κ*^{2}),  (47)

a red spectrum decreasing like *κ*^{−2} for the small scales (*κ* ≫ 1/ℓ) and with a constant spectral distribution for the large scales (*κ* ≪ 1/ℓ). The corresponding isotropic covariance function ℛ(*ρ*) given by (35) and (44) (valid for ℓ ≠ 0) can then be found as the inverse Fourier transform (for the one-dimensional function) or the inverse Hankel transform (for the two-dimensional isotropic function) of (47) [see the general formulas (1.0) and (2.0) in Table 1].

It is straightforward to generalize this parameterization by including the *p* successive derivatives of the observations in the observation vector (with error standard deviations *σ_{i}*, *i* = 1, …, *p*). Equation (47) then generalizes to

𝒮(*κ*) = ℓ^{n}[Σ_{i=0}^{p} *κ*^{2i}/*σ_{i}*^{2}]^{−1}.  (48)

Table 1 provides explicit expressions of the covariance function ℛ(*ρ*) that can be obtained from (48) for some specific values of the parameters *σ_{i}*. However, with expression (48), any shape of the observation error spectrum (provided that it is indefinitely continuously differentiable) can virtually be simulated; even negative *σ_{i}*^{2} are possible (for 0 < *i* < *p*), providing that the spectrum (48) remains positive. Some correlation functions correspond to an infinite sequence of parameters *σ_{i}*, *i* = 1, …, ∞ [functions (1.7) and (2.5) in Table 1], but can be approximated by a truncated sequence. Going to a second-order derivative is nevertheless always necessary to simulate a correlation function with zero derivative at *ρ* = 0 [as functions (1.2)–(1.6) and (2.2)–(2.4) in Table 1].

In one dimension (*n* = 1) and for one derivative included in the observation vector (*p* = 1), the error power spectrum (47) is also that of a random process *ε*(*ξ*) governed by the differential equation

ℓ *dε*/*dξ* + *ε* = *σ*_{0}*w*(*ξ*),  (49)

where *σ*_{0}*w*(*ξ*) is a white noise with standard deviation *σ*_{0}. Equation (49) is the Langevin equation, which is used in statistical physics to describe the time evolution of particle velocities in Brownian motion (Reif 1965) or the behavior of random fluctuations in thermodynamical systems (Landau and Lifshitz 1951, chapter 12). The correlation model given by (35) thus also describes the time correlation of these important physical processes. More generally, any observational noise that is related to a white noise by a linear differential equation (written here in one dimension):

Σ_{i=0}^{p} *a_{i}* *d^{i}ε*/*dξ^{i}* = *w*(*ξ*),  (50)

has an error power spectrum of the form (48), with *p* equal to the degree of the differential equation in (50). The parameters *σ_{i}* of the power spectrum can be easily deduced from the coefficients *a_{i}* by transforming (50) into the spectral domain. For instance, for *p* = 2, an observational noise governed by

ℓ^{2} *d*^{2}*ε*/*dξ*^{2} + *λ*ℓ *dε*/*dξ* + *ε* = *σ*_{0}*w*(*ξ*)

has a power spectrum of the form (48), with *σ*_{1}^{2} = *σ*_{0}^{2}ℓ^{−2}/(*λ*^{2} − 2) and *σ*_{2}^{2} = *σ*_{0}^{2}ℓ^{−4} [correlation functions (1.2), (1.3), and (1.4) in Table 1]. Such relationships can help to determine the appropriate parameterization of the error power spectrum as soon as it is possible to find approximate linear differential equations governing the observational noise (e.g., if it is due to unresolved physical processes).

The generality of the method is more directly obvious for discrete problems since any transformation 𝗧 can be obtained by adding finite differences of successive orders. However, increasing the number *p* of derivatives added to the observation vector also increases the numerical cost, so that the most effective parameterization always results from a compromise between a fine representation of a target observation error spectrum and the numerical efficiency of the observational update. In this respect, two critical elements are always the identification of an accurate prior model for the observation error correlations and the validation of this model using the observed information.
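The effect of adding a second order of differences can be sketched numerically: compared with *p* = 1 (first differences only, exponential-like correlation with a cusp at the origin), the *p* = 2 parameterization produces a correlation that is flatter near *ρ* = 0. The standard deviations below follow the scaling *σ_{i}* ∼ *σ*_{0}/ℓ^{i}, an assumption consistent with the relationships quoted for (48), not a unique choice.

```python
import numpy as np

n = 301
sigma0, ell, dxi = 1.0, 10.0, 1.0
s1, s2 = sigma0 / ell, sigma0 / ell**2    # assumed gradient and curvature error std devs

D1 = np.diff(np.eye(n), axis=0) / dxi            # first differences
D2 = np.diff(np.eye(n), n=2, axis=0) / dxi**2    # second differences

# p = 1: observations plus first differences
R1 = np.linalg.inv(np.eye(n) / sigma0**2 + D1.T @ D1 / s1**2)
# p = 2: observations plus first and second differences
R2 = np.linalg.inv(np.eye(n) / sigma0**2 + D1.T @ D1 / s1**2 + D2.T @ D2 / s2**2)

i = n // 2
corr1 = R1[i] / R1[i, i]                  # cusp at the origin (exponential-like)
corr2 = R2[i] / R2[i, i]                  # flatter at the origin (zero derivative)
drop1 = 1.0 - corr1[i + 1]                # one-step correlation drop, p = 1
drop2 = 1.0 - corr2[i + 1]                # one-step correlation drop, p = 2
```

The smaller one-step drop for *p* = 2 reflects the zero derivative of the correlation function at the origin mentioned in section 3c.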

## 4. Application to altimetry in the North Brazil Current

Ocean altimetric observations are distributed along lines (the satellite ground tracks) or, in the future, also along two-dimensional ribbons (wide-swath altimeters). It is known that altimetric observation errors (due to the altimetric measurement itself, orbit errors, or atmospheric correction errors) are spatially correlated along the ground track (or across the swath). The purpose of this section is to demonstrate the sensitivity of the observational update to these observation error correlations, and to check whether the parameterization proposed in this paper is appropriate to take these errors into account. This is done on the particular example of the North Brazil Current circulation.

### a. Description of the experiment

The North Brazil Current is a surface western boundary current flowing westward along the north Brazilian coast. It is fed from the southeast by the tropical surface current, and brings the water to the northwest into the Caribbean Sea. The current sheds large anticyclonic rings (with diameters of about 200 km), which are also transported toward the Caribbean Sea, covering the 2000 km in about 3 months [see Fratantoni et al. (1995) for more details]. The total transport of the mean current is about 21 Sv (1 Sv ≡ 10^{6} m^{3} s^{−1}; da Silveira et al. 1994), with typical surface velocities of 1 m s^{−1} for the main current and for the rings, corresponding to dynamic height differences of about 0.2 m.

A reference simulation of the circulation is computed using a primitive equation model covering the tropical Atlantic between 15°S and 20°N. It is a subregion of the Drakkar global ocean configuration at 1/4° resolution of the Nucleus for European Modelling of the Ocean model (NEMO; Barnier et al. 2006; Penduff et al. 2007), using boundary conditions extracted from a global simulation. The model atmospheric forcing is computed from European Centre for Medium-Range Weather Forecasts 40-year reanalysis (ERA-40) atmospheric data using bulk aerodynamic formulas. A 5-yr reference simulation of the tropical Atlantic model (computed by repeating 5 times the 2002 atmospheric data) is illustrated in Figs. 3 and 4. In this study, we focus on the results obtained in the region of the North Brazil Current (between 4.5° and 12.5°N, and between 6° and 46.5°W), which is shown in the figures. Figure 3 presents two snapshots of the sea surface height, together with its gradient and surface velocity, for 2 and 14 December of the first year, showing the rings moving westward, and illustrating the close relation between altimetry and surface velocity. Figure 4 shows the mean circulation (sea surface height, gradient, and surface velocity) averaged over the 5 yr of the simulation, together with the corresponding standard deviation. The order of magnitude of the sea surface height variability is similar to the bulk error standard deviation of satellite altimetric measurements, which is presently about 0.04 m. This variability is thus only marginally observed by such satellites, so that a fine tuning of the statistical parameters is particularly needed.

To test the observational update with different kinds of observation error parameterization, we need to define (i) the background (or forecast) state **x**^{f}, and (ii) the true state **x**^{t}, from which the observations **y** are sampled, and to which the estimation must be compared. As background state, we use the mean circulation (shown in Fig. 4, top panels); as true state, we use one of the model snapshots (illustrated in Fig. 3). And as observation, we assume that altimetry is observed over the full domain, with a 4-cm error standard deviation, at every node of the model grid. To test the sensitivity of the solution to the kind of observation error, two observation vectors are generated from the true state **x**^{t}: a first one by adding uncorrelated observation noise, and a second one by adding a correlated observation noise, with a covariance matrix given by (23), with transformation (46) (for various values of ℓ = *σ*_{0}/*σ*_{1}). The noise is scaled to have a uniform standard deviation *σ* = 0.04 m. To randomly draw Gaussian noise vectors with known covariance 𝗥, we use the method described in the appendix of Fukumori (2002). Figure 5 (top panels) shows an example of such noise vectors, generated for three correlation lengths: ℓ = 0, 5, and 15 grid points. (In this section, ℓ = 0 stands for uncorrelated noise.) The figure also shows the corresponding error on the difference between adjacent grid points, illustrating how the observational error on the discrete gradient decreases with ℓ.
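The noise-generation step can be sketched with a symmetric matrix square root (a standard alternative to the appendix of Fukumori 2002, which is not reproduced here); the exponential correlation model below is for illustration only and is not the covariance (23):

```python
import numpy as np

def correlated_noise(R, n_samples, rng):
    """Draw zero-mean Gaussian vectors with covariance R, using the
    symmetric square root of R (via eigendecomposition)."""
    vals, vecs = np.linalg.eigh(R)
    sqrt_r = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    return (sqrt_r @ rng.standard_normal((R.shape[0], n_samples))).T

# Noise with a uniform 0.04-m standard deviation and an (assumed)
# exponential correlation over l = 5 grid points.
m, l, sigma = 50, 5.0, 0.04
x = np.arange(m)
R = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / l)
eps = correlated_noise(R, n_samples=2000, rng=np.random.default_rng(0))
```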

It is interesting to make the link between this simulated observational noise and the characteristics of observation error in real altimetry data. As explained at the beginning of section 2, observation error is always the sum of a measurement error and a representation error. On the one hand, the altimetric measurement is affected by several kinds of error (the altimetric measure itself, orbit error, atmospheric correction error), with a bulk standard deviation of about 3–5 cm, and horizontal correlation patterns that can depend on the satellite orbit and on the state of the atmosphere. On the other hand, altimetric data are actually spatial averages over about 5–10 km along track, which is about a factor of 3 smaller than the resolution of our model. The resulting representation error, which corresponds to this limited range of spatial scales in the continuous ocean system, is thus here likely to remain small with respect to measurement errors. Consequently, the properties of our randomly simulated observational noise (4-cm standard deviation, with various correlation length scales) are quite adequately chosen to be in the range of what can be expected for real altimetric data in this region and for that kind of ocean model.

In addition, in order to increase the robustness of the test, each experiment is repeated by using, as true state, every snapshot of the sequence (one every 6 days for 5 yr), and the results are averaged over that ensemble of experiments. There is thus an ensemble of true states **x**_{i}^{t}, *i* = 1, …, *N* (with *N* = 300), and the corresponding ensemble of observations **y**_{i} sampled from them. (Observational errors are drawn independently for every member of the sample.) Hence, as soon as the ensemble of true states can be viewed as representative of all possible states of the system, our indicator gives the average error that is committed using the observation error parameterization that is being tested (starting from the mean as background state).

To parameterize the background (or forecast) error covariance matrix 𝗣_{i}^{f}, we use the covariance of all snapshots of the model simulation (sampled every month over the 5 yr of the simulation), except those that are less than 1 month away from the true state (larger than the typical decorrelation time scale), in order to avoid any influence of the true state on the input error covariance matrix (𝗣_{i}^{f} is thus recomputed for every member *i* = 1, …, *N*). With an ensemble of about 60 independent realizations (one per month), it is only for correlations >0.26 that the 95% confidence interval for correlations (assuming normal pdfs) does not include zero. Correlations of <0.26 are thus not significant. Thus, if the size of the region is much larger than the spatial decorrelation scale, the integrated influence of distant observations with nonsignificant correlations can be as large as that of close observations with significant correlations. To avoid the spurious effect of these inaccurate long-range correlations (resulting from the use of a small size ensemble), we perform a separate local observational update [as in Brankart et al. (2003) or Testut et al. (2003)] for each water column, with an additional weight on the observations decreasing with the distance *r* as exp(−*r*^{2}/*d*^{2}), with *d* = 200 km (the typical distance at which the correlation ceases to be significant). Figure 6 illustrates the resulting local structure of the background covariance that is used to perform the observational update. The figure shows the observational update increment that would result from one single observation (with 0.04-m error standard deviation), located at 9°N, 54°W (in the middle of the area traversed by the mesoscale rings). The long-range (nonsignificant) influence is effectively set to zero, without affecting much the local covariance structure described by the ensemble.
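The additional observation weight used in this local update can be written as a one-line helper (a minimal sketch; the variance-inflation remark in the comment is a standard way of applying such weights, stated here as an assumption rather than taken from the paper):

```python
import numpy as np

def local_obs_weight(r_km, d_km=200.0):
    """Weight exp(-r^2/d^2) for an observation at distance r (km) from the
    analyzed water column; d is the distance at which ensemble correlations
    cease to be significant.  In practice the weight can be applied by
    inflating the observation error variance of distant data:
    sigma_loc^2 = sigma^2 / w(r)."""
    r = np.asarray(r_km, dtype=float)
    return np.exp(-(r / d_km) ** 2)

w = local_obs_weight([0.0, 100.0, 200.0, 400.0])
```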

### b. Uncorrelated errors

In experiment 1, we use the observation vector that is perturbed by the uncorrelated noise, together with the diagonal parameterization of the observation error covariance matrix (𝗥 = *σ*^{2}𝗜). The parameterization is thus fully consistent with the simulated errors. Figure 7 (top panels) shows a map of the ensemble standard deviation of errors (difference with respect to the true state) after the observational update corresponding to experiment 1. It is shown for altimetry (*ϵ*_{ζ}), for the altimetric gradient, and for velocity (*ϵ*_{υ}). It is computed as (the formula for the gradient is similar to the formula used for velocity)

*ϵ*_{ζ}^{2} = (1/*N*) Σ_{i=1}^{N} (*ζ*_{i}^{t} − *ζ*^{f} − *δζ*_{i})^{2}, *ϵ*_{υ}^{2} = (1/*N*) Σ_{i=1}^{N} [(*u*_{i}^{t} − *u*^{f} − *δu*_{i})^{2} + (*υ*_{i}^{t} − *υ*^{f} − *δυ*_{i})^{2}], (52)

where *ζ*_{i}^{t}, *u*_{i}^{t}, *υ*_{i}^{t} is the *i*th true state (corresponding to the *i*th snapshot of the model sequence), *ζ*^{f}, *u*^{f}, *υ*^{f} is the forecast state (corresponding to the mean of the model sequence), and *δζ*_{i}, *δu*_{i}, *δυ*_{i} is the *i*th observational update. This result can be directly compared with Fig. 4 (bottom panels), which represents the same quantity before the observational update. Indeed, since the background state is the mean state and since the ensemble of true states is the ensemble of all snapshots of the sequence, the standard deviation of the sequence is equal to the ensemble standard deviation of errors before the observational update; that is, (52) with *δζ*_{i} = 0, *δu*_{i} = 0, and *δυ*_{i} = 0. The comparison shows that the error on altimetry is significantly reduced by the observational update, becoming much smaller than both the background error standard deviation (Fig. 4, bottom panels) and the observational error standard deviation. This is because background errors are correlated over an area of about *L* × *L*, with *L* ∼ 125 km (see Fig. 6), including about *L*^{2}/Δ*ξ*^{2} ∼ 25 observations with uncorrelated errors. The resulting error is thus given approximately by 1/*ϵ*_{ζ}^{2} ∼ 1/*σ*_{f}^{2} + 25/*σ*^{2}. Observations are dense and very accurate, so that the background has little influence and the resulting error is *ϵ*_{ζ} ∼ *σ*/5 = 0.008 m, a rough estimation that is quite consistent with the results observed in Fig. 7. Filtering off a white noise is easy if the background error correlation scales (about *L*) are large with respect to the typical data spacing (Δ*ξ*).
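This back-of-the-envelope estimate can be checked directly (*σ*_{f} is set to an illustrative background error level; with 25 accurate observations, the background term is negligible anyway):

```python
import numpy as np

# 1/eps^2 ~ 1/sigma_f^2 + n/sigma^2, with n ~ L^2 / dxi^2 ~ 25 uncorrelated
# observations of sigma = 0.04 m each (sigma_f assumed of the same order).
sigma, sigma_f, n = 0.04, 0.04, 25
eps = 1.0 / np.sqrt(1.0 / sigma_f**2 + n / sigma**2)
# eps is close to sigma / sqrt(n) = 0.008 m, as stated in the text.
```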

However, the error reduction factor (with respect to background error) is less favorable for the gradient of altimetry. This is because computing altimetric difference Δ*ζ* between adjacent model cells amplifies the relative errors. Relative errors on velocities are again slightly worse because the relation between surface velocity and altimetry is not perfectly geostrophic (and thus not perfectly linear).

On the other hand, the observational update of the error covariance is illustrated in Fig. 7 (bottom panels), showing the estimated error standard deviation (the square root of the diagonal of 𝗣^{a}). This estimation is quite consistent in amplitude and structure with the ensemble standard deviation of the error (measured by difference with respect to the true state), also shown in Fig. 7 (top panels). The good quality of the error estimate (for all variables) is the consequence of the consistent parameterization of the observation error covariance matrix; it also indicates that the background error covariance matrix (𝗣^{f}) is quite accurately parameterized.

### c. Correlated errors, with diagonal 𝗥 parameterization

In experiment 2, we use the observation vector that is perturbed by the correlated noise (with ℓ = 5 grid points), but keep the same diagonal parameterization of the observation error covariance matrix (𝗥 = *σ*^{2}𝗜). The parameterization is thus inconsistent with the simulated errors: the observation errors are assumed uncorrelated, even though they are not. Figure 8 shows the corresponding error maps, to be compared with Fig. 7. The comparison shows that the error on altimetry is significantly larger than in experiment 1. This is because the number of independent observations in the *L* × *L* area is reduced to about (*L*/ℓ)^{2} ∼ 1. Here, the background error keeps an influence, and the typical error is about *ϵ*_{ζ} ∼ 0.03 m, a rough estimation that is again quite consistent with the results observed in Fig. 8.

However, larger errors on altimetry do not necessarily mean larger errors on the altimetry gradient or on velocity. In experiment 2, the error increase on the gradient with respect to experiment 1 is actually smaller than the error increase on altimetry. This is because the gradient is better observed through correlated observations than through uncorrelated observations (see Fig. 5). Fitting the solution to correlated observations (even with an inappropriate diagonal error parameterization, as in experiment 2) thus partly compensates for the ease of filtering off a white noise from a large-scale (*L* = 125 km) signal (with optimal parameterization, as in experiment 1). This better observation of the gradient is the reason why keeping all available observations in the observation vector (even if the errors are very correlated) is always a better solution than subsampling the observations.
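The better observation of the gradient through correlated noise follows from the variance of a two-point difference, var(*ϵ*_{i+1} − *ϵ*_{i}) = 2*σ*^{2}[1 − *ρ*(1)]; the exponential correlation model below is an illustrative assumption, not the covariance used in the experiments:

```python
import numpy as np

sigma = 0.04                                   # pointwise noise std (m)
grad_std = []
for l in [0.0, 5.0, 15.0]:                     # correlation lengths (grid points)
    rho = 0.0 if l == 0 else np.exp(-1.0 / l)  # neighbor correlation (assumed)
    grad_std.append(np.sqrt(2.0 * sigma**2 * (1.0 - rho)))
# The gradient noise shrinks as l grows, as in Fig. 5 (bottom panels).
```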

On the other hand, the observational update of the error covariance is illustrated in Fig. 8 (bottom panels), showing the estimated error standard deviation (the square root of the diagonal of 𝗣^{a}). This estimation is identical to that of experiment 1 (Fig. 7, bottom panels), since all statistical parameters (𝗣^{f}, 𝗥) are kept identical. It thus largely underestimates the standard deviation of the true error (measured using the ensemble of differences with respect to the true state), which is shown in the top panels of Fig. 8. The estimation is about a factor of 3 below reality. This situation is the consequence of the inconsistent parameterization of the observation error covariance matrix: the diagonal 𝗥 parameterization lets the scheme believe that the data are more accurate than they are, so that it underestimates the error that is effectively in the system.

### d. Correlated errors, with consistent 𝗥 parameterization

In experiment 3, we use the same observation vector as in experiment 2 (perturbed by the correlated noise), but add gradient observations to simulate correlations in the observation error covariance matrix, with the diagonal covariance matrix (38). By choosing *σ*_{0} = 0.275 m and *σ*_{1} = *σ*_{0}/ℓ (for ℓ = 5 grid points; see Table 3), this observation error parameterization is perfectly consistent with the simulated errors. According to the theory presented in section 3, these observations with correlated errors are thus equivalent to much less accurate observations (*σ*_{0} = 0.275 m), *together with* accurate observations of the gradient (*σ*_{1} = 0.055 m per grid point). This is consistent with the idea suggested above that increasing ℓ reduces the number of independent observations. Figure 9 shows the corresponding error maps, to be compared to Figs. 7 and 8. The comparison shows that the errors on altimetry are still larger than in experiment 1 (Fig. 7), because the quantity of information in the observation vector is the same as in experiment 2, but smaller than in experiment 2 (Fig. 8), because the observation error parameterization is now consistent, so that the observational update is closer to optimality. (𝗣^{f} is still an approximation: it cannot be considered that the background error is drawn randomly from a pdf of covariance 𝗣^{f}.)

However, the error reduction (with respect to experiment 2) on the altimetry gradient and on velocity is significantly larger, because the better confidence that we must give to the gradient is now explicitly taken into account in the observational update, through the nondiagonal parameterization of the observation error covariance matrix 𝗥. Moreover, this parameterization is effectively (and equivalently) applied in practice by the addition of gradient observations to the observation vector. (This addition can only bring the estimated gradient closer to the observed gradient, which is more accurate if the observation errors are spatially correlated.) The improvement of the gradient resulting from the parameterization of error correlations (when they exist) is thus clearly demonstrated by this experiment.

On the other hand, the observational update of the error covariance is illustrated in Fig. 9 (bottom panels), showing the estimated error standard deviation (the square root of the diagonal of 𝗣^{a}). As in experiment 1, it is consistent with the standard deviation of the true error (measured using the ensemble of differences with respect to the true state), which is shown in the top panels of Fig. 9. Again, this is due to the consistent parameterization of the observation error covariance matrix, which has been restored by the addition of gradient observations (with adequate values for *σ*_{0} and *σ*_{1}).

### e. Sensitivity to the correlation scale

In this last section, we examine how the results presented above depend on the observation error correlation length. For that purpose, the same experiment is repeated for various simulated observation noises, with a correlation length *ℓ*_{o} ranging from 0 to 10 grid points. And, for each of these simulated noises, several parameterizations of the observation error covariance matrix are tested, using a correlation length *ℓ*_{p} also ranging from 0 to 10 grid points. Figure 10 shows the resulting error standard deviation for sea surface height and velocity (as measured by the ensemble of differences with respect to the true states), averaged over the domain of interest, as a function of *ℓ*_{o} and *ℓ*_{p}. The figure also shows the ratio between the averaged estimated error and the averaged measured error. It is only if *ℓ*_{p} = *ℓ*_{o} that the parameterization is consistent with the simulated errors: it is thus along that line that the measured error should be minimum (for a given *ℓ*_{o}) and that the ratio between estimated and measured errors should be equal to 1.

The results show that underestimating the observation error correlation length scale (*ℓ*_{p} ≪ *ℓ*_{o}) leads to a moderate increase of the error on sea surface height, but to a very significant increase of the error on velocity. The estimation of the error standard deviation is also well below reality. This situation indeed corresponds to giving too much importance to the observations and to imposing too weak a constraint on the gradient. On the contrary, overestimating the observation error correlation length scale (*ℓ*_{p} ≫ *ℓ*_{o}) leads to a moderate increase of the error on velocity, but to a significant increase of the error on sea surface height. The estimation of the error standard deviation is also well above reality for sea surface height, whereas no sensitivity can be observed for velocity. A correct tuning of *ℓ*_{p} is thus required to accurately estimate both variables, with consistent error estimates.

However, it must be noted that in these experiments the optimal parameterization is not on the line *ℓ*_{o} = *ℓ*_{p}, as it should be, but noticeably below it, especially for large values of *ℓ*_{o}. The benefit obtained by giving the observations more credit than they deserve (by using *ℓ*_{p} < *ℓ*_{o}) can only be explained by inaccuracies in the parameterization of the 𝗣^{f} matrix, which is here approximated by a limited-size ensemble. Overestimating the confidence in the observations is thus somewhat useful here to compensate for suboptimalities in the statistical parameterization of the scheme.

## 5. Conclusions

Classical algorithms to compute the observational update in Kalman filters are penalized by a computational complexity proportional to the cube of the number of observations. In square root or ensemble Kalman filters, this algorithm can be modified [as proposed by Pham et al. (1998)] to become linear in the number of observations if the observation error covariance matrix is diagonal. In this paper, it has been demonstrated that these benefits can be preserved with two nondiagonal parameterizations of the observation error covariance matrix 𝗥. The first method, parameterizing 𝗥 as the sum of a diagonal and a low rank matrix, is especially efficient if the typical distance between observations is small with respect to the correlation scales. The second method, simulating correlations by application of a linear transformation of the observation vector (with diagonal 𝗥 in the transformed space), is more generic. It is shown to be especially efficient to describe simple correlation structures if gradient observations can be added to the observation vector. This is possible, for instance, if the observations are distributed along lines or at the nodes of two-dimensional grids so that discrete gradients can be computed by subtracting successive observations. This has been shown to be equivalent to assuming a specific form of the observation error covariance matrix, with a correlation function and power spectrum that have been computed analytically in the asymptotic limit of dense (continuous) observations. The correlation scale is then the ratio of the observation error standard deviations that are assumed for the original observations and for the gradient observations.

Test experiments have been performed with the aim of reconstructing the circulation of the North Brazil Current, as simulated by a 1/4° model of the tropical Atlantic Ocean, using synthetic altimetric observations. Various observation datasets were generated by perturbation with uncorrelated and correlated noise, and for several correlation scales. For each dataset, diagonal and nondiagonal parameterizations of the observation error covariance matrix have been used to perform the observational update of altimetry together with surface velocities. The results show first that the more the observations are correlated, the less information they contain about altimetry. This is also true for velocity (but to a lesser degree), although the gradient of altimetry is better observed through correlated observations. Second, assuming a diagonal observation error covariance matrix in the presence of correlated noise leads to a nonoptimal solution that mainly penalizes the reconstruction of surface velocities, and underestimates the error variance (a factor of 3 lower than reality in our experiments). Third, optimal parameterizations of the observation error covariance matrix usually produce solutions that are close to minimizing the resulting error, although an artificial increase of the confidence in the observations (e.g., using a smaller correlation length) can lead to smaller errors (by compensating misspecifications of the forecast error covariance). Fourth, the experiments also suggest that our efficient parameterization of the observation error covariance matrix, by the addition of gradient observations, is adequate to represent observation error correlations. Adding explicit gradient observations can even be useful to compensate deficiencies in the forecast error statistics, by ensuring a direct control of velocities through gradient data.

It must be stressed, however, that these conclusions may be sensitive to the region of interest: a fine-tuning of the observation error correlations may be less critical in regions where the noise-to-signal ratio is much smaller (as in the Gulf Stream region), since high relative accuracy is always obtained.

In ocean data assimilation applications, altimetry is always a key element of the observation system. However, the growing number of available observations (not only altimetry) often leads to a prohibitive numerical cost and to the temptation of simplifying the problem by aggregating (or even dropping) observations, or by making simplistic assumptions about the statistics (such as uncorrelated observation errors). These simplifications are always made at the expense of an optimal use of the observations, and particularly of altimetry, which is sensitive to that kind of approximation. The scheme proposed in this paper is a response to that problem: analyzing more observations at a lower cost becomes possible with a realistic and robust parameterization of the observation error correlations. Being closer to statistical optimality, the scheme can thus make better use of the observational information (especially about velocity), and be of direct benefit to ocean data assimilation systems.

Incidentally, our results also suggest a possible way of improving data thinning strategies. For instance, if the density of altimetric observations along the ground track is reduced, critical information about the gradient (and thus about velocity) is likely to be lost, especially if the observation errors are correlated. A better strategy is certainly to transform the original observation vector by adding gradient observations, and to parameterize a diagonal observation error covariance matrix in the transformed space, as explained in this paper. The data thinning can then be performed on the transformed observation vector (aggregating observations and rescaling error variances) as if the data were independent. In that way, it becomes possible to give a reasonable importance to the gradient information in the reduced observation vector.
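Such a thinning step on the transformed vector can be sketched as follows (an illustrative helper with our own names, assuming the transformed errors are independent so that averaging *k* values divides the error variance by *k*):

```python
import numpy as np

def thin_transformed(y_t, sigma_t, k):
    """Aggregate a transformed observation vector into superobservations
    of k consecutive values (trailing remainder dropped), rescaling the
    error standard deviation by 1/sqrt(k) for independent errors."""
    m = (len(y_t) // k) * k
    y_s = y_t[:m].reshape(-1, k).mean(axis=1)
    return y_s, sigma_t / np.sqrt(k)

y_s, s_s = thin_transformed(np.arange(6.0), sigma_t=0.04, k=2)
```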

## Acknowledgments

This work was conducted as part of the MERSEA project funded by the European Union (Contract AIP3-CT-2003-502885), with additional support from CNES. We also thank the anonymous reviewers for their useful comments and suggestions. The calculations were performed with the support of IDRIS/CNRS.

## REFERENCES

Abramowitz, M., and I. A. Stegun, 1970: *Handbook of Mathematical Functions*. 9th ed. Dover Publications, 1046 pp.

Anderson, J. L., 2003: A local least squares framework for ensemble filtering. *Mon. Wea. Rev.*, **131**, 634–642.

Barnier, B., and Coauthors, 2006: Impact of partial steps and momentum advection schemes in a global ocean circulation model at eddy permitting resolution. *Ocean Dyn.*, **56**, 543–567.

Bateman, H., and A. Erdelyi, 1954: *Tables of Integral Transforms*. Vols. 1 and 2. McGraw-Hill, 835 pp.

Brankart, J-M., and P. Brasseur, 1996: Optimal analysis of in situ data in the western Mediterranean using statistics and cross-validation. *J. Atmos. Oceanic Technol.*, **13**, 477–491.

Brankart, J-M., C-E. Testut, P. Brasseur, and J. Verron, 2003: Implementation of a multivariate data assimilation scheme for isopycnic coordinate ocean models: Application to a 1993–96 hindcast of the North Atlantic Ocean circulation. *J. Geophys. Res.*, **108** (C3), 3074, doi:10.1029/2001JC001198.

Cohn, S. E., 1997: An introduction to estimation theory. *J. Meteor. Soc. Japan*, **75**, 257–288.

Da Silveira, I., L. Miranda, and W. Brown, 1994: On the origins of the North Brazil Current. *J. Geophys. Res.*, **99** (C11), 22501–22512.

Evensen, G., and P. J. van Leeuwen, 1996: Assimilation of Geosat altimeter data for the Agulhas Current using the ensemble Kalman filter with a quasigeostrophic model. *Mon. Wea. Rev.*, **124**, 85–96.

Fratantoni, D., W. Johns, and T. Townsend, 1995: Rings of the North Brazil Current: Their structure and behavior inferred from observations and a numerical simulation. *J. Geophys. Res.*, **100** (C6), 10633–10654.

Fukumori, I., 2002: A partitioned Kalman filter and smoother. *Mon. Wea. Rev.*, **130**, 1370–1383.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Kalnay, E., 2003: *Atmospheric Modeling, Data Assimilation and Predictability*. Cambridge University Press, 341 pp.

Kimeldorf, G., and G. Wahba, 1970: A correspondence between Bayesian estimation of stochastic processes and smoothing by splines. *Ann. Math. Stat.*, **41**, 495–502.

Landau, L., and E. Lifshitz, 1951: *Statistical Physics: Course of Theoretical Physics*. Vol. 5. Butterworth-Heinemann, 592 pp.

Liu, Z., and F. Rabier, 2002: The interaction between model resolution and observation resolution and density in data assimilation. *Quart. J. Roy. Meteor. Soc.*, **128**, 1367–1386.

Liu, Z., and F. Rabier, 2003: The potential of high density observations for numerical weather prediction: A study with simulated observations. *Quart. J. Roy. Meteor. Soc.*, **129**, 3013–3035.

McIntosh, P. C., 1990: Oceanographic data interpolation: Objective analysis and splines. *J. Geophys. Res.*, **95** (C8), 13529–13541.

Morse, P. M., and H. Feshbach, 1953: *Methods of Theoretical Physics*. Parts I and II. McGraw-Hill, 1978 pp.

Ott, E., B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. J. Patil, and J. A. Yorke, 2004: A local ensemble Kalman filter for atmospheric data assimilation. *Tellus*, **56A**, 415–428.

Penduff, T., J. Le Sommer, B. Barnier, A-M. Treguier, J-M. Molines, and G. Madec, 2007: Influence of numerical schemes on current-topography interactions in 1/4° global ocean simulations. *Ocean Sci.*, **3**, 509–524.

Pham, D. T., J. Verron, and M. C. Roubaud, 1998: Singular evolutive extended Kalman filter with EOF initialization for data assimilation in oceanography. *J. Mar. Syst.*, **16**, 323–340.

Rabier, F., 2006: Importance of data: A meteorological perspective. *Ocean Weather Forecasting: An Integrated View of Oceanography*, E. P. Chassignet and J. Verron, Eds., Springer, 343–360.

Reif, F., 1965: *Fundamentals of Statistical and Thermal Physics*. McGraw-Hill, 651 pp.

Testut, C-E., P. Brasseur, J-M. Brankart, and J. Verron, 2003: Assimilation of sea-surface temperature and altimetric observations during 1992–1993 into an eddy permitting primitive equation model of the North Atlantic Ocean. *J. Mar. Syst.*, **40–41**, 291–316.

Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. *Mon. Wea. Rev.*, **131**, 1485–1490.

Observation error covariance as a function of the distance *ρ* (along the grid lines), as obtained numerically for regular and isotropic grid spacings (for *σ*_{0} = 1 and different values of ℓ/Δ*ξ*). The solution is drawn (dotted curves) (left) for ℓ = 1 and decreasing Δ*ξ* = 1, 0.5, and 0.2 and (right) for Δ*ξ* = 1 and decreasing ℓ = 2, 1, 0.5, and 0.1. Larger bullets correspond to smaller ℓ/Δ*ξ*. In the left panel, the discrete solutions are multiplied by the factor ℓ^{2}/Δ*ξ*^{2}, to show the convergence to the continuous solution given by (44) (solid curve).

Citation: Monthly Weather Review 137, 6; 10.1175/2008MWR2693.1


Snapshots of the circulation in the region of the North Brazil Current, as simulated by the model for (top) 2 and (bottom) 14 Dec of the first year. (left) The sea surface height (m), (middle) the magnitude of its gradient (meters per grid point), and (right) sea surface velocity (m s^{−1}).

As in Fig. 3, but for the (top) means and (bottom) standard deviations of the 5-yr simulation.

Simulated observational noise on sea surface elevation for three correlation lengths: (from left to right) ℓ = 0, 5, and 15 grid points. (top) The random noise (with variance equal to 1) and (bottom) the corresponding gradient, using the grid spacing as the length unit.
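Noise fields of this kind can be generated, for example, by smoothing white noise with a Gaussian kernel of scale ℓ and rescaling to unit variance; the sketch below follows that construction, which is an assumption (the caption does not specify the smoothing kernel), and the helper name `correlated_noise` is illustrative.

```python
import numpy as np

def gaussian_kernel(ell):
    """Normalized Gaussian kernel of scale ell (grid points), truncated at 4*ell."""
    half = int(4 * ell)
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / ell) ** 2)
    return k / k.sum()

def correlated_noise(shape, ell, rng):
    """Unit-variance noise field with correlation length ell (grid points).

    ell = 0 returns white noise; ell > 0 smooths white noise with a
    Gaussian kernel along each axis and rescales to unit variance.
    """
    field = rng.standard_normal(shape)
    if ell > 0:
        k = gaussian_kernel(ell)
        field = np.apply_along_axis(np.convolve, 0, field, k, mode="same")
        field = np.apply_along_axis(np.convolve, 1, field, k, mode="same")
        field /= field.std()  # restore unit variance after smoothing
    return field

rng = np.random.default_rng(0)
noise = correlated_noise((128, 128), ell=5, rng=rng)
gy, gx = np.gradient(noise)   # gradient, using the grid spacing as the length unit
grad = np.hypot(gx, gy)       # gradient magnitude
```

As in the figure, smoothing leaves the pointwise variance at 1 while making the gradient field much smoother and weaker than in the white-noise case.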

Observational update increment on (left) sea surface height (m), (middle) zonal velocity (m s^{−1}), and (right) meridional velocity (m s^{−1}), that would result from one single observation (with 0.04-m error standard deviation) of altimetry located at 9°N, 54°W (in the middle of the area traversed by the mesoscale rings). This illustrates the size of the domain of influence of the observations.
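An increment of this kind is the standard Kalman update for a single scalar observation; a minimal ensemble-based sketch (a generic formula, not the square root algorithm of section 2) is:

```python
import numpy as np

def single_obs_increment(ens, obs_idx, y, sigma_o):
    """Analysis increment on the ensemble mean for one scalar observation.

    ens      : (m, n) array, m ensemble members of an n-variable state
    obs_idx  : index of the observed state variable
    y        : observed value
    sigma_o  : observation error standard deviation
    """
    m = ens.shape[0]
    anom = ens - ens.mean(axis=0)        # ensemble anomalies
    h = anom[:, obs_idx]                 # anomalies at the observation point
    hph = h @ h / (m - 1)                # H P H^T (scalar)
    pht = anom.T @ h / (m - 1)           # P H^T (n-vector)
    gain = pht / (hph + sigma_o**2)      # Kalman gain
    innovation = y - ens[:, obs_idx].mean()
    return gain * innovation
```

The spatial extent of the increment is set by the forecast error covariances between the observed point and the rest of the state, which is what the figure illustrates.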

Error standard deviation corresponding to experiment 1, (top) as measured by the ensemble of differences with respect to the true states, and (bottom) as estimated by the scheme (the square root of the diagonal of 𝗣^{a}). It is shown (left) for altimetry (m), (middle) for its gradient (m per grid point), and (right) for velocity (m s^{−1}).

As in Fig. 7, but for experiment 2.

As in Fig. 7, but for experiment 3.

This figure generalizes the results of Figs. 7, 8, and 9 (here averaged over the domain), by showing them as a function of the observation error correlation length scales (in grid points) *ℓ*_{o} (*x* axis), characterizing the simulated noise, and *ℓ*_{p} (*y* axis), which is used to parameterize the observation error covariance matrix 𝗥. Shown are results for (left two panels) sea surface height and (right two panels) velocity. Within each variable pair, the left panel shows the true error standard deviation (as measured by the ensemble of differences with respect to the true states), and the right panel shows the ratio between estimated and measured error standard deviations.

Observation error power spectra and associated covariance functions. All spectra have the form of (48), so that they can be directly simulated by adding successive derivatives of the observations in the observation vector. The corresponding covariance functions have been derived from the tables of integral transforms compiled by Bateman and Erdelyi (1954); *J*_{p} is the first kind Bessel function of order *p*, *K*_{p} is the second kind modified Bessel function of order *p*, and kei_{0} is a Kelvin function (see Abramowitz and Stegun 1970). In functions (1.4) and (1.5), the parameters *θ* and *α* are such that −*π*/2 < *θ* < *π*/2 and 0 < *α* < 1. Some particular cases are included separately: (1.1) is (1.6) with *p* = 0; (1.2) is (1.4) with *θ* = *π*/4; (1.3) is (1.4) with *θ* = 0; (1.3) is (1.6) with *p* = 1; (2.1) is (2.4) with *p* = 0; (2.3) is (2.4) with *p* = 1.

The three experiments described in this paper differ only in the (simulated) observation error or in the parameterization of the observation error. The observation error standard deviation is always set to 0.04 m, with a consistent parameterization. The difference lies only in the correlations: in experiment 1, the observation error is simulated by white noise; in experiments 2 and 3, it is a correlated noise with spatial correlation length ℓ = 5 grid points. In experiments 1 and 2, the observation errors are parameterized using a diagonal 𝗥 matrix (i.e., assuming uncorrelated errors); in experiment 3, gradient observations are added to account for the correlations. Hence, only experiments 1 and 3 receive a parameterization that is consistent with the simulated errors (*σ*_{1} = ∞ means that no gradient observations are used).
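As a concrete illustration of the parameterization used in experiment 3 (augmenting the observation vector with gradients and assuming uncorrelated errors in the transformed space, i.e., parameterization ii of section 2), a schematic 1-D sketch is given below; the first-difference gradient operator and the helper name `augment_with_gradients` are illustrative, and the values of *σ*_{0} and *σ*_{1} would in practice come from the correspondence established with (23) and (46).

```python
import numpy as np

def augment_with_gradients(y, sigma0, sigma1):
    """Augment 1-D observations with first-difference 'gradient' observations.

    Returns the augmented observation vector and the diagonal R assumed
    for it (uncorrelated errors in the transformed space).
    """
    grad = np.diff(y)                 # gradient observations (grid spacing = 1)
    y_aug = np.concatenate([y, grad])
    r_diag = np.concatenate([np.full(y.size, sigma0**2),
                             np.full(grad.size, sigma1**2)])
    return y_aug, np.diag(r_diag)
```

Setting `sigma1 = np.inf` gives the gradient observations zero weight, recovering the diagonal-𝗥 parameterization of experiments 1 and 2.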

Values of observation error standard deviation (*σ*_{0}, m) and gradient error standard deviation (*σ*_{1}, m per grid point) to use for parameterizing observation errors with standard deviation *σ* = 0.04 m and correlation length ℓ (in observation grid points). The correspondence is established using (23), with transformation (46).

* Current affiliation: MERCATOR-Ocean, Toulouse, France.