## 1. Introduction

The Kalman filter provides the optimal (minimum variance) solution of the linear-Gaussian sequential data assimilation problem (Kalman 1960). Since most dynamical and/or observational systems encountered in practice are nonlinear, the system equations are often linearized about the most recent estimate, leading to the popular, but no longer optimal, extended Kalman (EK) filter. Several studies have demonstrated, however, that the linearization of the system may produce instabilities, even divergence, when applied to strongly nonlinear systems (Gauthier et al. 1993; Evensen 1992). For the latter case, an optimal solution can be obtained from the optimal nonlinear filter, which involves the estimation of the conditional probability density function (PDF), not necessarily Gaussian, of the system state given all available measurements up to the estimation time (Doucet et al. 2001). Knowledge of the state’s PDF allows the determination of estimates of the system state, such as the minimum-variance estimate or the maximum a posteriori estimate, following the Bayesian estimation theory (Todling 1999). Similar to the Kalman filter, the nonlinear filter operates as a succession of a *correction* (*or analysis*) *step* at measurement times to correct the predictive density using Bayes’ rule, and a *prediction step* to propagate the analysis density to the time of the next available observation.

The particle filter is a discrete approximation of the optimal nonlinear filter, based on a point-mass representation of the state's PDF (a mixture of Dirac distributions) whose support points are called particles (Doucet et al. 2001). In this filter, the particles evolve in time with the numerical model and their assigned weights are updated each time new measurements are available. The filter solution is then the weighted average of the particle ensemble. In practice, this filter suffers from a major problem known as the degeneracy phenomenon; after only a few iterations, the weights become concentrated on very few particles, so only a tiny fraction of the ensemble contributes to the average, very often causing the divergence of the filter. The use of more particles alleviates this problem over short time periods only, so the most efficient way to get around it is resampling (Doucet et al. 2001). This technique consists of drawing new particles according to the distribution of the ensemble and then assigning them equal weights. However, resampling often introduces Monte Carlo fluctuations, which degrade the filter's performance. Additionally, even with resampling, a large number of particles is required for the filter to behave effectively. This makes brute-force implementation of the particle filter problematic with computationally expensive atmospheric and oceanic models. Interesting discussions on the use of the optimal nonlinear filter for high-dimensional oceanic and atmospheric data assimilation problems can be found in Anderson and Anderson (1999), Kivman (2003), and Van Leeuwen (2003).
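The weight-update, degeneracy, and resampling cycle described above can be illustrated with a minimal bootstrap particle filter. The scalar random-walk model, the noise variances, and the effective-sample-size trigger below are illustrative assumptions, not taken from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_pf_step(particles, weights, y, obs_var=1.0, model_var=0.1):
    """One cycle of a bootstrap particle filter for x_k = x_{k-1} + eta_k, y_k = x_k + eps_k."""
    # Prediction: propagate each particle through the stochastic model.
    particles = particles + rng.normal(0.0, np.sqrt(model_var), size=particles.shape)
    # Correction: reweight by the observation likelihood (Bayes' rule).
    log_lik = -0.5 * (y - particles) ** 2 / obs_var
    weights = weights * np.exp(log_lik - log_lik.max())
    weights /= weights.sum()
    # Resample when the effective ensemble size collapses (degeneracy).
    n_eff = 1.0 / np.sum(weights ** 2)
    if n_eff < 0.5 * particles.size:
        idx = rng.choice(particles.size, size=particles.size, p=weights)
        particles = particles[idx]
        weights = np.full(particles.size, 1.0 / particles.size)  # equal weights after resampling
    return particles, weights

particles = rng.normal(0.0, 1.0, 100)
weights = np.full(100, 1.0 / 100)
for y in [0.3, 0.5, 0.4]:
    particles, weights = bootstrap_pf_step(particles, weights, y)
estimate = np.sum(weights * particles)  # weighted ensemble mean
```

The effective sample size 1/Σ(wⁱ)² is one common, though not the only, way to detect the weight concentration the text describes.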

The popular ensemble Kalman (EnK) filter, introduced by Evensen (1994), is also a particle-based filtering technique. However, while it has the same prediction step as the particle filter, it does not have the same correction step. The EnK filter retains the "linearity aspect" of the Kalman filter in the analysis step, in that it applies the Kalman correction using a forecast error covariance computed as the sample covariance of the particle ensemble. Its correction step therefore uses only the first two moments of the particle ensemble, and is thus suboptimal for non-Gaussian systems. In practical situations, however, the EnK filter was found to be more robust than the particle filter when small-size ensembles were considered, because the Kalman update of its particles significantly reduces the risk of ensemble degeneracy by pulling the particles toward the true state of the system (Kivman 2003; Van Leeuwen 2003).

In this paper, we introduce a new approximate solution of the optimal nonlinear filter suitable for applications in oceanography and meteorology. The filter makes use of a mixture of Gaussian distributions in a kernel representation to approximate the state PDFs in the optimal nonlinear filter. A Gaussian mixture has already been used by Anderson and Anderson (1999) and Chen and Liu (2000) in the context of the nonlinear filter. It is expected to provide a more reliable representation of the state PDFs than the simple mixture of Dirac functions used in the particle filter. Here, we further assume that the covariance matrix of the Gaussian mixtures has low rank, to avoid manipulating the huge matrices associated with the large dimension of oceanic and atmospheric systems. This is a very common approach in the atmospheric and oceanic Kalman filtering community (e.g., Fukumori and Malanotte-Rizzoli 1995; Cane et al. 1996; Cohn and Todling 1996; Pham et al. 1997; Heemink et al. 2001; Lermusiaux and Robinson 1999; Pham 2001; Farrell and Ioannou 2001a; Hoteit et al. 2002, 2003, 2005), and implicitly assumes that state estimation errors can be accurately modeled in a severely reduced dimensional subspace (Lermusiaux and Robinson 1999; Hamill et al. 2002). It enforces a smooth analysis, since the filter correction is only applied using the leading modes of the analysis covariance matrix (Cane et al. 1996; Lermusiaux and Robinson 1999). Smoothness in the analysis is produced by assuming relatively large spatial scales for the uncertainty in the starting conditions for each forecast step, which leads to a concentration of variance in only a few modes of the covariance matrix (Cane et al. 1996; Cohn and Todling 1996; Lermusiaux and Robinson 1999).
In addition, the dissipative and driven nature of geophysical fluid systems also concentrates the energy at large scales, implying a red spectrum of variability (Daley 1991) or, for others, suggesting the existence of a low-dimensional attractor (Pham et al. 1997; Lermusiaux and Robinson 1999). In practice, a red spectrum is often indistinguishable from a low-dimensional attractor, as both can be efficiently described by a limited number of functions or modes (West and Mackey 1991; Osborne and Pastorello 1993). For simplicity, we refer to this as a system with a limited number of "effective" degrees of freedom (EDOF). Additionally, as the covariance matrices of the Gaussian mixtures are kept small during the filter operations, a local linearization about the center of the mixture components (the particles) is applied. This leads to a Kalman-type correction for each particle complementing the usual particle-type correction. The resulting filter, called the low-rank kernel particle Kalman (LRKPK) filter, basically runs an ensemble of low-rank Kalman filters and then provides the optimal (minimum variance) analysis state as the weighted mean of all the subfilters' analyses. As in the EnK filter, the Kalman-type correction attenuates the degeneracy problem, which allows the filter to operate efficiently with relatively small-size ensembles (roughly of the same order as the EnK filter). A similar approach was implemented by Houtekamer and Mitchell (1998), who used a pair of ensemble Kalman filters to deal with the problem of inbreeding generally associated with the use of small-size ensembles. The LRKPK filter is first tested with the simple but chaotic and highly nonlinear Lorenz model (Lorenz 1963). This model is a simplified form of the complicated system describing the dynamics of fluid motion and heat flow, in terms of three coupled ordinary differential equations.
Assimilation results from a realistic application with a general circulation Princeton Ocean Model (POM) of the Mediterranean Sea are then reported and discussed.
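The EOF-based upper bound on the number of effective degrees of freedom mentioned above can be estimated from a sample of model states. The sketch below is a minimal illustration; the synthetic red-spectrum sample and the 95% variance threshold are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic sample of model states with a red (rapidly decaying) variance spectrum.
n_state, n_sample = 200, 50
modes = rng.standard_normal((n_state, n_sample))
decay = np.exp(-0.5 * np.arange(n_sample))           # red spectrum of variability
sample = (modes * decay) @ rng.standard_normal((n_sample, n_sample))

# EOF analysis = SVD of the anomaly (mean-removed) matrix.
anomalies = sample - sample.mean(axis=1, keepdims=True)
_, s, _ = np.linalg.svd(anomalies, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)           # cumulative explained variance

# Number of EOFs needed for 95% of the variance: an upper bound on the EDOF.
edof_bound = int(np.searchsorted(explained, 0.95) + 1)
```

With a red spectrum, only a handful of EOFs are needed, which is exactly what justifies running the filter with a small ensemble.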

The paper is organized as follows. The characteristics of the optimal nonlinear filter are briefly recalled in section 2. The LRKPK filter is then introduced in section 3 and its algorithm is summarized in section 4. The design and preliminary assimilation results of numerical experiments are then presented in sections 5 and 6. A general discussion concludes the paper in section 7.

## 2. The optimal nonlinear filter

Consider the nonlinear stochastic discrete-time system

**x**_{k} = 𝗠_{k}(**x**_{k−1}) + **η**_{k},  (1)

**y**_{k} = 𝗛_{k}(**x**_{k}) + **ε**_{k},  (2)

where **x**_{k} is the state vector (to be estimated) of dimension *n*; **y**_{k} is the observation vector of dimension *p*; 𝗠_{k} and 𝗛_{k} are two continuously differentiable maps from ℝ^{n} to ℝ^{n} and from ℝ^{n} to ℝ^{p}, respectively representing the transition and the observational operators; and **η**_{k} and **ε**_{k} denote the dynamical and the observational noise. We assume that the random vectors **η**_{k} and **ε**_{k} are independent of each other and independent of **x**_{k−1}, and are Gaussian with mean zero and covariance matrices 𝗤_{k} and 𝗥_{k}, respectively. Such a system is somewhat less general than the one considered in particle filtering, but it is still quite general and covers most practical situations.

Starting from a random initial condition **x**_{0} with a known probability density function, the optimal nonlinear filter provides the conditional density function of the system state **x**_{k} at each time *t*_{k}, given all available measurements up to *t*_{k}. To simplify the notation, we shall write **y**_{1:k} as a shorthand for **y**_{1}, . . . , **y**_{k}. Let *p*_{k|k−1}(·|**y**_{1:k−1}) be the conditional (predictive) density function of **x**_{k} given **y**_{1:k−1}, and let *p*_{k}(·|**y**_{1:k}) be the conditional (analysis) density function of **x**_{k} given **y**_{1:k}. The nonlinear filtering algorithm consists of two steps, which we summarize below. The reader is referred to Doucet et al. (2001) for an extensive description of the filter.

*The prediction step*. Suppose that the required analysis density *p*_{k−1}(·|**y**_{1:k−1}) at time *t*_{k−1} is available. The prediction step involves using the model Eq. (1) to obtain the predictive density *p*_{k|k−1}(·|**y**_{1:k−1}) at the time *t*_{k} of the next available observation via the Chapman–Kolmogorov equation

*p*_{k|k−1}(**x**|**y**_{1:k−1}) = ∫_{ℝ^{n}} *p*(**x**_{k} = **x**|**x**_{k−1} = **u**) *p*_{k−1}(**u**|**y**_{1:k−1}) *d***u**,

where *p*(**x**_{k} = **x**|**x**_{k−1} = **u**) is the conditional density of the state vector **x**_{k} to be at **x** at time *t*_{k} given that it was at **u** at time *t*_{k−1}. Under the assumptions made on the model noise, *p*(**x**_{k}|**x**_{k−1} = **u**) = *ϕ*[**x**_{k} − 𝗠_{k}(**u**); 𝗤_{k}], where *ϕ*[·; **Σ**] denotes the Gaussian density with zero mean and covariance matrix **Σ**. Thus,

*p*_{k|k−1}(**x**|**y**_{1:k−1}) = ∫_{ℝ^{n}} *ϕ*[**x** − 𝗠_{k}(**u**); 𝗤_{k}] *p*_{k−1}(**u**|**y**_{1:k−1}) *d***u**.  (5)

*The correction step*. After a new observation **y**_{k} has been made, we recover the analysis density *p*_{k}(·|**y**_{1:k}) at time *t*_{k} using Bayes' rule,

*p*_{k}(**x**|**y**_{1:k}) = *b*^{−1}_{k} *p*_{k|k−1}(**x**|**y**_{1:k−1}) *ϕ*[**y**_{k} − 𝗛_{k}(**x**); 𝗥_{k}].

The analysis density is therefore obtained by multiplying the prior predictive density by the observation likelihood, normalized by *b*_{k} = ∫_{ℝ^{n}} *p*_{k|k−1}(**u**|**y**_{1:k−1}) *ϕ*[**y**_{k} − 𝗛_{k}(**u**); 𝗥_{k}] *d***u** to ensure a probability density.
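For a scalar state, the two steps above can be carried out exactly by quadrature on a grid. The sketch below (the linear model, noise variances, and grid bounds are illustrative assumptions) implements the Chapman–Kolmogorov prediction and the Bayes correction directly:

```python
import numpy as np

# Grid discretization of a scalar state space.
x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]

def gauss(z, var):
    return np.exp(-0.5 * z**2 / var) / np.sqrt(2.0 * np.pi * var)

def predict(p_analysis, model=lambda u: 0.9 * u, q_var=0.5):
    """Chapman-Kolmogorov: integrate the transition density against the analysis density."""
    # trans[i, j] = p(x_k = x_i | x_{k-1} = x_j) for the model x_k = M(x_{k-1}) + eta_k.
    trans = gauss(x[:, None] - model(x[None, :]), q_var)
    return trans @ p_analysis * dx

def correct(p_forecast, y, r_var=1.0):
    """Bayes' rule: multiply by the observation likelihood and renormalize by b_k."""
    posterior = p_forecast * gauss(y - x, r_var)   # observation y_k = x_k + eps_k
    return posterior / (posterior.sum() * dx)      # normalization constant b_k

p = gauss(x - 1.0, 2.0)        # initial (Gaussian) analysis density
p = predict(p)
p = correct(p, y=0.5)
mean = np.sum(x * p) * dx      # minimum-variance estimate
```

This brute-force quadrature is exactly what becomes infeasible in high dimension, which motivates the particle and kernel approximations discussed next.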

While the expressions of the state PDFs can be easily obtained, determining the value of the predictive density at each point in state space is practically impossible for large-dimensional systems, as in meteorology and oceanography. This would require the evaluation of 𝗠_{k}(**x**) for a large set of values of **x**, while one single evaluation can already be quite costly in realistic applications. Particle filters approximate the state PDFs by convex mixtures of Dirac functions. In the next section, we resort to the kernel method to approximate the state PDFs by mixtures of Gaussian distributions.

## 3. The low-rank kernel particle Kalman filter

Given *N* independent samples (observations) **x**^{1}, . . . , **x**^{N} from a (multivariate) density *p*, an estimator *p̂* of *p* can be obtained by the kernel method as a mixture of *N* Gaussian densities (Silverman 1986),

*p̂*(**x**) = (1/*N*) Σ^{N}_{i=1} *ϕ*[**x** − **x**^{i}; 𝗣],

where 𝗣 is a positive definite matrix. In practice, 𝗣 is very often taken to be *h*^{2} times the sample covariance matrix of the observations, where *h* is a bandwidth parameter to be chosen. Inspired by this estimator, we propose to approximate the state PDFs in the optimal nonlinear filter by mixtures of *N* Gaussian densities of the form

*p*_{t}(**x**) = Σ^{N}_{i=1} *w*^{i}_{t} *ϕ*[**x** − **x**^{i}_{t}; 𝗣_{t}],

where the subscript *t* replaces *k* at the analysis time and *k* + 1|*k* at the prediction time, the **x**^{i}_{t} are vectors in ℝ^{n} called particles, the *w*^{i}_{t} are probabilities (also called weights), and 𝗣_{t} is a positive definite matrix. Similar estimators have already been considered by Anderson and Anderson (1999) and Chen and Liu (2000). Here, we further assume that the matrices 𝗣_{t} are *small*, in some sense, and of low rank *N* − 1 (the rank of a sample covariance matrix of an ensemble of *N* members). The first condition will be used to locally linearize the model around the particles. The second condition is needed for applications to high-dimensional systems, as in atmospheric and oceanic data assimilation problems where *n* is of the order of 10^{8}, because the manipulation of the associated full-size covariance matrices is not possible in practice. This allows the decomposition

𝗣_{t} = 𝗟_{t}𝗨_{t}𝗟^{T}_{t},  (9)

where 𝗟_{t} and 𝗨_{t} are *n* × (*N* − 1) and (*N* − 1) × (*N* − 1) matrices, respectively, which avoids the manipulation of the huge matrices 𝗣_{t} by only carrying the matrices 𝗟_{t} and 𝗨_{t} in the filter's algorithm. As discussed by Pham (2001), the approximated rank (the number of particles minus one) should be larger than the number of EDOF to expect an acceptable filter behavior. A good behavior is achieved when the filter brings the estimation error down to an acceptable level of observational and representational error (Cane et al. 1996; Lermusiaux and Robinson 1999). An estimate of the EDOF can be made from the number of positive Lyapunov exponents (Kaplan and Yorke 1979), or can be bounded above by the empirical orthogonal function (EOF) spectrum, that is, the number of EOFs needed to account for most of the system variance (Farrell and Ioannou 2001b). For simplicity and convenience, we only compute the latter in our numerical applications, which is sufficient for setting an appropriate rank for the mixture covariance matrices.

The matrices 𝗟_{t} will be determined as a function of the particles **x**^{i}_{t} following

𝗟_{t} = 𝗫_{t}𝗧,  (10)

where 𝗫_{t} = [**x**^{1}_{t} · · · **x**^{N}_{t}], and 𝗧 is an *N* × (*N* − 1) full-rank matrix with zero column sums. A convenient choice of such a matrix is

𝗧 = [𝗜_{N−1}; 𝟬_{1×(N−1)}] − (1/*N*) 𝟭_{N×(N−1)},

where 𝗜_{N−1} is the identity matrix, 𝟬 a row of zeros, and 𝟭 the matrix of ones. Postmultiplication with 𝗧 implicitly subtracts the {**x**^{1}_{t} · · · **x**^{N}_{t}}-ensemble mean. An efficient algorithm to recursively update the parameters *w*^{i}_{t}, **x**^{i}_{t}, and 𝗨_{t} of both mixtures is provided below. The algorithm consists of a correction step and a prediction step, and is complemented by a resampling step whenever needed.

_{t}### a. The initialization step

At the initial time, draw *N* particles **x**^{i}_{1|0} from the unconditional distribution of the state vector **x**_{1}, and take *w*^{i}_{1|0} = 1/*N* and 𝗣_{1|0} = *h*^{2} cov(**x**^{i}_{1|0}; *w*^{i}_{1|0}), where *h* is a small tuning parameter and cov(**x**^{i}; *w*^{i}) denotes the sample covariance matrix of the particles **x**^{i} associated with the weights *w*^{i}, namely

cov(**x**^{i}; *w*^{i}) = Σ^{N}_{i=1} *w*^{i} (**x**^{i} − **x̄**)(**x**^{i} − **x̄**)^{T}, with **x̄** = Σ^{N}_{i=1} *w*^{i}**x**^{i}.

The above initialization results in a predictive density *p*_{1|0} centered about the average of the **x**^{i}_{1|0}, with a covariance matrix (1 + *h*^{2}) cov(**x**^{i}_{1|0}; *w*^{i}_{1|0}), which is larger than the theoretical covariance matrix of **x**_{1} by about a factor of (1 + *h*^{2})(*N* − 1)/*N*. This can be beneficial, as it means that we err on the safe side by overestimating the initial error covariance matrix (Anderson and Anderson 1999).

In realistic applications, the dimension *n* of the state vector is much larger than the number of particles *N* that can be used. The matrix cov(**x**^{i}; *w*^{i}) will therefore be singular, of rank ≤ *N* − 1. Letting 𝗫 = [**x**^{1} · · · **x**^{N}] and 𝗪 be the diagonal matrix with diagonal elements *w*^{1}, . . . , *w*^{N}, it is possible to verify that cov(**x**^{i}; *w*^{i}) can be decomposed as

cov(**x**^{i}; *w*^{i}) = 𝗫𝗧(𝗧^{T}𝗪^{−1}𝗧)^{−1}𝗧^{T}𝗫^{T}.  (14)

The matrix 𝗣_{1|0} can therefore be factorized as in Eqs. (9)–(10) into 𝗣_{1|0} = 𝗟_{1|0}𝗨_{1|0}𝗟^{T}_{1|0}, with

𝗟_{1|0} = 𝗫_{1|0}𝗧 and 𝗨_{1|0} = *h*^{2}(𝗧^{T}𝗪^{−1}_{1|0}𝗧)^{−1}.

In practice, very little information is available on the distribution of the initial state vector *p*_{x1}. However, this is often not a serious problem, since several studies have found that *p*_{x1} need not be set with high accuracy, as it does not have a significant impact on the long-term behavior of the filter (Doucet et al. 2001). In atmospheric and oceanic applications, it is important that the initial estimate of *p*_{x1} take into account the main physical quantities that govern the evolution of the state of these systems. Omitting such quantities may badly affect the filter's behavior, as they tend to persist over time. We therefore estimate the statistics of *p*_{x1} from a sample of model outputs. In the absence of prior information, we follow the common practice and assume *p*_{x1} to be Gaussian (Doucet et al. 2001). Under the assumption of a low-rank covariance matrix, a second-order exact drawing can be performed as described by Pham (2001) to sample the **x**^{i}_{1|0} such that their mean and sample covariance matrix exactly match the mean and 1/(1 + *h*^{2}) times the covariance matrix of **x**_{1|0}. The covariance matrix of the resulting initial density estimate *p*_{1|0} then matches that of *p*_{x1}.
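A second-order exact drawing in the spirit of Pham (2001) can be sketched as follows: the ensemble is generated so that its mean and (equal-weight) sample covariance match prescribed targets exactly. The target moments and dimensions are illustrative, and the 1/(1 + *h*²) scaling of the covariance is omitted for clarity:

```python
import numpy as np

rng = np.random.default_rng(3)

n, N = 8, 4                           # state dimension and ensemble size (illustrative)
m = rng.standard_normal(n)            # target mean
S = rng.standard_normal((n, N - 1))   # target covariance factor: P = S S^T, rank N-1

# Random N x (N-1) matrix with orthonormal columns, each orthogonal to the ones vector.
G = rng.standard_normal((N, N - 1))
G -= np.ones((N, 1)) @ (np.ones((1, N)) @ G) / N   # project out the ones direction
Omega, _ = np.linalg.qr(G)                          # orthonormalize the columns

# Second-order exact ensemble: x^i = m + sqrt(N) * S * Omega_i.
X = m[:, None] + np.sqrt(N) * S @ Omega.T

# With weights 1/N, the ensemble mean and sample covariance match the targets exactly.
mean_err = np.abs(X.mean(axis=1) - m).max()
cov = (X - m[:, None]) @ (X - m[:, None]).T / N
cov_err = np.abs(cov - S @ S.T).max()
```

Because Ω has orthonormal columns orthogonal to the ones vector, no sampling error is introduced in the first two moments, which is the point of a "second-order exact" draw.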

### b. The correction step

Suppose that at the time of a new observation **y**_{k}, the predictive density is available as a Gaussian mixture with particles **x**^{i}_{k|k−1}, weights *w*^{i}_{k|k−1}, and covariance matrix 𝗣_{k|k−1}. Bayes' rule then gives, up to normalization,

*p*_{k}(**x**|**y**_{1:k}) ∝ Σ^{N}_{i=1} *w*^{i}_{k|k−1} *ϕ*[**x** − **x**^{i}_{k|k−1}; 𝗣_{k|k−1}] *ϕ*[**y**_{k} − 𝗛_{k}(**x**); 𝗥_{k}].  (17)

Since 𝗣_{k|k−1} is small, *ϕ*[**x** − **x**^{i}_{k|k−1}; 𝗣_{k|k−1}] becomes negligible as soon as **x** is not sufficiently close to **x**^{i}_{k|k−1}. This allows the linearization of the observation operator 𝗛_{k} around **x**^{i}_{k|k−1}, and therefore the approximation of the *i*th component of Eq. (17) by

*w*^{i}_{k|k−1} *ϕ*[**x** − **x**^{i}_{k|k−1}; 𝗣_{k|k−1}] *ϕ*[**y**_{k} − **y**^{i}_{k|k−1} − **H**^{i}_{k}(**x** − **x**^{i}_{k|k−1}); 𝗥_{k}],  (18)

where **y**^{i}_{k|k−1} = 𝗛_{k}(**x**^{i}_{k|k−1}) and **H**^{i}_{k} denotes the gradient of 𝗛_{k} at the point **x**^{i}_{k|k−1}. With the significant computational burden in mind, we further assume that 𝗛_{k} is *nearly linear*, so that its gradient depends only weakly on the point where it is evaluated. This allows us to take **H**^{i}_{k} ≡ **H**_{k} (the gradient of 𝗛_{k} at the predicted state Σ^{N}_{i=1} *w*^{i}_{k|k−1}**x**^{i}_{k|k−1}, e.g.). Following calculations similar to those used in the derivation of the standard Kalman filter (e.g., Todling 1999), expression Eq. (18) can be rewritten as

*w*^{i}_{k|k−1} *ϕ*[**y**_{k} − **y**^{i}_{k|k−1}; **Σ**_{k}] *ϕ*[**x** − **x**^{i}_{k}; 𝗣_{k}],  (19)

where

**x**^{i}_{k} = **x**^{i}_{k|k−1} + 𝗚_{k}(**y**_{k} − **y**^{i}_{k|k−1}),  (20)

𝗚_{k} = 𝗣_{k|k−1}**H**^{T}_{k}**Σ**^{−1}_{k},  (21)

**Σ**_{k} = **H**_{k}𝗣_{k|k−1}**H**^{T}_{k} + 𝗥_{k},  (22)

𝗣_{k} = (𝗜 − 𝗚_{k}**H**_{k})𝗣_{k|k−1}.  (23)

The analysis density *p*_{k}(**x**|**y**_{1:k}) can therefore be approximated by the Gaussian mixture

*p*_{k}(**x**|**y**_{1:k}) = Σ^{N}_{i=1} *w*^{i}_{k} *ϕ*[**x** − **x**^{i}_{k}; 𝗣_{k}],  (24)

where the new weights *w*^{i}_{k} are updated with

*w*^{i}_{k} ∝ *w*^{i}_{k|k−1} *ϕ*[**y**_{k} − **y**^{i}_{k|k−1}; **Σ**_{k}], normalized so that Σ^{N}_{i=1} *w*^{i}_{k} = 1.  (25)

This shows that *p*_{k}(·|**y**_{1:k}) is also a mixture of Gaussian densities, as stated before. Moreover, the covariance matrix 𝗣_{k} of the components of this mixture, being bounded above by the assumed small 𝗣_{k|k−1} in Eq. (23), remains small.

The filter's correction step can be interpreted as composed of two types of corrections: a *Kalman-type correction* defined by Eqs. (20)–(23) and a *particle-type correction* defined by Eqs. (24)–(25). The Kalman-type correction reduces the risk of degeneracy by pulling the particles toward the true state of the system [Eq. (20)]. This can also be seen from Eq. (25), which has the same form as the standard particle reweighting equation but uses the covariance matrix of the predictive measure **Σ**_{k} as the "observation covariance" matrix rather than 𝗥_{k}, which is used in the standard particle filter. Since **Σ**_{k} is always greater than 𝗥_{k}, the particles close to the observation will receive somewhat less weight than in the standard particle filter, while those far from the observation will receive more. This means that the supports of the local predictive density and of the likelihood will be more coherent than in the particle filter. Resampling will therefore be needed less often, reducing Monte Carlo fluctuations.

Recall that 𝗣_{k|k−1} was decomposed into 𝗟_{k|k−1}𝗨_{k|k−1}𝗟^{T}_{k|k−1} as in Eqs. (9)–(10). Using Eq. (23), it can be seen that 𝗣_{k} can also be factorized into 𝗟_{k|k−1}𝗨_{k}𝗟^{T}_{k|k−1}, where

𝗨_{k} = 𝗨_{k|k−1} − 𝗨_{k|k−1}(**H**_{k}𝗟_{k|k−1})^{T}**Σ**^{−1}_{k}(**H**_{k}𝗟_{k|k−1})𝗨_{k|k−1}.

Letting 𝗟_{k} = [**x**^{1}_{k} · · · **x**^{N}_{k}]𝗧 = 𝗫_{k}𝗧 and using Eqs. (10) and (23), we have 𝗟_{k} = 𝗟_{k|k−1}𝗕_{k} for an (*N* − 1) × (*N* − 1) matrix 𝗕_{k}. This shows that the covariance matrix of the analysis mixture 𝗣_{k} remains of the form 𝗟_{k}𝗨_{k}𝗟^{T}_{k} as in Eqs. (9)–(10), with 𝗨_{k} reexpressed accordingly in the basis 𝗟_{k}. Note that when the observational operator 𝗛_{k} is linear,

**H**_{k}𝗟_{k|k−1} = [**y**^{1}_{k|k−1} · · · **y**^{N}_{k|k−1}]𝗧.

This formula can still be used when 𝗛_{k} is nonlinear to avoid linearization, as proposed by Pham (2001).
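One complete correction step, combining the Kalman-type and particle-type corrections for a linear observation operator, might be sketched as follows. The dimensions, ensemble, and error covariances are illustrative assumptions, and for clarity the full matrix 𝗣_{k|k−1} is formed explicitly instead of carrying the low-rank 𝗟𝗨𝗟^{T} factors:

```python
import numpy as np

rng = np.random.default_rng(4)

n, p, N = 5, 2, 4
X = rng.standard_normal((n, N))           # forecast particles x^i_{k|k-1}
w = np.full(N, 1.0 / N)                   # forecast weights
H = rng.standard_normal((p, n))           # linear observation operator
R = np.eye(p)                             # observation error covariance R_k
h2 = 0.3                                  # squared bandwidth (tuning parameter)
P = h2 * np.cov(X)                        # mixture covariance P_{k|k-1}

Sig = H @ P @ H.T + R                     # Sigma_k: predictive "observation covariance"
G = P @ H.T @ np.linalg.inv(Sig)          # Kalman gain G_k
y = rng.standard_normal(p)                # new observation y_k

# Kalman-type correction: pull every particle toward the observation.
Y = H @ X                                 # predicted observations y^i_{k|k-1}
Xa = X + G @ (y[:, None] - Y)

# Particle-type correction: reweight with Sigma_k (not R) as observation covariance.
innov = y[:, None] - Y
log_lik = -0.5 * np.einsum('ji,jk,ki->i', innov, np.linalg.inv(Sig), innov)
w = w * np.exp(log_lik - log_lik.max())
w /= w.sum()

Pa = (np.eye(n) - G @ H) @ P              # analysis mixture covariance P_k
analysis = Xa @ w                         # minimum-variance analysis estimate
```

Using **Σ**_{k} instead of 𝗥_{k} in the reweighting is what flattens the weight distribution relative to the standard particle filter.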

_{k}### c. The prediction step

Suppose that at time *t*_{k} the analysis density *p*_{k}(·|**y**_{1:k}) is available in the form of a mixture of Gaussian densities Eq. (24), with the mixture covariance matrix 𝗣_{k} being small and factorized as 𝗟_{k}𝗨_{k}𝗟^{T}_{k}. Using Eq. (5), the predictive density at the next step is

*p*_{k+1|k}(**x**|**y**_{1:k}) = Σ^{N}_{i=1} *w*^{i}_{k} ∫_{ℝ^{n}} *ϕ*[**x** − 𝗠_{k+1}(**u**); 𝗤_{k+1}] *ϕ*[**u** − **x**^{i}_{k}; 𝗣_{k}] *d***u**.

Again, since *ϕ*[**u** − **x**^{i}_{k}; 𝗣_{k}] becomes negligible as soon as **u** is not sufficiently close to **x**^{i}_{k}, the model can be linearized around the **x**^{i}_{k} to obtain the approximation

𝗠_{k+1}(**u**) ≈ 𝗠_{k+1}(**x**^{i}_{k}) + **M**^{i}_{k+1}(**u** − **x**^{i}_{k}),  (30)

where **M**^{i}_{k+1} denotes the gradient of 𝗠_{k+1} at **x**^{i}_{k}. As before, we will assume that 𝗠_{k+1} is nearly linear, so that its gradient depends only weakly on the point where it is evaluated, and therefore take **M**^{i}_{k+1} ≡ **M**_{k+1}. With these approximations, the integral in the above equation appears as the density of the sum of a Gaussian random vector of mean 𝗠_{k+1}(**x**^{i}_{k}) and covariance matrix 𝗤_{k+1}, and an independent Gaussian vector of mean zero and covariance matrix **M**_{k+1}𝗣_{k}**M**^{T}_{k+1}. Therefore,

*p*_{k+1|k}(**x**|**y**_{1:k}) ≈ Σ^{N}_{i=1} *w*^{i}_{k} *ϕ*[**x** − 𝗠_{k+1}(**x**^{i}_{k}); **M**_{k+1}𝗣_{k}**M**^{T}_{k+1} + 𝗤_{k+1}].

This shows that the predictive density remains a mixture of Gaussian densities with means **x**^{i}_{k+1|k} = 𝗠_{k+1}(**x**^{i}_{k}), covariance matrix

𝗣_{k+1|k} = **M**_{k+1}𝗣_{k}**M**^{T}_{k+1} + 𝗤_{k+1},  (31)

and weights *w*^{i}_{k+1|k} = *w*^{i}_{k}. Since the covariance matrix of the analysis density 𝗣_{k} was decomposed into 𝗟_{k}𝗨_{k}𝗟^{T}_{k} with 𝗟_{k} = 𝗫_{k}𝗧, the above equation becomes

𝗣_{k+1|k} = (**M**_{k+1}𝗟_{k})𝗨_{k}(**M**_{k+1}𝗟_{k})^{T} + 𝗤_{k+1}.

To avoid the linearization of the model, we follow Pham (2001) and compute 𝗟_{k+1|k} from

𝗟_{k+1|k} = [𝗠_{k+1}(**x**^{1}_{k}) · · · 𝗠_{k+1}(**x**^{N}_{k})]𝗧 = 𝗫_{k+1|k}𝗧.  (32)

In the case of a perfect model, that is, with no dynamical noise (𝗤_{k+1} = 0), 𝗣_{k+1|k} is again of the form Eqs. (9)–(10). The correction step can therefore be repeated as before. When the model is imperfect, the rank of the mixture covariance matrix 𝗣_{k+1|k} will continuously increase without limit, even if 𝗤_{k+1} is of low rank. Under the assumption of a low-rank 𝗤_{k+1}, however, several techniques can be used to avoid this problem: for example, projecting the dynamical noise onto the subspace spanned by 𝗟_{k+1|k}, reapproximating the matrix 𝗣_{k+1|k} by an (*N* − 1)-rank matrix, or using an ensemble representation of the model error, so that 𝗣_{k+1|k} remains of the form 𝗟_{k+1|k}𝗨_{k+1|k}𝗟^{T}_{k+1|k}, as suggested by Hoteit et al. (2007).
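The linearization-free computation of 𝗟_{k+1|k} amounts to propagating each particle with the full model and re-forming 𝗟 from the forecast ensemble. A minimal sketch, with an illustrative toy model standing in for the dynamical core:

```python
import numpy as np

rng = np.random.default_rng(5)

def model(x):
    """Illustrative nonlinear model M_{k+1} (stands in for the real dynamical model)."""
    return x + 0.1 * np.sin(x)

N = 4
Xa = rng.standard_normal((3, N))   # analysis particles x^i_k (state dimension 3)
T = np.vstack([np.eye(N - 1), np.zeros((1, N - 1))]) - np.ones((N, N - 1)) / N

# Propagate every particle; weights are unchanged in the prediction step.
Xf = model(Xa)                     # x^i_{k+1|k} = M_{k+1}(x^i_k)

# L_{k+1|k} is recomputed from the forecast ensemble; no model gradient is needed.
Lf = Xf @ T
```

Because 𝗧 has zero column sums, adding the same constant to every forecast particle leaves 𝗟_{k+1|k} unchanged, which is the mean-removal property used throughout.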

### d. The resampling step

Resampling is crucial in the particle filter to avoid the degeneracy of the particles. The same problem might also occur in the LRKPK filter, albeit to a lesser degree, since the matrix **Σ**_{k} in Eq. (22) is greater than 𝗥_{k}, which means that the weights are distributed more uniformly than in the particle filter. Another problem that may occur in our filter is that the matrix 𝗣_{k+1|k} in Eq. (31) is generally greater than 𝗣_{k}, because of the presence of dynamical noise and/or the amplifying effect of the multiplication by **M**_{k+1}; hence the mixture covariance matrices might become too large to justify the approximations needed to derive Eqs. (18) and (30). We rely on resampling to reduce the size of 𝗣_{k+1|k}. Note that a standard "full" resampling would require particle selection, which eliminates particles with low weights and duplicates particles with high weights. To avoid the Monte Carlo fluctuations associated with such a selection, a "partial" resampling is used here when the weights are already sufficiently uniform. Partial resampling is less destructive (of information) than full resampling: it amounts to adding some noise to the particles while the weights remain unchanged. This is similar to the resampling procedure of the standard particle filter, which adds noise to the particles to mimic the dynamical noise. In any case, since resampling inevitably entails some loss of accuracy, it should be used only when necessary.

Let **x̃**^{i}_{k+1|k} denote the new particles after resampling, and 𝗣̃_{k+1|k} the covariance matrix of the corresponding mixture [always decomposed as in Eqs. (9)–(10)]. The weights *w̃*^{i}_{k+1|k} are set to 1/*N* if a full resampling is performed, and remain unchanged (= *w*^{i}_{k}) in the case of a partial resampling.

^{i}_{k}*p̃*

_{k+1|k}(

**x**|

**y**

_{1:k}) is to follow the theory of the kernel density estimator (Silverman 1986); draw the

**x̃**

^{i}

_{k+1|k}according to the density Eq. (34); and then take

_{k+1|k}=

*h*

^{2}

**Π**

_{k+1|k}, where

**Π**

_{k+1|k}is the covariance matrix of the predictive density,and

*h*is a small bandwidth parameter, but can be chosen to some extent. The kernel density estimator is, however, biased: if the

**x̃**

^{i}

_{k+1|k}are sampled according to Eq. (34), then Eq. (35) has the expectationThe above calculations show that the bias can be completely eliminated if the

**x̃**

^{i}

_{k+1|k}is drawn from a density of the same form as Eq. (34), but replacing 𝗣

_{k+1|k}with 𝗣

_{k+1|k}−

_{k+1|k}. This obviously requires the matrix 𝗣

_{k+1|k}−

_{k+1|k}to be nonnegative. Moreover, bias is not the only criterion, and the variance needs to be considered as well (a small bias might be allowed if it improves the variance and achieves less estimation error). We therefore propose to draw the

**x̃**

^{i}

_{k+1|k}from the densitywhere

*h̃*∈ (0,

*h*) is another parameter. The idea is then to choose the

*h*and

*h̃*that minimize the mean integrated squares error (MISE) of the estimator Eq. (35). It turned out that for a given

*h̃*, the MISE is (asymptotically) minimized with respect to

*h*≥

*h̃*at a unique point

*h*

_{opt}(

*h̃*), and further the attained minimum decreases as

*h̃*increases. This means that

*h̃*should be chosen as large as possible. But

*h̃*is constrained by the condition 𝗣

_{k+1|k}−

*h̃*

^{2}

**Π**

_{k+1|k}> 0 and the corresponding

_{k+1|k}is small. Given Eqs. (14) and (32), the first constraint is equivalent to 𝗨

_{k}−

*h̃*

^{2}𝗩

_{k}> 0, where

^{2}This is true if 𝗖

^{−1}

_{k}𝗨

_{k}(𝗖

^{T}

_{k})

^{−1}>

*h̃*

^{2}𝗜

_{d}(𝗜

*being the identity matrix), with 𝗖*

_{d}_{k}𝗖

^{T}

_{k}is the Cholesky decomposition of 𝗩

*. In general, this constraint always entails the second one, so the latter may be ignored. We therefore choose*

_{k}*h̃*as the square root of the smallest eigenvalue

*h̃** of the matrix 𝗖

^{−1}

_{k}𝗨

_{k}(𝗖

^{T}

_{k})

^{−1}. Then the corresponding

*h*

_{opt}(

*h̃*) can be computed.

In practice, we found that *h̃** can be quite small. As a result, the Kalman-type correction had little effect, and the filter behaved more like a particle filter, which is undesirable. As we care more about the behavior of the filter than about the accuracy of the density estimator, it is of interest to focus on the reduction of Monte Carlo fluctuations at the expense of bias. Such bias does not cause great harm; it actually means that the predictive density *p*_{k+1|k}, and therefore the particle ensemble, is more spread out than the true one, and this helps reduce the risk of degeneracy. Intuitively, as *p*_{k+1|k} is made more diffuse, the filter relies more on recent observations than on the model and past observations. This has an effect similar to the widely used forgetting (or inflation) factor in Kalman filtering (Jazwinski 1970), and helps attenuate the propagation of the different sources of error in the filter (such as Monte Carlo fluctuations, the low-rank and model error approximations, and the system linearization in our case). It is therefore quite reasonable to sacrifice some of the filter's performance (on average) to reduce the risk of divergence. This means that it can be more beneficial to use a value of *h* larger than the "optimal" one. In the absence of a precise rule, we will consider *h* as a tunable parameter and select it empirically, by trial and error.

For partial resampling, we draw the new particles **x̃**^{i}_{k+1|k} according to the Gaussian density of mean **x**^{i}_{k+1|k} and covariance matrix 𝗣_{k+1|k} − *h̃*^{2}**Π**_{k+1|k}. The above calculations concerning the bias of the density estimator *p̃*_{k+1|k}(·|**y**_{1:k}) remain valid, but not those concerning the variance, since the **x̃**^{i}_{k+1|k} are no longer drawn from the same distribution. We therefore take *h̃* = *h̃**, and for the same reasons as above, we choose *h* a priori as a tuning parameter.

Note that after a full or a partial resampling, 𝗣_{k+1|k} is reset to 𝗣̃_{k+1|k} = *h*^{2}**Π**_{k+1|k}. The matrix 𝗨_{k+1|k} then needs to be replaced by a matrix 𝗨̃_{k+1|k} such that 𝗣̃_{k+1|k} is again factorized into 𝗟̃_{k+1|k}𝗨̃_{k+1|k}𝗟̃^{T}_{k+1|k}, where 𝗟̃_{k+1|k} is a function of the new particles, that is, 𝗟̃_{k+1|k} = 𝗫̃_{k+1|k}𝗧.

Resampling should be applied if the weights *w*^{i}_{k+1|k} are not uniformly distributed (full resampling), or if the matrix 𝗣_{k+1|k} is not small (partial resampling). A simple test for the first condition is to consider the entropy of the weights, −Σ^{N}_{i=1} *w*^{i}_{k+1|k} log *w*^{i}_{k+1|k}, which reaches its maximum, log *N*, when the distribution of the weights *w*^{i}_{k+1|k} is uniform. Thus, a full resampling can be applied when the entropy-based quantity 𝗘_{k+1} of Eq. (40) exceeds some threshold. Concerning the second condition, the largest eigenvalues of the matrix 𝗖^{−1}_{k}𝗨_{k}(𝗖^{T}_{k})^{−1} should be monitored. In practice, the filter's covariance matrices 𝗣_{k} and 𝗣_{k+1|k} do not change much after a prediction or a correction step. After a resampling step, these matrices can therefore be expected to remain small for at least several filtering cycles. This is of course problem dependent, but it is often the case in practice, as the sampling interval is generally small (so that the system does not change much within it) and the information provided by a single observation is rarely significant enough to produce a large correction. To avoid the computation of the eigenvalues of 𝗖^{−1}_{k}𝗨_{k}(𝗖^{T}_{k})^{−1}, a simple rule is to wait, say, *m* filtering cycles before resampling again. Resampling is then partial if the entropy test reveals that the distribution of the weights is still not too far from uniform.
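The entropy test can be sketched as follows. Since Eq. (40) is not reproduced here, the deficit log *N* − (−Σ *w* log *w*) and the threshold value used below are plausible assumptions rather than the paper's exact definition:

```python
import numpy as np

def resampling_decision(w, threshold=0.2, eps=1e-300):
    """Entropy test: -sum w log w is maximal (log N) for uniform weights.
    The deficit from log N measures how far the ensemble is from uniform."""
    N = w.size
    entropy = -np.sum(w * np.log(w + eps))
    deficit = np.log(N) - entropy        # zero for perfectly uniform weights
    return "full" if deficit > threshold else "partial"

uniform = np.full(10, 0.1)               # healthy ensemble: near-uniform weights
skewed = np.array([0.91] + [0.01] * 9)   # degenerate ensemble: one dominant particle
```

For the uniform weights the deficit is essentially zero, so only a partial resampling (noise injection) is triggered; the skewed weights trigger a full resampling with particle selection.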

## 4. Summary of the LRKPK filter algorithm

The filter's algorithm is summarized below. The initialization step is of course applied only once. The prediction and correction steps are repeated at every cycle, while resampling can be skipped or performed only partially. After every prediction or correction step, the minimum-variance estimate of the system state is obtained as the weighted mean of the predicted or the analyzed particles.

- *Initialization.* Starting from an initial prediction state of mean **m**_{0} and covariance matrix 𝗣_{0}, draw an ensemble **x**^{1}_{1|0}, . . . , **x**^{N}_{1|0} according to the Gaussian distribution *ϕ*[**m**_{0}; (1/(1 + *h*^{2}))𝗣_{0}]. Then set *w*^{i}_{1|0} = 1/*N* for all *i* and take 𝗣_{1|0} = 𝗟_{1|0}𝗨_{1|0}𝗟^{T}_{1|0}, where 𝗟_{1|0} = 𝗫_{1|0}𝗧 and 𝗨_{1|0} = *h*^{2}(𝗧^{T}𝗪^{−1}_{1|0}𝗧)^{−1} = (*h*^{2}/*N*)(𝗧^{T}𝗧)^{−1}.
- *Correction step.* First compute **y**^{i}_{k|k−1} = 𝗛_{k}(**x**^{i}_{k|k−1}) for each *i* and determine **Σ**_{k} from Eq. (22). Then apply a
  - *Kalman-type correction:* compute 𝗚_{k} from Eq. (21) using Eq. (26), and use it to correct the forecast particles **x**^{i}_{k|k−1} with the new observation **y**_{k} via Eq. (20) to obtain the analysis particles **x**^{i}_{k}. Then take 𝗟_{k} = 𝗫_{k}𝗧 and update 𝗨_{k} by Eq. (28) so that 𝗣_{k} = 𝗟_{k}𝗨_{k}𝗟^{T}_{k}.
  - *Particle-type correction:* update the particle weights with Eq. (25).
- *Prediction step.* Integrate each particle **x**^{i}_{k} with the model to the time of the next available observation to determine the **x**^{i}_{k+1|k}, and keep the weights unchanged, *w*^{i}_{k+1|k} = *w*^{i}_{k}. Then take 𝗟_{k+1|k} = 𝗫_{k+1|k}𝗧 and 𝗨_{k+1|k} = 𝗨_{k}.
- *Resampling step.* Compute the matrix 𝗩_{k} from Eq. (39) and its Cholesky decomposition 𝗖_{k}𝗖^{T}_{k}, and set *h̃** as the square root of the smallest eigenvalue of 𝗖^{−1}_{k}𝗨_{k}(𝗖^{−1}_{k})^{T}. Then, for each *i*, draw a random Gaussian vector **ξ**^{i}_{k+1} of mean zero and covariance matrix 𝗨_{k} − *h̃**^{2}𝗩_{k}. If 𝗘_{k+1}, as computed from Eq. (40), is smaller than a threshold *η*, perform a
  - *Partial resampling:* simply add the 𝗟_{k+1|k}**ξ**^{i}_{k+1} to the **x**^{i}_{k+1|k} to obtain the new particles, **x̃**^{i}_{k+1|k} = **x**^{i}_{k+1|k} + 𝗟_{k+1|k}**ξ**^{i}_{k+1}, and set *w̃*^{i}_{k+1|k} = *w*^{i}_{k+1|k} = *w*^{i}_{k}. Otherwise, if 𝗘_{k+1} > *η*, perform a
  - *Full resampling:* select *N* particles among the **x**^{i}_{k+1|k} according to their weights *w*^{i}_{k+1|k}, then add 𝗟_{k+1|k}**ξ**^{i}_{k+1} to the *i*th selection to obtain the new particles **x̃**^{i}_{k+1|k}, that is, 𝗫̃_{k+1|k} = 𝗫_{k+1|k}𝗦_{k+1} + 𝗟_{k+1|k}[**ξ**^{1}_{k+1} · · · **ξ**^{N}_{k+1}], where 𝗦_{k+1} is the selection matrix with exactly one nonzero element, equal to one, in each column; and set *w̃*^{i}_{k+1|k} = 1/*N*.

In both cases, update 𝗟_{k+1|k} from the new particles **x̃**^{i}_{k+1|k} and set 𝗨_{k+1|k} = *h̃**^{2}𝗗_{k+1}𝗩_{k}𝗗^{T}_{k+1}, where 𝗗_{k+1} = [𝗔_{k+1} + (**ξ**^{1}_{k+1} · · · **ξ**^{N}_{k+1})𝗧]^{−1}, with 𝗔_{k+1} = 𝗜_{d} if a partial resampling was performed, or 𝗔_{k+1} = (𝗧^{T}𝗧)^{−1}𝗧^{T}𝗦_{k+1}𝗧 if a full resampling was performed.
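The interplay of the two corrections can be illustrated with a deliberately simplified scalar stand-in. This is not the full LRKPK algorithm: the low-rank factorization, the 𝗧-matrix bookkeeping, and the partial/full resampling logic are omitted, the resampling trigger is replaced by a plain effective-sample-size test, and all numerical values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar system: x_{k+1} = a*x_k + noise, y_k = x_k + noise.
a, q, r = 0.95, 0.1, 0.5      # model coefficient, model- and obs-noise variances (assumed)
N, h = 20, 0.3                # ensemble size and kernel bandwidth (assumed)

x = rng.normal(0.0, 1.0, N)   # particles
w = np.full(N, 1.0 / N)       # weights
P = h**2 * np.var(x)          # common kernel variance of the Gaussian mixture

truth = 1.0
for k in range(50):
    # Prediction step: propagate truth and particles; weights unchanged.
    truth = a * truth + rng.normal(0.0, np.sqrt(q))
    x = a * x + rng.normal(0.0, np.sqrt(q), N)
    P = a * a * P + q
    y = truth + rng.normal(0.0, np.sqrt(r))

    # Particle-type correction: Bayes update of the weights using each
    # component's predictive (innovation) variance P + r.
    innov = y - x             # H is the identity in this toy problem
    w = w * np.exp(-0.5 * innov**2 / (P + r))
    w = w / w.sum()

    # Kalman-type correction: shift every particle with the mixture gain.
    G = P / (P + r)
    x = x + G * innov
    P = (1.0 - G) * P

    # Resample (plain multinomial here) when the weights degenerate.
    if 1.0 / np.sum(w**2) < N / 2:
        x = rng.choice(x, size=N, p=w, replace=True)
        w = np.full(N, 1.0 / N)

estimate = np.sum(w * x)      # minimum-variance estimate: weighted mean
print(round(estimate, 3))
```

The Kalman-type shift is what keeps the weights from collapsing as quickly as in a pure particle filter: particles are pulled toward the data before the weight update can concentrate on a single member.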

## 5. Application to the Lorenz model

The LRKPK filter was first tested with the strongly nonlinear Lorenz (1963) model, using the standard parameter values *s* = 10, *r* = 28, and *b* = 8/3 to obtain a chaotic solution. The design of the experiment is identical to that of Pham (2001), and therefore the results of the LRKPK filter can be compared with those reported by Pham (2001) for several ensemble/particle-based filters. The Lorenz equations are first integrated between *t* = 0 and *t* = 25 with the standard Runge–Kutta scheme using a step size of 0.005 and starting from the initial state (*x*_{0}, *y*_{0}, *z*_{0}) = (−0.587276, −0.563678, 16.8708). This is the reference run. A set of 400 states was formed by retaining the reference solution at intervals of 0.05, starting from *t* = 5 to avoid the transitory phase. Observations of the variable *x* only are simulated at intervals of 0.05 by adding random normal noise with mean zero and variance 2 to the reference solution. The state’s initial PDF is assumed to be Gaussian with mean **x**_{1|0} and covariance matrix 𝗣_{1|0}. Here, **x**_{1|0} and 𝗣_{1|0} were set as the mean and sample covariance matrix of the 400 retained reference states. A rank-2 approximation of 𝗣_{1|0} was then computed by applying an EOF analysis on the reference states, as described by Pham et al. (1997). This analysis provides the best low-rank approximation of a sample covariance matrix (Preisendorfer 1988). The rank-2 approximation is in line with the effective dimension of the Lorenz model (2.06). This means that the filter will run with only 3 particles. The initial 3 particles **x**^{i}_{1|0} were randomly sampled using the second-order exact drawing scheme (Pham 2001). The performance of the filter is measured by the root-mean-square (rms) error of the filter’s estimate with respect to the reference solution.
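The reference run and the simulated observations described above can be reproduced with a short script; the random seed and noise generator are of course arbitrary.

```python
import numpy as np

# Lorenz (1963) reference run: standard parameters, fourth-order
# Runge-Kutta with step 0.005, states retained every 0.05 for t > 5,
# and noisy observations of x only (mean-zero noise, variance 2).
s, r, b = 10.0, 28.0, 8.0 / 3.0

def lorenz(v):
    x, y, z = v
    return np.array([s * (y - x), r * x - y - x * z, x * y - b * z])

def rk4_step(v, dt):
    k1 = lorenz(v)
    k2 = lorenz(v + 0.5 * dt * k1)
    k3 = lorenz(v + 0.5 * dt * k2)
    k4 = lorenz(v + dt * k3)
    return v + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, keep_every = 0.005, 10            # 0.05 / 0.005 = 10 steps between retained states
v = np.array([-0.587276, -0.563678, 16.8708])
states = []
for step in range(1, int(25 / dt) + 1):
    v = rk4_step(v, dt)
    if step % keep_every == 0 and step > 1000:   # t > 5: skip the transitory phase
        states.append(v.copy())
states = np.array(states)             # the 400 reference states

rng = np.random.default_rng(1)
obs = states[:, 0] + rng.normal(0.0, np.sqrt(2.0), len(states))
print(states.shape, obs.shape)
```

Retaining every 10th step over 20 time units gives exactly the 400 reference states used in the experiment.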

Figure 1 plots the rms error of the LRKPK filter in the top panel and the corresponding solutions for the *z* component (not assimilated) in the lower panel. The resampling step was applied every two filtering cycles, and the resampling threshold and the parameter *h* were set to 0.25 and 0.9, respectively. It can be seen that the LRKPK filter does a very good job of tracking the reference solution while capturing all the model phase transitions. The rms error is also rather consistent over the entire assimilation window. It is comparable to the errors obtained by Pham (2001) with the ensemble Kalman filter using larger numbers of particles. The filter was also able to provide reliable estimates for the *z* component of the model, showing an efficient propagation of information to a nonobserved variable. We must mention that the performance of the filter was quite sensitive to the choice of the tuning parameters of the resampling step. A bad choice of these parameters might result in much worse performance than that reported here; likewise, better performance could be achieved with different choices.

The results of this section suggest that the LRKPK filter exploits the limited number of EDOF of the Lorenz system to efficiently operate with a small number of particles. The next section will test the performance of the LRKPK filter with a realistic ocean general circulation model (OGCM).

## 6. Application to an OGCM

We present preliminary results from a first application of the LRKPK filter to the assimilation of synthetic sea surface height (SSH) data into an OGCM of the Mediterranean Sea, following a twin-experiment approach.

### a. The ocean model

We use POM, which is a primitive-equations finite-difference model formulated under the hydrostatic and Boussinesq approximations. The model solves the 3D Navier–Stokes equations on an Arakawa-C grid with a numerical scheme that conserves mass and energy. The spatial differencing schemes are central and explicit in the horizontal and central and implicit in the vertical. Time stepping is achieved using a leapfrog scheme associated with an Asselin filter. The numerical computation is split into an external barotropic mode with a short time step [dictated by the Courant–Friedrichs–Lewy (CFL) condition] solving for the time evolution of the free surface elevation and the depth averaged velocities, and an internal baroclinic mode that solves for the vertical velocity shear. Horizontal mixing in the model is parameterized according to Smagorinsky (1963) while vertical mixing is calculated using the Mellor and Yamada 2.5 turbulence closure scheme. The model state vector is composed of all prognostic variables of the model at every sea grid point. The reader is referred to Blumberg and Mellor (1987) for a detailed description of POM.
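As a rough illustration of why the external (barotropic) mode needs its own short time step, the CFL bound for surface gravity waves can be evaluated; the depth and latitude below are assumed illustrative values, not the paper's configuration.

```python
import numpy as np

# CFL bound for the external mode: dt <= dx / sqrt(g * H), where
# sqrt(g * H) is the surface gravity-wave speed. Values are assumptions.
g = 9.81                  # gravity (m s^-2)
H = 4000.0                # assumed typical Mediterranean depth (m)
lat = 38.0                # assumed central latitude (deg)
dx = 0.25 * 111e3 * np.cos(np.radians(lat))   # 1/4-degree zonal spacing (m)

c = np.sqrt(g * H)        # external gravity-wave speed (~200 m/s)
dt_ext = dx / c           # the barotropic time step must stay below this
print(round(dt_ext, 1))
```

The internal (baroclinic) mode sees much slower wave speeds, which is why mode splitting lets it run with a far longer time step.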

The model domain covers the entire Mediterranean basin extending from 7°W to 36°E and 30° to 46°N and has one open boundary located at 7°W. Open boundary conditions were set as follows:

- Zero gradient condition for the free surface elevation.
- Flather’s (1976) boundary conditions for the barotropic velocity normal to the boundary.
- Sommerfeld radiation for the internal baroclinic velocities.
- Temperature and salinity are advected upstream. When there is inflow through the boundary, these fields are prescribed from the Mediterranean Ocean Database (MODB) MODB-MED4 seasonal climatology.

The horizontal resolution is ¼° × ¼°, with 25 sigma levels in the vertical distributed logarithmically near the surface and the bottom. The number of grid points is therefore 175 × 65 × 25. The model bathymetry was obtained from the U.S. Navy Digital Bathymetric Data Bases (DBDB) DBDB5 and DBDB1 and is shown in Fig. 2. The surface forcing fields (monthly climatological wind stress, upward heat flux, net shortwave radiation, and evaporation rate) were derived from the 1979–1993 ECMWF global 1° × 1° 6-h reanalysis, except for the precipitation fields, which were derived from the Jaeger monthly climatology. Bulk formulas were used to compute the surface momentum, heat, and freshwater fluxes at each time step of the model integration, taking into account the SST predicted by the model itself. The model dynamics were first adjusted to achieve a perpetually repeating seasonal cycle before applying the interannual atmospheric forcing, by integrating the model climatologically for 20 yr. This climatological run was initialized with the MODB-MED4 spring temperature and salinity profiles, and the initial velocities were set to zero.

### b. Experimental setup

#### 1) Filter initialization

The filter is initialized by a Gaussian PDF of mean **x**_{1|0} and covariance matrix 𝗣_{1|0}, respectively taken as the mean and sample covariance matrix of a large historical set 𝗛_{S} of state vectors simulated from a long model run. A low-rank approximation 𝗣_{1|0} = 𝗟_{1|0}𝗨_{1|0}𝗟^{T}_{1|0} is determined by applying an EOF analysis on 𝗛_{S}. The initial particles **x**^{i}_{1|0} were then randomly sampled using the second-order exact drawing scheme (Pham 2001).

The historical set 𝗛_{S} was constructed as follows. The model was first integrated for a 2-yr period (1980–1981), starting from the end of the 20-yr spinup run, to achieve a quasi adjustment of the model climatological dynamics to the ECMWF interannual forcing. Next, another integration of 4 yr (1982–1985) was carried out to generate 𝗛_{S} by retaining one model output (state vector) every two days. Since the state vector is composed of variables of different nature, a multivariate EOF analysis was applied on the sampled set of 730 state vectors. In this analysis, the state variables were normalized by the standard deviation of each state variable spatially averaged over all sea grid points. About 50 EOFs were needed to account for 90% of the system variance. Given that the individual variance explained by the remaining EOFs was insignificant, this number, 50, provides an estimate of the upper bound for the EDOF of the system (Farrell and Ioannou 2001b). This suggests that a covariance matrix of rank 50, or even less, would likely provide a sufficiently accurate approximation of the covariance matrix of the Gaussian mixture used to approximate the PDF of the system state.
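The EOF-based construction of the low-rank factorization can be sketched as follows, with random data standing in for the 730 sampled POM state vectors and toy dimensions; the normalization here mimics, in simplified scalar form, the variable-wise normalization described above.

```python
import numpy as np

# Sketch of an EOF analysis producing a low-rank factor L such that
# L @ L.T approximates the sample covariance of the historical set.
# Sizes and data are toy stand-ins, not the paper's 730 POM states.
rng = np.random.default_rng(2)
n_state, n_samples = 200, 73
H_S = rng.normal(size=(n_state, n_samples)) * np.linspace(1, 5, n_state)[:, None]

# Normalize (a single global std stands in for the per-variable,
# spatially averaged standard deviations) and remove the sample mean.
std = H_S.std(axis=1).mean()
A = (H_S - H_S.mean(axis=1, keepdims=True)) / std

# EOFs are the left singular vectors of the anomaly matrix.
U, sing, _ = np.linalg.svd(A, full_matrices=False)
var = sing**2 / np.sum(sing**2)                 # fraction of variance per EOF
rank = int(np.searchsorted(np.cumsum(var), 0.90)) + 1   # EOFs for 90% variance

L = U[:, :rank] * sing[:rank] / np.sqrt(n_samples - 1)  # L @ L.T ~ P_{1|0}
print(rank, L.shape)
```

The retained rank plays the role of the "about 50 EOFs" of the real system: it caps the number of particles the filter needs.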

#### 2) Twin-experiments design

A reference model run was first carried out over a 1-yr period starting from 1 January 1986. A sequence of 73 reference states was formed by retaining one model output every 5 days. These states, considered as the “true states,” are used to extract the pseudoobservations and to evaluate the filter’s behavior by comparing them with the state vectors estimated by the filter. This allows the assessment of the filter’s performance with nonobserved variables. The assimilation experiments were then carried out over the same period, using pseudoobservations of SSH extracted from the reference states every 4 grid points. Independent Gaussian errors of zero mean and 3-cm standard deviation were added to the observations. All experiments were performed in a perfect-model context (𝗤_{k} = 0), and the observational error covariance matrix 𝗥_{k} is diagonal with (3 cm)^{2} diagonal elements. Another model run, initialized from the filter’s initial state estimate **x**_{1|0} and integrated over 1986 without any assimilation, was also performed to assess the relevance of the assimilation.
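The extraction of the pseudoobservations can be sketched as follows; the reference SSH field is faked with random numbers, and only the grid size, the subsampling stride, and the 3-cm noise level come from the text.

```python
import numpy as np

# Pseudoobservations for the twin experiment: sample the reference SSH
# every 4 grid points and perturb with independent N(0, (3 cm)^2) noise.
# The grid matches the 175 x 65 POM configuration; SSH values are fake.
rng = np.random.default_rng(3)
ssh_true = rng.normal(0.0, 0.10, size=(175, 65))   # stand-in reference SSH (m)

sub = ssh_true[::4, ::4]                            # every 4th point in both directions
ssh_obs = sub + rng.normal(0.0, 0.03, size=sub.shape)
y = ssh_obs.ravel()                                 # observation vector y_k
R_diag = np.full(y.size, 0.03**2)                   # diagonal of R_k: (3 cm)^2
print(y.size)
```

Because the errors are independent, 𝗥_{k} never needs to be stored as a full matrix; its diagonal suffices.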

Figure 3 plots the time evolution of the basin-average kinetic energy during the experiment (initialization/EOF analysis/twin experiment) periods. The kinetic energy exhibits a strong seasonal cycle on which interannual anomalies are superimposed, the most important being that of 1981, forced by the corresponding wind stress anomalies.

### c. Assimilation results

In this section, the performance of the LRKPK filter is examined in a series of data assimilation experiments. We first present the overall behavior of the filter and then discuss the effect of varying the filter’s parameters, such as the rank of the mixture covariance matrices and the bandwidth parameter, on the filter’s performance (essentially to determine the setup of the main experiment).

#### 1) Main experiment

The LRKPK filter was implemented using 50 particles, with resampling every 5 filtering cycles and a bandwidth parameter *h* = 0.4. The evolution of the RRMS for this run as a function of time is plotted in Fig. 4. The temporal development of the RRMS is characterized by a large reduction of the estimation error with respect to the free run at the first analysis step. Subsequent analysis steps bring smaller reductions, and the filter stabilizes the state estimation error at about 70% below that of the free run. The assimilation also significantly improves the estimation of all model state variables with respect to the model free run throughout the assimilation window and, as can be expected, the best assimilation results were achieved for the observed variable, SSH.
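A relative rms (RRMS) misfit of the kind plotted in Fig. 4 can be computed as below; the normalization assumed here (by the rms of the reference anomaly) is only one common choice, and the paper's exact definition, given earlier in the text, may differ.

```python
import numpy as np

# Relative rms misfit of an estimated trajectory against the reference.
# Normalizing by the reference variability is an assumed convention.
def rrms(estimate, reference):
    num = np.sqrt(np.mean((estimate - reference) ** 2))
    den = np.sqrt(np.mean((reference - reference.mean()) ** 2))
    return num / den

rng = np.random.default_rng(4)
ref = rng.normal(size=500)                  # stand-in reference states
free = ref + rng.normal(0.0, 1.0, 500)      # stand-in free (no assimilation) run
filt = ref + rng.normal(0.0, 0.3, 500)      # stand-in filter run
print(rrms(free, ref), rrms(filt, ref))     # assimilation should lower the RRMS
```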

To evaluate the filter’s performance in capturing the variability of the model, Fig. 5 plots the spatial distribution of the SSH rms estimation error for (a) the free run and (b) the filter’s run with respect to the reference run. Centers of large errors (6 cm) are located in the central Balearic basin and the Tyrrhenian Sea within the western Mediterranean and the central Ionian within the eastern part of the basin. Most of these errors are related to the variability of the Atlantic waters (AW) current flowing along the north African coasts within the Western Mediterranean and the Atlantic-Ionian (AI) stream, which is the continuation of this current within the Ionian basin. The filter greatly improves the solution for the SSH with respect to the free run. In particular, the general variability as well as the mean position of the AW current and the AI stream were efficiently captured by the assimilation system. This resulted in a drastic reduction of the SSH misfits below 2 cm all over the Mediterranean basin. It is also important to examine the filter’s behavior in the intermediate and deep ocean layers to make sure that the surface observations were properly assimilated by the model and to assess the efficiency of the filter in propagating this information to the deep layers. Figure 6 shows the spatial distribution of the mean rms estimation error for (a) the free run, and (b) the filter’s run, for a zonal temperature transect along 33.75°N. Temperature misfits for the free run are concentrated between the surface and 300 m, which is approximately the depth of the Levantine waters produced annually within the eastern Mediterranean. In the particular section shown here, most of the error (reaching 1.2°C) is located within the central Ionian basin and is related to the SSH misfits shown before (Korres and Lascaratos 2003). 
At all depths, the filter significantly improves the estimation of the temperature with respect to the model free run, particularly in the eastern basin, where large filter/data misfits were completely removed. This suggests a strong capability for efficiently propagating surface-only altimetric information to nonobserved variables.

Overall, the filter was able to significantly improve the estimation of all model state variables with respect to the model free run, while efficiently propagating surface altimetric information to the deep ocean.

#### 2) Sensitivity with respect to the rank of the mixture covariance matrices

Sensitivity assimilation experiments were carried out to examine the effect of varying the rank *r* of the Gaussian mixture covariance matrices on the filter’s behavior. The rank *r* determines the number of particles *N* (=*r* + 1) to be used in the filter, and hence sets the computational cost of the assimilation system.

Figure 7 shows the time evolution of the RRMS for the model state variables as obtained with the LRKPK filter using mixture covariance matrices with three different numbers of particles: 30, 50, and 100. In all these experiments, the bandwidth parameter was set to *h* = 0.4 and resampling was performed every 5 filtering steps. These plots suggest that an ensemble of only 30 particles provides reliable estimates for the assimilated variable (SSH) at reasonable computational cost. For the same variable, the RRMS obtained using 50 and 100 particles are not significantly different. Concerning the estimation of nonobserved variables, the filter shows some weaknesses when small ensembles are used, although the overall performance is still reasonable. This suggests that the multivariate propagation of the assimilated information requires well-resolved covariance matrices between SSH and the other variables, which does not seem feasible with only 30 particles. The use of more particles significantly attenuates this problem and stabilizes the filter’s behavior for all model state variables throughout the assimilation window. It further allows for more degrees of freedom (larger covariance matrix ranks), which enables a better fit to the data. In this particular system, the benefits from doubling the number of particles from 50 to 100 were not significant. Considering the computational burden associated with increasing the number of particles, the LRKPK filter can be implemented with 50 particles. This is a small ensemble for a nonlinear filter, even for this model configuration, which has a rather limited number of EDOF. This is consistent with the results of section 5, suggesting that the LRKPK filter exploits the limited number of EDOF of the system to operate efficiently with a small number of particles.

#### 3) Sensitivity with respect to resampling parameters

Following the discussion in section 3d, the frequency *m* (in filtering cycles) at which resampling is performed and the value of the bandwidth parameter *h*, which adjusts the “size” of the mixture covariance matrices after resampling, are chosen empirically in the LRKPK filter. It is therefore necessary to conduct assimilation experiments to evaluate the sensitivity of the filter’s performance to these parameters.

Assimilation results (not shown here) from three sensitivity experiments with different resampling frequencies (*m* = 3, 5, and 10, all with *h* = 0.4 and 30 particles to reduce the computational burden) revealed very little difference in the filter’s performance whether resampling was applied every 3 or 5 filtering cycles (slightly better performance was obtained with *m* = 5); both generally provided better results than the run with resampling every 10 filtering cycles. This suggests that resampling is generally useful to limit the overdispersion of the particles, but it can sometimes be omitted because the Kalman-type correction reduces the risk of degeneracy.

We then present the results of three filter runs with different values of the bandwidth parameter: *h* = 0.2, 0.4, and 0.8, respectively. Again, to save computing time, the filter was implemented with 30 particles in all these runs. The time evolution of the RRMS for the three assimilation runs is shown in Fig. 8. These plots suggest that the best performance was obtained using *h* = 0.4. They support our discussion in section 3d about the appropriate choice for *h*: not too large, to justify the approximations in Eqs. (18) and (30), and not too small, to retain the benefit of the Kalman-type correction. This “best” value of *h* is, however, only indicative, as the appropriate choice depends on the properties of the assimilation system (even on the size of the ensemble). It may therefore differ if another system or setup were considered.

## 7. Summary and discussion

Most analysis schemes of current sequential data assimilation techniques are based on Gaussian distributions of the model state. However, for nonlinear models, the distribution of the model state is not Gaussian even when the system statistics are Gaussian. This means that the above assimilation schemes are only suboptimal (more precisely, they are only optimal among linear analysis estimators). The solution of the nonlinear data assimilation problem is well known and is provided by the optimal nonlinear filter, which theoretically offers a simple method to estimate the PDF of the system state. Several attempts to develop a discrete algorithm for an efficient implementation of this filter were presented, mainly based on a point-mass representation of the state PDFs (particle filter). Besides being computationally prohibitive for large dimensional systems, these filters greatly suffer from the degeneracy of their particles, which very often causes the divergence of the filter.

A new approximate solution of the optimal nonlinear filter suitable for applications in meteorology and oceanography has been presented. A pilot implementation with the simple Lorenz model and a test case assimilating pseudoaltimetric data into a realistic ocean general circulation model were shown, providing preliminary evidence of the approach’s feasibility. The new filter, called the low-rank kernel particle Kalman (LRKPK) filter, is based on a Gaussian mixture representation of the state PDFs, complemented by a local linearization of the system around the mean of each Gaussian component. With application to high-dimensional oceanic and atmospheric systems in mind, the covariance matrix of the Gaussian mixture was further assumed to be of low rank, and the local linearization was replaced by a “one vector” linearization. This resulted in a new filter in which the standard particle-type correction of the particle weights is complemented by a Kalman-type correction, similar to that of the popular ensemble Kalman filter but using the covariance matrix of the mixture instead of the sample covariance matrix of the ensemble. As in the ensemble Kalman filter, the Kalman correction attenuates the degeneracy of the particles by pulling them toward the true state of the system, which enables the filter to operate efficiently with reasonably sized ensembles. Combined with the low-rank approximation, which avoids manipulating full-size covariance matrices, this makes the filter usable for oceanic and atmospheric data assimilation problems.

The chosen test situation was that of a Princeton Ocean Model (POM) configuration of the Mediterranean Sea within which several mesoscale eddies interact. In such testing conditions, the LRKPK filter was found to be fairly effective in monitoring the flow state and evolution using surface-only pseudoaltimetric data. Further work will consider more complex situations, both from the model point of view (different model setups with stronger nonlinearities) and through the assimilation of real data, for example from the Ocean Topography Experiment (TOPEX)/Poseidon satellite. A close theoretical and practical comparison between the LRKPK filter and the popular ensemble Kalman filter is also of interest to assess the relevance of the nonlinear analysis step and will be pursued in the near future. This preliminary application was a necessary step before realistic applications are undertaken, and it provided encouraging results toward that purpose.

*Acknowledgments.* We thank Dr. Bruce Cornuelle for valuable comments and discussions.

## REFERENCES

Anderson, J., and S. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. *Mon. Wea. Rev.*, **127**, 2741–2758.

Blumberg, A. F., and G. L. Mellor, 1987: A description of a three-dimensional coastal ocean circulation model. *Three-Dimensional Coastal Ocean Circulation Models*, N. S. Heaps, Ed., Coastal Estuarine Science Series, Vol. 4, Amer. Geophys. Union, 1–16.

Cane, M. A., A. Kaplan, R. N. Miller, B. Tang, E. C. Hackert, and A. J. Busalacchi, 1996: Mapping tropical Pacific sea level: Data assimilation via a reduced state Kalman filter. *J. Geophys. Res.*, **101**, 22599–22617.

Chen, R., and J. Liu, 2000: Mixture Kalman filters. *J. Roy. Stat. Soc. A*, **62**, 493–508.

Cohn, S. E., and R. Todling, 1996: Approximate data assimilation schemes for stable and unstable dynamics. *J. Meteor. Soc. Japan*, **74**, 63–75.

Daley, R., 1991: *Atmospheric Data Analysis.* Cambridge University Press, 471 pp.

Doucet, A., N. de Freitas, and N. Gordon, 2001: *Sequential Monte Carlo Methods in Practice.* Springer-Verlag, 581 pp.

Evensen, G., 1992: Using the extended Kalman filter with a multilayer quasi-geostrophic ocean model. *J. Geophys. Res.*, **97**, 17905–17924.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99**, 10143–10162.

Farrell, B. F., and P. J. Ioannou, 2001a: State estimation using a reduced-order Kalman filter. *J. Atmos. Sci.*, **58**, 3666–3680.

Farrell, B. F., and P. J. Ioannou, 2001b: Accurate low-dimensional approximation of the linear dynamics of fluid flow. *J. Atmos. Sci.*, **58**, 2771–2789.

Flather, R. A., 1976: A tidal model of the northwest European continental shelf. *Mem. Soc. Roy. Sci. Liege*, **10**, 141–164.

Fukumori, I., and P. Malanotte-Rizzoli, 1995: An approximate Kalman filter for ocean data assimilation: An example with an idealized Gulf Stream model. *J. Geophys. Res.*, **100**, 6777–6794.

Gauthier, P., P. Courtier, and P. Moll, 1993: Assimilation of simulated wind lidar data with a Kalman filter. *Mon. Wea. Rev.*, **121**, 1803–1820.

Hamill, T. M., C. Snyder, and R. E. Morss, 2002: Analysis-error statistics of a quasigeostrophic model using three-dimensional variational assimilation. *Mon. Wea. Rev.*, **130**, 2777–2790.

Heemink, A. W., M. Verlaan, and A. J. Segers, 2001: Variance reduced ensemble Kalman filtering. *Mon. Wea. Rev.*, **129**, 1718–1728.

Hoteit, I., D.-T. Pham, and J. Blum, 2002: A simplified reduced-order Kalman filtering and application to altimetric data assimilation in tropical Pacific. *J. Mar. Syst.*, **36**, 101–127.

Hoteit, I., D.-T. Pham, and J. Blum, 2003: A semi-evolutive filter with partially local correction basis for data assimilation in oceanography. *Oceanol. Acta*, **26**, 511–524.

Hoteit, I., G. Korres, and G. Triantafyllou, 2005: Comparison of extended and ensemble based Kalman filters with low and high resolution primitive equation ocean models. *Nonlinear Processes Geophys.*, **12**, 755–765.

Hoteit, I., G. Triantafyllou, and G. Korres, 2007: Using low-rank ensemble Kalman filters for data assimilation with high dimensional imperfect models. *J. Numer. Anal. Ind. Appl. Math.*, **2**, 67–78.

Houtekamer, P. L., and L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Jazwinski, A. H., 1970: *Stochastic Processes and Filtering Theory.* Academic Press, 376 pp.

Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. *Trans. ASME, J. Basic Eng.*, **82D**, 35–45.

Kaplan, J. L., and J. A. Yorke, 1979: Preturbulence: A regime observed in a fluid flow model of Lorenz. *Commun. Math. Phys.*, **67**, 93–108.

Kivman, G., 2003: Sequential parameter estimation for stochastic systems. *Nonlinear Processes Geophys.*, **10**, 253–259.

Korres, G., and A. Lascaratos, 2003: An eddy resolving model of the Aegean and Levantine basins for the Mediterranean Forecasting System Pilot Project (MFSPP): Implementation and climatological runs. *Ann. Geophys.*, **21**, 205–220.

Lermusiaux, P. F. J., and A. R. Robinson, 1999: Data assimilation via error subspace statistical estimation. Part I: Theory and schemes. *Mon. Wea. Rev.*, **127**, 1385–1407.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–141.

Osborne, A. R., and A. Pastorello, 1993: Simultaneous occurrence of low-dimensional chaos and colored random noise in nonlinear physical systems. *Phys. Lett. A*, **181**, 159–171.

Pham, D. T., 2001: Stochastic methods for sequential data assimilation in strongly nonlinear systems. *Mon. Wea. Rev.*, **129**, 1194–1207.

Pham, D. T., J. Verron, and M. C. Roubaud, 1997: Singular evolutive Kalman filter with EOF initialization for data assimilation in oceanography. *J. Mar. Syst.*, **16**, 323–340.

Preisendorfer, R. W., 1988: *Principal Component Analysis in Meteorology and Oceanography.* Developments in Atmospheric Science Series, Vol. 17, Elsevier, 425 pp.

Silverman, B. W., 1986: *Density Estimation for Statistics and Data Analysis.* Chapman and Hall, 175 pp.

Smagorinsky, J., 1963: General circulation experiments with the primitive equations. I: The basic experiment. *Mon. Wea. Rev.*, **91**, 99–164.

Todling, R., 1999: Estimation theory and foundations of atmospheric data assimilation. Data Assimilation Office, Goddard Space Flight Center, DAO Office Note 1999-01, 187 pp.

Van Leeuwen, P. J., 2003: A variance-minimizing filter for large-scale applications. *Mon. Wea. Rev.*, **131**, 2071–2084.

West, B. J., and H. J. Mackey, 1991: Geophysical attractors may be only colored noise. *J. Appl. Phys.*, **69**, 6747–6749.

^{1}

This can be shown by noticing that the matrix 𝗧(𝗧^{T}𝗪^{−1}𝗧)^{−1}𝗧^{T}𝗪^{−1} is the orthogonal projection operator, with respect to the metric 𝗪^{−1}, onto the linear subspace spanned by vectors whose components sum to 0.
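The projection property invoked here is easy to verify numerically for a small example; the particular 𝗧 and diagonal 𝗪⁻¹ below are arbitrary choices satisfying the stated assumptions (full-rank 𝗧 with zero-sum columns, positive-definite diagonal metric).

```python
import numpy as np

# Check that P = T (T' W^-1 T)^-1 T' W^-1 is idempotent and projects
# onto the zero-sum subspace (the range of T when T's columns sum to 0).
rng = np.random.default_rng(5)
N = 5
T = np.vstack([np.eye(N - 1), -np.ones(N - 1)])   # N x (N-1), columns sum to 0
W_inv = np.diag(rng.uniform(0.5, 2.0, N))         # stand-in diagonal metric

P = T @ np.linalg.inv(T.T @ W_inv @ T) @ T.T @ W_inv

assert np.allclose(P @ P, P)                        # idempotent: a projection
assert np.isclose((P @ rng.normal(size=N)).sum(), 0.0)  # image sums to zero
assert np.allclose(P @ T, T)                        # identity on range(T)
print("projection properties verified")
```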

^{2}

It is important to note that the filter error covariance matrix is not 𝗣_{k} but cov(**x**^{i}_{k}; *w*^{i}_{k}) + 𝗣_{k}. The first term represents the dispersion of the particles and the second the covariance matrix associated with the particles. When 𝗣_{k} is approximated by an (*N* − 1)-low-rank matrix, both cov(**x**^{i}_{k}; *w*^{i}_{k}) and 𝗣_{k} are factorized as 𝗟_{k}(𝗧^{T}𝗪^{−1}_{k}𝗧)^{−1}𝗟^{T}_{k} and 𝗟_{k}𝗨_{k}𝗟^{T}_{k}, respectively, and therefore the filter error covariance matrix is factorized as 𝗟_{k}𝗩_{k}𝗟^{T}_{k}.