## Abstract

Hybrid variational–ensemble data assimilation (hybrid DA) is widely used in research and operational systems, and it is considered the current state of the art for the initialization of numerical weather prediction models. However, hybrid DA requires a separate ensemble DA to estimate the uncertainty in the deterministic variational DA, which can be suboptimal both technically and scientifically. A new framework called the ensemble–variational integrated localized (EVIL) data assimilation addresses this inconvenience by updating the ensemble analyses using information from the variational deterministic system. The goal of EVIL is to encompass and generalize existing ensemble Kalman filter methods in a variational framework. Particular attention is devoted to the affordability and efficiency of the algorithm in preparation for operational applications.

## 1. Introduction

In ensemble data assimilation (DA), defining a proper way to update ensemble members when observations are available has been a constant research topic during the last two decades. Starting with the early work of Evensen (1994) on the ensemble Kalman filter (EnKF), a first correction was proposed by Burgers et al. (1998) and Houtekamer and Mitchell (1998) to ensure that an appropriate ensemble spread is maintained: the perturbation of the observations assimilated by each ensemble member. A problem with this solution, noticed by Pham (2001), is the correlation between these perturbations and the prior ensemble perturbations for a limited ensemble size, which is not desirable. To avoid perturbing observations, the solution used in most subsequent filters is to assimilate unperturbed observations in order to update the ensemble mean and then to apply a linear transformation to the perturbations of ensemble members about the mean. Within this general class of so-called square root filters, we can cite the ensemble transform Kalman filter (ETKF) of Bishop et al. (2001), corrected by Wang et al. (2004); the ensemble square root filter (EnSRF) of Whitaker and Hamill (2002); and the ensemble adjustment Kalman filter (EAKF) of Anderson (2003). A theoretical comparison of square root filters in a unified formalism can be found in Tippett et al. (2003).

Variational methods can also be used to update ensembles. In the ensemble of data assimilations (EDA) of Fisher (2003), a minimization is performed for each ensemble member to assimilate perturbed observations, as in the EnKF. Conversely, in the maximum likelihood ensemble filter (MLEF) of Zupanski (2005), the best estimates are computed with a variational method, and then ensemble perturbations are updated with an ETKF-like method.

In recent years, the use of ensembles within a variational framework has also been significantly improved via the ensemble variational (EnVar) method of Lorenc (2003) and Buehner (2005). The EnVar framework is very effective in using localized covariances from an ensemble, potentially hybridized with static covariances, in a variational cost function, but its only way of updating the prior ensemble perturbations is via a costly EDA running an EnVar for each member, with perturbed observations. As a consequence, two separate systems are normally used, as illustrated in Fig. 1a. Two main issues arise with this type of dual system: First, there is a lack of consistency. Ensemble DA perturbations are supposed to sample variational DA background errors, while both systems rely on different assumptions, different algorithms, and sometimes use different observations. Second, two systems have to be maintained and improved instead of one, requiring a larger workforce.

This paper proposes a method to update both the deterministic and the ensemble backgrounds consistently, within a single system, as illustrated in Fig. 1b. This new technique called ensemble–variational integrated localized (EVIL) data assimilation is based on an EnVar. The Ritz pairs, a free by-product of the deterministic minimization with a Lanczos algorithm, are used to update the ensemble perturbations in a single step, similarly to the sequential filters. Contrary to the EDA, only one minimization has to be performed, but the benefit of variational methods, especially for high-dimensional systems, is retained.

The EVIL method relies on the ability of a limited set of Ritz vectors to represent the update from the background error covariance matrix **B** to the analysis error covariance matrix **A**, that is, to capture the impact of the whole observation system on the error covariance. The required size of such a set to get an accurate update of perturbations is a crucial point to estimate the real cost of EVIL. Indeed, the more Ritz pairs are needed, the more iterations have to be performed in the deterministic minimization. Using a set of vectors to approximate the Hessian of the cost function has already been tested by the European Centre for Medium-Range Weather Forecasts (ECMWF) during the development of a reduced-rank Kalman filter (Fisher 1998; Fisher and Andersson 2001; Beck and Ehrendorfer 2005). A common conclusion of these studies is that the procedure works as long as a “significantly large” number of vectors is used. Simple experiments show that the minimum number of Ritz pairs required to get a sufficient correction on the ensemble is case dependent, even if some general features can be analyzed.

In section 2, the variational framework is summarized to support the development of EVIL provided in section 3. Section 4 is focused on a general comparison between EVIL and sequential filters. Preliminary experiments are detailed in section 5 to assess the impact of the localization and the convergence behavior of EVIL, followed by a discussion and conclusions in section 6.

## 2. Variational framework

### a. Objectives and notations

EVIL relies on an approximate eigendecomposition of the Hessian matrix of the cost function in order to compute sequential filters for the ensemble. Thus, reminders about the variational framework are useful before detailing the practical implementation of EVIL.

Standard notations are used throughout this section:

- The background state is denoted **x**^{b}, and the background error covariance matrix is **B**.
- Observations are gathered into a vector **y**, and the observation error covariance matrix is **R**.
- The analysis state is denoted **x**^{a}, and the analysis error covariance matrix is **A**.
- The observation operator *H* can transform a model state into observation space. Its Jacobian matrix is denoted **H** and can only be approximated in practice.

### b. Preconditioned incremental cost function

In variational DA, the analysis is obtained via the minimization of a potentially nonquadratic cost function 𝒥(**x**). The multi-incremental approach (Courtier et al. 1994) overcomes the difficulties due to the nonlinearity of operators used in the cost function 𝒥(**x**) by minimizing successive quadratic approximations *J*(*δ***x**) instead. For simplicity, we show only the equation for the first increment, with *δ***x** = **x** − **x**^{b} (for later increments the equation is modified to include the difference between the latest guess and the background):

*J*(*δ***x**) = ½ *δ***x**^{T}**B**^{−1}*δ***x** + ½ (**d** − **H***δ***x**)^{T}**R**^{−1}(**d** − **H***δ***x**),   (1)

where **d** = **y** − *H*(**x**^{b}) is the residual vector.

The cost function (1) shows two main obstacles:

- The application of the inverse **B**^{−1} of the background error covariance matrix to a vector is required, which can be hard to compute.
- The conditioning of the cost function is often poor; the eigenvalues of the Hessian matrix **∇∇**^{T}*J*(*δ***x**) = **B**^{−1} + **H**^{T}**R**^{−1}**H** are not clustered, which makes the minimization convergence very slow.

A solution to overcome these issues is to apply a control variable transform (CVT; Lorenc 1988). The new control variable **v** is defined such that *δ***x** = **Uv**, where **U** is a square root of **B** (**B** = **UU**^{T}). This operation, also known as square root preconditioning, leads to a new cost function *Ĵ*, which is defined as

*Ĵ*(**v**) = ½ **v**^{T}**v** + ½ (**d** − **HUv**)^{T}**R**^{−1}(**d** − **HUv**).   (2)

If **U** is not square and invertible, then *Ĵ*(**v**) ≠ *J*(**Uv**) for a random **v**. However, Ménétrier and Auligné (2015b) have shown that this is not an issue as long as **v** remains in a certain manifold, in which the minimum lies and in which most minimizers actually work.
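The conditioning benefit of square root preconditioning can be illustrated on a toy problem. The following is a minimal numpy sketch (a synthetic **B** with a wide eigenvalue spread, a random linear observation operator, and **R** = **I**; all variable names are assumptions of this sketch, not the paper's code): it compares the condition number of the model-space Hessian with that of the preconditioned Hessian, whose eigenvalues cluster at 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 10

# Synthetic background error covariance B with a wide eigenvalue spread,
# and a square root U such that B = U U^T.
eigvals = np.logspace(-3, 2, n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
B = Q @ np.diag(eigvals) @ Q.T
U = Q @ np.diag(np.sqrt(eigvals))

H = rng.standard_normal((p, n))  # linearized observation operator
R = np.eye(p)                    # observation error covariance

# Hessian in model space vs. Hessian after square root preconditioning.
hess_x = np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H
hess_v = np.eye(n) + (H @ U).T @ np.linalg.inv(R) @ (H @ U)

print(np.linalg.cond(hess_x))  # very large
print(np.linalg.cond(hess_v))  # much smaller: n - p eigenvalues equal 1
```

In the preconditioned space, at most *p* eigenvalues exceed 1, which is what makes a conjugate gradient or Lanczos minimization converge quickly.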

In the EnVar framework, **U** can be built as a linear combination of several terms, either dynamically based, via a parameterization calibrated by the ensemble, or fully ensemble based, with a localization of raw covariances. In this hybrid formalism, the control vector **v** is split into a section **v**^{c} associated with the dynamically based covariance matrix of square root **U**^{c} and a section **v**^{e} associated with the localized ensemble-based covariance matrix of square root **U**^{e}:

*δ***x** = *β*^{c}**U**^{c}**v**^{c} + *β*^{e}**U**^{e}**v**^{e},   (3)

where *β*^{c} and *β*^{e} are the hybridization weights. Equivalent formulas for the square root of a localized ensemble-based covariance matrix were proposed by Lorenc (2003) and Buehner (2005). They will be detailed in section 4 [(50) and (51)]. As shown in Wang et al. (2007), the equivalent hybrid covariance matrix is given by **B** = (*β*^{c})^{2}**U**^{c}(**U**^{c})^{T} + (*β*^{e})^{2}**U**^{e}(**U**^{e})^{T}.

### c. Analysis error covariance in preconditioned space

Minimizing the quadratic cost function (2) is equivalent to a linear analysis in the Kalman filtering framework. The analysis error covariance matrix is then given by

**A** = (**I** − **KH**)**B**,   (4)

where

**K** = **BH**^{T}(**HBH**^{T} + **R**)^{−1}   (5)

is the Kalman gain. The matrix **A** can be developed and factorized by **U** on the left side and **U**^{T} on the right side to get

**A** = **UΠU**^{T},   (6)

where

**Π** = **I** − (**HU**)^{T}[**HU**(**HU**)^{T} + **R**]^{−1}**HU**   (7)

is the analysis error covariance matrix in the preconditioned space. Given a square root of **Π** denoted **Π**^{1/2}, a square root of **A** can be expressed from (6) as

**U**^{a} = **UΠ**^{1/2}.   (8)

### d. Eigendecomposition

Interestingly, **Π** can be transformed using the Woodbury matrix identity (Woodbury 1950) to get the inverse of the Hessian matrix of the cost function (2):

**Π** = [**I** + (**HU**)^{T}**R**^{−1}**HU**]^{−1} = [**∇∇**^{T}*Ĵ*]^{−1}.   (9)

Since the Hessian matrix is real symmetric positive definite, it can theoretically be diagonalized in a unitary basis:

**∇∇**^{T}*Ĵ* = **WΘW**^{T},   (10)

where **W** is a unitary matrix,

**WW**^{T} = **W**^{T}**W** = **I**,   (11)

and **Θ** is a diagonal matrix whose diagonal coefficients are positive. The eigendecomposition of **Π** is given by

**Π** = **WΘ**^{−1}**W**^{T},   (12)

and its symmetric square root is given by

**Π**^{1/2} = **WΘ**^{−1/2}**W**^{T}.   (13)
### e. Ritz pairs

The Lanczos algorithm can be used to minimize the cost function (2), as shown in Desroziers and Berre (2012), El Akkraoui et al. (2013), or Gürol et al. (2014). Moreover, this algorithm provides a limited set of approximate eigenpairs called the Ritz pairs, which are approximations of the Hessian matrix extremal eigenpairs. The standard Lanczos algorithm for this kind of symmetric problem is detailed in Golub and Van Loan (1996, 470–498). In the DA context, a description of Ritz limited memory preconditioners can be found in Tshimanga et al. (2008).

After *q* iterations, *q* orthogonal Ritz vectors forming the columns of **W**_{q} are available, corresponding to *q* Ritz values filling the diagonal matrix **Θ**_{q}.
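As a rough illustration of how Ritz pairs emerge as a by-product of the iterations, here is a minimal Lanczos sketch in Python (full reorthogonalization for robustness, a synthetic preconditioned Hessian with **R** = **I**; this is not the operational algorithm, and all names are assumptions of this sketch):

```python
import numpy as np

def lanczos_ritz(matvec, b, q):
    """Plain Lanczos: after q iterations, return Ritz values/vectors that
    approximate extremal eigenpairs of the symmetric operator."""
    n = b.size
    V = np.zeros((n, q))
    alpha, beta = np.zeros(q), np.zeros(q - 1)
    v, v_prev = b / np.linalg.norm(b), np.zeros(n)
    for j in range(q):
        V[:, j] = v
        w = matvec(v)
        alpha[j] = v @ w
        w -= alpha[j] * v + (beta[j - 1] * v_prev if j > 0 else 0)
        w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)  # full reorthogonalization
        if j < q - 1:
            beta[j] = np.linalg.norm(w)
            v_prev, v = v, w / beta[j]
    # Eigendecomposition of the small tridiagonal matrix T_q gives the
    # Ritz values; Ritz vectors are V @ (eigenvectors of T_q).
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    vals, S = np.linalg.eigh(T)
    return vals, V @ S

rng = np.random.default_rng(2)
n, p, q = 200, 30, 25
HU = rng.standard_normal((p, n))
hess = lambda v: v + HU.T @ (HU @ v)  # Hessian-vector product, R = I

ritz_vals, ritz_vecs = lanczos_ritz(hess, rng.standard_normal(n), q)
true_vals = np.linalg.eigvalsh(np.eye(n) + HU.T @ HU)
print(ritz_vals[-1], true_vals[-1])  # extremal Ritz value converges first
```

The extremal eigenpairs converge after far fewer iterations than the dimension of the problem, which is the property EVIL exploits.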

### f. Direct and alternative expressions

It should be noted that since **W** is unitary, alternative expressions can be found for (10), (12), and (13), respectively:

**∇∇**^{T}*Ĵ* = **I** + **W**(**Θ** − **I**)**W**^{T},   (14)

**Π** = **I** − **W**(**I** − **Θ**^{−1})**W**^{T},   (15)

**Π**^{1/2} = **I** − **W**(**I** − **Θ**^{−1/2})**W**^{T}.   (16)

However, if the eigenpairs are approximated by the Ritz pairs, then this equivalence does not hold. More precisely, if we denote {(**w̄**_{i}, *θ̄*_{i})} the set of eigenpairs that has not been computed yet at iteration *q*, gathered into the matrices **W̄**_{q} and **Θ̄**_{q}, then the subspaces spanned by **W**_{q} and **W̄**_{q} are orthogonal. Equation (10) can be rewritten as

**∇∇**^{T}*Ĵ* = **W**_{q}**Θ**_{q}**W**_{q}^{T} + **W̄**_{q}**Θ̄**_{q}**W̄**_{q}^{T},   (17)

and the difference between direct and alternative expressions can be analyzed as follows:

- For direct expressions, approximations are progressively built starting from scratch for *q* = 0. Implicitly, it is assumed that **Θ̄**_{q} = **0**, so that the second term in (17) vanishes.
- For alternative expressions, approximations are progressively corrected starting from the identity matrix for *q* = 0. Implicitly, it is assumed that **Θ̄**_{q} = **I**, so that the second term in (17) is equal to

  **W̄**_{q}**W̄**_{q}^{T} = **I** − **W**_{q}**W**_{q}^{T}.   (18)

Depending on the case, direct or alternative expressions can be chosen for the practical implementation of EVIL filters, as detailed further in this paper.
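The practical difference between the two families of expressions can be seen on a toy problem, where exact eigenpairs sorted by decreasing eigenvalue stand in for the Ritz pairs delivered after *q* Lanczos iterations (all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 20
HU = rng.standard_normal((p, n)) * 0.5
hess = np.eye(n) + HU.T @ HU          # preconditioned Hessian, R = I
Pi = np.linalg.inv(hess)

theta, W = np.linalg.eigh(hess)
theta, W = theta[::-1], W[:, ::-1]    # sort by decreasing eigenvalue

for q in (0, 5, 20, 100):
    Wq, Tq = W[:, :q], theta[:q]
    direct = Wq @ np.diag(1.0 / Tq) @ Wq.T                         # from zero
    alternative = np.eye(n) - Wq @ np.diag(1.0 - 1.0 / Tq) @ Wq.T  # from I
    print(q, np.linalg.norm(direct - Pi), np.linalg.norm(alternative - Pi))
```

Since only *p* = 20 eigenvalues differ from 1 in this example, the alternative expression is exact as soon as *q* ≥ *p*, whereas the direct expression still misses the identity part of the spectrum.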

## 3. Implementation of EVIL filters

### a. Objectives and notations

Within the variational framework detailed in the previous section and using the Ritz pairs obtained from the deterministic minimization, EVIL can encompass and generalize several widely used sequential filters (EnKF, ETKF–MLEF, and EAKF–EnSRF) for more complex background error covariance models. In this section, high-dimensional systems, for which the affordable number of Ritz pairs is several orders of magnitude smaller than the size of the control vector (*q* ≪ *m*), are considered to derive their practical implementation.

The following notations for ensembles are used throughout this section:

- Ensemble backgrounds {**x**^{b}_{k}} and analyses {**x**^{a}_{k}} are numbered with an index *k* varying between 1 and *N*, as are explicitly perturbed observations {**y**_{k}}.
- Ensemble covariance matrices for backgrounds and analyses are respectively denoted **B̃** and **Ã**.

Three specific implementations of the same concept are now detailed: stochastic EVIL (S-EVIL), deterministic EVIL (D-EVIL), and resampled EVIL (R-EVIL).

### b. Ensemble Kalman filter: S-EVIL

The EnKF proposed by Houtekamer and Mitchell (1998) uses perturbed observations {**y**_{k}} to update the ensemble backgrounds {**x**^{b}_{k}} into ensemble analyses {**x**^{a}_{k}}:

**x**^{a}_{k} = **x**^{b}_{k} + **K̃**[**y**_{k} − *H*(**x**^{b}_{k})].   (21)

The ensemble-mean equation is given by

**x̄**^{a} = **x̄**^{b} + **K̃**(**ȳ** − **h̄**),   (22)

where the overbar denotes the ensemble mean and **h̄** is the ensemble mean of the *H*(**x**^{b}_{k}). Assuming that the observation operator can be linearized,

*H*(**x**^{b}_{k}) ≈ *H*(**x̄**^{b}) + **H**(**x**^{b}_{k} − **x̄**^{b}),   (23)

we can subtract (22) from (21) and normalize the result by √(*N* − 1) to get the normalized perturbations:

**X**^{a} = (**I** − **K̃H**)**X**^{b} + **K̃Y**,   (24)

where the *k*th columns of **X**^{b}, **X**^{a}, and **Y** are respectively (**x**^{b}_{k} − **x̄**^{b})/√(*N* − 1), (**x**^{a}_{k} − **x̄**^{a})/√(*N* − 1), and (**y**_{k} − **ȳ**)/√(*N* − 1).

In the original formulation of the EnKF, **B** is purely sampled from the ensemble to compute the Kalman gain (**B** = **B̃**), but in general some processing of covariances is applied (e.g., localization and inflation). At optimality, the Kalman gain could be expressed as **K** = **AH**^{T}**R**^{−1}. The analysis error covariance matrix can be approximated with the Ritz pairs using the direct expression **A**_{q} = **UW**_{q}**Θ**_{q}^{−1}**W**_{q}^{T}**U**^{T}. Thus, the analysis for *q* = 0 is given by

**X**^{a} = **X**^{b},   (25)

which means that the background perturbations are kept unchanged if no observation is assimilated. This useful property would be lost if the alternative expression were used to approximate **A**. In this case, the posterior spread could even be larger than the prior spread, which is not consistent. Thus, the update equation for this stochastic EVIL or S-EVIL is given by

**X**^{a} = **X**^{b} + **UW**_{q}**Θ**_{q}^{−1}**W**_{q}^{T}(**HU**)^{T}**R**^{−1}(**Y** − **HX**^{b}).   (26)
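A minimal numpy sketch of this stochastic update (exact leading eigenpairs standing in for Ritz pairs, **R** = **I**, synthetic perturbations; all names are assumptions of this sketch) shows both the *q* = 0 property and the convergence to an EnKF-like update:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, N = 60, 15, 20

# Well-conditioned synthetic B = U U^T and a random linear obs operator.
eigvals = np.linspace(0.5, 2.0, n)
Qm, _ = np.linalg.qr(rng.standard_normal((n, n)))
U = Qm @ np.diag(np.sqrt(eigvals))
H = rng.standard_normal((p, n))
R = np.eye(p)

Xb = rng.standard_normal((n, N))  # normalized prior perturbations
Yp = rng.standard_normal((p, N))  # normalized observation perturbations

theta, W = np.linalg.eigh(np.eye(n) + (H @ U).T @ (H @ U))  # R = I
theta, W = theta[::-1], W[:, ::-1]  # stand-in for Ritz pairs

def s_evil(q):
    # Direct expression A_q = U W_q Theta_q^{-1} W_q^T U^T,
    # gain K_q = A_q H^T R^{-1}.
    Aq = U @ W[:, :q] @ np.diag(1.0 / theta[:q]) @ W[:, :q].T @ U.T
    return Xb + Aq @ H.T @ (Yp - H @ Xb)

assert np.allclose(s_evil(0), Xb)  # q = 0: perturbations unchanged
Xa = s_evil(n)                     # full convergence
```

At full convergence, the update coincides with a perturbed-observation EnKF update with the optimal gain, since the identity **K** = **AH**^{T}**R**^{−1} then holds exactly.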

### c. Square root filter: D-EVIL

Unlike the EnKF, which has a stochastic component coming from the perturbation of observations, the square root filters are deterministic. In the ETKF of Bishop et al. (2001), corrected by Wang et al. (2004), and the MLEF of Zupanski (2005), the matrix **X**^{b} is postmultiplied by a transform matrix **T** to get **X**^{a}:

**X**^{a} = **X**^{b}**T**.   (27)

The Kalman filter equation for the analysis error covariance [(4)] is adapted in the ETKF case, where the ensemble analysis covariance matrix **Ã** should verify

**Ã** = (**I** − **K̃H**)**B̃**.   (28)

Equivalently, **T** should verify

**TT**^{T} = **I** − (**HX**^{b})^{T}[**HX**^{b}(**HX**^{b})^{T} + **R**]^{−1}**HX**^{b}.   (29)

Conversely, in the EnSRF of Whitaker and Hamill (2002) and the EAKF of Anderson (2003), the matrix **X**^{b} is premultiplied by a matrix **S** to get **X**^{a}:

**X**^{a} = **SX**^{b},   (30)

where **S** is also defined in such a way that (28) is verified. A synthesis about square root filters can be found in Tippett et al. (2003). If the ETKF is specifically designed to be unbiased, then it is equivalent to the EAKF, as shown in Livings et al. (2008).

Actually, the ETKF (27)–(29) can be seen as a particular case of (7)–(8) obtained in the variational framework. In this precise case, the general background error covariance matrix **B** is reduced to the raw ensemble background covariance matrix **B̃**, its square root **U** is reduced to **X**^{b}, the square root of the analysis covariance matrix **U**^{a} is reduced to **X**^{a}, and the matrix **Π**^{1/2} is reduced to **T**.

The implementation of a square root filter with EVIL is based on the definition of a matrix **Λ** that should verify

**UΛ** = **X**^{b}.   (31)

Because it depends on the role of the ensemble in the modeling of **B**, finding such a matrix is not always possible. For a raw ensemble covariance matrix (without localization or hybridization), **Λ** = **I**. Simple forms also exist for **Λ** if the spatial localization is represented spectrally (e.g., Buehner 2005) or using recursive filters (e.g., Wang et al. 2008). As long as such a **Λ** can be defined, the ensemble update is given by

**X**^{a} = **UΠ**_{q}^{1/2}**Λ**.   (32)

The square root of the analysis covariance matrix can be approximated with the Ritz pairs using the alternative expression (16). In this case, the analysis for *q* = 0 is given by

**X**^{a} = **UΛ** = **X**^{b}.   (33)

This desired property is not verified if the direct expression is used to approximate **Π**^{1/2}. Thus, the update equation for this deterministic EVIL or D-EVIL is given by

**X**^{a} = **X**^{b} − **UW**_{q}(**I** − **Θ**_{q}^{−1/2})**W**_{q}^{T}**Λ**.   (34)
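In the raw-ensemble particular case discussed above (the square root of **B** reduced to the perturbation matrix, so that the ensemble-space transform plays the role of the square root of the preconditioned analysis covariance), the construction can be checked against the Kalman filter analysis covariance. A minimal numpy sketch (synthetic perturbations, random linear observation operator; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, N = 40, 10, 15
Xb = rng.standard_normal((n, N)) / np.sqrt(N - 1)  # normalized perturbations
H = rng.standard_normal((p, n))
R = np.eye(p)
Bt = Xb @ Xb.T                                     # raw ensemble covariance

# With U = Xb, the ensemble-space Hessian is I + (H Xb)^T R^{-1} (H Xb);
# its inverse symmetric square root is the transform T.
HX = H @ Xb
theta, W = np.linalg.eigh(np.eye(N) + HX.T @ np.linalg.inv(R) @ HX)
T = W @ np.diag(theta ** -0.5) @ W.T
Xa = Xb @ T                                        # deterministic update

# The posterior perturbation covariance verifies the Kalman filter equation.
K = Bt @ H.T @ np.linalg.inv(H @ Bt @ H.T + R)
assert np.allclose(Xa @ Xa.T, (np.eye(n) - K @ H) @ Bt)
```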

### d. Resampling method: R-EVIL

The square root of the analysis error covariance matrix can also be used to resample the posterior perturbations from random vectors {**ξ**_{k}} of size *m*, drawn from a centered and normalized distribution (zero mean, unit variance). If we define the matrix **Ξ** as

**Ξ** = [**ξ**_{1}, …, **ξ**_{N}]/√(*N* − 1),   (35)

then the update equation for this resampled EVIL or R-EVIL is given by

**X**^{a} = **UΠ**_{q}^{1/2}**Ξ** = **UΞ** − **UW**_{q}(**I** − **Θ**_{q}^{−1/2})**W**_{q}^{T}**Ξ**.   (36)

This method could also be interesting for ensemble forecasting. Whatever the number of prior perturbations (even if a static **B** is used), R-EVIL can generate as many consistent posterior perturbations as needed to initialize ensemble forecasts.
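A minimal numpy sketch of the resampling idea (exact eigenpairs in place of Ritz pairs, **R** = **I**, Gaussian draws; all names and sizes are assumptions of this sketch): any number *M* of posterior members can be generated, and their sample covariance converges to the analysis covariance as *M* grows.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, M = 25, 8, 50000

# Well-conditioned synthetic square root U of B.
eigvals = np.linspace(0.5, 2.0, n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
U = Q @ np.diag(np.sqrt(eigvals))
H = rng.standard_normal((p, n))

theta, W = np.linalg.eigh(np.eye(n) + (H @ U).T @ (H @ U))  # R = I
Ua = U @ W @ np.diag(theta ** -0.5) @ W.T  # square root of A

# R-EVIL-style resampling: draw as many posterior members as desired.
Xi = rng.standard_normal((n, M)) / np.sqrt(M - 1)
Xa = Ua @ Xi                               # M posterior perturbations

A = Ua @ Ua.T
err = np.linalg.norm(Xa @ Xa.T - A) / np.linalg.norm(A)
print(err)  # sampling error, shrinking like 1/sqrt(M)
```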

### e. Implementation of S-, D-, and R-EVIL in preconditioning

#### 1) Full preconditioner

We define two matrices and by

Using the eigenpairs {} of given in (10), and the fact that (Ménétrier and Auligné 2015b), a few lines of calculus show that

{} are the eigenpairs of , and

{} are the eigenpairs of .

The Lanczos procedure detailed in El Akkraoui et al. (2013) uses a full preconditioning, initially developed by Derber and Rosati (1989) within a conjugate gradient algorithm. The same iterates and same estimations of extremal eigenvalues are obtained with full and square root preconditionings. However, with the full preconditioning, we obtain

the Ritz pairs {} approximating the eigenpairs {}, and

the Ritz pairs {} approximating the eigenpairs {}.

The approximate eigenvectors are linked to via

Thus, columns of are orthonormal, columns of and are orthonormal, and columns of are orthonormal:

#### 2) Restricted preconditioner

We define two matrices and by

Using the results of the previous subsection, it is straightforward to show that

{} are the eigenpairs of , and

{} are the eigenpairs of .

The Lanczos procedure detailed in Gürol et al. (2014) uses a restricted preconditioning, which produces the same iterates (although in observation space) as a full preconditioning. However, with the restricted preconditioning, we obtain

the Ritz pairs {}, approximating the eigenpairs {}, and

the Ritz pairs {}, approximating the eigenpairs {}.

The approximate eigenvectors are linked to and via

Matrices and require less memory than and if the number of observations *p* is lower than the state size *n*. However, extra multiplications by , its transpose, and are required, as shown in the following paragraph.

#### 3) Summary of EVIL transformations

EVIL expressions are transformed to make use of the Ritz vectors produced by the Lanczos procedure with full and restricted preconditioning.

## 4. Differences between sequential filters and EVIL

### a. Localization: A concept with multiple meanings

Localization was originally designed for sequential filters and aims at restraining the influence of an observation on the analysis increment to a given radius. Indeed, covariances between an observation and grid points beyond this radius are considered too affected by sampling noise and therefore do not carry reliable information from the observation to the increments. However, localization seems to be a double-edged sword. On one hand, it is an essential tool to filter sampling noise in prior covariances to avoid spurious increments. Moreover, localization also increases the dimension of the subspace in which the analysis increment is constrained to lie. On the other hand, if localization is too tight, increments are not broad enough and larger scales are not accurately analyzed, destroying some of the balances between variables. This emphasizes the need to objectively and carefully choose localization.

In practice, localization is a common word used in three different contexts:

- For postmultiplication square root filters (ETKF, MLEF), the usual way to apply localization is to split the domain into small subdomains and to perform independent analyses for each one. Thus, observations inside a given subdomain do not affect the analyses of other subdomains. The local ensemble transform Kalman filter (LETKF) algorithm of Hunt et al. (2007) provides a proper way to deal with observations close to the subdomain border and has become very popular. Even if the division into subdomains allows for a straightforward parallelization, it also produces a discontinuous analysis in which larger scales are sometimes poorly analyzed, and it encounters difficulties with nonlocal observations that span several subdomains.

- For premultiplication square root filters (EnSRF, EAKF) and for the classical EnKF, domain localization is also possible, but a Schur product (element by element) between a localization matrix **L** and the prior covariance matrix is generally preferred:

  **B** = **L** ∘ **B̃**.   (48)

  However, an approximation of the effect of localization is used in practice, assuming that observations are local enough:

  **K** ≈ [**L**^{xy} ∘ (**B̃H**^{T})][**L**^{yy} ∘ (**HB̃H**^{T}) + **R**]^{−1},   (49)

  where **L**^{xy} is a localization matrix between model and observation spaces, and **L**^{yy} is a localization matrix within observation space. In general, it is very complicated to ensure the consistency among **L**, **L**^{xy}, and **L**^{yy}. Another issue with this formalism is the need for different localization matrices for various observation types, requiring methods to compute adaptive localizations (Anderson 2007; Bishop and Hodyss 2009; Anderson 2012; Anderson and Lei 2013).
- In the EnVar formalism, the ensemble increment is defined as

  *δ***x**^{e} = Σ_{k=1}^{N} **α**_{k} ∘ **x**′^{b}_{k},   (50)

  where **x**′^{b}_{k} is the *k*th normalized background perturbation and **α**_{k} is the subsection of the so-called alpha control variable corresponding to the *k*th ensemble member, which is related to the subsection **v**^{e}_{k} of the control variable **v** via a smoothing operator **U**^{α}:

  **α**_{k} = **U**^{α}**v**^{e}_{k}.   (51)

  Equations (50) and (51) define a matrix **U**^{e} such that *δ***x**^{e} = **U**^{e}**v**^{e}. Lorenc (2003) and Buehner (2005) have shown that if **U**^{α} is a square root of the localization matrix **L**, then **U**^{e} is a square root of **L** ∘ **B̃** as defined in (48). Thus, localization in the EnVar does not explicitly depend on observation types and can be objectively diagnosed from the ensemble with methods developed in Ménétrier et al. (2015a,b). Moreover, adding a hybrid term in this framework is trivial, as shown in (3). EVIL is based on this formalism, which might be a scientific and technical advantage compared to other square root filters (Ménétrier and Auligné 2015a).
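The noise-filtering role of Schur-product localization can be illustrated with a minimal numpy sketch (1D grid, Gaussian "true" covariance and Gaussian localization function; the length scales and names are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
n, N = 200, 10  # grid size, small ensemble

# "True" background error covariance: Gaussian correlations on a 1D grid.
d = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
Btrue = np.exp(-0.5 * (d / 10.0) ** 2)

# Raw covariance estimated from a small ensemble: noisy at long range.
chol = np.linalg.cholesky(Btrue + 1e-8 * np.eye(n))
members = chol @ rng.standard_normal((n, N))
Xb = (members - members.mean(axis=1, keepdims=True)) / np.sqrt(N - 1)
Braw = Xb @ Xb.T

# Schur-product localization: element-wise product with a correlation
# matrix L of broader length scale (here 30 grid points).
L = np.exp(-0.5 * (d / 30.0) ** 2)
Bloc = L * Braw

print(np.linalg.norm(Braw - Btrue))  # large sampling error
print(np.linalg.norm(Bloc - Btrue))  # reduced by localization
```

The localization damps the spurious long-range covariances while leaving the short-range signal (whose length scale is smaller than that of **L**) almost untouched.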

### b. Consistency of deterministic and ensemble analyses

Recently, several operational centers have started to use ensemble perturbations in a variational framework, as illustrated in Fig. 1a. This can be achieved via the EnVar formalism (Lorenc 2003; Buehner 2005), as in Clayton et al. (2013), Buehner et al. (2013), and Wang et al. (2013), or via a parameterization of **B**, as in Varella et al. (2011) and Bonavita et al. (2012). The ensemble perturbations are generally taken from an independent EnKF or a square root filter in the first case and from an EDA in the second.

As mentioned in the introduction, the ensemble DA algorithm usually used to update perturbations is not consistent with the variational system, in the sense that their effective gain matrices (even if not explicitly computed) are different. If instead an EDA is used, then in practice it is usually run at reduced resolution to limit its cost. Thus, the prior (respectively posterior) perturbations do not correctly sample the deterministic background (respectively analysis) error. Conversely, the Ritz pairs used to update perturbations in EVIL are computed within the minimization, which ensures consistency between deterministic and ensemble analyses. This consistency means that posterior perturbations are enhanced with the information coming from the full modeling of **B**, regardless of the method used for this task: localization, hybridization, multiresolution ensembles, and so on. Concretely, EVIL posterior perturbations can benefit from both ensemble and climatological components of **B** in a hybrid formalism, whereas the posterior perturbations of an EnKF or a square root filter can only benefit from the ensemble covariances, which can be poor if the ensemble is small.

### c. Observation handling

The handling of observations is another important difference between variational and sequential algorithms, as detailed in Lorenc (2003).

As mentioned in the previous subsection, the localization in variational systems is defined in model space, contrary to other schemes. This is very convenient to get a consistent localization when nonlocal observations are used, such as satellite radiances or radar reflectivities.

To deal with the nonlinearity of the observation operator *H* (potentially including the forecast model in 4D applications), variational methods use the multi-incremental technique of Courtier et al. (1994) or a simpler version that does not require rerunning the model (Lorenc et al. 2000). In the case where several inner loops are performed, it is not clear yet whether EVIL should use Ritz pairs

- from all inner loops, which would require an expensive reorthogonalization of the Ritz vectors, or
- from the last loop only, which might explore fewer directions, the start of the last inner loop being closer to the minimum.

The detailed study of this issue will be the subject of future work; a single inner loop is used in the following experiments.

Finally, it should be noticed that the observation operator *H* is applied only in the deterministic minimization for D-EVIL and R-EVIL, whereas it is also applied to each ensemble member for S-EVIL, similarly to the EnKF scheme.

## 5. Preliminary experiments

### a. Experimental setup

#### 1) Input data

We have chosen to test EVIL in a simplified framework, with a noncycled data assimilation using a 50-member ensemble from a real NWP system (GSI–EnKF). To keep the problem simple enough, we have extracted slabs of temperature at a single vertical level around 800 m. These perturbations, coming from ARW simulations at 20-km resolution over the continental U.S. (CONUS) domain (~60 000 grid points), are used to define the background error and the prior perturbations. A set of 260 synthetic observations is assimilated. A 2D EnVar is performed with a Lanczos algorithm, using a pure ensemble background error and an isotropic and homogeneous Gaussian localization function for all the experiments:

*L*(*r*) = exp[−*r*^{2}/(2*L*_{h}^{2})],

where *r* is the distance and *L*_{h} is the localization length scale, set to 500 km. In practice, we apply this localization via spectral transforms, but it could also have been applied with other widely used methods (e.g., recursive filters or diffusion) without altering the results significantly.

#### 2) Advantages and limitations

The main advantage of this experimental design is to keep the system simple enough for early testing but complex enough with its heterogeneous and anisotropic ensemble covariances. Figure 2 shows that the prior perturbations' variances are heterogeneous over the domain, with large-scale structures over the Pacific and Canada and small-scale structures related to convection over the central plains of the United States. Figure 3 shows the correlation functions with the center of each tile. Correlations of the prior perturbations are also heterogeneous and anisotropic, with a strong land–sea contrast, for instance.

This experimental setup has some limitations. Since the system is not cycled, the cumulative effects of cycling are not studied. From a general point of view, this kind of system is still very simple compared to the complexity of a real NWP system. It is not relevant to compare the performance of the algorithms at this point, only their behavior.

### b. Impact of localization

To clearly assess the effects of localization, observations are concentrated inside a disk in the middle of the domain, as illustrated in Fig. 4. This observation network is referred to as a single disk network.

#### 1) Ensemble perturbations increment

Following the EVIL expressions in section 3, the posterior perturbations matrix obtained if no iteration is performed (*q* = 0) is equal to

- the prior perturbations for S-EVIL and D-EVIL, or
- a resampling of the prior perturbations for R-EVIL.

The Frobenius norm of the ensemble perturbation increment is given in Fig. 5 to assess the effect of the number of iterations *q* on the posterior perturbations. It should be noticed that the resampling step for R-EVIL is not visible in this figure, since the same random sample **Ξ** is used in (36) for all values of *q*. It appears that most of the correction on initial perturbations is built at the beginning of the minimization for all algorithms; the following iterations bring only minor refinements. As expected, the EnKF and the EnSRF have results that are close to S-EVIL and D-EVIL at full convergence, respectively.

The fact that all algorithms do not converge to the same solution is not surprising. Indeed, the theory imposes that all algorithms converge to the same posterior covariance matrix [Kalman filter equation (4)] if no localization is applied (and if the ensemble size grows to infinity for the EnKF), but the individual members can be different. However, if a localization is applied, this property vanishes, as illustrated in the appendix. Thus, the increments computed by various methods can be significantly different.

To illustrate the spatial features of ensemble perturbation increments, the first column of the perturbation increment matrix is shown at full convergence (*q* = 260) in Fig. 6. The increments are clearly limited by the localization within an area around the observations for S- and D-EVIL. For R-EVIL, the effect of resampling is obvious, with an increment spread over the whole domain. However, if we look at the first column of the iteration-to-iteration corrections, we can see that the successive corrections to the initial perturbations are also localized for R-EVIL.

#### 2) Posterior variance

The posterior variance is computed at each grid point and each iteration *q*. Figure 7 shows the horizontally averaged ratio of posterior to prior variance. As expected, this ratio decreases as the iterations go along. Consistent with the results of the appendix, the algorithms ordered by decreasing posterior spread ratio are R-EVIL, D-EVIL, EnSRF, S-EVIL, and EnKF. Whereas S-EVIL and the EnKF are similar in the appendix, the EnKF is more underdispersive here because the observations are assimilated serially.

To estimate the spatial distribution of the variance decrease resulting from the assimilation of observations, the ratio of posterior to prior variance is displayed in Fig. 8 for the three EVIL algorithms. For R-EVIL, the ratio of the posterior variance to the variance after resampling is also displayed. It appears that for all algorithms, the variance is decreased around the observed area. This decrease is stronger and more spatially spread for S-EVIL and D-EVIL than for R-EVIL. For the latter, the variance after the resampling step is almost equal to the prior variance, so there is no significant difference between the two bottom panels of Fig. 8.

#### 3) Posterior correlations

To assess the impact of localization on posterior correlations, localization length scales of 500 and 100 km are applied in Fig. 9. Consistent with the results of the appendix, the posterior correlations of R-EVIL are localized, even if some very weak resampling noise is visible outside of the localization radius. For S-EVIL and D-EVIL, the localization of prior covariances has almost no impact on posterior correlations.

#### 4) Summary

The main results of this experiment can be summarized as follows:

- The total increment is localized for S- and D-EVIL (as well as for the EnKF and EnSRF) but is spread over the whole domain for R-EVIL.
- The variance decrease is localized for all algorithms.
- The posterior correlations are not localized for S- and D-EVIL (as well as for the EnKF and EnSRF), but they are for R-EVIL.

Thus, R-EVIL has a very different behavior from other usual schemes. A potential positive aspect is the localization of posterior covariances, which is consistent with the Kalman filter equation (4). Potential issues are the possible loss of the signal if the localization is too tight and a balance breakdown due to the resampling step.

### c. Convergence behavior

As mentioned in the introduction, some methods trying to represent the Hessian matrix from a limited set of vectors (e.g., the reduced-rank Kalman filter) have reached the conclusion that a significantly large number of vectors is necessary for success. The question is whether such a significantly large number of vectors (and thus of iterations) can be reached in operational systems at an affordable cost. The answer is certainly complex and application dependent, but a few indications can be drawn from the following simple experiments.

#### 1) Posterior variance for various observation networks

The same experiment as in the previous subsection is run with modified observation networks:

- a single disk network with twice as many observations (520);
- a single disk network with half as many observations (130);
- a double disk network with as many observations (260), the west disk being more densely observed than the east disk (see Fig. 10); and
- a uniform network with as many observations (260).

Figure 11 shows the average variance ratio for the D-EVIL method as a function of the number of iterations for every network. Obviously, the maximum number of iterations depends on the number of assimilated observations.

This experiment clearly shows that the final average variance ratio depends on the number of observations and on their distribution. In our case, for the same number of observations, successively smaller posterior variances are obtained for the single disk, double disk, and uniform networks. However, it should be noted that this result also depends on the prior variance distribution.

To assess the rate of convergence of the posterior variance, it is interesting to compute how many iterations are required to reach 90% of the variance-ratio decrease, a number denoted *n*_{90} hereafter. Table 1 shows that *n*_{90} is not a linear function of the number of observations and also strongly depends on the network distribution.
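This diagnostic can be sketched in a few lines. The variance-ratio curves below are purely synthetic stand-ins for those of Fig. 11 (a quickly converging, peaked-spectrum case and a slowly converging, flat-spectrum case); `n90` denotes the number of iterations needed to reach 90% of the total decrease:

```python
import numpy as np

def n90(ratio):
    """Iterations needed to reach 90% of the total variance-ratio decrease.

    `ratio` holds the average posterior/prior variance ratio after each
    iteration, starting at 1.0 (iteration 0) and non-increasing."""
    target = ratio[0] - 0.9 * (ratio[0] - ratio[-1])
    return int(np.argmax(ratio <= target))

it = np.arange(101)
fast = 1.0 - 0.6 * (1.0 - np.exp(-it / 10.0))   # peaked spectrum: quick convergence
slow = 1.0 - 0.6 * (it / 100.0)                 # flat spectrum: slow convergence
```

For the same total variance decrease, the flat-spectrum curve requires far more iterations to reach the 90% threshold.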

#### 2) A spectrum-dependent problem

where we used the definition given in (37a). Thus, the impact of observations on the error covariance is carried by this correction term. The weight of each eigenvector in the correction is driven by the corresponding eigenvalue in **Θ**: large eigenvalues generate an important contribution, whereas eigenvalues close to 1 have little influence.

As a consequence, eigenvalues significantly larger than 1 are very important for obtaining an accurate update of the prior covariance into the posterior covariance from a limited number of approximate eigenpairs. Thus, the shape of the eigenspectrum determines whether EVIL can provide a relevant posterior ensemble from a limited number of Ritz pairs. As an illustration, Fig. 12 shows the eigenspectrum of the Hessian matrix obtained at convergence for the single disk, double disk, and uniform networks with 260 observations. It appears that the uniform network leads to a larger proportion of eigenvalues that are significantly higher than 1. This property can be directly linked to its higher *n*_{90}: more eigenvalues (and thus more iterations) are required to capture a significant part of the error covariance correction.
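The role of the eigenspectrum can be illustrated with a small numerical sketch. The prior covariance, observation operator, and error variance below are illustrative stand-ins (an exponential covariance on a 1D grid with pointwise observations), not the paper's configuration; the posterior covariance is rebuilt from the `k` leading eigenpairs of the preconditioned Hessian, and only eigenvalues above 1 contribute:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 80, 20, 0.5
x = np.arange(n)
B = np.exp(-np.abs(x[:, None] - x[None, :]) / 5.0)   # illustrative prior covariance
H = np.eye(n)[rng.choice(n, size=p, replace=False)]  # p pointwise observations
w, V = np.linalg.eigh(B)
Bh = V @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ V.T  # symmetric square root of B
S = np.eye(n) + Bh @ H.T @ H @ Bh / r                # preconditioned Hessian
theta, U = np.linalg.eigh(S)
theta, U = theta[::-1], U[:, ::-1]                   # largest eigenvalues first
A_exact = Bh @ np.linalg.inv(S) @ Bh                 # exact posterior covariance
Z = Bh @ U                                           # eigenvectors mapped to model space

def posterior(k):
    """Posterior covariance rebuilt from the k leading eigenpairs."""
    c = 1.0 - 1.0 / theta[:k]
    return B - (Z[:, :k] * c) @ Z[:, :k].T

err = [np.linalg.norm(posterior(k) - A_exact) for k in (5, 10, p, n)]
```

With this setup, exactly `p` eigenvalues exceed 1, so the update is captured once those `p` eigenpairs are retained; a flatter spectrum would spread the correction over more eigenpairs and require more terms.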

#### 3) Heterogeneous observation networks

An interesting feature can be analyzed from the double disk network, in which the west disk is more densely observed than the east disk, with 190 and 70 observations, respectively. In Fig. 13, we have split the variance ratio between the west and east parts of the domain. It appears that the variance ratio decreases faster on the densely observed side (*n*_{90} = 110) than on the sparsely observed side (*n*_{90} = 180). This confirms a well-known property of the Lanczos procedure: the earliest iterations represent the corrections to the error covariances in data-dense areas, the data-sparse areas being corrected by later Ritz pairs. A confirmation of this fact can be found in Fig. 14, where the first column of the update is illustrated after 20, 30, and 40 iterations and at full convergence (260 iterations). After 20 iterations, the main structures are already built in the data-dense area, whereas no significant signal can be found in the data-sparse area. At least 30 iterations are required for some increment to emerge in the data-sparse area. The panel at 40 iterations confirms that building the increment takes more iterations in the data-sparse area (further from convergence) than in the data-dense area (closer to convergence).
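This behavior can be reproduced in a toy setting. The sketch below uses the exact leading eigenvector of a preconditioned Hessian as a stand-in for the converged leading Ritz vector, on a 1D domain whose west half is densely observed and whose east half is sparsely observed (all values illustrative):

```python
import numpy as np

n = 100
x = np.arange(n)
B = np.exp(-np.abs(x[:, None] - x[None, :]) / 8.0)        # illustrative prior covariance
w, V = np.linalg.eigh(B)
Bh = V @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ V.T       # symmetric square root of B
obs = np.r_[np.arange(0, 50, 2), np.arange(50, 100, 10)]  # dense west, sparse east
H = np.eye(n)[obs]
S = np.eye(n) + Bh @ H.T @ H @ Bh / 0.5                   # preconditioned Hessian
theta, U = np.linalg.eigh(S)
lead = U[:, -1]                                           # leading eigenvector
west_energy = float(np.sum(lead[:50] ** 2))               # energy fraction in the dense half
```

The leading eigenvector concentrates most of its energy in the densely observed half, consistent with the data-dense area being corrected by the earliest Ritz pairs.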

In a real NWP data assimilation system, satellite data might be an issue for EVIL. Indeed, these observations are abundant but individually weakly informative, and they are uniformly distributed over large areas where conventional observations are absent. We can reasonably assume that they will flatten the eigenspectrum of the Hessian matrix and thus increase the number of iterations required to decrease the posterior spread in EVIL.

## 6. Discussion and conclusions

Variational and ensemble data assimilation are two powerful techniques to enhance the quality of initial conditions in NWP systems. Their convergence started with the pioneering work of Lorenc (2003) and Buehner (2005), and this paper proposes to carry on in this direction. With EVIL, a single system can handle both the deterministic and the ensemble analyses. This reduces development and running costs, while maintaining the consistent use of observations and of the analysis equation needed for a good representation of errors (Berre et al. 2006).

The background error covariances in an NWP data assimilation system using an EnKF-based method are affected by major inaccuracies, owing both to their truncated representation by an affordable ensemble, which is very small compared to the state space, and to the limited understanding of the sources and detailed structure of model errors and other system errors in the data assimilation scheme. To counter these inaccuracies, it is usual to modify the covariances with somewhat ad hoc inflation schemes, localization methods, and hybridization with a static covariance. The analysis error covariance from an EnKF or from EVIL has to pass through all these approximate steps before it becomes the background error covariance for the next cycle, so there is little benefit in an analysis error covariance estimate representing details that would be lost in subsequent steps.

Much of the useful information in a covariance approximation comes from the flow-dependent growth of some perturbations. For example, usable background error covariance estimates can be obtained with an error breeding method (Toth and Kalnay 1997) that has no knowledge of the analysis error covariance and uses an ad hoc rescaling of perturbations. An EVIL scheme that represents those aspects of the analysis error covariance that can be built from a small ensemble might be expected to provide an ensemble background estimate that is much better than an error breeding scheme and perhaps as accurate as an EnKF scheme. Moreover, it achieves this as a by-product of an EnVar method that is designed without compromise to give the best analysis, about which the ensemble is centered.

Interestingly, the behavior of posterior covariances for the resampling version R-EVIL is very different from that of S-EVIL and D-EVIL when a localization is applied. The former has an asymptotic posterior covariance matrix consistent with the Kalman filter equation (4) but generates nonlocalized increments because of its initial resampling. Conversely, the latter generate localized increments, but their asymptotic posterior covariance matrices are not consistent with the Kalman filter equation (similarly to serial ensemble Kalman filters). However, given the discussion earlier in this section, trying to verify the Kalman filter equation (4) might not be ideal; it could be more interesting to seek a posterior ensemble that, once localized, samples the posterior covariance of (4). For this and other reasons, the relative quality of these algorithms will depend on the whole configuration (model, resolution, observations, ensemble size, etc.), and their objective evaluation is beyond the scope of this work. For all EVIL algorithms, the advantages for operational systems are significant: they allow a straightforward transfer of all the benefits of variational methods to ensemble data assimilation. The implementation of EVIL in an existing variational system is rather simple, as long as the minimization already provides Ritz pairs.

However, an important issue is the number of Ritz pairs (i.e., of iterations) required to provide a high-quality update of the prior ensemble into a posterior ensemble, especially for high-dimensional problems such as operational NWP systems. Several factors, such as the prior ensemble distribution and the observation network's size, distribution, and degrees of freedom for signal (DFS), can influence the eigenspectrum of the Hessian matrix, whose shape is directly linked to the number of Ritz pairs required by EVIL. Thus, the affordability and applicability of the method fundamentally depend on the considered data assimilation system. Two rather simple techniques could be used to tackle the convergence issue:

- EVIL can be performed on local domains, via tiling of the model domain (primal space) or local batches of observations (dual space). In the extreme case of very small tiles, EVIL converges to an LETKF (Hunt et al. 2007). The potential advantage of EVIL is that there is no hard connection between the localization length scale and the tiling, unlike in the LETKF.
- EVIL can be performed on multiple perturbed right-hand sides to increase the number of Ritz vectors per iteration. An obvious candidate for this is the block Lanczos algorithm (Montgomery 1995).
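As a sketch of the second option, here is a minimal block Lanczos iteration on a generic symmetric matrix. The function name and test matrix are illustrative, full reorthogonalization is used in place of the three-term block recurrence (which it subsumes), and Montgomery (1995) should be consulted for the original algorithm:

```python
import numpy as np

def block_lanczos_ritz(A, p, steps, rng):
    """Ritz values of symmetric A from a block Krylov basis (block size p)."""
    n = A.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal start block
    blocks = [X]
    for _ in range(steps - 1):
        W = A @ blocks[-1]
        for Q in blocks:                               # full reorthogonalization
            W -= Q @ (Q.T @ W)
        X, _ = np.linalg.qr(W)
        blocks.append(X)
    Q = np.hstack(blocks)                              # Krylov basis, n x (p * steps)
    T = Q.T @ A @ Q                                    # projected matrix (block tridiagonal)
    return np.linalg.eigvalsh(T)

rng = np.random.default_rng(2)
n = 60
Qr, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qr @ np.diag(np.r_[50.0, np.linspace(1.0, 10.0, n - 1)]) @ Qr.T
ritz = block_lanczos_ritz(A, p=4, steps=8, rng=rng)    # 32 Ritz values per run
```

With several right-hand sides, each iteration delivers `p` Ritz vectors instead of one, at the cost of `p` matrix products per iteration.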

Real case testing and ongoing work to improve the method when too many Ritz pairs are required will be the subject of a future paper.

## Acknowledgments

The authors would like to thank Chris Snyder (NCAR), Yann Michel (CNRM-GAME/Météo-France), and Marc Bocquet (CEREA/INRIA) for numerous fruitful discussions during the development of EVIL and Luke Peffers (US Air Force) for his early contribution. Michelle Menard (NCAR) is also thanked for her careful proofreading. This work has been partially funded by the Air Force Weather Agency.

### APPENDIX

#### Localization Impact on Posterior Covariances

##### a. Analytical framework

We consider a two-variable system (*x*_{1}, *x*_{2})^{T}, of which only *x*_{1} is observed. We assume that the ensemble size *N* is infinite, so that there is no sampling noise in the sample covariances. The prior covariances, localization, observation operator, and observation error covariances are respectively defined as
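As a representative instantiation (assuming unit prior variances, a prior correlation *b*, and a scalar observation error variance *r*; the paper's actual matrices may differ):

```latex
\widetilde{\mathbf{B}} = \begin{pmatrix} 1 & b \\ b & 1 \end{pmatrix}, \qquad
\mathbf{C} = \begin{pmatrix} 1 & L \\ L & 1 \end{pmatrix}, \qquad
\mathbf{H} = \begin{pmatrix} 1 & 0 \end{pmatrix}, \qquad
\mathbf{R} = \begin{pmatrix} r \end{pmatrix},
```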

where 0 ≤ *L* ≤ 1. The localization can be applied to the prior covariances to give the background error covariance matrix:

The Kalman gain is given by

and the posterior covariance matrix for the Kalman filter, given by (4), becomes

##### b. Posterior covariance matrices

The posterior covariance matrices can be computed for all algorithms:

##### c. Impact on variances

As expected, the variance at the observation point decreases by a similar amount for all algorithms during the assimilation, independently of the localization. The variance at a point distant from the observation also decreases, but the amplitude of the decrease depends on the algorithm and on the localization applied, as shown in Fig. A1a. The decrease is the same for all algorithms if no localization is applied. It vanishes if a localization of zero is applied, since no information is then transmitted from the observation to the distant grid point. Between these extreme values, we have

where the inequality compares the prior and posterior variances at a point distant from the observation. We can conclude that the EnKF/S-EVIL, the EnSRF, and D-EVIL are always less dispersive than the Kalman filter/R-EVIL. It should be noted that other sources of over- or underdispersion might affect the system in a cycled context (e.g., the representation of model and observation errors); the effect noted here is intrinsic to the algorithm.

##### d. Impact on covariances and correlations

Figure A1b shows that the covariance between the observation point and a distant point decreases during the assimilation by a similar amount for all algorithms if no localization is applied (*L* = 1). However, for lower values of the localization, we have

where the inequality compares the prior and posterior covariances between the observation point and a distant point. An interesting feature appears for the Kalman filter/R-EVIL: if a localization is applied to the background error covariance matrix, then it is also applied to the posterior covariance matrix. This fact can be checked by expanding (A4):

Conversely, the localization has no impact on the posterior covariances for the EnKF/S-EVIL. Surprisingly, an “antilocalization” effect is observed for the EnSRF and D-EVIL, whose posterior covariances are relaxed toward the prior covariances when a localization is applied.
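These contrasting behaviors can be checked numerically. The sketch below assumes a concrete instantiation with unit prior variances, prior correlation *b* = 0.8, localization weight *L*, and observation error variance *r* = 0.5 (illustrative values; the paper's matrices may differ), and implements only the Kalman filter/R-EVIL and stochastic EnKF/S-EVIL infinite-ensemble limits:

```python
import numpy as np

def posteriors(b, L, r):
    """Infinite-ensemble posterior covariances for the two-variable system."""
    B = np.array([[1.0, b], [b, 1.0]])              # raw prior covariance
    Bl = np.array([[1.0, L * b], [L * b, 1.0]])     # localized prior covariance
    H = np.array([[1.0, 0.0]])                      # only x1 is observed
    R = np.array([[r]])
    K = Bl @ H.T @ np.linalg.inv(H @ Bl @ H.T + R)  # gain built from the localized prior
    A_kf = Bl - K @ H @ Bl                          # Kalman filter / R-EVIL posterior
    IKH = np.eye(2) - K @ H
    A_enkf = IKH @ B @ IKH.T + K @ R @ K.T          # stochastic EnKF / S-EVIL posterior
    return A_kf, A_enkf

A_kf, A_enkf = posteriors(b=0.8, L=0.5, r=0.5)
```

One finds that the EnKF variance at the distant point is smaller than the Kalman filter one (less dispersive), that the EnKF posterior covariance is independent of *L*, and that the Kalman filter posterior covariance inherits the localization. The EnSRF and D-EVIL variants would require the reduced square-root gain and are omitted for brevity.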

## REFERENCES

*Proc. ECMWF Seminar on Recent Developments in Data Assimilation for Atmosphere and Ocean*, Reading, United Kingdom, ECMWF, 45–63.

Golub, G. H., and C. F. Van Loan: *Matrix Computations*. Johns Hopkins University Press, 728 pp.

Montgomery, P. L., 1995: A block Lanczos algorithm for finding dependencies over GF(2). *Advances in Cryptology—EUROCRYPT ’95*, L. C. Guillou and J.-J. Quisquater, Eds., Lecture Notes in Computer Science, Vol. 921, Springer, 106–120.