## 1. Introduction

The problem of estimating the background error statistics is an important issue in the ensemble filtering and hybrid data assimilation algorithms that employ ensembles for error analysis and propagation. Increasing the accuracy in estimating the background error statistics remains a scientific and technical challenge, because the (co)variance estimates have to be drawn from a relatively small number of samples contaminated by the noise of diverse origin.

A particular type of background error covariance (BEC) estimation technique employs an ensemble of assimilations (e.g., Fisher 2003; Berre et al. 2006) to assess the covariance structure from the ensemble average. Because of computational limitations, ensemble size rarely exceeds 100 members in practice, thus limiting the accuracy of the straightforward averaging approach because of the significant level of sampling noise. The impact of sampling noise on the accuracy of the BEC estimates has been addressed by Houtekamer and Mitchell (1998) and Hamill et al. (2001) and led to the development of the filtering techniques based on the Schur product of the sample correlations with the heuristic filters (localization operators). This approach tends to localize covariances in physical space and suppresses long-range correlations, whose accuracy is most affected by the sampling noise (e.g., Houtekamer and Mitchell 2001; Buehner 2005).

In the last decade, the localization techniques have been under rapid development in several directions with the major objective to relax the spatial homogeneity assumption underlying the original scheme. In particular, Fisher (2003), Deckmyn and Berre (2005), and Pannekoucke et al. (2007) utilized a wavelet approach to account for inhomogeneities in the covariance structure; Wu et al. (2002) and Purser et al. (2003) employed recursive filters to localize the covariances; Weaver and Courtier (2001), Pannekoucke and Massart (2008), and Weaver and Mirouze (2012) used a closely related diffusion operator approach; and Pannekoucke (2009) explored a hybrid scheme, featuring wavelet technique in combination with the diffusion method, while Anderson (2007) employed a sampling error approach to derive localization from multiple ensembles in the framework of the hierarchical ensemble filter technique. In the oil and gas exploration industry, anisotropic localization functions were derived by combining the regions of sensitivity of the well data with prior geological models (e.g., Emerick and Reynolds 2011; Chen and Oliver 2010).

Another direction in the localization techniques was pioneered by Bishop and Hodyss (2007) who proposed to augment the original ensemble by including Schur cross products of the spatially smoothed ensemble members. Further development of this approach (Bishop and Hodyss 2009a,b; Bishop et al. 2011; Bishop and Hodyss 2011) demonstrated its flexibility in adapting the covariances to the 4D background flow structures, especially in the case of strongly inhomogeneous statistics. A certain disadvantage of the adaptive localization (AL) technique is a relatively high computational cost, associated with the necessity to operate with the expanded ensemble. A good review of the filtering/localization techniques was recently given by Berre and Desroziers (2010).

In this study we employ the numerical experimentation approach of Weaver and Mirouze (2012) who tested various approximations of the ensemble-generated covariance matrix by the exponent of the diffusion operator in an idealized configuration. The presented work considers four localization techniques applied to three different covariance models in a realistically inhomogeneous 2D setting. Our major focus is on comparing nonadaptive and adaptive localization methods with the techniques based on modeling sample covariance by polynomial functions of the diffusion operator. To make the comparison, we construct inhomogeneous covariance matrices

## 2. Methods of covariance localization

### a. Traditional scheme

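Consider *K* normalized error perturbations about the ensemble mean, listed as columns **x**_{k} of the *K* × *N* matrix **X**. The sample estimate of the BEC matrix is

$$\tilde{\mathbf{B}} = \sum_{k=1}^{K} \mathbf{x}_k \mathbf{x}_k^{\mathsf T}. \qquad (1)$$

In practice *N* is much larger than *K*, and the sample estimate (1) always contains spurious correlations at large distances. To increase the accuracy in approximation of the BEC matrix **B**, the sample covariances are suppressed between the points **x** separated by distances larger than a certain prescribed value *d* (localization scale). Technically, such a “localized” covariance matrix **B**_{ℓ} is obtained as the elementwise (Schur) product ∘ of the raw sample covariance with a localization matrix **C**_{d}, whose off-diagonal elements are set to zero if the distance between correlated points exceeds *d*:

$$\mathbf{B}_{\ell} = \tilde{\mathbf{B}} \circ \mathbf{C}_d. \qquad (2)$$

Localization also removes the rank deficiency of the raw estimate (1), whose null space of dimension *N* − *K* + 1 is very large, and thus likely inconsistent with the rank of the true BEC matrix. A disadvantage of the technique is that it relies on a heuristic matrix **C**_{d}, which does not explicitly take into account the inhomogeneity and anisotropy of the background flow that affect the BEC evolution.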
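The traditional scheme can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: a 1D grid with 5-km spacing, a Gaussian toy "true" covariance with a 50-km scale, a Gaussian taper with *d* = 100 km cut off at 3*d*, and the helper names `gaussian_taper` and `localize`.

```python
import numpy as np

def gaussian_taper(coords, d, cutoff=3.0):
    """Gaussian weights exp(-r^2 / (2 d^2)), set to zero beyond cutoff * d."""
    r = np.abs(coords[:, None] - coords[None, :])    # pairwise distances on a 1D grid
    C = np.exp(-0.5 * (r / d) ** 2)
    C[r > cutoff * d] = 0.0
    return C

def localize(X, C_d):
    """Raw sample covariance from perturbation columns and its Schur-localized version."""
    B_raw = X @ X.T               # columns of X are already normalized perturbations
    return B_raw, B_raw * C_d     # '*' is the elementwise (Schur) product

rng = np.random.default_rng(0)
N, K = 200, 10
coords = np.linspace(0.0, 995.0, N)                    # hypothetical grid, 5-km spacing
B_true = gaussian_taper(coords, 50.0, cutoff=np.inf)   # toy "true" covariance

# A square root of B_true via eigendecomposition (tiny negative eigenvalues clipped).
w, V = np.linalg.eigh(B_true)
S = V * np.sqrt(np.clip(w, 0.0, None))

X = S @ rng.standard_normal((N, K)) / np.sqrt(K)       # K normalized perturbations
B_raw, B_loc = localize(X, gaussian_taper(coords, d=100.0))

err_raw = np.linalg.norm(B_raw - B_true)               # Frobenius-norm errors
err_loc = np.linalg.norm(B_loc - B_true)
print(f"raw error {err_raw:.1f}, localized error {err_loc:.1f}")
```

With only 10 members, most of the raw error in this toy setting comes from spurious long-range covariances, which the taper removes.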

### b. Adaptive methods

In the adaptive scheme, the modulated ensemble has the size *J* = *K*(*K* + 1)/2. Its members, listed as columns of a *J* × *N* matrix, are used to construct the localization matrix **C**_{d}.

Recent experiments with this improved AL scheme have shown its good localization properties and reasonable numerical performance (Bishop and Hodyss 2011). A certain disadvantage of the method is the numerical cost: apart from the necessity to smooth ensemble members, it requires multiplication by a *KJN* × *N* matrix whose columns are Schur products involving **w**_{n}, where **w**_{n} are the columns of the square root of **C**_{d}.
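The construction of a modulated ensemble from Schur cross products can be sketched as follows. This is an illustration under stated assumptions, not the scheme of Bishop and Hodyss: the boxcar smoother, the √2 weighting of the mixed pairs, and the names `smooth` and `modulated_ensemble` are all hypothetical choices. The weighting is chosen so that the covariance of the *K*(*K* + 1)/2 modulated members equals the Schur square of the smoothed sample covariance.

```python
import numpy as np

def smooth(X, width=5):
    """Boxcar smoothing of every ensemble member (column); an assumed smoother."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, X)

def modulated_ensemble(S):
    """Schur products s_i * s_j for all pairs i <= j of smoothed members.

    Mixed pairs (i < j) are weighted by sqrt(2) so that Z @ Z.T reproduces
    the sum over all ordered pairs.
    """
    K = S.shape[1]
    cols = []
    for i in range(K):
        for j in range(i, K):
            weight = 1.0 if i == j else np.sqrt(2.0)
            cols.append(weight * S[:, i] * S[:, j])
    return np.column_stack(cols)

rng = np.random.default_rng(1)
N, K = 60, 4
X = rng.standard_normal((N, K)) / np.sqrt(K)   # toy normalized perturbations
S = smooth(X)
Z = modulated_ensemble(S)
J = Z.shape[1]
print(J)                                       # K * (K + 1) / 2 columns
```

The identity `Z @ Z.T == (S @ S.T) ** 2` (elementwise square) shows why the modulated ensemble can act as a localization: squaring correlations damps the small, noise-dominated values more strongly than the large ones.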

### c. Modeling sample covariance

Another way of estimating the true covariance is to create its full-rank covariance model using the low-rank ensemble approximation (1). In recent years this approach, fueled by the developments in covariance modeling with the diffusion operator (e.g., Weaver and Courtier 2001; Xu 2005; Yaremchuk and Smith 2011; Yaremchuk and Sentchev 2012), has been studied by many authors (e.g., Belo Pereira and Berre 2006; Pannekoucke and Massart 2008; Pannekoucke 2009; Sato et al. 2009; Weaver and Mirouze 2012).

The idea of the approach is to parameterize the structure of the true BEC matrix by the diffusion tensor field *D*^{αβ}(**x**), which defines the positive-definite diffusion operator **∇**_{α}*D*^{αβ}**∇**_{β}.

To avoid confusion with notations, vectors and matrices in state space ℝ^{N} are denoted by the boldface roman and boldface sans serif fonts, respectively. In the 2D physical space ℝ^{2} we adopt tensor notation, where vectors and matrices are boldface and italicized, Greek indices enumerate coordinates and take the values 1 and 2, and summation is assumed over repeating indices.

The operator *F* of *F* could be computed recursively and at the same time it should invert the spectrum of *F*{

*n*th-order binomial (spline) approximations:

**v**), and **v** ∈ ℝ^{N} is the vector of rms error variances (square roots of the diagonal of **B**). The elements *υ*(**x**) of **v** are relatively well known from the ensemble statistics as they suffer less from sampling errors than ensemble estimates of the correlations. In its turn, the correlation matrix

*F*{

**f**of

*F*{

This study employs functions *F*_{e} and *F*_{n} for approximating the BEC matrix by selecting *D*^{αβ}(**x**) in a way that the matrix *F*{**D**}. In that case, the correlation matrix elements *C*(**x**, **y**) are locally homogeneous (LH); that is, they depend only on the relative position **r** = **x** − **y** of the correlated points **x**, **y**, and can be written down explicitly (e.g., Yaremchuk and Smith 2011):

and **D** *r* from the diagonal is shown in Fig. 1.

**r** = 0), yields the following relationships, useful for estimation of the diffusion tensor for the models (9)–(10), respectively:

*n* < 3. Expressions (12)–(13) were obtained in the 2D Cartesian coordinates by Weaver and Mirouze (2012). Similar relationships hold for an arbitrary correlation model satisfying the conditions of local homogeneity and appropriate differentiability of the correlation function at *r* = 0 (appendix A).

***D***^{−1} in an idealized 2D setting. The approach has a few drawbacks. First, the gradient computation tends to amplify sampling noise in the estimate of ***D***^{−1}. The inversion of ***D***^{−1} is also prone to error amplification. For these reasons, the technique is often supplemented by additional smoothing (Raynaud et al. 2009; Berre and Desroziers 2010; Weaver and Mirouze 2012). Second, the relationship (14) cannot be applied to the BEC models that are not differentiable at the diagonal, such as the second-order (*n* = 2) spline model (7) in 3D, which is characterized by the exponential correlation function.

An alternative approach is to estimate the diffusion tensor directly by minimizing the difference between the ensemble estimate of the correlations in the vicinity of the diagonal and its local analytic approximations (9)–(10). This approach is likely to be more robust, as it does not involve differentiation and matrix inversion and can be formulated as a least squares problem in the space of the unknown elements of ***D***.

In the following sections we compare the efficiency of the four localization methods: nonadaptive (section 2a), adaptive (section 2b), and the two methods of retrieving the diffusion tensor from the ensemble covariances described above. For brevity, we will refer to the latter two methods as “differential” and “integral” diffusion localization (DL) schemes.
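The idea behind the integral scheme, fitting an analytic local correlation model to sample correlations near the diagonal, can be sketched in a few lines. The sketch rests on assumptions that are not taken from the text: the local Gaussian model is written as C(**r**) = exp(−**r**ᵀ***D***⁻¹**r**/2), the correlations are noise-free, and the fit is a log-linear least squares problem rather than the iterative minimization used in the study. The helper name `fit_inverse_tensor` and the 5-km grid are hypothetical.

```python
import numpy as np

def fit_inverse_tensor(offsets, corr):
    """Least-squares fit of R = D^{-1} in the ASSUMED model C(r) = exp(-r^T R r / 2).

    Taking logs gives -2 ln C = R11 x^2 + 2 R12 x y + R22 y^2, which is linear in R.
    Requires strictly positive correlations.
    """
    x, y = offsets[:, 0], offsets[:, 1]
    A = np.column_stack([x**2, 2.0 * x * y, y**2])
    b = -2.0 * np.log(corr)
    (r11, r12, r22), *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([[r11, r12], [r12, r22]])

# 5 x 5 stencil of relative positions around a grid point (hypothetical 5-km grid).
g = np.arange(-2, 3) * 5.0
xx, yy = np.meshgrid(g, g)
offsets = np.column_stack([xx.ravel(), yy.ravel()])

# Synthetic "sample" correlations generated from a known anisotropic tensor (km^2).
D_true = np.array([[400.0, 150.0], [150.0, 225.0]])
R_true = np.linalg.inv(D_true)
corr = np.exp(-0.5 * np.einsum("ni,ij,nj->n", offsets, R_true, offsets))

D_fit = np.linalg.inv(fit_inverse_tensor(offsets, corr))
print(np.round(D_fit, 3))
```

With noisy ensemble correlations the logarithm is no longer safe (values may be zero or negative), which is one reason a direct minimization of the misfit is the more general formulation.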

To explore the efficiency, we adopt the following experimentation strategy: after specifying the “true” covariance matrices

## 3. Methodology

### a. Experimental setting

Numerical experiments with simulated ensembles were performed as follows. First, the true BEC matrix was specified together with the ensemble by selecting a variance distribution **v**(**x**) and a correlation model (6)–(7) in a real oceanic domain shown in Fig. 2. The variance distribution was chosen to simulate surface temperature variations in the northern Gulf of Mexico near the mouth of the Mississippi. The true distribution of

**B**_{e} and **B**_{2} were computed explicitly: first, all the columns of *F*(**D**) were computed by applying the operator to *δ* functions located in every grid point of the domain. The resulting matrices were then renormalized by their diagonal elements using (8), and the true BEC matrices were then obtained by

**B**_{e} and **B**_{2} are shown in Fig. 3. The maximum anisotropy is observed in the southeast corner of the domain characterized by the steepest topography. The total number of matrix elements was 4603^{2} ≈ 2 × 10^{7}.

**X**_{e} and **X**_{m} were generated by

*K* × *N* matrix, whose columns are the random vectors with *N* = 4603 *δ*-correlated components evenly distributed with unit variance and the square root is defined by **B** = **B**^{1/2}(**B**^{1/2})^{T}. The value of *K* was 20 000.

The ensembles **X**_{e} and **X**_{m} were then used to estimate the true covariances **B**_{e} and **B**_{2} with the four localization techniques described in the previous section. The only exception is the differential method, which was not used with the spline model (7) because the corresponding correlation function (10) is not differentiable at the origin. In all the experiments the localization matrix **C**_{d} was Gaussian (9) with the isotropic diffusion tensor, where *d* is a tuning parameter defined in the next section.

*F*_{e}{**D**}**x** was approximated by the recursive scheme:

*λ* of

*λ*/*n*. Similarly, *F*_{2}{**D**}**x** was computed by iteratively solving the system of equations,

Computing the action of the operators *F*{^{1/2}, which was obtained by halving the number of time steps *n* in (17) and removing the square in the lhs of (18).
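The recursive approximation of an exponential (Gaussian) filter, and the trick of obtaining its square root by halving the number of steps, can be illustrated on a 1D periodic grid. The setup below is an assumption made for illustration: unit grid spacing, a constant scalar diffusivity, the operator written as exp(*a*²Δ/2) ≈ (I + *a*²Δ/2*n*)^{*n*}, and a stability check on the step size. The function name `diffuse` is hypothetical.

```python
import numpy as np

def diffuse(u, eps, steps):
    """Apply (I + eps * Lap)^steps on a periodic 1D grid with unit spacing."""
    if not 0.0 < eps <= 0.25:
        raise ValueError("eps must lie in (0, 0.25] to keep the explicit steps stable")
    for _ in range(steps):
        u = u + eps * (np.roll(u, 1) - 2.0 * u + np.roll(u, -1))
    return u

a, n = 5.0, 100                 # decorrelation scale (grid units), number of steps
eps = a**2 / (2.0 * n)          # (I + eps * Lap)^n approximates exp(a^2 Lap / 2)

delta = np.zeros(201)
delta[100] = 1.0                # delta function at the grid center

full = diffuse(delta, eps, n)               # one column of the filter matrix
half = diffuse(delta, eps, n // 2)          # column of the "square root" filter
full_from_half = diffuse(half, eps, n // 2) # square root applied twice

ratio = full[105] / full[100]               # response at distance a relative to the peak
print(ratio, np.exp(-0.5))
```

The response at distance *a* is close to the Gaussian value exp(−1/2), and applying the half-length recursion twice reproduces the full filter, which is what makes the halved recursion usable as a square root.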

With the simulated ensembles in (16) at hand, the sample covariance matrices were computed from the members **x**_{k} randomly picked from these ensembles. Using the same samples, rms error variance fields were also estimated.

Given these ensemble statistics, the localized estimates of the true covariance matrix were computed with four localization techniques described in the previous section [(2), (5), and (9)–(14) for the DL estimates].

**x**) from sample correlations

*ω* is a small vicinity of **x**. A similar approach was tested in a less general formulation by Pannekoucke and Massart (2008) for the 2D Gaussian correlations. To minimize (19) we used the M1QN3 algorithm of Gilbert and Lemarechal (1989) that reduced the *L*_{2} norm of the cost function gradient by three orders of magnitude in 3–6 iterations.

To distinguish between the two DL schemes, the corresponding estimates will be labeled by the superscripts ′ and ° for the differential [(12)–(14)] and integral [(9)–(11), (19)] approaches, respectively.

*F*{

**d** = (det ***D***)^{−1/2} and *γ*_{e} = 0.33; *γ*_{2} = 0.28 for the *F*_{e} and *F*_{2} models, respectively (Yaremchuk and Carrier 2012).

### b. Numerical implementation

In addition to comparing the skills of the localization methods, their computational efficiencies are also compared. In practical applications, **B**_{ℓ} and *KN* × *N* and *KJN* × *N* matrices, whose columns are **x**_{k} ∘ **w**_{n} and

The elements of **C**_{d} were computed explicitly with the analytic equation (9). At distances exceeding several localization scales the elements were set to zero to avoid needless multiplications by the tails of the Gaussian exponent. In the numerical experiments this “cutoff” distance was set to 3*d*. The nonzero elements of the columns **w**_{n} of

To explore the impact of the ensemble size on accuracy of the localization schemes, experiments were performed with five ensemble sizes: *k* = 4, 10, 50, 200, and 1000. The respective modulated ensembles (section 2b) were computed in a different manner for various *k*. For *k* = 4 and 10 both double and triple Schur products of the raw ensemble members were used, thus creating *J*_{4} = (4 × 5)/2 + (4 × 4 × 5)/2 = 50 and *J*_{10} = (10 × 11)/2 + (10 × 10 × 11)/2 = 605 members. For *k* = 50 and 200 only the double products were used. The respective ensemble sizes were 1275 and 20 100. With *k* = 1000 only 20 000 randomly selected pairs were used to create {**x**_{j}}. The scale *d*_{s} of the smoothing operator was different from *d*. Both *d* and *d*_{s} were optimized in every experiment to minimize the distance (21) from the true covariance.

The DL algorithms had additional specific features. Estimates of ***D***′ were smoothed with the scale *l* = 30 km, then symmetrized and checked for positive definiteness. In the case of a negative eigenvalue (a common situation for *k* = 4, 10), the tensor was discarded. The resulting gaps were filled with horizontal interpolation and smoothed again with the same scale.
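The modulated ensemble sizes quoted in this section follow from simple counting, which the short check below reproduces. The function name `modulated_size` is hypothetical, and the triple-product term is coded exactly as it is written in the text, *k* · *k*(*k* + 1)/2.

```python
def modulated_size(k, use_triples=False):
    """Number of modulated members built from k raw members.

    Double products: k (k + 1) / 2 pairs.
    Triple products (as counted in the text): k * k (k + 1) / 2.
    """
    pairs = k * (k + 1) // 2
    triples = k * pairs if use_triples else 0
    return pairs + triples

for k, triples in [(4, True), (10, True), (50, False), (200, False)]:
    print(k, modulated_size(k, use_triples=triples))
```

For *k* = 1000 the full set of pairs would contain 500 500 members, which is consistent with the decision to subsample 20 000 random pairs.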

When computing ***D***°, the subdomain *ω* was a square four grid steps in size. Tensor parameters were smoothed with the same scale as used in the computations of ***D***′.

## 4. Results

### a. Skill comparison

Figure 4 compares skills [(21)] of the four localization techniques for the Gaussian covariance model as a function of the number of ensemble members *k*. The straight dashed lines provide errors for the raw variance and covariance estimates without localization. As expected, both *ρ*(*ρ*([*ρ*([

For *k* = 4, the difference between *ρ*(**B**_{ℓ}) and *k* < 500, the adaptive scheme delivers a 2–3 times better estimate than the nonadaptive localization (NAL) technique, but this advantage disappears at *k* > 500 because of the increase of raw ensemble skill. This type of behavior has also been observed in the experiments where we kept both the localization scale *d* and the smoothing scale *d*_{s} constant and equal to 100 km (i.e., did not optimize their values for a given *k*). In that case the error curves converged at slightly larger *k* ~ 1200–1500.

The DL schemes demonstrate a significantly better performance at *k* < 20, although *n* = 10. Flattening of the curves for *k* can be explained by two factors. The first one is a certain inconsistency of the true covariance structure with the LH assumption used in the derivation of (9)–(14): Fig. 2 shows that the typical scale of variability of the diffusion tensor’s axes is comparable with their magnitude throughout the domain, and in some places (e.g., steep bottom regions in the southwest) it is even smaller than the local decorrelation scales. The second factor is associated with the violation of the LH assumption in computing the normalization factors with (20). Although (20) is capable of approximating the diagonal elements at the error level of 5%–10%, its contribution to the asymptotic error of 0.4 (Fig. 4) is not negligible. Similar observations are reported in the idealized experiments of Weaver and Mirouze (2012).

Figure 5 shows the absolute difference between the eight columns of *k* > 30 (i.e., when the variance estimation error falls below 10%; lower dashed line in Fig. 4). The impact of the diagonal approximation error is less visible when comparing covariance matrices in terms of (22), which is more sensitive to the errors in the off-diagonal elements (Fig. 6).

The degree of inhomogeneity of the true covariance can, in principle, be assessed from asymmetry of the local correlations derived from the ensemble when *k* is large enough to suppress sampling noise. When the LH assumption is satisfied with high accuracy, the correlation matrix elements satisfy (9)–(10), and therefore should be nearly invariant under the mirror transformations **r** → −**r** in the vicinity of the diagonal. We checked this property for the true correlation matrices and found relatively high degrees of asymmetry (0.24 and 0.28 for **C**_{e} and **C**_{2}, respectively). In combination with 5%–8% diagonal errors, these figures may explain the asymptotic error level in approximating the true covariances by the DL schemes (Fig. 4).

Another feature observed in the experiments is a persistently better performance of the DL methods at small ensemble sizes *k* (Figs. 4 and 6). One may assume that this property could be attributed to the fact that the DL schemes have an a priori advantage because the structure of the true covariances is already embedded into the underlying diffusion models used for approximation. To check this, we generated an alternative true covariance matrix **B**_{n}, which was far enough from both **B**_{e} and **B**_{2} to eliminate this advantage (Fig. 7).

To do this, we randomly picked 1000 members from each of the ensembles **X**_{e} and **X**_{2}, and then generated an additional 20 000 members using the adaptive technique described in section 2b. Pairs for Schur cross products were composed by randomly picking members from the two ensembles and never from one. The resulting 22 000-member ensemble was used to compute **B**_{n} with (1). After that the columns of **B**_{n} were additionally smoothed and renormalized to have the same variance as **B**_{e} and **B**_{2}.

Figure 8 demonstrates that in the case of the **B**_{n} model the approximation errors of the DL schemes are still below the errors of the AL scheme when *k* < 30–40. Furthermore, the DL schemes remain competitive in the entire range of the practical ensemble sizes (up to *k* = 150–200). We therefore may assume that better performance at small ensemble sizes is an intrinsic property of the DL method, which could possibly be explained by its enhanced ability to capture the near-diagonal structure of the correlations. However, only experiments with real assimilation systems can confirm this hypothesis.

One can notice a relatively weak performance of the AL scheme (thick gray line in Fig. 8) as compared to the case of true covariance described by the **B**_{e} model (Fig. 4). Such behavior can be explained by the fact that the smoothing scale *d*_{s} was the same as was used for generation of the modulated ensembles in the experiments with **B**_{e}. In general, adjustment of the localization scales significantly improved the approximation accuracy of **B**_{ℓ} and

*k* for the standard localization scheme whose optimal values of *d*(*k*) changed in a wide range from *d*(4) = 30 to *d*(1000) = 500 km. For the adaptive scheme variations of *d* were significantly smaller: *d*(4) = 100 to *d*(1000) = 500 km.

These figures shed some light on the role near-diagonal elements play in the overall structure of the considered covariance matrices. It appears that accurate estimation of these elements eliminates a larger portion of the error in approximation of the true covariance. To support this idea, we computed distances between the three considered covariances **B**_{e}, **B**_{2}, and **B**_{n} and their approximations obtained by setting to zero all the off-diagonal elements located farther than a certain distance *r* (measured in physical space) from the diagonal. As expected, the major portion of the error is eliminated when elements within the mean decorrelation scale are accounted for. This feature of the considered covariances partly explains the better skill of the DL schemes that are “more focused” on accurate representation of the near-diagonal structure of the covariance matrices. In addition, DL models are capable of delivering better smoothness away from the diagonal, which is essential for eliminating the imbalance problems that may arise when prediction models are used with the resulting analysis (e.g., Kepert 2011).
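The truncation experiment described above can be sketched on a toy covariance. The choices below are assumptions made only for illustration: a 1D grid with 5-km spacing, a Gaussian covariance with a 30-km scale, and a relative Frobenius-norm error in place of the distance measures used in the study. The helper name `truncation_error` is hypothetical.

```python
import numpy as np

def truncation_error(B, dist, r):
    """Relative Frobenius error after zeroing elements of B farther than r from the diagonal."""
    B_trunc = np.where(dist <= r, B, 0.0)
    return np.linalg.norm(B - B_trunc) / np.linalg.norm(B)

coords = np.arange(100) * 5.0                         # hypothetical grid, km
dist = np.abs(coords[:, None] - coords[None, :])      # physical distance between points
B = np.exp(-0.5 * (dist / 30.0) ** 2)                 # toy covariance, 30-km scale

radii = (0.0, 15.0, 30.0, 60.0, 120.0)
errors = [truncation_error(B, dist, r) for r in radii]
for r, e in zip(radii, errors):
    print(f"r = {r:5.1f} km  relative error = {e:.4f}")
```

In this toy case the error drops steeply once *r* exceeds the decorrelation scale, mirroring the qualitative statement in the text.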

### b. Computational efficiency

In the previous section we have shown that DL schemes appear to be competitive in accuracy with both NAL and AL techniques when the number of ensemble members *k* is relatively small. When *k* > 70–100, the AL scheme provides better accuracy (Figs. 4–8), but the DL method may still remain competitive up to *k* ~ 100. On the other hand, the DL method is much less computationally expensive, because it does not require generation of the costly modulated ensemble.

The cost of localization is defined by the multiplication of the square root of the localized covariance matrix by a state vector. In the case of the NAL scheme, this product involves *M* ~ *kNn*_{d} multiplications, where *n*_{d} is the number of nonzero elements in the column of **C**_{d}. For the AL scheme, the cost is *J* times larger and may require significant computational resources.

The cost of implementing the DL schemes consists of two components: estimation of the diffusion tensor and multiplication by the square root of the localization matrix. The number of multiplications required to compute ***D***′ at a grid point is approximately proportional to 9*k*, because local correlations have to be computed only in the nearest neighborhood of the diagonal and each computation involves *k* products of the ensemble members. Differentiation, inversion [(12)–(13)], and smoothing add approximately 50 operations per grid point, thus giving the estimate of *M*′ ~ (9*k* + 50)*N* for the overall cost of computing ***D***′. The cost of multiplication by the square root of the localization matrix is *Nn*_{*}*m*, where *n*_{*} = 9 is the number of elements in the (2D) stencil of the diffusion operator and *m* ~ 10^{2} is the number of either “time steps” in the case of the **B**_{e} model or the number of iterations in solving the respective linear system in the case of the **B**_{2} localization model. This brings the estimate of the total number of operations to *M*′ ~ 9(*k* + *m* + 5)*N*.

Computing ***D***° is more expensive than computing ***D***′ because it involves solving a minimization problem at every grid point. In the 2D case considered, estimation of

*k* + 20*n*_{o}) operations per grid point, where *n*_{o} = 5 is the average number of iterations required for convergence of the minimization routine and 25 is the number of grid points occupied by the optimization subdomain *ω* [(19)].

*n*_{d} = 49 for the number of grid points in the localization stencil, the following estimates can be obtained:

*k* ≫ 1 and taking the NAL cost *M* = 50*kN* as a benchmark, the following estimates of the (normalized by *M*) localization costs

*m* ranged between 120 and 180 for the Gaussian model and between 150 and 300 for the spline model. Thus, for the ensemble size of *k* = 50 both DL models appear to be computationally competitive with the NAL technique

*k*, although their accuracy tends to stagnate (Figs. 4, 6, and 8).
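The operation counts quoted in this section can be compared directly. The snippet below evaluates only the two estimates that are stated explicitly in the text, the NAL benchmark *M* = 50*kN* and the differential DL estimate *M*′ ~ 9(*k* + *m* + 5)*N*; the function names are hypothetical, and the counts are order-of-magnitude estimates rather than measured timings.

```python
def nal_cost(k, N, n_d=50):
    """NAL benchmark M = n_d * k * N, with n_d rounded from 49 to 50 as in the text."""
    return n_d * k * N

def dl_differential_cost(k, N, m):
    """Differential DL estimate M' ~ 9 (k + m + 5) N."""
    return 9 * (k + m + 5) * N

def relative_cost(k, m, N=4603):
    """Cost of the differential DL scheme normalized by the NAL cost (N cancels out)."""
    return dl_differential_cost(k, N, m) / nal_cost(k, N)

for m in (120, 180):          # range of m reported for the Gaussian model
    print(m, round(relative_cost(50, m), 3))
```

For *k* = 50 the ratio stays below one over the reported range of *m* for the Gaussian model, which is the sense in which the scheme is computationally competitive with NAL at that ensemble size.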

## 5. Conclusions

Numerical experiments with the DL schemes in a realistically inhomogeneous 2D setting have shown their competitiveness with the NAL and AL methods in terms of accuracy within the range of ensemble sizes *k* ~ 20–100 used in the data assimilation practice. For larger ensemble sizes the DL method does not give any error improvement as it reaches the limits imposed by the assumption of local homogeneity.

From the computational point of view, the DL method appears to be comparable in cost with the NAL technique, which is in turn less expensive than the adaptive algorithms proposed by Bishop and Hodyss (2007, 2009a,b). The experiments also indicate that the AL method is significantly more accurate than NAL in the case of strongly inhomogeneous covariances when the ensemble size is less than several hundred.

Comparison of the differential and integral DL schemes has shown that the differential method is 20%–50% less computationally expensive, although it appears to be somewhat less robust and accurate when applied in a realistically inhomogeneous environment. An advantage of the integral approach is that it can be utilized with correlation models that are not differentiable at the origin.

It should be also noted that the computational efficiency of the DL schemes strongly depends on the number of iterations *m* needed to compute the action of the localization operator on a state vector. This number is controlled by the ratio of the local decorrelation scale (length of the largest principal axis of

The DL algorithms have enough room for further development along several directions. In particular, the degree of local inhomogeneity of the target covariance could possibly be assessed by monitoring dependence of spatial asymmetry of the local correlations on the number of ensemble members used for their evaluation. This information could then be blended in the cost function (19) to prevent overfitting sample correlations by the analytic model. Efficient higher-order approximations to the diagonal elements of *F*{**f** ∘ **v**. This approach can simultaneously reduce sampling errors in the variance field **v** estimates and errors associated with the LH assumption in computing the diagonal elements of *F*{

One should also keep in mind that ensembles encountered in large geophysical DA problems are likely to have a more complicated structure than the simulated ensembles described by (16). In particular, real-life ensembles are often biased and they do not normally demonstrate *k*^{−1/2} error scaling for realistic ensemble sizes. On the other hand, the “true” covariance matrices are never known and can hardly be computed for real applications in the near future. As a consequence, the only way to compare localization techniques is to estimate their forecast skill and computational efficiency within real DA problems. The presented results give only an indication that further studies of the DL methods are worth pursuing, as they seem to be competitive with other localization techniques. A definite answer could be given only by the aforementioned experiments with real ensembles, which will be the subject of our future research.

## Acknowledgments

This study was supported by the ONR Program Element 0602435N as part of the projects “Observation Impact” and ECOVARDA. Dmitri Nechaev was supported by the North Pacific Research Board Award 828 and by the NSF Award 362492-190200-05.

## APPENDIX A

### Differentiation of the Correlation Functions

*C*(*r*) by the subscript *r* and the inverse of the diffusion tensor by *R*_{αβ}. The second derivative of a correlation function *C*(*r*) is

*r* → 0.

*r* → 0 yields (12). Similar operation with the rhs of (10) shows that the second term in the rhs of (A3) is zero if *n* > 2, whereas the first term is equal to *n*/(2 − *n*). Note that the constraint *n* > 2 is imposed by the condition of differentiability of the correlation function (10) at *r* = 0.

*r* = 0 and satisfying the local homogeneity condition. Therefore, the differential method that is based on the relationship

## APPENDIX B

### Estimating Second Derivatives of the Correlation Function from Ensemble Perturbations

**x**, **y**) ≡ *C*_{xy} can be represented in two ways:

**x** = **y**) under the assumption of local homogeneity *C*_{xy} = *C*_{x−y} implies that

**y** = **x** in all the expressions involving

**r** = 0 also implies that its gradients at **r** = 0 are zero and, therefore, two middle terms in the rhs of (B2) vanish. After taking into account the right equality in (B1) and the definition *C*_{xx} = 1, (B2) transforms into

## APPENDIX C

### Diffusion Tensor Model

***ν*** is the square root of the local diffusion tensor (***D*** = ***νν***^{T}), *a*_{0} = 15 km is the background decorrelation scale, *aa*_{0} is the square root of the larger eigenvalue of ***D***, and *γ* is the direction of the eigenvector, corresponding to this eigenvalue. The larger principal axis of ***D***

*h*(*x*, *y*) contours and its magnitude is proportional to the bottom slope

*a* and *γ* are defined by

*θ* stands for the step function. With this definition, the diffusion is isotropic (***ν*** = *a*_{0}

*s*_{c}, which is chosen to be *s*_{c} = 0.0003. In this case, only 20% of points in the domain were characterized by isotropic diffusion.

## REFERENCES

Anderson, J. L., 2007: Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. *Physica D*, **230**, 99–111.

Belo Pereira, M., and L. Berre, 2006: The use of ensemble approach to study the background error covariances in a global NWP model. *Mon. Wea. Rev.*, **134**, 2466–2489.

Berre, L., and G. Desroziers, 2010: Filtering of background error variances and correlations by local spatial averaging: A review. *Mon. Wea. Rev.*, **138**, 3693–3720.

Berre, L., E. Stefanesku, and M. Belo Pereira, 2006: The representation of the analysis effect in three error simulation techniques. *Tellus*, **58A**, 196–209.

Bishop, C. H., and D. Hodyss, 2007: Flow adaptive moderation of spurious ensemble correlations and its use in ensemble based data assimilation. *Quart. J. Roy. Meteor. Soc.*, **133**, 2029–2044.

Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. *Tellus*, **61**, 84–96.

Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. *Tellus*, **61**, 97–111.

Bishop, C. H., and D. Hodyss, 2011: Adaptive ensemble covariance localization in ensemble 4D-VAR state estimation. *Mon. Wea. Rev.*, **139**, 1241–1255.

Bishop, C. H., D. Hodyss, P. Steinle, H. Sims, A. M. Clayton, A. C. Lorenc, D. M. Barker, and M. Buehner, 2011: Efficient ensemble covariance localization in variational data assimilation. *Mon. Wea. Rev.*, **139**, 573–580.

Buehner, M., 2005: Ensemble-derived stationary and flow-dependent background error covariances: Evaluation in a quasi-operational NWP setting. *Quart. J. Roy. Meteor. Soc.*, **131**, 1013–1043.

Chen, Y., and D. S. Oliver, 2010: Cross-covariances and localization for EnKF in multiphase flow data assimilation. *Comput. Geosci.*, **14**, 579–601.

Deckmyn, A., and L. Berre, 2005: A wavelet approach to representing background error covariances in a limited-area model. *Mon. Wea. Rev.*, **133**, 1279–1294.

Emerick, A., and A. Reynolds, 2011: Combining sensitivities and prior information for covariance localization in the ensemble Kalman filter for petroleum reservoir applications. *Comput. Geosci.*, **15**, 251–269.

Fisher, M., 2003: Background error covariance modelling. *Proc. ECMWF Seminar on Recent Developments in Data Assimilation*, Reading, United Kingdom, ECMWF, 45–63. [Available online at http://www.ecmwf.int/publications/library/ecpublications/_pdf/seminar/2003/sem2003_fisher.pdf.]

Gilbert, J. Ch., and C. Lemarechal, 1989: Some numerical experiments with variable-storage quasi-Newton algorithms. *Math. Program.*, **45**, 407–435.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790.

Herdin, M., N. Czink, H. Özcelik, and E. Bonek, 2005: Correlation matrix distance, a meaningful measure for evaluation of nonstationary MIMO channels. *Proc. 61st IEEE Vehicular Technology Conf.*, Stockholm, Sweden, Institute of Electrical and Electronics Engineers, 136–140.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 796–811.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137.

Kepert, J. D., 2011: Balance-aware covariance localisation for atmospheric and oceanic ensemble Kalman filters. *Comput. Geosci.*, **15**, 239–250.

Paige, C., B. Parlett, and H. van der Vorst, 1995: Approximate solutions and eigenvalue bounds from Krylov subspaces. *Numer. Linear Algebra Appl.*, **2**, 115–134.

Pannekoucke, O., 2009: Heterogeneous correlation modeling based on the wavelet diagonal assumption and on the diffusion operator. *Mon. Wea. Rev.*, **137**, 2995–3012.

Pannekoucke, O., and S. Massart, 2008: Estimation of the local diffusion tensor and normalization for heterogeneous correlation modelling using a diffusion equation. *Quart. J. Roy. Meteor. Soc.*, **134**, 1425–1438.

Pannekoucke, O., L. Berre, and G. Desroziers, 2007: Filtering properties of wavelets for local background error correlations. *Quart. J. Roy. Meteor. Soc.*, **133**, 363–379.

Purser, R. J., W. Wu, D. F. Parrish, and N. M. Roberts, 2003: Numerical aspects of the application of recursive filters to variational statistical analysis. Part II: Spatially inhomogeneous and anisotropic general covariances. *Mon. Wea. Rev.*, **131**, 1536–1548.

Raynaud, L., L. Berre, and G. Desroziers, 2009: Objective filtering of the ensemble-based error variances. *Quart. J. Roy. Meteor. Soc.*, **135**, 1003–1014.

Sato, Y., M. S. F. V. De Pondeca, R. J. Purser, and D. F. Parrish, 2009: Ensemble-based background error covariance implementations using spatial recursive filters in NCEP’s grid-point statistical interpolation system. NCEP Office Note 459, Camp Springs, MD, 20 pp. [Available online at http://www.emc.ncep.noaa.gov/officenotes/newernotes/on459.pdf.]

Weaver, A. T., and P. Courtier, 2001: Correlation modeling on a sphere using a generalized diffusion equation. *Quart. J. Roy. Meteor. Soc.*, **127**, 1815–1846.

Weaver, A. T., and I. Mirouze, 2012: On the diffusion equation and its application to isotropic and anisotropic correlation modeling in variational assimilation. *Quart. J. Roy. Meteor. Soc.*, doi:10.1002/qj.1955, in press.

Wu, W.-S., R. J. Purser, and D. F. Parrish, 2002: Three-dimensional variational analysis with spatially inhomogeneous covariances. *Mon. Wea. Rev.*, **130**, 2905–2916.

Xu, Q., 2005: Representations of inverse covariances by differential operators. *Adv. Atmos. Sci.*, **22**, 181–198.

Yaremchuk, M., and S. Smith, 2011: On the correlation functions associated with the polynomials of the diffusion operator. *Quart. J. Roy. Meteor. Soc.*, **137**, 1927–1932.

Yaremchuk, M., and M. Carrier, 2012: On the renormalization of the covariance operators. *Mon. Wea. Rev.*, **140**, 637–649.

Yaremchuk, M., and A. Sentchev, 2012: Multi-scale correlation functions associated with polynomials of the diffusion operator. *Quart. J. Roy. Meteor. Soc.*, **138**, 1948–1953, doi:10.1002/qj.1896.