## 1. Introduction

Atmospheric data assimilation is the process of estimating the spatiotemporally evolving state of the atmosphere based on observations. The resulting state estimate at a given time is called the analysis. Modern data assimilation algorithms obtain the analysis by a statistical interpolation process: the analysis is computed by updating an a priori estimate of the state, called the background, based on the observed information, assuming that the background and observation errors are random variables with known statistical parameters (Daley 1991; Kalnay 2003). In particular, the data assimilation schemes that are able to handle the large number of state variables and observations in a realistic operational or research application assume that the probability distribution of the background and observation errors is a multivariate normal distribution with a known covariance matrix. The focus of this paper is on the estimation of the covariance matrix that describes the distribution of the background error. This matrix is called the background error covariance matrix and we denote it by $\mathbf{P}^b$.

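Sequential data assimilation schemes use a short-term forecast started from the analysis of the previous analysis time as background. Thus, the background error covariance matrix $\mathbf{P}^b$ is an *M* × *M* matrix, where *M* is the number of gridpoint variables in the model. The entries of $\mathbf{P}^b$ are the covariances between the errors in the components of the background state vector $\mathbf{x}^b$. Each diagonal element $P^b_{uu}$ is the variance of the error in the component $x^b_u$, and a physical distance, |*u* − *υ*|, can be defined for each covariance $P^b_{u\upsilon}$. (For the diagonal elements, |*u* − *υ*| = 0.)

The ensemble-based Kalman filter (EnKF) schemes estimate $\mathbf{P}^b$ by first generating a *K*-member sample of the background error; then filtering the sampling noise by a statistical postprocessing of the sample covariance matrix. The sample is generated by integrating the model from a *K*-member ensemble of analyses for the previous analysis time to obtain a background ensemble.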

Because of the large computational expense associated with an ensemble of model integrations, the computationally affordable sample size is limited; for example, the typical value of *K* in a practical application is between 20 and 100 (e.g., Houtekamer and Mitchell 2005; Hamill 2006). The small ensemble size *K* poses two important challenges for the statistical postprocessing algorithm that provides the final estimate of the background error covariance matrix. First, the sample covariances are dominated by sampling noise when the distance |*u* − *υ*| is large. Second, the sample covariance matrix is severely rank deficient (*K* ≪ *M*), because the typical dimension of the state vector in a state-of-the-art atmospheric model is *M* = 10^{5}–10^{8}.

Practical implementations of the EnKF address the aforementioned two problems by applying a physical-distance-dependent stationary (time independent) filter function to the sample covariances (e.g., Houtekamer and Mitchell 1998, 2001; Hamill et al. 2001; Anderson 2001; Ott et al. 2004). This approach is called *localization*, because it forces the covariance to zero beyond a prescribed distance *d*. Some localization functions do not change the sample-based estimate of the covariance when the distance |*u* − *υ*| associated with the pair of state vector components is smaller than *d*, but replace it with zero when |*u* − *υ*| ≥ *d* (e.g., Houtekamer and Mitchell 1998; Ott et al. 2004; Szunyogh et al. 2005, 2008; Hunt et al. 2007); other localization functions taper the covariance to zero gradually with increasing distance |*u* − *υ*| (e.g., Houtekamer and Mitchell 2001; Hamill et al. 2001). Such tapering functions modify all entries of the sample covariance matrix, except for the diagonal elements. Experience with the different EnKF algorithms and localization strategies suggests that tapering greatly increases the accuracy of the analyses, especially when the size of the ensemble is small (e.g., Houtekamer and Mitchell 2005; Hamill 2006) or in the presence of model error (e.g., Zhang et al. 2009b).

While there exist dynamical arguments to support localization (Yoon et al. 2010) and it may formally solve the problems it is designed to address,^{1} the particular shape of the tapering function is typically selected based on intuition. Moreover, if the true covariance structure of the system is complex and does not decrease monotonically with distance, the localized sample covariance may not be a good fit to the true covariance structure. The ad hoc nature of most localization algorithms used in the literature motivates us to start exploring the effects of covariance filtering on the performance of the EnKF in a more systematic way. In particular, we compare the results obtained with localization to the results obtained with an adaptive *nonparametric* method to estimate the entries of the background error covariance matrix.

The rest of the paper is organized as follows. Section 2 is a formal description of covariance filtering in EnKF. Section 3 explains the design of the data assimilation runs that provide the data for our quantitative investigations. This section includes a brief description of the particular implementation of the serial square root EnKF scheme of Whitaker and Hamill (2002) on the Lorenz-96 model (Lorenz 1996; Lorenz and Emanuel 1998), which we use to carry out all data assimilation experiments described in this paper. In section 4, we explain the nonparametric statistical method we use to estimate the covariance, while in section 5 we compare the performance of a standard localization method to that of the nonparametric scheme in estimating the background covariance. Then, in section 6, we compare the performance of the different filtering strategies by numerical experiments. Our conclusions are drawn in section 7.

## 2. Covariance filtering in EnKF

In this section, we illustrate the process of estimating the background error covariance matrix with the EnKF algorithm used in our experiments.

### a. The EnKF algorithm of Whitaker and Hamill (2002)

In these equations, the *M*-vector $\mathbf{x}^a$ is the analysis and $\mathbf{x}^b$ is the background; in the EnKF, $\mathbf{x}^b$ is replaced with the ensemble mean of the background ensemble.^{2} All components of the state vector are observed,^{3} and the errors in the observations of the different components of the state vector are uncorrelated. We index each observation with the index of the variable it observes; that is, the observation of $x_u$ is indexed by *u*. In Eq. (9), $r_{uu}$ is the observation error variance for the observation of $x_u$.

### b. Covariance filtering

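In the computational algorithm defined by Eqs. (8)–(11), the background error covariance enters in Eq. (9) through the covariances between the observed state variable $x_u$ and the other state variables $x_\upsilon$. Covariance filtering replaces the sample-based estimates of these covariances with filtered estimates, for example, by applying a filter function that depends on the distance |*u* − *υ*|.

A potential problem with covariance filtering is that it produces an analysis ensemble, {**X**^{a}^{(k)}: *k* = 1, … , *K*}, that is not fully consistent with the background error covariance used in the computation of the Kalman gain: while the Kalman gain is computed based on the filtered estimate of the covariance, the ensemble perturbations that are updated to obtain the analysis ensemble still reflect the unfiltered sample covariance.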
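To make the role of the filtered covariance concrete, the following is a minimal NumPy sketch of a serial square root update in the form published by Whitaker and Hamill (2002), written for the case in which every state variable is observed directly. It is an illustration under stated assumptions rather than a transcription of Eqs. (8)–(11): the function name `serial_ensrf`, the argument `taper` (an *M* × *M* matrix of filter weights with ones on the diagonal), and the array layout are choices made for this example.

```python
import numpy as np

def serial_ensrf(ens, y, r_var, taper=None):
    """Assimilate one observation per state variable, one at a time.

    ens   : (K, M) background ensemble (K members, M state variables)
    y     : (M,) observations; y[u] observes state variable u
    r_var : observation error variance (errors assumed uncorrelated)
    taper : optional (M, M) filter weights applied to the sample covariance
    """
    ens = np.array(ens, dtype=float)
    n_members, n_vars = ens.shape
    for u in range(n_vars):
        mean = ens.mean(axis=0)
        pert = ens - mean
        # u-th column of the sample covariance matrix (divisor K - 1)
        col = pert.T @ pert[:, u] / (n_members - 1)
        if taper is not None:
            col = taper[:, u] * col  # the filtered estimate enters only the gain
        gain = col / (col[u] + r_var)
        # reduced gain factor used for the perturbation update
        alpha = 1.0 / (1.0 + np.sqrt(r_var / (col[u] + r_var)))
        mean = mean + gain * (y[u] - mean[u])
        # the perturbations themselves are the unfiltered ensemble perturbations
        pert = pert - alpha * np.outer(pert[:, u], gain)
        ens = mean + pert
    return ens
```

In this sketch the filter weights modify the gain, while the perturbations that are updated remain those of the unfiltered ensemble, which is the source of the inconsistency discussed above.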

### c. Covariance inflation

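In a square root EnKF scheme, such as the algorithm of Whitaker and Hamill (2002), the diagonal entries of the sample covariance matrix tend to underestimate the background error variance. The standard remedy is to multiply the sample covariance matrix by a variance inflation factor *ρ* > 1. This approach is called multiplicative variance inflation and was introduced in Anderson and Anderson (1999). It should be noted that the multiplicative variance inflation increases not only the diagonal elements of the sample covariance matrix, but also the magnitude of its off-diagonal elements.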
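The effect of multiplicative inflation on the off-diagonal entries can be checked numerically. The sketch below assumes one common implementation, in which the ensemble perturbations are scaled by the square root of *ρ*; the function name `inflate` and the random test ensemble are illustrative only.

```python
import numpy as np

def inflate(ens, rho):
    """Multiplicative variance inflation: scale the perturbations by sqrt(rho)."""
    mean = ens.mean(axis=0)
    return mean + np.sqrt(rho) * (ens - mean)

rng = np.random.default_rng(1)
ens = rng.normal(size=(20, 5))  # K = 20 members, M = 5 variables
cov_before = np.cov(ens, rowvar=False)
cov_after = np.cov(inflate(ens, 1.06), rowvar=False)

# Every entry of the sample covariance, diagonal or not, is multiplied by rho.
all_entries_scaled = np.allclose(cov_after, 1.06 * cov_before)
```

Because the whole matrix is scaled by the same factor, the correlations implied by the sample covariance are unchanged by the inflation.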

## 3. Experiment design

In this section, we briefly introduce the Lorenz-96 model and our approach to generate the time evolving “true” states and the simulated observations of the “true” states.

### a. The Lorenz-96 model

The Lorenz-96 model describes the time evolution of *M* = 40 scalar variables $x_\upsilon$, $\upsilon = 1, \ldots, M$, with the system of equations

$$\frac{dx_\upsilon}{dt} = \left(x_{\upsilon+1} - x_{\upsilon-2}\right) x_{\upsilon-1} - x_\upsilon + F, \tag{12}$$

where $x_{M+1}(t) = x_1(t)$, $x_{-1}(t) = x_{M-1}(t)$, $x_0(t) = x_M(t)$, and *F* is a constant forcing term. While a partial differential equation to which Eq. (12) would be a finite-dimensional approximation is not known to exist, the variables $x_\upsilon$, $\upsilon = 1, \ldots, 40$, are usually thought of as gridpoint values of a scalar atmospheric variable along a latitude circle. Using this analog, the time evolution of the model for *F* = 8 resembles the propagation of a wavenumber 8 dispersive wave characterized by a westward (in the direction of decreasing *υ*) phase speed and an eastward (in the direction of increasing *υ*) group velocity. The model is chaotic: it has 13 positive Lyapunov exponents and its Lyapunov dimension is 27.1 (Lorenz and Emanuel 1998). Thus, in the Lorenz-96 model, similar to the situation in the storm-track regions in an NWP model, the spatiotemporal evolution of the uncertainties is governed by unstable dispersive waves and accurate estimates of the state over an extended time period can be obtained only by the frequent assimilation of observations. In spite of its skill in mimicking an important feature of the propagation of state estimation errors in realistic models of the atmosphere, the Lorenz-96 model is a highly idealized analog of a realistic atmospheric model. Most importantly, the spatial correlations between the values of $x_\upsilon$ at the different “grid points” do not change smoothly with distance as in a realistic model.^{4}

We choose the Lorenz-96 model because (i) its low dimensionality allows us to test a computationally relatively expensive nonparametric scheme to filter the ensemble-based estimate of the covariance; (ii) filtering the ensemble-based covariance by localization has a well-documented positive effect on the accuracy of the analyses for this model (Whitaker and Hamill 2002); and (iii) this model has an excellent track record in providing the initial test environment for EnKF schemes (e.g., Whitaker and Hamill 2002; Ott et al. 2004; Zhang et al. 2009b), which later proved competitive with the state-of-the-art data assimilation schemes of the operational centers.

### b. Generation of the time series of “true” states and the observations

We solve Eq. (12) using a fourth-order Runge–Kutta time integration scheme and a time step of 0.05 dimensionless time unit. This time unit is defined by the *e*-folding time of the dissipation in the model (e.g., Lorenz and Emanuel 1998). Assuming that the *e*-folding time of dissipation in the atmosphere is about 5 days, the time of 0.05 dimensionless time unit is equivalent to about 6 h in real time. We carry out all numerical experiments under the perfect model hypothesis, generating the “true” state space trajectory with a long time integration of the model. We obtain the initial condition for this integration by adding small-magnitude random noise to the unstable steady-state solution $x_u = 0$, $u = 1, \ldots, M$; we then discard the initial transient part of the trajectory (first 1000 time steps) and define the true states $\mathbf{x}^t(t)$ with the states along the remaining portion of the trajectory. Simulated observations are generated for each time step by adding normally distributed random noise with expectation zero and standard deviation 1 to each variable $x_u$, $u = 1, \ldots, M$. That is, the observation error covariance matrix is the identity matrix.^{5} We estimate the state at each observation time by assimilating the simulated observations with the algorithm described by Eqs. (8)–(11).
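The experiment setup described in this section can be sketched in a few lines of NumPy. The code below integrates the Lorenz-96 equations with a fourth-order Runge–Kutta scheme and a time step of 0.05, discards a 1000-step transient, and adds unit-variance Gaussian noise to form the simulated observations. The function names, the amplitude of the initial noise (0.01), and the random seed are assumptions made for the illustration.

```python
import numpy as np

def lorenz96_tendency(x, forcing=8.0):
    """Right-hand side of the Lorenz-96 equations with cyclic boundary conditions."""
    # np.roll(x, -1)[v] = x[v+1], np.roll(x, 2)[v] = x[v-2], np.roll(x, 1)[v] = x[v-1]
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def rk4_step(x, dt=0.05, forcing=8.0):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def generate_truth_and_obs(n_steps, m=40, spinup=1000, obs_std=1.0, seed=0):
    """Return the 'true' trajectory and noisy observations of every variable."""
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(m)  # small-magnitude random initial state (assumed amplitude)
    for _ in range(spinup):            # discard the initial transient
        x = rk4_step(x)
    truth = np.empty((n_steps, m))
    for n in range(n_steps):
        x = rk4_step(x)
        truth[n] = x
    obs = truth + obs_std * rng.standard_normal(truth.shape)
    return truth, obs

truth, obs = generate_truth_and_obs(n_steps=200)
```

Each row of `truth` and `obs` corresponds to one analysis time, so the two arrays can be passed directly to a cycling data assimilation experiment.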

## 4. Statistical methodology

We now introduce a nonparametric statistical method to estimate the covariance matrix, which is computationally feasible but alleviates some of the potential problems with the sample and the localized sample covariance matrices. We first review some terminology in spatial statistics, then introduce the nonparametric scheme, gradually relaxing the assumptions we make about the covariances. We illustrate our main points with quantitative results for the Lorenz-96 model. In these calculations, we use a sample covariance matrix based on a *K* = 5000 member ensemble as the reference estimate of the covariance. We generate this ensemble by adding random noise, scaled by *F*/10, to the unstable steady-state solution $x_u = 0$, $u = 1, \ldots, M$, and then running the data assimilation with the variance inflation factor *ρ* = 1.005; we choose this small value of *ρ* because the primary role of variance inflation for such a large ensemble under the perfect model scenario is to compensate for the effects of nonlinearities in the model dynamics.

### a. Terminology

We introduce the notation $C(u, \upsilon)$ for the covariance between the errors at the locations *u* and *υ*. We call the error process *nonstationary* in space when the covariance depends on both locations *u* and *υ*. When the mean is constant and the covariance depends on the locations only through the difference of the two locations, that is, $C(u, \upsilon) = C_1(u - \upsilon)$, then we call the process *stationary*. Finally, when the mean is constant and the covariance depends on the locations only through the distance between the two locations, that is, $C(u, \upsilon) = C_2(|u - \upsilon|)$, we call the process *isotropic*.

One option to estimate the covariance function is to use a *parametric* covariance model. There are various parametric covariance models that are isotropic, for example, the exponential function and its generalizations (in which *α*, *β*, *ν* are positive parameters and the distance satisfies *x* > 0). Parametric covariance models are also available for certain nonstationary processes (Paciorek and Schervish 2006; Jun and Stein 2007). In some situations, we may model a nonstationary process as a sum of several independent, locally stationary processes with simple parametric stationary (or isotropic) covariance functions (Fuentes 2001). Parametric methods, however, are in general not as flexible as *nonparametric* methods, which do not make presumptions about the general shape of the covariance function. For instance, the nonstationarity of the background covariance structure for the Lorenz-96 model is complex and there is no obvious flexible parametric covariance model to fit the covariance structure. This complexity is illustrated with Fig. 1, which shows the sample covariance function for various ensemble sizes in the Lorenz-96 model: the sample covariance structure is not symmetric around the center and has a strong dependence on the location. These are clear signs of strong anisotropy and nonstationarity of the background error. This result motivates us to search for an appropriate nonparametric method.

We use a *kernel smoothing* approach as our nonparametric estimation method, which requires the selection of a kernel function and a bandwidth. A kernel is a nonnegative function, which is symmetric around zero, and decreases monotonically as |*u* − *υ*| goes to ∞; for example, *G*(*u* − *υ*) = exp[−(*u* − *υ*)^{2}], is a Gaussian kernel function. There are various statistical methods to choose optimal kernels and bandwidths, and in many applications the effect of the selection of the kernel is insignificant compared to the selection of the bandwidth.^{6} In what follows, we first develop a nonparametric method to estimate the covariance function for the stationary case and then we extend the method to the nonstationary case.

### b. Stationary estimate

In Eq. (13), *h* is a bandwidth, *G*(·) is a kernel function, and the data are the components of the *k*th ensemble perturbation, which are used to estimate the covariance between the locations *u* and *υ*. As Hall and Patil (1994) note, Eq. (13) may not give a positive definite function and they suggest a modified version of Eq. (13) to ensure positive definiteness and to achieve nice mathematical properties of the estimator. Here we do not worry about the positive definiteness of the estimate provided by Eq. (13), as it is not the final scheme we intend to use. Equation (13) can be easily adapted for the *d*-dimensional case (*d* > 1) by simply letting *i* and *j* inside of the parentheses be the coordinates of the *i*th and *j*th locations on the *d*-dimensional domain.

Figure 2 shows the estimated covariance function obtained with Eq. (13) for *u* = 20 and *υ* = 1, … , 40. Because of the stationarity assumption, the shape of the estimated covariance function is the same for all *u* = 1, … , 40. Moreover, the estimated covariance function is symmetric around the center. Each row shows the result for a given bandwidth. The curves in Fig. 2 do not match the sample covariances in Fig. 1, further suggesting that the assumption of stationarity for the background error in the Lorenz-96 model is not appropriate.

Fitted covariance between the locations *u* = 20 and *υ* = 1, … , 40 using the nonparametric approach under stationarity assumption.

Citation: Monthly Weather Review 139, 9; 10.1175/2011MWR3577.1

### c. Nonstationary estimate

In Eq. (14), $G_i(u; h)$ and $G_j(\upsilon; h)$ are kernel functions with a fixed bandwidth *h*. See the appendix for the sketch of proof for the positive definiteness of Eq. (14). Since the kernel functions depend on the two locations, *u* and *υ*, Eq. (14) provides a flow-dependent estimate of the covariance. We may, for instance, use Gaussian kernel functions; a compactly supported kernel *G* is computationally attractive, since it reduces the number of terms in the double summation over *i* and *j*. Figure 3 shows the estimate of the covariance function obtained with Eq. (14): the estimate varies with the location *u* and provides a better fit to the large scale structure of the sample covariance (Fig. 1) than the stationary estimate (Fig. 2).

Fitted covariance between the locations *u* and *υ* using the nonparametric approach for the nonstationary case. The two chosen *u* values are the same as in Fig. 1. The black curve gives

### d. Adaptive bandwidth

As seen before, both the stationary and nonstationary estimates of the covariance require the selection of a bandwidth *h*: a larger value of *h* results in a smoother estimate of the covariance, because it involves a stronger averaging of the sample errors over the neighboring locations. On the one hand, when *h* goes to zero, the estimate provided by Eq. (14) becomes similar to the estimate provided by the sample covariance, except that in Eq. (14) we divide the sums by *K* instead of *K* − 1 and filter the background error perturbations with the kernel. On the other hand, when *h* is large, the estimated covariance is smooth and, in the extreme case, it becomes constant. This is because when *h* is large, the kernel values in Eqs. (13) and (14) are almost constant for all *i*, *j*, *u*, and *υ*. See the bottom row in Fig. 3 for an example.
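The two limits of the bandwidth can be illustrated with a small numerical sketch. The code below is not a transcription of Eq. (14); it assumes one simple kernel-smoothing form, in which each ensemble perturbation is first smoothed with normalized Gaussian weights on the periodic domain and the products of the smoothed perturbations are then averaged with the divisor *K*. The function names and the synthetic ensemble are assumptions made for the example.

```python
import numpy as np

def cyclic_distance(m):
    """Matrix of distances |u - v| on a periodic domain with m grid points."""
    idx = np.arange(m)
    d = np.abs(idx[:, None] - idx[None, :])
    return np.minimum(d, m - d)

def kernel_smoothed_cov(ens, h):
    """Covariance of kernel-smoothed perturbations, with divisor K.

    Assumed form (illustrative only):
      C[u, v] = (1/K) * sum_k s[k, u] * s[k, v],
      s[k, u] = sum_i w[u, i] * pert[k, i],
    where w[u, i] are Gaussian weights exp(-((u - i)/h)^2), normalized over i.
    """
    n_members, m = ens.shape
    pert = ens - ens.mean(axis=0)
    w = np.exp(-(cyclic_distance(m) / h) ** 2)
    w /= w.sum(axis=1, keepdims=True)  # weights for each location sum to one
    smooth = pert @ w.T                # (K, m) smoothed perturbations
    return smooth.T @ smooth / n_members

rng = np.random.default_rng(2)
ens = rng.normal(size=(20, 40))

c_small = kernel_smoothed_cov(ens, h=1e-3)  # very narrow kernel
c_large = kernel_smoothed_cov(ens, h=1e4)   # very wide kernel
sample = np.cov(ens, rowvar=False)          # divisor K - 1
```

With the narrow kernel, `c_small` equals `sample` rescaled by (*K* − 1)/*K*; with the wide kernel, all entries of `c_large` are nearly identical. Because the estimate is an average of outer products of the smoothed perturbations, this particular form is positive semidefinite for any fixed *h*.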

Because we expect the signal-to-noise ratio to be lower at larger distances |*u* − *υ*|, we introduce a bandwidth *h* = *h*(|*u* − *υ*|) that increases with the distance.^{7} In particular, to make *h* smoothly varying with the distance, we let *h* = *h*_{1} exp{(|*u* − *υ*|/*h*_{2})^{2}}. Thus, we make the bandwidth adaptive at the price of replacing the single bandwidth parameter *h* with a pair of parameters, *h*_{1} and *h*_{2}. Figure 4 shows examples of the dependence of the adaptive bandwidth on the two parameters (Fig. 4, top panel) and a comparison of the corresponding covariance estimates with the sample covariance using 20 ensemble members (Fig. 4, bottom panel). The two figures use the same color scale. For the bottom panel, we also display sample covariances using 20 and 5000 ensemble members for comparison. When *h*_{1} = 10^{−7} and *h*_{2} = 1, the adaptive bandwidth is about 10^{−7} at all distances; thus, the nonparametric estimate becomes similar to the sample covariance. The results for *h*_{1} = 0.01 and *h*_{2} = 2 are fairly similar. For the other choices of *h*_{1} and *h*_{2} the bandwidth increases with the distance and the corresponding fitted covariance values are close to zero after a short distance (Fig. 4).
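As a short illustration of the adaptive bandwidth, the sketch below evaluates *h* = *h*_{1} exp{(|*u* − *υ*|/*h*_{2})^{2}} for one pair of parameter values that appears later in the text. The function name `adaptive_bandwidth` and the range of distances are assumptions made for the example.

```python
import numpy as np

def adaptive_bandwidth(distance, h1, h2):
    """Bandwidth h = h1 * exp((distance / h2)**2), increasing with distance."""
    distance = np.asarray(distance, dtype=float)
    return h1 * np.exp((distance / h2) ** 2)

distances = np.arange(0, 6)  # |u - v| = 0, 1, ..., 5
h_values = adaptive_bandwidth(distances, h1=0.05, h2=1.5)
```

At zero distance the bandwidth equals *h*_{1}, and *h*_{2} sets the distance scale over which the bandwidth grows.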

A few adaptive bandwidth examples and the corresponding covariance fits with 20 ensemble members. (top) The shape of four adaptive bandwidths against the distance for the four combinations of *h*_{1} and *h*_{2} values. (bottom) The corresponding fitted covariance values between the locations *u* = 3 and *υ* = 1, … , 40 with 20 ensemble members. Same scale is used for the two figures for the adaptive bandwidth case. For the bottom panel, we have two additional curves displaying sample covariances with 20 and 5000 ensemble members for comparison.

## 5. Quantitative comparison

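Next, we compare the covariance estimates provided by the nonparametric scheme with those based on the sample and the localized sample covariance at a fixed time step. For this comparison, we again use the sample covariance matrix of the 5000-member ensemble as the reference.^{8} For a given ensemble size *K*, we randomly select 200 sets of *K* ensemble members from the 5000-member ensemble from which the reference was computed. For each set, we compute the covariance estimate and measure its distance from the reference with the Frobenius norm of the difference between the two matrices.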

Results are shown for various values of the parameters *h*_{1} and *h*_{2} in Fig. 5. The “optimal” pair of values is *h*_{1} = 0.05 and *h*_{2} = 0.8 in terms of the median, and *h*_{1} = 0.1 and *h*_{2} = 1.5 in terms of the mean. For localization, we use the fifth-order kernel function defined by Eq. (4.10) of Gaspari and Cohn (1999), which is the most widely used localization function in the EnKF literature. This function has a single parameter *c*, which controls the localization length: while the function formally becomes zero at distance 2*c*, the filtered covariances are already nearly zero at distance *c*. Figure 6 shows the median and mean of the Frobenius norm for different values of *c*. For a 20-member ensemble, the “optimal” localization length, with respect to the Frobenius norm, is smaller than *c* = 24, the value that was reported to be optimal with respect to the analysis accuracy for a 10-member ensemble in Whitaker and Hamill (2002). Also, there is a noticeable difference between the “optimal” localization length with respect to the median, *c* = 7.5, and to the mean, *c* = 5.
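The localization part of this comparison can be sketched as follows. The code implements the fifth-order piecewise rational function of Gaspari and Cohn (1999), which is zero beyond the distance 2*c*, applies it to a sample covariance on a periodic domain, and collects the Frobenius norm of the difference from a large-sample reference over random subsets of the ensemble. The function names, the synthetic test ensemble, and the number of trials are assumptions made for the illustration; the sketch does not reproduce the experiments reported here.

```python
import numpy as np

def gaspari_cohn(distance, c):
    """Fifth-order taper: 1 at zero distance, exactly 0 for distance >= 2c."""
    r = np.abs(np.atleast_1d(np.asarray(distance, dtype=float))) / c
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper

def localized_cov(ens, c):
    """Sample covariance multiplied entry by entry with the taper."""
    m = ens.shape[1]
    idx = np.arange(m)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, m - d)  # distance on the periodic domain
    return gaspari_cohn(d, c) * np.cov(ens, rowvar=False)

def subsample_errors(big_ens, k, c, n_trials=200, seed=0):
    """Median and mean Frobenius error of K-member localized estimates."""
    rng = np.random.default_rng(seed)
    reference = np.cov(big_ens, rowvar=False)
    errors = np.empty(n_trials)
    for t in range(n_trials):
        members = rng.choice(big_ens.shape[0], size=k, replace=False)
        estimate = localized_cov(big_ens[members], c)
        errors[t] = np.linalg.norm(estimate - reference, ord="fro")
    return np.median(errors), errors.mean()
```

Calling `subsample_errors` for a range of *c* values and ensemble sizes *K* gives curves of the kind summarized by the median and mean of the Frobenius norm.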

Frobenius norm of the (left) median and (right) mean of the differences between the fitted covariances from the nonparametric scheme and the sample covariance of the 5000-member ensemble, for various *h*_{1} and *h*_{2} values. The median and mean are computed from 200 independent trials.

Frobenius norm of the differences between the fitted covariances from localization and the sample covariance of the 5000-member ensemble, for different values of *c*.

The covariance estimates provided by the nonparametric scheme and the localization are compared for the “optimal” values of the parameters of the two schemes (Fig. 7). Overall, with respect to the mean, the nonparametric scheme provides the most accurate estimate except when the ensemble size is 5. The advantage of this scheme over the sample covariance and the localization with *c* = 24 is particularly large for the small ensembles (*K* ≪ 50). The advantage of the nonparametric scheme over localization with *c* = 5 is much more modest. In addition, with respect to the median, the localization with *c* = 5 outperforms the nonparametric scheme when *K* < 50. The system used here gives a highly nonstationary covariance structure for the background errors. The nonparametric scheme would be superior if the true covariance structure was stationary or isotropic [Eq. (13)].

Frobenius norm of the differences between the fitted covariances and the sample covariance of the 5000-member ensemble.

## 6. Analysis experiments with filtered covariances

In section 5 we compared the accuracy of the filtered covariance estimates for a single analysis time. In this section, we carry out analysis experiments with a *K* = 20 member ensemble using the filtered estimates of the background covariance in the computation of the Kalman gain. The filter is run for 1500 time steps and statistics are computed based on the last 1000 time steps.

### a. Verification methods

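We measure the accuracy of the state estimates by Δ, the root-mean-square error of the analysis mean, computed over the components of the state vector and over the analysis times $t_n$. Since the true background error at time $t_n$ is a random variable for which only a single realization is available, we verify the covariance estimates in a statistical sense: we compare the time mean of the *u*th column of the estimated background error covariance matrix with the time mean of the corresponding products of the components of the true background error [Eqs. (19) and (20)].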

We emphasize that Eqs. (19) and (20) do not provide information about the accuracy of the estimates of the background error covariance at a given time $t_n$; instead, these relations can be used to verify the consistency between a time series of the estimated background error covariance matrix and a time series of the true background error. Such consistency is a necessary, but not sufficient, condition for the estimates to be accurate at the individual analysis times.

### b. Dependence of the results on the filtering parameters

To establish a baseline of the analysis error, against which we can measure the effectiveness of covariance filtering, we first carry out an analysis experiment using the unfiltered sample covariance matrix. The filter performs best for *ρ* = 1.06; using smaller values of *ρ* makes the filter unstable, while increasing *ρ* leads to increasingly larger values of Δ.

First, we filter the sample covariance estimates using localization. Whitaker and Hamill (2002) studied, in detail, the sensitivity of the analysis error to the localization parameter *c* and the variance inflation coefficient *ρ* for *K* = 10. They found that the root-mean-square error in the analysis mean was the lowest for *c* = 24 and *ρ* = 1.03.^{9} Using the same values of *c* and *ρ* for *K* = 20, we obtain a root-mean-square error of Δ = 0.19. This value indicates an error reduction of 0.04 (about 17%) compared to the case when no covariance filtering is used. For *c* = 5, the value that we found optimal for a single analysis time in section 5, the minimum value of Δ is 0.22, which can be obtained using *ρ* > 1.025. That is, the performance of the EnKF for *c* = 5 is clearly inferior to that for *c* = 24, despite our earlier result that *c* = 5 provides a more accurate estimate at a single analysis time.

Finally, we investigate the sensitivity of the analysis error to the parameters *h*_{1} and *h*_{2} using the nonparametric scheme with adaptive bandwidth to filter the covariance. We show results for *ρ* = 1.025, the value of the variance inflation we found to provide the smallest value of Δ. The results are summarized in Fig. 8. In this figure, the white area indicates the parameter range where the filter fails, indicated by a value of Δ, which is larger than one, the root-mean-square of the observation error. The typical value of Δ in this region is between 3.0 and 4.0, which is similar to the standard deviation of the temporal changes in the model variables. An interesting feature in Fig. 8 is the sharp boundary between the parameter ranges where the filter fails and where the analysis error is the smallest. The parameter range where the performance of the EnKF is nearly optimal (marked by blue shades) is wide: a large value of *h*_{2} cannot be used with a small value of *h*_{1}, but once *h*_{1} is larger than about 0.03, the analysis error becomes insensitive to the choice of *h*_{2}. In essence, an increase of the bandwidth with distance is important only when the bandwidth at zero distance is small. In the blue region, the root-mean-square error is about Δ = 0.2, lower than that for localization with *c* = 5, but slightly worse than that for localization with *c* = 24.

The grayscale shades show the value of Δ for *ρ* = 1.025 as a function of the parameters *h*_{1} and *h*_{2}.

### c. Consistency between the estimated and the true errors

Figure 9 illustrates the level of consistency between the sample covariance and the covariance for the true background error. While the general shape of the covariance function is captured well by the sample covariance, at short distances, |*u* − *υ*| ≤ 2 (18 ≤ *υ* ≤ 22), the absolute value of the covariance is somewhat overestimated. This overestimation can be easily corrected by reducing the variance inflation, but reducing the variance inflation quickly leads to a loss of the stability of the EnKF. At longer distances, |*u* − *υ*| ≥ 5, the time mean of the sample covariance is about zero, while the time mean of the true error is small, but not zero, at most distances. We note that the near-zero values at the larger distances for the sample covariance are due to the filtering effect of time averaging, as we observe relatively large instantaneous estimates of the covariance (results not shown) at those distances. This result suggests that the relatively large instantaneous values at the larger distances are dominated by statistical fluctuations.

The components *u* = 20.

We show results on the consistency of the estimates of the covariance for *c* = 5 (Fig. 10) and *c* = 24 (Fig. 11). In these figures, both the sample and the localized estimates slightly overestimate the variance. As in the case of the results shown in Fig. 9, this overestimation can be corrected by reducing the variance inflation, but reducing the variance inflation makes the filter unstable. At distances |*u* − *υ*| ≤ 2, the consistency is clearly better for *c* = 24 than for *c* = 5, as for the latter choice of *c*, the magnitude of the negative covariance at |*u* − *υ*| = 2 is underestimated. The difference in the accuracy of the estimates at |*u* − *υ*| = 2 may explain the better performance of the EnKF for *c* = 24 than for *c* = 5. For *c* = 5, at distances |*u* − *υ*| > 10, the filtered estimate is zero, while the sample covariance shows some distance-dependent variation. For *c* = 24, the sample and the filtered covariances show a high level of consistency with each other. This is an indication that the localized covariance with *c* = 24 leads to a better consistency between the localized covariance estimate and the ensemble perturbations.

The components *c* = 5 for *u* = 20.

The components *c* = 24 for *u* = 20.

We show results for two different choices of the bandwidth parameters of the nonparametric estimation scheme: *h*_{1} = 0.05, *h*_{2} = 1.5 (Fig. 12) and *h*_{1} = 0.11, *h*_{2} = 1.5 (Fig. 13). A common feature of these figures is that the consistency between the sample and the filtered estimates is lower than for the localization-based filtering for *c* = 24. (We recall that localization with *c* = 24 provides the most accurate analysis among all filtering schemes tested here.) Thus, we conclude that the scheme that provides the most accurate analysis, on average, is the scheme for which the sample and the filtered estimates are most consistent with each other (localization with *c* = 24). This is also the scheme that provides the best consistency with the true errors for short distances. The results of this section suggest that a better metric of covariance filtering skill would be one that combined a measure of closeness to the sample covariance matrix for a very large ensemble with a measure of similarity between the climatological averages of the filtered and sample covariance.

The components *h*_{1} = 0.05 and *h*_{2} = 1.5 for *u* = 20.

The components *h*_{1} = 0.11 and *h*_{2} = 1.5 for *u* = 20.

## 7. Conclusions

In this paper, we investigated the effects of filtering the sample covariances used in the computation of the Kalman gain in an EnKF on the accuracy of the estimates of the background error covariance. We considered two particular approaches to filter the sample covariance: the more traditional approach of covariance localization and a new kernel smoothing method with variable bandwidth to obtain nonparametric estimates of the covariances.

We found that for a single analysis time, the nonparametric scheme provided the overall most accurate estimate of the covariances and that localization provided more accurate estimates with short localization length than with long localization length. We also found, however, that when the analysis is cycled, localization with long localization length provided more accurate analysis in the root-mean-square sense than the nonparametric scheme or localization with short localization length. We explained this result with the better consistency between the covariance estimates, the true background error covariance, and the ensemble perturbations that represent the background uncertainty. Our results suggest that preserving such consistency is important.

Finally, we note that the Lorenz-96 model, with its rapidly changing background covariance between locations, poses a considerable challenge to the covariance estimation methods. It is plausible that some of our findings would not hold for a more realistic model in which the background error covariance changes more smoothly. In such a model, distinguishing between the sampling noise, which changes rapidly in space, and the more smoothly changing true covariance should be easier, which we expect to benefit the new kernel smoothing method more than localization.

## Acknowledgments

Mikyoung Jun, Marc G. Genton, and Fuqing Zhang acknowledge the support from the National Science Foundation (ATM 0620624). Mikyoung Jun’s research is also supported by NSF Grant DMS-0906532. Istvan Szunyogh acknowledges the support from NSF (ATM 0935538) and ONR (N000140910589). Marc Genton’s research is supported by NSF DMS-1007504. Fuqing Zhang acknowledges the support from ONR Grant N000140410471. Craig H. Bishop acknowledges support from ONR Project Element 0602435N, Project Number BE-435-003. The authors are grateful to Herschel Mitchell (the Editor), Andrew Tangborn, and one anonymous reviewer for valuable comments.

## APPENDIX

### Proof of the Positive Definiteness of Eq. (14)

We give a brief sketch of the proof for the positive definiteness of the covariance function in Eq. (14).

## REFERENCES

Anderson, J. L., 2001: An ensemble adjustment filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903.

Anderson, J. L., 2007: Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. *Physica D*, **230**, 99–111.

Anderson, J. L., and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. *Mon. Wea. Rev.*, **127**, 2741–2758.

Berre, L., and G. Desroziers, 2010: Filtering of background error variance and correlations by local spatial averaging: A review. *Mon. Wea. Rev.*, **138**, 3693–3720.

Bishop, C. H., and D. Hodyss, 2007: Flow adaptive moderation of spurious ensemble correlations and its use in ensemble based data assimilation. *Quart. J. Roy. Meteor. Soc.*, **133**, 2029–2044.

Bishop, C. H., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. *Tellus*, **61**, 84–96.

Bishop, C. H., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. *Tellus*, **61**, 97–111.

Daley, R., 1991: *Atmospheric Data Analysis*. Cambridge University Press, 457 pp.

Fan, J., and Q. Yao, 2003: *Nonlinear Time Series: Nonparametric and Parametric Methods*. Springer-Verlag, 552 pp.

Fuentes, M., 2001: A high frequency kriging approach for non-stationary environmental processes. *Environmetrics*, **12**, 469–483.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757.

Hall, P., and P. Patil, 1994: Properties of nonparametric estimators of autocovariance for stationary random fields. *Probab. Theory Relat. Fields*, **99**, 399–424.

Hamill, T. M., 2006: Ensemble-based data assimilation. *Predictability of Weather and Climate*, T. Palmer and R. Hagedorn, Eds., Cambridge University Press, 124–156.

Hamill, T. M., and J. S. Whitaker, 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. *Mon. Wea. Rev.*, **133**, 3132–3147.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137.

Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. *Quart. J. Roy. Meteor. Soc.*, **131**, 3269–3289.

Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. *Physica D*, **230**, 112–126.

Jazwinski, A. H., 1970: *Stochastic Processes and Filtering Theory*. Academic Press, 376 pp.

Jun, M., and M. L. Stein, 2007: An approach to producing space–time covariance functions on spheres. *Technometrics*, **49**, 468–479.

Kalnay, E., 2003: *Atmospheric Modeling, Data Assimilation and Predictability*. Cambridge University Press, 341 pp.

Lorenz, E. N., 1996: Predictability: A problem partly solved. *Proc. Seminar on Predictability*, Shinfield Park, Reading, Berkshire, United Kingdom, European Centre for Medium-Range Weather Forecasts, 1–18.

Lorenz, E. N., 2005: Designing chaotic models. *J. Atmos. Sci.*, **62**, 1574–1587.

Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulations with a small model. *J. Atmos. Sci.*, **55**, 399–414.

Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. *Tellus*, **56A**, 415–428.

Paciorek, C., and M. Schervish, 2006: Spatial modelling using a new class of nonstationary covariance functions. *Environmetrics*, **17**, 483–506.

Szunyogh, I., E. J. Kostelich, G. Gyarmati, D. J. Patil, B. R. Hunt, E. Kalnay, E. Ott, and J. A. Yorke, 2005: Assessing a local ensemble Kalman filter: Perfect model experiments with the National Centers for Environmental Prediction global model. *Tellus*, **57A**, 528–545.

Szunyogh, I., E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation system for the NCEP global model. *Tellus*, **60A**, 113–130.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1923; Corrigendum, **134**, 1722.

Yoon, Y.-N., E. Ott, and I. Szunyogh, 2010: On the propagation of information and the use of localization in ensemble Kalman filtering. *J. Atmos. Sci.*, **67**, 3823–3834.

Zhang, F., Y. Weng, J. Sippel, and C. Bishop, 2009a: Cloud-resolving hurricane initialization and prediction through assimilation of Doppler radar observations with an ensemble Kalman filter. *Mon. Wea. Rev.*, **137**, 2105–2125.

Zhang, F., M. Zhang, and J. Hansen, 2009b: Coupling ensemble Kalman filter with four-dimensional variational data assimilation. *Adv. Atmos. Sci.*, **26**, 1–8.

^{1} Localization also makes the data assimilation algorithms more suitable for implementation on massively parallel computer architectures.

^{2} The set of analysis perturbations that satisfies this condition is not uniquely defined.

^{3} This assumption is solely made to simplify the notation and has no effect on the generality of our results.

^{4} This shortcoming of the model was successfully corrected by a modification of Eq. (12) in Lorenz (2005). Unfortunately, this improvement of the model was achieved at the expense of increasing the number of variables, which limits the appeal of the improved model as a low computational cost alternative to a more realistic atmospheric model.

^{5} We also carried out experiments in which we observed every third location, but because the results were qualitatively similar to those for observing all locations, we do not report the results for that setting.

^{6} For more details on kernel estimation methods, see Fan and Yao (2003).

^{7} Localization of the sample covariance is equivalent to using an *h*, which is nearly zero within the localization length and large outside the localization length.

^{8} For a matrix

^{9} The correct figure to support their result can be found in the corrigendum of Whitaker and Hamill (2002).